How to Count the Number of Buildings in an Area by Category using OpenStreetMap API?
How many shops are there in the area? How many restaurants are within a 250-meter radius of here? Are there any office buildings around? What about hospitals? These questions are crucial for many industries, such as real estate, and this article shows how to answer them using OpenStreetMap data in Python.
Using OpenStreetMap data from Python is quite easy thanks to the Overpass API, which can be used to query geospatial data from OpenStreetMap. It can be used in a browser through the Overpass Turbo website, but there is also a library called overpy that allows executing Overpass queries from Python.
The only issue with these tools is that they use a specific query syntax (Overpass QL), which one needs to learn in order to use them. There are many articles on the OpenStreetMap wiki teaching it, but in this article I’ll simply show you how to use the API for a very specific use case: counting buildings in an area.
To start with, we only need two details: location and radius. The location is expressed in geographic coordinates, namely latitude and longitude. If you start with an address or the name of a place, you’ll first need to perform a process called geocoding, i.e. converting it to geographic coordinates. There are many geocoding APIs and libraries; my favourite one is the free API called OpenCage. The radius is expressed in meters from the given location.
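As a rough illustration of the geocoding step, here is a minimal sketch that calls the OpenCage HTTP API using only the Python standard library. It assumes the `https://api.opencagedata.com/geocode/v1/json` endpoint with the `q` and `key` parameters and a response carrying `results[0].geometry.lat` / `lng`; the function names and the API key are placeholders, so check the OpenCage documentation before relying on it.

```python
import json
import urllib.parse
import urllib.request

# Assumed OpenCage endpoint; verify against the official documentation.
OPENCAGE_URL = "https://api.opencagedata.com/geocode/v1/json"


def build_geocode_url(address, api_key):
    """Build the request URL for a forward-geocoding lookup."""
    params = urllib.parse.urlencode({"q": address, "key": api_key})
    return f"{OPENCAGE_URL}?{params}"


def extract_coordinates(payload):
    """Return (lat, lon) of the first result, or None if nothing was found."""
    results = payload.get("results", [])
    if not results:
        return None
    geometry = results[0]["geometry"]
    return geometry["lat"], geometry["lng"]


def geocode(address, api_key):
    """Perform the HTTP request (needs network access and a valid API key)."""
    url = build_geocode_url(address, api_key)
    with urllib.request.urlopen(url, timeout=30) as response:
        return extract_coordinates(json.load(response))
```

Calling `geocode("some address", "YOUR_API_KEY")` would then give the latitude and longitude used in the queries in the rest of the article.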
What’s also important to know is how the data are stored in OpenStreetMap. In general, all the elements on the map are expressed using collections of points, each with a single latitude and longitude. The two element types that matter here are nodes and ways: nodes are individual points, generally used to mark places such as individual shops, while ways are ordered sets of points that describe the shapes of buildings, roads etc. The images below show the difference between nodes and ways.
We will generally be interested in nodes rather than ways because nodes are usually enough when it comes to counting buildings. Below is sample code showing how to return all nodes within 500 meters of a certain location and count them in Python. Note that the entire query needs to be passed as a string.
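A minimal sketch of such a query, assuming overpy is installed. The helper names are made up for illustration, and the `api` argument is expected to be an `overpy.Overpass()` instance (anything with a compatible `query` method works, which also makes the function easy to test offline).

```python
def build_around_query(lat, lon, radius_m):
    """Overpass QL: every node within radius_m meters of (lat, lon)."""
    return f"node(around:{radius_m},{lat},{lon});out;"


def count_nodes(api, lat, lon, radius_m):
    """Run the query and return the number of nodes that came back.

    `api` is assumed to be an overpy.Overpass() instance; its query()
    method takes the whole query as a single string.
    """
    result = api.query(build_around_query(lat, lon, radius_m))
    return len(result.nodes)
```

With overpy this would be used as `count_nodes(overpy.Overpass(), 52.2297, 21.0122, 500)` after `import overpy`; the coordinates here are only placeholders.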
We have the number of nodes, but there is a problem with this approach. As you can see in the images above, nodes can describe many things. Sometimes they are shops, but in other cases they can be edges of buildings or even trees (can you see the nodes with green shading?). A better approach, I found, is to filter the nodes, keeping only those that have any tags. Usually, tags are added only to the more important nodes, such as nodes with a name. This approach isn’t 100% accurate, but it is pretty good, straightforward and simple to implement:
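One way to sketch this filter is to do it on the Python side, after the query has returned. The helpers below are illustrative and assume each node exposes a `tags` dictionary, as overpy node objects do.

```python
def tagged_nodes(nodes):
    """Keep only nodes that carry at least one tag."""
    return [node for node in nodes if node.tags]


def count_tagged_nodes(nodes):
    """Count nodes that carry at least one tag."""
    return len(tagged_nodes(nodes))
```

Given an overpy result, `count_tagged_nodes(result.nodes)` would return the filtered count.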
Counting by category
Let’s get to the point: how do we count, e.g., the number of shops in the area? If a node has tags, these tags have a certain structure. The tagging conventions are called Map Features and can be found in this huge wiki article. Tags are generally divided into categories (keys) and sub-categories (values), and using them we can filter the nodes based on which buildings we’re interested in. Sometimes it’s enough to use just one feature, while for other types of places we may need several of them. The code snippets below present some examples of queries that I used.
However, whenever you use OpenStreetMap data, remember that they are community-created. As with Wikipedia, anyone can edit points in OpenStreetMap, and while that doesn’t necessarily mean the data are wrong, not everyone sticks to the tagging conventions. This is why you need to be careful and first check how the nodes are usually tagged in your area. A good way to do it is to use Overpass Turbo in the browser, query all the nodes in a certain area and then click on them to see the tags.
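The same inspection can also be approximated in Python by tallying which key/value pairs occur among the returned nodes. This is only a sketch with a made-up helper name, again assuming nodes with a `tags` dictionary as returned by overpy.

```python
from collections import Counter


def tally_tags(nodes):
    """Count how often each (key, value) tag pair occurs among the nodes."""
    counts = Counter()
    for node in nodes:
        for key, value in node.tags.items():
            counts[(key, value)] += 1
    return counts
```

For example, `tally_tags(result.nodes).most_common(20)` lists the twenty most frequent tags in the queried area, which gives a quick impression of the local tagging habits.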
Before we move to the examples, you may be wondering about something else: why use OpenStreetMap and not, e.g., Google Maps? After all, the latter is not fully community-based and is surely more consistent. The key problem is that the Google Maps API that could be used for this purpose isn’t nearly as powerful as the Overpass API. The one I’m talking about is the Google Maps Places API, in particular its Nearby Search request. It has three main limitations: it’s paid (the Overpass API is completely free); it supports only a few categories of places, significantly fewer than Overpass (this can be partially solved by using the right keywords); and it returns at most 60 results (spread across several pages), which is frequently far too few to count all relevant places.
Shops
Shops are quite straightforward to query because most of them are captured by a single key, ‘shop’. There are many values (sub-categories) that we can add to this key to specify types of shops, but to count all the shops we can just use the key and not specify the value at all. However, I found a few other tagging conventions that are occasionally used for shops, in particular ‘building = retail’ and ‘building = supermarket’, plus a specific one for pharmacies (if you want to count them as shops): ‘healthcare = pharmacy’. The complete query and code look like this:
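A sketch of how these four conventions could be combined into one Overpass union. The helper names are illustrative, and `api` is assumed to be an `overpy.Overpass()` instance.

```python
# Tag filters taken from the text: the bare 'shop' key plus three
# less common key=value conventions.
SHOP_FILTERS = [
    '["shop"]',
    '["building"="retail"]',
    '["building"="supermarket"]',
    '["healthcare"="pharmacy"]',
]


def build_shop_query(lat, lon, radius_m):
    """Union of all shop-like nodes within radius_m meters of (lat, lon)."""
    around = f"(around:{radius_m},{lat},{lon})"
    statements = "".join(f"node{tag}{around};" for tag in SHOP_FILTERS)
    # The parentheses form a union, so a node matching several
    # filters appears only once in the output.
    return f"({statements});out;"


def count_shops(api, lat, lon, radius_m):
    """Run the union query and count the returned nodes."""
    result = api.query(build_shop_query(lat, lon, radius_m))
    return len(result.nodes)
```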
Restaurants / Cafes
I wanted to capture all places where one can get food as one category, but based on how the tags are constructed, it is also possible to count restaurants separately from cafes, pubs, bars or fast-food places.
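One possible sketch: fetch all food-related places with a single regular-expression filter on the `amenity` key, then split the counts by value in Python. The list of `amenity` values below is an assumption based on the Map Features wiki, not necessarily the exact set used for this article, and `api` is again assumed to be an `overpy.Overpass()` instance.

```python
from collections import Counter

# Assumed set of food-related 'amenity' values; adjust to local tagging.
FOOD_AMENITIES = ("restaurant", "cafe", "pub", "bar", "fast_food")


def build_food_query(lat, lon, radius_m):
    """Nodes whose 'amenity' value is exactly one of FOOD_AMENITIES."""
    pattern = "|".join(FOOD_AMENITIES)
    return (
        f'node["amenity"~"^({pattern})$"]'
        f"(around:{radius_m},{lat},{lon});out;"
    )


def count_food_places(api, lat, lon, radius_m):
    """Return a Counter mapping each amenity value to its node count."""
    result = api.query(build_food_query(lat, lon, radius_m))
    return Counter(node.tags.get("amenity") for node in result.nodes)
```

Summing the values of the returned counter gives the single "all food places" figure, while the individual entries give the per-type breakdown.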
Offices
I found counting offices and office buildings to be the hardest task. Offices are simply not easy to find on Google Maps or OpenStreetMap. You can find one by searching for the name of a company, but this approach doesn’t work for counting because you never know whether a company name at a certain location denotes its office or, say, a shop. This is unfortunate because in some industries, such as parking, it is crucial to know where the offices are and how many there are. Using OpenStreetMap for this purpose is rather inaccurate because only a small number of offices are tagged. Google Maps has the same problem, although there you can also use the keyword ‘office’ (in the local language) to improve the results. Nevertheless, I still use OpenStreetMap for this, and whenever I can, I switch to national databases of offices (a topic for another story). Here are the tagging conventions I found (I’m also using ways to raise my chances of finding everything).
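A sketch of an office query that looks at both nodes and ways. The three tag filters are assumptions drawn from the Map Features wiki (the `office` key, `building=office` and `building=commercial`), not a confirmed list of the conventions mentioned above, and `api` is assumed to be an `overpy.Overpass()` instance.

```python
# Assumed office-related tag filters; check local tagging before use.
OFFICE_FILTERS = [
    '["office"]',
    '["building"="office"]',
    '["building"="commercial"]',
]


def build_office_query(lat, lon, radius_m):
    """Union of office-like nodes and ways around (lat, lon)."""
    around = f"(around:{radius_m},{lat},{lon})"
    statements = "".join(
        f"{element}{tag}{around};"
        for element in ("node", "way")
        for tag in OFFICE_FILTERS
    )
    return f"({statements});out;"


def count_offices(api, lat, lon, radius_m):
    """Count matching nodes plus matching ways.

    An office mapped both as a node and as a building outline is
    counted twice here; this sketch does not try to merge them.
    """
    result = api.query(build_office_query(lat, lon, radius_m))
    return len(result.nodes) + len(result.ways)
```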
Other categories
There are also a few more categories of buildings and places that were interesting to me and so I found the tagging conventions for them. Here is a list of sample queries for them.
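The same pattern generalizes to any category through a small lookup table. The categories and tags below (`amenity=hospital`, `amenity=school`, `tourism=hotel`) are hypothetical examples picked from the Map Features wiki for illustration, not the original list, and `api` is assumed to be an `overpy.Overpass()` instance.

```python
# Hypothetical category table: category name -> Overpass tag filters.
CATEGORY_FILTERS = {
    "hospitals": ['["amenity"="hospital"]'],
    "schools": ['["amenity"="school"]'],
    "hotels": ['["tourism"="hotel"]'],
}


def build_category_query(category, lat, lon, radius_m):
    """Union of all nodes matching any filter of the given category."""
    around = f"(around:{radius_m},{lat},{lon})"
    statements = "".join(
        f"node{tag}{around};" for tag in CATEGORY_FILTERS[category]
    )
    return f"({statements});out;"


def count_all_categories(api, lat, lon, radius_m):
    """Return {category: node count}, issuing one query per category."""
    return {
        category: len(
            api.query(build_category_query(category, lat, lon, radius_m)).nodes
        )
        for category in CATEGORY_FILTERS
    }
```

Adding a new category is then a matter of adding one entry to the table, after checking in the browser how such places are actually tagged in the area.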
API Limitations
I have already talked about the limitations of the Google Maps Places API and the inconsistencies in OpenStreetMap tagging, but are there any other limitations? I also said that Overpass is completely free, as is using OpenStreetMap data, and this is true. However, it doesn’t mean that you can use the API without any limits, because it has a maximum capacity. Normally, the Overpass API assigns a certain quota of queries to each user, but from what I know this number depends on the overall demand for the API and changes over time. If you exceed your limit, the API tends to stop returning data, and the best solution I have found in these cases is to wait 30 seconds before calling the API again. This is why, whenever I write a script that counts buildings for multiple locations, I always add a delay for when the API stops responding.
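A minimal sketch of such a delay, written as a generic retry wrapper. The function name and its parameters are illustrative. With overpy one would probably pass exception classes such as `overpy.exception.OverpassTooManyRequests` and `overpy.exception.OverpassGatewayTimeout` as `retry_on`, but which errors to retry on is an assumption to verify against the library.

```python
import time


def query_with_retry(api, query, retry_on=(Exception,), wait_seconds=30,
                     max_attempts=5, sleep=time.sleep):
    """Call api.query(query), pausing and retrying when it fails.

    retry_on     -- tuple of exception classes that trigger a retry
    wait_seconds -- pause between attempts (30 s, as suggested above)
    max_attempts -- give up and re-raise after this many failed tries
    sleep        -- injectable so the wait can be skipped in tests
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return api.query(query)
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(wait_seconds)
```

In a script that loops over many locations, each `api.query(...)` call would be replaced with `query_with_retry(api, ...)` so that a temporary refusal only slows the loop down instead of aborting it.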