How to download data from OpenStreetMap?
Maybe you already know about my project of building a bike journey planner for the EuroVelo network? I am starting with route number 3, The Pilgrims’ Route, since I will test it myself this summer. The first thing to do is to collect data about the itinerary.
I have just discussed the different sources for the cycle track in Denmark, and I mentioned that, compared to the other sources, OpenStreetMap has many advantages. OpenStreetMap? It is a sort of open-source Google Maps that anyone can modify, and whose data is accessible to all. But how exactly do you access OpenStreetMap data? It is actually not as simple as I thought.
Easy access to raw data
Let me contradict myself immediately. Extracting all the data available for a small zone is astonishingly simple: go to OpenStreetMap; navigate to the place you are interested in; zoom in to the city level (level 11); export. Damn simple, isn’t it?
Extracting all the data available for a predefined region is quite handy too. You just connect to geofabrik.de; navigate to the level of your choice (the main levels are continent and country, and some tailored zones exist, such as the Alps or the British Isles); choose your favorite format; download. But maybe we do not really need 17 GB of information for Europe.
Indeed, retrieving specific information – say, all the traffic lights, all the fast-food restaurants, all the trees, etc. – over a wide region – say, bigger than a city – is much more difficult. And this is, sadly, my case: I want all the roads that form the Pilgrims’ Route across Denmark, or even better, Europe. Luckily, OpenStreetMap’s wiki is rich. Its Downloading data page lists all the options for retrieving (un)filtered information. In my case, it advises the so-called Overpass API and one of its interfaces, overpass turbo.
OpenStreetMap basics
Let me skip the complexities of OpenStreetMap, as well as those of the language used by the Overpass API. (If you are interested, however, you are welcome to dive into OpenStreetMap’s wiki or to have a look at the Overpass API manual.) Instead, let me spend two minutes on what you have to know to understand the rest of this post.
First, you must accept that it is not so easy to organise information. OpenStreetMap has chosen one way, out of many. In this scheme, a cycle route is an object made of several road sections (or lines), and a line is in turn a set of points. (If the line were straight, two points would be enough to describe it: the beginning and the end. But in the general case, each road section is somewhat curved, and the impression of roundness comes from joining many tiny straight segments. A line is therefore usually made of a lot of points.) But it would be too simple to call a spade a spade! In OpenStreetMap’s parlance, a point is called a node, a line a way and an object a relation.
Secondly, in OpenStreetMap’s scheme, each object, each line and each point has a number associated with it, unique among the items of the same category. This makes it possible to identify exactly what we are talking about. Cleverly, this number is called an identifier. In the data, it is shortened to id.
The third thing you must realise is that a point (a node) can belong to several lines (ways) and that, in turn, an object (a relation) is often made of several lines and points. For instance, a country is a set of borders (ways/lines) plus a capital (node/point). It would be absurd – and inefficient – to copy the points as many times as they appear in different objects, so OpenStreetMap cleverly decided to store only the relationships between the items of different categories. (When you create a way, that is a line, OpenStreetMap in reality only creates a set of nodes, that is points, and stores only the fact that the way is made of those.)
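To make this sharing of points concrete, here is a minimal Python sketch of the three levels. The class and function names are my own invention for illustration, not OpenStreetMap's, and the model is simplified: real relations can also contain nodes and other relations, each with a role.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    lat: float
    lon: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: list  # references to nodes, not copies of them
    tags: dict = field(default_factory=dict)


@dataclass
class Relation:
    id: int
    member_way_ids: list  # simplified: only ways, no roles
    tags: dict = field(default_factory=dict)


def way_coordinates(way, nodes_by_id):
    """Resolve a way's node references into (lat, lon) pairs."""
    return [(nodes_by_id[i].lat, nodes_by_id[i].lon) for i in way.node_ids]


# Made-up data: two ways meet at a junction and share node 2,
# which is stored only once.
nodes = {n.id: n for n in [Node(1, 55.0, 9.0), Node(2, 55.1, 9.1), Node(3, 55.2, 9.0)]}
way_a = Way(10, [1, 2])
way_b = Way(11, [2, 3])
route = Relation(100, [10, 11], {"name": "Example route"})
```

Moving node 2 would change the shape of both ways at once, which is exactly the benefit of storing references rather than copies.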
The last thing you have to understand is that a piece of information is stored only at the highest relevant level. In my case, the name of the cycle route is obviously stored at the relation level – it would make no sense to duplicate it on every single road section of the itinerary – whereas the presence of a dedicated cycle lane and its condition are stored at the way level – indeed, they may change from one section to the next. And how is it stored? As a tag, attached to the object, with a label and a value.
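As a small illustration of where tags live, here is a Python sketch using dictionaries shaped like the elements of the Overpass API's JSON output. The identifiers and tag values are made up for the example, and `tag_value` is a hypothetical helper, not part of any library.

```python
def tag_value(element, key, default=None):
    """Return the value of a tag on an element, or a default if it is absent."""
    return element.get("tags", {}).get(key, default)


# Made-up elements: the route name sits on the relation,
# the surface information on each individual way.
relation = {
    "type": "relation",
    "id": 1,
    "tags": {"name": "EuroVelo 3", "route": "bicycle"},
}
way = {
    "type": "way",
    "id": 11,
    "nodes": [1, 2],
    "tags": {"highway": "cycleway", "surface": "asphalt"},
}

print(tag_value(relation, "name"))
print(tag_value(way, "surface"))
print(tag_value(way, "name", "no name at the way level"))
```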
Advanced access to filtered information
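So now, how would you ask overpass turbo to select all the relations with name “EuroVelo 3”? You would simply look for all relations possessing a tag with label name and with value EuroVelo 3. Strictly speaking, the ~ operator used here performs a regular-expression match, so it also returns relations whose name merely contains that text. Under Overpass API’s conventions, this is written like this: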
relation[name~"EuroVelo 3"]; out;
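The same request can also be sent outside overpass turbo. Here is a minimal Python sketch using only the standard library; it assumes the public instance at overpass-api.de is appropriate for your usage, and the function names are mine. The `[out:json]` setting asks for JSON instead of the default XML.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass instance (assumption: adjust if you use another server).
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_name_query(name_pattern):
    """Build an Overpass QL query for relations whose name matches a pattern.

    Only backslashes and double quotes are escaped for the quoted string;
    the pattern is still interpreted as a regular expression by the server.
    """
    escaped = name_pattern.replace("\\", "\\\\").replace('"', '\\"')
    return f'[out:json];relation[name~"{escaped}"];out;'


def run_query(query, url=OVERPASS_URL):
    """POST a query to the Overpass API and return the parsed JSON response."""
    body = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(url, data=body, timeout=60) as response:
        return json.load(response)


print(build_name_query("EuroVelo 3"))
```

Calling `run_query(build_name_query("EuroVelo 3"))` performs the actual network request; the matching relations are then found under the `"elements"` key of the result.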
Magic or not, the first item in the list is the track of The Pilgrims’ Route, or at least what exists of it on OpenStreetMap. The complete route is a relation with identifier 299546, and it has its own page on the OpenStreetMap wiki.
According to the automatic page for relation 299546, the European track is made of one sub-track per country crossed (Norway, Denmark, Sweden, Germany, Belgium, France and Spain), except for France, where 11 sections (!) compete with each other in the greatest confusion. (Four of them are sub-relations of relation 299546. One of them (number 2345035) contains 8 tiny sub-sub-relations created by the same matryoshkaholic mapper, while another (number 2888487) is perfectly redundant with one of the last two (number 2345035).)
But so far, we have still not accessed the actual itinerary. We have only accessed the highest layer, the one containing relations. To actually retrieve coordinates, we must explicitly ask overpass turbo to go down to the way level, then to the node level, and return the information at this level. Using the API’s esoteric punctuation, such a request looks like the following:
relation(299546);
(._;>>;);
out;
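To decode the punctuation: `._` stands for the current result (here, the relation), `>>` recurses down through its members, and the surrounding parentheses form the union of the two. The Python sketch below assembles the same request with JSON output and counts what comes back by category. The function names are mine, and the sample response is hand-made in the shape of the API's JSON output, not real data.

```python
from collections import Counter


def build_route_query(relation_id):
    """Return an Overpass QL query for a relation and everything below it."""
    return (
        "[out:json];"
        f"relation({int(relation_id)});"  # start from the relation itself
        "(._;>>;);"  # union of the relation and, recursively, its members
        "out;"
    )


def count_by_type(response):
    """Count nodes, ways and relations in a parsed Overpass JSON response."""
    return Counter(element["type"] for element in response["elements"])


# Hand-made sample in the shape of an Overpass JSON response.
sample = {
    "elements": [
        {"type": "node", "id": 1, "lat": 55.0, "lon": 9.0},
        {"type": "node", "id": 2, "lat": 55.1, "lon": 9.1},
        {"type": "way", "id": 10, "nodes": [1, 2]},
        {"type": "relation", "id": 100,
         "members": [{"type": "way", "ref": 10, "role": ""}]},
    ]
}

print(build_route_query(299546))
print(count_by_type(sample))
```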
That’s it, we’re done! We can either view the data in place, or download it as GeoJSON using the export button, transform it slightly and display it locally as a map. (We will see the details in another post.) Obviously, some clean-up in France is needed.
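The export button of overpass turbo does the conversion for you. For the curious, here is a rough Python sketch of what turning a JSON response into GeoJSON involves, assuming the response contains both the ways and their nodes. The function name is hypothetical and the sample data is made up; note that GeoJSON expects coordinates in longitude, latitude order.

```python
import json


def overpass_to_geojson(response):
    """Convert the ways of an Overpass JSON response into GeoJSON LineStrings."""
    # Index node coordinates by id, in GeoJSON order (longitude, latitude).
    nodes = {
        e["id"]: (e["lon"], e["lat"])
        for e in response["elements"]
        if e["type"] == "node"
    }
    features = []
    for e in response["elements"]:
        if e["type"] != "way":
            continue
        # Resolve node references, skipping any node missing from the response.
        coords = [list(nodes[ref]) for ref in e["nodes"] if ref in nodes]
        if len(coords) < 2:
            continue  # a line needs at least two points
        features.append({
            "type": "Feature",
            "properties": {"id": e["id"], **e.get("tags", {})},
            "geometry": {"type": "LineString", "coordinates": coords},
        })
    return {"type": "FeatureCollection", "features": features}


# Made-up response: one way made of three nodes.
sample = {
    "elements": [
        {"type": "node", "id": 1, "lat": 55.0, "lon": 9.0},
        {"type": "node", "id": 2, "lat": 55.1, "lon": 9.1},
        {"type": "node", "id": 3, "lat": 55.2, "lon": 9.2},
        {"type": "way", "id": 10, "nodes": [1, 2, 3],
         "tags": {"highway": "cycleway"}},
    ]
}

geojson = overpass_to_geojson(sample)
print(json.dumps(geojson, indent=2))
```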