OpenStreetMap2R with osmar

Things we cover

There are many different methods to extract data from the growing OpenStreetMap (OSM) spatial database. Which one is suitable depends mainly on what you want to do with the data in R or, more technically, on how the data needs to be represented in R. In addition, you have to decide how to handle factor variables and missing attribute values, which arise from the hierarchy of the OSM data model.

This tutorial focuses on the osmar package and shows a common way to extract, project, and convert point data from an arbitrary area to data frames and spatial data objects (the procedure is very similar for lines and polygons). If you want to use OSM data, it is highly recommended that you first get an idea of what the OSM data structure looks like.

We will cover:

  • how to initialize and use osmar
  • how to identify a region of interest
  • how to identify the correct tags (valid key/value pairs) for the feature of interest (peaks)
  • how to extract and manipulate point data
  • how to convert the data to common R structures (data frames and sp objects)

Things you need

  • sp is a package that provides classes and methods for spatial data manipulation. Utility functions are provided, e.g. for plotting data as maps, spatial selection, as well as methods for retrieving coordinates, for subsetting, print, summary, etc.

    if (!require(sp)) {install.packages("sp"); library(sp)}
  • The osmar package integrates the OpenStreetMap project into the R world. It provides methods to access OpenStreetMap data from different sources, to work with OSM data in the familiar R idiom, and to convert the data for use with existing R packages.

    if (!require(osmar)) {install.packages("osmar"); library(osmar)}
  • The raster package is the major player for dealing with rasterized spatial data and the tool of choice for almost all purposes. It is incredibly powerful and comprehensive, covering the “reading, writing, manipulating, analyzing and modeling of gridded spatial data. The package implements basic and high-level functions and processing of very large files is supported”.

    if (!require(raster)) {install.packages("raster"); library(raster)}
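The `if (!require(...))` pattern has to be repeated for every package. A minimal sketch of a helper that does the same for a whole vector of package names follows; `load_or_install()` is a hypothetical name of our own, not a function from sp, osmar or raster.

```r
# Hypothetical helper (not part of sp, osmar or raster):
# install only the packages that are missing, then attach all of them.
load_or_install <- function(pkgs) {
  # requireNamespace() checks availability without attaching the package
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  to_install <- pkgs[!available]
  if (length(to_install) > 0) {
    # package names must be passed as character strings
    install.packages(to_install)
  }
  # attach every package so its functions are on the search path
  for (p in pkgs) library(p, character.only = TRUE)
  invisible(to_install)
}

# usage for this tutorial:
# load_or_install(c("sp", "osmar", "raster"))
```

Checking with `requireNamespace()` first avoids calling `install.packages()` for packages that are already installed.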

Things to do

OpenStreetMap is a growing and incredibly useful source of spatial vector data. In this tutorial we want to use the available information on peaks. For background on why peaks are of interest, have a look at the Advanced GIS Course, especially lecture nine.

Get OSM data for a roi

Let's retrieve all OSM data within a region of interest (roi) using the osmar package. To do so, it helps to understand the basic data model used for OSM data. OSM data is organized according to the vector data model with the element types nodes (points), ways (lines, or areas if the way is closed) and relations (groups of other elements, e.g. multipolygons or routes). Because these element types differ in their dimensionality, you have to deal with each of them separately.
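To illustrate why the element types have to be handled separately: a way has no coordinates of its own, it only references nodes. The following base R sketch uses invented toy data frames (`node_attrs` and `way_refs` are hypothetical names, loosely modelled on the attribute and reference tables of an osmar object, not real osmar output) to look up a way's coordinates from its node references.

```r
# Hypothetical toy data mimicking the OSM model (ids and coordinates invented):
# nodes carry coordinates, a way only stores an ordered list of node ids.
node_attrs <- data.frame(
  id  = c(101, 102, 103),
  lon = c(11.36, 11.37, 11.38),
  lat = c(47.11, 47.12, 47.11)
)
way_refs <- data.frame(
  id  = c(900, 900, 900),  # one way with id 900
  ref = c(103, 101, 102)   # node ids in drawing order
)

# look up the coordinates of each referenced node, keeping the way's order
# (merge() would sort the rows by key, match() keeps the original order)
idx <- match(way_refs$ref, node_attrs$id)
way_coords <- data.frame(
  ref = way_refs$ref,
  lon = node_attrs$lon[idx],
  lat = node_attrs$lat[idx]
)
way_coords
```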

To download data we need either a bounding box or a center point and an extent. Because OSM data is stored unprojected in geographic coordinates (WGS84 longitude and latitude), we have to provide the region in geographic coordinates as well. In this example we choose a region within the “Stubaier Alpen”.
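osmar's `center_bbox()` covers the center-and-extent case directly. The sketch below only illustrates the underlying arithmetic of turning a center point plus a width and height in meters into corner coordinates. `corners_from_center()` is a hypothetical helper, and the conversion uses a simple spherical approximation (about 111320 m per degree of latitude), not an exact geodesic calculation.

```r
# Hypothetical helper: convert a center point and an extent in meters into
# corner coordinates (left, bottom, right, top) with a spherical approximation.
corners_from_center <- function(lon, lat, width, height) {
  m_per_deg_lat <- 111320
  # a degree of longitude shrinks with the cosine of the latitude
  m_per_deg_lon <- m_per_deg_lat * cos(lat * pi / 180)
  dlon <- (width / 2) / m_per_deg_lon
  dlat <- (height / 2) / m_per_deg_lat
  c(left = lon - dlon, bottom = lat - dlat,
    right = lon + dlon, top = lat + dlat)
}

# center of the box used below, with a 3 km x 3 km extent
corners_from_center(11.373795, 47.11806, 3000, 3000)
```

At about 47° N a degree of longitude covers only roughly two thirds of the distance of a degree of latitude, so the box is wider in degrees than it is tall.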

# define the spatial extent of the OSM data we want to retrieve (left, bottom, right, top)
osm.extend <- corner_bbox(11.35149, 47.10107, 11.39610, 47.13505)
# download all OSM data inside this area; note that we have to declare the API interface via source
osm <- get_osm(osm.extend, source = osmsource_api())
# have a look at what we have got so far, e.g. the nodes (=points)
summary(osm$nodes)
# you may also want to plot the generic OSM download
plot(osm)

Short excursion into the OSM world

In osmar, the tags of each element type are stored in a data frame that follows the OSM tagging system: id is the element's id, k is the OSM key and v is the value assigned to that key for that element.
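Because every tag is one row, an element with several tags spans several rows. The following base R sketch uses an invented tags table (`toy_tags` is a hypothetical name with the same id/k/v columns, not real OSM data) to show the two-step idea used later: first collect the ids tagged natural=peak, then pull all rows belonging to those ids.

```r
# Invented tags table in the id/k/v layout described above
toy_tags <- data.frame(
  id = c(1, 1, 1, 2, 2, 3),
  k  = c("natural", "name", "ele", "natural", "name", "amenity"),
  v  = c("peak", "Example Peak", "2501", "peak", "Second Peak", "bench"),
  stringsAsFactors = FALSE
)

# step 1: ids of all elements tagged natural=peak
peak_ids <- toy_tags$id[toy_tags$k == "natural" & toy_tags$v == "peak"]
peak_ids
# step 2: all tags belonging to these elements
toy_tags[toy_tags$id %in% peak_ids, ]
```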

Now we want to extract the available peak data. If we dive a bit deeper into the mapping world of OSM, we find excellent help. Have a look at the landforms page or, if you are coming from the outdoor user community, at the hiking page. There you see that natural is the key for all tags dealing with “natural physical land features, including the structures that have been modified by humans”1). The value peak is described as “The top (summit) of a hill or mountain.”2). The wiki also suggests combining it with

  • name=*
  • ele=*

That means the keys name and ele are usually combined with natural=peak to record the name and the altitude (elevation above sea level) of a peak.
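Since name and ele are only suggested, some peaks may lack one or both, and these gaps become missing values once the tags are combined into a data frame. A minimal base R sketch on invented tags (`toy_tags` is hypothetical, not real OSM data) that flags which peaks carry each key:

```r
# Invented tags of three peak nodes: node 2 has no name, node 3 has no ele
toy_tags <- data.frame(
  id = c(1, 1, 1, 2, 2, 3, 3),
  k  = c("natural", "name", "ele", "natural", "ele", "natural", "name"),
  v  = c("peak", "Example Peak", "2501", "peak", "2310", "peak", "Third Peak"),
  stringsAsFactors = FALSE
)
peak_ids <- unique(toy_tags$id[toy_tags$k == "natural" & toy_tags$v == "peak"])

# TRUE where a peak carries the respective key
has_name <- peak_ids %in% toy_tags$id[toy_tags$k == "name"]
has_ele  <- peak_ids %in% toy_tags$id[toy_tags$k == "ele"]
data.frame(id = peak_ids, has_name, has_ele)
```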

Getting the correct keys and values

To extract this data we use osmar's find() function. Let us take a subset of the peak names and altitudes in the area and plot them.

## extract the data we want to get hold of, in our case 'peaks'
# identifying all tuples of a required item takes two steps
# step 1: find the ids of all nodes tagged natural=peak
peak.ids <- find(osm, node(tags(k == "natural" & v == "peak")))
# step 2: find downwards (according to the osmar object level hierarchy)
# all elements related to these ids (e.g. the nodes of a way)
# NOTE: for points alone this is not necessary, but we do it to stay with the concept
all.peak <- find_down(osm, node(peak.ids))
### to keep things clear we create subsets of all needed data
# the main subset contains all peak data
p.all <- subset(osm, node_ids = all.peak$node_ids)
# add the locations of this subset to the existing plot (add = TRUE needs an open plot, e.g. from plot(osm))
plot_nodes(p.all, add = TRUE, col = "red", lwd = 3)
# now we need to extract the corresponding variables and values separately:
# create sub-subsets of the tags 'name' and 'ele' and of the attrs 'id', 'lon', 'lat'
peak.name <- subset(p.all$nodes$tags, k == "name")
peak.alt <- subset(p.all$nodes$tags, k == "ele")
peak.coords <- p.all$nodes$attrs[c("id", "lon", "lat")]

Converting the data as we need it

Now we merge the subsets into one data frame. The next step is to convert it into a SpatialPointsDataFrame and to project it to the official Tyrolean projection system (EPSG 31254). Additionally we will plot the data and convert the projected data back to a data frame. Note that this is just for demonstration purposes and not all of the steps are necessary.

# merge the data into one consistent data frame (one row per named peak)
tmp.merge <- merge(peak.name, peak.coords, by = "id", all.x = TRUE)
tmp.merge <- merge(tmp.merge, peak.alt, by = "id", all.x = TRUE)
# clean the df and rename the columns (v.x holds the name, v.y the ele value)
p.list <- tmp.merge[c("lon", "lat", "v.x", "v.y")]
colnames(p.list) <- c("lon", "lat", "name", "altitude")
# convert the altitude values from factor levels to numeric
p.list$altitude <- as.numeric(as.character(p.list$altitude))
# convert the df to a SpatialPointsDataFrame
osm.peak <- SpatialPointsDataFrame(data.frame(p.list$lon, p.list$lat), data.frame(p.list$name, p.list$altitude), proj4string = CRS("+init=epsg:4326"))
# have a look at the structure
str(osm.peak)
# project the SpatialPointsDataFrame from geographic coordinates to EPSG 31254
osm.peak <- spTransform(osm.peak, CRS("+init=epsg:31254"))
# if you need the spatial data object as a data frame again, you can easily reconvert it
df.peak <- as.data.frame(osm.peak)
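Two details of the merge step can be surprising: both tag subsets have the columns k and v, so merge() renames them with .x/.y suffixes, and calling as.numeric() directly on a factor returns the internal level codes instead of the altitudes. The following base R sketch reproduces both with small invented stand-ins for peak.name, peak.coords and peak.alt (`name_tags`, `coords` and `ele_tags` are hypothetical names), assuming, as the comment about levels suggests, that v is a factor.

```r
# Invented stand-ins with the same columns as the osmar subsets above
name_tags <- data.frame(id = c(1, 2), k = "name",
                        v = factor(c("Example Peak", "Second Peak")))
coords    <- data.frame(id = c(1, 2), lon = c(11.36, 11.38), lat = c(47.11, 47.12))
ele_tags  <- data.frame(id = 1, k = "ele", v = factor("2501"))

m <- merge(name_tags, coords, by = "id", all.x = TRUE)
m <- merge(m, ele_tags, by = "id", all.x = TRUE)
# both tag tables have k and v, so merge() adds .x/.y suffixes
names(m)

# as.numeric() on a factor returns the level codes, not the values
as.numeric(m$v.y)
# converting via character gives the actual altitudes (NA where ele is missing)
as.numeric(as.character(m$v.y))
```

Because the merge starts from the name subset with all.x = TRUE, unnamed peaks are dropped, while named peaks without an ele tag keep an NA altitude.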

Plotting the vector data

There are many ways to visualize, plot and map the data. Have a look at visualization, Advanced visualization, Spatial data I/O, Maps and the related examples for further information. Below you will find some simple basics.

# you can use the generic plot function, which has methods for sp objects such as osm.peak
plot(osm.peak)
# additionally you can use the generic osmar plot functions,
# in this case plot_nodes() for the osmar subset p.all;
# p.all is still in geographic coordinates, so draw it on top of the unprojected download
plot(osm)
plot_nodes(p.all, add = TRUE, col = "red", lwd = 3)
# for more options you can also use spplot() from the sp package
spplot(osm.peak, zcol = "p.list.altitude", colorkey = TRUE)
# or more styled
spplot(osm.peak, zcol = "p.list.altitude", col.regions = c("#F7FCF5", "#E5F5E0", "#C7E9C0", "#A1D99B", "#74C476", "#41AB5D", "#238B45", "#005A32"), cuts = c(2000, 2200, 2400, 2600, 2800, 3000, 3200), colorkey = TRUE)
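Before styling a map with fixed class breaks, it can help to check how many peaks fall into each class and whether any lie outside the 2000 to 3200 m range covered by the cuts above. A minimal sketch with base R's cut() on invented altitudes (this is our own check with cut(), not a description of how spplot assigns colors internally):

```r
# Invented altitudes in meters; the breaks are the cut values used above
altitude <- c(1950, 2150, 2480, 2600, 3050, 3300)
breaks <- c(2000, 2200, 2400, 2600, 2800, 3000, 3200)

# cut() builds right-closed intervals (a, b] by default;
# values outside the outermost breaks become NA
classes <- cut(altitude, breaks = breaks, dig.lab = 4)
table(classes, useNA = "ifany")
```

With real data you would pass `osm.peak$p.list.altitude` instead of the invented vector.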

Things of further interest

For deeper knowledge and further examples please have a look at the following resources:

Christoph Reudenbach 2014/12/27 11:39

r/r-tutorials/osmar.txt · Last modified: 2018/12/23 19:46 (external edit)