Importing and Visualizing World Bank Indicator Data in R
The World Bank aggregates a vast array of country-level data as indicators available from the World Bank Open Data portal. By joining that data with country polygons, you can create choropleth maps and perform geospatial analysis.
- Search for an indicator of interest on data.worldbank.org. The variable used in this example is Fertilizer consumption (kilograms per hectare of arable land)
- Click the Download button, download a CSV file, and unzip it to your hard drive
- Review the CSV file in a spreadsheet program to find the latest year with adequate data for your purposes
- Download and unzip this shapefile of country polygons
- Modify the script below to join your World Bank data to the country polygons and save to a new shapefile:
# Import attribute CSV file # Note that the first four row of World Bank CSV files are # metadata that can be skipped with the "skip=4" parameter. # The year column names will have an X prepended to them # (e.g. X2014) to avoid confusing the column names with numbers. attributes = read.csv("API_AG.CON.FERT.ZS_DS2_en_csv_v2.csv", skip=4, stringsAsFactors=F) # Import tract polygons # The "dsn" parameter is the directory where the shapefile is located. # A dot "." means the current directory. # The "layer" parameter is the shapefile name (without the .shp suffix) library(rgdal) countries = readOGR(dsn=".", layer="2017-world-data", stringsAsFactors=F) # Reproject the polygons to an cartographically-valid Robinson projection robinson = CRS("+proj=robin +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs") countries = spTransform(countries, robinson) # Merge (join) the attributes to the polygons. # The World Bank country code in the polygon file is "WBA2016." countries = merge(countries, attributes, by.x = "WBA2016", by.y = "Country.Code") # Convert the appropriate column to a new integer column. # Data read from a CSV file may be characters rather than numbers # and need to be converted to numbers using as.double(). # The name of this column wih vary depending on what variable(s) you use countries$FERTILIZER = as.double(countries$X2014) # Keep only needed columns countries@data = countries@data[,c("NAME", "WBA2016", "FERTILIZER")] # Write shapefile writeOGR(countries, dsn=".", layer="fertilizer", driver="ESRI Shapefile", overwrite_layer=T) # Optional diagnostic plot # The log() transformation is needed because there are a handful # of countries with very high fertilizer use that skew the distribution library(sp) countries$FERTILIZER = log(countries$FERTILIZER) spplot(countries, zcol="FERTILIZER")