download notebook
view notebook w/o solutions

Maps

files needed = ('cities_4269.zip')
we will download = ('ne_50m_admin_0_countries.zip', 'cb_2021_us_county_5m.zip')

Okay, we now know enough about figures and pandas to get a map off the ground.

We will use the geopandas package to plot our maps. Maps are really quite complicated. We are trying to project a spherical surface onto a flat figure, which is inherently a complicated endeavor. Luckily, geopandas will handle most of it for us. I would also add that geopandas is a notoriously difficult package to install. It relies on other software to be installed and set up correctly. (Try a google search for "geopandas install error.") This is another time when having winstat helps a lot.

Geopandas extends pandas to include geospatial data. It provides plotting routines that work with matplotlib to create visualizations.

import pandas as pd                         # pandas for data management
import geopandas                            # geopandas for maps work
from shapely.geometry import Point          # shapely handles the coordinate references for plotting shapes
import matplotlib.pyplot as plt             # matplotlib for plotting details

Setting up the GeoDataFrame

Let's start by plotting some cities. The DataFrame below holds longitudes and latitudes of major South American cities. Our goal is to turn them into something we can plot—in this case, a GeoDataFrame.

cities = pd.DataFrame(
    {'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
     'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
     'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
     'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
cities.head()
City Country Latitude Longitude
0 Buenos Aires Argentina -34.58 -58.66
1 Brasilia Brazil -15.78 -47.91
2 Santiago Chile -33.45 -70.66
3 Bogota Colombia 4.60 -74.08
4 Caracas Venezuela 10.48 -66.86

We need tuples of coordinates to map the cities. We zip together lat and long to create the tuples and store them in a column named 'Coordinates'.

cities['Coordinates'] = list(zip(cities.Longitude, cities.Latitude))
cities.head()
City Country Latitude Longitude Coordinates
0 Buenos Aires Argentina -34.58 -58.66 (-58.66, -34.58)
1 Brasilia Brazil -15.78 -47.91 (-47.91, -15.78)
2 Santiago Chile -33.45 -70.66 (-70.66, -33.45)
3 Bogota Colombia 4.60 -74.08 (-74.08, 4.6)
4 Caracas Venezuela 10.48 -66.86 (-66.86, 10.48)

So far, we haven't done anything that requires new packages. We just have a DataFrame full of numbers and strings.

Next, we turn the tuple into a Point object. Notice that we imported Point from the Shapely package in the first code cell. We use the .map() method of Series to apply the Point function to each row of the Coordinates column.

cities['Coordinates'] = cities['Coordinates'].map(Point)
cities.head()
City Country Latitude Longitude Coordinates
0 Buenos Aires Argentina -34.58 -58.66 POINT (-58.66 -34.58)
1 Brasilia Brazil -15.78 -47.91 POINT (-47.91 -15.78)
2 Santiago Chile -33.45 -70.66 POINT (-70.66 -33.45)
3 Bogota Colombia 4.60 -74.08 POINT (-74.08 4.6)
4 Caracas Venezuela 10.48 -66.86 POINT (-66.86 10.48)

We now have a column of Point objects.

We turn the DataFrame into a GeoDataFrame, which is a data structure that understands how to plot maps. The important part here is that we specify the column that contains the geometery data. From the docs:

The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status. This GeoSeries is referred to as the GeoDataFrame’s “geometry”. When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), this commands will always act on the “geometry” column.

In our case, the geometry data are the points in the 'Coordinates' column.

gcities = geopandas.GeoDataFrame(cities, geometry='Coordinates')
gcities.head()
City Country Latitude Longitude Coordinates
0 Buenos Aires Argentina -34.58 -58.66 POINT (-58.66000 -34.58000)
1 Brasilia Brazil -15.78 -47.91 POINT (-47.91000 -15.78000)
2 Santiago Chile -33.45 -70.66 POINT (-70.66000 -33.45000)
3 Bogota Colombia 4.60 -74.08 POINT (-74.08000 4.60000)
4 Caracas Venezuela 10.48 -66.86 POINT (-66.86000 10.48000)
# Doesn't look different than a vanilla DataFrame...let's make sure we have what we want.
print('gdf is of type:', type(gcities))

# And how can we tell which column is the geometry column? It is an attribute!
print('\nThe geometry column is:', gcities.geometry.name)
gdf is of type: <class 'geopandas.geodataframe.GeoDataFrame'>

The geometry column is: Coordinates

Plotting the map

Okay, we have our points in the GeoDataFrame. Let's plot the locations on a map. We proceed in three steps: 1. Get the map 2. Plot the map 3. Plot the points on the map

1. Get the map

We will get our map data from Natural Earth. The file provides the outlines of countries over which we will plot the locations of the cities in our GeoDataFrame.

Go to Natural Earth and get the medium scale (1:50m) "cultural" dataset: Admin-0. The name of the file should be "ne_50m_admin_0_countries.zip". Direct link to the data.

Two ways to deal with this file. 1. Unzip the folder. Keep the folder and all of the files that are in the folder. Then load the file 'ne_50m_admin_0_countries.shp'

  1. Leave the file zipped. This usually works, but if it doesn't try method 1.

Then we load the shapefile using the .read_file method of geopandas.

# Step 1: Get the map.

# Method 1: Leave it zipped
world = geopandas.read_file('ne_50m_admin_0_countries.zip')

# Method 2: Unzipped file
# world = geopandas.read_file('ne_50m_admin_0_countries/ne_50m_admin_0_countries.shp')
world.head(2)
featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC ADMIN ... FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA geometry
0 Admin-0 country 1 3 Zimbabwe ZWE 0 2 Sovereign country 1 Zimbabwe ... None None None None None None None None None POLYGON ((31.28789 -22.40205, 31.19727 -22.344...
1 Admin-0 country 1 3 Zambia ZMB 0 2 Sovereign country 1 Zambia ... None None None None None None None None None POLYGON ((30.39609 -15.64307, 30.25068 -15.643...

2 rows × 169 columns

# Which column holds the geometry data?
world.geometry.name
'geometry'

This is another GeoDataFrame. The geometry data are the column named 'geometry'.

A quick word about polygons

Instead of Points, the geometery are POLYGONs. The polygons are the shapes of countries.

world = world.set_index('SOVEREIGNT')

# Hello Albania
world.loc['Albania','geometry']

svg

Wow, that was cool.

A polygon is a loop of points connected by straight lines (e.g., triangle or rectangle). The more points we have, the closer the polygon can approximate non-linear shapes. So Albania's shape is defined by many points connected by lines.

# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc['Albania','geometry'].exterior.coords.xy

# How many points?
print('Points in the exterior of Albania:', len(x))
Points in the exterior of Albania: 113
# Afghanistan is a more complicated shape
world.loc['Afghanistan','geometry']

svg

# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc['Afghanistan', 'geometry'].exterior.coords.xy

# How many points?
print('Points in the exterior of Afghanistan:', len(x))
Points in the exterior of Afghanistan: 410

2. Ploting the map

Here is the code. Details in the cell below it.

# Step 2: Plot the map

fig, gax = plt.subplots(figsize=(10,10))

# I only want to plot South America, so I create a list of the country codes
sa = ['ARG', 'BRA', 'COL', 'GUY', 'SUR', 'VEN', 'BOL', 'CHL', 'ECU', 'PER', 'URY']

world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black',color='white')

gax.set_xlabel('longitude')  # By the way, if you haven't read the book 'longitude' by Dava Sobel, you should...
gax.set_ylabel('latitude')

gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

plt.show()

png

GeoDataFrame plots

Nice one! That was easy, and now we have a map.

Note the different syntax for plot. We have been using the plot() method of matplotlib axes objects, so we usually called

gax.plot(x, y)

which plotted x against y on the axis gax.

With a GeoDataFrame, we invoke the plot() method of a GeoDataFrame object with

gdf.plot(ax = gax)

which will plot the geometry data in gdf on the axis gax. This is similar to the syntax that seaborn uses.

Other options

Notice that lots of the regular matplotlib options still work. I can still turn off the top and right spines (do I want to?) and I can add x and y axes labels. The parameter 'edgecolor' sets the line colors, etc.

It looks like I didn't put a title on my plot. Poor form. Let's fix it when we add the cities.

3. Plot the cities on the map

# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))

world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black', color='white')

# This plots the cities. It's the same sytax, but we are plotting from a different GeoDataFrame.
# I want the cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)

gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('South America')

gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

# or get rid of all the axis
# gax.axis('off')

plt.show()

png

This looks pretty nice. What else might we want to do?

  • Maybe add some labels to the points? We can add labels to the figure using text() just like we would in a regular figure. We need to be a little careful about specifying where to put the label. We do this below if you are interested.
  • I would probably remove all the axes, unless knowing the latitude and longitude is important.

Practice: Maps

Let's plot Wisconsin and all of its counties. Along the way, we will learn where to find 'shape files' for U.S. states and counties. Shape files hold the polygons of areas.

The steps:

  1. Plot the county borders
  2. Plot the cities

We'll do the first step together.

1. Plot the county borders

Go to https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html

Here we find shapefiles for all kinds of maps: states, counties, congressional districts, zip codes. These are the cartographic boundary files. They are simplified maps. For example, they do not include the tiny islands off the Wisconsin coast, even though those islands are part of the state. These simplified files are not meant for navigation, but for "thematic" mapping. Which is what we are doing. Here is some background on the files.

We are going to work with the '5m' county border file. This means the resolution of the map is 1:5,000,000. There are 1:500,000 and 1:20,000,000 files, too. Different resolutions carry different levels of detail. You can experiment with the different shapefiles and see the differences.

Again, either leave the file zipped, or unzip it and keep all the files in the folder.

1. Plot the county borders

Let's add the county borders. To do so, we first need to get the shape files. Use the '5m' files.

  1. Like before, read the file in to a GeoDataFrame.

  2. This GeoDataFrame has all the counties for all the states in it. We only want the ones from Wisconsin. The federal information processing standard (fips) code for Wisconsin is 55. Keep only counties from Wisconsin. (A chance to practice our subsetting!)

  3. Plot the Wisconsin counties. You might try adapting the code from our example above plotting South America.

counties = geopandas.read_file('cb_2021_us_county_5m.zip')
counties.head(2)
STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER geometry
0 01 059 00161555 0500000US01059 01059 Franklin Franklin County AL Alabama 06 1641845708 32639621 POLYGON ((-88.16591 34.38093, -88.16563 34.383...
1 06 057 01682927 0500000US06057 06057 Nevada Nevada County CA California 06 2480587301 41531993 POLYGON ((-121.27953 39.23054, -121.25918 39.2...
# What do we have here...
print(type(counties))
print(counties.geometry.name)
print(counties.crs)
<class 'geopandas.geodataframe.GeoDataFrame'>
geometry
epsg:4269
# Sconie's fips code is 55. Keep only those counties
wi_counties = counties[counties['STATEFP']=='55']   # the state codes came in as text. should look into this...

# Give it a quick sort so we can take a look
wi_counties = wi_counties.sort_values('NAME')       # is Adams county here?
wi_counties.head(2)
STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER geometry
1813 55 001 01581060 0500000US55001 55001 Adams Adams County WI Wisconsin 06 1672216054 108348982 POLYGON ((-90.02595 44.09175, -90.00480 44.102...
2475 55 003 01581061 0500000US55003 55003 Ashland Ashland County WI Wisconsin 06 2706458787 3230728318 MULTIPOLYGON (((-90.45673 47.01691, -90.45713 ...
fig, gax = plt.subplots(figsize=(10,10))

# Plot the counties
wi_counties.plot(ax=gax, edgecolor='black', color = 'white')

plt.axis('off')

plt.show()

png

2. Plot some cities

The file "cities_4269.zip" contains a shapefile with some Wisconsin cities.

Load the data, like you did with the county borders. The data include the location of the cities and their populations.

Add the cities to your plot. I made my cities blue dots.

#wi_cities = geopandas.read_file('cities_4269/cities_4269/cities_4269.shp')
wi_cities = geopandas.read_file('cities_4269.zip')
wi_cities.head()
city rank state growth_fro population geometry
0 West Allis 582.0 Wisconsin -0.6 60697.0 POINT (-88.00703 43.01668)
1 Waukesha 475.0 Wisconsin 8.0 71016.0 POINT (-88.23148 43.01168)
2 Wausau 945.0 Wisconsin 1.7 39309.0 POINT (-89.63012 44.95914)
3 Fond du Lac 857.0 Wisconsin 1.7 42970.0 POINT (-88.44705 43.77304)
4 Brookfield 971.0 Wisconsin -1.9 37999.0 POINT (-88.10648 43.06057)
fig, gax = plt.subplots(figsize=(10,10))

# Plot the counties
wi_counties.plot(ax=gax, edgecolor='black', color = 'white')

# Plot the cities
wi_cities.plot(ax=gax, color='blue')

plt.axis('off')

plt.show()

png

Labeling maps

Label the points

Let's return to the South American map from earlier and label the points with the city names.

Each label is our usual .text() method. It would be tedious to type all of those in, so let's automate this. Here is the code, we describe it below.

# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))


world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black',color='white')

# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)

gax.axis('off')

# Label the cities
for x, y, label in zip(gcities['Coordinates'].x, gcities['Coordinates'].y, gcities['City']):
    gax.text(x, y, label)

plt.show()

png

Adding labels to points

That took more work than I expected. Let's talk through that code. The first bit of code is

for x, y, label in zip(gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City']):
  1. for is looping over each city in the GeoDataFrame.
  2. gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City'] takes the x part of the coordinate, the y part of the coordinate and the name of the city for each city.
  3. zip() combines the x-coord, the y-coord, and the name together (zip is a handy function that takes multiple iterable objects and aggregates them together)
  4. x, y, label will hold the three values.

So, for each row the for loops over, x is the x-coord, y is the y-coord, and label is the city name for city defined in that row. We use this data with .text() to apply the label at point (x,y)

gax.text(x , y, label)

Improving the labels

The labels get the job done, but they are a bit ugly. In particular, they are sitting on top of the dot.

We can use annotate() to fix this up. We have used annotate() before to add arrows connecting the text to a point. Here, we will use the ability to specify an offset of the text from the point. Here is the syntax

gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points')

The parameter 'xy' is the point we are referencing. The parameter 'xytext' holds the number of points to offset the text from the point. The argument 'offset points' tells annotate that the (3,3) tuple we passed to 'xytext' is full of points to offset the label from the text.

# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))

world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black',color='white')

# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)

gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('South America')

# Kill the spines...
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)

# ...or get rid of all the axis. Is it important to know the lat and long?
# plt.axis('off')


# Label the cities
for x, y, label in zip(gcities['Coordinates'].x, gcities['Coordinates'].y, gcities['City']):
    gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points')

plt.show()

png

Practice: Labeling points

Add the names of the Wisconsin cities to your map.

fig, gax = plt.subplots(figsize=(10,10))

# Plot the counties
wi_counties.plot(ax=gax, edgecolor='black', color = 'white', alpha=0.25)

# Plot the cities
wi_cities.plot(ax=gax, color='blue')

# Label the cities.
for x, y, label in zip(wi_cities['geometry'].x, wi_cities['geometry'].y, wi_cities['city']):
    gax.annotate(label, xy=(x,y), xytext=(4,4), textcoords='offset points')

plt.axis('off')

plt.show()

png

This is a good start, but the cities around Milwaukee are a mess. These would have to be tweaked by hand.