download notebook
view notebook w/o solutions
Maps
files needed = ('cities_4269.zip') we will download = ('ne_50m_admin_0_countries.zip', 'cb_2021_us_county_5m.zip')
Okay, we now know enough about figures and pandas to get a map off the ground.
We will use the geopandas package to plot our maps. Maps are really quite complicated. We are trying to project a spherical surface onto a flat figure, which is inherently a complicated endeavor. Luckily, geopandas will handle most of it for us. I would also add that geopandas is a notoriously difficult package to install. It relies on other software to be installed and set up correctly. (Try a google search for "geopandas install error.") This is another time when having winstat helps a lot.
Geopandas extends pandas to include geospatial data. It provides plotting routines that work with matplotlib to create visualizations.
import pandas as pd # pandas for data management
import geopandas # geopandas for maps work
from shapely.geometry import Point # shapely handles the coordinate references for plotting shapes
import matplotlib.pyplot as plt # matplotlib for plotting details
Setting up the GeoDataFrame
Let's start by plotting some cities. The DataFrame below holds longitudes and latitudes of major South American cities. Our goal is to turn them into something we can plot—in this case, a GeoDataFrame.
cities = pd.DataFrame(
{'City': ['Buenos Aires', 'Brasilia', 'Santiago', 'Bogota', 'Caracas'],
'Country': ['Argentina', 'Brazil', 'Chile', 'Colombia', 'Venezuela'],
'Latitude': [-34.58, -15.78, -33.45, 4.60, 10.48],
'Longitude': [-58.66, -47.91, -70.66, -74.08, -66.86]})
cities.head()
City | Country | Latitude | Longitude | |
---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 |
1 | Brasilia | Brazil | -15.78 | -47.91 |
2 | Santiago | Chile | -33.45 | -70.66 |
3 | Bogota | Colombia | 4.60 | -74.08 |
4 | Caracas | Venezuela | 10.48 | -66.86 |
We need tuples of coordinates to map the cities. We zip together lat and long to create the tuples and store them in a column named 'Coordinates'.
cities['Coordinates'] = list(zip(cities.Longitude, cities.Latitude))
cities.head()
City | Country | Latitude | Longitude | Coordinates | |
---|---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 | (-58.66, -34.58) |
1 | Brasilia | Brazil | -15.78 | -47.91 | (-47.91, -15.78) |
2 | Santiago | Chile | -33.45 | -70.66 | (-70.66, -33.45) |
3 | Bogota | Colombia | 4.60 | -74.08 | (-74.08, 4.6) |
4 | Caracas | Venezuela | 10.48 | -66.86 | (-66.86, 10.48) |
So far, we haven't done anything that requires new packages. We just have a DataFrame full of numbers and strings.
Next, we turn the tuple into a Point object. Notice that we imported Point from the Shapely package in the first code cell. We use the .map()
method of Series to apply the Point function to each row of the Coordinates column.
cities['Coordinates'] = cities['Coordinates'].map(Point)
cities.head()
City | Country | Latitude | Longitude | Coordinates | |
---|---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 | POINT (-58.66 -34.58) |
1 | Brasilia | Brazil | -15.78 | -47.91 | POINT (-47.91 -15.78) |
2 | Santiago | Chile | -33.45 | -70.66 | POINT (-70.66 -33.45) |
3 | Bogota | Colombia | 4.60 | -74.08 | POINT (-74.08 4.6) |
4 | Caracas | Venezuela | 10.48 | -66.86 | POINT (-66.86 10.48) |
We now have a column of Point objects.
We turn the DataFrame into a GeoDataFrame, which is a data structure that understands how to plot maps. The important part here is that we specify the column that contains the geometery
data. From the docs:
The most important property of a GeoDataFrame is that it always has one GeoSeries column that holds a special status. This GeoSeries is referred to as the GeoDataFrame’s “geometry”. When a spatial method is applied to a GeoDataFrame (or a spatial attribute like area is called), this commands will always act on the “geometry” column.
In our case, the geometry data are the points in the 'Coordinates' column.
gcities = geopandas.GeoDataFrame(cities, geometry='Coordinates')
gcities.head()
City | Country | Latitude | Longitude | Coordinates | |
---|---|---|---|---|---|
0 | Buenos Aires | Argentina | -34.58 | -58.66 | POINT (-58.66000 -34.58000) |
1 | Brasilia | Brazil | -15.78 | -47.91 | POINT (-47.91000 -15.78000) |
2 | Santiago | Chile | -33.45 | -70.66 | POINT (-70.66000 -33.45000) |
3 | Bogota | Colombia | 4.60 | -74.08 | POINT (-74.08000 4.60000) |
4 | Caracas | Venezuela | 10.48 | -66.86 | POINT (-66.86000 10.48000) |
# Doesn't look different than a vanilla DataFrame...let's make sure we have what we want.
print('gdf is of type:', type(gcities))
# And how can we tell which column is the geometry column? It is an attribute!
print('\nThe geometry column is:', gcities.geometry.name)
gdf is of type: <class 'geopandas.geodataframe.GeoDataFrame'>
The geometry column is: Coordinates
Plotting the map
Okay, we have our points in the GeoDataFrame. Let's plot the locations on a map. We proceed in three steps: 1. Get the map 2. Plot the map 3. Plot the points on the map
1. Get the map
We will get our map data from Natural Earth. The file provides the outlines of countries over which we will plot the locations of the cities in our GeoDataFrame.
Go to Natural Earth and get the medium scale (1:50m) "cultural" dataset: Admin-0. The name of the file should be "ne_50m_admin_0_countries.zip". Direct link to the data.
Two ways to deal with this file. 1. Unzip the folder. Keep the folder and all of the files that are in the folder. Then load the file 'ne_50m_admin_0_countries.shp'
- Leave the file zipped. This usually works, but if it doesn't try method 1.
Then we load the shapefile using the .read_file
method of geopandas.
# Step 1: Get the map.
# Method 1: Leave it zipped
world = geopandas.read_file('ne_50m_admin_0_countries.zip')
# Method 2: Unzipped file
# world = geopandas.read_file('ne_50m_admin_0_countries/ne_50m_admin_0_countries.shp')
world.head(2)
featurecla | scalerank | LABELRANK | SOVEREIGNT | SOV_A3 | ADM0_DIF | LEVEL | TYPE | TLC | ADMIN | ... | FCLASS_TR | FCLASS_ID | FCLASS_PL | FCLASS_GR | FCLASS_IT | FCLASS_NL | FCLASS_SE | FCLASS_BD | FCLASS_UA | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | Admin-0 country | 1 | 3 | Zimbabwe | ZWE | 0 | 2 | Sovereign country | 1 | Zimbabwe | ... | None | None | None | None | None | None | None | None | None | POLYGON ((31.28789 -22.40205, 31.19727 -22.344... |
1 | Admin-0 country | 1 | 3 | Zambia | ZMB | 0 | 2 | Sovereign country | 1 | Zambia | ... | None | None | None | None | None | None | None | None | None | POLYGON ((30.39609 -15.64307, 30.25068 -15.643... |
2 rows × 169 columns
# Which column holds the geometry data?
world.geometry.name
'geometry'
This is another GeoDataFrame. The geometry data are the column named 'geometry'.
A quick word about polygons
Instead of Points, the geometery are POLYGONs. The polygons are the shapes of countries.
world = world.set_index('SOVEREIGNT')
# Hello Albania
world.loc['Albania','geometry']
Wow, that was cool.
A polygon is a loop of points connected by straight lines (e.g., triangle or rectangle). The more points we have, the closer the polygon can approximate non-linear shapes. So Albania's shape is defined by many points connected by lines.
# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc['Albania','geometry'].exterior.coords.xy
# How many points?
print('Points in the exterior of Albania:', len(x))
Points in the exterior of Albania: 113
# Afghanistan is a more complicated shape
world.loc['Afghanistan','geometry']
# Returns two arrays that hold the x and y coordinates of the points that define the polygon's exterior.
x, y = world.loc['Afghanistan', 'geometry'].exterior.coords.xy
# How many points?
print('Points in the exterior of Afghanistan:', len(x))
Points in the exterior of Afghanistan: 410
2. Ploting the map
Here is the code. Details in the cell below it.
# Step 2: Plot the map
fig, gax = plt.subplots(figsize=(10,10))
# I only want to plot South America, so I create a list of the country codes
sa = ['ARG', 'BRA', 'COL', 'GUY', 'SUR', 'VEN', 'BOL', 'CHL', 'ECU', 'PER', 'URY']
world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black',color='white')
gax.set_xlabel('longitude') # By the way, if you haven't read the book 'longitude' by Dava Sobel, you should...
gax.set_ylabel('latitude')
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
plt.show()
GeoDataFrame plots
Nice one! That was easy, and now we have a map.
Note the different syntax for plot. We have been using the plot()
method of matplotlib axes objects, so we usually called
gax.plot(x, y)
which plotted x against y on the axis gax.
With a GeoDataFrame, we invoke the plot()
method of a GeoDataFrame object with
gdf.plot(ax = gax)
which will plot the geometry data in gdf on the axis gax. This is similar to the syntax that seaborn uses.
Other options
Notice that lots of the regular matplotlib options still work. I can still turn off the top and right spines (do I want to?) and I can add x and y axes labels. The parameter 'edgecolor' sets the line colors, etc.
It looks like I didn't put a title on my plot. Poor form. Let's fix it when we add the cities.
3. Plot the cities on the map
# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))
world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black', color='white')
# This plots the cities. It's the same sytax, but we are plotting from a different GeoDataFrame.
# I want the cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)
gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('South America')
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
# or get rid of all the axis
# gax.axis('off')
plt.show()
This looks pretty nice. What else might we want to do?
- Maybe add some labels to the points? We can add labels to the figure using
text()
just like we would in a regular figure. We need to be a little careful about specifying where to put the label. We do this below if you are interested. - I would probably remove all the axes, unless knowing the latitude and longitude is important.
Practice: Maps
Let's plot Wisconsin and all of its counties. Along the way, we will learn where to find 'shape files' for U.S. states and counties. Shape files hold the polygons of areas.
The steps:
- Plot the county borders
- Plot the cities
We'll do the first step together.
1. Plot the county borders
Go to https://www.census.gov/geographies/mapping-files/time-series/geo/cartographic-boundary.html
Here we find shapefiles for all kinds of maps: states, counties, congressional districts, zip codes. These are the cartographic boundary files. They are simplified maps. For example, they do not include the tiny islands off the Wisconsin coast, even though those islands are part of the state. These simplified files are not meant for navigation, but for "thematic" mapping. Which is what we are doing. Here is some background on the files.
We are going to work with the '5m' county border file. This means the resolution of the map is 1:5,000,000. There are 1:500,000 and 1:20,000,000 files, too. Different resolutions carry different levels of detail. You can experiment with the different shapefiles and see the differences.
Again, either leave the file zipped, or unzip it and keep all the files in the folder.
1. Plot the county borders
Let's add the county borders. To do so, we first need to get the shape files. Use the '5m' files.
-
Like before, read the file in to a GeoDataFrame.
-
This GeoDataFrame has all the counties for all the states in it. We only want the ones from Wisconsin. The federal information processing standard (fips) code for Wisconsin is 55. Keep only counties from Wisconsin. (A chance to practice our subsetting!)
-
Plot the Wisconsin counties. You might try adapting the code from our example above plotting South America.
counties = geopandas.read_file('cb_2021_us_county_5m.zip')
counties.head(2)
STATEFP | COUNTYFP | COUNTYNS | AFFGEOID | GEOID | NAME | NAMELSAD | STUSPS | STATE_NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01 | 059 | 00161555 | 0500000US01059 | 01059 | Franklin | Franklin County | AL | Alabama | 06 | 1641845708 | 32639621 | POLYGON ((-88.16591 34.38093, -88.16563 34.383... |
1 | 06 | 057 | 01682927 | 0500000US06057 | 06057 | Nevada | Nevada County | CA | California | 06 | 2480587301 | 41531993 | POLYGON ((-121.27953 39.23054, -121.25918 39.2... |
# What do we have here...
print(type(counties))
print(counties.geometry.name)
print(counties.crs)
<class 'geopandas.geodataframe.GeoDataFrame'>
geometry
epsg:4269
# Sconie's fips code is 55. Keep only those counties
wi_counties = counties[counties['STATEFP']=='55'] # the state codes came in as text. should look into this...
# Give it a quick sort so we can take a look
wi_counties = wi_counties.sort_values('NAME') # is Adams county here?
wi_counties.head(2)
STATEFP | COUNTYFP | COUNTYNS | AFFGEOID | GEOID | NAME | NAMELSAD | STUSPS | STATE_NAME | LSAD | ALAND | AWATER | geometry | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1813 | 55 | 001 | 01581060 | 0500000US55001 | 55001 | Adams | Adams County | WI | Wisconsin | 06 | 1672216054 | 108348982 | POLYGON ((-90.02595 44.09175, -90.00480 44.102... |
2475 | 55 | 003 | 01581061 | 0500000US55003 | 55003 | Ashland | Ashland County | WI | Wisconsin | 06 | 2706458787 | 3230728318 | MULTIPOLYGON (((-90.45673 47.01691, -90.45713 ... |
fig, gax = plt.subplots(figsize=(10,10))
# Plot the counties
wi_counties.plot(ax=gax, edgecolor='black', color = 'white')
plt.axis('off')
plt.show()
2. Plot some cities
The file "cities_4269.zip" contains a shapefile with some Wisconsin cities.
Load the data, like you did with the county borders. The data include the location of the cities and their populations.
Add the cities to your plot. I made my cities blue dots.
#wi_cities = geopandas.read_file('cities_4269/cities_4269/cities_4269.shp')
wi_cities = geopandas.read_file('cities_4269.zip')
wi_cities.head()
city | rank | state | growth_fro | population | geometry | |
---|---|---|---|---|---|---|
0 | West Allis | 582.0 | Wisconsin | -0.6 | 60697.0 | POINT (-88.00703 43.01668) |
1 | Waukesha | 475.0 | Wisconsin | 8.0 | 71016.0 | POINT (-88.23148 43.01168) |
2 | Wausau | 945.0 | Wisconsin | 1.7 | 39309.0 | POINT (-89.63012 44.95914) |
3 | Fond du Lac | 857.0 | Wisconsin | 1.7 | 42970.0 | POINT (-88.44705 43.77304) |
4 | Brookfield | 971.0 | Wisconsin | -1.9 | 37999.0 | POINT (-88.10648 43.06057) |
fig, gax = plt.subplots(figsize=(10,10))
# Plot the counties
wi_counties.plot(ax=gax, edgecolor='black', color = 'white')
# Plot the cities
wi_cities.plot(ax=gax, color='blue')
plt.axis('off')
plt.show()
Labeling maps
Label the points
Let's return to the South American map from earlier and label the points with the city names.
Each label is our usual .text()
method. It would be tedious to type all of those in, so let's automate this. Here is the code, we describe it below.
# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))
world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black',color='white')
# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)
gax.axis('off')
# Label the cities
for x, y, label in zip(gcities['Coordinates'].x, gcities['Coordinates'].y, gcities['City']):
gax.text(x, y, label)
plt.show()
Adding labels to points
That took more work than I expected. Let's talk through that code. The first bit of code is
for x, y, label in zip(gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City']):
for
is looping over each city in the GeoDataFrame.gdf['Coordinates'].x, gdf['Coordinates'].y, gdf['City']
takes the x part of the coordinate, the y part of the coordinate and the name of the city for each city.zip()
combines the x-coord, the y-coord, and the name together (zip is a handy function that takes multiple iterable objects and aggregates them together)x, y, label
will hold the three values.
So, for each row the for
loops over, x is the x-coord, y is the y-coord, and label is the city name for city defined in that row. We use this data with .text()
to apply the label at point (x,y)
gax.text(x , y, label)
Improving the labels
The labels get the job done, but they are a bit ugly. In particular, they are sitting on top of the dot.
We can use annotate()
to fix this up. We have used annotate()
before to add arrows connecting the text to a point. Here, we will use the ability to specify an offset of the text from the point. Here is the syntax
gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points')
The parameter 'xy' is the point we are referencing. The parameter 'xytext' holds the number of points to offset the text from the point. The argument 'offset points' tells annotate that the (3,3) tuple we passed to 'xytext' is full of points to offset the label from the text.
# Step 3: plot the cities onto the map
# We mostly use the code from before --- we still want the country borders ploted --- and we add a command to plot the cities
fig, gax = plt.subplots(figsize=(10,10))
world[world['SOV_A3'].isin(sa)].plot(ax = gax, edgecolor='black',color='white')
# This plot the cities. It's the same sytax, but we are plotting from a different GeoDataFrame. I want the
# cities as pale red dots.
gcities.plot(ax=gax, color='red', alpha = 0.5)
gax.set_xlabel('longitude')
gax.set_ylabel('latitude')
gax.set_title('South America')
# Kill the spines...
gax.spines['top'].set_visible(False)
gax.spines['right'].set_visible(False)
# ...or get rid of all the axis. Is it important to know the lat and long?
# plt.axis('off')
# Label the cities
for x, y, label in zip(gcities['Coordinates'].x, gcities['Coordinates'].y, gcities['City']):
gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points')
plt.show()
Practice: Labeling points
Add the names of the Wisconsin cities to your map.
fig, gax = plt.subplots(figsize=(10,10))
# Plot the counties
wi_counties.plot(ax=gax, edgecolor='black', color = 'white', alpha=0.25)
# Plot the cities
wi_cities.plot(ax=gax, color='blue')
# Label the cities.
for x, y, label in zip(wi_cities['geometry'].x, wi_cities['geometry'].y, wi_cities['city']):
gax.annotate(label, xy=(x,y), xytext=(4,4), textcoords='offset points')
plt.axis('off')
plt.show()
This is a good start, but the cities around Milwaukee are a mess. These would have to be tweaked by hand.