download notebook
view notebook w/o solutions

Choropleth maps

files needed = ('ne_50m_admin_0_counties', 'cb_2021_us_county_5m.shp', 'results.csv')

We have figured out how to plot geographic data using geopandas. We needed shape files loaded into geoDataFrames with one column assigned as the geoDataFrame's geometry. Once that is set up, plotting is easy.

In this notebook we are going to learn how to assign colors to maps based on a variable — a choropleth. We looked at an example of this kind of figure on the first day of class: Voting patterns in Wisconsin. By the end of this workbook, we will have made a similar map.

import pandas as pd                         # pandas for data management
import geopandas                            # geopandas for maps work
import matplotlib.pyplot as plt             # matplotlib for plotting details

Choropleth maps

A choropleth map is one with regions assigned colors based on a variable. * 'column' is the column name that holds the variable to color by * 'cmap' is the color scheme. There are many. Review the vizualization notebook for a refresher on what types of color map are appropriate for different applications.

Let's return to the world map from the maps notebook.

# Load the shape file
world = geopandas.read_file('ne_50m_admin_0_countries.zip')
world.head(2)
featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC ADMIN ... FCLASS_TR FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA geometry
0 Admin-0 country 1 3 Zimbabwe ZWE 0 2 Sovereign country 1 Zimbabwe ... None None None None None None None None None POLYGON ((31.28789 -22.40205, 31.19727 -22.344...
1 Admin-0 country 1 3 Zambia ZMB 0 2 Sovereign country 1 Zambia ... None None None None None None None None None POLYGON ((30.39609 -15.64307, 30.25068 -15.643...

2 rows × 169 columns

The dataset includes measures of population and GDP. Let's create a map where the color of each country depends on GDP per capita.

world['gdp_cap'] = world['GDP_MD']/world['POP_EST']
world['gdp_cap'].describe()
count    240.000000
mean       0.019169
std        0.030987
min       -0.120000
25%        0.002526
50%        0.007635
75%        0.023559
max        0.200000
Name: gdp_cap, dtype: float64
world[world['NAME']=='United States of America'][['NAME','gdp_cap','GDP_MD','POP_EST']]
NAME gdp_cap GDP_MD POP_EST
16 United States of America 0.065298 21433226 328239523.0

These numbers don't make much sense. There must be a scaling issue. This is why we always do sanity checks. Do the data look like we expect?

The GDP data are in millions (US GDP is on the order of trillions). The population data are not adjusted. (US population is around 330 million).

world['gdp_cap'] = world['GDP_MD']/world['POP_EST']*1000000
print('US GDP per capita is {0:,.0f}'.format(float(world[world['NAME']=='United States of America']['gdp_cap'])))
US GDP per capita is 65,298

That looks better. Let's check it out once more.

world = world.sort_values('gdp_cap', ascending=False)
world.loc[:, ['NAME', 'gdp_cap']]
NAME gdp_cap
239 Antarctica 200000.000000
108 Monaco 184477.979674
121 Liechtenstein 180856.939951
24 Bermuda 117087.518383
119 Luxembourg 114703.111490
... ... ...
17 S. Geo. and the Is. 0.000000
20 Pitcairn Is. 0.000000
5 Vatican -120000.000000
227 Heard I. and McDonald Is. NaN
229 Ashmore and Cartier Is. NaN

242 rows × 2 columns

The data for small places is crazy. Antarctica's GDP per capita doesn't really make sense. Let's only plot countries with populations greater than 50,000. I'm sorry to do this to you, Falkland Islands.

world[world['POP_EST']<50000].head(10)
featurecla scalerank LABELRANK SOVEREIGNT SOV_A3 ADM0_DIF LEVEL TYPE TLC ADMIN ... FCLASS_ID FCLASS_PL FCLASS_GR FCLASS_IT FCLASS_NL FCLASS_SE FCLASS_BD FCLASS_UA geometry gdp_cap
239 Admin-0 country 3 4 Antarctica ATA 0 2 Indeterminate 1 Antarctica ... None None None None None None None None MULTIPOLYGON (((-45.71777 -60.52090, -45.49971... 200000.000000
108 Admin-0 country 6 6 Monaco MCO 0 2 Sovereign country 1 Monaco ... None None None None None None None None POLYGON ((7.43867 43.75044, 7.37773 43.73174, ... 184477.979674
121 Admin-0 country 1 6 Liechtenstein LIE 0 2 Sovereign country 1 Liechtenstein ... None None None None None None None None POLYGON ((9.58027 47.05737, 9.50234 47.06274, ... 180856.939951
167 Admin-0 country 3 6 France FR1 1 2 Dependency 1 French Southern and Antarctic Lands ... None None None None None None None None MULTIPOLYGON (((69.18486 -49.10957, 69.26514 -... 114285.714286
22 Admin-0 country 1 5 United Kingdom GB1 1 2 Disputed 1 Falkland Islands ... None None None None None None None None MULTIPOLYGON (((-58.85020 -51.26992, -58.69751... 82989.994114
182 Admin-0 country 3 6 Denmark DN1 1 2 Dependency 1 Faroe Islands ... None None None None None None None None MULTIPOLYGON (((-6.62319 61.80596, -6.64277 61... 64012.490242
168 Admin-0 country 3 6 Finland FI1 1 2 Country 1 Aland ... None None None None None None None None MULTIPOLYGON (((19.98955 60.35117, 20.02021 60... 52302.235310
69 Admin-0 country 3 6 San Marino SMR 0 2 Sovereign country 1 San Marino ... None None None None None None None None POLYGON ((12.48525 43.90142, 12.42637 43.89409... 48877.731837
18 Admin-0 country 3 5 United Kingdom GB1 1 2 Disputed 1 British Indian Ocean Territory ... None None None None None None None None POLYGON ((72.49199 -7.37744, 72.46875 -7.41719... 40000.000000
161 Admin-0 country 4 4 France FR1 1 2 Dependency 1 Saint Pierre and Miquelon ... None None None None None None None None MULTIPOLYGON (((-56.15073 46.76240, -56.17168 ... 35851.258963

10 rows × 170 columns

Time to create the choropleth. The only difference from plotting a "regular" map is that we use the cmap option instead of the color option. For this example, I will use the Blues colormap.

fig, gax = plt.subplots(figsize=(15,5))

world[world['POP_EST']>50000].plot(ax = gax, column='gdp_cap', edgecolor='black', cmap = 'Blues', legend=True)

plt.axis('off')

plt.show()

png

That legend is atrocious. Never trust the defaults.

The legend is taking up the entire vertical span of the figure, which I sized at (15,10), even though the map does not. There are two ways to fix this.

  1. Shrink the vertical size. (15,5) starts to look pretty good.

  2. Add a separate legend to our plot. This will require digging into the guts of matplotlib. It's a good thing we are already experienced with matplotlib. First, we add a separate axis to our plot. Then we will tell the geopandas plot command to add the legend to the new axis. [This method works on other types of plots as well. It is sometimes useful for heat maps and creating insets.]

Let's take a look at the second way. It will teach us a bit more about axes, which will come in handy if we ever want to make map insets.

# import the make_axes_locatable method from matplotlib
from mpl_toolkits.axes_grid1 import make_axes_locatable
fig, map_ax = plt.subplots(figsize=(15,10))

# Adapted from https://geopandas.org/en/stable/docs/user_guide/mapping.html

# Create an axis divider object. It allows us to add another axis to the current one (map_ax).
divider = make_axes_locatable(map_ax)

# Add a second axis to the right of the original axes. 
# Make the new axes 3% of the original and add some padding between them.
legend_ax = divider.append_axes('right', size='3%', pad=0.1)


# The plot command works like normal, but: we plot the map on the map_ax (using the ax option)
# and we plot the legend on the legend_ax (using the cax option).
world[world['POP_EST']>50000].plot(
                                   ax = map_ax,                          # give ax the axes for the map
                                   column='gdp_cap',                     # the quantitative column
                                   edgecolor='black',                    # set the edge color
                                   cmap = 'Reds',                        # set the color map
                                   legend=True, cax=legend_ax)           # give cax the legend axes


# Here I ony turn off the axis of the main plot
map_ax.axis('off')

plt.show()

png

You can experiment with different padding, widths, locations, and color maps.

Practice

At the end of the maps notebook we created a map of Wisconsin's counties. We learned not only how to plot shape files, but where to find the shape files.

Now let's color the map to reflect voting patterns in the 2020 presidential election.

The steps:

  1. Plot the county borders
  2. Merge data on votes with geographical data
  3. Color the map

We have already figured out the first step. The code below reproduces the map we finished with in the maps notebook.

1. Plot the county borders

To read the data

# Read in the counties shape file.
counties = geopandas.read_file('cb_2021_us_county_5m.zip')

# WI fips code is 55. Keep just the WI counties.
wi_counties = counties[counties['STATEFP']=='55'].copy()
wi_counties.head(2)

To make the plot

fig, gax = plt.subplots(figsize=(10,10))
wi_counties.plot(ax=gax, edgecolor='black', color = 'white')
plt.axis('off')
plt.show()
# Read in the counties shape file.
counties = geopandas.read_file('cb_2021_us_county_5m.zip')

# WI fips code is 55. Keep just the WI counties.
wi_counties = counties[counties['STATEFP']=='55'].copy()
wi_counties.head(2)
STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER geometry
21 55 139 01581129 0500000US55139 55139 Winnebago Winnebago County WI Wisconsin 06 1125859770 372631716 POLYGON ((-88.88667 44.24262, -88.76620 44.243...
48 55 017 01581068 0500000US55017 55017 Chippewa Chippewa County WI Wisconsin 06 2611628507 85233303 POLYGON ((-91.66565 45.20799, -91.54223 45.206...
fig, gax = plt.subplots(figsize=(10,10))
wi_counties.plot(ax=gax, edgecolor='black', color = 'white')
plt.axis('off')
plt.show()

png

2. Get the vote totals and merge them

Time to add the voter totals. I downloaded the results from the Wisconsin Elections Commission website. The file is 'County by County Report all offices.xlsx'. Go ahead and open up the file. It's a mess!

I saved a cleaned-up version of the file to 'results.csv' which we can use to save the hassle with cleaning the data. For fun, you should load the raw data and try beating it into shape. That's what you normally would have to do...and, it's fun.

2(a). Load 'results.csv' to a DataFrame named results. Note, this is not a GeoDataFrame, this data doesn't have a geometery to it, it just has county names and vote counts.

Check your dtypes!

results = pd.read_csv('results.csv', thousands=',')
results.head()
county total biden trump
0 ADAMS 11818 4329 7362
1 ASHLAND 8757 4801 3841
2 BARRON 25346 9194 15803
3 BAYFIELD 10880 6147 4617
4 BROWN 144017 65511 75871
results.dtypes
county    object
total      int64
biden      int64
trump      int64
dtype: object

We need to standardize the county names to that we can merge the vote totals with the county shapefile. Let's make the county names 'title case'. We should also remove any extra whitespace. We know how to fix this up.

2(b). Convert the county names in results to title case. Strip any leading/trailing whitespace in the county names.

# First letter cap only
results['county'] = results['county'].str.title()
results['county'] = results['county'].str.strip()
results.head(1)
county total biden trump
0 Adams 11818 4329 7362

2(c). Convert the county names in wi_counties to title case. Strip any leading/trailing whitespace in the county names.

# Kill any leading or trailing whitespace
wi_counties['NAME']=wi_counties['NAME'].str.title()
wi_counties['NAME']=wi_counties['NAME'].str.strip()

2(d). Merge wi_counties and results. Use the indicator option in the merge, and check that everything matched up using .value_counts(). There are 72 counties in Wisconsin.

# Merge the two data sets. 
res_w_states = pd.merge(left=wi_counties, right=results, left_on='NAME', right_on='county', how='outer', indicator=True)
res_w_states.head(2)
STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER geometry county total biden trump _merge
0 55 139 01581129 0500000US55139 55139 Winnebago Winnebago County WI Wisconsin 06 1125859770 372631716 POLYGON ((-88.88667 44.24262, -88.76620 44.243... Winnebago 94032 44060 47796 both
1 55 017 01581068 0500000US55017 55017 Chippewa Chippewa County WI Wisconsin 06 2611628507 85233303 POLYGON ((-91.66565 45.20799, -91.54223 45.206... Chippewa 35938 13983 21317 both
res_w_states['_merge'].value_counts()
both          72
left_only      0
right_only     0
Name: _merge, dtype: int64

2(e). Create a variable called 'trump_share' that is the share of trump votes out of the total vote count.

res_w_states['trump_share'] =res_w_states['trump']/res_w_states['total']
res_w_states.head(2)
STATEFP COUNTYFP COUNTYNS AFFGEOID GEOID NAME NAMELSAD STUSPS STATE_NAME LSAD ALAND AWATER geometry county total biden trump _merge trump_share
0 55 139 01581129 0500000US55139 55139 Winnebago Winnebago County WI Wisconsin 06 1125859770 372631716 POLYGON ((-88.88667 44.24262, -88.76620 44.243... Winnebago 94032 44060 47796 both 0.508295
1 55 017 01581068 0500000US55017 55017 Chippewa Chippewa County WI Wisconsin 06 2611628507 85233303 POLYGON ((-91.66565 45.20799, -91.54223 45.206... Chippewa 35938 13983 21317 both 0.593160

4. Color the map

Create a chloropeth map with colors that correspond to Trump's share of the vote.

Remember the extra arguments.

  1. column is set to the column name of the data we want to be 'colored'
  2. cmap determines the color scheme. Try 'Reds' or 'bwr' or 'seismic'.
  3. legend turn on the legend

If you use a diverging color map, like 'bwr,' you might want to specify the min and max values for the legend. (Why?)

fig, gax = plt.subplots(figsize = (10,8))

# Plot the counties and pass 'trump_share' as the data to color
res_w_states.plot(ax=gax, edgecolor='black', column='trump_share',  legend=True, cmap='bwr', vmin=0, vmax=1)

plt.axis('off')

plt.show()

png

  1. Try changing the edge color of the counties to white.
  2. Add a title to your legend. Use the legend_kwds={'label': 'Legend title text'} option of the plot command. (docs).
fig, gax = plt.subplots(figsize = (10,8))

# Plot the counties and pass 'trump_share' as the data to color
res_w_states.plot(ax=gax, edgecolor='white', column='trump_share',  legend=True, cmap='bwr',
                 legend_kwds={'label':'Republican vote share'}, vmin=0, vmax=1)

plt.axis('off')

plt.show()

png

Finish early?

  1. Use the 'cities_4269' file on the course website and add Kenosha, Green Bay, Madison, and Milwaukee to your map. Try a white color and a black outline for the circle. Experiment with different combinations of colors and edge colors. What works best?
  2. Label the cities.
cities = geopandas.read_file('cities_4269.zip')
cities.head(1)
city rank state growth_fro population geometry
0 West Allis 582.0 Wisconsin -0.6 60697.0 POINT (-88.00703 43.01668)
fig, gax = plt.subplots(figsize = (10,8))

# Plot the counties and pass 'trump_share' as the data to color
res_w_states.plot(ax=gax, edgecolor='white', column='trump_share',  legend=True, cmap='bwr',
                  legend_kwds={'label':'Republican vote share'}, vmin=0, vmax=1)


cities_for_labels = cities[cities['city'].isin(['Madison', 'Green Bay', 'Milwaukee', 'Kenosha'])]

# Plot the points
cities_for_labels.plot(ax=gax, edgecolor='black', color='white')

# Label the points
for x, y, label in zip(cities_for_labels['geometry'].x, cities_for_labels['geometry'].y, cities_for_labels['city']):
    gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points')

plt.axis('off')

plt.show()

png

Even more!

The labels on the map on cnn have a white "shadow" around them. It's not too hard to add this to our labels. This technique works for any matplotlib text (and other objects, too). See this documentation for path effects if you are interested in learning more.

  1. Start by importing the patheffects module from matplotlib. This gives us some more tools.
import matplotlib.patheffects as PathEffects
  1. The patheffects option in .annotate() takes a list of values. Try this:
.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points',  
          path_effects=[PathEffects.withStroke(linewidth=3, foreground='white')]
          )
Experiment with different line widths, colors...
import matplotlib.patheffects as PathEffects
fig, gax = plt.subplots(figsize = (10,8))

# Plot the counties and pass 'trump_share' as the data to color
res_w_states.plot(ax=gax, edgecolor='white', column='trump_share',  legend=True, cmap='bwr',
                  legend_kwds={'label':'Republican vote share'}, vmin=0, vmax=1)


cities_for_labels = cities[cities['city'].isin(['Madison', 'Green Bay', 'Milwaukee', 'Kenosha'])]

# Plot the points
cities_for_labels.plot(ax=gax, edgecolor='black', color='white')

# Label the points
for x, y, label in zip(cities_for_labels['geometry'].x, cities_for_labels['geometry'].y, cities_for_labels['city']):
    gax.annotate(label, xy=(x,y), xytext=(3,3), textcoords='offset points',  
                 path_effects=[PathEffects.withStroke(linewidth=5, foreground='white')]
                 )

plt.axis('off')

plt.show()

png