Interactive figures with plotly
Files needed = us_agg_data.csv
We have been using matplotlib (and packages built on top of matplotlib) to create plots. matplotlib creates static plots: they can only tell the reader what's shown on the screen.
Today, we will introduce interactive visualizations. Interactive visualizations are great because we can include more information in the plots without the figure becoming messy. Readers choose when they want to see more details by hovering with the mouse.
There are many packages that create interactive figures. We will use the plotly package in this notebook. plotly is developed by a company, also called plotly, that makes interactive web-based visualizations and web applications. If you are interested in learning to make more complex interactive figures, you should google "dash python figures." Dash is another package (also from plotly) that allows you to create interactive web apps.
We install plotly just like any other package. Open an anaconda prompt and type:
pip install --user plotly==5.6.0
...but, this takes a long time (like, 25 minutes long) to install. So start the install, go for a long walk, and come back ready to work.
import pandas as pd
import datetime as dt
# The new stuff
import plotly.express as px # high-level plotly module
import plotly.graph_objects as go # low-level plotly module with more functions
# If you are using Google Colab, mount your Google Drive so the notebook can read the data files
from google.colab import drive
drive.mount('/content/drive')
1. Interactive line plots
Let's start with QOQ ('quarter over quarter') growth in U.S. gross domestic product (GDP). The file 'us_agg_data.csv' contains data on the growth rate of GDP (%), the unemployment rate (%), the number of employed persons (thousands), and the government deficit as a share of GDP (%).
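# parse_dates=True turns the DATE index into datetimes, so plotly draws a date axis
us = pd.read_csv('/content/drive/MyDrive/Data Class/7-interactive/us_agg_data.csv', index_col='DATE', parse_dates=True)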
us.head()
The basics
The syntax is
fig_gdp = px.line(gdp, x=gdp.index, y='GDP', title='Percent Change in GDP (Previous Quarter)')
We pass px.line()
- the DataFrame/Series (gdp)
- the x variable (gdp.index)
- the y variable ('GDP')
- other options (e.g., the title)
We get back fig_gdp, which is a plotly figure object. We call fig_gdp.show() to, well, show the figure.
gdp = us['GDP']
# The index is monthly but GDP is quarterly, so months without a GDP value are NaN. Drop them.
gdp = gdp.dropna()
fig_gdp = px.line(gdp, x=gdp.index, y='GDP', title='Percent Change in GDP (Previous Quarter)')
fig_gdp.show()
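The dropna() step can be checked on a hypothetical miniature version of us: a monthly index where GDP is filled in only once per quarter (toy_us and its values are made up).

```python
import numpy as np
import pandas as pd

# Hypothetical monthly index; GDP is reported only in the first month of each quarter
months = pd.date_range('2021-01-01', periods=6, freq='MS')
toy_us = pd.DataFrame({'GDP': [1.2, np.nan, np.nan, -0.4, np.nan, np.nan]}, index=months)

# Keep only the months that actually have a GDP observation
gdp_toy = toy_us['GDP'].dropna()
print(gdp_toy)
```

Only the January and April rows survive, which is exactly the quarterly series we want to plot.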
Hover over the figure. What happens?
A 'trace' is one data series drawn on the plot, in this case the GDP line. Each trace carries a hover label that is displayed when the cursor hovers over the line. We can modify the information displayed in this label using the .update_traces() method of the figure object and its hovertemplate argument.
fig_gdp.update_traces(hovertemplate='Percent Change in GDP: %{y:.1f}')
We are instructing the computer to display the words "Percent Change in GDP:" and then the y-value (i.e., "GDP"). This bit of the string has a lot in it:
%{y:.1f}
The y indicates that we want the y-value printed (instead of the x-value). The .1f is our usual string-format syntax: output the value as a fixed-point number with one digit to the right of the decimal point. (plotly uses the d3-format syntax, which mostly mirrors Python's format specifiers.)
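Since the specifier works like Python's own format mini-language, plain Python is a quick way to preview what .1f does. This is a sketch with a made-up value (growth is hypothetical); plotly applies the analogous d3-format rule inside the hover label.

```python
# Python's format specifiers; plotly's '.1f' inside %{y:.1f} follows the same idea
growth = -1.28   # hypothetical growth rate

one_decimal = f'Percent Change in GDP: {growth:.1f}'
two_decimals = f'Percent Change in GDP: {growth:.2f}'

print(one_decimal)    # Percent Change in GDP: -1.3
print(two_decimals)   # Percent Change in GDP: -1.28
```

Note that .1f rounds rather than truncates: -1.28 becomes -1.3.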
fig_gdp.update_traces(hovertemplate='Percent Change in GDP: %{y:.1f}')
fig_gdp.show()
As usual, the defaults are awful. I do not need a background color. The gridlines are not necessary if the reader can hover the cursor and learn the value exactly! Let's emphasize the zero value so the reader can see when growth is negative.
- Change background color to white
- Center the title and change the font size
- Label the x axis and remove the gridlines
- Label the y axis; remove the gridlines; show the zero line; format the zero line
This operates a bit differently than matplotlib. We do not have to recreate the graph. Instead, we just need to update it using .update_layout() and then call .show() again. (This was true with .update_traces(), too.) The .update_layout() method has a lot of options.
# Update the look of the figure
# 'x unified' draws a dashed line down to the x-axis which gives the user perspective
fig_gdp.update_layout(
hovermode='x unified',
plot_bgcolor='white',
title={'x':0.5, 'xanchor':'center', 'font':{'size': 24}},
xaxis={'title':'Year', 'showgrid':False},
yaxis={
'title':'Change in Quarterly GDP (Percent)',
'showgrid':False,
'zeroline':True,
'zerolinecolor':'LightGray'
}
)
fig_gdp.show()
Now that we have eliminated the background color, we can see that there are no axis lines (spines) on the x and y axes. Some people like this! It is the extreme elimination of clutter. I am not quite there yet...
Again, we just call the update functions. In this case, we use the .update_xaxes() and .update_yaxes() methods. Again, there are lots of options.
# Change format of x and y axes
fig_gdp.update_xaxes(showline=True, linewidth=1, linecolor='black')
fig_gdp.update_yaxes(showline=True, linewidth=1, linecolor='black')
Of course, I would do all of this at once, normally. Here is the whole thing.
fig_gdp = px.line(gdp, x=gdp.index, y='GDP', title='Percent Change in GDP (Previous Quarter)')
fig_gdp.update_traces(hovertemplate='Percent Change in GDP: %{y:.1f}')
fig_gdp.update_layout(hovermode='x unified',
plot_bgcolor='white',
title={'x':0.5, 'xanchor':'center', 'font':{'size': 24}},
xaxis={'title':"Year", 'showgrid':False},
yaxis={
'title':"Change in Quarterly GDP (Percent)",
'showgrid':False,
'zeroline':True,
'zerolinecolor':'LightGray',
}
)
fig_gdp.update_xaxes(showline=True, linewidth=1, linecolor='black')
fig_gdp.update_yaxes(showline=True, linewidth=1, linecolor='black')
fig_gdp.show()
Practice
- See the icons in the upper right corner of the figure above?
- Try zooming in and out
- Pan the figure
- Reset the axes to start over
Now change the show command to
fig_gdp.show(config={'displayModeBar': False})
What happened?
- Try changing the hovermode to
  - 'x' instead of 'x unified'
  - 'y' or 'y unified'

How does the hover property change? Which seems most useful in this case?
- Copy the code from the GDP figure into a cell below. Modify it to plot the unemployment rate (which is in the us DataFrame).
  - Change the hovertemplate text to reflect that this is the unemployment rate.
  - Try adding 'tickformat':'.1f' to the yaxis dict. What does this do?
  - Try adding 'dtick':'M48' to the xaxis dict. What does this do?
  - Use the fig.add_annotation() method to add an annotation for the covid pandemic. Note that the x-axis data are datetime variables. The documentation is pretty hard to find. Here is the syntax:
    fig.add_annotation(x=x coord, y=y coord, text=text to add to figure)
Interactive bar plots
The above line graphs are based on quarterly data, so there are a lot of data to work with. Let's look at the federal budget surplus/deficit, which is an annual statistic. Using a line graph would be misleading because it leads the viewer to conclude there are more observations than there actually are. Let's try a bar plot instead.
The syntax is similar, except that we want a bar plot, not a line plot, so we call px.bar().
gdef = us['Deficit']
gdef = gdef.dropna()
# The plot
fig_budget = px.bar(gdef, x=gdef.index, y='Deficit',
title="Federal budget surplus (+) or deficit (-) as percent of GDP")
# Change the color of the bars. Opacity works like alpha in matplotlib.
fig_budget.update_traces(marker_color='lightskyblue',
marker_line_color='black',
marker_line_width=1, opacity=0.5)
fig_budget.update_traces(hovertemplate='Surplus (+), Deficit (-): %{y:.1f}')
# Update the look of the figure
fig_budget.update_layout(hovermode="x unified", plot_bgcolor='white',
title={'x':0.5, 'xanchor':'center', 'font':{'size': 18}},
xaxis={'title':'', 'showgrid':False, 'dtick':'M12'},
yaxis={'title':'',
'showgrid':False,
'zeroline':True,
'zerolinecolor':'Black',
'tickformat':'.1f'}
)
fig_budget.update_xaxes(showline=True, linewidth=1, linecolor='black')
fig_budget.update_yaxes(showline=True, linewidth=1, linecolor='black')
fig_budget.show()
The x-axis is too cramped, and we should try to avoid making the reader tilt their head to read the figure. We can modify the intervals between the x-axis labels using dtick.
Let's add the values of the bars to the figure, too. That takes another call to .update_traces().
gdef = us['Deficit']
gdef = gdef.dropna()
fig_budget = px.bar(gdef, x=gdef.index, y='Deficit', text='Deficit',
title="Federal Budget Surplus (+) or Deficit (-) as Percent of GDP")
fig_budget.update_traces(marker_color='lightskyblue', marker_line_color='lightskyblue',
marker_line_width=0, opacity=1)
fig_budget.update_traces(hovertemplate='Surplus (+), Deficit (-): %{y:.1f}') # Hover text
# Print each bar's value inside the bar
fig_budget.update_traces(texttemplate='%{text:.1f}', textposition='inside',
textfont=dict(
size=14,
color='white'
)
)
# Update the look of the figure
# "x unified" draws a dashed line down to the x-axis which gives the user perspective
fig_budget.update_layout(hovermode='x unified',plot_bgcolor='white',
title={'x':0.5,'xanchor': 'center','font':{'size': 18}},
xaxis=dict(
title='',
showgrid=False,
dtick= 'M48'
),
yaxis=dict(
title='',
showgrid=False,
zeroline=True,
zerolinecolor='gray',
tickformat = '.1f',
ticksuffix = '%'
))
fig_budget.update_xaxes(showline=True, linewidth=1, linecolor='black')
fig_budget.update_yaxes(showline=True, linewidth=1, linecolor='black')
fig_budget.show()
Convert figure to HTML
Interactive figures only work when they are displayed in a medium where readers can interact with them (duh). That rules out saving them as PDFs or SVGs.
What we need is HTML that we can incorporate into a webpage.
We use the write_html() function of the plotly.io module. (A figure also has an equivalent .write_html() method.)
import plotly.io as pio
pio.write_html(fig_budget,
file='/content/drive/MyDrive/Data Class/7-interactive/fig_budget.html',
full_html = True,
auto_open=False,
config={'displayModeBar': False, 'showTips': False, 'responsive': True}
)
In the above code we use write_html to convert the figure to an HTML file.
- full_html indicates whether the HTML file should be written as an independent file complete with headers (True) or just as a <div> container (False) that you could incorporate into a webpage.
- auto_open tells the computer whether it should open the HTML file when complete. full_html needs to be set to True for this to work.
- config includes the commands to turn off some of the interactions, like the mode bar (the menu of icons).
Open the folder you saved it to (here, 7-interactive in your Google Drive) and open the file "fig_budget.html" to reveal a working webpage of your figure. You could then copy its source code into your own webpage to share it.
Maps
We'll make a choropleth. I'm following the example from the plotly documentation.
Maps in plotly work differently than in geopandas. The biggest difference is the required map data. geopandas used shapefiles, which worked well with DataFrames. Plotly uses geojson.
from urllib.request import urlopen
import json
res = urlopen('https://raw.githubusercontent.com/plotly/datasets/master/geojson-counties-fips.json')
counties = json.load(res)
A GeoJSON file, once loaded with json.load(), is basically a dict. The list stored under 'features' holds the units of observation, in this case counties.
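To make the structure concrete, here is a hypothetical, minimal GeoJSON FeatureCollection with a single made-up county (id "99001" and the name "Example" are invented). Each feature has an id, a properties dict, and a geometry with the polygon's coordinates.

```python
import json

# Hypothetical minimal GeoJSON with one made-up county
geojson_text = '''
{"type": "FeatureCollection",
 "features": [
   {"type": "Feature",
    "id": "99001",
    "properties": {"STATE": "99", "NAME": "Example"},
    "geometry": {"type": "Polygon",
                 "coordinates": [[[-90, 44], [-90, 45], [-89, 45], [-89, 44], [-90, 44]]]}}
 ]}
'''
gj = json.loads(geojson_text)   # json.load() does the same for a file or URL response

print(type(gj))                                  # <class 'dict'>
print(gj['features'][0]['id'])                   # 99001
print(gj['features'][0]['properties']['NAME'])   # Example
```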
counties['features']
We will plot the county-level unemployment data. I'll get the data from the plotly repo, but we could have gotten it from BLS or FRED, too. This is just a regular DataFrame.
df = pd.read_csv('https://raw.githubusercontent.com/plotly/datasets/master/fips-unemp-16.csv', dtype={'fips': str})
df.head(2)
Plot the map
The basic syntax:
fig = px.choropleth(df, geojson=counties, locations='fips', color='unemp', color_continuous_scale='reds', scope='usa')
- df is the DataFrame with the quantitative data
- geojson is the geojson object with the geometry data to be plotted on the basemap
- locations defines the column in the DataFrame that holds the identifying variable. In our case, the county ids (FIPS codes). By default, plotly matches these against the 'id' of each geojson feature.
- color is the column in the DataFrame with the quantitative data that sets each county's color
- color_continuous_scale specifies the color palette. plotly has built-in color scales.
- plotly has its own basemap of the world. We choose which parts to display using scope. Try commenting it out in the code below and rerun.
fig = px.choropleth(df, geojson=counties, locations='fips', color='unemp',
color_continuous_scale='reds',
range_color=(0, 12),
scope='usa',
labels={'unemp':'Unemployment rate'} # label for the legend
)
fig.show()
counties
Practice: Wisconsin
Let's isolate Wisconsin in our unemployment map. The steps:
1. Create a geojson with only the WI counties. We will filter the counties file we have been working with.
2. Plot the choropleth.
3. Hide the other states and zoom in on Wisconsin.
1. Create the geojson file
We will remove all states but Wisconsin from the geojson file counties.
Start by inspecting the structure of counties. It is basically a dictionary with keys 'type' and 'features'.
The value associated with 'features' is a list. Each item in the list is a dict representing a county.
We will keep the ones with 'STATE'=='55' (55 is Wisconsin's state FIPS code).
Try out the code below. How does it work?
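# Build a new dict instead of writing wicounties = counties: that line copies nothing, so
# replacing wicounties['features'] would also shrink the national counties geojson
wicounties = {'type': counties['type'], 'features': []}
for i in counties['features']:
    if i['properties']['STATE'] == '55':
        wicounties['features'].append(i)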
This took more work than I imagined. My solution is based on https://stackoverflow.com/a/70953644
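A more compact way to write the same filter is a list comprehension. One Python detail matters here: assigning one name to another (b = a) does not copy a dict, so editing the "copy" also edits the original. Building a new dict avoids that. The sketch uses a hypothetical stand-in for counties with three made-up features.

```python
# Hypothetical stand-in for counties: three made-up features, two in state '55'
counties_toy = {'type': 'FeatureCollection', 'features': [
    {'id': '55001', 'properties': {'STATE': '55'}},
    {'id': '17001', 'properties': {'STATE': '17'}},
    {'id': '55003', 'properties': {'STATE': '55'}},
]}

# A brand-new dict, so counties_toy itself is left untouched
wi_toy = {
    'type': counties_toy['type'],
    'features': [f for f in counties_toy['features'] if f['properties']['STATE'] == '55'],
}

print([f['id'] for f in wi_toy['features']])   # ['55001', '55003']
print(len(counties_toy['features']))           # 3
```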
2. Map it
Starting from the code we used for our U.S. county-level map, plot the new geojson file. The rest of the code is the same; just change it to reflect the new geojson file we created.
3. Hide everything else
A. Add the following code to your code from part 2. What does fitbounds do? What does visible do?
fig.update_geos(fitbounds="locations", visible=False)
B. Add this code, too. What does it do?
fig.update_layout(margin={"r":0,"t":0,"l":0,"b":0})
You can learn more about this stuff in the docs.
Finish early? Try modifying the hover template to display the county name. Try changing the CRS...I couldn't figure it out.
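One possible starting point for the hover-name challenge, sketched under assumptions: each county feature carries its name in properties['NAME'] (as the plotly counties file does), so a lookup from id to name can be turned into a new DataFrame column. The stand-ins counties_toy and df_toy below use two real Wisconsin FIPS codes with made-up unemployment values.

```python
import pandas as pd

# Hypothetical stand-ins for counties and df (unemployment values are made up)
counties_toy = {'type': 'FeatureCollection', 'features': [
    {'id': '55001', 'properties': {'NAME': 'Adams'}},
    {'id': '55003', 'properties': {'NAME': 'Ashland'}},
]}
df_toy = pd.DataFrame({'fips': ['55001', '55003'], 'unemp': [5.0, 6.0]})

# Map each county id to its name, then store the names as a new column
names = {f['id']: f['properties']['NAME'] for f in counties_toy['features']}
df_toy['county'] = df_toy['fips'].map(names)
print(df_toy)
```

The new column could then be passed to px.choropleth() as hover_name='county'.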