Fifteen-minute Friday #7
files needed = 'industrial_production.xlsx', 'VIXCLSD.csv', 'osk.csv'
Fifteen-minute Fridays are brief, nongraded workbooks that provide some extra practice and introduce new topics for self-guided study. This week we are working on
- Exporting figures and using them in MS Word
- Rolling-window calculations
The solutions to the practice problems are at the end of the notebook.
import pandas as pd
import matplotlib.pyplot as plt
Adding figures to MS Word.
Part of your final project is an executive summary formatted as a PDF. A natural way to create the PDF is to create an MS Word document and export it as a PDF at the end.
This means you will need to get your figures from a Jupyter notebook to an MS Word document. There are many ways to do this. The best way to do this is to first save your figure from inside your notebook, and then add it to your Word document.
Do not use a screenshot or the snipping tool to take a picture of the figure in your notebook and paste it into your document. This will result in blurry figures that can be difficult to read.
ind_prod = pd.read_excel('industrial_production.xlsx', sheet_name='Quarterly', index_col=0)
ind_prod = ind_prod.rename(columns={'IPB51000SQ':'ind prod consumer'})
ind_prod.head(2)
Let's plot consumer products production. We will save the figure in two formats.
The first is PNG, which is a raster graphics format. A raster graphic is essentially made up of many tiny dots.
The second is SVG, which is a scalable vector graphics format. The image is essentially a set of equations that describe the shape.
matplotlib handles both these formats easily.
fig, ax = plt.subplots(figsize=(5,3))
ax.plot(ind_prod.index, ind_prod['ind prod consumer'], color='black')
ax.set_ylabel('index (2012=100)')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.savefig('ind_prod.svg')
plt.savefig('ind_prod.png')
plt.show()
No title?
Notice that I did not add a title to the figure. This seems like malpractice.
I did this because I will add the title in MS Word. The title is integrated into the caption and figure number object in Word.
Practice: Adding figures to Word
-
Open a new MS Word document.
-
Add the two figures we created in cell above. To add a figure to Word, use the
Insert
tab at the top of the document and choosePictures
. -
Add a caption. Right click on the figure and choose
Insert caption
. Set theLabel
field toFigure
. I like my captions above the figure, but you can put them below if you choose. Make your captions consistent in their appearance and location.
The importance of vector graphics
- Zoom in on your document. The zoom control is in the lower-right corner. Do you see how blurry the PNG figure has become? The SVG figure scales smoothly with the figure's size.
Be sure to use SVG files when creating your project documents.
Rolling window calculations
Once we have set up our time series data with a datetime index, pandas can take care of the details. We have already seen how to resample the data, slice with dates, and use dates in plots. Very nice. Here, we take a look at rolling-window calculations.
Suppose we have yearly data from 1800 to 2000. A ten-year rolling window would create 190 windows (200 years of data minus 10) of the form:
1800-1810, 1801-1811, 1802-1812, . . ., 1988-1998, 1989-1999, 1990-2000.
We can then compute a statistics for each window. For example, we can compute the mean of each window and have 190 observations — the moving or rolling average. We often use moving averages to extract trends from noisy data.
Let's use the vix data we used in class. I'm using the parse_dates
option to tell pandas to try and convert the index to a DatetimeIndex, rather than do the conversion myself.
vix = pd.read_csv('VIXCLSD-1.csv', na_values='.', index_col=0, parse_dates=True)
print(vix.head(), '\n\n', vix.shape)
print('\n\n', vix.index)
The rolling()
method of DataFrame allows methods like mean()
and var()
to be calculated over a rolling window of the data.
Calling
vix['VIXCLS'].rolling(30)
generates a Series of rolling observations. In this case, I am asking for a 30-day window. We can directly call a method like mean()
or var()
to compute rolling calculations.
vix['VIXCLS'].rolling(30).mean()
Notice in the code below that I have to strip out the missing values using dropna()
. Otherwise, rolling()
will only work on the complete 30-day windows.
# Drop missing observations and create the moving average
vix['vix_ma'] = vix['VIXCLS'].dropna().rolling(30).mean()
# Plot it
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(vix.index, vix['VIXCLS'], color = 'red', label = 'daily close', alpha=0.25)
ax.plot(vix.index, vix['vix_ma'], color = 'blue', label = '30-day moving average')
ax.set_title('CBOE Volatility Index: VIX')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
Note that this figure looks similar to the resampled data figures we made in class, but the two are conceptually different. Resampling to, say, the monthly levels returns the mean computed at each non-overlapping month in the data. The rolling window calculation is computing overlapping 30 day means.
Practice: Rolling windows
We did the first three parts of this practice in class. You can copy your solutions over, but trying this again will make you a better programmer.
-
Read the file 'osk.csv' into a DataFrame. The file contains daily closing prices for OshKosh Corp. and the S&P 500.
-
Set the index to the date. Make sure the dates are datetime objects!
-
Normalize each series (osk, sp500) by dividing every observation by the value at 01-02-2018. Try
prices['osk_n'] = prices['osk'] / prices.loc['2018-01-02,','osk']
and something similar for the S&P 500.
-
Compute and plot the 90-day rolling standard deviation of the Osh Kosh and S&P 500 normalized daily prices.
-
Plot the two standard deviation series again, but only for the first half of the year (Jan 2018 through June 2018). Use a slice on the index to do it.
-
Do the x-axis labels look cluttered? Try
fig.autofmt_xdate()
and let matplotlib try to help.
Adding files to MS Word: Solutions
You can find my word document on the course webpage. It is named 'inserting_figures.docx'.
Rolling windows: Solutions
# Parts 1, 2, and 3
prices = pd.read_csv('osk.csv', index_col=0, parse_dates=True)
prices['osk_n'] = prices['osk'] / prices.loc['2018-01-02,','osk']
prices['sp500_n'] = prices['sp500'] / prices.loc['2018-01-02,','sp500']
prices.head(3)
# Part 4
prices['std_osk'] = prices['osk_n'].rolling(90).std()
prices['std_sp500'] = prices['sp500_n'].rolling(90).std()
fig, ax = plt.subplots(figsize=(10,5))
ax.plot(prices.index, prices['std_osk'], color = 'red', label = 'OSK: 90-day std')
ax.plot(prices.index, prices['std_sp500'], color = 'blue', label = 'S&P500: 90-day std')
ax.set_title('Rolling standard deviations')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
plt.show()
# Parts 5 and 6
fig, ax = plt.subplots(figsize=(10,5))
# Notice that I am not passing the x-variable explicitly. Matplotlib will default to the index.
ax.plot(prices['std_sp500'].loc['01-01-2018':'06-30-2018'], color = 'blue', label = 'S&P 500: 90-day std')
ax.plot(prices['std_osk'].loc['01-01-2018':'06-30-2018'], color = 'red', label = 'OSK: 90-day std')
ax.set_title('Rolling standard deviations')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.legend(frameon=False)
# Let matplotlib choose how to rotate the labels
fig.autofmt_xdate()
plt.show()