download notebook
view notebook w/o solutions
Exam 1 Fall 2022 (100 pts total)
files needed = ('interest.csv', 'dogs.xlsx', 'messy1.xlsx', 'messy2.xlsx', 'messy3.csv'), which can be found in exam1_data.zip
You have 75 minutes to complete this exam.
Answer all the questions below in this notebook. You should insert as many cells into the notebook as you need. When you are finished, upload your finished notebook to Canvas.
- You may use your notes and the Internet, but you cannot work with others.
- Import any packages you need to complete this exam.
- Do not modify the data files directly. All data manipulation should happen in your code.
Remember, jupyter notebooks and python have lots of built in help facilities.
import matplotlib.pyplot as plt
import pandas as pd
Question 0 (5 points): Last, First
Replace 'Last, First' above with your actual name. Enter it as: last name, first name.
Question 1 (30 points): Plotting
The file 'interest.csv' contains annual data on the average 30-year mortgage interest rate (MORTGAGE30US
) and the CPI inflation rate (CPIAUCSL_PC1
).
- Create a line plot with the date on the x-axis. Plot the mortgage rate in black and the inflation rate in red.
- The figure size should be 12 inches wide and 8 inches tall.
- Both lines should have weight 2.
- The tick labels on both axes should have font size 18.
- The x-axis should range from 1971 to 2022.
Make any further adjustments you find necessary.
interest = pd.read_csv('interest.csv', index_col='DATE')
print(interest.head(1), '\n')
print(interest.dtypes)
MORTGAGE30US CPIAUCSL_PC1
DATE
1948 NaN 7.67221
MORTGAGE30US float64
CPIAUCSL_PC1 float64
dtype: object
fig, ax = plt.subplots(figsize=(12,8))
ax.plot(interest.index, interest['MORTGAGE30US'], color='black', lw=2.0)
ax.plot(interest.index, interest['CPIAUCSL_PC1'], color='red', lw=2.0)
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.tick_params(axis='both', labelsize=18)
ax.set_ylabel('percent', fontsize=20)
ax.set_title('Mortgage rates and inflation', fontsize=20)
ax.text(1988, 12, '30-year mortgage rate', fontsize=18)
ax.text(1988, 0, 'CPI inflation rate', fontsize=18)
ax.set_xlim(1971, 2022)
plt.show()
Question 2 (5 points): Graphical excellence
Insert a markdown cell below this cell and type your answer to the following question. You do not need to write any code.
The figure you created above suggests that inflation and interest rates are related.
- I want my message to be "mortgage rates go up when interest goes up." How would you change the figure you created in question 1 to better communicate that message? Explain why.
Solution
There is room for more than one answer here. My answer:
I would create a scatter plot with the inflation rate on the x-axis and the interest rate on the y-axis. We want to show a positive correlation: When inflation rises, mortgage rates rise. The scatter plot will make is easier to see the positive correlation and will allow the reader to see the strength of the correlation.
Some students wrote about another good way:
- Compute the growth rates of the two variables and plot those.
- Use the lagged mortgage rate to understand the dynamic correlation.
- Compute the correlation or plot the regression line.
Question 3 (20 points): Working with DataFrames
The file 'dogs.xlsx' contains data about three dogs: Buster, Subee, and Jax. For each dog, the dataset records the number of snacks and walks each dog had on a given day.
Use the data to answer the following questions.
d = pd.read_excel('dogs.xlsx')
d.head(2)
dog | time | snacks | walks | |
---|---|---|---|---|
0 | Buster | 10/1/2018 | 4 | 2.0 |
1 | Buster | 10/2/2018 | 3 | 2.0 |
- What is the average number of snacks that Buster eats per day? Print the answer as
"Buster averages ?.? snacks per day."
Replace ?.? with the answer. Match the formatting of the number.
buster_snack_avg = d[d['dog']=='Buster']['snacks'].mean()
print('Buster averages {0:2.1f} snacks per day.'.format(buster_snack_avg))
Buster averages 3.6 snacks per day.
- Create a DataFrame that contains only the data for Buster and Jax.
How many columns and rows are in this DataFrame? Print the answer as
"The Buster and Jax DataFrame has ? columns and ?? rows."
Replace the ? and ?? with the answers.
jb_df = d[d['dog'].isin(['Buster', 'Jax'])]
jb_shape = jb_df.shape
print('The Buster and Jax DataFrame has {1} columns and {0} rows.'.format(jb_shape[0], jb_shape[1]))
# A better way is to use * to unpack the tuple returned by .shape.
#print('The Buster and Jax DataFrame has {1} columns and {0} rows.'.format(*jb_df.shape))
The Buster and Jax DataFrame has 4 columns and 11 rows.
Question 4 (15 points): Loading messy data
Let's load three datasets. Each one has something wrong with it.
- [Footer.] Load the file 'messy1.xlsx' into a DataFrame. Make sure that the variables (v1 through v8) are all of type
float64
. Print out the types of the variables.
pd.read_excel('messy1.xlsx', skipfooter=4).dtypes
v1 float64
v2 float64
v3 float64
v4 float64
v5 float64
v6 float64
v7 float64
v8 float64
dtype: object
- [Missing data.] Load the file 'messy2.xlsx' into a DataFrame. Make sure that the variables (v1 through v8) are all of type
float64
. Print out the types of the variables.
pd.read_excel('messy2.xlsx', na_values=['N/A', 'MISSING']).dtypes
v1 float64
v2 float64
v3 float64
v4 float64
v5 float64
v6 float64
v7 float64
v8 float64
dtype: object
- [Separator not a comma.] Load the file 'messy3.csv' into a DataFrame. While the suffix of the file is
.csv
, the data are not separated by commas. Use pd.read_csv() to load the data. Print out the shape of the DataFrame.
Hint: Open the file in a text editor such as notepad++ and check the delimiter type.
pd.read_csv('messy3.csv', sep=';').shape
(999, 8)
Question 5 (20 points): Slicing and types
For each value in the following list, extract the year (the 4 numeric digits) and store them in a list. Print out the list, sorted from smallest to largest.
strlist = ['yy2022yy', 'yy1980yy', 'yy1991yy', 'yy1793yy', 'yy1288yy']
The output should be [1288, 1793, 1980, 1991, 2022]
.
strlist = ['yy2022yy', 'yy1980yy', 'yy1991yy', 'yy1793yy', 'yy1288yy']
dates = []
# As a loop
for k in strlist:
dates.append( int(k[2:6]) )
# Or as a list comprehension
# dates = [int(k[2:6]) for k in strlist]
dates.sort()
dates
[1288, 1793, 1980, 1991, 2022]
Question 6 (5 points): Using error messages
The code below contains two errors. Correct them both so that the code runs without error.
df = pd.DataFrame({'names':['Bucky', 'Brutus', 'Sparty'], 'school':['UW', 'OSU', 'MSU']})
print( df.shape() )
df.drop('school', inplace=True)
print(df)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-11-05219d5cc2eb> in <module>
1 df = pd.DataFrame({'names':['Bucky', 'Brutus', 'Sparty'], 'school':['UW', 'OSU', 'MSU']})
----> 2 print( df.shape() )
3 df.drop('school', inplace=True)
4 print(df)
TypeError: 'tuple' object is not callable
# Solution
df = pd.DataFrame({'names':['Bucky', 'Brutus', 'Sparty'], 'school':['UW', 'OSU', 'MSU']})
print( df.shape ) # Shape is an attribute, not a method. No ().
df.drop('school', inplace=True, axis=1) # We need to search the columns, not the rows.
print(df)
(3, 2)
names
0 Bucky
1 Brutus
2 Sparty
You are finished!
Upload your completed notebook to Canvas.