download notebook
view notebook w/ solutions

Exam 2 Fall 2023 (100 pts total)

files needed = ('fish.csv', 'room_char.xlsx', 'cpi.csv', 'cb_2022_55_tract_500k.zip', 'wdi.csv'), which can be found in exam2_data.zip

You have 75 minutes to complete this exam.

Answer all the questions below in this notebook. You should insert as many cells into the notebook as you need. When you are finished, upload your finished notebook to Canvas.

You may use your notes and the Internet.
You cannot work with others on the exam. You cannot post questions online and solicit answers, e.g., through Chegg or chat GPT.
Import any packages you need to complete this exam.
Do not modify the data files directly. All data manipulation should happen in your code.

Question 0 (5 pts): Last, First

Replace 'Last, First' above with your actual name. Enter it as: last name, first name.

Question 1 (15 points):

Below are two DataFrames, X and Y. Insert a markdown cell below this one, and use the DataFrames to answer the following questions. You do not need to write any code for this question.

How many rows and columns does Z have after running the following code?

Z = pd.merge(left=X, right=Y, on='id', how='inner')

How many rows and columns does Z have after running the following code? How many missing values are in Z['price']?

Z = pd.merge(left=X, right=Y, on='id', how='right')

DataFrame X

id	qty
pear	2
orange	13
Banana	89
apple	1

DataFrame Y

id	price
orange	0.5
pear	1.2
Bannana	0.1
apple	1.25
kiwi	1.75

Question 2 (10 points): MultiIndex

Load the file fish.csv. Set up the indexes so that:
- The column index should be a multiIndex with 'Family' at level 0 and 'Type' at level 1
- The row index should be a multiIndex with date at level 0 and state at level 1
Print out a DataFrame that contains only the observations from 2023

Question 3 (20 points):

Load the file room_char.xlsx
Create a DataFrame that holds the number of rooms with windows and without windows for each building
- The row index should be a multiIndex with BLDG at level 0 and WINDOWS at level 1
- There is only one column, which holds the room counts. Name the column ROOM COUNT
Print out your DataFrame

For example, the first two rows of this DataFrame would be


CHAMBERLIN	NO	7
CHAMBERLIN	YES	5

Question 4 (15 points): Time series

Load the file cpi.csv. It contains data on the consumer price index.
Create a new DataFrame containing the CPI at a quarterly frequency
Plot the quarterly CPI. Make your figure graphically excellent.
Add a vertical line to your plot at August 1, 2020

Question 5 (15 points):

The file cb_2022_55_tract_500k.zip contains the shapefile for the census tracts in Wisconsin, downloaded from here.

Create a map of only the census tracts in Dane County. Use a white background and black edges.
Lake Mendota is tract 9917.02 and Lake Monona is tract 9917.03. Color these two tracts skyblue.

Question 6 (10 points):

Use the code below to create a DataFrame

ads = pd.DataFrame({'address':['1180 Observatory Dr.;Madison;WI;53706', '1220 Linden Dr.;Madison;WI;53706']})

Extract the street, city, state, and zip out of the address variable and save them in columns named named Street, City, State, and Zip.

The first row should look like:

address	Street	City	State	Zip
1180 Observatory Dr.;Madison;WI;53706	1180 Observatory Dr.	Madison	WI	53706

Question 7 (10 points):

The file wdi.csv contains a panel of data. It covers the United States and Germany from 1998 to 2022. For each country and year, it measures service exports and GDP.

Insert a markdown cell below this one and write which part of the data should be columns and which should be rows in order to make the data "tidy."

Insert a code cell below your answer from part 1 and

Load the wdi.csv file into a DataFrame.
Reshape the DataFrame to match your answer from part 1.
Sort your row index and print out your DataFrame.