download notebook
view notebook w/o solutions
Exam 2 Fall 2023 (100 pts total)
files needed = ('fish.csv', 'room_char.xlsx', 'cpi.csv', 'cb_2022_55_tract_500k.zip', 'wdi.csv'), which can be found in exam2_data.zip
You have 75 minutes to complete this exam.
Answer all the questions below in this notebook. You should insert as many cells into the notebook as you need. When you are finished, upload your finished notebook to Canvas.
- You may use your notes and the Internet.
- You cannot work with others on the exam. You cannot post questions online and solicit answers, e.g., through Chegg or ChatGPT.
- Import any packages you need to complete this exam.
- Do not modify the data files directly. All data manipulation should happen in your code.
Question 0 (5 pts): Last, First
Replace 'Last, First' above with your actual name. Enter it as: last name, first name.
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime as dt
import geopandas as gpd
Question 1 (15 points):
Below are two DataFrames. Insert a markdown cell below this one, and use the DataFrames to answer the following questions. You do not need to write any code for this question.
- How many rows and columns does Z have after running the following code?
Z = pd.merge(left=X, right=Y, on='id', how='inner')
- How many rows and columns does Z have after running the following code? How many missing values are in Z['price']?
Z = pd.merge(left=X, right=Y, on='id', how='right')
DataFrame X
id | qty |
---|---|
pear | 2 |
orange | 13 |
Banana | 89 |
apple | 1 |
DataFrame Y
id | price |
---|---|
orange | 0.5 |
pear | 1.2 |
Bannana | 0.1 |
apple | 1.25 |
kiwi | 1.75 |
- Inner join: Z has 3 columns and 3 rows (pear, orange, and apple appear in both DataFrames; 'Banana' and 'Bannana' are spelled differently, so they do not match).
- Right join: Z has 3 columns and 5 rows, since every row of Y is kept. Z['price'] has no missing values, because price comes from Y, the right table. The two unmatched rows (Bannana and kiwi) instead have missing values in Z['qty'].
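As a quick check (not required by the question), the two merges can be reproduced in code. The DataFrames below are rebuilt from the tables above:

X = pd.DataFrame({'id': ['pear', 'orange', 'Banana', 'apple'], 'qty': [2, 13, 89, 1]})
Y = pd.DataFrame({'id': ['orange', 'pear', 'Bannana', 'apple', 'kiwi'], 'price': [0.5, 1.2, 0.1, 1.25, 1.75]})
Z_inner = pd.merge(left=X, right=Y, on='id', how='inner')   # 3 rows x 3 columns
Z_right = pd.merge(left=X, right=Y, on='id', how='right')   # 5 rows x 3 columns
print(Z_inner.shape, Z_right.shape)
print(Z_right.isna().sum())   # the NaNs appear in 'qty', not 'price'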
Question 2 (10 points): MultiIndex
- Load the file fish.csv
- The column index should be a MultiIndex with 'Family' at level 0 and 'Type' at level 1
- The row index should be a MultiIndex with date at level 0 and state at level 1
- Print out a DataFrame that contains only the observations from 2023
# header=[0,1] builds the (Family, Type) column MultiIndex; index_col=[0,1] builds the (date, state) row MultiIndex
fish = pd.read_csv('fish.csv', header=[0,1], index_col=[0,1], parse_dates=[0])
fish.sort_index(inplace=True)   # sort the index so partial date selection works cleanly
fish.loc['2023']                # all observations from 2023
Date | State | Bass: Large mouth | Bass: Small mouth | Bass: Striped | Snapper: Nothern red | Snapper: Mutton | Snapper: Lane | Snapper: Mangrove red |
---|---|---|---|---|---|---|---|---|
2023-01-01 | FL | 1 | 0 | 0 | 0 | 0 | 0 | 3 |
2023-02-15 | WI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2023-04-04 | AK | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2023-05-07 | FL | 0 | 2 | 0 | 0 | 0 | 0 | 0 |
2023-08-09 | WI | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
2023-09-05 | GA | 4 | 0 | 0 | 0 | 0 | 0 | 0 |
2023-09-06 | WI | 2 | 0 | 0 | 0 | 0 | 0 | 0 |
2023-10-07 | AL | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
Question 3 (20 points): Grouping
- Load the file room_char.xlsx
- Create a DataFrame that holds the number of rooms with windows and without windows for each building
- The row index should have BLDG at level 0 and WINDOWS at level 1
- There is only one column, which holds the room counts. Name the column ROOM COUNT
- Print out your DataFrame

For example, the first two rows of this DataFrame would be

CHAMBERLIN | NO | 7 |
CHAMBERLIN | YES | 5 |
room = pd.read_excel('room_char.xlsx')
room.head(2)
 | BLDG | ROOM | CAPACITY | WINDOWS |
---|---|---|---|---|
0 | SEWELL | 1 | 3 | YES |
1 | SEWELL | 2 | 3 | YES |
room_counts = room[['BLDG', 'WINDOWS', 'ROOM']].groupby(['BLDG', 'WINDOWS']).count()
room_counts.rename(columns={'ROOM':'ROOM COUNT'}, inplace=True)
room_counts
BLDG | WINDOWS | ROOM COUNT |
---|---|---|
CHAMBERLIN | NO | 7 |
CHAMBERLIN | YES | 5 |
SEWELL | NO | 4 |
SEWELL | YES | 6 |
VILAS | YES | 4 |
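An equivalent approach (a sketch, not required) uses groupby with size(), which counts rows per group directly. size() counts every row in a group, while count() skips missing values; with no missing ROOM values the two agree:

room_counts_alt = room.groupby(['BLDG', 'WINDOWS']).size().to_frame('ROOM COUNT')
room_counts_alt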
Question 4 (15 points): Time series
- Load the file cpi.csv. It contains data on the consumer price index.
- Create a CPI variable at the quarterly frequency
- Plot the quarterly CPI. Make your figure graphically excellent.
- Add a vertical line to your plot at August 1, 2020
cpi = pd.read_csv('cpi.csv', parse_dates=['DATE'], index_col='DATE')
cpiq = cpi.resample('Q').mean()   # average the monthly index within each quarter

fig, ax = plt.subplots()
ax.plot(cpiq.index, cpiq['CPIAUCSL'], color='black')
ax.axvline(x=dt.datetime(2020,8,1), color='black', ls='--')   # vertical line at August 1, 2020
ax.set_title('Consumer Price Index (1982-84 = 100)')
ax.set_ylabel('index')
sns.despine()
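Note: in recent pandas releases (2.2 and later), the quarter-end resampling alias is 'QE'. If 'Q' raises a FutureWarning, the resampling line can be written as:

cpiq = cpi.resample('QE').mean()   # quarter-end alias in newer pandas; 'Q' works in older versions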
Question 5 (15 points):
The file cb_2022_55_tract_500k.zip contains the shapefile for the census tracts in Wisconsin, downloaded from here.
- Create a map of only the census tracts in Dane County. Use a white background and black edges.
- Lake Mendota is tract 9917.02 and Lake Monona is tract 9917.03. Color these two tracts skyblue.
# wi holds every census tract in Wisconsin; subset to Dane County when plotting
wi = gpd.read_file('cb_2022_55_tract_500k.zip')

fig, ax = plt.subplots(figsize=(15,8))
wi[wi['NAMELSADCO']=='Dane County'].plot(color='white', edgecolor='black', ax=ax)
wi[wi['NAME'].isin(['9917.03','9917.02'])].plot(color='skyblue', edgecolor='black', ax=ax)   # Lakes Mendota and Monona
ax.set_title('Dane County census tracts')
ax.axis('off')
plt.show()
Question 6 (10 points):
- Use the code below to create a DataFrame
ads = pd.DataFrame({'address':['1180 Observatory Dr.;Madison;WI;53706', '1220 Linden Dr.;Madison;WI;53706']})
- Extract the street, city, state, and zip out of the address variable and save them in columns named Street, City, State, and Zip.
The first row should look like:
address | Street | City | State | Zip |
---|---|---|---|---|
1180 Observatory Dr.;Madison;WI;53706 | 1180 Observatory Dr. | Madison | WI | 53706 |
ads = pd.DataFrame({'address':['1180 Observatory Dr.;Madison;WI;53706', '1220 Linden Dr.;Madison;WI;53706']})
ads[['Street', 'City', 'State', 'Zip']] = ads['address'].str.split(';', expand=True)
ads
 | address | Street | City | State | Zip |
---|---|---|---|---|---|
0 | 1180 Observatory Dr.;Madison;WI;53706 | 1180 Observatory Dr. | Madison | WI | 53706 |
1 | 1220 Linden Dr.;Madison;WI;53706 | 1220 Linden Dr. | Madison | WI | 53706 |
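An alternative (not required) is a regular expression with str.extract. The pattern below assumes the same four ';'-separated fields as in the addresses above:

ads[['Street', 'City', 'State', 'Zip']] = ads['address'].str.extract(r'([^;]+);([^;]+);([^;]+);([^;]+)')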
Question 7 (10 points):
The file wdi.csv contains a panel of data. It covers the United States and Germany from 1998 to 2022. For each country and year, it measures service exports and GDP.
- Insert a markdown cell below this one and write which part of the data should be columns and which should be rows in order to make the data "tidy."

Insert a code cell below your answer from part 1 and
- Load the wdi.csv file into a DataFrame.
- Reshape the DataFrame to match your answer from part 1.
- Sort your row index and print out your DataFrame.
Tidy data has observations as rows and variables as columns. In this case, I want the country and time to be the rows and service exports and GDP to be the columns.
# stack() moves the year columns into the row index; unstack() moves the series identifiers into the column index
wdi = pd.read_csv('wdi.csv', index_col=['Country Name', 'Country Code', 'Series Name', 'Series Code']).stack()
wdi.unstack(['Series Name', 'Series Code']).sort_index()
Country Name | Country Code | Year | Commercial service exports (current US$) (TX.VAL.SERV.CD.WT) | GDP (current US$) (NY.GDP.MKTP.CD) |
---|---|---|---|---|
Germany | DEU | 1998 | 7.716637e+10 | 2.238990e+12 |
Germany | DEU | 1999 | 7.648444e+10 | 2.194950e+12 |
Germany | DEU | 2000 | 7.785096e+10 | 1.947980e+12 |
Germany | DEU | 2001 | 7.843019e+10 | 1.945790e+12 |
Germany | DEU | 2002 | 9.281096e+10 | 2.078480e+12 |
Germany | DEU | 2003 | 1.127450e+11 | 2.501640e+12 |
Germany | DEU | 2004 | 1.438780e+11 | 2.814350e+12 |
Germany | DEU | 2005 | 1.563170e+11 | 2.846860e+12 |
Germany | DEU | 2006 | 1.782880e+11 | 2.994700e+12 |
Germany | DEU | 2007 | 2.080620e+11 | 3.425580e+12 |
Germany | DEU | 2008 | 2.392090e+11 | 3.745260e+12 |
Germany | DEU | 2009 | 2.216440e+11 | 3.411260e+12 |
Germany | DEU | 2010 | 2.259540e+11 | 3.399670e+12 |
Germany | DEU | 2011 | 2.505440e+11 | 3.749310e+12 |
Germany | DEU | 2012 | 2.508420e+11 | 3.527140e+12 |
Germany | DEU | 2013 | 2.713350e+11 | 3.733800e+12 |
Germany | DEU | 2014 | 2.979800e+11 | 3.889090e+12 |
Germany | DEU | 2015 | 2.758300e+11 | 3.357590e+12 |
Germany | DEU | 2016 | 2.883330e+11 | 3.469850e+12 |
Germany | DEU | 2017 | 3.173810e+11 | 3.690850e+12 |
Germany | DEU | 2018 | 3.511200e+11 | 3.974440e+12 |
Germany | DEU | 2019 | 3.610550e+11 | 3.888230e+12 |
Germany | DEU | 2020 | 3.266040e+11 | 3.889670e+12 |
Germany | DEU | 2021 | 4.013260e+11 | 4.259930e+12 |
Germany | DEU | 2022 | 4.226560e+11 | 4.072190e+12 |
United States | USA | 1998 | 2.404860e+11 | 9.062820e+12 |
United States | USA | 1999 | 2.697780e+11 | 9.631170e+12 |
United States | USA | 2000 | 2.888670e+11 | 1.025090e+13 |
United States | USA | 2001 | 2.758420e+11 | 1.058190e+13 |
United States | USA | 2002 | 2.804080e+11 | 1.092910e+13 |
United States | USA | 2003 | 2.887030e+11 | 1.145640e+13 |
United States | USA | 2004 | 3.325520e+11 | 1.221720e+13 |
United States | USA | 2005 | 3.629070e+11 | 1.303920e+13 |
United States | USA | 2006 | 4.038650e+11 | 1.381560e+13 |
United States | USA | 2007 | 4.747370e+11 | 1.447420e+13 |
United States | USA | 2008 | 5.217060e+11 | 1.476990e+13 |
United States | USA | 2009 | 5.019200e+11 | 1.447810e+13 |
United States | USA | 2010 | 5.628350e+11 | 1.504900e+13 |
United States | USA | 2011 | 6.231990e+11 | 1.559970e+13 |
United States | USA | 2012 | 6.626740e+11 | 1.625400e+13 |
United States | USA | 2013 | 6.975640e+11 | 1.684320e+13 |
United States | USA | 2014 | 7.373560e+11 | 1.755070e+13 |
United States | USA | 2015 | 7.493160e+11 | 1.820600e+13 |
United States | USA | 2016 | 7.646530e+11 | 1.869510e+13 |
United States | USA | 2017 | 8.175490e+11 | 1.947730e+13 |
United States | USA | 2018 | 8.434140e+11 | 2.053310e+13 |
United States | USA | 2019 | 8.686460e+11 | 2.138100e+13 |
United States | USA | 2020 | 7.043200e+11 | 2.106050e+13 |
United States | USA | 2021 | 7.781880e+11 | 2.331510e+13 |
United States | USA | 2022 | 9.000040e+11 | 2.546270e+13 |
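A melt-then-pivot reshape gives the same result (a sketch, assuming wdi.csv has one column per year, as implied by the output above; wdi_long and wdi_tidy are illustrative names):

# Melt the year columns into long form, then pivot the series identifiers into columns
wdi_long = pd.read_csv('wdi.csv').melt(
    id_vars=['Country Name', 'Country Code', 'Series Name', 'Series Code'],
    var_name='Year', value_name='Value')
wdi_tidy = wdi_long.pivot_table(
    index=['Country Name', 'Country Code', 'Year'],
    columns=['Series Name', 'Series Code'],
    values='Value').sort_index()
wdi_tidy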