download notebook
view notebook w/ solutions
Exam 2 Fall 2023 (100 pts total)
files needed = ('fish.csv', 'room_char.xlsx', 'cpi.csv', 'cb_2022_55_tract_500k.zip', 'wdi.csv'), which can be found in exam2_data.zip
You have 75 minutes to complete this exam.
Answer all the questions below in this notebook. You should insert as many cells into the notebook as you need. When you are finished, upload your finished notebook to Canvas.
- You may use your notes and the Internet.
- You cannot work with others on the exam. You cannot post questions online and solicit answers, e.g., through Chegg or chat GPT.
- Import any packages you need to complete this exam.
- Do not modify the data files directly. All data manipulation should happen in your code.
Question 0 (5 pts): Last, First
Replace 'Last, First' above with your actual name. Enter it as: last name, first name.
Question 1 (15 points):
Below are two DataFrames, X and Y. Insert a markdown cell below this one, and use the DataFrames to answer the following questions. You do not need to write any code for this question.
- How many rows and columns does Z have after running the following code?
Z = pd.merge(left=X, right=Y, on='id', how='inner')
- How many rows and columns does Z have after running the following code? How many missing values are in
Z['price']
?
Z = pd.merge(left=X, right=Y, on='id', how='right')
DataFrame X
id | qty |
---|---|
pear | 2 |
orange | 13 |
Banana | 89 |
apple | 1 |
DataFrame Y
id | price |
---|---|
orange | 0.5 |
pear | 1.2 |
Bannana | 0.1 |
apple | 1.25 |
kiwi | 1.75 |
Question 2 (10 points): MultiIndex
- Load the file
fish.csv
. Set up the indexes so that:- The column index should be a multiIndex with 'Family' at level 0 and 'Type' at level 1
- The row index should be a multiIndex with date at level 0 and state at level 1
- Print out a DataFrame that contains only the observations from 2023
Question 3 (20 points):
- Load the file
room_char.xlsx
- Create a DataFrame that holds the number of rooms with windows and without windows for each building
- The row index should be a multiIndex with BLDG at level 0 and WINDOWS at level 1
- There is only one column, which holds the room counts. Name the column
ROOM COUNT
- Print out your DataFrame
For example, the first two rows of this DataFrame would be
CHAMBERLIN | NO | 7 |
CHAMBERLIN | YES | 5 |
Question 4 (15 points): Time series
- Load the file
cpi.csv
. It contains data on the consumer price index. - Create a new DataFrame containing the CPI at a quarterly frequency
- Plot the quarterly CPI. Make your figure graphically excellent.
- Add a vertical line to your plot at August 1, 2020
Question 5 (15 points):
The file cb_2022_55_tract_500k.zip
contains the shapefile for the census tracts in Wisconsin, downloaded from here.
- Create a map of only the census tracts in Dane County. Use a white background and black edges.
- Lake Mendota is tract 9917.02 and Lake Monona is tract 9917.03. Color these two tracts skyblue.
Question 6 (10 points):
- Use the code below to create a DataFrame
ads = pd.DataFrame({'address':['1180 Observatory Dr.;Madison;WI;53706', '1220 Linden Dr.;Madison;WI;53706']})
- Extract the street, city, state, and zip out of the address variable and save them in columns named named
Street
,City
,State
, andZip
.
The first row should look like:
address | Street | City | State | Zip |
---|---|---|---|---|
1180 Observatory Dr.;Madison;WI;53706 | 1180 Observatory Dr. | Madison | WI | 53706 |
Question 7 (10 points):
The file wdi.csv
contains a panel of data. It covers the United States and Germany from 1998 to 2022. For each country and year, it measures service exports and GDP.
- Insert a markdown cell below this one and write which part of the data should be columns and which should be rows in order to make the data "tidy."
Insert a code cell below your answer from part 1 and
-
Load the
wdi.csv
file into a DataFrame. -
Reshape the DataFrame to match your answer from part 1.
- Sort your row index and print out your DataFrame.