Exam 1 (100 pts total)

files needed = ('GFDEGDQ188S.csv', 'Property_Tax_Roll.csv', 'go-by-industry-82-97.csv'), which can be found in exam1_data.zip

You have 75 minutes to complete this exam.

Answer all the questions below in this notebook. You should insert as many cells into the notebook as you need. When you are finished, upload your finished notebook to Canvas.

You may use your notes and the Internet, but you cannot work with others.
Import any packages you need to complete this exam.
Do not modify the data files directly. All data manipulation should happen in your code.

Remember, Jupyter notebooks and Python have lots of built-in help facilities.
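As a quick reminder of what those help facilities look like, here is a minimal sketch using the built-in functions sorted and len as arbitrary examples. In a Jupyter cell you can also type a name followed by a question mark (for example, sorted?), which is IPython syntax and not plain Python.

```python
# help(obj) prints the documentation for any object, here a built-in function.
help(sorted)

# The same text is stored in the object's __doc__ attribute.
doc = len.__doc__
print(doc)
```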

Question 0 (5 pts): Last, First

Replace 'Last, First' above with your actual name. Enter it as: last name, first name.

Question 1 (10 pts): Visualization

Go to this link. It is the UW's COVID-19 dashboard.

Find the figure titled "Positive PCR Test Results on Campus By Day" and look at the "UW-Madison" tab.

Insert a markdown cell below, and answer the following questions.

What is the message you take away from this figure?
What audience is this figure created for? Is the figure appropriate for this audience? Why or why not?
Is the figure well-suited for the medium in which it is presented? Why or why not?

Question 2 (20 pts): Functions and flow control

Write a function named top_5_average that takes one argument: a list of numbers. Your function should do two things.

Check that the variable passed to the function is of the type list. If it is not a list, print out "The input is not a list." You do not need to check that the list is only made up of numbers.
If the variable passed to the function is a list, the function should return the average value of the five largest elements of the list. For y1, it would be the average of the numbers: 98, 124, 1632, 8715, 9815.

Test your code on these lists of numbers:

y1 = [2, 65, 8715, 12.5, 124, 77, 45.23, 1632, 0, 98, 9815]
y2 = [26, 48, 123.89, 78, 5894, 3654, 59, 12.7, 8994]

and report your answers as

"The 5-element average maximum for y1 is ????.??"

"The 5-element average maximum for y2 is ????.??"

Replace the ????.?? with the average of the five largest elements. Note the two numbers to the right of the decimal point.
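One possible approach is sketched below. It is not the only way to answer the question. It assumes that isinstance is an acceptable way to check the type, and that a list with fewer than five elements should be averaged over whatever elements it has (an empty list would raise a ZeroDivisionError).

```python
def top_5_average(values):
    """Return the average of the five largest elements of a list."""
    # Flow control: stop early if the argument is not a list.
    if not isinstance(values, list):
        print("The input is not a list.")
        return None

    # Sort from largest to smallest and keep (at most) the first five.
    top_five = sorted(values, reverse=True)[:5]
    return sum(top_five) / len(top_five)


y1 = [2, 65, 8715, 12.5, 124, 77, 45.23, 1632, 0, 98, 9815]
y2 = [26, 48, 123.89, 78, 5894, 3654, 59, 12.7, 8994]

# The :.2f format code keeps two digits to the right of the decimal point.
for name, data in [("y1", y1), ("y2", y2)]:
    print(f"The 5-element average maximum for {name} is {top_5_average(data):.2f}")
```

Sorting a copy with sorted leaves the original list unchanged, which matters if the list is used again later in the notebook.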

Question 3 (20 pts): Selecting data from a DataFrame

Load the file 'Property_Tax_Roll.csv'. It contains property tax information for Madison properties.

Use Python and pandas to answer the following questions.

How many properties have total tax growth (PctTaxChangeTotal) of more than 20 percent? Print the answer as

"There are ?,??? properties with a greater than 20 percent increase in property tax." (Note the comma in the reported number, which is an integer.)

Create a new DataFrame that contains only the parcels: 60801103172, 71010108118, and 81026423137 and only the columns address, TotalAssessedValue, and EstFairMkt.

Print out the DataFrame.
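The selection steps can be sketched on a small stand-in DataFrame. The rows below are made up for illustration and are not taken from 'Property_Tax_Roll.csv'. The column name 'Parcel' is an assumption, as is the idea that PctTaxChangeTotal is stored in percent units (20 means 20 percent). Check both against the actual file.

```python
import pandas as pd

# Made-up rows standing in for the real file; 'Parcel' is a hypothetical column name.
tax = pd.DataFrame({
    "Parcel": [60801103172, 71010108118, 81026423137, 11111111111],
    "address": ["1 A St", "2 B Ave", "3 C Rd", "4 D Ln"],
    "TotalAssessedValue": [250000, 310000, 1200000, 180000],
    "EstFairMkt": [255000, 320000, 1250000, 185000],
    "PctTaxChangeTotal": [25.0, 3.2, 21.5, -1.0],
})

# A boolean comparison yields True/False per row; summing counts the True values.
n_big = int((tax["PctTaxChangeTotal"] > 20).sum())

# The :, format code inserts thousands separators into the integer.
print(f"There are {n_big:,} properties with a greater than 20 percent increase in property tax.")

# Select rows by parcel number and columns by name in a single .loc call.
wanted = [60801103172, 71010108118, 81026423137]
subset = tax.loc[tax["Parcel"].isin(wanted), ["address", "TotalAssessedValue", "EstFairMkt"]]
print(subset)
```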

Question 4 (10 pts): Loading messy data

The file 'go-by-industry-82-97.csv' contains annual gross output by industry.

Load the file into a DataFrame. Print out only the first two rows from your DataFrame.
Print out:

"The private industries gross output average is ????.? billion dollars."

Replace the ????.? with the average (over all the years) of gross output for the "Private industries" sector.

Note the single number to the right of the decimal point.
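Files like this often carry extra title lines above the header and notes below the data. The sketch below shows how read_csv can skip them, using a made-up text layout in place of the real file. The number of lines to skip, the row labels, and all values are assumptions for illustration. Open the actual file to see what it needs.

```python
import io
import pandas as pd

# Made-up file contents: two title lines, a header row, data rows, and a footer note.
raw = io.StringIO(
    "Gross output by industry\n"
    "[Billions of dollars]\n"
    "Industry,1982,1983,1984\n"
    "All industries,100.0,110.0,120.0\n"
    "Private industries,80.0,90.0,100.0\n"
    "Source note: made-up numbers\n"
)

# skiprows drops the title lines; skipfooter (python engine only) drops the note.
go = pd.read_csv(raw, skiprows=2, skipfooter=1, engine="python", index_col=0)
print(go.head(2))

# With the industry names as the index, .loc picks out one row across all years.
avg = go.loc["Private industries"].mean()
print(f"The private industries gross output average is {avg:.1f} billion dollars.")
```

If the row labels in the real file carry leading or trailing spaces, the lookup by name will fail until the index is cleaned, for example with go.index.str.strip().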

Question 5 (30 pts): Plotting

The file 'GFDEGDQ188S.csv' contains the U.S. debt-to-GDP ratio, in percent. Create a line plot with the date on the x-axis and the debt-to-GDP ratio on the y-axis.

The figure size should be 12 inches wide and 8 inches tall.
The line should be black, solid, and have a width of 3.
Add a vertical line at 2020. Make the line black and dashed.
Next to the vertical line, place the text: 'Pandemic onset'.

Make any further adjustments you find necessary.
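The styling requirements above map onto matplotlib arguments roughly as in this sketch. The series is made up and only stands in for the real data. The column name 'ratio', the placement of the text, and the choice to hide the top and right spines are illustrative assumptions.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Made-up quarterly values with a datetime index, standing in for the real series.
dates = pd.to_datetime(["2019-04-01", "2019-07-01", "2019-10-01",
                        "2020-04-01", "2020-07-01", "2020-10-01"])
debt = pd.DataFrame({"ratio": [10.0, 11.0, 12.0, 20.0, 18.0, 17.0]}, index=dates)

# figsize is (width, height) in inches.
fig, ax = plt.subplots(figsize=(12, 8))
ax.plot(debt.index, debt["ratio"], color="black", linestyle="-", linewidth=3)

# Vertical dashed line at the start of 2020, with a label placed next to it.
onset = pd.Timestamp("2020-01-01")
ax.axvline(onset, color="black", linestyle="--")
ax.text(onset, debt["ratio"].max(), " Pandemic onset", ha="left", va="top")

# Further adjustments: label the axis and remove the unneeded box lines.
ax.set_ylabel("Debt-to-GDP ratio (percent)")
ax.spines["top"].set_visible(False)
ax.spines["right"].set_visible(False)
```

Because the x-axis holds dates, the vertical line and the text are positioned with a Timestamp instead of the bare number 2020.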

Question 6 (5 pts): Calculating in DataFrames

The code below works—it will compute the total of the column y—but it is inefficient.

Insert a markdown cell below and, in a sentence or two, describe why the code is not efficient.

import pandas as pd  # needed for pd.DataFrame below

df = pd.DataFrame({'x':[100, 200, 300], 'y':[1,2,3]})

ytot = 0
for i in range(0,3):
    ytot = ytot + df.iloc[i, 1]
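For comparison, here is a minimal sketch of the same total computed with a column method. It uses the same small DataFrame as the loop and assumes the goal is simply the sum of column y.

```python
import pandas as pd

df = pd.DataFrame({'x': [100, 200, 300], 'y': [1, 2, 3]})

# One call operates on the whole column, with no explicit Python loop
# and no hard-coded number of rows.
ytot = df['y'].sum()
print(ytot)
```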

You are finished!

Upload your completed notebook to Canvas.