Fundamentals of Data Analytics
for Economists

Fall 2023: Delivered in person.
Meeting: Mondays and Wednesdays 4:00-5:15 | Education L196

Professor: Kim J. Ruhl // ruhl2@wisc.edu
Office hours: Tu 9:00AM-10:00AM & Tu 3:30PM-4:30PM | 7444 Soc Sci

Teaching assistant: Satyen Pandita // spandita@wisc.edu
Office hours: M 3:00PM-4:00PM | 6413 Soc Sci

Teaching assistant: Mitchell Valdes Bobes // valdsbobes@wisc.edu
Office hours: Th 3:00PM-4:00PM | 7308 Soc Sci

Grades and announcements are managed through Canvas. Everything else happens here.


Weekly schedule

This week-by-week schedule will be constantly updated. Some topics may take longer than scheduled (and others may take less), but exam, coding practice, and project due dates will not change.

Reading: McKinney refers to the book Python for Data Analysis (Third edition) by Wes McKinney. It is available as an ebook here for free. There is a lot more in this book than we will cover, but it is a good reference.

File availability: Links to Jupyter notebooks will open a view of the notebook. At the top right of the page there is a link to download the notebook as an '.ipynb' file.

Videos: Class will not be recorded as long as instruction is in-person. If the University moves our class online, then videos will be available.

Discussion sections: The discussion sections will be facilitated by Satyen Pandita and Mitchell Valdes Bobes, the course TAs. The discussion sections provide time to ask questions, review previous assignments, and cover topics that are complementary to those we cover in the lectures. Discussion sections are mandatory, and the topics covered in the sections will appear on exams and assignments.

Figure of the week: A mixture of good, bad, and ugly visualizations from the "real world." Take a few minutes and study the figure. What is it supposed to communicate? Could it be improved? Does it inspire other research questions? I have included some of my own thoughts on the figures, too. If you come across an interesting visualization, send it my way!


Week 0: September 6
Introduction // Using data to communicate ideas

Examples: gapminder // election results // Inflation

Reading:
Syllabus // AI policy // Welcome announcement
Installing and using winstat (SSCC) // My notes on winstat

Slides: introduction
Data: None

Discussion section: Introduction, running Python

Due September 8 (midnight): [Student survey]
Follow the link to complete the survey. Then log in to Canvas and complete the assignment. Enter "survey completed" in the text box for the assignment.

Due September 8 (midnight): Install Winstat and log on
Follow the instructions here. Then log in to Canvas and complete the assignment.


Week 1: September 11 & 13
Winstat // Jupyter notebooks // Markdown
Python: Assignment // Calculation // Types // Strings

Reading:
How to cite code you learned about online
McKinney Ch. 2.2 parts: "Tab Completion," "Introspection"
Markdown cheatsheet
McKinney Ch. 2.3: "Indentation," "Everything is an object," "Comments," "Attributes and methods," "Binary operators," "Numeric types," "Strings"

Jupyter notebooks: survey // notebooks and markdown // python 1
Data: None

Discussion section: Good coding practices, strings

Figure of the week: Stacked area plot [my thoughts]


Week 2: September 18 & 20 [No class September 20, video available]
Python: Lists // Tuples // Dicts // More on types
Python: Loops // Conditionals

Reading:
McKinney Ch. 3.1 parts: (skim stuff on tuples, skip stuff on sets, pay close attention to slicing)

Jupyter notebooks: python 2 // python 3
Data: None

Discussion section: Conditional statements, floating point numbers

Figure of the week: Line plot (from this article) [my thoughts]


Week 3: September 25 & 27
Python: Slicing // Functions // Packages
Pandas: Series and DataFrames // Selecting data

Reading:
McKinney Ch 3.2 up to "Anonymous functions"
McKinney Ch 5 up to "Arithmetic and data alignment"

Jupyter notebooks: python 4 // pandas fundamentals
Data: None

Discussion section: Functions

Figure of the week: Scatter plot [my thoughts]

Due September 29 (midnight): Coding practice #1 // [solutions]
Submit through Canvas.


Week 4: October 2 & 4
Pandas: Calculations on DataFrames // Reading and writing files
Mini-project: Tuition inflation

Reading:
McKinney Ch 5.3 up to "Correlation and covariance"
McKinney Ch 6.1 up to "Working with other delimited formats." Skip "Reading text files in pieces."

Jupyter notebooks: pandas calculation // pandas io // inflation mini-project
Data: gdp_components.csv, gdp_parts.csv, debt.xlsx // inflation_food.csv, inflation_tuition.xlsx

Discussion section: DataFrames, finding help

Figure of the week: Getting carried away [my thoughts]


Week 5: October 9 & 11
Matplotlib: Figures and axes // line plots // histograms
Visualization: What makes a good visualization?

Reading: McKinney Ch 9.1 up to "matplotlib configuration"

Jupyter notebooks: matplotlib 1 // visualization
Data: gdp_components.csv // bea_gdp.csv, map_data.zip

Discussion section: File paths, terminal commands
Extras: Introduction to seaborn // seaborn // chile.xlsx, broadband_size.xlsx, auto_data.dta

Figure of the week: Heatmap [my thoughts]

Due October 13 (midnight): Coding practice #2 // [solutions]
banks_and_branches.csv, GDPCA.csv
Submit through Canvas.

Due October 13 (midnight): Project team rosters
Only the corresponding author of each team should submit this assignment. A team of one person should also submit a roster. Use the form below to submit your roster.
[Roster submission form]


Week 6: October 16 & 18
Exam #1 // The data analyst's workflow

Reading: None

Slides: workflow

Jupyter notebooks: workflow example // UW Now slides
Data: None

Discussion section: None.

Figure of the week: Sankey diagram (from this report) [my thoughts]

October 16: In-class exam #1
exam 1 info
practice exam A // practice exam data A
practice exam B // practice exam data B
exam 1 // exam data // solutions


Week 7: October 23 & 25

Datetime: datetime objects // Resampling // Plotting
Pandas Datareader: Retrieving data from APIs

Reading:
Dates and time: McKinney Ch 11.2 up to "Time series with duplicate indices", Ch 11.6 up to "Grouped time resampling"
APIs: pandas datareader docs

Jupyter notebooks: time series // apis
Data: vix.csv // osk.csv

Discussion section: Bar charts, exam #1 feedback

Figure of the week: Bar charts (figures 1&2) [my thoughts]


Week 8: October 30 & November 1
Pandas: MultiIndex // Reshaping with stack() and unstack() // Merging

Reading:
MultiIndex: McKinney Ch 8.1
Reshaping: McKinney Ch 8.3 (skim the stuff on pivot and melt)
Merging: McKinney Ch 8.2

Jupyter notebooks: multiIndex // reshaping // merging
Data: nipa.xlsx, CPS_March_2016.csv // dogs.csv, WEOOct2021all.csv, zillow.csv // steps.csv

Discussion section: Scatter plots, GitHub

Figure of the week: Bad bar charts [my thoughts]

Due November 3 (midnight): Coding practice #3 // [solutions]
two_digit_by_port.csv, toys.csv
Submit through Canvas.


Week 9: November 6 & 8
Pandas: Groupby // Data transformations

Reading:
Groupby: McKinney Ch 10.1 & 10.2
Transforms: McKinney Ch 7

Jupyter notebooks: groupby // transforms
Data: Most-Recent-Cohorts-Institution.zip // movies.csv
Extras: Style sheets in matplotlib [bea_gdp.csv, paper.mplstyle]

Discussion section: Exporting figures, OLS introduction

Figure of the week: Lots-o-plots [my thoughts]

Due November 10 (midnight): Project proposal
Submit through Canvas. (project information page)


Week 10: November 13 & 15
Geopandas: Thomas Paulson (placer.ai) guest lecture // Maps

Reading: geopandas docs

Jupyter notebooks: maps
Data: cities_4269.zip

Discussion section: Optimization
Extras: map insets

Figure of the week: Choropleth (figure 1) [my thoughts]

Due November 17 (midnight): Coding practice #4 // [solutions]
airline_products_2017.csv
Submit through Canvas.


Week 11: November 20 & 22 [No class November 22, video available]
Exam #2 // Regex: Advanced string search

Reading: McKinney Ch 7.4 "Regular Expressions"
Jupyter notebooks: regex
Data: callcenterdatacurrent.zip
Video: regex (part 1, part 2)

Bonus Thanksgiving content: Thanksgiving inflation index

Discussion section: None.

November 20: In-class exam #2
Exam #2 information
practice exam // exam2_data_prac.zip
exam #2 // exam2_data.zip // solutions


Week 12: November 27 & 29

Geopandas: Choropleths
Statsmodels: Discrete regression
Course evaluations: Nov 29–Dec 13

Reading: McKinney Ch 12 up to "Estimating Time Series", statsmodels docs // geopandas docs

Jupyter notebooks: choropleths // discrete
Data: cities_4269.zip, results_2020.xlsx // apple.dta, pntsprd.dta

Discussion section: Optimization

Figure of the week: Keeping track of complicated plots (from here) [my thoughts]


Week 13: December 4 & 6

BeautifulSoup: Webscraping
Statsmodels: Natural Language Processing
Course evaluations: Nov 29–Dec 13

Reading: BeautifulSoup quickstart

Jupyter notebooks: scraping // NLP
Data: spam.csv // newsgroups.zip

Discussion section:

Figure of the week: Treemap [my thoughts]

Due December 8 (midnight): Coding practice #5
state_gdp.csv, state_unemp.xlsx
Submit through Canvas.


Week 14: December 11 & 13 [No class December 13, video available]
Interactive figures // Where to go from here
Course evaluations: Nov 29–Dec 13

Reading: plotly fundamentals

Slides: wrap up
Jupyter notebooks: interactive // gapminder
Data: us_agg_data.csv // conts.csv
Videos: gapminder // wrapup

Discussion section: Reflection and assessment (no meeting, self-guided material)

Figure of the week: Images [my thoughts]

Due December 15 (midnight): Final project files
Submit through Canvas.
submission instructions // final checks // rubric // project information page