Fundamentals of Data Analytics for Economists
Fall 2023: Delivered in person. Meeting: Mondays and Wednesdays 4:00-5:15 | Education L196
Professor: Kim J. Ruhl // ruhl2@wisc.edu Office hours: Tu 9:00AM-10:00AM & Tu 3:30PM-4:30PM | 7444 Soc Sci
Teaching assistant: Satyen Pandita // spandita@wisc.edu Office hours: M 3:00PM-4:00PM | 6413 Soc Sci
Teaching assistant: Mitchell Valdes Bobes // valdsbobes@wisc.edu Office hours: Th 3:00PM-4:00PM | 7308 Soc Sci
Grades and announcements are managed through Canvas. Everything else happens here.
Weekly schedule
This week-by-week schedule will be constantly updated. Some topics may take longer than scheduled (and others may take less) but exam, coding practice, and project due dates will not change.
Reading: McKinney refers to the book Python for Data Analysis (Third edition) by Wes McKinney. It is available as an ebook here for free. There is a lot more in this book than we will cover, but it is a good reference.
File availability: Links to Jupyter notebooks will open a view of the notebook. At the top right of the page there is a link to download the notebook as an '.ipynb' file.
Videos: Class will not be recorded as long as instruction is in-person. If the University moves our class online, then videos will be available.
Discussion sections: The discussion sections will be facilitated by Satyen Pandita and Mitchell Valdes Bobes, the course TAs. The discussion sections provide time to ask questions, review previous assignments, and cover topics that are complementary to those we cover in the lectures. Discussion sections are mandatory, and the topics covered in them will appear on exams and assignments.
Figure of the week: A mixture of good, bad, and ugly visualizations from the "real world." Take a few minutes and study the figure. What is it supposed to communicate? Could it be improved? Does it inspire other research questions? I have included some of my own thoughts on the figures, too. If you come across an interesting visualization, send it my way!
Week 0: September 6 Introduction // Using data to communicate ideas
Examples: gapminder // election results // Inflation
Reading: Syllabus // AI policy // Welcome announcement // Installing and using winstat (SSCC) // My notes on winstat
Slides: introduction Data: None
Discussion section: Introduction, running python
Due September 8 (midnight): [Student survey] Follow the link to complete the survey. Then log in to Canvas and complete the assignment. Enter "survey completed" in the text box for the assignment.
Due September 8 (midnight): Install Winstat and log on. Follow the instructions here. Then log in to Canvas and complete the assignment.
Week 1: September 11 & 13 Winstat // Jupyter notebooks // Markdown Python: Assignment // Calculation // Types // Strings
Reading: How to cite code you learned about online // McKinney Ch. 2.2 parts: "Tab Completion," "Introspection" // Markdown cheatsheet // McKinney Ch. 2.3: "Indentation," "Everything is an object," "Comments," "Attributes and methods," "Binary operators," "Numeric types," "Strings"
Jupyter notebooks: survey // notebooks and markdown // python 1 Data: None
Discussion section: Good coding practices, strings
Figure of the week: Stacked area plot [my thoughts]
Week 2: September 18 & 20 [No class September 20, video available] Python: Lists // Tuples // Dicts // More on types Python: Loops // Conditionals
Reading: McKinney Ch. 3.1 parts: (skim stuff on tuples, skip stuff on sets, pay close attention to slicing)
Jupyter notebooks: python 2 // python 3 Data: None
Discussion section: Conditional statements, floating point numbers
Figure of the week: Line plot (from this article) [my thoughts]
Week 3: September 25 & 27 Python: Slicing // Functions // Packages Pandas: Series and DataFrames // Selecting data
Reading: McKinney Ch 3.2 up to "Anonymous functions" // McKinney Ch 5 up to "Arithmetic and data alignment"
Jupyter notebooks: python 4 // pandas fundamentals Data: None
Discussion section: Functions
Figure of the week: Scatter plot [my thoughts]
Due September 29 (midnight): Coding practice #1 // [solutions] Submit through Canvas.
Week 4: October 2 & 4 Pandas: Calculations on DataFrames // Reading and writing files Mini-project: Tuition inflation
Reading: McKinney Ch 5.3 up to "Correlation and covariance" // McKinney Ch 6.1 up to "Working with other delimited formats." Skip "Reading text files in pieces."
Jupyter notebooks: pandas calculation // pandas io // inflation mini-project Data: gdp_components.csv, gdp_parts.csv, debt.xlsx // inflation_food.csv, inflation_tuition.xlsx
Discussion section: DataFrames, finding help
Figure of the week: Getting carried away [my thoughts]
Week 5: October 9 & 11 Matplotlib: Figures and axes // line plots // histograms Visualization: What makes a good visualization?
Reading: McKinney Ch 9.1 up to "matplotlib configuration"
Jupyter notebooks: matplotlib 1 // visualization
Data: gdp_components.csv // bea_gdp.csv, map_data.zip
Discussion section: File paths, terminal commands Extras: Introduction to seaborn // seaborn // chile.xlsx, broadband_size.xlsx, auto_data.dta
Figure of the week: Heatmap [my thoughts]
Due October 13 (midnight): Coding practice #2 // [solutions] banks_and_branches.csv, GDPCA.csv Submit through Canvas.
Due October 13 (midnight): Project team rosters Only the corresponding author of each team should submit this assignment. A team of one person should also submit a roster. Use the form below to submit your roster. [Roster submission form]
Week 6: October 16 & 18 Exam #1 // The data analyst's workflow
Reading: None
Slides: workflow
Jupyter notebooks: workflow example // UW Now slides
Data: None.
Discussion section: None.
Figure of the week: Sankey diagram (from this report) [my thoughts]
October 16: In-class exam #1
exam 1 info
practice exam A // practice exam data A
practice exam B // practice exam data B
exam 1 // exam data // solutions
Week 7: October 23 & 25
Datetime: datetime objects // Resampling // Plotting
Pandas Datareader: Retrieving data from APIs
Reading: Dates and time: McKinney Ch 11.2 up to "Time series with duplicate indices", Ch 11.6 up to "Grouped time resampling" // APIs: pandas datareader docs
Jupyter notebooks: time series // apis
Data: vix.csv // osk.csv
Discussion section: Bar charts, exam #1 feedback
Figure of the week: Bar charts (figures 1&2) [my thoughts]
Week 8: October 30 & November 1
Pandas: MultiIndex // Reshaping with stack() and unstack() // Merging
Reading:
MultiIndex: McKinney Ch 8.1
Reshaping: McKinney Ch 8.3 (skim the stuff on pivot and melt)
Merging: McKinney Ch 8.2
Jupyter notebooks: multiIndex // reshaping // merging
Data: nipa.xlsx, CPS_March_2016.csv // dogs.csv, WEOOct2021all.csv, zillow.csv // steps.csv
Discussion section: Scatter plots, GitHub
Figure of the week: Bad bar charts [my thoughts]
Due November 3 (midnight): Coding practice #3 // [solutions] two_digit_by_port.csv, toys.csv Submit through Canvas.
Week 9: November 6 & 8 Pandas: Groupby // Data transformations
Reading: Groupby: McKinney Ch 10.1 & 10.2 // Transforms: McKinney Ch 7
Jupyter notebooks: groupby // transforms
Data: Most-Recent-Cohorts-Institution.zip // movies.csv
Extras: Style sheets in matplotlib [bea_gdp.csv, paper.mplstyle]
Discussion section: Exporting figures, OLS introduction
Figure of the week: Lots-o-plots [my thoughts]
Due November 10 (midnight): Project proposal Submit through Canvas. (project information page)
Week 10: November 13 & 15 Geopandas: Thomas Paulson (placer.ai) guest lecture // Maps
Reading: geopandas docs
Jupyter notebooks: maps
Data: cities_4269.zip
Discussion section: Optimization Extras: map insets
Figure of the week: Choropleth (figure 1) [my thoughts]
Due November 17 (midnight): Coding practice #4 // [solutions] airline_products_2017.csv Submit through Canvas.
Week 11: November 20 & 22 [No class November 22, video available] Exam #2 // Regex: Advanced string search
Reading: McKinney Ch 7.4 "Regular Expressions"
Jupyter notebooks: regex
Data: callcenterdatacurrent.zip
Video: regex (part 1, part 2)
Bonus Thanksgiving content: Thanksgiving inflation index
Discussion section: None.
November 20: In-class exam #2 Exam #2 information practice exam // exam2_data_prac.zip exam #2 // exam2_data.zip // solutions
Week 12: November 27 & 29
Geopandas: Choropleths Statsmodels: Discrete regression Course evaluations: Nov 29–Dec 13
Reading: McKinney Ch 12 up to "Estimating Time Series", statsmodels docs // geopandas docs
Jupyter notebooks: choropleths // discrete Data: cities_4269.zip, results_2020.xlsx // apple.dta, pntsprd.dta
Discussion section: Optimization
Figure of the week: Keeping track of complicated plots (from here) [my thoughts]
Week 13: December 4 & 6
BeautifulSoup: Webscraping Statsmodels: Natural Language Processing Course evaluations: Nov 29–Dec 13
Reading: BeautifulSoup quickstart
Jupyter notebooks: scraping // NLP Data: spam.csv // newsgroups.zip
Discussion section:
Figure of the week: Treemap [my thoughts]
Due December 8 (midnight): Coding practice #5 state_gdp.csv, state_unemp.xlsx Submit through Canvas.
Week 14: December 11 & 13 [No class December 13, video available] Interactive figures // Where to go from here Course evaluations: Nov 29–Dec 13
Reading: plotly fundamentals
Slides: wrap up Jupyter notebooks: interactive // gapminder Data: us_agg_data.csv // conts.csv Videos: gapminder // wrapup
Discussion section: Reflection and assessment (no meeting, self-guided material)
Figure of the week: Images [my thoughts]
Due December 15 (midnight): Final project files Submit through Canvas. submission instructions // final checks // rubric // project information page