Self-guided study
Below is a rough outline of how I structure my courses. I have also added other notebooks that I have used in class in the past, but either have not updated or do not have time to cover. Many of these notebooks require data files. I think I have included them all at the bottom of the page. If one is missing, let me know.
The notebooks labeled "Friday" are short topics that provide a bit more practice and introduce some new material. They are meant to be worked through on a Friday afternoon over coffee.
This material is all "as is" but I would like to hear if you find mistakes or other issues. Hey, I'd like to hear if you think this information is useful, too. Drop me a line at ruhl2@wisc.edu to let me know. Thanks.
Preliminary
I teach using jupyter notebooks. They combine code and output into one easy-to-follow document.
Python basics
The goal here is to learn enough python that we can use tools like pandas and matplotlib to work with data.
- python fundamentals 1: assignment, calculation, strings
- python fundamentals 2: lists, tuples, dicts, types
- python fundamentals 3: if/else, for loops, list comprehensions
- python fundamentals 4: slicing, user-defined functions, objects
- Edgar Allan Poe string practice: decode a Valentine's poem (my interpretation of Mike Waugh's assignment)
15-minute Friday
- Friday: writing readable code, PEP-8, the .format() method (todo: f-strings)
- Friday: function arguments, positional args, keyword args, more practice
- Friday: more flow control, elif, in, not (an optional bit on floats)
- Friday: relative file paths
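To give a flavor of the Friday topics above, here is a minimal sketch that touches .format() and f-strings, positional and keyword arguments, elif, in, not, and float comparison. The names and values (name, score, describe) are made up for illustration and do not come from the notebooks.

```python
import math

# Hypothetical values, used only for illustration.
name = "Ada"
score = 0.91234

# The same message built two ways: str.format() and an f-string.
msg_format = "{} scored {:.1%}".format(name, score)
msg_fstring = f"{name} scored {score:.1%}"


def describe(value, low=0.5, high=0.9):
    """Label a value with if/elif/else; low and high have default values."""
    if value < low:
        return "low"
    elif value < high:
        return "middle"
    else:
        return "high"


# One positional argument, then one keyword argument that overrides a default.
label = describe(score, high=0.95)

# Membership and negation with `in` and `not`.
starts_with_vowel = name[0].lower() in "aeiou"
has_no_digits = not any(ch.isdigit() for ch in name)

# Floats are approximate, so compare with a tolerance rather than ==.
close_enough = math.isclose(0.1 + 0.2, 0.3)

print(msg_format)
print(msg_fstring)
print(label, starts_with_vowel, has_no_digits, close_enough)
```

The f-string and the .format() call produce the same text; the f-string just keeps the variable names next to the format codes, which tends to be easier to read.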
Data wrangling with pandas
The workhorse data handling package is pandas. Let's get started.
- pandas fundamentals 1: dataframe, series, indexes, filtering
- pandas fundamentals 2: calculations on dataframes
- pandas I/O: the file system, reading and writing csv and excel files
15-minute Friday
- Friday: more DataFrame practice
Visualization
Visualizing data is probably the highest value-added task we do. All the high-tech modeling in the world is not very valuable if you cannot get your results across to the reader. A quick look at the WSJ or NYT should convince you that visualizations are great for this.
- matplotlib fundamentals: figures and axes, line plots, histograms, subplots
- visualization best practices: graphical excellence
- stylesheets: make the defaults work for you
- seaborn: regplot, facet plot (todo: grouped bar plots)
- interactive figures: the plotly package, lines, bars, maps
- gapminder: test your skills, recreate one of my favorite online visualizations
15-minute Friday
- Friday: formatting axes; ticks, date formats
- Friday: bar graphs, scatter plots
- Friday: rolling window calculations, exporting figures for Word
Projects
Practice is more fun if you are trying to learn about something in the real world. At this point in the class, we know enough to try a few small projects.
- data scientist's workflow
- mini-project on inflation (only need pandas basics)
- mini-project on Thanksgiving (pandas + matplotlib)
Advanced data wrangling with pandas
So much of what I do involves fighting with data before I can do anything useful with it. There is not a strict ordering of these notebooks. Some of them use skills developed in other notebooks. (Some also use api calls, which are covered below under "data acquisition").
- dates: datetime and strings, datetime index and plotting, resampling
- multiIndex: indexing with many variables, panel data basics
- reshaping: "wide" vs. "long"
- merging: combine data sets
- groupby: split-apply-combine
- transforming data: string methods, replace, map
- regex: powerful string searching in pandas
15-minute Friday
- Friday: fuzzy matching and merging
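As a taste of the reshaping and groupby topics in this section, the sketch below moves a tiny table from "wide" to "long" and back, and runs one split-apply-combine step. The data and column names (country, year, gdp) are made up for illustration and are not taken from the notebooks.

```python
import pandas as pd

# Made-up "wide" data: one row per country, one column per year.
wide = pd.DataFrame(
    {
        "country": ["A", "B"],
        "2019": [1.0, 3.0],
        "2020": [2.0, 5.0],
    }
)

# Wide -> long: one row per (country, year) pair.
long_df = wide.melt(id_vars="country", var_name="year", value_name="gdp")

# Split by country, apply a mean to each group, combine into one Series.
mean_gdp = long_df.groupby("country")["gdp"].mean()

# Long -> wide again, with countries as the index and years as columns.
back_to_wide = long_df.pivot(index="country", columns="year", values="gdp")

print(long_df)
print(mean_gdp)
print(back_to_wide)
```

Long data is usually the easier shape for groupby and plotting, while wide data is often easier to read as a table.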
Data acquisition
Data don't just come in files.
- apis: pandas datareader, FRED, Census apis and processing retrieved json
- web scraping fundamentals: beautiful soup, combing through html for clues
Maps
Geospatial data works best with a map. The software (geopandas) can be a bit tricky to install, but making maps is not too hard. (If you can't get geopandas to install, try these notebooks out on Google Colab.)
- geopandas fundamentals: points, polygons, geodataframes
- choropleths: heatmaps on maps, choosing the right colormap
- insets: put Alaska and Hawaii where they can be seen
- coordinate reference systems: changing CRS, choosing the right CRS
- interactive maps: uses plotly rather than geopandas
Regression
This is a class for economics students, so I assume you know about regressions. These notebooks show you how to regress in python, rather than stata.
- ols: statsmodels, categorical variables, logs, polynomials
- discrete dependent variables: linear probability, logit, probit, marginal effects
- difference in difference: almost a replication of Card and Krueger (1994)
15-minute Friday
- Friday: exporting regression tables, latex
Natural language processing
Words as data.
- nlp fundamentals: pre-processing, bag of words
Miscellaneous
Whatever is left...
Data
A few notebooks may also link directly to data files, so you can download those from the source.
apple.dta, auto_data.dta, bea_gdp.csv, broadband_size.xlsx, callcenterdatacurrent.zip, chile.xlsx, cities_4269.zip, conts.csv, CPS_March_2016.csv, debt.xlsx, dogs.csv, gdp_components.csv, gdp_parts.csv, inflation_food.csv, inflation_tuition.xlsx, map_data.zip, Most-Recent-Cohorts-Institution.zip, movies.csv, newsgroups.zip, nipa.xlsx, osk.csv, pntsprd.dta, results_2020.xlsx, spam.csv, steps.csv, us_agg_data.csv, vix.csv, WEOOct2021all.csv, zillow.csv