Self-guided study

Below is a rough outline of how I structure my courses. I have also added other notebooks that I have used in class in the past, but either have not updated or do not have time to cover. Many of these notebooks require data files. I think I have included them all at the bottom of the page. If one is missing, let me know.

The notebooks labeled "Friday" are short topics that provide a bit more practice and introduce some new material. They are meant to be worked through on a Friday afternoon over coffee.

This material is all "as is," but I would like to hear about any mistakes or other issues you find. I'd also like to hear if you think this material is useful. Drop me a line at ruhl2@wisc.edu to let me know. Thanks.

Preliminary

I teach using jupyter notebooks. They combine code and output into one easy-to-follow document.

Python basics

The goal here is to learn enough python that we can use tools like pandas and matplotlib to work with data.

15-minute Friday

  • Friday: writing readable code, PEP-8, the .format() method (todo: f-strings); see the preview sketch after this list
  • Friday: function arguments, positional args, keyword args, more practice
  • Friday: more flow control, elif, in, not (an optional bit on floats)
  • Friday: relative file paths
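
A minimal preview of the string-formatting topic, comparing the .format() method and an f-string (the variable names and numbers are made up for illustration):

    # Two ways to build the same string: .format() and an f-string.
    country = "Chile"      # made-up example values
    gdp_growth = 2.3

    print("GDP growth in {} was {:.1f} percent.".format(country, gdp_growth))
    print(f"GDP growth in {country} was {gdp_growth:.1f} percent.")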

Data wrangling with pandas

The workhorse data handling package is pandas. Let's get started.
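
To give you a tiny preview of what a first pandas session looks like (the data here are made up):

    import pandas as pd

    # Build a small DataFrame by hand, then compute a new column.
    df = pd.DataFrame({"year": [2018, 2019, 2020],
                       "gdp":  [20.5, 21.4, 20.9]})    # made-up numbers
    df["growth"] = df["gdp"].pct_change() * 100         # percent change from the previous year

    print(df)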

15-minute Friday

  • Friday: more DataFrame practice

Visualization

Visualizing data is probably the highest value-added task we do. All the high-tech modeling in the world is not very valuable if you cannot get your results across to the reader. A quick look at the WSJ or NYT should convince you that visualizations are great for this.

15-minute Friday

  • Friday: formatting axes: ticks, date formats
  • Friday: bar graphs, scatter plots
  • Friday: rolling window calculations, exporting figures for Word (a short sketch follows this list)
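
To give you a sense of where these notebooks end up, here is a minimal sketch with made-up data: plot a series and its rolling mean, then save a png that you can drop into Word.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Made-up monthly data, just to have something to plot.
    dates = pd.date_range("2020-01-01", periods=24, freq="MS")
    s = pd.Series(range(24), index=dates)

    fig, ax = plt.subplots()
    ax.plot(s.index, s, label="raw data")
    ax.plot(s.index, s.rolling(6).mean(), label="6-month rolling mean")
    ax.set_ylabel("index")
    ax.legend(frameon=False)

    fig.savefig("figure.png", dpi=300)   # insert the png into a Word document
    plt.show()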

Projects

Practice is more fun if you are trying to learn about something in the real world. At this point in the class, we know enough to try a few small projects.

Advanced data wrangling with pandas

So much of what I do involves fighting with data before I can do anything useful with it. These notebooks do not follow a strict order, but some of them use skills developed in others. (Some also use api calls, which are covered below under "data acquisition".)

  • dates: datetime and strings, datetime index and plotting, resampling
  • MultiIndex: indexing with many variables, panel data basics
  • reshaping: "wide" vs. "long"
  • merging: combine data sets
  • groupby: split-apply-combine (a small example follows this list)
  • transforming data: string methods, replace, map
  • regex: powerful string searching in pandas
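
To give a flavor of the groupby notebook, here is a minimal split-apply-combine example with made-up data: split the rows by industry, apply a mean to each group, and combine the results.

    import pandas as pd

    # Made-up firm-level data.
    df = pd.DataFrame({"industry":   ["mfg", "mfg", "svc", "svc"],
                       "employment": [100, 250, 40, 80]})

    # Split by industry, apply the mean to each group, combine into one series.
    print(df.groupby("industry")["employment"].mean())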

15-minute Friday

  • Friday: fuzzy matching and merging

Data acquisition

Data don't just come in files.

  • apis: pandas-datareader, FRED, Census apis, and processing retrieved json (a minimal example follows this list)
  • web scraping fundamentals: beautiful soup, combing through html for clues
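
As a taste of the api material, here is a sketch that pulls real GDP from FRED with pandas-datareader. You need the pandas-datareader package installed, and the series code GDPC1 is just one example of a FRED series.

    import pandas_datareader.data as web

    # Download real GDP (FRED series GDPC1) and look at the last few observations.
    gdp = web.DataReader("GDPC1", "fred", start="2015-01-01")
    print(gdp.tail())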

Maps

Geospatial data works best with a map. The software (geopandas) can be a bit tricky to install, but making maps is not too hard. (If you can't get geopandas to install, try these notebooks out on google colab.)
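
Here is a minimal sketch of the geopandas workflow, assuming the cities_4269.zip file from the data list below is a zipped shapefile that geopandas can read directly (recent versions can open zip archives; if yours cannot, unzip it and point read_file at the .shp file):

    import geopandas as gpd
    import matplotlib.pyplot as plt

    # Read the (zipped) shapefile and draw it. I am assuming the file holds
    # geometries that geopandas understands.
    gdf = gpd.read_file("cities_4269.zip")

    fig, ax = plt.subplots()
    gdf.plot(ax=ax)
    ax.set_axis_off()
    plt.show()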

Regression

This is a class for economics students, so I assume you know about regressions. These notebooks show you how to run regressions in python rather than in stata.
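
Here is a minimal statsmodels sketch of the formula-style workflow (the data are made up). For the latex Friday below, the fitted results' summary object has an as_latex() method that is a reasonable starting point.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Made-up data: regress y on x with an R/stata-style formula.
    df = pd.DataFrame({"y": [1.0, 2.1, 2.9, 4.2, 5.1],
                       "x": [1, 2, 3, 4, 5]})

    res = smf.ols("y ~ x", data=df).fit()
    print(res.summary())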

15-minute Friday

  • Friday: exporting regression tables, latex

Natural language processing

Words as data.

Miscellaneous

Whatever is left...

Data

There may also be a few data files that the notebooks link to directly, so you can download them from the source.

apple.dta, auto_data.dta, bea_gdp.csv, broadband_size.xlsx, callcenterdatacurrent.zip, chile.xlsx, cities_4269.zip, conts.csv, CPS_March_2016.csv, debt.xlsx, dogs.csv, gdp_components.csv, gdp_parts.csv, inflation_food.csv, inflation_tuition.xlsx, map_data.zip, Most-Recent-Cohorts-Institution.zip, movies.csv, newsgroups.zip, nipa.xlsx, osk.csv, pntsprd.dta, results_2020.xlsx, spam.csv, steps.csv, us_agg_data.csv, vix.csv, WEOOct2021all.csv, zillow.csv