Data sets

A work-in-progress collection of easy (and some not so easy) to access data sets. Created for the Data Analytics for Economists course at the University of Wisconsin -- Madison, but all are welcome. Suggestions and corrections are always appreciated (ruhl2@wisc.edu).

Aggregate Economic Data

  • FRED (St. Louis FRB) Massive repository of economic data. Somewhat U.S.-centric. Accessible through pandas_datareader.

  • COMTRADE (United Nations) Very detailed data on international trade in goods.

  • World Bank Data for many countries. Includes economic data, but also demographic, social, and environmental topics.

  • Eurostat Mostly focused on European Union countries. Data on many topics including trade, GDP, agriculture, environment, demographics... h/t H. Schriefer '22

  • Penn World Table The big draw here is GDP at purchasing power parity, which allows for meaningful cross-country comparisons. We used read_excel() to directly import this from the web.

  • UN Population Data Demographic data by country, including forecasts.

  • BLS Quarterly Census of Employment and Wages Quarterly employment, wages, etc. at the county/metro/state levels.

  • BLS Occupational Employment Statistics Wages and employment by occupation and geography.
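
The FRED entry above mentions pandas_datareader. Here is a minimal sketch of that route, assuming the pandas_datareader package is installed and a network connection is available. The names fetch_fred and annual_growth are illustrative helpers, not part of any library.

```python
import pandas as pd


def fetch_fred(series_id, start, end):
    """Download one FRED series as a DataFrame (needs network access)."""
    # Imported here so the rest of the sketch runs even if the package is missing.
    from pandas_datareader import data as web

    return web.DataReader(series_id, "fred", start, end)


def annual_growth(levels, periods_per_year=4):
    """Percent change from the same period one year earlier.

    periods_per_year=4 assumes quarterly data; use 12 for monthly series.
    """
    return levels.pct_change(periods=periods_per_year) * 100
```

For example, fetch_fred("GDPC1", "2000-01-01", "2019-12-31") should return quarterly real GDP in a column named GDPC1, which can then be passed to annual_growth.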

Data on Individuals

Wisconsin Data

  • WI Dept. of Health Services Data on asthma, Zika, and lots in between. It takes some clicking around, but many of the datasets can be visualized as a map to get you thinking. Look for the download button in the top right corner.

  • Wisconsin Voting Data A lot of detail. There is an API, too.

  • GIS Data Data on boundaries, roads, etc. for Wisconsin. Use it with geopandas to create maps.

  • City of Madison More data on the city, including lots of spatial data. The tax rolls are interesting—I can see my house in this dataset!

  • City of Milwaukee Housing, services, elections, and spatial data. There is neat data on calls to the service center. It would take some good data wrangling, but who wouldn’t want to know the average time it takes to get a street light fixed?

  • Wisconsin COVID-19 Data on cases, testing, and other measures for Wisconsin, by county and even census tract. I found the Dane County numbers in this dataset to have some differences from the ones reported by Madison and Dane County Public Health.
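
The street light question in the Milwaukee entry comes down to subtracting two timestamps and averaging. Here is a rough sketch of that wrangling step. The column names opened, closed, and type are hypothetical placeholders, so check the actual service-request file for its real headers.

```python
import pandas as pd


def mean_days_to_close(calls, opened_col="opened", closed_col="closed", by=None):
    """Average number of days between opening and closing a request.

    Requests with no closing date are left out of the average.
    Pass a column name as `by` to get one average per category.
    """
    opened = pd.to_datetime(calls[opened_col])
    closed = pd.to_datetime(calls[closed_col])
    # Convert the elapsed time to fractional days.
    days = (closed - opened).dt.total_seconds() / 86400
    if by is None:
        return days.mean()
    return days.groupby(calls[by]).mean()
```

Calling mean_days_to_close(calls, by="type") on a DataFrame of requests would give one average per request type, which is where a street light row would show up.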

Micro Export Data

  • Brookings Export Monitor Exports by industry at the county, metro, and state levels (aggregates, too). This data tracks goods according to where they were produced.

  • USA Trade Sign up for a free account to use. Imports and exports by product and U.S. state. This data tracks goods according to origin of movement rather than production.

Business Data

  • Inside Airbnb Data on listings, reviews, and calendars. Doesn't have data for a Wisconsin city, but Minneapolis and Chicago are in there.

  • Yahoo Finance Historical and current financial data. The API in pandas_datareader is broken, but you can still download files from the site.

  • FDIC Aggregate data on US banks, including balance sheet and income statement data. The data on bank failures might make for an interesting analysis.

  • Airline routes (T-100) Route-segment based data. Monthly observations on number of passengers, seats, and cargo transported on a given route segment for each airline.

  • Airline itineraries (DB1B) Quarterly sample of 10% of passenger itineraries from major airlines. Includes price data.

  • Zillow Housing and rental data by metro area.
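
For the Yahoo Finance entry, a downloaded file can be read straight into pandas. This sketch assumes the file is a CSV with a Date column and an Adj Close column, which is the layout Yahoo's historical-price downloads have typically used. The function names are illustrative only.

```python
import pandas as pd


def load_yahoo_csv(path):
    """Read a downloaded historical-price CSV, indexed by date."""
    # Assumes a 'Date' column; adjust if the file's headers differ.
    return pd.read_csv(path, parse_dates=["Date"], index_col="Date").sort_index()


def daily_returns(prices, price_col="Adj Close"):
    """Simple day-over-day returns from the chosen price column."""
    return prices[price_col].pct_change().dropna()
```

Passing the path of the saved file to load_yahoo_csv and the result to daily_returns gives a return series ready for plotting or summary statistics.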

Medical Data

  • HRSA Data Grant, loan, and scholarship program data, as well as data about availability of healthcare. The data on health professional shortage areas looks interesting.

  • Dartmouth Atlas of Health Care Compiled from Medicare data, the database provides information about health care at detailed levels, right down to the hospital.

  • COVID-19 The Johns Hopkins database contains daily data on coronavirus cases in the United States and the world.

  • National injury database The US Consumer Product Safety Commission collects data on injuries. There is an "incident narrative" in each entry with details.

Sports

Arts and Culture

  • Cooper Hewitt Open access to data about the collection.

  • MovieLens Movie ratings and demographic data about the raters. Some very large datasets, but some small ones for getting your code up and running.

  • New York Philharmonic Data on more than 20,000 performances.

Education

  • College Scorecard University/College level data about the school and its student body.

  • Opportunity Insights Data on social mobility outcomes and demographic characteristics.

  • Illinois Report Card The Report Card Public Dataset reports demographic and financial characteristics for every school in the state. The data go back to the 1990s, but will require some manipulation. h/t M. Warren '20

Other Data Collections

  • NBER Datasets that go with NBER working papers. Some data is easy to access, some is not (and some is missing). The associated papers are full of good questions, too.

  • ICPSR A large collection of social science data. We have not used this data. Let us know if you do; we would like to hear about it.

  • Kaggle This site runs competitions and warehouses lots of data and code.

  • Chicago data portal Lots of data about the city. In fact, most big cities have data portals now.

  • UC Irvine Data sets meant for "machine learning" but they can be used for anything. Some are very simple, some complex.

Political data