Data sets
A work-in-progress collection of easy (and some not so easy) to access data sets. Created for the Data Analytics for Economists course at the University of Wisconsin -- Madison, but all are welcome. Suggestions and corrections are always appreciated (ruhl2@wisc.edu).
Aggregate Economic Data
-
FRED (St. Louis FRB) Massive repository of economic data. Somewhat U.S.-centric. Accessible through pandas_datareader.
-
COMTRADE (United Nations) Very detailed data on international trade in goods.
-
World Bank Data for many countries. Includes economic data, but also demographic, social, and environmental topics.
-
Eurostat Mostly focused on European Union countries. Data on many topics including trade, GDP, agriculture, environment, demographics... h/t H. Schriefer '22
-
Penn World Table The big draw here is GDP at purchasing power parity, which allows for meaningful cross-country comparisons. We used read_excel() to directly import this from the web.
-
UN Population Data Demographic data by country, including forecasts.
-
BLS Quarterly Census of Employment and Wages Quarterly employment, wages, etc. at the county/metro/state levels.
-
BLS Occupational Employment Statistics Wages and employment by occupation and geography.
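The FRED and Penn World Table entries above mention pandas_datareader and read_excel(). Here is a minimal sketch of both, not the course's own code. The function names and the `reader` argument are made up for illustration, and it assumes pandas_datareader is installed and can reach FRED.

```python
import pandas as pd


def load_fred(series_ids, start, end, reader=None):
    """Return the requested FRED series as a DataFrame indexed by date.

    `reader` is a hypothetical hook: it defaults to pandas_datareader's
    DataReader, and a stand-in can be passed to try the function offline.
    """
    if reader is None:
        # Imported lazily so the file still loads without pandas_datareader.
        from pandas_datareader import data as web
        reader = web.DataReader
    return reader(series_ids, "fred", start, end).sort_index()


def load_excel_from_web(url, sheet_name=0):
    """Read one sheet of an Excel workbook straight from a URL."""
    # read_excel accepts a URL in place of a local file path.
    return pd.read_excel(url, sheet_name=sheet_name)
```

With a network connection, `load_fred(["GDP", "UNRATE"], datetime(2000, 1, 1), datetime(2019, 12, 31))` should return both series in one DataFrame. For the Penn World Table, the workbook URL and sheet name have to be taken from its download page; none is assumed here.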
Data on Individuals
-
American Community Survey Social, economic, housing, and demographic data often used to produce county-level analysis. The public use microdata sample (PUMS) contains anonymous data on individuals.
-
American Time Use Survey How do people spend their time? Working? Sleeping? Cleaning?
-
Current Population Survey Household level data on employment, income, and education.
-
NLSY79 These data follow a cohort of men and women who were 14-22 years old in 1979. They were re-surveyed each year until 1994 and every two years after that. Not the easiest data to access, but there is a lot to learn from it.
-
Income Distributions and Dynamics Provides statistics on income percentiles, shares, growth rates, persistence, and more for many U.S. demographic groups at national and state levels. Data you can use to think about inequality, for example.
-
MIDUS A longitudinal study of health and well being. Run by a group at UW–Madison.
-
National Survey of Family Growth Interviews with females about pregnancy and associated topics. Includes demographic data.
-
FBI Crime Data Explorer What kinds of crimes are being committed? How are they changing over time?
-
Mexican Household Income and Expenditure Survey Micro data on household income, expenditure, and demographic characteristics.
-
Baccalaureate and Beyond Longitudinal Study Panel data that follows students after receiving an undergraduate degree.
-
Post-secondary Employment Outcomes Earnings and employment outcomes for graduates by degree level and major.
Wisconsin Data
-
WI Dept. of Health Services Data on asthma, Zika, and lots in between. It takes some clicking around, but many of the datasets can be visualized as a map to get you thinking. Look for the download button in the top right corner.
-
Wisconsin Voting Data A lot of detail. There is an API, too.
-
GIS Data Data on boundaries, roads, etc. for Wisconsin. Use it with geopandas to create maps.
-
City of Madison More data on the city, including lots of spatial data. The tax rolls are interesting—I can see my house in this dataset!
-
City of Milwaukee Housing, services, elections, and spatial data. There is neat data on calls to the service center. It would take some good data wrangling, but who wouldn’t want to know the average time it takes to get a street light fixed?
-
Wisconsin COVID-19 Data on cases, testing, and other measures for Wisconsin, by county and even census tract. I found the Dane County numbers in this dataset to have some differences from the ones reported by Madison and Dane County Public Health.
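The street light question in the Milwaukee entry above is a small groupby exercise once the open and close dates are parsed. The sketch below uses made-up rows and hypothetical column names (`request_type`, `opened`, `closed`). The real service-center files will be laid out differently and need more cleaning.

```python
import io

import pandas as pd

# Made-up rows for illustration; the column names are hypothetical.
SAMPLE = """request_type,opened,closed
Street Light Out,2023-01-02,2023-01-05
Street Light Out,2023-01-10,2023-01-15
Pothole,2023-01-03,2023-01-04
Pothole,2023-01-07,
"""


def mean_days_to_close(calls):
    """Average number of days between opening and closing, by request type."""
    # Requests that are still open have no close date, so drop them first.
    done = calls.dropna(subset=["closed"]).copy()
    done["days"] = (done["closed"] - done["opened"]).dt.days
    return done.groupby("request_type")["days"].mean()


calls = pd.read_csv(io.StringIO(SAMPLE))
# Convert the date columns explicitly; blank close dates become NaT.
calls["opened"] = pd.to_datetime(calls["opened"])
calls["closed"] = pd.to_datetime(calls["closed"])

result = mean_days_to_close(calls)
print(result)
```

Dropping open requests biases the average toward quick fixes, so a fuller analysis would want to report how many requests are still unresolved.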
Micro Export Data
-
Brookings Export Monitor Exports by industry at the county, metro, and state levels (aggregates, too). This data tracks goods according to where they were produced.
-
USA Trade Sign up for a free account to use. Imports and exports by product and U.S. state. This data tracks goods according to origin of movement rather than production.
Business Data
-
Inside Airbnb Data on listings, reviews, and calendars. Doesn't have data for a Wisconsin city, but Minneapolis and Chicago are in there.
-
Yahoo Finance Historical and current financial data. The API in pandas_datareader is broken, but you can still download files from the site.
-
FDIC Aggregate data on US banks, including balance sheet and income statement data. The data on bank failures might make for an interesting analysis.
-
Airline routes (T-100) Route-segment based data. Monthly observations on number of passengers, seats, and cargo transported on a given route segment for each airline.
-
Airline itineraries (DB1B) Quarterly sample of 10% of passenger itineraries from major airlines. Includes price data.
-
Zillow Housing and rental data by metro area.
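For the Yahoo Finance entry above, a downloaded price file can be read with pandas in a few lines. This is a sketch with made-up prices. It assumes the file has `Date` and `Adj Close` columns, and `daily_returns` is a name chosen for illustration.

```python
import io

import pandas as pd

# Made-up prices laid out like a downloaded history file (assumed columns).
SAMPLE = """Date,Open,High,Low,Close,Adj Close,Volume
2024-01-02,100,102,99,100,100,1000
2024-01-03,100,111,100,110,110,1200
2024-01-04,110,110,98,99,99,900
"""


def daily_returns(source):
    """Daily percent change of the adjusted close, indexed by date.

    `source` can be a file path or an open file-like object.
    """
    prices = pd.read_csv(source, parse_dates=["Date"], index_col="Date")
    # The first day has no previous price, so its return is dropped.
    return prices["Adj Close"].pct_change().dropna()


returns = daily_returns(io.StringIO(SAMPLE))
print(returns)
```

Passing a path such as `daily_returns("prices.csv")` works the same way; the file name here is only a placeholder.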
Medical Data
-
HRSA Data Grant, loan, and scholarship program data, as well as data about availability of healthcare. The data on health professional shortage areas looks interesting.
-
Dartmouth Atlas of Health Care Compiled from Medicare data, the database provides information about health care at detailed levels, right down to the hospital.
-
COVID-19 The Johns Hopkins database contains daily data on coronavirus cases in the United States and the world.
-
National injury database The US Consumer Product Safety Commission collects data on injuries. There is an "incident narrative" in each entry with details.
Sports
- Baseball Database by Sean Lahman Batting and pitching statistics from 1871 plus much more.
Arts and Culture
-
Cooper Hewitt Open access to data about the collection.
-
MovieLens Movie ratings and demographic data about the raters. Some very large datasets, but some small ones for getting your code up and running.
-
New York Philharmonic Data on more than 20,000 performances.
Education
-
College Scorecard University/College level data about the school and its student body.
-
Opportunity Insights Data on social mobility outcomes and demographic characteristics.
-
Illinois Report Card The Report Card Public Dataset reports demographic and financial characteristics for every school in the state. The data go back to the 1990s, but will require some manipulation. h/t M. Warren '20
Other Data Collections
-
NBER Datasets that go with NBER working papers. Some data is easy to access, some is not (and some is missing). The associated papers are full of good questions, too.
-
ICPSR A large collection of social science data. We have not used this data—let us know if you do, we would like to hear about it.
-
Kaggle This site runs competitions and warehouses lots of data and code.
-
Chicago data portal Lots of data about the city. In fact, most big cities have data portals now.
-
UC Irvine Data sets meant for "machine learning" but they can be used for anything. Some are very simple, some complex.
Political Data
-
MIT Election Lab Historical data on voting.
-
Trump twitter archive Text data ready for analysis.