Data Analytics for Economists¶

Open up your laptop:¶

  • Go to badgerdata.org
  • Top of page: 'Econ 570'
  • Open: Week 0 > Slides > introduction

Outline:¶

  1. Data analysis examples: Thinking about data
  2. Why Python? Why not Excel? Stata? R?
  3. Syllabus stuff

Before we get started...¶

  • Me: Prof. Kim Joseph Ruhl

  • School: BS Bowling Green State University (Ohio); PhD U. Minnesota

    • Studied computer science and economics
  • Jobs: Minneapolis FRB; U. Texas Austin; NYU Stern Business School; Penn State
  • Research: International finance, macro, trade, multinational firms
    • Data + computational models
    • If you are curious: kimjruhl.com/research
    • I use the stuff we will learn in this class every day
  • Not work: Fly fishing, hiking, robots

The teaching assistants¶

  • Satyen Pandita
    • Sections 302, 304
  • Mitchel Valdes Bobes
    • Sections 301, 303, 305
  • In discussion sections we will
    • Cover new topics
    • Review homework and exams
    • Have time for questions
  • Discussions are mandatory, and material covered in them may appear on exams

Introduction: Thinking about data.¶

This course is about using data and 'the right kind of analysis' to answer questions.

  1. What is the question we are trying to answer? Throughout the course (and the rest of your lives) you should be generating questions.
  2. Data: Where does it come from? What questions can it answer?
  3. What is 'the right kind of analysis?' Often, this is creating a visualization (e.g., a plot, a map) to convey information. Using the right kind of visualization is part of the analysis.

Let's look at some visualizations:

  1. Gapminder
  2. Voting in Wisconsin
  3. Inflation across the world

Gapminder¶

Go to www.gapminder.org/tools/ (open in new tab)

  1. What does the visualization tell you?
  2. What kind of data are used? Where are they from?
  3. What else would you like to know?

We will be able to make this figure by the end of the semester.

Voting in Wisconsin¶

Go to https://www.cnn.com/election/2020/results/state/wisconsin/president (new tab) and focus on the presidential results

Take 5 minutes (work with those around you) and try to answer:

  1. What does the visualization tell you?
  2. What kind of data are used? Where are they from?
  3. What else would you like to know?

We will be able to make this figure by the end of the semester.

Inflation across the world¶

Go to data.oecd.org/price/inflation-cpi.htm

Take 5 minutes (work with those around you) and try to answer:

  1. What does the visualization tell you?
  2. What kind of data are used? Where are they from?
  3. What else would you like to know?

We will be able to make a better version of this figure by the end of the semester.

Why Python?¶

We want to:

  • Work with large(ish) datasets
  • Manage numeric data and 'string' data
  • Have control over our figures' appearance
  • Be transparent: Can our work be replicated?
  • Automate repetitive stuff

Python gives us

  • A reasonably fast language
  • Great support for numeric and string manipulation
  • Many plotting options
  • Ways to create well-documented analysis
  • Automation

An example¶

Let's plot the dollar-yuan exchange rate to get a feel for what we can do with Python. (I do not expect you to follow all of this today!)

We will get the data from the St. Louis FRB FRED database. We will work with FRED often. It is an easy place to get economic data.

In [6]:
# Do some preliminary things to get Python set up.

# Import needed packages
import pandas as pd                            # the workhorse data package

import pandas_datareader.data as web           # for FRED data calls
import matplotlib.pyplot as plt                # for plots
import datetime as dt                          # for dates


# IPython command to include plots in the notebook
%matplotlib inline
In [7]:
# Create datetime object to hold the begin date
start = dt.datetime(1990, 1, 1)

# Get monthly Yuan-per-dollar exchange rate. 'fred' tells the data reader to use the FRED repository.
# 'EXCHUS' is the name of the data series in FRED. You can find the series codes (names) on the FRED website.
exchus = web.DataReader('EXCHUS', 'fred', start)

# Print out the first 3 observations.
print(exchus.head(3))

# Print out the last 3 observations.
print(exchus.tail(3))
            EXCHUS
DATE              
1990-01-01  4.7339
1990-02-01  4.7339
1990-03-01  4.7339
            EXCHUS
DATE              
2023-06-01  7.1614
2023-07-01  7.1863
2023-08-01  7.2486
In [8]:
# The basic plotting command. The 'b--' means 'make the line blue and dashed.'
plt.plot(exchus.index, exchus['EXCHUS'], 'b--')
plt.xlabel('date')                  # Label the axes
plt.ylabel('yuan per dollar')
plt.show()

That worked pretty well! We could continue to customize the plot by adding data markers, changing colors, adding legends, etc. We will leave that stuff for later.
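
As a small preview of that kind of customization, here is a minimal sketch that adds data markers, changes the color, and adds a title and legend. To keep it runnable without a call to FRED, it uses five monthly EXCHUS values copied from the output printed in these slides; the title and legend text are my own choices for illustration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Five monthly observations copied from the EXCHUS output shown in these slides,
# so the example runs without downloading anything from FRED.
exchus = pd.DataFrame(
    {"EXCHUS": [6.8876, 6.9854, 7.1614, 7.1863, 7.2486]},
    index=pd.to_datetime(
        ["2023-04-01", "2023-05-01", "2023-06-01", "2023-07-01", "2023-08-01"]
    ),
)

fig, ax = plt.subplots(figsize=(8, 4))

# Solid black line with a circle marker at each observation.
ax.plot(exchus.index, exchus["EXCHUS"],
        color="black", linestyle="-", marker="o", label="China (EXCHUS)")

ax.set_xlabel("date")
ax.set_ylabel("yuan per dollar")
ax.set_title("Yuan-dollar exchange rate")   # illustrative title
ax.legend(frameon=False)                    # legend text comes from label=

# In a notebook the figure displays on its own; in a script, call plt.show().
```

The keyword arguments (color, linestyle, marker) do the same job as the compact 'b--' format string, but are easier to read once there are several options.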

We could have made that figure in Excel. Did Python buy us much? In this case, maybe not, although I would argue that

exchus = web.DataReader('EXCHUS', 'fred', start)

is easier than going to FRED, downloading the data, getting set up in a workbook and then plotting.

Automating work in Python¶

What if we need to make several plots? What if we need to update the plot every day?

Python requires just a few extra lines of code. Again, don't worry about the details; we are just taking a quick look at what Python can do for us. All of this will make more sense later.

In [9]:
# Make list with the names of the data we would like to plot.
ctry_list = ['EXCHUS', 'EXJPUS', 'EXCAUS', 'EXUSEU']      # China, Japan, Canada, Euro

# Make a list of the units for the y axis.
units = ['yuan per USD', 'yen per USD', 'CAD per USD', 'USD per Euro']

# Read the data. Pass a list of codes rather than a single string.
ex_many = web.DataReader(ctry_list, 'fred', start)
print(ex_many)
            EXCHUS    EXJPUS  EXCAUS  EXUSEU
DATE                                        
1990-01-01  4.7339  144.9819  1.1720     NaN
1990-02-01  4.7339  145.6932  1.1965     NaN
1990-03-01  4.7339  153.3082  1.1800     NaN
1990-04-01  4.7339  158.4586  1.1641     NaN
1990-05-01  4.7339  154.0441  1.1747     NaN
...            ...       ...     ...     ...
2023-04-01  6.8876  133.4745  1.3484  1.0962
2023-05-01  6.9854  137.0532  1.3517  1.0867
2023-06-01  7.1614  141.3581  1.3286  1.0840
2023-07-01  7.1863  140.9360  1.3211  1.1067
2023-08-01  7.2486  144.7804  1.3478  1.0910

[404 rows x 4 columns]
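
The NaN entries in the EXUSEU column mark missing observations: in the output above, that series has no values in 1990. A minimal sketch of how one might count missing values and find where each series begins. The small DataFrame `ex_small` is hypothetical; it holds four rows copied from the output above so the example runs without calling FRED.

```python
import numpy as np
import pandas as pd

# Four rows copied from the printed output (two from 1990, two from 2023).
ex_small = pd.DataFrame(
    {
        "EXCHUS": [4.7339, 4.7339, 7.1863, 7.2486],
        "EXUSEU": [np.nan, np.nan, 1.1067, 1.0910],
    },
    index=pd.to_datetime(["1990-01-01", "1990-02-01", "2023-07-01", "2023-08-01"]),
)

# Count the missing observations in each column.
print(ex_small.isna().sum())

# Find the first date with a non-missing value in each column.
first_obs = {col: ex_small[col].first_valid_index() for col in ex_small.columns}
print(first_obs)

# Drop the missing values from a single series before summarizing it.
euro = ex_small["EXUSEU"].dropna()
print(euro.mean())
```

On the full dataset, the same calls would report the first month for which each FRED series has data.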

To plot, we loop over the list of variable names (and the units) and plot them.

In [10]:
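# Create a figure with a 2-by-2 grid of axes.
fig, ax = plt.subplots(2, 2, figsize=(15, 6))

# Pair each series code with its y-axis label and one of the four axes.
for ctry, unit, axi in zip(ctry_list, units, fig.axes):
    axi.plot(ex_many.index, ex_many[ctry], 'r-')    # 'r-' means a solid red line
    axi.set_ylabel(unit)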
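
One way to make the loop above reusable, for example to redraw the figure whenever the data are updated, is to wrap it in a function. This is a minimal sketch, not the only way to do it: the function name `plot_exchange_rates` is hypothetical, and the data are the last three rows of the printed output above rather than a fresh FRED download.

```python
import pandas as pd
import matplotlib.pyplot as plt


def plot_exchange_rates(data, units, nrows=2, ncols=2):
    """Plot each column of `data` in its own panel and return the figure."""
    # squeeze=False makes `axes` a 2-D array even for a single panel.
    fig, axes = plt.subplots(nrows, ncols, figsize=(15, 6), squeeze=False)
    for col, unit, axi in zip(data.columns, units, axes.flat):
        axi.plot(data.index, data[col], 'r-')
        axi.set_ylabel(unit)
        axi.set_title(col)
    return fig


# The last three rows of the output printed above.
ex_many = pd.DataFrame(
    {
        "EXCHUS": [7.1614, 7.1863, 7.2486],
        "EXJPUS": [141.3581, 140.9360, 144.7804],
        "EXCAUS": [1.3286, 1.3211, 1.3478],
        "EXUSEU": [1.0840, 1.1067, 1.0910],
    },
    index=pd.to_datetime(["2023-06-01", "2023-07-01", "2023-08-01"]),
)
units = ['yuan per USD', 'yen per USD', 'CAD per USD', 'USD per Euro']

fig = plot_exchange_rates(ex_many, units)
```

With a function like this, updating the figure is one call after re-reading the data, which is the kind of automation that is tedious to reproduce by hand in a spreadsheet.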

Broad outline¶

  • Weeks 0 to 2: Getting up to speed with Python
  • Weeks 3 to 5: Data management and visualization basics
  • Weeks 7 to 9: Advanced data management
  • Weeks 10 to 14: Advanced topics (regular expressions, maps, web scraping, text analysis, interactive figures)

Course information¶

  • This information is in the syllabus (which is on the course webpage)
  • Ask questions as we go

Expectations¶

  • Prerequisites
    • ECON 310 and ECON 301
    • NOT a prerequisite: coding experience
  • Attendance
    • No attendance grade, but you really should be here
    • Work through the material each week
    • It is difficult to "cram" computer programming

Course materials¶

  • Reference text: Python for Data Analysis by Wes McKinney (3rd ed)
    • This is more of a reference than a textbook
    • Free online version is here: wesmckinney.com/book
    • $35 on Amazon.com; ebook versions available
  • Besides the text: Lots (and lots and lots) of free online guides, tutorials, and references
  • Course webpage: badgerdata.org/pages/econ-570/
    • Everything you need is here
    • Week-by-week schedule
    • Links to data and resources
    • Constantly being improved

Canvas¶

  • Used for assignments and announcements (at students' request)

  • I am pretty new to Canvas

  • Did you receive an announcement from me?

Grades¶

Deliverable                         Weight in final grade
Student survey and Winstat logon    1%
Best four coding practices          4%
In-class exam 1                     20%
In-class exam 2                     30%
Project roster                      1%
Project proposal                    9%
Project                             35%

  • Coding practice: Five assignments meant to help you practice coding. Graded as: check, check+, or check-. Check or check+ earns full 1%. Check- earns 0%. Lowest score dropped. No late assignments accepted.
  • In-class exams: Open book and open internet, but no AI, no texting, etc. Meant to test basic coding skills that should be routine. I will circulate a list of topics before the exam.
  • Project: This is a chance to develop a piece of data analysis that showcases what you have learned in class.
    • Work in groups of up to three students.
    • Two final deliverables
      1. A 3-page professional report that presents your analysis to someone who is interested in the results, but not the technical details
      2. A well-documented Jupyter notebook with the technical details
    • Intermediate deliverable
      1. Project proposal: A clear statement of the question, a brief discussion of the data, and one relevant figure or statistic that shows your team has the data loaded and ready for analysis.
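
The grading arithmetic above can be sketched in a few lines of Python. This is an illustration of the stated rules only, not an official grade calculator: the dictionary keys and the function name `coding_practice_points` are hypothetical, and the example marks are made up.

```python
# Weights from the grade table, in percentage points of the final grade.
weights = {
    "survey_and_logon": 1,
    "coding_practices": 4,
    "exam_1": 20,
    "exam_2": 30,
    "project_roster": 1,
    "project_proposal": 9,
    "project": 35,
}
print(sum(weights.values()))   # the weights should add up to 100


def coding_practice_points(marks):
    """Points earned from coding practices.

    Each check or check+ earns 1 point, each check- earns 0,
    and only the best four marks count (the lowest is dropped).
    """
    earned = [1 if mark in ("check", "check+") else 0 for mark in marks]
    return sum(sorted(earned, reverse=True)[:4])


# Made-up example: one check- among five assignments is dropped.
print(coding_practice_points(["check+", "check", "check-", "check", "check"]))
```

Because the lowest score is dropped, a single check- costs nothing; a second one costs one percentage point.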

Important dates¶

  • Survey and Winstat logon due: September 8
  • Coding practices due: September 29, October 13, November 3, November 17, December 8
  • In-class exams: October 16, November 20
  • Project proposal due: November 10
  • Project due: December 15

Need some help?¶

  • See me: T 9:00AM-10:00AM & T 3:30PM-4:30PM | 7444 Soc Sci
  • Send me email: ruhl2@wisc.edu
  • See TA Pandita: M 3:00PM-4:00PM | 6413 Soc Sci
  • spandita@wisc.edu
  • See TA Valdes Bobes: Th 3:00PM-4:00PM | Soc Sci
  • valdsbobes@wisc.edu

Technical details¶

  • We will be working on Winstat
    • Windows servers that are pre-loaded (and managed!) by the Social Sciences Computing Cooperative (SSCC)
    • If you do not have a login, send me an email today (ruhl2@wisc.edu).
  • You are welcome to work on your own Python installation rather than Winstat
    • You are your own tech support!
    • We will be working in Python 3.11.3
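
If you work on your own installation, a quick check along these lines can confirm which Python version you are running and whether the packages imported in the example earlier in these slides are installed. This is a minimal sketch; the package list is taken from that example's import statements and may not cover everything used later in the course.

```python
import sys
from importlib.util import find_spec

# Report the version of the running Python interpreter, e.g. 3.11.3.
print("Python version:", sys.version.split()[0])

# Packages imported in the exchange rate example.
packages = ["pandas", "matplotlib", "pandas_datareader"]

# find_spec returns None when a top-level package cannot be found.
missing = [name for name in packages if find_spec(name) is None]
print("Missing packages:", missing if missing else "none")
```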