Fall 2023 wrap up¶

Today's agenda:

  1. Python on your laptop
  2. Set up GitHub to host your work
  3. Reflect on class, discuss where to go from here

Reminders¶

Be sure to include your group member names on the first page of your report.

  1. Submit your project files by 11:59 PM on Friday, December 15, 2023. Follow the instructions for submission carefully.
  2. Submit your course evaluations (please!) by December 13.

Python on your laptop¶

We have been lucky to have the SSCC and Winstat to use in class. Soon you will be moving on from this class, and from the University, so it makes sense to set up a local python installation.

On Winstat, we have been working with the Anaconda distribution. Recall from earlier in the semester that python provides a standard set of functions and data types, which can be extended by installing packages. Anaconda is a distribution: a bundle of python and a set of packages that are useful for data analysis and numerical computing. A list of the packages is here. This includes many of the packages we have been using: numpy, pandas, matplotlib, seaborn, scikit-learn...

To install Anaconda on your laptops:

  1. Go to https://www.anaconda.com/download/ and click on the download button.
  2. Follow the prompts. FOR PC USERS: When asked, do not check the box to 'add anaconda to the path.'
  3. Launch Jupyter Notebook, either from Anaconda Navigator or by running jupyter notebook at a terminal (the Anaconda Prompt on Windows).
  4. If all goes well, a web browser will open with Jupyter showing the contents of your user directory.
  5. Open and run one of your notebooks to test that everything is up and running.
  6. You will need to reinstall any packages you installed on Winstat (e.g., pandas-datareader).

If step 5 worked, then you are ready to code. I recommend that you move your files from Winstat to your local machine at some point so you have a local copy.
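Below is a minimal sanity check for your new installation. Run it in a fresh notebook cell; it assumes the default Anaconda bundle, which includes the packages we used in class.

    # Check that the packages we used in class are available, and print
    # their versions.
    import sys
    import numpy as np
    import pandas as pd
    import matplotlib
    import seaborn as sns
    import sklearn

    print('python:      ', sys.version.split()[0])
    print('numpy:       ', np.__version__)
    print('pandas:      ', pd.__version__)
    print('matplotlib:  ', matplotlib.__version__)
    print('seaborn:     ', sns.__version__)
    print('scikit-learn:', sklearn.__version__)

    # To reinstall a package from inside a notebook (step 6 above), run,
    # e.g., %pip install pandas-datareader in its own cell.

If an import fails, that package did not come with your installation; install it as in step 6.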

Set up GitHub¶

GitHub is a web-based service that hosts (stores) files. It is built on Git, a powerful version control system, and adds features like bug tracking, wikis, etc.

There are a lot of useful things you can do with GitHub — most of which are outside the scope of our class. If you continue on and venture deeper into developing code (particularly if you are doing it with others) you will want to learn more about these features.

For our purpose, we want to take advantage of GitHub's ability to host jupyter notebooks. A jupyter notebook is just a text file in JSON format. (Try opening one in Notepad.) The jupyter notebook software interprets the text file and renders what we see on the screen. When you upload your notebooks to GitHub, they are rendered for others to see when they visit your GitHub repository. You cannot run a notebook on GitHub, but you can view it.
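To see this for yourself, here is a small sketch that treats a notebook as the JSON text file it is. The file name my_notebook.ipynb is a placeholder; substitute one of your own notebooks (with at least one cell) in the working directory.

    # A notebook is JSON-formatted text: load it with the standard json
    # module and poke around its structure.
    import json

    with open('my_notebook.ipynb') as f:
        nb = json.load(f)

    print('nbformat version:', nb['nbformat'])
    print('number of cells: ', len(nb['cells']))
    print('first cell type: ', nb['cells'][0]['cell_type'])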

Setting up GitHub:

  1. Go to https://github.com/ and create an account. If you already have a GitHub account, sign in.
  2. Once you are signed in, proceed to the repositories page. A repository is like a folder.
  3. Create a new repository. Name it something like 'Kim-Ruhl's-Portfolio'. Make it public and check the box to initialize with a readme file. [You can delete this repository later.]
  4. Edit the readme file to describe your repository. You can use markdown here. 'Commit' the changes.
  5. Upload an ipynb file to your repository. View the file!

You can create as many repositories as you like.

Create a portfolio¶

You now have a place you can upload your project and other work. Linking to the notebook on GitHub is an easy way to share your accomplishments with others: graduate schools, potential employers, your mom and dad...

Where we have been...¶

Let's take a minute to reflect on what you knew when we first met on September 6. On that day, we looked at some visualizations. How does your approach to these figures today differ from your approach three months ago?

Voting in Wisconsin¶

Go to https://www.cnn.com/election/2020/results/state/wisconsin/president and focus on the presidential results.

Inflation in the OECD¶

Go to https://data.oecd.org/price/inflation-cpi.htm

Skills we have developed¶

It's always a good idea to take a moment at the end of a class or project to inventory what we have done and what skills we have learned. You will often be asked to write summaries like this for resumes and reports, and writing the summary while it is all fresh in your head is easier.

Skills acquired

  1. Basic python programming: data types, loops, conditionals, functions
  2. Pandas proficiency: reading and saving data, working with DataFrames and Series (slicing, subsetting), indexing, using web APIs
  3. Data cleaning and preparation: dealing with missing values, applying transformations to raw data, working with dates and times
  4. Data 'wrangling': reshaping data, merging data sets
  5. Basic analysis: summarizing data, using groupby to efficiently analyze subsets
  6. Formal analysis: linear regression, classification problems, natural language processing (NLP)
  7. Visualization: bar plots, time-series plots, line plots, scatter plots, maps, interactive figures ...
  8. Visualization principles: characteristics of a good visualization, graphical excellence

You are probably more comfortable with some of these skills than with others. You can always go back and look through the jupyter notebooks to brush up.
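If you want a quick warm-up, here is a tiny sketch (with made-up data) that combines a few of the skills above: building a DataFrame, handling a missing value, and summarizing with groupby.

    # Made-up data: sales by state and year, with one missing value.
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'state': ['WI', 'WI', 'MN', 'MN', 'IL'],
        'year':  [2022, 2023, 2022, 2023, 2023],
        'sales': [10.0, 12.5, np.nan, 9.0, 14.2],
    })

    # Fill the missing value with the overall mean, then average by state.
    df['sales'] = df['sales'].fillna(df['sales'].mean())
    print(df.groupby('state')['sales'].mean())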

Data we have explored¶

Having experience with commonly used datasets is valuable. Some datasets we have worked with this semester include:

  1. FRED. A repository of economic data hosted by the St. Louis Federal Reserve Bank. (See the example after this list.)
  2. Census shape files. Census tracts, counties, and states.
  3. Zillow. Housing market data.
  4. Airline origin and destination survey (DB1B). Ticket prices and passenger itineraries.
  5. College scorecard. University level data about students and outcomes.
  6. The American Time Use Survey (ATUS). Data on how Americans spend their time.
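As a reminder of how easy FRED is to work with, here is a sketch that pulls the headline CPI series (FRED code CPIAUCSL) using the pandas-datareader package mentioned above. Install the package first if you have not already.

    # Download a FRED series into a pandas DataFrame.
    import datetime as dt
    from pandas_datareader import data as pdr

    start = dt.datetime(2015, 1, 1)
    end = dt.datetime(2023, 12, 1)

    cpi = pdr.DataReader('CPIAUCSL', 'fred', start, end)   # headline CPI
    print(cpi.tail())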

Links to these datasets can be found at http://badgerdata.org/pages/data-sets.

Many of you worked with and learned about other datasets for your projects. Add those datasets to the list above!

Where to go from here¶

This class has (hopefully) provided you with a set of skills and a window into many different things you can do with a computer, python, and some data. Suppose you thought this course was fun, or useful, or both. Where could you go from here? The sky is the limit. You could spend the rest of your life learning about this kind of stuff, but here are some thoughts.

1. More coding¶

We developed a pretty good understanding of how to use python. We did not spend much time learning how to 'code': deeply understanding object-oriented programming, structuring complex code, coding as part of a team, optimizing for performance.

To be honest, most data analysis does not require you to be an expert coder. Being a better programmer, though, is helpful. If you would like to learn more about writing code, a programming course could be a good idea. You could do this here at UW, at Madison College, or as a self-guided program online or from a book.

Some benefits: Being a better coder means that you will likely write more efficient code, which becomes more important as the datasets get larger. You will also become better at writing reusable code, so that you save time and effort. It is also worth noting that once you have a good sense of how to program, learning a new language (R, Java, Ruby...) is much easier.

2. More analytic tools¶

We spent a lot of time learning how to wrangle data. Real-world data are messy and must be beaten into shape before we can do anything useful with them. Our analytic tools were mostly lifted from the econometrics courses you have already taken; we learned how to implement them in python.
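As a reminder of the flavor of that work, here is a sketch of a simple linear regression with scikit-learn, fit to simulated data with a known intercept and slope.

    # Fit y = a + b*x by least squares on simulated data.
    import numpy as np
    from sklearn.linear_model import LinearRegression

    rng = np.random.default_rng(0)
    x = rng.uniform(0, 10, size=(100, 1))
    y = 2.0 + 1.5 * x[:, 0] + rng.normal(0, 1, size=100)

    model = LinearRegression().fit(x, y)
    print('intercept:', model.intercept_)   # should be close to 2.0
    print('slope:    ', model.coef_[0])     # should be close to 1.5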

If analytic tools interest you, think about picking up some more of them. The economics department offers several quantitative courses. You can learn more about them here. The Internet is full of data-analytic tutorials. Panel data? Simulation-based econometrics? Financial statistics? Think about what kinds of data you want to work with and what kinds of questions you want to ask. Then go find the tools.

Some benefits: Having a broad set of skills will allow you to tackle a wide variety of questions. If you want to become an expert in a particular field, drill down into the techniques that are most applicable.

3. More data tools¶

Our focus was on python, pandas, and matplotlib/seaborn/geopandas. Data are often locked away in databases that require structured query language (SQL) interfaces. Learning some SQL will let you get at that data, even if only to download it and load it into pandas.
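Here is a minimal sketch of the SQL-to-pandas workflow, using Python's built-in sqlite3 module and an in-memory database filled with made-up data.

    # Build a throwaway SQLite database, then query it straight into pandas.
    import sqlite3
    import pandas as pd

    con = sqlite3.connect(':memory:')
    con.execute('CREATE TABLE prices (ticker TEXT, price REAL)')
    con.executemany('INSERT INTO prices VALUES (?, ?)',
                    [('AAA', 10.0), ('BBB', 20.0), ('AAA', 11.0)])

    query = 'SELECT ticker, AVG(price) AS avg_price FROM prices GROUP BY ticker'
    df = pd.read_sql(query, con)   # the result is a DataFrame
    print(df)
    con.close()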

Learning how to read and write JavaScript Object Notation (JSON) encoded data will make it easier to snag data from the web. You can do this in pandas. Grab a cup of coffee and google 'json pandas tutorial.'
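For example, here is a sketch that parses a JSON string of nested records (made-up data) and flattens it into a DataFrame with pd.json_normalize.

    # Parse JSON text, then flatten the nested 'stats' field into columns.
    import json
    import pandas as pd

    raw = '''[{"name": "WI", "stats": {"pop": 5.9, "area": 65}},
              {"name": "MN", "stats": {"pop": 5.7, "area": 87}}]'''

    records = json.loads(raw)
    df = pd.json_normalize(records)
    print(df)   # columns: name, stats.pop, stats.area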

'Big data' tools like Hadoop are aimed at working with really large datasets distributed across a cluster of computers. This is a pretty advanced skill, but there is no reason to think you couldn't learn how to use it if you put in the effort.

Some benefits: The more types of data you can access, the more you can do.

Thank you!¶

  • We have overcome many difficulties and learned a lot. Be proud of your accomplishments.
  • Keep in touch; it makes me happy to hear from my students.
  • Good luck with all of your future endeavors.