Economics + Data


Python basics: Part 3

files needed = none

Before we can start working with data, we need to work out some of the basics of Python. The goal is to learn enough so that we can do some interesting data work—we do not need to be Python Jedi.

We now know about the basic data structures in python, how types work, and how to do some basic computation and string manipulation. What we need next is flow control.

A python program is a list of statements. The python interpreter reads those statements from top to bottom and executes them. Depending on what is happening in our program, we may want to skip some statements, or repeat some other statements. Flow control statements manage this process.

In this notebook we will cover

The bool type
The if/else statement
The for loop
List comprehensions

Remember: Ask questions as we go.

Bool

Flow control often requires asking whether a statement is true or false and then taking an action conditional on the answer. For example: Is this variable a string? If yes, convert to a float. If not, do nothing.

The python type bool can take on two values: True or False. Let's see it in action.

my_age = 41               

# Here we compare my_age to see if it is less than 18
is_a_minor = my_age < 18   

print(is_a_minor)
print(type(is_a_minor))

The comparison operators we will often use are

< (less than)
> (greater than)
<= (less than or equal to)
>= (greater than or equal to)
== (equal)
!= (not equal)

Important: We use a double equal sign == to check for equality and a single equal sign for assignment.

# A bit of code to see if the variable 'year' is equal to the current year
year = 2019
current_year = 2020
is_current_year = (current_year == year)  # the parentheses are not needed, but I like them for clarity
print(is_current_year)

Go back and set year to 2020. What happened?

Note the capitalization: True and False. These are keywords in python. You cannot use them as variable names, although you could use true, TRue or FALSE. I would not recommend you do so.
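If you are ever unsure whether a name is reserved, one option is to ask python directly. This is a small sketch using the keyword module from the standard library; the variable names are just for illustration.

```python
import keyword

# iskeyword() reports whether a string is a reserved word in python
true_is_reserved = keyword.iskeyword('True')
lower_true_is_reserved = keyword.iskeyword('true')
upper_false_is_reserved = keyword.iskeyword('FALSE')

print('True is a keyword:', true_is_reserved)
print('true is a keyword:', lower_true_is_reserved)
print('FALSE is a keyword:', upper_false_is_reserved)
```

Only the exact capitalization True is reserved, which is why the other spellings are legal (but confusing) variable names.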

The True and False values are also mapped to integers. Boolean variables only take on the value 0 (False) or 1 (True).

# These directly create Boolean variables
is_true = True
is_false = False

# They have numerical value
print("Is True equal to 1?", is_true == 1)         # This statement evaluates the question: "Is True equivalent to 1?"
print("Is False equal to 0?", is_false == 0)        # Same thing here: "Is False equal to 0?"
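Because True and False behave like 1 and 0, adding up Boolean variables counts how many of them are true. Here is a small sketch; the ages are made up for illustration.

```python
# Three comparisons, each stored as a bool
is_minor_1 = 12 < 18
is_minor_2 = 41 < 18
is_minor_3 = 17 < 18

# Adding bools treats True as 1 and False as 0
n_minors = is_minor_1 + is_minor_2 + is_minor_3

print('Number of minors:', n_minors)
print(type(n_minors))
```

This trick will come in handy later when we want to count how many observations meet a condition.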

More complicated comparisons

We can build more complicated expressions using and and or. For and all the sub-comparisons need to evaluate to True for the whole comparison to be True. For or only one of the sub-comparisons needs to be true for the whole comparison to be true.

If you have multiple and/or statements on the same line, remember to use parentheses to properly separate them from each other. This ensures the computer recognizes the logical statements in the way you want it to.

x = (2 < 3) and (1 > 0)      # Both sub-comparisons are true
print('Is 2<3 and 1>0?', x)

y = (2 < 3) and (1 < 0)      # Only one sub-comparison is true
print('Is 2<3 and 1<0?', y)

z = (2 < 3) or (1 < 0)       # Only one sub-comparison is true
print('Is 2<3 or 1<0?', z)
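To see why the parentheses matter, here is a small sketch that mixes and with or. When there are no parentheses, python evaluates and before or, which may not be what we intended.

```python
# Without parentheses, 'and' is evaluated before 'or'
a = True or False and False      # read as: True or (False and False)

# Parentheses force the 'or' to be evaluated first
b = (True or False) and False

print('Without parentheses:', a)
print('With parentheses:', b)
```

The two lines look almost the same but give different answers, so it is safer to always write the parentheses.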

Comparing strings

Given the nature of data, we might need to compare strings. Remember, programming languages are picky...

state = 'Wisconsin'

is_sconnie = ('wisconsin' == state)
print(is_sconnie)

Case matters. Luckily, python has lots of features to manipulate strings. We will learn some of these as we go along. In this case we use the lower() method of the string class to make the string all lower case. Methods are a way of calling some functions in Python. We tack them on to the end of a variable here to apply the function to that variable.

We are introducing the 'dot' notation without really explaining it yet, but that explanation is coming.

state_lowcase = state.lower()  # We are applying the lower() method to the variable state
print('state, after being lowered:', state_lowcase)

is_sconnie = 'wisconsin' == state_lowcase  
print(state_lowcase, is_sconnie)

# You don't have to store the lowered string separately
is_sconnie = 'wisconsin' == state.lower()  
print(state.lower(), is_sconnie)

Conditional statements

A conditional statement checks a condition. If the condition is true, it runs one block of code. If the condition is false, it runs another block of code.

Important: Earlier, I mentioned that white space doesn't matter around operators like + or * and that we can insert blank lines wherever we want. Here comes a caveat: When we form a conditional, we need to indent the lines following the condition statement, and every line within a branch must be indented by the same amount. Four spaces is the standard convention, and it is what we will use. The indents define the lines of code that are executed in each branch of the if statement.

quantity = 5

if quantity > 0: 
    # this indented code is the 'if branch'
    print('This print statement occurred because the statement is true.')  
    print('The quantity is positive.')
    temp = quantity + 5
    print('When I add 5 to the quantity it is:', temp, '\n')

else:
    # this indented code is the 'else branch'
    print('This print statement occurred because the statement is false.') 
    print('The quantity is not positive.\n')


print('This un-indented code runs no matter what.')

Now go back to the code and change quantity to 0, or -10 and run the cell. What happens?
Now go back to the code and change the indentation of the first print statement after if quantity > 0: to be two spaces. Run the cell. What happened?

# The else is optional. 

size = 'md'

if (size == 'sm') or (size == 'md') or (size == 'lg'):
    print('A standard size was requested.\n')

print('This un-indented code runs no matter what.')

Change size to 'xxl'. Run the cell.

Practice: Conditionals

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.

Edit this markdown cell and write True, False, or error next to each statement.
1 > 2 (False)
'bill' = 'Bill' (False...or is it?)
(1 > 2) or (2*10 < 100) (True, the second expression is true.)
'Dennis' == 'Dennis' (True)

x = 2
0 < x < 5

(True, two comparisons at once!)

x = 0.10
y = 1/10
x == y

(True? This depends on your computer.)

Before you run the code cell below: do you think it will be true or false?
Run the code cell.

x = 1/3
y = 0.3333    # This is an approximation of 1/3
print(x == y)

In the previous cell, add a few more 3s to the end of the definition of y so you get a better approximation of x. Can you get x==y to be true?

Representing a floating point number that does not have a base-2 fractional representation is a problem in all programming languages. It is a limitation of the computer hardware itself. The python documentation has a nice discussion. https://docs.python.org/3.7/tutorial/floatingpoint.html

This will not likely be an issue for us (although it could crop up) but it is a big deal in numerical computing.
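If it does crop up, one common workaround is to ask whether two floats are 'close enough' rather than exactly equal. Here is a minimal sketch using math.isclose() from the standard library; the numbers are just an illustration.

```python
import math

x = 0.1 + 0.2
y = 0.3

# Exact comparison is tripped up by the tiny representation error
exact = (x == y)

# isclose() compares within a small tolerance instead
close = math.isclose(x, y)

print('Exactly equal?', exact)
print('Close enough?', close)
```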

Let's introduce a new function that is built into python: the len() function. This computes the length of an object. In the code cell below, try print(len('hello world'))
In the cell below, write some code (use an if statement) that compares two strings. Print out the longer string in all lower case letters and print out the shorter string in all upper case letters. [Hint: the companion to .lower() is .upper()].

Test your code with these two strings.

string1 = 'MemoriaL'
string2 = 'unIon'

The output should be 'memorial' and 'UNION'

The for loop

The conditional statement allows us to selectively run parts of our program. Loops allow us to re-run parts of our code several times, perhaps making some changes each time the code is run. There are several types of loops. The for loop runs a block of code 'for' a fixed number of times.

Here is a basic example.

# loop three times and print out the value of 'i'

for i in range(3):       # The counter variable 'i' can be named anything. 
    print('i =', i )

# Loops don't need to use the counter variable if you just want to repeat
# an action a set number of times

for j in range(3):
    print("Hello!")

Important: Notice the 4-space indent again. In general, the colon tells us that the indented lines below 'belong' to the line of code with the colon.

Ranges

The function range() creates a sequence of whole numbers. With a single argument, it starts at zero, but it can do more. Examples:

range(3) returns 0, 1, 2
range(2,7) returns 2, 3, 4, 5, 6
range(0, 10, 2) returns 0, 2, 4, 6, 8 [the third argument is the 'step' size]

Change the code above to try out these ranges.

# A range is a python type, like a float or a str
my_range = range(5)
print(type(my_range))

# What happens if I print the range?
print(my_range)

That last print out might not be what you expected. If you want to see the sequence, convert it to a list first.

# Remember what list() does?
print(list(my_range))
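The same trick lets us check the example ranges from earlier. The variable names here are only for illustration.

```python
# One argument: start at 0 and stop before 3
one_arg = list(range(3))

# Two arguments: start at 2 and stop before 7
two_args = list(range(2, 7))

# Three arguments: start at 0, stop before 10, step by 2
three_args = list(range(0, 10, 2))

print(one_arg)
print(two_args)
print(three_args)
```

Notice that the stopping value is never included in the sequence.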

Looping over collections (lists, strings, etc)

In languages like C, a for loop typically steps a counter through a range of values. Python gives us a very easy way to loop over many kinds of things. Here, we loop over a string.

street = 'hello @yyy'

# The variable 'char' could be named anything. 
for char in street:
    print(char)

The same syntax works for looping over a list, too. Remember that, since lists are ordered, we can refer to an element in a list by its index. The code

x = [1, 20, 15]
print(x[2])

will return the value 15.

Here are two ways to loop through a list.

var_names = ['GDP', 'POP', 'INVEST', 'EXPORTS']

# Here is a clunky, C-style way to do this
print('The old-school way:')
for i in range(4):       # i = 0, 1, 2, 3
    print(var_names[i])


# The python way
print('\nThe python way:')
for var in var_names:     # Again, 'var' can be named anything
    print(var)

Wow.

Ranges, lists, and strings are all 'iterable objects'. An iterable object is an object that knows how to return the 'next' element within it. When we iterate over a list, each time the for loop 'asks' for the next element in the list, the list knows how to answer with the next element.

Ranges iterate over whole numbers
Lists iterate over the elements of the list
Strings iterate over the characters
Dicts iterate over the keys
and more...

Iterators are used in places besides loops, too. We will see other uses as we go on. Powerful stuff.
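Here is a small sketch of both ideas: looping over a dict, and asking an iterable for its 'next' element by hand using the built-in functions iter() and next(). The dict and its numbers are made up for illustration.

```python
# A made-up dict: the keys are country codes, the values are arbitrary numbers
gdp = {'USA': 100, 'CAN': 10, 'MEX': 8}

# Looping over a dict hands us the keys, one at a time
for country in gdp:
    print(country, gdp[country])

# This is roughly what a for loop does behind the scenes
letters = iter('abc')       # ask the string for an iterator
first = next(letters)       # ask for the next element
second = next(letters)      # and the next one
print(first, second)
```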

Get the index and the value at the same time

When we looped over the list var_names, our first loop used the index and the second just used the values. We can ask an iterable object for both the index and value at the same time with the enumerate() function.

enumerate() takes an iterable object as an argument and, for each element, returns a tuple of the index and the value.

# 'i' will hold the index and 'var' will hold the value in var_names[i]

for i, var in enumerate(var_names):     
    print('The index is', i, 'and the value is', var)
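To see the tuples that enumerate() produces, we can convert its output to a list. The sketch below also uses the optional start argument, which changes the first index from 0 to something else.

```python
var_names = ['GDP', 'POP', 'INVEST', 'EXPORTS']

# Each element is a tuple: (index, value)
pairs = list(enumerate(var_names))
print(pairs)

# start=1 begins counting at one rather than zero
for i, var in enumerate(var_names, start=1):
    print('Variable number', i, 'is', var)
```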

Practice: Loops

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. The TA and I are here, too.

Remember this example from earlier?

1. We have 5 integer observations in our dataset: 1, 3, 8, 3, 9. Unfortunately, the data file ran all the observations together and we are left with the variable raw_data in the cell below.
2. What type is raw_data?
3. Turn raw_data into a list.

raw_data = '13839'
list_data = list(raw_data)
list_data

Is your data ready to be analyzed? Why not?

In the cell below, convert your list to a list of integers. ~~You might try repeating statements like list_data[0]=int(list_data[0])~~ Put a loop to work! (We will see even better ways to do this soon...)

[Hint: You might want to start with an empty list and use the .append() method to add elements to it.]
Loop through the following list: commands = ['go', 'go', 'go', 'stop', 'go', 'go'] If the command is 'go' print out the word 'Green'. If the command is 'stop' print out the word 'Red'.

List comprehensions

List comprehensions provide a very compact syntax to do loops over lists (or other iterable objects). Anything you can do with a list comprehension you can do with a for loop. In this sense, we don't really need to know this, but python programmers love list comprehensions, so you will see them in other people's code. Plus, it's a cool skill to have.

[Programmers call this kind of thing syntactic sugar. It makes the code 'sweeter' for humans to read, but doesn't add functionality to the language. You might want to casually drop this kind of language around your programmer friends.]

Here is a common problem with data in the wild. We would like to check for certain string values, but we have to be careful about cases. To facilitate comparison, let's make all the strings lower case.

First, the loop way

# Some data
class_rank = ['Senior', 'senior', 'Freshman', 'sophomore', 'senior', 'Junior']

# Create a new list with all lower case entries
class_rank_cleaned = []                             # creates an empty list
for datum in class_rank:
    class_rank_cleaned.append(datum.lower())        # append() adds an element to the end of a list

print(class_rank_cleaned)

Not bad. We now have cleaned up data and can do comparisons without worrying about case.

Now, let's roll out a list comprehension.

# Some data
class_rank = ['Senior', 'senior', 'Freshman', 'sophomore', 'senior', 'Junior']

# 'elem' could be named anything. It is the loop variable.
class_rank_cleaned_lc = [elem.lower() for elem in class_rank]     

print(class_rank_cleaned_lc)

Very clean. Very easy. Let's break down the list comprehension.

class_rank_cleaned_lc = [elem.lower() for elem in class_rank]

The square brackets [ ] are creating a new list, just like we have done in the past
The code on the left-hand side of for is the operation we want performed on each element of the list
The for elem in class_rank is the for loop syntax, like we have used before.

Let's try another. Before you run the cell, what do you think this code does?

# What does this code do?
sq = [item**2 for item in range(3)]
print(sq)

# What about this code? What does it do?
class_rank_len = [len(elem) for elem in class_rank]
print(class_rank_len)

We can apply a conditional statement so that we only perform an operation on certain elements.

# Seniors rule!
class_rank_caps = [i.upper() for i in class_rank_cleaned if i=='senior']
print(class_rank_caps)
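For comparison, here is a sketch of the same operation written as a for loop with an if statement inside it. The list is re-created here so the cell runs on its own.

```python
class_rank_cleaned = ['senior', 'senior', 'freshman', 'sophomore', 'senior', 'junior']

class_rank_caps_loop = []
for i in class_rank_cleaned:
    if i == 'senior':                            # the condition from the comprehension
        class_rank_caps_loop.append(i.upper())   # the operation from the comprehension

print(class_rank_caps_loop)
```

The elements that fail the condition are skipped entirely, so the new list is shorter than the original.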

Practice: List comprehensions

Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. The TA and I are here, too.

Here is a list of interest rates: r = [0.01, 0.01, 0.015, 0.02, 0.022] Multiply each of them by 100 to make them percentage interest rates. What happened to 0.022?
Here we go again! Turn raw_data = '13839' into a list of integers. Use a list comprehension. (We've come a long way!)

Try at home (or if you finish early)

A bit harder: create two lists derived from the following list: data_list = [1, 2, 3, 4, 5, 6, 7, 8] One list should have only the odd numbers and the other list should have only the even numbers. You might use the modulo operator % which yields the remainder from the division of the first argument by the second. For example 3%2 = 1.
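If the modulo operator is new to you, here is a quick sketch of how it behaves. The numbers are arbitrary and the variable names are only for illustration.

```python
# The remainder after dividing the first number by the second
rem_odd = 3 % 2      # 3 = 1*2 + 1
rem_even = 8 % 2     # 8 = 4*2 + 0
rem_other = 7 % 3    # 7 = 2*3 + 1

print(rem_odd, rem_even, rem_other)

# A remainder of zero means the first number divides evenly by the second
print(8 % 2 == 0)
```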