Python basics: Part 3
files needed = none
Before we can start working with data, we need to work out some of the basics of Python. The goal is to learn enough so that we can do some interesting data work—we do not need to be Python Jedi.
We now know about the basic data structures in python, how types work, and how to do some basic computation and string manipulation. What we need next is flow control.
A python program is a list of statements. The python interpreter reads those statements from top to bottom and executes them. Depending on what is happening in our program, we may want to skip some statements or repeat others. Flow control statements manage this process.
In this notebook we will cover
- The bool type
- The if/else statement
- The for loop
- List comprehensions
Remember: Ask questions as we go.
Bool
Flow control often requires asking whether a statement is true or false and then taking an action conditional on the answer. For example: Is this variable a string? If yes, convert to a float. If not, do nothing.
The python type bool can take on two values: True or False. Let's see it in action.
my_age = 41
# Here we compare my_age to see if it is less than 18
is_a_minor = my_age < 18
print(is_a_minor)
print(type(is_a_minor))
False
<class 'bool'>
The comparison operators we will often use are
- < (less than)
- > (greater than)
- <= (less than or equal to)
- >= (greater than or equal to)
- == (equal)
- != (not equal)
Important: We use a double equal sign == to check for equality and a single equal sign = for assignment.
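To see all six operators side by side, here is a small sketch that runs each one on two numbers. The values of a and b are arbitrary choices for illustration; every result is a bool.

```python
# Compare two example numbers with each comparison operator.
a = 3
b = 5
print(a < b)   # True
print(a > b)   # False
print(a <= 3)  # True: 3 is equal to 3
print(a >= b)  # False
print(a == b)  # False
print(a != b)  # True
```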
# A bit of code to see if the variable 'year' is equal to the current year
year = 2019
current_year = 2020
is_current_year = (current_year == year) # the parentheses are not needed, but I like them for clarity
print(is_current_year)
False
Go back and set year to 2020. What happened?
Note the capitalization: True and False. These are keywords in python. You cannot use them as variable names, although you could use true, TRue, or FALSE. I would not recommend you do so.
The True and False values also have integer values: True is equal to 1 and False is equal to 0. (Under the hood, bool is a subtype of int.)
# These directly create Boolean variables
is_true = True
is_false = False
# They have numerical value
print("Is True equal to 1?", is_true == 1) # This statement evaluates the question: "Is True equivalent to 1?"
print("Is False equal to 0?", is_false == 0) # Same thing here: "Is False equal to 0?"
Is True equal to 1? True
Is False equal to 0? True
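Since True and False behave like 1 and 0, we can do arithmetic with them. A minimal sketch; the ages below are made-up example values, not data from this notebook:

```python
# True counts as 1 and False counts as 0 in arithmetic.
print(True + True)               # 2
print(sum([True, False, True]))  # 2

# A handy consequence: summing comparisons counts how many are True.
num_minors = sum([41 < 18, 17 < 18, 25 < 18, 12 < 18])
print(num_minors)                # 2
```

This trick shows up often in data work, where we want to know how many observations meet a condition.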
More complicated comparisons
We can build more complicated expressions using 'and' and 'or'. For 'and', all the sub-comparisons need to evaluate to True for the whole comparison to be True. For 'or', only one of the sub-comparisons needs to be True for the whole comparison to be True.
If you combine several 'and'/'or' comparisons in one expression, use parentheses to group them. This ensures python evaluates the logical statements the way you want it to.
x = (2 < 3) and (1 > 0) # Both sub-comparisons are true
print('Is 2<3 and 1>0?', x)
y = (2 < 3) and (1 < 0) # Only one sub-comparison is true
print('Is 2<3 and 1<0?', y)
z = (2 < 3) or (1 < 0) # Only one sub-comparison is true
print('Is 2<3 or 1<0?', z)
Is 2<3 and 1>0? True
Is 2<3 and 1<0? False
Is 2<3 or 1<0? True
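Why do the parentheses matter? Python evaluates 'and' before 'or', so the same three values can give different answers depending on how they are grouped. A short sketch:

```python
# Without parentheses, 'and' is applied first: True or (False and False)
without_parens = True or False and False
# With parentheses, we force 'or' to be applied first
with_parens = (True or False) and False
print(without_parens)  # True
print(with_parens)     # False
```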
Comparing strings
Given the nature of data, we might need to compare strings. Remember, programming languages are picky...
state = 'Wisconsin'
is_sconnie = ('wisconsin' == state)
print(is_sconnie)
False
Case matters. Luckily, python has lots of features to manipulate strings. We will learn some of these as we go along. In this case we use the lower() method of the string class to make the string all lower case. A method is a function that belongs to an object. We tack it onto the end of a variable with a dot, and it applies the function to that variable.
We are introducing the 'dot' notation without really explaining it yet, but that explanation is coming.
state_lowcase = state.lower() # We are applying the lower() method to the variable state
print('state, after being lowered:', state_lowcase)
is_sconnie = 'wisconsin' == state_lowcase
print(state_lowcase, is_sconnie)
state, after being lowered: wisconsin
wisconsin True
# You don't have to store the lowered string separately
is_sconnie = 'wisconsin' == state.lower()
print(state.lower(), is_sconnie)
wisconsin True
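When we do not know the capitalization of either string, a reasonable approach is to lower both sides before comparing. A small sketch with two example spellings:

```python
# Two spellings of the same state that differ only in case
state_a = 'Wisconsin'
state_b = 'WISCONSIN'
print(state_a == state_b)                  # False: case differs
print(state_a.lower() == state_b.lower())  # True: both become 'wisconsin'
```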
Conditional statements
Conditional statements check a condition. If the condition is true, python runs one block of code. If the condition is false, it runs another block of code.
Important: Earlier, I mentioned that white space doesn't matter around operators like + or * and that we can insert blank lines wherever we want. Here comes a caveat: When we form a conditional, we must indent the lines following the condition statement. The convention is four spaces, and every line in a block must use the same indent. The indents define the lines of code that are executed in each branch of the if statement.
quantity = 5
if quantity > 0:
    # this indented code is the 'if branch'
    print('This print statement occurred because the statement is true.')
    print('The quantity is positive.')
    temp = quantity + 5
    print('When I add 5 to the quantity it is:', temp, '\n')
else:
    # this indented code is the 'else branch'
    print('This print statement occurred because the statement is false.')
    print('The quantity is not positive.\n')
print('This un-indented code runs no matter what.')
This print statement occurred because the statement is true.
The quantity is positive.
When I add 5 to the quantity it is: 10

This un-indented code runs no matter what.
- Now go back to the code and change quantity to 0 or -10 and run the cell. What happens?
- Now go back to the code and change the indentation of the first print statement after if quantity > 0: to two spaces. Run the cell. What happened?
# The else is optional.
size = 'md'
if (size == 'sm') or (size == 'md') or (size == 'lg'):
    print('A standard size was requested.\n')
print('This un-indented code runs no matter what.')
A standard size was requested.
This un-indented code runs no matter what.
Change size to 'xxl'. Run the cell.
Practice: Conditionals
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
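- Edit this markdown cell and write True, False, or error next to each statement.
- 1 > 2 (False)
- 'bill' = 'Bill' (False...or is it? A single = is assignment, so this line is an error.)
- (1 > 2) or (2*10 < 100) (True, the second expression is true.)
- 'Dennis' == 'Dennis' (True)
- x = 2 followed by 0 < x < 5 (True, two comparisons at once!)
- x = 0.10 and y = 1/10 followed by x == y (True, since both are stored as the same binary approximation of 0.1. Floating point comparisons are not always this well behaved, though.)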
- Before you run the code cell below: do you think it will be true or false?
- Run the code cell.
x = 1/3
y = 0.3333 # This is an approximation of 1/3
print(x == y)
False
In the previous cell, add a few more 3s to the end of the definition of y so you get a better approximation of x. Can you get x==y to be true?
Representing a number that has no exact base-2 (binary) fractional representation, like 1/3 or 0.1, is a problem in all programming languages. It is a limitation of how computers store floating point numbers in binary. The python documentation has a nice discussion: https://docs.python.org/3.7/tutorial/floatingpoint.html
This will not likely be an issue for us (although it could crop up) but it is a big deal in numerical computing.
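A common workaround is to ask whether two floats are close rather than exactly equal. The standard library's math.isclose() does this; by default it uses a small relative tolerance, and the abs_tol value of 1e-4 below is an arbitrary choice for illustration.

```python
import math

x = 0.1 + 0.2
print(x == 0.3)              # False: x is stored as 0.30000000000000004
print(math.isclose(x, 0.3))  # True: equal within the default relative tolerance

# Our earlier 1/3 example: too far apart for the default tolerance...
print(math.isclose(1/3, 0.3333))                 # False
# ...but close enough if we allow an absolute difference of 0.0001
print(math.isclose(1/3, 0.3333, abs_tol=1e-4))   # True
```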
- Let's introduce a new function that is built into python: the len() function. It computes the length of an object, such as the number of characters in a string or the number of elements in a list. In the code cell below, try print(len('hello world'))
print(len('hello world'))
11
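The same function works on other objects. A quick sketch (the list is the one used later in this notebook):

```python
print(len('hello world'))  # 11: counts characters, including the space
print(len([1, 20, 15]))    # 3: counts the elements of a list
print(len(''))             # 0: an empty string has no characters
```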
- In the cell below, write some code (use an if statement) that compares the lengths of two strings. Print out the longer string in all lower case letters and print out the shorter string in all upper case letters. [Hint: the companion to .lower() is .upper().]
Test your code with these two strings.
string1 = 'MemoriaL'
string2 = 'unIon'
The output should be 'memorial' and 'UNION'
string1 = 'MemoriaL'
string2 = 'unIon'
if len(string1) < len(string2):
    print(string2.lower(), string1.upper())
else:
    print(string1.lower(), string2.upper())
memorial UNION
The for loop
The conditional statement allows us to selectively run parts of our program. Loops allow us to re-run parts of our code several times, perhaps making some changes each time the code is run. There are several types of loops. The for loop runs a block of code once 'for' each element in a sequence, so it repeats a fixed number of times.
Here is a basic example.
# loop three times and print out the value of 'i'
for i in range(3): # The counter variable 'i' can be named anything.
    print('i =', i)
i = 0
i = 1
i = 2
# Loops don't need to use the counter variable if you just want to repeat
# an action a set number of times
for j in range(3):
    print("Hello!")
Hello!
Hello!
Hello!
Important: Notice the 4-space indent again. In general, the colon tells us that the indented lines below 'belong' to the line of code with the colon.
Ranges
The function range() creates a sequence of integers. With a single argument, it starts at zero and stops just before that number, but it can do more. Examples:
- range(3) produces 0, 1, 2
- range(2,7) produces 2, 3, 4, 5, 6 [the stop value 7 is not included]
- range(0, 10, 2) produces 0, 2, 4, 6, 8 [the third argument is the 'step' size]
Change the code above to try out these ranges.
# A range is a python type, like a float or a str
my_range = range(5)
print(type(my_range))
# What happens if I print the range?
print(my_range)
<class 'range'>
range(0, 5)
That last print out might not be what you expected. If you want to see the sequence, convert it to a list first.
# Remember what list() does?
print(list(my_range))
[0, 1, 2, 3, 4]
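Using the same list() trick, we can check the three example ranges from above. The last line, with a negative step, is an extra case not covered in the examples above.

```python
print(list(range(3)))         # [0, 1, 2]
print(list(range(2, 7)))      # [2, 3, 4, 5, 6]: the stop value 7 is excluded
print(list(range(0, 10, 2)))  # [0, 2, 4, 6, 8]: a step of 2
print(list(range(5, 0, -1)))  # [5, 4, 3, 2, 1]: a negative step counts down
```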
Looping over collections (lists, strings, etc)
In languages like C, a for loop usually steps through a range of integer indices. Python gives us an easy way to loop over many kinds of things. Here, we loop over a string.
street = 'hello @yyy'
# The variable 'char' could be named anything.
for char in street:
    print(char)
h
e
l
l
o

@
y
y
y
The same syntax works for looping over a list, too. Remember that, since lists are ordered, we can refer to an element in a list by its index. The code
x = [1, 20, 15]
print(x[2])
will print the value 15.
Here are two ways to loop through a list.
var_names = ['GDP', 'POP', 'INVEST', 'EXPORTS']
# Here is a clunky, C-style way to do this
print('The old-school way:')
for i in range(4): # i = 0, 1, 2, 3
    print(var_names[i])
# The python way
print('\nThe python way:')
for var in var_names: # Again, 'var' can be named anything
    print(var)
The old-school way:
GDP
POP
INVEST
EXPORTS
The python way:
GDP
POP
INVEST
EXPORTS
Wow.
Ranges, lists, and strings are all 'iterable objects'. An iterable object is an object that knows how to return the 'next' element within it. When we iterate over a list, each time the for loop 'asks' for the next element in the list, the list knows how to answer with the next element.
- Ranges iterate over whole numbers
- Lists iterate over the elements of the list
- Strings iterate over the characters
- Dicts iterate over the keys
- and more...
Iterators are used in places besides loops, too. We will see other uses as we go on. Powerful stuff.
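To peek at what a for loop does behind the scenes, we can ask for the 'next' element ourselves with the built-in functions iter() and next(). This is a simplified picture of the loop's mechanics; the dict of names and ages is made up for illustration.

```python
# Get an iterator over the characters of a string, then ask for elements one at a time.
letters = iter('abc')
first = next(letters)   # 'a'
second = next(letters)  # 'b'
print(first, second)    # a b

# Looping over a dict gives its keys.
ages = {'Ann': 30, 'Bob': 25}
for key in ages:
    print(key)          # Ann, then Bob
```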
Get the index and the value at the same time
When we looped over the list var_names, our first loop used the index and the second just used the values. We can ask an iterable object for both the index and the value at the same time with the enumerate() function. enumerate() takes an iterable object as an argument and, on each pass through the loop, hands back a tuple of the index and the value.
# 'i' will hold the index and 'var' will hold the value in var_names[i]
for i, var in enumerate(var_names):
    print('The index is', i, 'and the value is', var)
The index is 0 and the value is GDP
The index is 1 and the value is POP
The index is 2 and the value is INVEST
The index is 3 and the value is EXPORTS
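Two more things about enumerate(): it accepts an optional start argument if you want the count to begin somewhere other than 0, and converting its output to a list shows the (index, value) tuples directly. A short sketch:

```python
var_names = ['GDP', 'POP', 'INVEST', 'EXPORTS']

# start=1 makes the count begin at 1; the values are unchanged.
for i, var in enumerate(var_names, start=1):
    print(i, var)

# Each item enumerate produces is an (index, value) tuple.
pairs = list(enumerate(var_names))
print(pairs[0])  # (0, 'GDP')
```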
Practice: Loops
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. The TA and I are here, too.
Remember this example from earlier?
1. We have 5 integer observations in our dataset: 1, 3, 8, 3, 9. Unfortunately, the data file ran all the observations together and we are left with the variable raw_data in the cell below.
2. What type is raw_data?
3. Turn raw_data into a list.
raw_data = '13839'
list_data = list(raw_data)
list_data
['1', '3', '8', '3', '9']
Is your data ready to be analyzed? Why not?
- In the cell below, convert your list to a list of integers. Rather than repeating statements like list_data[0]=int(list_data[0]) for every element, put a loop to work! (We will see even better ways to do this soon...)
[Hint: You might want to start with an empty list and use the .append() method to add elements to it.]
# This way replaces list_data with the integers.
for i, var in enumerate(list_data):
    list_data[i] = int(var)
print(list_data)
[1, 3, 8, 3, 9]
# This way creates a new list and fills it with integers.
int_list = [] # An empty list
for var in list_data:
    int_list.append(int(var))
print(int_list)
[1, 3, 8, 3, 9]
- Loop through the following list:
commands = ['go', 'go', 'go', 'stop', 'go', 'go']
If the command is 'go' print out the word 'Green'. If the command is 'stop' print out the word 'Red'.
commands = ['go', 'go', 'go', 'stop', 'go', 'go']
# Here I use two if statements rather than an else. Maybe there are directions in the command list that are neither stop nor go?
for direction in commands:
    if direction == 'go':
        print('Green')
    if direction == 'stop':
        print('Red')
Green
Green
Green
Red
Green
Green
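If there might be commands that are neither 'go' nor 'stop', python's elif ("else if") clause lets one if statement check several conditions in order, with a final else catching anything left over. A sketch; the 'wait' command is a hypothetical unexpected entry added to show what the else branch catches.

```python
commands = ['go', 'go', 'stop', 'wait', 'go']

lights = []
for direction in commands:
    if direction == 'go':
        lights.append('Green')
    elif direction == 'stop':
        lights.append('Red')
    else:
        # Anything that is neither 'go' nor 'stop' ends up here
        lights.append('Unknown command: ' + direction)

for light in lights:
    print(light)
```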
List comprehensions
List comprehensions provide a very compact syntax to do loops over lists (or other iterable objects). Anything you can do with a list comprehension you can do with a for loop. In this sense, we don't really need to know this, but python programmers love list comprehensions, so you will see them in other people's code. Plus, it's a cool skill to have.
[Programmers call this kind of thing syntactic sugar. It makes the code 'sweeter' for humans to read, but doesn't add functionality to the language. You might want to casually drop this kind of language around your programmer friends.]
Here is a common problem with data in the wild. We would like to check for certain string values, but we have to be careful about cases. To facilitate comparison, let's make all the strings lower case.
First, the loop way
# Some data
class_rank = ['Senior', 'senior', 'Freshman', 'sophomore', 'senior', 'Junior']
# Create a new list with all lower case entries
class_rank_cleaned = [] # creates an empty list
for datum in class_rank:
    class_rank_cleaned.append(datum.lower()) # append() adds an element to the end of a list
print(class_rank_cleaned)
['senior', 'senior', 'freshman', 'sophomore', 'senior', 'junior']
Not bad. We now have cleaned up data and can do comparisons without worrying about case.
Now, let's roll out a list comprehension.
# Some data
class_rank = ['Senior', 'senior', 'Freshman', 'sophomore', 'senior', 'Junior']
# 'elem' could be named anything. It is the loop variable.
class_rank_cleaned_lc = [elem.lower() for elem in class_rank]
print(class_rank_cleaned_lc)
['senior', 'senior', 'freshman', 'sophomore', 'senior', 'junior']
Very clean. Very easy. Let's break down the list comprehension.
class_rank_cleaned_lc = [elem.lower() for elem in class_rank]
- The square brackets [ ] create a new list, just like we have done in the past
- The code on the left-hand side of for is the operation we want performed on each element of the list
- The for elem in class_rank part is the for loop syntax, like we have used before.
Let's try another. Before you run the cell, what do you think this code does?
# What does this code do?
sq = [item**2 for item in range(3)]
print(sq)
[0, 1, 4]
# What about this code? What does it do?
class_rank_len = [len(elem) for elem in class_rank]
print(class_rank_len)
[6, 6, 8, 9, 6, 6]
We can add a conditional (an if clause) so that we only perform the operation on certain elements. Elements that fail the condition are left out of the new list entirely.
# Seniors rule!
class_rank_caps = [i.upper() for i in class_rank_cleaned if i=='senior']
print(class_rank_caps)
['SENIOR', 'SENIOR', 'SENIOR']
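As promised, anything a list comprehension does can also be written as a for loop. Here is a sketch of the loop version of the filtered comprehension above, checked against the comprehension itself:

```python
class_rank_cleaned = ['senior', 'senior', 'freshman', 'sophomore', 'senior', 'junior']

# List comprehension with an if clause
caps_lc = [i.upper() for i in class_rank_cleaned if i == 'senior']

# The same result with a for loop and an if statement
caps_loop = []
for i in class_rank_cleaned:
    if i == 'senior':
        caps_loop.append(i.upper())

print(caps_lc == caps_loop)  # True
```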
Practice: List comprehensions
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. The TA and I are here, too.
- Here is a list of interest rates:
r = [0.01, 0.01, 0.015, 0.02, 0.022]
Multiply each of them by 100 to make them percentage interest rates. What happened to 0.022?
# Part 1
r = [0.01, 0.01, 0.015, 0.02, 0.022]
r_pct = [i*100 for i in r]
print(r_pct)
[1.0, 1.0, 1.5, 2.0, 2.1999999999999997]
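The 2.1999999999999997 is the floating point issue from earlier. One way to tidy the result is the built-in round() function inside the comprehension. Rounding to 4 decimal places is an arbitrary choice for this sketch.

```python
r = [0.01, 0.01, 0.015, 0.02, 0.022]
# round(value, 4) rounds to 4 decimal places, hiding the tiny representation error
r_pct_rounded = [round(i*100, 4) for i in r]
print(r_pct_rounded)  # [1.0, 1.0, 1.5, 2.0, 2.2]
```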
- Here we go again! Turn raw_data = '13839' into a list of integers. Use a list comprehension. (We've come a long way!)
# Part 2
raw_data = '13839'
int_data = [int(x) for x in raw_data]
print(int_data)
[1, 3, 8, 3, 9]
Try at home (or if you finish early)
A bit harder: create two lists derived from the following list: data_list = [1, 2, 3, 4, 5, 6, 7, 8,]
One list should have only the odd numbers and the other list should have only the even numbers. You might use the modulo operator %, which yields the remainder from dividing the first number by the second. For example, 3%2 = 1.
# Part 3
data_list = [1, 2, 3, 4, 5, 6, 7, 8,]
evens = [i for i in data_list if i%2==0]
odds = [i for i in data_list if i%2!=0]
print('The evens are:', evens)
print('The odds are:', odds)
The evens are: [2, 4, 6, 8]
The odds are: [1, 3, 5, 7]