Python basics: Part 2
files needed = none
Before we can start working with data, we need to work out some of the basics of Python. The goal is to learn enough so that we can do some interesting data work—we do not need to be Python Jedi.
Lists, tuples, and dicts are very powerful. We could spend weeks going through all the things we can do with them. Instead, we will cover some basics here, and add to our knowledge as needed.
In this notebook we will cover (the terms ordered and mutable will make sense by the time we are done here)
- Lists (ordered, mutable)
- Tuples (ordered, immutable)
- Dictionaries (key-value pairs accessed by key, mutable)
- More on types
Remember: Ask questions as we go.
Data structures
The types float, int, and str are scalar types. Think of them as the individual data types. An important, and very powerful, part of Python is its data structures, which collect scalar types (and also other data structures) together. These data structures include list, tuple, and dict. We will use lists and dicts (dictionaries) extensively. These specific structures are a kind of compound data type, similar to what we consider to be "sets" or "arrays" in mathematics.
Lists
A list is an ordered and modifiable (in Pythonese: mutable) collection of objects. We will use lists a lot. Let's try some out.
# You define a list using square brackets
number_list = [2, 3, 5, 8] # a list of numeric data (integers)
string_list = ['university', 'of', 'wisconsin', 'madison'] # a list of strings
# Notice that the print function understands types. We have passed it
# floats, ints, strs, and now lists, and it 'knows' how to print them out.
print('The string list:')
print(string_list)
print(type(string_list))
print('\n') # '\n' is just a string. \n is the special character that creates a new line
print('The number list:')
print(number_list)
print("number_list's type", type(number_list))
Note that the type of a list does not depend on its contents. Calling the type() function on number_list as a whole only tells us that it is a list; it says nothing about the elements inside.
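As a small sketch of the point above, the cell below compares the types of two lists with different contents and then asks for the type of a single element. The variable same_type is a name introduced here only for illustration.

```python
# Two lists with different contents share the same type: list
number_list = [2, 3, 5, 8]
string_list = ['university', 'of', 'wisconsin', 'madison']

same_type = type(number_list) == type(string_list)
print('Do the two lists have the same type?', same_type)

# To learn about the contents, ask for the type of an individual element
print(type(number_list[0]))
print(type(string_list[0]))
```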
We can make lists of mixed types. This would not work in many languages.
# Some numbers and some strings
mixed_list = [1, 25, 'biochemistry', 3, 'foo' ] # 'foo' is a programmer's favorite generic placeholder
print('The mixed list:')
print(mixed_list)
We can access an element of a list using square brackets, like this:
print(mixed_list[0], mixed_list[2], '\n') # print out the first and third elements of the list
print(type(mixed_list[0]), type(mixed_list[2])) # print out the types of the first and third elements
Key concept: Lists are ordered, like an array in many other languages.
Important: The list index starts with 0. (In some languages, the list index starts with 1.) This means that the index of the last element of a list is the number of elements it contains minus 1. To access the end of a list (in the event we don't know its length, for example), we can use negative indexing:
print("length of mixed_list is ", len(mixed_list)) # the len() function shows us the length of the list
print("last element is ", mixed_list[4])
# Negative indexing works just like positive indexing, just backwards!
print("mixed_list[-1] =", mixed_list[-1])
print("mixed_list[-2] =", mixed_list[-2])
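To connect the two ways of reaching the last element, here is a short sketch. It computes the last index from len() and also shows what happens when we ask for an index past the end of the list. The names last_index and error_text are introduced here for illustration.

```python
mixed_list = [1, 25, 'biochemistry', 3, 'foo']

# Five elements, so the last valid index is 5 - 1 = 4
last_index = len(mixed_list) - 1
print('last index:', last_index)
print('using the last index:', mixed_list[last_index])
print('using -1:', mixed_list[-1])

# Asking for an index past the end raises an IndexError
try:
    mixed_list[len(mixed_list)]
except IndexError as err:
    error_text = str(err)
    print('IndexError:', error_text)
```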
The code below shows us two Python features.
- We can concatenate lists using the + operator
- We can create a list on the same line we assign it
temp_list = ['Dane', 'County', 3]
long_mixed_list_1 = mixed_list + temp_list # This concatenates temp_list and mixed_list
# The next line of code does two things. What are they?
long_mixed_list_2 = mixed_list + ['Dane', 'County', 3]
print('long_mixed_list_1:', long_mixed_list_1, '\n')
print('long_mixed_list_2:', long_mixed_list_2, '\n')
The + operator works like the print() function: it 'knows' what kinds of objects it is working with (lists, ints, strings) and takes the appropriate action. Everything, however, has its limits.
# What does this do?
long_mixed_list_3 = mixed_list + 'Bucky'
print(long_mixed_list_3)
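The + operator is not set up to concatenate a list and a string. We can see this in the TypeError message. If we wrap the string in square brackets, it becomes a one-element list, and concatenating two lists works.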
long_mixed_list_3 = mixed_list + ['Bucky']
print(long_mixed_list_3)
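If we want the notebook to keep running after the failed concatenation, one option is to catch the error. This is only a sketch: try/except is a tool we have not covered yet, and the names caught_error and result are introduced here for illustration.

```python
mixed_list = [1, 25, 'biochemistry', 3, 'foo']

# list + str is not defined, so Python raises a TypeError
caught_error = False
try:
    result = mixed_list + 'Bucky'
except TypeError as err:
    caught_error = True
    print('TypeError:', err)

# Wrapping the string in square brackets makes it a one-element list
result = mixed_list + ['Bucky']
print(result)

# Concatenation builds a new list; the original is unchanged
print(mixed_list)
```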
Because lists are mutable, we can assign new values to them.
print('Before I change the first element:', long_mixed_list_2, '\n')
# Change the first element from 1 to 50
long_mixed_list_2[0] = 50
print('After I changed the first element:', long_mixed_list_2)
Lists are not limited to scalar types. What type is each element in the following list?
xzibit_list = [1, 'oak', [3.2, 5, 'elm']]
print(xzibit_list)
print('element 1', type(xzibit_list[0]))
print('element 2', type(xzibit_list[1]))
print('element 3', type(xzibit_list[2]))
We can have a list be an element within a list!
How would you print (or otherwise access) the [3.2, 5, 'elm'] list?
How would you print 'elm'?
print('The sublist: ', xzibit_list[2])
print('Is this elm?', xzibit_list[2][2])
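The same double-bracket pattern works for assignment, since the inner list is mutable too. The sketch below also checks how len() counts a nested list. The replacement value 'maple' is just an example.

```python
xzibit_list = [1, 'oak', [3.2, 5, 'elm']]

# The sublist counts as a single element of the outer list
print('outer length:', len(xzibit_list))
print('inner length:', len(xzibit_list[2]))

# First index picks the sublist, second index picks the element inside it
xzibit_list[2][2] = 'maple'
print(xzibit_list)
```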
Note that lists can be empty.
empty_list = []
print(empty_list)
print("length of list: ", len(empty_list))
Practice: Lists
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. The TA and I are here, too.
Insert a code cell and try these out
- Create a list containing the integers 1, 2, 3. Name it my_int_list.
- Create a list containing 1, 2, 3 where each number is a string, not an int. Name it my_string_list.
- What is the type of my_int_list and my_string_list? Print out the types.
Insert another code cell and
- Concatenate my_int_list and my_string_list. Name the new list my_super_list.
- Print out your super list.
- In your super list, change the integer 2 to your favorite number.
- In your super list, change the string 3 to your least favorite number.
- Print your super list. If you made a mistake, go back and fix it.
Tuples
Tuples are collections of objects, like lists, but they are immutable: once they are created, they cannot be changed. We will not use tuples that often, but they will pop up, so we should be ready.
We create a tuple with round brackets.
# You define a tuple using round brackets
number_tuple = (2, 3, 5, 8) #a tuple of numeric data
string_tuple = ('university', 'of', 'wisconsin', 'madison') # a tuple of strings
print('number_tuple type:', type(number_tuple))
print('number_tuple:', number_tuple, '\n')
print('string_tuple type:', type(string_tuple))
print('string_tuple:', string_tuple)
Notice the printed output: round vs. square brackets. Thanks print()!
Now, let's see how immutability works.
# Change the second element of the list to 1000
number_list[1] = 1000
print(number_list)
# Now try that with a tuple
number_tuple[1] = 1000
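The list assignment works, but the tuple assignment raises a TypeError because tuples do not support item assignment. This property of tuples is useful if you have data that you want to protect from being accidentally changed.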
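If we do need a changed version of a tuple, one approach is to build a new tuple and leave the original alone. The sketch below assumes we want to swap out the second element. It uses slicing, which we have not covered yet, and the name new_tuple is introduced here for illustration.

```python
number_tuple = (2, 3, 5, 8)

# Item assignment on a tuple raises a TypeError
try:
    number_tuple[1] = 1000
except TypeError as err:
    print('TypeError:', err)

# Build a new tuple from pieces of the old one.
# (1000,) is a one-element tuple; the trailing comma is required.
new_tuple = number_tuple[:1] + (1000,) + number_tuple[2:]
print('new tuple:', new_tuple)
print('original tuple:', number_tuple)
```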
Dictionaries (dicts)
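Dicts are collections of key-value pairs. Unlike lists, we look up values by key rather than by position. (Since Python 3.7, dicts remember the order in which keys were inserted, but we still do not index them by position.) Each element of a dict is made of a key and its associated value. The keys must be unique, but the values do not need to be. We create dicts with curly brackets. It's easiest to understand with some examples.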
# This is a dict with five elements
grades = {'A':4.0, 'B':3.0, 'C':2.0, 'D':1.0, 'F':0.0} # I am associating the key A with the value 4.0
print(type(grades))
print(grades)
We use the keys to refer to the values.
print(grades['B'], grades['D'])
# What happens here? Will this return 'B'?
print(grades[3.0])
You get an error here because a dictionary can only be referenced by its keys, never by its values. The error message says KeyError because it is looking for a key named 3.0, and that key doesn't exist.
# let's try looking for the numeric grade associated with withdrawing from class
print(grades['W'])
We get the same KeyError, again, because the key W does not exist.
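When we are not sure a key exists, we can check before we look it up. The sketch below uses the in operator and the dict method get(), both standard parts of Python that we have not needed so far. The variable names are introduced here for illustration.

```python
grades = {'A': 4.0, 'B': 3.0, 'C': 2.0, 'D': 1.0, 'F': 0.0}

# 'in' tests whether a key is present
has_w = 'W' in grades
print("Is 'W' a key?", has_w)

# get() returns None instead of raising a KeyError when the key is missing
w_points = grades.get('W')
print("grades.get('W'):", w_points)

# A second argument to get() supplies a fallback value
w_default = grades.get('W', 0.0)
print("grades.get('W', 0.0):", w_default)

# For a key that exists, get() behaves like square brackets
b_points = grades.get('B')
print("grades.get('B'):", b_points)
```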
We can add to our dictionary (it is mutable). We can also change the value of an existing entry.
# Let's add a withdrawal score
grades['W'] = 0.0
print(grades, '\n')
# Our grading is generous!
grades['F'] = 0.5
print(grades)
Practice: Dicts
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
Insert a code cell and try these out
- Create a dict with keys: 'coke', 'pepsi', 'root beer', and 'fanta' and give each key a value that corresponds to your rating on a 1 to 10 scale with 1 being the worst and 10 the best. For example, I would rate root beer a 9.
- Print your dict.
- Can you rank more than one soda a 2?
Insert a second cell and
- The Coca-Cola Company is hiring you as a celebrity endorser. Change your rating for coke to a 10.
- Change your ranking of pepsi to a 1.
- Print your dict.
Here is a challenging but important example from my old colleagues at NYU Stern:
Consider the dictionary
data = {'Year': [1990, 2000, 2010], 'GDP': [8.95, 12.56, 14.78]}
What are the keys here? The values? What do you think this dictionary represents?
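One way to explore these questions is to ask Python directly. The sketch below lists the keys and pulls out individual values. Reading each key as a column name and each list as a column of observations is one plausible interpretation, not the only one.

```python
data = {'Year': [1990, 2000, 2010], 'GDP': [8.95, 12.56, 14.78]}

# The keys are strings; each value is a list
keys = list(data.keys())
print('keys:', keys)
print("type of data['GDP']:", type(data['GDP']))

# Key first, then position within the list
first_year = data['Year'][0]
first_gdp = data['GDP'][0]
print('first observation:', first_year, first_gdp)
```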
More on types
We have seen several types now: int, float, str, tuple, dict, and list. There are a few more to come this semester and many more that we will not address.
Types are great because many of the functions [like print()] and operators [like +] automatically know how to handle objects of different types. We don't need functions like print_string(), print_int(), print_list()...
Types are also great because they keep us from doing dumb things, like trying to add an int and a str. There are languages that will not stop you from adding a string and an integer, even though the result will be garbage.
x_int = 10
x_string = '10'
x = x_int + x_string
Changing types
Here is something that comes up a lot in the 'wild.' You have a file with some data in it. When you read the file into your program, the numeric data are strings. Not good. (What is the difference between y = 3 + 2 and y = '3' + '2'?)
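A quick sketch of the difference: with ints, + adds; with strings, + concatenates. The variable names are introduced here for illustration.

```python
y_numbers = 3 + 2      # integer addition
y_strings = '3' + '2'  # string concatenation

print(y_numbers, type(y_numbers))
print(y_strings, type(y_strings))

# Converting each string to an int first gives us arithmetic again
y_fixed = int('3') + int('2')
print(y_fixed, type(y_fixed))
```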
Python gives us an easy way to change a variable's type.
# https://en.wikipedia.org/wiki/Golden_ratio
golden_ratio_s = '1.6180339'
print(type(golden_ratio_s))
print(golden_ratio_s)
# Now turn the string into a float
golden_ratio_f = float(golden_ratio_s)
print('golden_ratio_f is of type:', type(golden_ratio_f))
print(golden_ratio_f)
We just 'cast' the string variable to a float variable. Can we do the reverse, and cast the float to a string?
# You can 'cast' the float back to a string with str()
golden_ratio_s_2 = str(golden_ratio_f)
print('golden_ratio_s_2 is of type:', type(golden_ratio_s_2))
I am feeling pretty powerful right now.
Can we turn the string into an int?
We used float() to cast to a float, str() to cast to a string.
We use the int() function to cast things to ints.
golden_ratio_i = int(golden_ratio_s) # Let's try from the string version (this raises a ValueError)
print(type(golden_ratio_i))
Nope! int() doesn't know how to convert a str that contains a decimal point to an int, so it raises a ValueError.
What if we tried to convert to an int from a float?
golden_ratio_i = int(golden_ratio_f) #Let's try from the float version
print(golden_ratio_i)
What just happened? It did something, but it isn't obvious what int() should do to a float: Should it round it up? Round it down? Truncate it? There is no obvious way to turn a float into an int.
If we look at the documentation we will see that int() truncates floats (cuts off the decimal), rather than rounding them in one direction or the other.
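To see truncation next to the alternatives, the sketch below compares int() with the built-in round() and with math.floor() from the standard library. The negative case is where truncating and rounding down differ. The last line shows one way around the earlier ValueError: convert the string to a float first.

```python
import math

golden_ratio_f = 1.6180339

truncated = int(golden_ratio_f)           # drops the decimal part
rounded = round(golden_ratio_f)           # rounds to the nearest integer
neg_truncated = int(-golden_ratio_f)      # truncation moves toward zero
neg_floored = math.floor(-golden_ratio_f) # floor always moves down

print('int(1.6180339)    =', truncated)
print('round(1.6180339)  =', rounded)
print('int(-1.6180339)   =', neg_truncated)
print('floor(-1.6180339) =', neg_floored)

# A string with a decimal point can go through float() on its way to int()
from_string = int(float('1.6180339'))
print("int(float('1.6180339')) =", from_string)
```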
We can convert types with list() and tuple(), too.
x = [0, 1, 2, 3] # what type is this?
x_tup = tuple(x)
print("Can you tell x_tup's type from looking at the printout?", x_tup)
# Another conversion that is often useful
y = list('on wisconsin')
print(y)
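Notice that list() breaks a string into individual characters, spaces included. If we wanted words instead, the string method split() is one option. The sketch below contrasts the two and uses join() to put the characters back together. Both are standard string methods that we have not covered yet.

```python
phrase = 'on wisconsin'

chars = list(phrase)    # one element per character, including the space
words = phrase.split()  # one element per whitespace-separated word

print(chars)
print('number of characters:', len(chars))
print(words)

# ''.join() glues a list of strings back into a single string
rebuilt = ''.join(chars)
print(rebuilt)
```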
Practice: Types
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
- We have 5 integer observations in our dataset: 1, 3, 8, 3, 9. Unfortunately, the data file ran all the observations together and we are left with the variable raw_data in the cell below.
- What type is raw_data?
- Turn raw_data into a list. Print it.
raw_data = '13839'
Is your data ready to be analyzed? Why not?
- In the cell below, convert your list to a list of integers. You might try repeating statements like list_data[0] = int(list_data[0])
- Print out your list of integers.
That worked for our small list, but imagine having a list of several thousand elements. Converting one element at a time would not scale. Still, the exercise introduced us to a common problem with data in the wild: numbers stored as text.
We will soon learn that Python has very powerful and simple ways to repeatedly apply operations to lists.