Python basics: Part 4
files needed = none
Before we can start working with data, we need to work out some of the basics of Python. The goal is to learn enough so that we can do some interesting data work—we do not need to be Python Jedi.
We now know about the basic data structures in python, how types work, and how to do some basic computation and string manipulation. We can use flow control statements to steer our program to different blocks of code depending on conditional statements and we have sorted out loops and list comprehensions.
Up next are a few more important topics before we get started with pandas. We will cover
- Slicing
- User defined functions
- Objects and TAB completion
Slicing
Slicing is an important part of python life. We slice a list (or a tuple or a string) when we take a subset of it. As you can probably imagine, slicing will be a common thing we do with data. We often want to grab slices of the data set and analyze them.
The slice syntax uses square brackets—even if we are slicing a string or a tuple. The basic command is
some_list[start:stop:stride]
- start is the first element to include in the slice
- stop is the first element we do NOT include
- stride is the step size
Notice that the start is inclusive and the stop is exclusive. Think of a slice as a half open interval in mathematics: [start, stop). We include start in the interval but exclude stop.
The default stride is 1, meaning take every element from [start, stop).
some_list = [5, 6, 7, 8, 9]
print(some_list[0:2]) # indexes start with zero; stride defaults to 1
print(some_list[0:2:1]) # this should be the same
print(some_list[0:5:2]) # take every other element
# take a slice out of the middle
print(some_list[1:3]) #take the second element and the third element
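The half open interval has two handy consequences, sketched below: a slice with the default stride contains stop - start elements (as long as both are within the list), and splitting a list at any index gives two pieces that rebuild the original. The names middle, k, and some_tuple are made up for this illustration.

```python
some_list = [5, 6, 7, 8, 9]
start, stop = 1, 4

# With the default stride, the slice holds stop - start elements
middle = some_list[start:stop]
print(middle)                        # [6, 7, 8]
print(len(middle) == stop - start)   # True

# Splitting at index k gives two pieces that rebuild the original list
k = 2
print(some_list[:k] + some_list[k:] == some_list)   # True

# The same square-bracket syntax works on a tuple
some_tuple = (5, 6, 7, 8, 9)
print(some_tuple[start:stop])        # (6, 7, 8)
```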
If we want to take everything from a start to the end of the list, we just leave the second argument blank. A similar syntax takes everything from the beginning up to a stop.
print(some_list[2:]) # the third element to the end of the list
print(some_list[:4]) # everything up to but not including the fifth element
Slice arguments can be negative (we first saw this in python_basics_2). When we use a negative number for start or stop, we are telling python to count from the end of the list.
print(some_list[:-1]) # all but the last one
print(some_list[:-2]) # all but the last two
print(some_list[-4:-2]) # ugh (again, we don't take the -2 value)
# [5 | 6 | 7 | 8 | 9] # The list
# -5 -4 -3 -2 -1 # backwards index
# 0 1 2 3 4 # forwards index
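One way to read the diagram above: a negative index -k points at the same element as the forward index len(some_list) - k. The short check below is only a sketch of that rule; the names n and k are introduced for the illustration.

```python
some_list = [5, 6, 7, 8, 9]
n = len(some_list)

# Walk through the backwards indexes and compare with the forwards ones
for k in range(1, n + 1):
    print(-k, n - k, some_list[-k] == some_list[n - k])

# The 'ugh' slice from above, rewritten with forwards indexes
print(some_list[-4:-2])            # [6, 7]
print(some_list[n - 4:n - 2])      # [6, 7]
```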
If we use a negative number for the stride argument, we iterate backwards.
print(some_list[::-1]) # print the list out backwards
print(some_list[4:1:-1]) # we are counting backwards, so be careful about start and stop
# start at the [4] element in the list and end at the [2]
# Don't forget, we can do this with strings, too
slogan = 'onward'
print(slogan[:2]) # just print 'on'
print(slogan[::-1]) # backwards
Practice: Slices
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
1. Create the variable boss = 'Ananth Seshadri'
2. Slice boss to create the variables first_name and last_name
3. Redo question two to create first_name_neg and last_name_neg by slicing boss using the negative number notation that counts from the end of the string.
Consider this list of sorted data.
x_sorted = [10, 40, 100, 1000, 50000]
4. Print out the 3 largest elements
5. Print out the two smallest elements
[Try coding 4. and 5. as if you did not know the length of x_sorted.]
User-defined functions
We have seen some of python's built-in functions: print(), type(), and len(). Like many other languages, python allows users to create their own functions.
Using functions lets us (or someone else) write and debug the code once—then we can reuse it. Very powerful stuff. Here is a simple example:
def lb_to_kg(pounds):
    """
    Input a weight in pounds. Return the weight in kilograms.
    """
    # 1 pound = 0.453592 kilos...
    kilos = pounds * 0.453592
    # This is the value the function returns
    return kilos
When you run the cell above, it looks like nothing happened, but python read the code and created the function.
We can use the whos statement (a jupyter notebook 'magic' command) to learn about what objects are in the namespace. [A namespace is a list of all the objects we have created and the names we have assigned them.]
whos
We can see the variables we have created earlier as well as the function lb_to_kg. Notice functions are of type function. Just like any other variable, lb_to_kg is loaded into the namespace.
You must have the function located and executed in your program prior to any attempts to use it, otherwise you will get an error. Remember, the python interpreter reads your code and executes it line-by-line.
Now that our function is defined and loaded into memory, we are ready to use it.
car_weight_pounds = 5000
car_weight_kilos = lb_to_kg(car_weight_pounds)
print('The car weighs', car_weight_kilos, 'kilos.')
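To see why the definition has to run before the first call, here is a small sketch. The function kg_to_lb is hypothetical (it is not defined anywhere else in this notebook), and the try/except statement is used only so the cell keeps running after the error. In a fresh session the first call fails with a NameError; once the def has run, the same call works.

```python
# Calling a function before python has read its definition raises a NameError
try:
    kg_to_lb(10)     # kg_to_lb is a hypothetical function, not defined yet
except NameError as err:
    print('error:', err)

def kg_to_lb(kilos):
    """
    Input a weight in kilograms. Return the weight in pounds.
    """
    # 1 pound = 0.453592 kilos, so divide to go the other way
    return kilos / 0.453592

# Now the function is in the namespace and the call works
print(kg_to_lb(10))
```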
Inputs and outputs
- The inputs to the function are called arguments. The names we give the arguments in our function definition (e.g., pounds) become the variable names we use in the function code.
- The return command is what pushes our result back to the code that 'called' the function.
Since it is our function, we have to handle potentially bad inputs, or python will throw an error.
def lb_to_kg_v2(pounds):
    """
    Input a weight in pounds. Return the weight in kilograms.
    """
    if type(pounds) == float or type(pounds) == int:  # check that pounds is an allowable type
        kilos = pounds * 0.453592                     # 1 pound = 0.453592 kilos...
        return kilos                                  # this is the value the function returns
    else:
        print('error: lb_to_kg_v2 only takes integers or floats.')
        return -99
truck_weight_pounds = '5000' #A classic problem with real data
truck_weight_kilos = lb_to_kg_v2(truck_weight_pounds)
print('The truck weighs', truck_weight_kilos, 'kilos.')
How much time you spend writing code that is safe from errors is a tradeoff between your time and how robust your code needs to be. Life is all about tradeoffs.
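Returning a sentinel value like -99 is one choice. A common alternative, sketched here under the hypothetical name lb_to_kg_v3, is to raise an error ourselves so that a bad input cannot be mistaken for a real weight. This is a different technique from the one used in lb_to_kg_v2 above, not a replacement for it, and which one fits depends on how the calling code should react.

```python
def lb_to_kg_v3(pounds):
    """
    Input a weight in pounds. Return the weight in kilograms.
    Raise a TypeError if pounds is not an integer or a float.
    """
    if type(pounds) not in (int, float):   # same type check as v2, written compactly
        raise TypeError('lb_to_kg_v3 only takes integers or floats.')
    return pounds * 0.453592               # 1 pound = 0.453592 kilos...

# The calling code decides what to do when the input is bad
try:
    truck_weight_kilos = lb_to_kg_v3('5000')
except TypeError as err:
    print('error:', err)

print(lb_to_kg_v3(5000))
```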
We can have functions with several input variables:
def name_fixer(first, middle, last):
    """
    Fix any capitalization problems and create a single variable with the complete name.
    """
    # The string method title() makes the first letter capital
    return first.title() + ' ' + middle.title() + ' ' + last.title()
mascot_first = 'bucKingham'
mascot_middle = 'u'
mascot_last = 'badger'
full_name = name_fixer(mascot_first, mascot_middle, mascot_last)
print(full_name)
We can assign several return variables. This is called multiple assignment. First, let's look at multiple assignment outside of a function, then we use it in a function.
# This is an example of multiple assignment.
# Assign 'foo' to a and 10 to b... all in one statement.
a, b = 'foo', 10
print(a, b)
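Under the hood, the right-hand side 'foo', 10 is a tuple, and multiple assignment unpacks it into the names on the left. The sketch below makes that visible; the name pair is made up for the example. It also shows that the number of names must match the number of values.

```python
# The comma builds a tuple, even without parentheses
pair = 'foo', 10
print(type(pair))      # <class 'tuple'>

# Unpack the tuple into two names
a, b = pair
print(a, b)            # foo 10

# Two names on the left but three values on the right raises a ValueError
try:
    a, b = 'foo', 10, 3.5
except ValueError as err:
    print('error:', err)
```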
Multiple assignment lets us return several variables from a function.
def temp_converter(temp_in_fahrenheit):
    """
    Takes a temperature in fahrenheit and returns it in celsius and in kelvin.
    """
    temp_in_celsius = (temp_in_fahrenheit - 32) * 5/9
    temp_in_kelvin = (temp_in_fahrenheit + 459.67) * 5/9
    return temp_in_celsius, temp_in_kelvin
# Note that I am defining the function and using it in the same code cell.
# The code below is NOT part of the function definition. We can see that because it is not indented.
t_f = 65 # temp in fahrenheit
t_c, t_k = temp_converter(t_f)
print('{0:5.1f} degrees fahrenheit is {1:5.2f} degrees celsius and {2:5.2f} degrees kelvin.'.format(t_f, t_c, t_k))
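A function that returns several values is really returning one tuple, which we can unpack right away or keep whole. The sketch below repeats temp_converter so that it runs on its own; the name both is introduced only for this illustration.

```python
def temp_converter(temp_in_fahrenheit):
    """
    Takes a temperature in fahrenheit and returns it in celsius and in kelvin.
    """
    temp_in_celsius = (temp_in_fahrenheit - 32) * 5/9
    temp_in_kelvin = (temp_in_fahrenheit + 459.67) * 5/9
    return temp_in_celsius, temp_in_kelvin

# Keep the return value whole: it is a tuple with two elements
both = temp_converter(32)
print(type(both))     # <class 'tuple'>
print(len(both))      # 2

# Unpack it later if we want the pieces
t_c, t_k = both
print(t_c)            # 0.0
```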
Function scope
- Any variable defined within your function is not accessible outside of that function. We say that these variables are local to the function. Local variables do not show up in whos.
It is possible to create global variables which are defined within your broader program, and then accessible to all the functions you write. This is not a good coding practice. We want our functions to be self-contained so that they can be reused.
print(temp_in_kelvin)
whos
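Here is a self-contained sketch of the same idea. The function double_it and the variable doubled are hypothetical names used only for this example, and the try/except statement keeps the cell running after the error so that we can read the message.

```python
def double_it(x):
    """
    Return twice the input.
    """
    doubled = 2 * x     # doubled is local to double_it
    return doubled

result = double_it(21)
print(result)           # 42

# Outside the function, the local variable does not exist
try:
    print(doubled)
except NameError as err:
    print('error:', err)
```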
Documentation
You may have noticed that we write a triple-quote comment at the beginning of our functions. This is called a docstring, and we use it to tell others what the function does. Remember the '?' operator? Give it a try below.
name_fixer?
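The '?' syntax is a feature of jupyter notebooks. In plain python, the docstring is also available through the built-in help() function and the function's __doc__ attribute. The sketch below repeats name_fixer so that it runs on its own.

```python
def name_fixer(first, middle, last):
    """
    Fix any capitalization problems and create a single variable with the complete name.
    """
    return first.title() + ' ' + middle.title() + ' ' + last.title()

# The docstring is stored on the function object itself
print(name_fixer.__doc__)

# help() prints the function signature along with the docstring
help(name_fixer)
```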
Practice: Functions
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
- Write a change-counting function. Pass the function the number of pennies, nickels, dimes, and quarters, and return the value of the coins.
Test it with 5 pennies, 4 dimes, 2 quarters = 95 cents.
- Modify the name_fixer() function to return both the fixed-up full name and the length of the full name. Use multiple assignment.
Test it with "nelsoN websTER DEweY" (Who?)
- Back in python 1, we worked on the problem:
  - In a code cell, set m=2 and n=3. Write some code that swaps the values of m and n.
Back then, we created a temp variable to help us make the swap. Insert a code cell below and use multiple assignment to swap m and n in one line of code.
Objects and TAB completion
Like C++ or JavaScript, python is an object-oriented language. This is a topic that a computer science course could devote weeks to, but our goal is simpler: let's understand objects enough to use them well.
Everything in python is an object. The variables we have been creating are objects. The functions we have written are objects. Objects are useful because they have attributes and methods associated with them. Which attributes and methods an object has depends on the object's type. Let's take lists, for example.
list_1 = ['a', 'b', 'c']
list_2 = [4, 5, 6, 7, 8]
Attributes describe the object. Both lists are objects and both have type list, but their attributes are different. For example, a list's length is an attribute: list_1 is of length 3, while list_2 is of length 5.
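We can check both claims with functions we already know. This is only a quick sketch: type() reports the type of each object and len() reports its length.

```python
list_1 = ['a', 'b', 'c']
list_2 = [4, 5, 6, 7, 8]

# Both objects have the same type...
print(type(list_1), type(list_2))
print(type(list_1) == type(list_2))   # True

# ...but they have different lengths
print(len(list_1), len(list_2))       # 3 5
```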
Methods are like functions that are attached to an object. Different types of objects have different methods available. Methods implement operations that we often use with a particular data type. We access methods with the 'dot' notation.
list_1.method()
where method() is a method associated with the list class. We have been using the .lower(), .upper(), and .title() methods of the string class already. We have used the .append() method of the list class.
list_1 = ['a', 'c', 'b']
list_1.sort() # the sort() method from the 'list' class
print(list_1)
How do we find out what methods are available for an object? Google is always a good way. You can also use help() with the class name: help(str) for strings, help(list) for lists. Python's documentation under "string types" shows us what is available for strings, for example.
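Another option that works anywhere python runs is the built-in dir() function, which returns the names attached to an object. The sketch below filters out the names that begin with an underscore, since those are special names we do not normally call directly; the variable public_names is made up for this example.

```python
list_1 = ['a', 'c', 'b']

# dir() returns a list of strings; keep the names that do not start with '_'
public_names = [name for name in dir(list_1) if not name.startswith('_')]
print(public_names)
```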
Important: We can also use TAB completion in jupyter notebooks. Type list_1. in the cell below and hit the TAB key.
list_1
The TAB gives us a list of possible methods. We have already seen .append(). .reverse() looks interesting. Let's give it a try.
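A minimal sketch of what trying .reverse() might look like. One thing to watch for: like .sort(), .reverse() changes the list in place and returns None, so we should not assign its result to a new variable and expect a list. The name result is used here only to show that.

```python
list_1 = ['a', 'b', 'c']

# reverse() modifies list_1 itself and returns None
result = list_1.reverse()
print(list_1)         # ['c', 'b', 'a']
print(result)         # None

# A slice with a negative stride builds a new list and leaves list_1 unchanged
print(list_1[::-1])   # ['a', 'b', 'c']
print(list_1)         # ['c', 'b', 'a']
```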
TAB completion is also there to make it easier to reference variables in the namespace. Insert a code cell and start typing lis and hit TAB. It should bring up a list of variables in the namespace that start with 'lis'. This is handy: it saves typing and avoids errors from typos.
Practice: Objects and TAB completion
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
- Suppose you have data gdp = '18,570.50', which is a string. Convert the variable to a float. Use TAB completion (and Google, if needed) to find a method that removes the comma.
- Sort the list below.
scores = [50, 32, 78, 99, 39, 75]
- Using TAB completion and/or the object inspector, use methods of the list type to insert new_score into scores in the correct position so that the list stays sorted.
new_score = 85