Cleaning and transforming data
Files needed = ('atussum_2017.dat', 'movies.csv')
In this notebook we will continue practicing techniques for manipulating data into forms that are amenable to analysis. We will cover:
- .replace() and .unique() for recoding variables
- .map() and .applymap() for working element-wise on Series and DataFrames
- String methods for working with strings in DataFrames
We will also learn about the American Time Use Survey.
import pandas as pd # pandas for data handling
import matplotlib.pyplot as plt # matplotlib for plots
import numpy as np # numpy for numerical methods
American Time Use Survey (ATUS)
The Bureau of Labor Statistics oversees the American Time Use Survey, which asks a sample of Americans to complete detailed diaries keeping track of each minute of their day.
Follow this link www.bls.gov/tus/data/datafiles_2017.htm to the page for the 2017 survey. Download the ATUS 2017 Activity summary file (zip) file located in the 2017 Basic ATUS Data Files section of the page. Alternatively, download it directly www.bls.gov/tus/datafiles/atussum_2017.zip.
Unzip the file. We are looking for atussum_2017.dat. It is a comma-separated file (even though it has a '.dat' extension). Let's get it loaded.
Variables
This data set has 421 variables! That's too many for us today. Let's just keep a few.
The demographic variables are all uppercase letters. The time variables are of the form 'txxyyzz' where xx is the major category code, yy is the second-tier code, and zz is the third-tier code. (docs)
Let's keep some demographic data and some data about working and sleeping.
variables = {'TEAGE':'age', 'TESEX':'sex', 'PTDTRACE':'race', 'PEEDUCA':'edu', 'GTMETSTA':'metro', 'TELFS':'employ',
'TUDIARYDAY':'day', 't050101':'work_main', 't050102':'work_other', 't010101':'sleep', 't050201':'work_soc', 't010102':'no_sleep'}
atus_small = pd.read_csv('atussum_2017.dat', usecols=variables.keys())
atus_small.rename(columns=variables, inplace=True)
atus_small.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10223 entries, 0 to 10222
Data columns (total 12 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 age 10223 non-null int64
1 sex 10223 non-null int64
2 edu 10223 non-null int64
3 race 10223 non-null int64
4 metro 10223 non-null int64
5 employ 10223 non-null int64
6 day 10223 non-null int64
7 sleep 10223 non-null int64
8 no_sleep 10223 non-null int64
9 work_main 10223 non-null int64
10 work_other 10223 non-null int64
11 work_soc 10223 non-null int64
dtypes: int64(12)
memory usage: 958.5 KB
Okay, let's start transforming the data to get it ready for analysis.
A word on loops
We have seen many places where a loop saved us time and effort. It might seem natural to use a loop to perform an operation on each element of a column or a DataFrame.
In general, however, we want to avoid this. Instead, we have used pandas vectorized operations such as
x['new_var'] = x['var_1'] / x['var_2']
to perform element-wise division. Using pandas' native operations is much (much much) faster than looping over the rows of a DataFrame. A lot of optimization has been written into the native functions that is not there when we loop ourselves. If you find yourself looping over the rows of a DataFrame, take a step back and think about other ways forward.
Fortunately, pandas provides methods that let us do very complex and very general operations to a DataFrame without resorting to a loop. We consider a few of these below.
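To see the two approaches side by side, here is a small sketch on a throwaway DataFrame (the column names match the snippet above, but the data are made up for illustration):

```python
import pandas as pd
import numpy as np

# A toy DataFrame with made-up values.
x = pd.DataFrame({'var_1': np.arange(1.0, 6.0), 'var_2': np.full(5, 2.0)})

# Slow: looping over the rows one at a time.
loop_result = [row['var_1'] / row['var_2'] for _, row in x.iterrows()]

# Fast: pandas' vectorized element-wise division.
x['new_var'] = x['var_1'] / x['var_2']

print(x['new_var'].tolist())  # [0.5, 1.0, 1.5, 2.0, 2.5] -- same as loop_result
```

Both give identical numbers; on a DataFrame with thousands of rows the vectorized version is dramatically faster.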
replace( )
The sex variable is coded 1 for male and 2 for female. I do not want to have to remember that!
The replace( ) method replaces one value with another. One syntax is
atus_small['sex'] = atus_small['sex'].replace(1, 'male')
but a more powerful one passes a dict or a list.
atus_small['sex'] = atus_small['sex'].replace({1:'male', 2:'female'})
sex_codes = {1:'male', 2:'female'}
atus_small['sex'] = atus_small['sex'].replace(sex_codes)
atus_small.head()
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 34 | female | 39 | 2 | 1 | 1 | 1 | 728 | 0 | 450 | 0 | 0 |
1 | 28 | female | 40 | 1 | 1 | 5 | 7 | 385 | 0 | 0 | 0 | 0 |
2 | 15 | male | 35 | 3 | 1 | 5 | 4 | 570 | 0 | 0 | 0 | 0 |
3 | 46 | male | 39 | 1 | 1 | 1 | 2 | 525 | 0 | 480 | 0 | 0 |
4 | 85 | male | 44 | 1 | 1 | 1 | 7 | 756 | 0 | 0 | 0 | 0 |
unique( )
Let's code race, too. What codes are in our data? The method unique( )
returns the unique values.
races = atus_small['race'].unique()
races.sort()
print(races)
[ 1 2 3 4 5 6 7 8 9 10 11 13 16 18]
# I looked up the codes in the documentation...
race_codes = {1:'white', 2:'black', 3:'native_am', 4:'asian', 5:'hawaiian', 6:'wh_bl',
7:'wh_na', 8:'wh_as', 9:'wh_ha', 10:'bl_na', 11:'bl_as', 13:'na_as', 16:'wh_bl_as', 18:'wh_as_ha'}
atus_small['race'] = atus_small['race'].replace(race_codes)
atus_small.head()
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 34 | female | 39 | black | 1 | 1 | 1 | 728 | 0 | 450 | 0 | 0 |
1 | 28 | female | 40 | white | 1 | 5 | 7 | 385 | 0 | 0 | 0 | 0 |
2 | 15 | male | 35 | native_am | 1 | 5 | 4 | 570 | 0 | 0 | 0 | 0 |
3 | 46 | male | 39 | white | 1 | 1 | 2 | 525 | 0 | 480 | 0 | 0 |
4 | 85 | male | 44 | white | 1 | 1 | 7 | 756 | 0 | 0 | 0 | 0 |
Apply a function to a Series [a single column]: map( )
We can apply functions to the individual elements in a column, too. The map() method handles this for us. It applies the given function to each element of the column. These can be built-in or user-defined functions. This is quite powerful: we can write a function that performs a complicated transformation and apply it to each element of a column in one simple line.
Let's define a function that converts minutes to hours.
def minutes_to_hours(x):
return x/60
Now apply map( )
to the work column.
[This example is a bit contrived. We could have just as easily done atus_small['work_main']/60
. But that wouldn't have given us practice with map( )
.]
atus_small['work_main'] = atus_small['work_main'].map(minutes_to_hours)
atus_small.head()
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 34 | female | 39 | black | 1 | 1 | 1 | 728 | 0 | 7.5 | 0 | 0 |
1 | 28 | female | 40 | white | 1 | 5 | 7 | 385 | 0 | 0.0 | 0 | 0 |
2 | 15 | male | 35 | native_am | 1 | 5 | 4 | 570 | 0 | 0.0 | 0 | 0 |
3 | 46 | male | 39 | white | 1 | 1 | 2 | 525 | 0 | 8.0 | 0 | 0 |
4 | 85 | male | 44 | white | 1 | 1 | 7 | 756 | 0 | 0.0 | 0 | 0 |
Apply a function to a DataFrame: applymap( )
If we want to apply the same function to each element in several columns of a DataFrame (rather than a Series) we use applymap( )
. It works the same way, applying the function to each element.
[I am not sure why the developers of pandas created separate methods for a Series vs a DataFrame. My best guess is that each method takes advantage of Series- or Dataframe-specific functionality for better performance.]
# We can map to several columns at once.
atus_small[['work_other', 'sleep', 'no_sleep']] = atus_small[['work_other', 'sleep', 'no_sleep']].applymap(minutes_to_hours)
atus_small.head()
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 34 | female | 39 | black | 1 | 1 | 1 | 12.133333 | 0.0 | 7.5 | 0.0 | 0 |
1 | 28 | female | 40 | white | 1 | 5 | 7 | 6.416667 | 0.0 | 0.0 | 0.0 | 0 |
2 | 15 | male | 35 | native_am | 1 | 5 | 4 | 9.500000 | 0.0 | 0.0 | 0.0 | 0 |
3 | 46 | male | 39 | white | 1 | 1 | 2 | 8.750000 | 0.0 | 8.0 | 0.0 | 0 |
4 | 85 | male | 44 | white | 1 | 1 | 7 | 12.600000 | 0.0 | 0.0 | 0.0 | 0 |
.map()
and .applymap()
are very powerful methods. They can do a lot more than we have covered and they can be used to do very complex manipulations. We won't need those advanced capabilities, but they are out there if you someday do.
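For a taste of that extra flexibility: .map() also accepts a lambda or a dict directly, so a named function is not always needed. A quick sketch on invented values (note that, unlike .replace(), a dict passed to .map() turns any unmatched value into NaN):

```python
import pandas as pd

# Invented minute values, just for illustration.
minutes = pd.Series([60, 90, 30])

# map() accepts a lambda as well as a named function...
hours = minutes.map(lambda m: m / 60)
print(hours.tolist())  # [1.0, 1.5, 0.5]

# ...and a dict. Unlike replace(), values with no matching key become NaN.
sexes = pd.Series([1, 2, 1]).map({1: 'male', 2: 'female'})
print(sexes.tolist())  # ['male', 'female', 'male']
```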
Practice
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck.
- edu in 'atus_small' holds the highest level of education obtained. It can take values between 31 and 46. Convert the numeric values to labels.
- 39 = 'high school'
- 40 = 'some college'
- 41 & 42 are 'associate'
- 43 = 'bachelor'
- 44 = 'master'
- 45 = 'prof'
- 46 = 'phd'
educ_codes = {39:'high school', 40:'some college', 41:'associate', 42:'associate',
43: 'bachelor', 44: 'master', 45:'prof', 46:'phd'}
atus_small['edu'] = atus_small['edu'].replace(educ_codes)
atus_small.sample(10)
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
5348 | 77 | female | some college | white | 1 | 5 | 6 | 10.500000 | 0.0 | 0.000000 | 0.0 | 0 |
4819 | 52 | female | some college | white | 1 | 4 | 5 | 9.500000 | 1.0 | 0.000000 | 0.0 | 0 |
2639 | 48 | female | bachelor | white | 1 | 1 | 5 | 7.916667 | 0.0 | 0.000000 | 0.0 | 0 |
175 | 65 | female | master | white | 1 | 5 | 2 | 12.866667 | 0.0 | 0.000000 | 0.0 | 0 |
9725 | 66 | female | bachelor | white | 1 | 2 | 5 | 5.316667 | 0.0 | 0.000000 | 0.0 | 0 |
2365 | 33 | female | master | white | 1 | 1 | 4 | 8.500000 | 0.0 | 13.533333 | 0.0 | 0 |
1773 | 61 | female | bachelor | black | 1 | 5 | 5 | 6.500000 | 0.0 | 0.000000 | 0.0 | 0 |
3595 | 49 | female | bachelor | white | 1 | 1 | 3 | 8.833333 | 0.0 | 8.583333 | 0.0 | 0 |
3085 | 85 | female | some college | white | 2 | 5 | 7 | 12.583333 | 0.0 | 0.000000 | 0.0 | 0 |
3647 | 61 | female | associate | white | 1 | 1 | 6 | 8.833333 | 0.0 | 9.000000 | 0.0 | 0 |
# Another way to do it is to write a function like this
def recode(x):
educ_codes = {39:'high school', 40:'some college', 41:'associate', 42:'associate',
43: 'bachelor', 44: 'master', 45:'prof', 46:'phd'}
return educ_codes[x]
# and use .map()
# atus_small['edu'] = atus_small['edu'].map(recode)
- Print out a list of the unique values in the column 'edu'.
atus_small['edu'].unique()
array(['high school', 'some college', 35, 'master', 'bachelor', 36,
'prof', 'associate', 'phd', 37, 34, 33, 38, 31, 32], dtype=object)
- Apply the minutes_to_hours function to the work_soc variable. This variable is the time spent 'Socializing, relaxing, and leisure as part of job'. I have no idea what kind of job makes you relax and take leisure.
atus_small['work_soc'] = atus_small['work_soc'].map(minutes_to_hours)
atus_small.head()
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 34 | female | high school | black | 1 | 1 | 1 | 12.133333 | 0.0 | 7.5 | 0.0 | 0.0 |
1 | 28 | female | some college | white | 1 | 5 | 7 | 6.416667 | 0.0 | 0.0 | 0.0 | 0.0 |
2 | 15 | male | 35 | native_am | 1 | 5 | 4 | 9.500000 | 0.0 | 0.0 | 0.0 | 0.0 |
3 | 46 | male | high school | white | 1 | 1 | 2 | 8.750000 | 0.0 | 8.0 | 0.0 | 0.0 |
4 | 85 | male | master | white | 1 | 1 | 7 | 12.600000 | 0.0 | 0.0 | 0.0 | 0.0 |
- Create a column named 'work' that is the sum of 'work_main' and 'work_other'.
atus_small['work'] = atus_small['work_main'] + atus_small['work_other']
atus_small.sample(10)
age | sex | edu | race | metro | employ | day | sleep | no_sleep | work_main | work_other | work_soc | work | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9025 | 60 | male | 35 | black | 1 | 5 | 6 | 4.500000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
7969 | 32 | male | bachelor | asian | 1 | 1 | 7 | 10.500000 | 0.75 | 0.0 | 0.0 | 0.0 | 0.0 |
8267 | 53 | male | associate | white | 1 | 1 | 7 | 9.416667 | 0.00 | 1.0 | 0.0 | 0.0 | 1.0 |
8706 | 49 | male | high school | white | 1 | 1 | 1 | 10.000000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
3293 | 29 | male | some college | white | 1 | 5 | 5 | 9.000000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
9549 | 28 | male | some college | white | 1 | 1 | 1 | 8.500000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
3738 | 31 | female | high school | black | 1 | 4 | 7 | 8.500000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
8301 | 48 | female | bachelor | white | 1 | 1 | 7 | 5.166667 | 0.00 | 0.5 | 0.0 | 0.0 | 0.5 |
5719 | 19 | female | high school | white | 1 | 1 | 7 | 8.750000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
4924 | 41 | female | associate | white | 2 | 1 | 7 | 7.250000 | 0.00 | 0.0 | 0.0 | 0.0 | 0.0 |
- Create a histogram of 'work' for only those observations with work>0.
fig, ax = plt.subplots(figsize=(10,6))
ax.hist(atus_small.loc[ atus_small['work']>0, 'work'], bins=40, alpha = 0.4, color = 'red')
ax.set_xlabel('hours of work')
ax.set_title('Distribution of hours worked, conditional on working at least one minute')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
Extra practice
If you finish early, try rewriting your solution to part 1, but code values less than 39 as "less than high school".
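One possible sketch, reusing the .map() idea from above. It assumes you apply it to a freshly loaded 'edu' column that still holds the raw integer codes (the codes 39 and up come from the dict we used earlier):

```python
def recode_edu(x):
    """Recode ATUS education codes; anything below 39 is 'less than high school'."""
    educ_codes = {39: 'high school', 40: 'some college', 41: 'associate', 42: 'associate',
                  43: 'bachelor', 44: 'master', 45: 'prof', 46: 'phd'}
    if x < 39:
        return 'less than high school'
    return educ_codes[x]

# With the full data you would run atus_small['edu'].map(recode_edu);
# here we just spot-check a few codes.
print(recode_edu(31), recode_edu(39), recode_edu(46))
```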
String methods
We have seen some of these before, when we learned about strings. These are analogous to the string methods in standard Python, but they have been optimized for DataFrames. These vectorized string methods work element-wise over an entire column. The method call looks like
data['var'].str.method()
where .str.method( )
is the method we are applying. A list of vectorized string methods is on page 218 in McKinney and in the documentation. Below, we try a few out.
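To get a feel for the broadcasting behavior before working with the real data, here is a tiny made-up Series:

```python
import pandas as pd

# A small invented Series, just to see str methods broadcast over a column.
names = pd.Series(['  Ada Lovelace ', 'Grace Hopper'])

print(names.str.strip().tolist())            # ['Ada Lovelace', 'Grace Hopper']
print(names.str.contains('Grace').tolist())  # [False, True]
```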
MovieLens data set
We are going to work with the MovieLens ml-latest-small dataset, released by the GroupLens organization. It is meant to help build recommendation algorithms, like the ones you see in Netflix or Spotify. [In 2006, Netflix started a contest, with a $1 million prize, for an algorithm that could beat their own.] They have other ratings datasets, too, on music, jokes, and books.
An observation is a movie.
movies = pd.read_csv('movies.csv')
movies.sample(10)
movieId | title | genres | |
---|---|---|---|
7317 | 77233 | Union: The Business Behind Getting High, The (... | Comedy|Documentary |
9213 | 151763 | Death Note Rewrite: Genshisuru Kami (2007) | Mystery |
8661 | 121097 | To Grandmother's House We Go (1992) | Adventure|Children|Comedy |
1928 | 2557 | I Stand Alone (Seul contre tous) (1998) | Drama|Thriller |
4916 | 7371 | Dogville (2003) | Drama|Mystery|Thriller |
233 | 271 | Losing Isaiah (1995) | Drama |
285 | 327 | Tank Girl (1995) | Action|Comedy|Sci-Fi |
6945 | 65350 | General Died at Dawn, The (1936) | Adventure|Crime|Thriller |
7797 | 92094 | Einstein and Eddington (2008) | Drama |
1877 | 2495 | Fantastic Planet, The (Planète sauvage, La) (1... | Animation|Sci-Fi |
str.contains( )
The genres are mixed together. Let's get all the comedies. The .contains( )
method returns a bool Series with True for observations in which the string contains the search term.
movies['genres'].str.contains('Comedy')
0 True
1 False
2 True
3 True
4 True
...
9737 True
9738 True
9739 False
9740 False
9741 True
Name: genres, Length: 9742, dtype: bool
print(movies.shape)
comedies = movies[movies['genres'].str.contains('Comedy')]
print(comedies.shape)
(9742, 3)
(3756, 3)
str.split( )
This method splits the string up at the delimiter that is passed to .split( )
. It returns a list of each chunk that falls between the delimiter.
This could be useful for processing name data that come in the form last,first or city,state.
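For instance, a sketch of the name case with made-up data, using the expand option so the pieces land in separate columns rather than in a column of lists:

```python
import pandas as pd

# Made-up 'last,first' names for illustration.
people = pd.DataFrame({'name': ['Lovelace,Ada', 'Hopper,Grace']})

# expand=True returns a DataFrame with one column per piece.
people[['last', 'first']] = people['name'].str.split(',', expand=True)
print(people)
```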
# The movie genres are separated with the '|' character.
# Remember, DataFrames can have columns of lists...
movies['genre_split'] = movies['genres'].str.split('|')
movies.head()
movieId | title | genres | genre_split | |
---|---|---|---|---|
0 | 1 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy | [Adventure, Animation, Children, Comedy, Fantasy] |
1 | 2 | Jumanji (1995) | Adventure|Children|Fantasy | [Adventure, Children, Fantasy] |
2 | 3 | Grumpier Old Men (1995) | Comedy|Romance | [Comedy, Romance] |
3 | 4 | Waiting to Exhale (1995) | Comedy|Drama|Romance | [Comedy, Drama, Romance] |
4 | 5 | Father of the Bride Part II (1995) | Comedy | [Comedy] |
movies.loc[0,'genre_split']
['Adventure', 'Animation', 'Children', 'Comedy', 'Fantasy']
str.join( )
Put strings together. Separate the pieces with a delimiter of your choosing.
movies['with_colons'] = movies['genre_split'].str.join('::')
movies.sample(5)
movieId | title | genres | genre_split | with_colons | |
---|---|---|---|---|---|
8720 | 126420 | American Heist (2015) | Action | [Action] | Action |
7721 | 90430 | Carnage (2011) | Comedy|Drama | [Comedy, Drama] | Comedy::Drama |
9710 | 187595 | Solo: A Star Wars Story (2018) | Action|Adventure|Children|Sci-Fi | [Action, Adventure, Children, Sci-Fi] | Action::Adventure::Children::Sci-Fi |
3686 | 5076 | Adventures of Huck Finn, The (1993) | Adventure|Children|Comedy|Drama | [Adventure, Children, Comedy, Drama] | Adventure::Children::Comedy::Drama |
8412 | 110669 | Honest Liar, An (2014) | Comedy|Documentary | [Comedy, Documentary] | Comedy::Documentary |
Practice
Take a few minutes and try the following. Feel free to chat with those around you if you get stuck. I am here, too.
Our data does not have a column for the year the movie was released. Let's create one. The year the movie was released is in the title string.
- Reload 'movies.csv'
- Use .str.strip() (docs) to remove any leading or trailing spaces from 'title'.
- Extract the four-digit year from the titles and put it into a new column named 'year'. Notice that the year, including the parentheses, is always the last 6 characters of the title. You might try str.slice() and work with negative indexes to count from the end of 'title'. If there is any extra space at the end of a title, it will mess up my algorithm! That's why we strip the extra spaces first.
# The negative notation counts backwards. I am relying on the fact that each title ends in '(year)'.
# Not the most robust method.
movies = pd.read_csv('movies.csv')
movies['title'] = movies['title'].str.strip()
movies['year'] = movies['title'].str.slice(-5, -1)
movies.head(2)
movieId | title | genres | year | |
---|---|---|---|---|
0 | 1 | Toy Story (1995) | Adventure|Animation|Children|Comedy|Fantasy | 1995 |
1 | 2 | Jumanji (1995) | Adventure|Children|Fantasy | 1995 |
- There are 12 movies that do not have a year in their title. Find them in your DataFrame. You might try the str.isdigit() (docs) method to see if the year you extracted in step 2 is numeric.
movies.loc[~movies['year'].str.isdigit()]
movieId | title | genres | year | |
---|---|---|---|---|
6059 | 40697 | Babylon 5 | Sci-Fi | lon |
9031 | 140956 | Ready Player One | Action|Sci-Fi|Thriller | r On |
9091 | 143410 | Hyena Road | (no genres listed) | Roa |
9138 | 147250 | The Adventures of Sherlock Holmes and Doctor W... | (no genres listed) | atso |
9179 | 149334 | Nocturnal Animals | Drama|Thriller | imal |
9259 | 156605 | Paterson | (no genres listed) | erso |
9367 | 162414 | Moonlight | Drama | ligh |
9448 | 167570 | The OA | (no genres listed) | he O |
9514 | 171495 | Cosmos | (no genres listed) | osmo |
9515 | 171631 | Maria Bamford: Old Baby | (no genres listed) | Bab |
9525 | 171891 | Generation Iron 2 | (no genres listed) | ron |
9611 | 176601 | Black Mirror | (no genres listed) | irro |
- For the movies without a year, set the 'year' variable to np.nan. Use .unique() to check that all the 'year' values are either numbers (as strings) or nan.

np.nan is the "not a number" object. Pandas will interpret it as a missing value.
movies.loc[~movies['year'].str.isdigit(), 'year'] = np.nan
movies['year'].unique()
array(['1995', '1994', '1996', '1976', '1992', '1967', '1993', '1964',
'1977', '1965', '1982', '1990', '1991', '1989', '1937', '1940',
'1969', '1981', '1973', '1970', '1955', '1959', '1968', '1988',
'1997', '1972', '1943', '1952', '1951', '1957', '1961', '1958',
'1954', '1934', '1944', '1960', '1963', '1942', '1941', '1953',
'1939', '1950', '1946', '1945', '1938', '1947', '1935', '1936',
'1956', '1949', '1932', '1975', '1974', '1971', '1979', '1987',
'1986', '1980', '1978', '1985', '1966', '1962', '1983', '1984',
'1948', '1933', '1931', '1922', '1998', '1929', '1930', '1927',
'1928', '1999', '2000', '1926', '1919', '1921', '1925', '1923',
'2001', '2002', '2003', '1920', '1915', '1924', '2004', '1916',
'1917', '2005', '2006', '1902', nan, '1903', '2007', '2008',
'2009', '2010', '2011', '2012', '2013', '2014', '2015', '2016',
'2017', '2018', '1908'], dtype=object)
- Convert the year variable to datetime. Check your dtypes to verify.
movies['year'] = pd.to_datetime(movies['year'])
movies.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9742 entries, 0 to 9741
Data columns (total 4 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 movieId 9742 non-null int64
1 title 9742 non-null object
2 genres 9742 non-null object
3 year 9730 non-null datetime64[ns]
dtypes: datetime64[ns](1), int64(1), object(2)
memory usage: 304.6+ KB
Extra practice (try at home)
If you finished early, compute the number of movies in the DataFrame for each year. You might try .groupby()
.
- Which year has the most observations?
movie_counts = movies[['year', 'movieId']].groupby('year').count()
most = movie_counts['movieId'].sort_values().tail(1).index.date[0].year
f'The year {most} has the most movies in the dataset.'
'The year 2002 has the most movies in the dataset.'
Here is something I have not figured out yet...
- How many movies are there in each decade? i.e., 1900–1909, 1910–1919, 1920–1929...
# I tried this, but I want the first bin to be 1900-01-01...
movie_counts.resample('10AS').sum()
# I know that I can use pd.cut(), but I think there is some way to get this to work by just resampling.
# If you figure it out, let me know!
movieId | |
---|---|
year | |
1902-01-01 | 3 |
1912-01-01 | 10 |
1922-01-01 | 53 |
1932-01-01 | 160 |
1942-01-01 | 197 |
1952-01-01 | 307 |
1962-01-01 | 410 |
1972-01-01 | 601 |
1982-01-01 | 1290 |
1992-01-01 | 2495 |
2002-01-01 | 2774 |
2012-01-01 | 1430 |
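Until the resample trick turns up, one workaround is to bin on the integer year rather than the datetime: floor-divide by 10 and multiply back, then group on the result. A sketch with invented years (with the real data you would start from movies['year'].dt.year):

```python
import pandas as pd

# Invented years, just to show the decade binning.
years = pd.Series([1902, 1908, 1915, 1995, 1999, 2002])

decade = (years // 10) * 10        # 1902 -> 1900, 1915 -> 1910, ...
counts = years.groupby(decade).size()
print(counts)
```

This guarantees the first bin starts at 1900 (or whatever decade the earliest year falls in), which is exactly what the resample version refuses to do.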