STA312
Python Introduction
Craig Burkett, Dan Zingaro
January 6, 2015
Python History
- Late 1970s: programming language called ABC
  - High-level, intended for teaching
  - Only five data types
  - Programs are supposedly one-quarter the size of the equivalent BASIC or Pascal program
  - Not a successful project
  - More ABC information: http://homepages.cwi.nl/~steven/abc/
Python History...
- 1983: Guido van Rossum joined the ABC team
- Late 1980s: Guido started working on a new project, in which a scripting language would be helpful
- Based Python on ABC, removed warts (e.g. ABC wasn’t extensible)
- Python, after Monty Python
- Guido: Benevolent Dictator for Life (BDFL)... but he’s retiring!
- http://www.artima.com/intv/ (search for Guido)
Why Python for Big Data?
- Readable, uniform code structure
- No compilation step; Python is interpreted
- Supports object-oriented programming (OOP) features
- Batteries included: Python’s standard library comes with tools for a variety of problem domains
- Additional modules are available for download: data mining, language processing, ...
Dynamic Typing
- Biggest conceptual change compared to C, Java, etc.
- Variables do not have types; objects have types
    >>> a = 5
    >>> type(a)
    <type 'int'>
    >>> a = 'hello'
    >>> type(a)
    <type 'str'>
    >>> a = [4, 1, 6]
    >>> type(a)
    <type 'list'>
  (Output shown is from Python 2; Python 3 prints <class 'int'>, <class 'str'>, and <class 'list'>.)
Built-in Types
- We’ll look at the five core object types that are built in to Python
  - Numbers
  - Strings
  - Lists
  - Dictionaries
  - Files
- They’re extremely powerful and save us from writing tons of low-level code
Built-in Types: Numbers
- Create numbers by using numeric literals
- If you include no fractional component, it’s an integer; otherwise it’s a float
- We have all of the standard mathematical operators, and even ** for exponent
- Make integers as small or large as you like; they can’t go out of bounds
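
The points above can be sketched in a few lines of Python 3. The variable names are illustrative and do not come from the slides.

```python
# Literals without a fractional part are ints; with one, they are floats.
whole = 42
fractional = 42.0
print(type(whole), type(fractional))

# Standard arithmetic operators, plus ** for exponentiation.
total = 7 + 3 * 2      # 13
power = 2 ** 10        # 1024

# Integers grow as needed instead of overflowing.
huge = 2 ** 100
print(total, power, huge)
```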
Built-in Types: Strings
- A string is a sequence of characters
- To indicate that something is a string, we place single- or double-quotes around it
- We can use + to concatenate strings
- This is an example of overloading: + is used to add numbers too; it knows what to do based on context
- What happens if we try to use + with a string and a number?
  - Error: + doesn’t know what to do!
  - e.g. is '3' + 4 supposed to be the string '34' or the number 7?
  - Design philosophy: Python tries never to guess at what you mean
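
A small sketch of this behaviour, assuming Python 3. The explicit conversions with str and int are one common way to say which meaning is intended; they are an addition here, not something the slides prescribe.

```python
# + concatenates two strings and adds two numbers.
greeting = 'hello' + ' ' + "world"
total = 3 + 4

# Mixing a string and a number raises TypeError instead of guessing.
try:
    result = '3' + 4
except TypeError:
    result = None

# Convert explicitly to choose the meaning you want.
as_text = '3' + str(4)      # '34'
as_number = int('3') + 4    # 7
print(greeting, total, result, as_text, as_number)
```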
Strings...
- The * operator is overloaded, too
  - Applied to a string and an integer i, it duplicates the string i times
  - If i ≤ 0, the result is the empty string
- Can also use relational operators such as < or > to compare strings alphabetically (comparison uses character codes, so uppercase letters sort before lowercase ones)
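
As a brief illustration (the variable names are made up for this sketch):

```python
# Repetition with *.
repeated = 'ab' * 3        # 'ababab'
empty = 'ab' * 0           # ''
also_empty = 'ab' * -2     # ''

# Comparison goes character by character, using character codes.
apple_first = 'apple' < 'banana'     # True
upper_first = 'Zebra' < 'apple'      # True: 'Z' sorts before 'a'
print(repeated, apple_first, upper_first)
```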
Looping Through Strings
    for char in s:
        <do something with char>
- We’ll see this pattern again and again for each Python type
- It’s like PHP’s foreach or Java’s for-with-the-colon
- Let’s write a function that counts the number of vowels in a string
- A function is a named piece of code that carries out some task
Possible Solution: How Many Vowels? (num_vowels.py)
    def num_vowels(s):
        '''Return the number of vowels in string s.
        The letter "y" is not treated as a vowel.'''
        count = 0
        for char in s:
            if char in "aAeEiIoOuU":
                count += 1
        return count
String Methods
- Strings are objects and have tons of methods
- Use dot-notation to access methods
- Use dir(str) to get a list of methods, and help(str.methodname) for help on any method
- Useful ones: find, lower, count, replace...
- Strings are immutable (cannot be modified): all we can do is create new strings
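
A short sketch of the methods named above, using a made-up example string. Each call returns a new value and leaves the original string untouched.

```python
s = 'Hello, World'

position = s.find('World')              # 7 (index where the match starts)
missing = s.find('xyz')                 # -1 when there is no match
lowered = s.lower()                     # 'hello, world'
o_count = s.count('o')                  # 2
swapped = s.replace('World', 'Python')  # 'Hello, Python'

# Strings are immutable: assigning to a character raises TypeError.
try:
    s[0] = 'J'
    modified = True
except TypeError:
    modified = False

print(s, position, missing, lowered, o_count, swapped, modified)
```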
Indexing and Slicing Strings
- Assume s is a string
- Then, s[i] for i ≥ 0 extracts character i from the left (0 is the leftmost character)
- We can also use a negative index i to extract a character beginning from the right (-1 is the rightmost character)
- Slice notation: s[i:j] extracts characters beginning at s[i] and ending at the character one to the left of s[j]
  - If we leave out the first index, Python defaults to using index 0 to begin the slice
  - Similarly, if we leave out the second index, Python defaults to using index len(s) to end the slice
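
The indexing and slicing rules can be checked on a small example string (chosen here for illustration):

```python
s = 'python'

first = s[0]       # 'p'  (leftmost character)
last = s[-1]       # 'n'  (rightmost character)
middle = s[1:4]    # 'yth' (indices 1, 2, 3; index 4 is excluded)
head = s[:2]       # 'py'  (start defaults to 0)
tail = s[2:]       # 'thon' (end defaults to len(s))
print(first, last, middle, head, tail)
```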
Built-in Types: Lists
Lists are like arrays in other languages, but much more powerful.

                         Strings       Lists
Sequences of?            Characters    Any object types
Immutable?               Yes           No
Can be heterogeneous?    No            Yes
Can index and slice?     Yes           Yes
Can use for-loop?        Yes           Yes
Created like?            'hi'          [4, 1, 6]
List Methods
- As with strings, there are lots of methods; use dir(list) or help(list.method) for help
- append is used to add an object to the end of a list
- extend is used to append the objects of another list
- insert(index, object) inserts object before index
- sort() sorts a list in place
- remove(value) removes the first occurrence of value from the list
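
A step-by-step sketch of these methods on one list; the comments show the list after each call. Unlike string methods, these modify the list itself.

```python
nums = [4, 1, 6]

nums.append(3)          # [4, 1, 6, 3]
nums.extend([1, 9])     # [4, 1, 6, 3, 1, 9]
nums.insert(0, 7)       # [7, 4, 1, 6, 3, 1, 9]
nums.remove(1)          # [7, 4, 6, 3, 1, 9]  (only the first 1 is removed)
nums.sort()             # [1, 3, 4, 6, 7, 9]
print(nums)
```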
Exercise: Length of Strings
- Write a function that takes a list of strings, and prints out the length of each string in the list
- e.g. if the list is ['abc', 'q', ''], the output would be as follows:
    3
    1
    0
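
One possible solution, as a sketch. The function name print_lengths is hypothetical; the slides do not name it.

```python
def print_lengths(strings):
    '''Print the length of each string in the list strings, one per line.'''
    for s in strings:
        print(len(s))


print_lengths(['abc', 'q', ''])
```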
Built-in Types: Dictionaries
Dictionaries are like associative arrays or maps in other languages.
                         Lists                   Dictionaries
Stores?                  Sequences of objects    Key-value pairs
Immutable?               No                      No
Can be heterogeneous?    Yes                     Yes
Can index and slice?     Yes                     No
Can use for-loop?        Yes                     Yes
Created like?            [4, 1]                  {'a': 1, 'b': 2}
Dictionaries vs. Lists
- Compared to using “parallel lists”, dictionaries make an explicit connection between a key and a value
- But unlike lists, dictionaries are not ordered by position; before Python 3.7 they did not guarantee any ordering of the elements
- If you use for k in d, for a dictionary d, you get the keys back in arbitrary order in older versions of Python (since Python 3.7, in insertion order)
    bird_dict = {
        'peregrine falcon': 1, 'harrier falcon': 5,
        'red-tailed hawk': 2, 'osprey': 11}
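
To make the comparison with parallel lists concrete, here is a sketch that stores the same data both ways. The list names names and numbers are hypothetical; the slides do not say what the numbers represent.

```python
# Parallel lists: a name and its number are linked only by position.
names = ['peregrine falcon', 'harrier falcon', 'red-tailed hawk', 'osprey']
numbers = [1, 5, 2, 11]
osprey_from_lists = numbers[names.index('osprey')]

# Dictionary: each key is tied directly to its value.
bird_dict = {
    'peregrine falcon': 1, 'harrier falcon': 5,
    'red-tailed hawk': 2, 'osprey': 11}
osprey_from_dict = bird_dict['osprey']

# Looping over a dictionary yields its keys; sorted() gives a
# predictable order regardless of Python version.
for bird in sorted(bird_dict):
    print(bird, bird_dict[bird])
```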
Adding to Dictionaries
- Dictionary keys must be of immutable types (no lists!), but values can be anything
- We can use d[k] = v to add key k with value v to dictionary d
- We can use the update method to dump another dictionary’s key-value pairs into our dictionary
- We can use d[k] to obtain the value associated with key k of dictionary d
  - If k does not exist, we get an error (a KeyError)
  - The get method is similar, except it returns None instead of giving an error when the key does not exist
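
These operations can be sketched on a small dictionary. The keys and values here are made up for illustration.

```python
d = {'a': 1, 'b': 2}

d['c'] = 3                      # add a new key
d.update({'b': 20, 'd': 4})     # existing key 'b' is overwritten

value = d['a']                  # 1

# Indexing with a missing key raises KeyError.
missing_raised = False
try:
    d['zzz']
except KeyError:
    missing_raised = True

maybe = d.get('zzz')            # None

# Lists cannot be keys because they are mutable; tuples can.
list_key_rejected = False
try:
    d[[1, 2]] = 'x'
except TypeError:
    list_key_rejected = True
d[(1, 2)] = 'tuple key'

print(d, value, maybe)
```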
Built-in Types: Files
- We’ll use files whenever we read external data (websites, spreadsheets, etc.)
- To open a file in Python, we use the open function
- Syntax: open(filename, mode)
- mode is the string 'r' to open the file for reading, 'w' to open the file for writing, or 'a' to open the file for appending. If mode is omitted, it defaults to 'r'
- open gives us a file object that we can use to read or write the file
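
A minimal sketch of the three modes. It writes to a file in a fresh temporary directory so that nothing existing is overwritten; the file name example.txt is arbitrary, and the close() calls are an addition the slides do not mention.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'example.txt')

f = open(path, 'w')         # 'w' creates (or truncates) the file
f.write('first line\n')
f.close()

f = open(path, 'a')         # 'a' adds to the end
f.write('second line\n')
f.close()

f = open(path)              # no mode given: same as open(path, 'r')
first = f.readline()
second = f.readline()
f.close()

print(first.strip(), second.strip())
```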
Reading Files with Methods
To read the next line from a file:
- readline: reads and returns the next line; returns the empty string at end-of-file
There are other methods, but try not to use these because they read the entire file into memory:
- read: reads the entire file into one string
- readlines: reads the entire file into a list of strings
All of these leave a trailing '\n' character at the end of each line.
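
The difference between the three methods can be seen on a tiny two-line file. This sketch creates its own temporary file; the name lines.txt is arbitrary.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'lines.txt')
f = open(path, 'w')
f.write('a\nb\n')
f.close()

# readline: one line per call, '' at end-of-file.
f = open(path)
line1 = f.readline()      # 'a\n'
line2 = f.readline()      # 'b\n'
at_end = f.readline()     # ''
f.close()

# read: the whole file as one string.
f = open(path)
whole = f.read()          # 'a\nb\n'
f.close()

# readlines: the whole file as a list of lines.
f = open(path)
lines = f.readlines()     # ['a\n', 'b\n']
f.close()

# strip() removes the trailing newline.
clean = line1.strip()     # 'a'
print(repr(line1), repr(at_end), repr(whole), lines, clean)
```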
Reading Files with Loops
A file is a sequence of lines:
    f = open('songs.txt')
    for line in f:
        print(line.strip())
... or using a while-loop:
    f = open('songs.txt')
    line = f.readline()
    while line:
        print(line.strip())
        line = f.readline()
Skipping Headers
Suppose we have a file of this format:
    header
    # comment text
    # comment text
    # ...
    ... actual data
    ...
Let’s write a function that skips the header of such a file and returns the first line of actual data.
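
One way such a function might look, as a sketch. It assumes the header is exactly one line and that every comment line starts with '#'; the name skip_header is hypothetical. An in-memory io.StringIO object stands in for a real file here.

```python
import io


def skip_header(f):
    '''Skip the header line and any '#' comment lines in open file f.
    Return the first line of actual data ('' if there is none).'''
    f.readline()                     # discard the header line
    line = f.readline()
    while line.startswith('#'):      # skip comment lines
        line = f.readline()
    return line


sample = io.StringIO(
    'header\n# comment text\n# comment text\nfirst data\nsecond data\n')
first = skip_header(sample)          # 'first data\n'
rest = sample.readline()             # 'second data\n'
print(first.strip(), rest.strip())
```

Because the function leaves the file positioned just after the first data line, the caller can keep reading the remaining data with a loop.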
Multi-Field Records
- So far, we have been reading entire lines from our file
- But, our lines are actually records containing three fields: game name, song name, and rating
- Let’s write a function to read this data into three lists
- The critical string method here is split
  - With no parameters, it splits around any run of whitespace
  - With a string parameter, it splits around that string
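
A sketch of such a function follows. The slides do not show the file's separator or contents, so this assumes comma-separated fields and an integer rating; the function name read_records and the sample records are hypothetical.

```python
import io


def read_records(f, sep=','):
    '''Read records of the form game<sep>song<sep>rating from open file f.
    Return three parallel lists: games, songs, ratings.'''
    games = []
    songs = []
    ratings = []
    for line in f:
        fields = line.strip().split(sep)
        games.append(fields[0])
        songs.append(fields[1])
        ratings.append(int(fields[2]))   # assumes the rating is an integer
    return games, songs, ratings


# Hypothetical sample data, standing in for the real file.
sample = io.StringIO('Game A,Song One,4\nGame B,Song Two,5\n')
games, songs, ratings = read_records(sample)
print(games, songs, ratings)

# split with and without a parameter:
by_whitespace = 'a  b\tc'.split()      # ['a', 'b', 'c']
by_comma = 'a,b,,c'.split(',')         # ['a', 'b', '', 'c']
```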
Further Python Resources
- http://www.rmi.net/~lutz
  - Mark Lutz’ Python books
  - Constantly-updated to keep up with Python releases
- http://docs.python.org/tutorial
  - Free online Python tutorial
- https://mcs.utm.utoronto.ca/~108
  - Dan’s intro CS course in Python