Lecture 4: More Python Datatypes

Eric Brauer

Introduction

This week we are introducing more iterable datatypes. Recall that iterables contain multiple elements that can be looped through with a for loop.

We will also discuss more ways of interacting with strings, and introduce many new methods for strings.

method
a special function that ‘belongs’ to a particular datatype object.

DATATYPE: List

  • Lists hold many elements, which can be a mix of datatypes
  • Can be changed with insert(), remove(), append() or pop() (pop() removes the last item and also returns it)
  • Use an index (starting at 0) to pick elements
  • The first datatype we’ve used that is mutable
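
The list methods named above can be tried out in a short script. This is only a sketch; the list name animals and its contents are made up for illustration.

```python
# Hypothetical example list; the names are only for illustration.
animals = ['dog', 'cat']

animals.append('bat')       # add an element to the end
animals.insert(0, 'emu')    # insert at index 0, shifting the rest right
animals.remove('cat')       # remove the first element equal to 'cat'
last = animals.pop()        # remove the last element and return it

print(animals)      # ['emu', 'dog']
print(last)         # bat
print(animals[0])   # emu (index numbers start at 0)
```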

Object Mutability

Most datatypes we’ve worked with are immutable. That means they cannot be changed (“mutated”) after creation.

The proof:

>>>> string_obj = 'hi'  # creates a string
>>>> print(string_obj.upper())  # returns the string but UPPER CASE
HI
>>>> print(string_obj)  # the original version is unchanged
hi
>>>> x = 3  # integers are also immutable
>>>> x + 1
4
>>>> x  # the original value of x unchanged
3
>>>> x = x + 1  # this is why we have to write x = x + 1 to update x:
# we are creating a new integer and giving it the name x

Lists Are Different

Lists are mutable. Methods such as append() and sort() change the list itself, in place, instead of returning a new copy.

>>>> mylist = [ 'dog', 'cat', 'aardvark' ]
>>>> mylist.append('bat')
>>>> mylist
['dog', 'cat', 'aardvark', 'bat']
>>>> mylist.sort()  # notice there is no need to start with mylist =
>>>> mylist
['aardvark', 'bat', 'cat', 'dog']
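
One consequence worth seeing in code: because sort() changes the list in place, it returns None, so writing mylist = mylist.sort() would lose the list. The sketch below contrasts it with the built-in sorted() function, which is not covered above and is included only for comparison.

```python
mylist = ['dog', 'cat', 'aardvark']
result = mylist.sort()      # sorts mylist in place

print(result)   # None: the method has nothing to return
print(mylist)   # ['aardvark', 'cat', 'dog']

# sorted() is a built-in function that leaves the original alone
# and returns a new, sorted list instead.
original = ['dog', 'cat', 'aardvark']
new_list = sorted(original)

print(original)   # ['dog', 'cat', 'aardvark']
print(new_list)   # ['aardvark', 'cat', 'dog']
```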

New Datatype: Tuple

  • Tuples have many elements (like lists)
  • Tuples have an index (also like lists)
  • However, tuples are immutable (can’t be changed)
  • Use ()
# A Tuple of one element:
my_tup = ('capybara', )

# More:
my_tup2 = ('first', 'second')

# Can't be changed, can only be created
my_tup3 = my_tup + my_tup2
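
To see the immutability for yourself, try assigning to an index. A minimal sketch, reusing the values from the example above:

```python
my_tup = ('first', 'second')
print(my_tup[0])    # first (reading by index works, like a list)

try:
    my_tup[0] = 'changed'   # writing by index is not allowed
except TypeError as err:
    error_message = str(err)
    print('Not allowed:', error_message)

# "Changing" a tuple really means building a new one and reusing the name.
my_tup = my_tup + ('third',)
print(my_tup)       # ('first', 'second', 'third')
```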

Applications of Tuples

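  • Slightly lighter than lists (less memory, quicker to create)
  • Data protection: values that must not be altered by accident (for example, an argument sequence passed to subprocess.Popen)
  • Suit situations where something needs to be immutable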

Datatype: Sets

  • Sets contain many elements
  • Cannot be sliced (no index numbers!)
  • Every element is unique
  • Can be modified, use add()
  • Use {}
>>>> myset = {'orange', 'blue', 'green'}
>>>> myset.add('red')  # can be modified
>>>> myset.add('orange')  # but duplicates are ignored!
>>>> myset
{'blue', 'green', 'orange', 'red'}  # order isn't remembered, it's just a group of things

Applications Of Sets

  • Faster than a list for membership tests (the 'in' keyword)
  • Often used for input validation
  • Consider a set whenever the elements have no inherent order
>>>> provs = {'AB', 'BC', 'SK', 'MB', 'ON', 'QC', 'PE', 'NL', 'NS', 'NB'}
>>>> 'AB' in provs  # 'in' will see if an item exists inside of an iterable
True
>>>> 'ON' in provs
True
>>>> 'Alaska' in provs
False
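
As a sketch of the input validation idea, the membership test can be wrapped in a function. The function name is_valid_province and the clean-up with strip() and upper() are assumptions added for illustration, not part of the example above.

```python
# The ten Canadian provinces, by postal abbreviation.
PROVINCES = {'AB', 'BC', 'SK', 'MB', 'ON', 'QC', 'PE', 'NL', 'NS', 'NB'}

def is_valid_province(code):
    """Return True if code is a known province abbreviation.

    Hypothetical helper: surrounding whitespace and letter case are ignored.
    """
    return code.strip().upper() in PROVINCES

print(is_valid_province('ON'))       # True
print(is_valid_province(' bc '))     # True
print(is_valid_province('Alaska'))   # False
```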

Set Combinations

Venn Diagram

Set Combinations: Union

zoo_animals = {'tiger', 'bat', 'rabbit', 'fox'}
predators = {'tiger', 'dog', 'fox', 'shark'}
print(zoo_animals | predators)
{'tiger', 'bat', 'rabbit', 'fox', 'dog', 'shark'}

Set Combinations: Intersection

zoo_animals = {'tiger', 'bat', 'rabbit', 'fox'}
predators = {'tiger', 'dog', 'fox', 'shark'}
print(zoo_animals & predators)
{'tiger', 'fox'}

Set Combinations: Difference

zoo_animals = {'tiger', 'bat', 'rabbit', 'fox'}
predators = {'tiger', 'dog', 'fox', 'shark'}
print(zoo_animals - predators)
{'bat', 'rabbit'}

Set Combinations: Symmetric Difference

zoo_animals = {'tiger', 'bat', 'rabbit', 'fox'}
predators = {'tiger', 'dog', 'fox', 'shark'}
print(zoo_animals ^ predators)
{'bat', 'rabbit', 'dog', 'shark'}
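
Each of the four operators also has a method with a descriptive name. The sketch below checks that they give the same results; sorted() is used when printing only because a set has no fixed display order.

```python
zoo_animals = {'tiger', 'bat', 'rabbit', 'fox'}
predators = {'tiger', 'dog', 'fox', 'shark'}

# Each operator has an equivalent method.
print(zoo_animals.union(predators) == zoo_animals | predators)                  # True
print(zoo_animals.intersection(predators) == zoo_animals & predators)           # True
print(zoo_animals.difference(predators) == zoo_animals - predators)             # True
print(zoo_animals.symmetric_difference(predators) == zoo_animals ^ predators)   # True

# Difference depends on which set comes first.
print(sorted(zoo_animals - predators))   # ['bat', 'rabbit']
print(sorted(predators - zoo_animals))   # ['dog', 'shark']
```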

Datatype: Dictionaries

  • Dicts contain many elements, like lists
  • Keep the order in which items were added, but only in recent Python (3.7 and later); in older versions, assume they don’t have an order
  • Each element in a dictionary has two parts: a key and a value
  • Can be modified
  • Use {}
>>>> mydict = {'name': 'Chris', 'age': 42}
>>>> mydict['name']
'Chris'

Datatype: Dictionaries

Dicts can be confusing, so think of them like this:

my_list_dict = {0:'cat', 1:'dog', 2:'human'}

There we go: we have made a dictionary that can be looked up like a list (my_list_dict[0] is 'cat'). The only difference is that now we get to replace the numbers with meaningful keys to access our data!

Dictionaries: Keys And Values

  • Dictionary keys and values can be different datatypes, with some caveats.
  • Keys must be an immutable datatype (such as a string, a number or a tuple). An existing key can’t be edited, only created or destroyed.
  • Values can be any datatype, and can be changed.
>>>> mydict = {'name': 'Chris', 'age': 42}
>>>> mydict['age'] = 43  # changes value for existing key
>>>> mydict['hobby'] = 'hiking'  # adds new key/value pair
>>>> mydict
{'name': 'Chris', 'age': 43, 'hobby': 'hiking'}
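
A few more dictionary operations, sketched under the same example. The get() method and the del statement are standard Python but are not covered above; the name grid is made up for illustration.

```python
mydict = {'name': 'Chris', 'age': 43, 'hobby': 'hiking'}

print('age' in mydict)      # True: 'in' looks at the keys
print('Chris' in mydict)    # False: values are not searched

del mydict['hobby']         # destroy a key together with its value
print(mydict)               # {'name': 'Chris', 'age': 43}

# get() returns a fallback instead of raising KeyError for a missing key.
email = mydict.get('email', 'unknown')
print(email)                # unknown

# A tuple is immutable, so it works as a key; a list does not.
grid = {(0, 0): 'origin'}
try:
    bad = {['a', 'b']: 1}
except TypeError as err:
    key_error_message = str(err)
    print('Not allowed:', key_error_message)
```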

Iteration

  • You can iterate through sets and tuples, just like a list!
  • You can iterate through a dictionary, but it’s different:
for key in mydict:  # the iterating variable contains the key
    print(key)
    print(mydict[key])  # with the key, get the value

# another method
for k, v in mydict.items():  # .items() will generate a tuple for each item in the dictionary
    # what is k, and what is v, do you think?
    print(k + ': ' + str(v))  # str() is needed because a value such as 43 is not a string
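
For comparison, here is a sketch of looping over a tuple and a set, reusing values from earlier examples. The set is passed through sorted() only to make the printed order predictable, since a set does not remember order.

```python
my_tup = ('first', 'second')
for item in my_tup:             # a tuple is visited in index order
    print(item)

myset = {'orange', 'blue', 'green'}
colours = []
for colour in sorted(myset):    # sorted() returns a list in alphabetical order
    colours.append(colour)
    print(colour)

print(colours)   # ['blue', 'green', 'orange']
```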

String Slicing

By the way, strings are iterables as well. You can iterate through them with a for loop. They have index numbers, just like lists. Here are more ways of using index numbers:

Consider the string PYTHON

 0   1   2   3   4   5    left-to-right: positive numbers
 P   Y   T   H   O   N    string
-6  -5  -4  -3  -2  -1    right-to-left: negative numbers
>>>> test = 'PYTHON'
>>>> test[3:-1]
'HO'
>>>> test[-5:]  # leaving the second number blank means the slice runs to the end
'YTHON'
>>>> test[:4]  # leaving the first number blank means the slice starts at the beginning
'PYTH'
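
A short sketch that puts single indexes, slices and a for loop side by side. It also shows that strings, being immutable, reject assignment to an index.

```python
test = 'PYTHON'

print(test[0])     # P (first character)
print(test[-1])    # N (last character)
print(test[1:4])   # YTH (index 1 up to, but not including, index 4)

letters = []
for letter in test:     # a string can be looped through like a list
    letters.append(letter)
print(letters)     # ['P', 'Y', 'T', 'H', 'O', 'N']

try:
    test[0] = 'J'       # strings are immutable, so this fails
except TypeError as err:
    slice_error = str(err)
    print('Not allowed:', slice_error)
```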

String Methods

As discussed before, methods are functions that belong to a particular object, such as a string stored in a variable that you create.

>>>> x = 'hi' # create variable x, which is a string
>>>> x.upper()  # all strings contain a method called upper 
'HI'

You know you’re dealing with a method when its name comes after an object and a dot (.).

Discovering Methods

>>>> dir(str)
'center',
'upper',
'strip',
'lower',
...
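
The full output of dir(str) also includes many names that start with underscores. As a sketch, a loop with startswith() can filter those out; the variable name public_methods is made up for illustration.

```python
public_methods = []
for name in dir(str):               # dir() returns a list of attribute names
    if not name.startswith('_'):    # skip the special underscore names
        public_methods.append(name)

print(len(public_methods))          # how many ordinary string methods exist
print('upper' in public_methods)    # True
print(str.center.__doc__)           # the built-in description of center()
```

In an interactive session, help(str.center) displays the same description.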

Some Useful Methods

strip
remove leading and trailing whitespace
split
convert a string into a list, by splitting on a delimiter
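center
pad a string with spaces on both sides until it reaches a given width (roughly the opposite of strip)
join
combine a list of strings into one string, with a separator between them (the opposite of split)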
replace
do a substitution
startswith
return True if string starts with a substring

Examples

>>>> raw_data = '  this is a test \n'
>>>> raw_data.strip()
'this is a test'
>>>> raw_output = 'file001.jpg\nfile002.jpg'
>>>> raw_lines = raw_output.split('\n')
>>>> raw_lines
['file001.jpg', 'file002.jpg']

Examples

>>>> 'hello'.center(9)  
'  hello  '
>>>> mylist = ['toast', 'eggs']
>>>> '&'.join(mylist)
'toast&eggs'
>>>> 'color'.replace('o', 'ou')  
'coulour'
>>>> 'Hello World'.startswith('Hell')
True
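
These methods are often chained together to clean up raw text. The following is a sketch only: the input string, the 'file' prefix and the change to '.png' are made-up values for illustration.

```python
raw_output = '  file001.jpg\nfile002.jpg\nnotes.txt \n'

cleaned = raw_output.strip()    # drop leading and trailing whitespace
lines = cleaned.split('\n')     # one list element per line
print(lines)    # ['file001.jpg', 'file002.jpg', 'notes.txt']

renamed = []
for name in lines:
    if name.startswith('file'):                     # keep only matching names
        renamed.append(name.replace('.jpg', '.png'))

summary = ', '.join(renamed)    # back to a single string
print(summary)  # file001.png, file002.png
```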