Lab 7: Dictionaries and CSV Files, F-Strings

Learning Outcomes

In this lab, you will learn:

Introduction

In this lab we will be introducing some new datatypes, notably the dictionary. Different datatypes are suited to different use-cases, and have different performance characteristics as well (although for the scripts we are creating, performance is not really that much of an issue!). In addition, we will introduce a method of working with Comma-Separated Value files, otherwise known as csv files. CSV files store data in rows and columns similar to spreadsheets, but do not contain formulae. They can easily be imported into spreadsheet programs such as Microsoft Excel. Oftentimes we would like to use scripting to parse data and then export it to a CSV file.

Why not simply create xls or xlsx spreadsheets? These are proprietary formats, and working with them can cause formatting or compatibility issues. While there exist modules that will work with xls(x) files, we can bypass a lot of frustration by working with open formats whenever possible.

7.1 Dictionaries

TASK: Understanding the Limitations of Lists

You have already been working with lists, which look like this:

student1 = ['eric', 'smith', '217 Au Large Blvd', 'Toronto', 'Ontario', 'M4A 1P3']

A list is a collection of elements, grouped together under one variable name. In this example, you can see that student1 appears to contain a name and mailing address.

Lists elements are accessible by identifying the name of the list and the index number of the element. Let’s explore this by creating a function that will create a shipping label string. Note that here we are using f-strings to structure our output.


def shipping_label(a_lst):
    "takes a list, generates a string"
    a_str = f"{a_lst[0]} {a_lst[1]}\n"
    a_str += f"{a_lst[2]}\n"
    a_str += f"{a_lst[3]}, {a_lst[4]}\n"
    a_str += f"{a_lst[5]}"
    return a_str

Create a new Python script called lab7a.py and place both student1 and the shipping_label function inside it. Make sure to call the function using student1 as an argument, and use print with the return value. Your output should look like this:

eric smith
217 Au Large Blvd
Toronto, Ontario
M4A 1P3

This method has a lot of advantages: we are grouping data that needs to stay together. We don’t want student1’s address to get confused with student2’s, for example. But you might notice that the shipping_label function is hard to read. Why does the student’s first name come first in the list? Why is the Postal code contained in a_lst[5]? Let’s expand on this example and add some contact information:

student1 = ['eric', 'smith', '647-555-01234', '217 Au Large Blvd', 'Toronto', 'Ontario', 'M4A 1P3', 'esmith1@myseneca.ca']

Any programmer working on this program now has to remember that the telephone number is the third element, and the email address is the eighth. Why the third, and why the eighth? There’s no reason, in this case the order of elements in the list is completely arbitrary! Lists are excellent for storing a collection of data with a specific order, but in this case we would much rather identify each element by what they contain, rather than by the order in which appear.

Using Dictionaries To Solve The Problem With Lists

Dictionaries are like lists, in that they are a group of elements. However, the individual elements are accessible using keys rather than index numbers.

>>>> mydict = {'key': 'value'}  # the curly braces indicate a dictionary when there are two elements with a : between them.
>>>> mydict['key']  # 'key' is used instead of 0.
value

TASK: Modify lab7a

Modify lab7a.py so that student1 is replaced with a dictionary:

student1 = {'first_name': 'eric', 'last_name':'smith', 'addr1': '217 Au Large Blvd', 'city': 'Toronto', 'prov': 'Ontario', 'pcode': 'M4A 1P3'}

And now modify the function to use this dictionary instead of a list. Make sure that the format stays the same.

>>>> from lab7a import student1, shipping_label
>>>> print(shipping_label(student1))
Eric Smith
217 Au Large Blvd
Toronto, Ontario
M4A 1P3

Make sure to research a way to get the name capitalized.

7.2 Modifying Dictionaries

Dictionaries contain mutable values and immutable keys. As stated before, immutable objects cannot be modified, only created or removed. Mutable objects are modifiable. That means we cannot change a key once it has been created, but we can change the value assigned to it.

Use the Python interpreter to follow these examples.

meal_plan = {'breakfast': 'oatmeal', 'lunch': 'sandwiches', 'dinner': 'broccoli'}

meal_plan['lunch'] = 'ice cream'
print(meal_plan)

New key/value pairs can be added using the same syntax:

meal_plan['snack'] = 'cookies'
print(meal_plan)

Removing key/value pairs from a dictionary is more complicated. We can use a method called .pop(), which will take a key as argument, return the value associated with the key, and remove that pair from the dictionary. Observe what happens when using this method with meal_plan.

value_removed = meal_plan.pop('dinner')
print(meal_plan)
print(value_removed)

You will a .pop() method inside list objects as well. “Popping” is a concept in programming where we select an item for processing, handle it, and remove it from the original collection.

7.3 TASK: Print A Menu

Sample Output

MENU FOR TODAY
==============
Breakfast: oatmeal
Lunch: sandwiches
Dinner: broccoli

7.4 F-String Formatting

You’ll notice that the formatting for our menu printing program isn’t very nice. Usually when we are creating menus, we would like titles to be centred, and columns to be justified to the left or right. Fortunately, things like this are easy to do with f-strings.

Open a new interpreter window. Let us start by defining a total width for our menu:

width = 40

It will be easier to visualise this space if we print characters to fill up this space. Enter this code:

print("=" * width)

What you should see printed is 40 equal signs. It’s a nice feature that we can use the * multiply a string!

Now we can demonstrate alignment using f-strings.

print(f"{'x' : <40}")  # Notice that " is on the outside, and ' is on the inside.
print(f"{'x' : ^40}")
print(f"{'x' : >40}")

What you should see now are three lines with x, with the x justified to the left, to the centre, and to the right.

This code, however, has a weakness: what if we decide that our ‘width’ should be 50 instead of 40? Right now we have hardcoded that width inside our f-strings. Fortunately we can use variables to specify width, but we need to contain it inside curly braces.

print(f"{'x' : <{width}}")  # Notice there is not space between < and {width}.
print(f"{'x' : ^{width}}")
print(f"{'x' : >{width}}")

We can combine things to create columns. I’m going to use hardcoded widths again to make things easier to see.

print(f"|{'x' : <20}|{'y' : >20}|")

7.5 TASK: Improve The Menu Script

Sample Output

                  MENU FOR TODAY                  
==================================================
Breakfast                                  oatmeal
Lunch                                   sandwiches
Dinner                                    broccoli

7.6 Common Issues With Dictionaries

Handling KeyErrors

There is one issue that arises when working with a dictionary that has been modified. What if we try to get the value of a key that doesn’t exist? Refer back to the interpreter and your meal_plan example to simulate this. Someone has removed the ‘snack’ key, now what happens when we try to access it?

>>>> print(meal_plan['snack'])
KeyError: 'snack'

We have already learned about using exception handling to deal with issues such as these. However it can be tedious to put every instance of this in a try/except block. As an alternative, you can use the method .get(). This has the added feature of allowing us to substitute any missing value with a default of our choosing.

>>>> meal_plan.get('dinner')
broccoli
>>>> meal_plan.get('snack')
None
>>>> meal_plan.get('snack', 'fruit')
fruit

This second argument is useful, it contains a default value of None, but we can override that default value with one of our own. Use help(meal_plan) and look at some of the other methods available; you will no doubt see more examples of these arguments with defaults already set.

7.7 Iterating Dictionaries

You can use a for loop to iterate through a dictionary. The only difference being, by default the looping variable will contain the key for each key/value pair found. Also you will note that dictionaries don’t have an order. You have no guarantee about which key will appear first during the iteration.


>>>> mydict = {'brand': 'subaru', 'model': 'impreza', 'color': 'green', 'year': '2017'}

>>>> for key in mydict:
>>>>    print(key)

brand
color
model
year

There are other ways of iterating through a dictionary, which may be more useful depending on what you’re doing. By using the methods .keys() and .values() you can generate a list of either keys or values respectively.

>>>> mydict.keys()
['brand', 'model', 'color', 'year']
>>>> mydict.values()
['subaru', 'impreza', 'green', '2017']

Finally, by using the method .items() you can generate a tuple which contains both.

for k, v in mydict.items():
    print{f"Key is: {k} and value is: {v}.")

What Is A Tuple?

Remember how we said that dictionary keys must be immutable, and therefore we cannot use a list as a key? A tuple is yet another datatype in Python. It is very similar to a list in almost every way, except that it is immutable. This means that you cannot append() or remove() an element from a tuple, you can only create or delete the entire tuple.

Tuples have different syntax as well.

tuple_with_one_element = ('hello', )  # notice the comma. This is only needed for tuples with one thing in them.
tuple_with_more = ('hello', 'world', '!')

Why use a tuple? There is a small difference in performance when working with a tuple versus a list: a tuple is much faster in a lot of cases. And you can use a tuple as a dictionary key, as it is immutable. As you learn about more modules in the Python standard library, you will no doubt encounter functions that return tuples rather than lists, and so it’s worth noting their existence here.

7.8 TASK: Modifying A Dictionary

For this task we are going to create a list of meal plans for a Summer Camp. For each day of camp, we will be using an empty template dictionary with keys for breakfast, lunch, and dinner. When we add a day we will modify the values for each dictionary key, and add the dictionary into a list. This will also help us to illustrate another important programming concept.

Sample Output

Would you like to add a day? (y/n) y
Please enter what you would like to eat for breakfast:eggs
Please enter what you would like to eat for lunch:tuna
Please enter what you would like to eat for dinner:pizza
Would you like to add a day? (y/n) y
Please enter what you would like to eat for breakfast:hot dogs
Please enter what you would like to eat for lunch:veal
Please enter what you would like to eat for dinner:pasta
Would you like to add a day? (y/n) n
Please enter a day number for the menu you would like to print (1-2): 1
                  MENU FOR TODAY                  
==================================================
Breakfast                                 hot dogs
Lunch                                         veal
Dinner                                       pasta

Explanation Of The Error

You might have noticed that the output is wrong! We are asking to print the first day’s menu (eggs, tuna, and pizza) but it is printing the last day’s meals.


NOTE: .copy() creates what is known as a shallow copy. There also exists (in the copy module) a function called deepcopy(). These concepts become more important when the object we are copying contain more objects inside it, but this is outside the scope of our course. If you would like to learn more, refer to this discussion on StackOverflow.


7.9 CSV Files

CSV files are used to store tables. They don’t contain formula information like spreadsheets, but can easily be imported into a spreadsheet program.

A typical CSV file will have columns and rows. Data is store between double-quotes, and a comma (,) will define the boundary between columns. A newline character (\n) will define the boundary between rows. For example:

"First Name","Last Name","Student Number","GPA"
"Eric","Smith","0123","2.0"
"Bart","Simpson","0567","4.0"
"Sung-won","Kim","0489","3.8"

Note: You can download sample.csv from our site.

Reading From A CSV File

Python has a module called csv which will allow us to work with a CSV file very easily. You can refer to the documentation to learn more about this module. Since we’ve learned about using dictionaries, it will be easiest to use DictReader to import the data as a list of dictionaries.

To open a csv file, you will need to create a file object, same as you learned in the last lab. It’s always good practice to add exception handling, and to close the file afterward.

Once that is done, you can use DictReader with the file object as an argument.

import csv

list_of_dicts = []
f = open('sample.csv', 'r')
reader = csv.DictReader(f)
for row in reader:
    list_of_dicts.append(row)
f.close()
{'First Name': 'Eric', 'Last Name': 'Smith', 'Student Number': '0123', 'GPA': '2.0'}
{'First Name': 'Bart', 'Last Name': 'Simpson', 'Student Number': '0567', 'GPA': '4.0'}
{'First Name': 'Sung-won', 'Last Name': 'Kim', 'Student Number': '0489', 'GPA': '3.8'}

Writing To A CSV File

To perform the inverse operation, you might expect the csv module to have a DictWriter method. And you would be correct.

One difference between reading and writing, is that the header (containing column names such as ‘First Name’) aren’t written by default, so we will need to perform the operation manually.

We create a list called fieldnames, and we use the .keys() method only on the first dictionary to get the fieldnames to use in our CSV file. Then we can use w.writeheader() to write this list as our first row in the file. Then a simple for loop can be used to convert each dictionary into a new row.

f = open('sample2.csv', 'w')
fieldnames = list_of_dicts[0].keys()
w = csv.DictWriter(f, fieldnames=fieldnames)
w.writeheader()
for row in list_of_dicts:
    w.writerow(row)
f.close()

7.10 TASK: Substitute Data In A CSV File

CHALLENGE TASK: Printing a Table

Sample Output

First Name | Last Name | GPA
----------------------------
Bart       | Simpson   | 4.0
...

Deliverables

You can download the check script here.

Glossary