CSV Files¶

CSV in its Origins ¶

Simple in principle
Line-based file
- Each line has a record
- Each record has the same number of fields
- Each field has a meaning

Here is a simple CSV file in its very original form

simple.csv¶

5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth,Queen

No heading
Commas (“,”) as field separators
ASCII encoded

The `csv` Module: `csv.reader`¶

csv.reader has sane defaults (documentation)
Use it in its basic form

read-simple.py¶

import csv

f = open('simple.csv')
rdr = csv.reader(f)
for row in rdr:
    print(row)

$ python read-simple.py
['5', 'Joerg', 'Faschingbauer']
['3', 'Sepp', 'Huber']
['7', 'Elizabeth', 'Queen']

And Tuple Unpacking?¶

One row is a list
Same number of elements (in the sane CSV case)
Tuple unpacking useful

read-simple-tupleunpacked.py¶

import csv

f = open('simple.csv')
rdr = csv.reader(f)
for num, firstname, lastname in rdr:
    print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')

$ python read-simple-tupleunpacked.py
ID: 5 --- First name: Joerg - Last name: Faschingbauer
ID: 3 --- First name: Sepp - Last name: Huber
ID: 7 --- First name: Elizabeth - Last name: Queen

Going Insane: CSV? SCSV?¶

CSV stands for “Comma Separated Values”
One popular CSV producer produces SCSV (“SemiColon Separated Values”) files though.

simple-semicolon.csv¶

5;Joerg;Faschingbauer
3;Sepp;Huber
7;Elizabeth;Queen

One would configure the delimiter in such a case …

read-simple-semicolon.py¶

import csv

f = open('simple-semicolon.csv')
rdr = csv.reader(f, delimiter=';')
for num, firstname, lastname in rdr:
    print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')

More Tuning Needed: Quoting ¶

But let’s use our original CSV (that with the commas)

simple.csv¶

5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth,Queen

And let’s say, Queen Elizabeth would jump up and require us to spell her full name, “Queen Elizabeth, II”
The resulting file would suddenly have four fields in the “Queen” row

simple-broken.csv¶

5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth, II,Queen

A correctly escaped CSV would look like so:

escaped.csv¶

5,Joerg,Faschingbauer
3,Sepp,Huber
7,"Elizabeth, II",Queen

Tuning Options ¶

Perhaps redundantly (the delimiter is ‘,’ by default, and the quotechar is ‘”’ by default), we would read escaped.csv like so,

read-escaped.py¶

import csv

f = open('escaped.csv')
rdr = csv.reader(f, delimiter=',', quotechar='"')
for num, firstname, lastname in rdr:
    print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')

$ python read-escaped.py
ID: 5 --- First name: Joerg - Last name: Faschingbauer
ID: 3 --- First name: Sepp - Last name: Huber
ID: 7 --- First name: Elizabeth, II - Last name: Queen

Many more tuning options are available; for a complete list see the documentation.

And Headings?¶

Another common practice among CSV producers is to export

the first line to contain the field names
remaining lines to contain spreadsheet data

If the original spreadsheet would look like so,

ID	Firstname	Lastname
5	Joerg	Faschingbauer
3	Sepp	Huber
7	Elizabeth, II	Queen

, the exported CSV would be,

headers.csv¶

ID,Firstname,Lastname
5,Joerg,Faschingbauer
3,Sepp,Huber
7,"Elizabeth, II",Queen

Enter `csv.DictReader`¶

This - the column names come with the CSV - is a case for csv.DictReader (see documentation).
csv.DictReader, when given no explicit fieldnames parameter, does exactly that: take the information from the first line as keys into a dictionary, and spit the rest out as dictionaries with this shape.

read-headers.py¶

import csv
from pprint import pprint

f = open('headers.csv')
rdr = csv.DictReader(f, delimiter=',', quotechar='"')
for record in rdr:
    pprint(record, width=30)

$ python read-headers.py
{'Firstname': 'Joerg',
 'ID': '5',
 'Lastname': 'Faschingbauer'}
{'Firstname': 'Sepp',
 'ID': '3',
 'Lastname': 'Huber'}
{'Firstname': 'Elizabeth, II',
 'ID': '7',
 'Lastname': 'Queen'}