CSV Files

CSV in its Origins

  • Simple in principle

  • Line-based file

    • Each line has a record

    • Each record has the same number of fields

    • Each field has a meaning

Here is a simple CSV file in its very original form

5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth,Queen
  • No heading

  • Commas (“,”) as field separators

  • ASCII encoded

The csv Module: csv.reader

  • csv.reader has sane defaults (documentation)

  • Use it in its basic form

import csv

f = open('simple.csv')
rdr = csv.reader(f)
for row in rdr:
    print(row)
$ python read-simple.py
['5', 'Joerg', 'Faschingbauer']
['3', 'Sepp', 'Huber']
['7', 'Elizabeth', 'Queen']

And Tuple Unpacking?

  • One row is a list

  • Same number of elements (in the sane CSV case)

  • Tuple unpacking useful

import csv

f = open('simple.csv')
rdr = csv.reader(f)
for num, firstname, lastname in rdr:
    print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')
$ python read-simple-tupleunpacked.py
ID: 5 --- First name: Joerg - Last name: Faschingbauer
ID: 3 --- First name: Sepp - Last name: Huber
ID: 7 --- First name: Elizabeth - Last name: Queen

Going Insane: CSV? SCSV?

  • CSV stands for “Comma Separated Values”

  • One popular CSV producer produces SCSV (“SemiColon Separated Values”) files though.

5;Joerg;Faschingbauer
3;Sepp;Huber
7;Elizabeth;Queen

One would configure the delimiter in such a case …

import csv

f = open('simple-semicolon.csv')
rdr = csv.reader(f, delimiter=';')
for num, firstname, lastname in rdr:
    print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')

More Tuning Needed: Quoting

But let’s use our original CSV (that with the commas)

5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth,Queen
  • And let’s say, Queen Elizabeth would jump up and require us to spell her full name, “Queen Elizabeth, II”

  • The resulting file would suddenly have four fields in the “Queen” row

5,Joerg,Faschingbauer
3,Sepp,Huber
7,Elizabeth, II,Queen

A correctly escaped CSV would look like so:

5,Joerg,Faschingbauer
3,Sepp,Huber
7,"Elizabeth, II",Queen

Tuning Options

Perhaps redundantly (the delimiter is ‘,’ by default, and the quotechar is ‘”’ by default), we would read escaped.csv like so,

import csv

f = open('escaped.csv')
rdr = csv.reader(f, delimiter=',', quotechar='"')
for num, firstname, lastname in rdr:
    print(f'ID: {num} --- First name: {firstname} - Last name: {lastname}')
$ python read-escaped.py
ID: 5 --- First name: Joerg - Last name: Faschingbauer
ID: 3 --- First name: Sepp - Last name: Huber
ID: 7 --- First name: Elizabeth, II - Last name: Queen

Many more tuning options are available; for a complete list see the documentation.

And Headings?

Another common practice among CSV producers is to export

  • the first line to contain the field names

  • remaining lines to contain spreadsheet data

If the original spreadsheet would look like so,

ID

Firstname

Lastname

5

Joerg

Faschingbauer

3

Sepp

Huber

7

Elizabeth, II

Queen

, the exported CSV would be,

ID,Firstname,Lastname
5,Joerg,Faschingbauer
3,Sepp,Huber
7,"Elizabeth, II",Queen

Enter csv.DictReader

  • This - the column names come with the CSV - is a case for csv.DictReader (see documentation).

  • csv.DictReader, when given no explicit fieldnames parameter, does exactly that: take the information from the first line as keys into a dictionary, and spit the rest out as dictionaries with this shape.

import csv
from pprint import pprint

f = open('headers.csv')
rdr = csv.DictReader(f, delimiter=',', quotechar='"')
for record in rdr:
    pprint(record, width=30)
$ python read-headers.py
{'Firstname': 'Joerg',
 'ID': '5',
 'Lastname': 'Faschingbauer'}
{'Firstname': 'Sepp',
 'ID': '3',
 'Lastname': 'Huber'}
{'Firstname': 'Elizabeth, II',
 'ID': '7',
 'Lastname': 'Queen'}