Pandas: Selecting Rows (And Columns) With loc[]

import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg',                  'Johanna',           'Caro',              'Philipp'          ],
    'lastname':  ['Faschingbauer',          'Faschingbauer',     'Faschingbauer',     'Lichtenberger'    ],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com',    'philipp@email.com'],
    'age':       [56,                       27,                  25,                  37                 ],
})

Rows (And Columns) By Label

  • Label? The index value of a row (or the name of a column)

  • ⟶ The default index (more on indexes) is integer (0, 1, 2, …), so for rows loc[] looks just the same as iloc[]

    persons.loc[0]
    
    firstname                     Joerg
    lastname              Faschingbauer
    email        jf@faschingbauer.co.at
    age                              56
    Name: 0, dtype: object
    
    persons.loc[[0,1]]
    
       firstname       lastname                   email  age
    0      Joerg  Faschingbauer  jf@faschingbauer.co.at   56
    1    Johanna  Faschingbauer       johanna@email.com   27
  • More power: boolean filters work with loc[], too (see Pandas: Filters)
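The difference between labels and positions only shows once the two drift apart. A minimal sketch, assuming the `persons` DataFrame from above: sorting (or `set_index()`) keeps each row's label, so `loc[]` and `iloc[]` start to disagree. The names `by_age` and `by_name` are made up for illustration.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# Default index: labels 0, 1, 2, 3 coincide with positions
print(persons.index.tolist())            # [0, 1, 2, 3]

# Sorting reorders the rows, but every row keeps its label
by_age = persons.sort_values('age')
print(by_age.index.tolist())             # [2, 1, 3, 0]

# iloc[] is positional: the first row after sorting
print(by_age.iloc[0]['firstname'])       # Caro

# loc[] is label-based: the row labelled 0 is still Joerg
print(by_age.loc[0]['firstname'])        # Joerg

# With a string index, labels are obviously not positions
by_name = persons.set_index('firstname')
print(by_name.loc['Philipp', 'age'])     # 37
```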

Hiccup: Slices Are Inclusive

  • Unlike with iloc[], the end of a slice is included in the result

    persons.loc[0:1]
    
       firstname       lastname                   email  age
    0      Joerg  Faschingbauer  jf@faschingbauer.co.at   56
    1    Johanna  Faschingbauer       johanna@email.com   27
  • Why? Read on
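To see both behaviours side by side, here is a small sketch running the same slice `0:1` through `loc[]` and `iloc[]`. The `by_name` copy (indexed by first name) is introduced only for illustration, to hint at why an inclusive end is convenient for non-integer labels.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# loc[]: the end label 1 is included -> labels 0 and 1
print(persons.loc[0:1].index.tolist())     # [0, 1]

# iloc[]: the end position 1 is excluded, as with Python lists -> position 0 only
print(persons.iloc[0:1].index.tolist())    # [0]

# With string labels, "from Johanna up to and including Caro" is the natural request;
# an exclusive end would force you to name the row *after* Caro instead
by_name = persons.set_index('firstname')
print(by_name.loc['Johanna':'Caro'].index.tolist())   # ['Johanna', 'Caro']
```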

Column Selection By Label

persons.loc[0, ['firstname', 'age']]
firstname    Joerg
age             56
Name: 0, dtype: object
persons.loc[[0, 1], ['firstname', 'age']]
   firstname  age
0      Joerg   56
1    Johanna   27
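A few more combinations of row and column labels, sketched against the same `persons` DataFrame: a single label on both axes yields a scalar, and `:` stands for "all rows". The variable names are only illustrative.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# One row label, one column label -> a single value
age = persons.loc[0, 'age']
print(age)                                   # 56

# ':' selects all rows; the list picks the columns -> a DataFrame
names_ages = persons.loc[:, ['firstname', 'age']]
print(names_ages.columns.tolist())           # ['firstname', 'age']

# ':' with a single column label -> a Series
ages = persons.loc[:, 'age']
print(ages.tolist())                         # [56, 27, 25, 37]
```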

Columns By Slicing: Inclusive

persons.loc[1, 'firstname' : 'age']
firstname              Johanna
lastname         Faschingbauer
email        johanna@email.com
age                         27
Name: 1, dtype: object
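  • Not consistent with Python’s half-open ranges (range(), list slicing, iloc[])

  • … but user-friendly: an exclusive end would leave out 'age', which is hard to justify, and since 'age' is the last column, there is no label after it that could serve as an exclusive end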

  • Rant: is slicing by column name of any real use?
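A small sketch contrasting the inclusive label slice with its positional counterpart, assuming the `persons` DataFrame from above. With `iloc[]`, the exclusive end has to be position 4, one past `'age'`, where no column sits. Column label slices follow the DataFrame's column order, not alphabetical order.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# loc[]: 'firstname' through 'age', both ends included
by_label = persons.loc[1, 'firstname':'age']

# iloc[]: the same columns by position; 'age' sits at position 3, so the exclusive end is 4
by_position = persons.iloc[1, 0:4]

print(by_label.index.tolist() == by_position.index.tolist())   # True
print(by_label.tolist() == by_position.tolist())               # True

# A slice from the middle: the columns between the two names, in column order
print(persons.loc[1, 'lastname':'email'].index.tolist())       # ['lastname', 'email']
```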

Summary

  • Attention: label slices include their end, which is inconsistent with the rest of Python (and with iloc[])

  • Even more power by using filters (boolean masks) with loc[]
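As a teaser for filters: `loc[]` also accepts a boolean Series as its row selector. A minimal sketch, again assuming the `persons` DataFrame; the mask names `over_30` and `young_faschingbauers` are made up for illustration.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname':  ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age':       [56, 27, 25, 37],
})

# A filter is a boolean Series: True where the condition holds
over_30 = persons['age'] > 30

# Rows by mask, columns by label
print(persons.loc[over_30, ['firstname', 'age']])

# Conditions combine with & and |; each comparison needs its own parentheses
young_faschingbauers = persons.loc[
    (persons['lastname'] == 'Faschingbauer') & (persons['age'] < 30),
    'firstname',
]
print(young_faschingbauers.tolist())    # ['Johanna', 'Caro']
```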