Pandas: Filters

import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg',                  'Johanna',           'Caro',              'Philipp'          ],
    'lastname':  ['Faschingbauer',          'Faschingbauer',     'Faschingbauer',     'Lichtenberger'    ],
    'email':     ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com',    'philipp@email.com'],
    'age':       [56,                       27,                  25,                  37                 ],
})

Basic Filter Example

persons.loc[persons['lastname'] == 'Faschingbauer']
firstname lastname email age
0 Joerg Faschingbauer jf@faschingbauer.co.at 56
1 Johanna Faschingbauer johanna@email.com 27
2 Caro Faschingbauer caro@email.com 25
  • Comprehensive

  • More operators ⟶ later

  • How does that work?

Filter? Bitmap!

  • A filter is a boolean expression

  • Returns a Series of True and False values ⟶ a bitmap

  • Given to loc[] to select rows of interest

    flt = (persons['lastname'] == 'Faschingbauer')
    flt
    
    0     True
    1     True
    2     True
    3    False
    Name: lastname, dtype: bool
    
    type(flt)
    
    pandas.core.series.Series
    
  • Content of persons

    persons
    
    firstname lastname email age
    0 Joerg Faschingbauer jf@faschingbauer.co.at 56
    1 Johanna Faschingbauer johanna@email.com 27
    2 Caro Faschingbauer caro@email.com 25
    3 Philipp Lichtenberger philipp@email.com 37
  • Filter applied to persons with loc[]

    persons.loc[flt]
    
    firstname lastname email age
    0 Joerg Faschingbauer jf@faschingbauer.co.at 56
    1 Johanna Faschingbauer johanna@email.com 27
    2 Caro Faschingbauer caro@email.com 25

Confusion: Filter/Column Addressing

  • User friendliness: filter can be directly applied to DataFrame

    persons[flt]
    
    firstname lastname email age
    0 Joerg Faschingbauer jf@faschingbauer.co.at 56
    1 Johanna Faschingbauer johanna@email.com 27
    2 Caro Faschingbauer caro@email.com 25
  • Looks like columns were selected

  • confusion

Column Selection And Filters

  • Filters select rows, just like persons.loc[0] selects row by label

  • Consequently, can be combined with column selection

persons.loc[flt, 'firstname']
0      Joerg
1    Johanna
2       Caro
Name: firstname, dtype: object

Boolean Expressions: Bitwise Operators! Braces!

flt = (persons['firstname'] == 'Joerg') & (persons['lastname'] == 'Faschingbauer') | (persons['firstname'] == 'Philipp')
  • Confusion: braces around == are important!

  • Bitwise & binds stronger than ==

  • Wish they had used the normal boolean operators instead

  • But those cannot be overloaded (they are more like control flow tools)

Anyway,

persons.loc[flt]
firstname lastname email age
0 Joerg Faschingbauer jf@faschingbauer.co.at 56
3 Philipp Lichtenberger philipp@email.com 37

Negation: ~

  • Consequently 🙄, bitwise negation was chosen to mean logical NOT

persons.loc[~flt]
firstname lastname email age
1 Johanna Faschingbauer johanna@email.com 27
2 Caro Faschingbauer caro@email.com 25

Neat Helpers

flt = persons['firstname'].isin(['Caro', 'Philipp'])
persons[flt]
firstname lastname email age
2 Caro Faschingbauer caro@email.com 25
3 Philipp Lichtenberger philipp@email.com 37
flt = persons['firstname'].str.startswith('J')
persons[flt]
firstname lastname email age
0 Joerg Faschingbauer jf@faschingbauer.co.at 56
1 Johanna Faschingbauer johanna@email.com 27