Pandas: Indexes
import pandas as pd
persons = pd.DataFrame({
'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp' ],
'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger' ],
'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
'age': [56, 27, 25, 37 ],
})
Default Index: Row Number
persons
| firstname | lastname | email | age |
---|---|---|---|---|
0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
1 | Johanna | Faschingbauer | johanna@email.com | 27 |
2 | Caro | Faschingbauer | caro@email.com | 25 |
3 | Philipp | Lichtenberger | philipp@email.com | 37 |
See how the rows are numbered 0 to 3
No column was designated as the index ⟶ pandas uses a default index
persons.index
RangeIndex(start=0, stop=4, step=1)
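Here is a quick check on the same persons data. The default index is a RangeIndex, so its labels happen to coincide with the row numbers. As long as this default index is in place, loc[0] (by label) and iloc[0] (by position) therefore hit the same row.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
})

# No index was specified, so pandas created a RangeIndex 0, 1, 2, 3
print(type(persons.index).__name__)   # RangeIndex
print(list(persons.index))            # [0, 1, 2, 3]

# With the default index, label 0 and row number 0 refer to the same row
print(persons.loc[0, 'firstname'])    # Joerg
print(persons.iloc[0]['firstname'])   # Joerg
```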
Setting Custom Index
Notice how email appears to be unique ⟶ it could be used as an index
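You do not have to rely on appearance, because uniqueness can be checked directly. Series.is_unique tells whether a column contains duplicates. set_index() also accepts verify_integrity=True, which raises a ValueError if the new index would have duplicate keys. The sketch below shows that lastname, for example, would not qualify.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
})

print(persons['email'].is_unique)      # True
print(persons['lastname'].is_unique)   # False: 'Faschingbauer' appears three times

# verify_integrity=True makes set_index() refuse duplicate keys
by_email = persons.set_index('email', verify_integrity=True)   # fine

rejected = False
try:
    persons.set_index('lastname', verify_integrity=True)
except ValueError as e:
    rejected = True
    print('rejected:', e)
```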
persons.set_index('email')
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
caro@email.com | Caro | Faschingbauer | 25 |
philipp@email.com | Philipp | Lichtenberger | 37 |
This does not change persons itself
set_index() returns a modified copy (which could, for example, be assigned to another variable that you continue to work with)
persons is still the same as before:
persons
| firstname | lastname | email | age |
---|---|---|---|---|
0 | Joerg | Faschingbauer | jf@faschingbauer.co.at | 56 |
1 | Johanna | Faschingbauer | johanna@email.com | 27 |
2 | Caro | Faschingbauer | caro@email.com | 25 |
3 | Philipp | Lichtenberger | philipp@email.com | 37 |
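Because set_index() returns a new DataFrame, a common pattern is to keep the result under a new name. Here is a minimal sketch. The variable name by_email is only an illustration.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
})

by_email = persons.set_index('email')   # a new object, indexed by email

print(by_email is persons)              # False: a different object
print(by_email.index.name)              # email
print(type(persons.index).__name__)     # RangeIndex: persons is unchanged
```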
Setting Custom Index, inplace=True
Many (but not all) DataFrame methods support an inplace parameter
Default: False ⟶ the object itself is not changed
Returns a modified copy of the DataFrame object
Nice for experimenting on a large dataset that we don't want to damage
Add inplace=True once everything works ⟶ the object is modified in place, and the method returns None
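The following sketch puts the two styles side by side. Both end up with the same DataFrame. With inplace=True the method returns None, so writing persons = persons.set_index('email', inplace=True) would be a mistake: it would replace the DataFrame with None.

```python
import pandas as pd

data = {
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
}

# Style 1: keep the returned copy
persons = pd.DataFrame(data)
persons = persons.set_index('email')

# Style 2: modify in place; the return value is None
persons2 = pd.DataFrame(data)
result = persons2.set_index('email', inplace=True)

print(result)                       # None
print(persons.equals(persons2))     # True: same data, same index
```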
persons.set_index('email', inplace=True)
The object has been modified in place:
persons
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
caro@email.com | Caro | Faschingbauer | 25 |
philipp@email.com | Philipp | Lichtenberger | 37 |
The index has changed:
persons.index
Index(['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'], dtype='object', name='email')
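In the output above, email is no longer listed among the columns. By default (drop=True), set_index() moves the column into the index. The sketch below shows the difference with drop=False, using a fresh DataFrame built from the same data.

```python
import pandas as pd

data = {
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
}

persons = pd.DataFrame(data)
persons.set_index('email', inplace=True)
print(list(persons.columns))   # ['firstname', 'lastname', 'age']

# drop=False uses email as the index but also keeps it as a regular column
kept = pd.DataFrame(data).set_index('email', drop=False)
print(list(kept.columns))      # ['firstname', 'lastname', 'email', 'age']
```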
Custom Index and loc[]
loc[] selects by row label (⟶ index)
The row labels are no longer row numbers ⟶ row numbers cannot be used with loc[] anymore
persons.loc[0]
KeyError: 0
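To avoid the KeyError, you can check first whether a label exists. The in operator on an Index tests labels, not positions. Here is a sketch, assuming the email index from above.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
}).set_index('email')

print(0 in persons.index)                   # False: 0 is not a label
print('caro@email.com' in persons.index)    # True

caught = None
try:
    persons.loc[0]
except KeyError as e:
    caught = e.args[0]
    print('KeyError:', e)                   # KeyError: 0
```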
The new row labels are the email addresses:
persons.loc['jf@faschingbauer.co.at']
firstname Joerg
lastname Faschingbauer
age 56
Name: jf@faschingbauer.co.at, dtype: object
persons.loc[['jf@faschingbauer.co.at', 'johanna@email.com']]
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
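The two calls above return different kinds of objects. A single label gives a Series, which is one row whose name is the label. A list of labels gives a DataFrame. Adding a column label after the row label selects a single value. Here is a sketch on the same data.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
}).set_index('email')

row = persons.loc['jf@faschingbauer.co.at']
rows = persons.loc[['jf@faschingbauer.co.at', 'johanna@email.com']]
age = persons.loc['jf@faschingbauer.co.at', 'age']

print(type(row).__name__)    # Series
print(type(rows).__name__)   # DataFrame
print(age)                   # 56
```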
Custom Index and iloc[]
iloc[] selects by row number (position) ⟶ works just as before
persons.iloc[0]
firstname Joerg
lastname Faschingbauer
age 56
Name: jf@faschingbauer.co.at, dtype: object
persons.iloc[[0, 1]]
email | firstname | lastname | age |
---|---|---|---|
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
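Positions and labels can be converted into each other. persons.index[i] is the label at position i, so iloc[i] and loc[persons.index[i]] select the same row. iloc[] also accepts negative positions, as Python lists do. Here is a small sketch.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
}).set_index('email')

by_position = persons.iloc[0]
by_label = persons.loc[persons.index[0]]   # label at position 0

print(by_position.equals(by_label))        # True
print(persons.iloc[-1]['firstname'])       # Philipp: last row
```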
Sorting a DataFrame Object by Index
DataFrame.sort_index() is not in-place by default ⟶ returns a modified copy
persons.sort_index(ascending=True)
email | firstname | lastname | age |
---|---|---|---|
caro@email.com | Caro | Faschingbauer | 25 |
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
philipp@email.com | Philipp | Lichtenberger | 37 |
Sorting in place:
persons.sort_index(ascending=True, inplace=True)
persons
email | firstname | lastname | age |
---|---|---|---|
caro@email.com | Caro | Faschingbauer | 25 |
jf@faschingbauer.co.at | Joerg | Faschingbauer | 56 |
johanna@email.com | Johanna | Faschingbauer | 27 |
philipp@email.com | Philipp | Lichtenberger | 37 |
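Both sorting variants fit into one sketch. Index.is_monotonic_increasing is used here as a simple check that the index is sorted. ascending=False, a documented sort_index() parameter, gives the reverse order.

```python
import pandas as pd

persons = pd.DataFrame({
    'firstname': ['Joerg', 'Johanna', 'Caro', 'Philipp'],
    'lastname': ['Faschingbauer', 'Faschingbauer', 'Faschingbauer', 'Lichtenberger'],
    'email': ['jf@faschingbauer.co.at', 'johanna@email.com', 'caro@email.com', 'philipp@email.com'],
    'age': [56, 27, 25, 37],
}).set_index('email')

# Default: sorted copies are returned, persons keeps its original order
sorted_asc = persons.sort_index()
sorted_desc = persons.sort_index(ascending=False)
print(list(sorted_asc.index))
# ['caro@email.com', 'jf@faschingbauer.co.at', 'johanna@email.com', 'philipp@email.com']
print(sorted_desc.index[0])                     # philipp@email.com
print(persons.index.is_monotonic_increasing)    # False: not sorted yet

# In place: persons itself is reordered, and the method returns None
result = persons.sort_index(inplace=True)
print(result)                                   # None
print(persons.index.is_monotonic_increasing)    # True
```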
Links
Corey Schafer: Python Pandas Tutorial (Part 3): Indexes - How to Set, Reset, and Use Indexes
Data School: How do I use the MultiIndex in pandas?