I am new to unit tests in general and Python’s unittest in particular.
When trying to validate a pandas dataframe df, I typically:
- Check whether df is empty (using one of the methods detailed here).
- Check whether df contains the expected columns.
I would like to standardize the way I am running these tests.
The pandas documentation lists the available assert functions (assert_frame_equal, assert_series_equal, assert_index_equal and assert_extension_array_equal), but as far as I understand I cannot use those to run the aforementioned tests.
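To illustrate why, here is a minimal sketch of how I understand assert_frame_equal to behave: it compares two complete dataframes, so it needs a fully specified expected frame rather than just a column list or a "not empty" condition. The frames left, right and different are made-up examples.

```python
import pandas as pd
from pandas.testing import assert_frame_equal

# Two identical frames: the comparison passes silently.
left = pd.DataFrame({'easting': [412256], 'northing': [142193]})
right = pd.DataFrame({'easting': [412256], 'northing': [142193]})
assert_frame_equal(left, right)

# A frame with the same columns but a different value fails,
# because the whole content is compared, not just the structure.
different = pd.DataFrame({'easting': [0], 'northing': [142193]})
try:
    assert_frame_equal(left, different)
    raised = False
except AssertionError:
    raised = True
```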
I came up with the following class:
    import pandas as pd
    import unittest

    class DataFrameTestCase(unittest.TestCase):
        def test_if_dataframe_is_empty(self, df):
            self.assertTrue(len(df) > 0)

        def test_if_dataframe_contains_required_columns(self, df, columns):
            self.assertTrue(set(df.columns.to_list()) == set(columns))
The following snippet…
    data = [[412256, 142193, 4], [644402, 5208768, 25]]
    columns = ['easting', 'northing', 'elevation']
    df = pd.DataFrame(data=data, columns=columns)
    dataframetestcase = DataFrameTestCase()
    dataframetestcase.test_if_dataframe_is_empty(df)
    dataframetestcase.test_if_dataframe_contains_required_columns(df, columns)
…does not raise any error.
On the other hand, passing an empty dataframe df or a different columns list raises AssertionError: False is not true.
Is this the way to proceed, or is there a built-in set of pandas or unittest assert functions that handles this in a better way?