If I have partitioned data and I want to filter it using the filters
argument of pd.read_parquet,
how can I accomplish that? For example:
import pandas as pd
data = {
    "ID": [1, 2, 3],
    "Value": ["A", "B", "C"],
}
df = pd.DataFrame(data)
parquet_folder = "example_partitioned"
df.to_parquet(parquet_folder, index=False, partition_cols=["Value"])
So I have a partitioned data structure on disk. If I construct a filter condition like this, it works:
filter_conditions = [
    ("Value", "==", "A")
]
pd.read_parquet(parquet_folder, filters=filter_conditions)
But if I want multiple conditions (i.e. A OR B), the following does not work:
filter_conditions_two = [
    ("Value", "==", "A"),
    ("Value", "==", "B")
]
pd.read_parquet(parquet_folder, filters=filter_conditions_two)
That instead returns an empty DataFrame. Is an OR condition possible with filters?
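A possible explanation, assuming the pyarrow engine: its filters are documented as being in disjunctive normal form (DNF). A flat list of tuples is ANDed together, while a list of lists is ORed across the inner lists. Under that reading, the two-tuple list above asks for Value == "A" AND Value == "B", which no row can satisfy. The sketch below is a pure-Python model of those semantics, not pandas or pyarrow code. The `matches` helper and the `OPS` table are hypothetical names introduced only for illustration.

```python
import operator

# Hypothetical operator table for this sketch; it is not part of pandas or pyarrow.
OPS = {
    "==": operator.eq,
    "!=": operator.ne,
    "in": lambda value, options: value in options,
}


def matches(row, filters):
    """Return True if `row` (a dict) passes `filters` under assumed DNF semantics."""
    # A flat list of tuples is treated as a single AND group.
    if filters and isinstance(filters[0], tuple):
        filters = [filters]
    # Outer list: OR across groups. Inner list: AND within a group.
    return any(
        all(OPS[op](row[col], val) for col, op, val in group)
        for group in filters
    )


# Same rows as the DataFrame in the question.
rows = [
    {"ID": 1, "Value": "A"},
    {"ID": 2, "Value": "B"},
    {"ID": 3, "Value": "C"},
]

# Flat list: Value == "A" AND Value == "B".
and_filters = [("Value", "==", "A"), ("Value", "==", "B")]
# List of lists: (Value == "A") OR (Value == "B").
or_filters = [[("Value", "==", "A")], [("Value", "==", "B")]]
# Single membership condition covering both values.
in_filters = [("Value", "in", ["A", "B"])]

and_ids = [r["ID"] for r in rows if matches(r, and_filters)]
or_ids = [r["ID"] for r in rows if matches(r, or_filters)]
in_ids = [r["ID"] for r in rows if matches(r, in_filters)]

print(and_ids, or_ids, in_ids)  # [] [1, 2] [1, 2]
```

If that reading holds, passing the nested `or_filters` form or the `in_filters` form as `filters=` to pd.read_parquet(parquet_folder, ...) would be the thing to try for A OR B. This has not been checked here against every engine or library version.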