I'm trying to read CSV files with the Python library pyarrow, but reading fails because some fields contain "N" as a value (it means the value is null).
The problem is that I can't manage to skip this value while reading.
Here is my code:
from pyarrow import csv

# columns, schema_table and hdfs are defined earlier (not shown)
parse_options = csv.ParseOptions(delimiter=chr(1))
read_options = csv.ReadOptions(column_names=columns)
convert_options = csv.ConvertOptions(column_types=schema_table, include_columns=columns, include_missing_columns=True, null_values=True)
with hdfs.open_input_file("path") as f:
    csv_file = csv.read_csv(f, read_options=read_options, parse_options=parse_options, convert_options=convert_options)
The error I get:
ArrowInvalid: In CSV column #59: CSV conversion error to int64: invalid value 'N'
When I tried with a file that has no value between the separators, I have no problem.
Many thanks!