I am trying to read a GeoPackage layer containing a geometry column, several text and float columns, and some timestamp columns. The table contains around 300K records.
When using geopandas read_file with the default engine (fiona) there is no issue, apart from the fact that it takes forever to read the data.
Looking for a solution, I found that the pyogrio engine should speed up the process, and indeed it does: the records are read within a minute. The issue is that when I include a timestamp column I get an error.
So my code looks like this when it takes forever to read the data:
pip install pyogrio   # run in the shell, not inside Python

import geopandas as gp

path = "mnt2/Base.gpkg"
columns_to_read = ["BUILDING_ID", "GEOMETRY"]
geb = gp.read_file(path, layer="BUILDING", columns=columns_to_read)
When I replace the last line with

geb = gp.read_file(path, layer="BUILDING", engine="pyogrio", columns=columns_to_read, use_arrow=True)
it reads 385K records within 5 seconds. However, if I add a datetime column to columns_to_read, I end up with the following error:
ArrowInvalid: Casting from timestamp[ms] to timestamp[ns] would result in out of bounds timestamp: -59103216000000
With pyogrio there seems to be no option to specify the data types before reading the file, and I am now looking for another solution.