My previous solution and the problem with it
There is an option for reading .dbc files (read.dbc in R) that was working fine for me until now. I’m not sure why, but it now takes forever to read relatively small files (up to 500k rows, about 10 columns), and I have a ton of them: one file per month for 25 states over 15 years. Each takes about 5 minutes, which would mean, even with no errors in the code, about one month to read them all. This is impractical. So I would like to convert them to a friendlier format, like .csv, which takes far less time to read into RAM once converted.
What are .dbc files?
I’m not entirely sure, but they seem to be a compressed variant of .dbf files, used in the specific context I’m working in: health data published by the Brazilian Government through Datasus.
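One rough way to check the "compressed .dbf" guess is to compare the size a DBF header announces with the actual file size. The sketch below relies on the standard DBF header layout (record count at bytes 4-7, header size at bytes 8-9, record size at bytes 10-11) and assumes, without confirmation, that a .dbc file starts with the same kind of header. The function names are hypothetical.

```python
import struct


def parse_dbf_header(data: bytes) -> dict:
    """Decode the fixed size fields at the start of a standard DBF header."""
    # Little-endian: uint32 record count, uint16 header size, uint16 record size.
    n_records, header_len, record_len = struct.unpack_from("<IHH", data, 4)
    return {
        "n_records": n_records,
        "header_len": header_len,
        "record_len": record_len,
    }


def looks_like_plain_dbf(data: bytes) -> bool:
    """True if the byte count matches what the header announces."""
    h = parse_dbf_header(data)
    expected = h["header_len"] + h["n_records"] * h["record_len"]
    # A plain DBF file may end with one extra 0x1A end-of-file byte.
    return len(data) in (expected, expected + 1)
```

Calling `looks_like_plain_dbf(open("RDAC0901.dbc", "rb").read())` should return `False` if the records really are compressed (and the header assumption holds), which would explain why plain .dbf readers choke on the file.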
Problem with converting to .csv
I found multiple solutions online, but each has a drawback for me. First, some, like danicat’s, seem to be aimed at Linux, or need packages that run on it, and I’m on Windows. Others, like DBF, can’t actually read .dbc files. The best option I have found so far is greatjapa’s, which basically uses simpledbf. When I run simpledbf locally I’m able to load the file into Python, but on conversion I get the following error:
>>> from simpledbf import Dbf5
>>>
>>> # Define the file paths
>>> dbc_file = "RDAC0901.dbc"
>>> csv_file = "RDAC0901.csv"
>>>
>>> dbf = Dbf5(dbc_file, codec='ISO-8859-1')
>>> dbf.to_csv(csv_file)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\calebep\AppData\Roaming\Python\Python312\site-packages\simpledbf\simpledbf.py", line 172, in to_csv
    for n, result in enumerate(self._get_recs()):
  File "C:\Users\calebep\AppData\Roaming\Python\Python312\site-packages\simpledbf\simpledbf.py", line 612, in _get_recs
    value = float(value)
            ^^^^^^^^^^^^
Any of those files can be retrieved using a simple curl command:
curl -O ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/Dados/RDAC0901.dbc
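For fetching many files, the same download can be sketched in Python. The naming pattern used here (`RD` + state code + two-digit year + two-digit month) is only inferred from the single example `RDAC0901.dbc`, and I am assuming the other months live in the same directory; the helper names are hypothetical.

```python
from urllib.request import urlretrieve

# Directory taken from the curl command above.
BASE_URL = "ftp://ftp.datasus.gov.br/dissemin/publicos/SIHSUS/200801_/Dados"


def dbc_name(state: str, year: int, month: int) -> str:
    """Build a file name such as RDAC0901.dbc (assumed pattern: RD + UF + YY + MM)."""
    return f"RD{state.upper()}{year % 100:02d}{month:02d}.dbc"


def dbc_url(state: str, year: int, month: int) -> str:
    """Full FTP URL for one state/month file."""
    return f"{BASE_URL}/{dbc_name(state, year, month)}"


def download(state: str, year: int, month: int) -> str:
    """Download one file into the current directory and return its local name."""
    name = dbc_name(state, year, month)
    urlretrieve(dbc_url(state, year, month), name)
    return name
```

For example, `download("AC", 2009, 1)` requests the same URL as the curl command.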
What am I missing?
I’ve tried many options in Python, and even some online converters. There doesn’t seem to be any easy solution. I’m out of ideas.