I have a very large number of Stata .dta files, reported as format version 69, that I need to convert into .csv or some other Excel-readable format. I should note that the only way I know to get the version number is from the pandas error message. Opening a file as text shows something very different from a modern Stata 118 file, so I suspect the problem isn't just a mis-encoded version number.
Methods tried so far (each one reported that the file type is not supported):
- Python pandas library
- R foreign library
- R rio library
Here is my pandas code:
import pandas as pd
data = pd.read_stata('3-charge.DTA')
data.to_csv('3-charge.csv')
And the stack trace:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[5], line 3
1 import pandas as pd
----> 3 data = pd.read_stata('3-charge.DTA')
4 data.to_csv('3-charge.csv')
File ~\anaconda3\Lib\site-packages\pandas\io\stata.py:2150, in read_stata(filepath_or_buffer, convert_dates, convert_categoricals, index_col, convert_missing, preserve_dtypes, columns, order_categoricals, chunksize, iterator, compression, storage_options)
2147 return reader
2149 with reader:
-> 2150 return reader.read()
File ~\anaconda3\Lib\site-packages\pandas\io\stata.py:1708, in StataReader.read(self, nrows, convert_dates, convert_categoricals, index_col, convert_missing, preserve_dtypes, columns, order_categoricals)
1696 @Appender(_read_method_doc)
1697 def read(
1698 self,
(...)
1706 order_categoricals: bool | None = None,
1707 ) -> DataFrame:
-> 1708 self._ensure_open()
1710 # Handle options
1711 if convert_dates is None:
File ~\anaconda3\Lib\site-packages\pandas\io\stata.py:1182, in StataReader._ensure_open(self)
1178 """
1179 Ensure the file has been opened and its header data read.
1180 """
1181 if not hasattr(self, "_path_or_buf"):
-> 1182 self._open_file()
File ~\anaconda3\Lib\site-packages\pandas\io\stata.py:1212, in StataReader._open_file(self)
1209 self._path_or_buf = BytesIO(handles.handle.read())
1210 self._close_file = self._path_or_buf.close
-> 1212 self._read_header()
1213 self._setup_dtype()
File ~\anaconda3\Lib\site-packages\pandas\io\stata.py:1294, in StataReader._read_header(self)
1292 self._read_new_header()
1293 else:
-> 1294 self._read_old_header(first_char)
1296 self._has_string_data = len([x for x in self._typlist if type(x) is int]) > 0
1298 # calculate size of a data record
File ~\anaconda3\Lib\site-packages\pandas\io\stata.py:1473, in StataReader._read_old_header(self, first_char)
1471 self._format_version = int(first_char[0])
1472 if self._format_version not in [104, 105, 108, 111, 113, 114, 115]:
-> 1473 raise ValueError(_version_error.format(version=self._format_version))
1474 self._set_encoding()
1475 self._byteorder = ">" if self._read_int8() == 0x1 else "<"
ValueError: Version of given Stata file is 69. pandas supports importing versions 105, 108, 111 (Stata 7SE), 113 (Stata 8/9), 114 (Stata 10/11), 115 (Stata 12), 117 (Stata 13), 118 (Stata 14/15/16),and 119 (Stata 15/16, over 32,767 variables).
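Reading the traceback, the reported version seems to come from `int(first_char[0])`, i.e. the first byte of the file, and 69 is the ASCII code for the letter "E". A minimal sketch for checking that byte directly, without going through pandas, is below. The helper name `peek_header` and the choice of 16 bytes are my own and only for illustration; the sketch only shows what the file starts with and does not convert anything.

```python
from pathlib import Path


def peek_header(path, n=16):
    """Return (first byte as an int, first n raw bytes) of a file.

    peek_header is a hypothetical helper, not part of pandas.
    """
    with open(path, "rb") as fh:
        head = fh.read(n)
    if not head:
        raise ValueError(f"{path} is empty")
    # Indexing a bytes object gives an int, e.g. b"E"[0] == 69
    return head[0], head


if __name__ == "__main__":
    target = Path("3-charge.DTA")  # file name taken from the code above
    if target.exists():
        first_byte, head = peek_header(target)
        print("first byte:", first_byte)
        print("raw start :", head)
```

If the raw bytes turn out to be readable text, that would suggest the files are not in a binary Stata format at all, which would fit with every reader rejecting them.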
There is a method given in this post that seems to have worked, but I am very new to this sort of thing and it went over my head. (I am also new enough to Stack Overflow that I can't comment there to ask for clarification.)
If someone could give me a very beginner-friendly explanation of that method, or knows another method that works with files this old, I would greatly appreciate it.
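Since there are many files, whatever reader ends up working will need to run over a whole folder. A rough sketch of that loop is below, assuming all the files sit in one directory. The names `convert_folder` and `pandas_to_csv` are hypothetical helpers of my own. `pandas_to_csv` just wraps the same `pd.read_stata` call as above, so with my files every entry would currently land in the failed list; the point is that a different per-file converter could be swapped in later.

```python
from pathlib import Path


def convert_folder(folder, convert_one):
    """Run convert_one on every .dta file in folder.

    Returns (converted, failed): file names that converted, and
    (file name, error message) pairs for files that raised ValueError.
    """
    converted, failed = [], []
    for path in sorted(Path(folder).iterdir()):
        # Match .dta and .DTA alike
        if path.suffix.lower() != ".dta":
            continue
        try:
            convert_one(path)
            converted.append(path.name)
        except ValueError as exc:
            failed.append((path.name, str(exc)))
    return converted, failed


def pandas_to_csv(path):
    """Per-file converter using the same pandas call as in the question."""
    import pandas as pd  # imported here so convert_folder works without pandas

    pd.read_stata(path).to_csv(path.with_suffix(".csv"))


if __name__ == "__main__":
    done, problems = convert_folder(".", pandas_to_csv)
    print(len(done), "converted;", len(problems), "failed")
    for name, message in problems:
        print(name, "->", message)
```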