Tag Archive for r, dplyr, apache-arrow

Select and convert columns to write_dataset in Arrow

Hi everyone,
I want to select certain fields from CSV files for different years, convert the types of those fields, and save the result in Parquet format.
(I am using Arrow because data larger than about 30 GB is difficult to import into R.)
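A minimal sketch of this workflow, assuming the `arrow` and `dplyr` R packages. The file names and the columns `id`, `year`, `value`, and `note` are hypothetical stand-ins for the real per-year CSV files. `open_dataset()` scans the CSVs lazily, `select()` and `mutate()` pick and cast the fields, and `write_dataset()` evaluates the query and writes Parquet, so the data never has to be loaded into an R data frame first.

```r
library(arrow)
library(dplyr)

# Hypothetical sample data: two small CSV files standing in for the
# per-year files in the question (the real files would be far larger).
csv_dir <- file.path(tempdir(), "csv_by_year")
dir.create(csv_dir, showWarnings = FALSE)
write.csv(data.frame(id = 1:2, year = 2020L, value = c(10L, 20L), note = "a"),
          file.path(csv_dir, "data_2020.csv"), row.names = FALSE)
write.csv(data.frame(id = 3:4, year = 2021L, value = c(30L, 40L), note = "b"),
          file.path(csv_dir, "data_2021.csv"), row.names = FALSE)

# Scan all CSV files in the directory lazily; nothing is read into R yet.
ds <- open_dataset(csv_dir, format = "csv")

out_dir <- file.path(tempdir(), "parquet_out")

ds %>%
  select(id, year, value) %>%                # keep only the needed fields
  mutate(id = as.character(id),              # cast the key to string
         value = as.numeric(value)) %>%      # cast integer to double
  write_dataset(out_dir, format = "parquet") # the query runs while writing
```

By default the dataset schema is inferred from the files rather than declared, so if the years' files disagree on a column's type, passing an explicit `schema` to `open_dataset()` may be needed.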

Using the left_join function in Arrow

Hi everyone!
Recently I have been using Arrow to process more than 500 GB of data. I use Arrow because R seems unable to import data beyond a certain size.
I have found that similar problems often occur with left_join.
I want left_join to match rows on the ID key, but it always reports a problem with other fields. Does anyone know how to deal with this?
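The question does not include the error message, so this is only a sketch under an assumption: two common sources of trouble when joining Arrow data are a key column whose type differs between the tables (for example integer in one and string in the other) and non-key columns that exist in both tables. The tables `people` and `scores` and their columns are hypothetical. Aligning the key type and renaming the overlapping non-key column before `left_join()` leaves the key as the only shared field.

```r
library(arrow)
library(dplyr)

# Hypothetical tables: the key `id` is an integer in one and a string in
# the other, and both carry a non-key column named `value`.
people <- arrow_table(id = 1:3, value = c(10, 20, 30))
scores <- arrow_table(id = c("1", "3"), value = c(0.5, 0.9))

# Cast the key to the same type as in `people` and rename the overlapping
# non-key column, so `id` is the only field the two tables share.
scores_clean <- scores %>%
  mutate(id = as.integer(id)) %>%
  rename(score = value)

joined <- people %>%
  left_join(scores_clean, by = "id") %>%
  collect()   # evaluate the lazy query and return a data frame
```

Arrow does not guarantee the row order of a join result, so sort after `collect()` if order matters.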