Drop duplicated rows from files in a folder based on two columns in R
I have over 250 large .txt files (each approximately 1 GB) in a folder. I would like to remove duplicated rows based on two columns, id1 and id2, while being mindful of my MacBook's memory limitations.
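
One possible approach, sketched here in base R, is to process the files one at a time so that only a single file is held in memory at any moment. The question does not say how the files are delimited or whether duplicates must also be removed across files, so this sketch assumes tab-delimited files with a header row and removes duplicates within each file only. The function names dedupe_file and dedupe_folder, the output folder, and the demo data are hypothetical.

```r
# Sketch: drop rows that repeat an (id1, id2) pair, one file at a time.
# Assumptions (not stated in the question): tab-delimited input with a header,
# duplicates are removed within each file only, results go to a separate folder.
dedupe_file <- function(in_path, out_path, sep = "\t") {
  df <- read.delim(in_path, sep = sep, stringsAsFactors = FALSE)
  # duplicated() marks every row whose (id1, id2) pair was already seen,
  # so negating it keeps the first occurrence of each pair.
  keep <- !duplicated(df[, c("id1", "id2")])
  write.table(df[keep, , drop = FALSE], out_path, sep = sep,
              quote = FALSE, row.names = FALSE)
  invisible(sum(!keep))  # number of rows dropped
}

dedupe_folder <- function(in_dir, out_dir) {
  dir.create(out_dir, showWarnings = FALSE, recursive = TRUE)
  files <- list.files(in_dir, pattern = "\\.txt$", full.names = TRUE)
  for (f in files) {
    # The data frame lives only inside dedupe_file(), so it can be
    # garbage-collected before the next file is read.
    dedupe_file(f, file.path(out_dir, basename(f)))
  }
  invisible(files)
}

# Small demo with made-up data in a temporary folder.
demo_in  <- file.path(tempdir(), "dedupe_in")
demo_out <- file.path(tempdir(), "dedupe_out")
dir.create(demo_in, showWarnings = FALSE)
write.table(
  data.frame(id1 = c(1, 1, 2), id2 = c("a", "a", "b"), value = c(10, 11, 12)),
  file.path(demo_in, "part1.txt"), sep = "\t", quote = FALSE, row.names = FALSE
)
dedupe_folder(demo_in, demo_out)
result <- read.delim(file.path(demo_out, "part1.txt"))
```

For files around 1 GB, swapping read.delim and write.table for fread and fwrite from the data.table package would likely be faster and lighter on memory. If duplicates must also be removed across files, the seen (id1, id2) pairs would have to be tracked between iterations, which this sketch does not do.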
Aggregating rows in a table using multiple aggregate operations based on column name in R
I have a table of website pages and their visits. In some cases there are duplicate rows. I want to deduplicate rows based on the yearMonth and page columns while summing the users and sessions columns.
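
A minimal sketch of one way to do this in base R uses aggregate() with a formula: the grouping columns go on the right-hand side and the columns to sum go inside cbind() on the left. The visits data frame below is made-up illustration data, not the asker's table.

```r
# Hypothetical example data: the first two rows share yearMonth and page.
visits <- data.frame(
  yearMonth = c("2020-01", "2020-01", "2020-02"),
  page      = c("/home", "/home", "/home"),
  users     = c(5, 3, 7),
  sessions  = c(8, 4, 9)
)

# Collapse rows that share (yearMonth, page), summing users and sessions.
deduped <- aggregate(cbind(users, sessions) ~ yearMonth + page,
                     data = visits, FUN = sum)
```

Note that the formula interface of aggregate() drops rows containing NA in any of the named columns by default, so missing values are worth checking beforehand.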