I have a very large CSV file that won’t fit entirely into memory. I want to read the file in chunks and then chain a series of operations together to process the results. Lastly, I need to aggregate the final results, maintaining the original order of the data.
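For the chunked reading itself, I imagine something along these lines, built on encoding/csv (chunkReader, Next, and chunkSize are just placeholder names for the sketch):

import (
    "encoding/csv"
    "io"
)

// chunkReader pulls up to chunkSize records at a time from a csv.Reader.
type chunkReader struct {
    r         *csv.Reader
    chunkSize int
}

// Next returns the next chunk of rows, or io.EOF once the file is exhausted.
func (cr *chunkReader) Next() ([][]string, error) {
    chunk := make([][]string, 0, cr.chunkSize)
    for len(chunk) < cr.chunkSize {
        record, err := cr.r.Read()
        if err == io.EOF {
            break
        }
        if err != nil {
            return nil, err
        }
        chunk = append(chunk, record)
    }
    if len(chunk) == 0 {
        return nil, io.EOF
    }
    return chunk, nil
}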
The operations are things like GetColumns(start int, end int), GetRows(start int, end int), SumRows(), etc. Thus, I might have the following code:
err := Read("input.csv").
With(GetColumns(3, 5)).
With(GetRows(7, 20)).
With(SumRow()).
Write("output.csv")
Is it possible to use goroutines to process the data at each operation? How do I pass the data from one operation to the next, and finally, how do I write the final results to a file while maintaining the original order of the data?
In the case of just reading the whole file into memory, I figured I’d need an interface and struct like so:
type TableProcessor struct {
    Operations []TableOperation
}

type TableOperation interface {
    Apply(table [][]string) [][]string
}

func (tp *TableProcessor) With(op TableOperation) *TableProcessor {
    tp.Operations = append(tp.Operations, op)
    return tp
}
And I figure each operation will have to look something like this:
func GetRows(start, end int) TableOperation {
    return &GetRowsOperation{Start: start, End: end}
}

type GetRowsOperation struct {
    Start, End int
}

func (op *GetRowsOperation) Apply(table [][]string) [][]string {
    return table[op.Start:op.End]
}
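For completeness, here is roughly how I picture the in-memory version running end to end. Read and Write are my own rough sketches on top of encoding/csv, and I've assumed TableProcessor also keeps the input path, which isn't shown in the struct above:

import (
    "encoding/csv"
    "os"
)

// Assumes TableProcessor has a path string field in addition to Operations.
func Read(path string) *TableProcessor {
    return &TableProcessor{path: path}
}

func (tp *TableProcessor) Write(outPath string) error {
    in, err := os.Open(tp.path)
    if err != nil {
        return err
    }
    defer in.Close()

    // Load the whole table, then apply each operation in the order it was chained.
    table, err := csv.NewReader(in).ReadAll()
    if err != nil {
        return err
    }
    for _, op := range tp.Operations {
        table = op.Apply(table)
    }

    out, err := os.Create(outPath)
    if err != nil {
        return err
    }
    defer out.Close()

    // WriteAll flushes the writer and reports any write error.
    return csv.NewWriter(out).WriteAll(table)
}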
But I am not sure what the interfaces and structs should look like if I am using chunking, and I really am not sure how to incorporate goroutines into the mix.