I used Go’s profiling tools to see how much memory it takes to read a file 100 short lines (50 characters or so) at a time, and I’m flabbergasted:
```
      flat  flat%   sum%        cum   cum%
 5835.67MB 49.98% 49.98%  8882.24MB 76.07%  reader.ReadFile
 3046.57MB 26.09% 76.07%  3046.57MB 26.09%  bufio.(*Scanner).Text (inline)
```
Here is the tiny program (main.go):
```go
package main

import (
	"bufio"
	"fmt"
	"io"
)

func ReadFile(file io.Reader, buffer []byte) error {
	scanner := bufio.NewScanner(file)
	// Buffer uses the larger of max (32) and cap(buffer) (1024) as the
	// size limit, so the effective limit here is 1024 bytes.
	scanner.Buffer(buffer, 32)
	scanner.Split(bufio.ScanLines)
	var s []string
	for scanner.Scan() {
		txt := scanner.Text()
		if txt == "" {
			// TODO: this is an error actually... not fatal but should be reported
			continue
		}
		s = append(s, txt)
		if len(s) >= 100 {
			s = nil // Clear the slice to free memory
		}
	}
	// Check for errors in the scanner
	if err := scanner.Err(); err != nil {
		fmt.Printf("scan buffer for new lines: %s\n", err.Error())
		return fmt.Errorf("scan buffer for new lines: %w", err)
	}
	return nil
}
```
And here is the benchmark (main_test.go):
```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"testing"
)

const fileName = "txtfortesting.txt"

func BenchmarkTestEntireProgram(b *testing.B) {
	writeTxtToFile()
	b.ResetTimer()
	for i := 0; i < b.N; i++ {
		file, err := os.Open(fileName)
		if err != nil {
			b.Fatalf("open file: %v", err)
		}
		buf := make([]byte, 1024)
		if err := ReadFile(file, buf); err != nil {
			b.Fatal(err)
		}
		// Close inside the loop: a deferred Close would keep every
		// file open until the benchmark function returns.
		file.Close()
	}
}

func writeTxtToFile() {
	const (
		giga = 1024 * 1024 * 10 // number of lines to write, not bytes
	)
	file, err := os.Create(fileName)
	if err != nil {
		panic(err)
	}
	defer file.Close()
	fileBuf := &bytes.Buffer{}
	for i := range giga {
		if _, err := fmt.Fprintf(fileBuf, "some text %d\n", i); err != nil {
			panic(err)
		}
		if i%1024 == 0 {
			if _, err := file.Write(fileBuf.Bytes()); err != nil {
				panic(err)
			}
			fileBuf.Reset()
		}
	}
	// Write the lines left over after the last full chunk.
	if _, err := file.Write(fileBuf.Bytes()); err != nil {
		panic(err)
	}
}
```
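The benchmark above measures the whole program, file I/O included. To isolate the per-line cost that shows up as `bufio.(*Scanner).Text` in the profile, `testing.AllocsPerRun` can count heap allocations per call on an in-memory input. This is a minimal sketch: `makeInput`, `sumLenText`, and `sumLenBytes` are hypothetical helpers, and the package-level `sink` exists only so the compiler cannot skip the string conversion.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
	"testing"
)

// sink is package-level so the string produced by Text escapes to the
// heap and the conversion cannot be optimized away.
var sink string

// makeInput (hypothetical) builds n lines shaped like the test file.
func makeInput(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		fmt.Fprintf(&sb, "some text %d\n", i)
	}
	return sb.String()
}

// sumLenText reads every line through Text, as ReadFile does.
func sumLenText(input string) int {
	total := 0
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		line := scanner.Text() // allocates a new string for each line
		sink = line
		total += len(line)
	}
	return total
}

// sumLenBytes reads the same lines through Bytes, which returns a slice
// of the scanner's internal buffer instead of a copy.
func sumLenBytes(input string) int {
	total := 0
	scanner := bufio.NewScanner(strings.NewReader(input))
	for scanner.Scan() {
		total += len(scanner.Bytes())
	}
	return total
}

func main() {
	input := makeInput(1000)
	textAllocs := testing.AllocsPerRun(10, func() { sumLenText(input) })
	bytesAllocs := testing.AllocsPerRun(10, func() { sumLenBytes(input) })
	fmt.Printf("allocations per call: Text=%.0f Bytes=%.0f\n", textAllocs, bytesAllocs)
}
```

On this input I would expect the Text version to report roughly one extra allocation per line, since `Text` returns a freshly allocated string while `Bytes` does not allocate.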
Can someone please tell me how the program can possibly consume more memory than the size of the file it is reading? I’m only keeping 100 lines at a time and clearing the slice every 100 lines. What can I do to improve the memory performance?
P.S. To run the profiler, simply do:

```shell
go mod init leakyreader
# copy the code into main.go and main_test.go
go test -bench=. -benchmem -memprofile mem.prof -cpuprofile cpu.prof -benchtime=10s
go tool pprof mem.prof
```
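One way to check whether numbers like the ones above describe memory held at one time or memory allocated over the whole run is to compare `runtime.MemStats.TotalAlloc`, which counts every byte ever allocated and never decreases, with `HeapAlloc`, the bytes of heap objects still allocated. This is a minimal sketch under stated assumptions: `readLines` and `makeInput` are hypothetical helpers that mimic ReadFile and the test file in memory, with no file I/O.

```go
package main

import (
	"bufio"
	"fmt"
	"runtime"
	"strings"
)

// makeInput (hypothetical) builds n lines shaped like the test file.
func makeInput(n int) string {
	var sb strings.Builder
	for i := 0; i < n; i++ {
		fmt.Fprintf(&sb, "some text %d\n", i)
	}
	return sb.String()
}

// readLines (hypothetical) mirrors the loop in ReadFile: it keeps up
// to 100 lines and then drops the batch by setting the slice to nil.
func readLines(input string) {
	scanner := bufio.NewScanner(strings.NewReader(input))
	var s []string
	for scanner.Scan() {
		s = append(s, scanner.Text())
		if len(s) >= 100 {
			s = nil
		}
	}
}

// measure runs readLines `passes` times and returns how much TotalAlloc
// grew, plus HeapAlloc after a forced garbage collection.
func measure(input string, passes int) (totalAlloc, liveHeap uint64) {
	var before, after runtime.MemStats
	runtime.GC()
	runtime.ReadMemStats(&before)
	for i := 0; i < passes; i++ {
		readLines(input)
	}
	runtime.GC()
	runtime.ReadMemStats(&after)
	return after.TotalAlloc - before.TotalAlloc, after.HeapAlloc
}

func main() {
	input := makeInput(100000)
	for _, passes := range []int{1, 10} {
		total, live := measure(input, passes)
		fmt.Printf("passes=%2d  TotalAlloc grew by %d KB, HeapAlloc after GC is %d KB\n",
			passes, total/1024, live/1024)
	}
}
```

If the profile’s totals grow with the number of benchmark iterations the way TotalAlloc grows with `passes`, they describe cumulative allocation over the whole run rather than memory held at any one moment.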