I have a go-chi (Golang) web application, hosted behind nginx (reverse proxy via proxy_pass directives), that accepts file uploads which are then uploaded to AWS S3.
The request containing the file is sent as a POST request with a content type of multipart/form-data.
The application then receives the request and creates a multipart.File value; from my understanding, this entire file is allocated in memory. I would like to accept files in the range of 5 GB to 10 GB, and I cannot increase server memory to that size, as I only have 2 GB.

S3 has multipart upload methods, which allow parts of a file to be uploaded to S3 separately. I have implemented this multipart upload in Go below. The issue I am facing is that there seems to be duplicated effort: Go's net/http library is already allocating memory in parts to build the multipart.File value, essentially doing what I am doing below.
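For context, the receiving side currently looks roughly like this (simplified; the handler name and the "file" form-field name are illustrative):

    func uploadHandler(w http.ResponseWriter, r *http.Request) {
        // ParseMultipartForm reads the whole body up front: up to the
        // given number of bytes is kept in memory, the remainder is
        // spooled to temporary files on disk.
        if err := r.ParseMultipartForm(32 << 20); err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        // file is a multipart.File handle over that buffered data.
        file, header, err := r.FormFile("file")
        if err != nil {
            http.Error(w, err.Error(), http.StatusBadRequest)
            return
        }
        defer file.Close()
        _ = header // header.Filename, header.Size, etc.
        // ... hand file off to the S3 multipart upload POC below ...
    }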
My questions:

- Is there a way to take these part files created by the net/http library in ParseMultipartForm() and upload them to S3 directly, avoiding the allocation of the entire file in memory? (That is, uploading the file stream as it comes in.)
- Would it be better to skip the parsing and allocation of the multipart.File in net/http altogether and write the body directly to disk, then use the S3 multipart methods to upload the file parts? Although I believe I would still need to stream the entire file through memory to do so, which will not work, as I do not have enough memory.
- Is nginx also parsing and splitting this file before passing it to the web app? My proxy configuration is sketched below.
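For reference, the relevant nginx configuration is an essentially stock proxy_pass setup, roughly like this (upstream address and size limits are placeholders):

    server {
        listen 80;
        # raise the default 1m body-size limit for large uploads
        client_max_body_size 10g;

        location / {
            # with the default (proxy_request_buffering on), nginx reads
            # the whole request body first, spooling large bodies to temp
            # files on disk, before forwarding it to the upstream;
            # setting it to off streams the body through as it arrives
            proxy_request_buffering on;
            proxy_pass http://127.0.0.1:8080;  # placeholder upstream
        }
    }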
I have implemented a light POC of S3 multipart uploads below.
partBufSize := int64(10000000) // ~10 MB; S3 requires >= 5 MB for all but the last part
bytesOffset := int64(0)
partNum := int64(1) // S3 part numbers start at 1
var compParts []*s3.CompletedPart

for bytesOffset < uploadSize {
    // Shrink the buffer for the final, possibly smaller, part.
    remainingStreamBytes := uploadSize - bytesOffset
    if remainingStreamBytes < partBufSize {
        partBufSize = remainingStreamBytes
    }

    if _, err := uploadedFile.Seek(bytesOffset, io.SeekStart); err != nil {
        log.Error().Msg(err.Error())
        break
    }
    filePartBuffer := make([]byte, partBufSize)
    if _, err := io.ReadFull(uploadedFile, filePartBuffer); err != nil {
        log.Error().Msg(err.Error())
        break
    }

    uploadPartRes, err := s3Client.UploadPart(&s3.UploadPartInput{
        Bucket:        uploadPartsRequest.Bucket, // Bucket is required by UploadPartInput
        Key:           uploadPartsRequest.Key,
        Body:          bytes.NewReader(filePartBuffer),
        PartNumber:    aws.Int64(partNum),
        UploadId:      uploadPartsRequest.UploadId,
        ContentLength: aws.Int64(partBufSize),
    })
    if err != nil {
        log.Error().Msg(err.Error())
        break
    }

    bytesOffset += partBufSize
    compParts = append(compParts, &s3.CompletedPart{
        ETag:       uploadPartRes.ETag,
        PartNumber: aws.Int64(partNum),
    })
    partNum++
}
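For completeness, the POC assumes the multipart upload has already been initiated and is completed once all parts are in, roughly like this (bucket and key are placeholders):

    // Initiate the multipart upload to obtain an UploadId.
    createRes, err := s3Client.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
        Bucket: aws.String("my-bucket"), // placeholder
        Key:    aws.String("my-key"),    // placeholder
    })
    if err != nil {
        log.Error().Msg(err.Error())
        return
    }

    // ... run the part-upload loop above, collecting compParts ...

    // Stitch the uploaded parts together server-side.
    _, err = s3Client.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
        Bucket:          aws.String("my-bucket"),
        Key:             aws.String("my-key"),
        UploadId:        createRes.UploadId,
        MultipartUpload: &s3.CompletedMultipartUpload{Parts: compParts},
    })
    if err != nil {
        log.Error().Msg(err.Error())
    }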
> The application then receives the request and creates a multipart.File type; this entire file, from my understanding, is allocated in memory.
I do not quite get this. If this is the way you've coded it, then just don't do that: Go provides mime/multipart.Reader, which can be used to read multipart/form-data-encoded payloads piecemeal.
To do that, you roll like this:

- Get the value of the Content-Type header field and parse it with the mime.ParseMediaType function to get the value of its "boundary" parameter, a string used to separate parts in the multipart payload.
- Create a multipart.Reader out of the request's body and the boundary string.
- Iterate over the parts; at each part, you'll get another io.Reader which will provide the part's bytes. Do whatever you want with those bytes, possibly shoveling them into an active POST request using io.Copy or io.CopyN (see the sketch below).
This way you will control the exact amount of buffering you want to provide.
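For illustration, here is a minimal sketch of that approach wired up to S3's multipart-upload API, using the AWS SDK for Go v1 (bucket, key, and error handling are simplified; a real handler would also filter parts by FormName):

    package main

    import (
        "bytes"
        "io"
        "mime"
        "mime/multipart"
        "net/http"

        "github.com/aws/aws-sdk-go/aws"
        "github.com/aws/aws-sdk-go/service/s3"
    )

    // streamToS3 reads multipart parts off the wire and uploads their
    // bytes to S3 in fixed-size chunks, so at most one chunk of the
    // upload is held in memory at a time.
    func streamToS3(r *http.Request, s3Client *s3.S3, bucket, key string) error {
        _, params, err := mime.ParseMediaType(r.Header.Get("Content-Type"))
        if err != nil {
            return err
        }
        mr := multipart.NewReader(r.Body, params["boundary"])

        createRes, err := s3Client.CreateMultipartUpload(&s3.CreateMultipartUploadInput{
            Bucket: aws.String(bucket),
            Key:    aws.String(key),
        })
        if err != nil {
            return err
        }

        const chunkSize = 10 << 20 // 10 MiB; S3 parts must be >= 5 MiB except the last
        var completed []*s3.CompletedPart
        partNum := int64(1)
        buf := make([]byte, chunkSize)

        for {
            part, err := mr.NextPart()
            if err == io.EOF {
                break // no more form parts
            }
            if err != nil {
                return err
            }
            // Read this form part in chunkSize pieces, shipping each to S3.
            for {
                n, readErr := io.ReadFull(part, buf)
                if n > 0 {
                    res, upErr := s3Client.UploadPart(&s3.UploadPartInput{
                        Bucket:     aws.String(bucket),
                        Key:        aws.String(key),
                        UploadId:   createRes.UploadId,
                        PartNumber: aws.Int64(partNum),
                        Body:       bytes.NewReader(buf[:n]),
                    })
                    if upErr != nil {
                        return upErr
                    }
                    completed = append(completed, &s3.CompletedPart{
                        ETag:       res.ETag,
                        PartNumber: aws.Int64(partNum),
                    })
                    partNum++
                }
                if readErr == io.EOF || readErr == io.ErrUnexpectedEOF {
                    break // this part is exhausted
                }
                if readErr != nil {
                    return readErr
                }
            }
        }

        _, err = s3Client.CompleteMultipartUpload(&s3.CompleteMultipartUploadInput{
            Bucket:          aws.String(bucket),
            Key:             aws.String(key),
            UploadId:        createRes.UploadId,
            MultipartUpload: &s3.CompletedMultipartUpload{Parts: completed},
        })
        return err
    }

Note that UploadPart in SDK v1 wants an io.ReadSeeker for its Body (it may re-read it for signing and retries), hence the bytes.Reader over a bounded buffer; memory use stays at roughly one chunk regardless of file size.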