I have the following logic that tries to upload a list of files (SmbFile) to AWS S3.
For each file it checks whether the file has already been uploaded, and uploads it if not.
// Upload a batch of day directory files
private void uploadBatch(List<SmbFile> batch) {
    batch.parallelStream().forEach(s -> {
        try (s) {
            if (isEmpty(backedUpFileRepository.findByPath(s.getUncPath()))) {
                // File not yet uploaded
                log.info("File with path {} has not been uploaded yet, will continue to upload it...", s.getUncPath());
                // Upload logic
                ...
            } else {
                // File already uploaded
                log.info("File with path {} already uploaded, will skip uploading it.", s.getUncPath());
            }
        } catch (IOException ioe) {
            // Log a warning and skip the file; passing the exception as the
            // last argument lets SLF4J log the stack trace as well
            log.warn("Got IOException uploading file to S3. File name: {}", s.getUncPath(), ioe);
        } catch (Exception e) {
            // Log an error and skip the file
            log.error("Got Exception uploading file to S3. File name: {}", s.getUncPath(), e);
        }
    });
}
The problem I am facing is that the same file appears to be processed twice by different threads, as indicated in the log.
I was under the impression that forEach would process each object in the list exactly once (kindly correct me if I am wrong).
May I know how I can ensure that no two threads act on the same file in the list, and that each object is processed exactly once?
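For illustration, one approach I am considering is to claim each UNC path in a thread-safe set before uploading, so that even if the batch somehow contains duplicate paths, only one thread handles each path. A minimal sketch (the claimedPaths field is hypothetical, not part of my current code):

import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical new field: paths already claimed by some thread.
// ConcurrentHashMap.newKeySet() returns a thread-safe Set whose add()
// is atomic, so only one thread can successfully claim a given path.
private final Set<String> claimedPaths = ConcurrentHashMap.newKeySet();

private void uploadBatch(List<SmbFile> batch) {
    batch.parallelStream().forEach(s -> {
        // add() returns false if the path is already in the set,
        // i.e. another thread (or a duplicate list entry) got there first.
        if (!claimedPaths.add(s.getUncPath())) {
            log.info("Path {} already claimed, skipping.", s.getUncPath());
            return;
        }
        // ... existing check-and-upload logic from above ...
    });
}

Would something like this be a reasonable way to guarantee single processing, or is there a better pattern?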
Thank you very much.