I am trying to extract a representative sample from a 200 MB CSV file by writing the header and every 500th row to a new file for testers to use. My first attempt was knowingly sub-optimal, but it seemed fine for a five-minute quick hack: I relied on Out-File -Append to add each row matching the modulus condition to a destination file on a network share. What I found is that the sample file contained slightly fewer rows than expected, and repeated runs produced slightly different counts (expected 2014; actual ranged between 1992 and 2011).
I re-wrote the script to gather the results of the foreach into a variable and write it out once at the end. That worked as expected (2014 lines), but I'm curious about the cause of the failure. I know the original repeatedly opens and closes the destination file, but I'd have expected it to report an error rather than silently drop lines.
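For reference, this is roughly what the re-written (working) version looks like. It's a sketch reconstructed from the description above, reusing the same variable names; the share path is a placeholder, not the real one:

```powershell
# Second version: collect the sampled lines in memory, write once at the end.
$destfile = "\\UNCSHARE\Folder\Export_Sample_$(Get-Date -Format "yyyyMMdd_HHmmss").txt"
$Original = Get-Content \\UNCSHARE\Folder\200MB_Export_20231208_1545.txt

[int64]$ln = 0
$Sample = foreach ($line in $Original) {
    $ln++
    if ($ln -eq 1 -or $ln % 500 -eq 0) {
        $line          # emit the header plus every 500th row
    }
}

$Sample | Out-File -FilePath $destfile    # single write, no -Append
Write-Host $Sample.Count
(Get-Content $destfile).Count
```

The only behavioural difference from the original is that the destination file is opened and written exactly once instead of ~2000 times.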
This is the original version of the script:
$destfile = "\\UNCSHARE\Folder\Export_Sample_$(Get-Date -Format "yyyyMMdd_HHmmss").txt"
$Original = Get-Content \\UNCSHARE\Folder\200MB_Export_20231208_1545.txt
[int64]$ln = 0
[int64]$SampleCount = 0
foreach ($line in $Original) {
    $ln++
    if ($ln -eq 1 -or $ln % 500 -eq 0) {
        $line | Out-File -FilePath $destfile -Append -ErrorAction Stop
        $SampleCount++
    }
}
Write-Host $SampleCount
(Get-Content $destfile).count
The problem does NOT occur if I use a location on my local hard drive for the destination file.
I compared the output of the 2nd (correct) version against the first, and the missing lines are irregularly spaced throughout the file (e.g. missing lines at 56, 359, 368, 405, 600, 700, 702, 788, 854, …).
I'm running this in PowerShell Core 7.4.2 on a Windows 10 workstation joined to an AD domain.