I have a CSV file (records.csv) with 400,000 records, no header, where each row looks like this:
"1","222","",3333","666",777",""
"2","234","","345","234","456",""
The first column, "1" and "2" in the example above, is a unique number (let's call it a URN) identifying each row.
Task: I have a list of 1,000 URNs in a text file (urn.txt), and I have been asked to edit the CSV file (records.csv) so that only those thousand records are kept and all other rows are deleted.
The urn.txt file has a single column of unique numbers (URNs) and looks like this:
1
13
16
Doing this manually takes too long. Is it possible to do it with a PowerShell script?
I know the logic: the script should read the CSV file and split each row into fields on the comma delimiter, then read the text file, loop over each URN, look for that number in the first column of the CSV, and, if it is found, copy the whole row to a new CSV file.
I am not a programmer and have no PowerShell skills, so I am finding it difficult to turn this into a script. Can anyone help me?
While it is normally preferable to perform object-oriented processing via `ConvertFrom-Csv`, in your case plain-text processing enables a faster solution, which matters with large input files.
```powershell
# Create a hash set of all URNs.
$hs = [System.Collections.Generic.HashSet[string]] (Get-Content -ReadCount 0 urn.txt)

# Read all lines lazily from the CSV file and pass only those
# lines through whose first field is in the hash set.
[IO.File]::ReadLines((Convert-Path records.csv)) |
  ForEach-Object {
    if ($hs.Contains(($_ -split ',')[0].Trim('"'))) {
      $_
    }
  }
```
The above directly outputs the results.
- To output to a file, append `| Set-Content ...`, e.g. `| Set-Content filtered.csv`; use an `-Encoding` argument to control the character encoding.
- If you want to write back to the input file (be sure to make a backup copy first), use `[IO.File]::ReadAllLines()` rather than `[IO.File]::ReadLines()`, or `(Get-Content -ReadCount 0 records.csv)` (note the parentheses), but note that doing so reads the entire file into memory as an array of lines at once. The alternative is to write to a temporary output file first and then replace the original file with the temporary one.
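A minimal sketch of that temporary-file approach, assuming the same urn.txt and records.csv file names as above (the `Where-Object` filter is equivalent to the `ForEach-Object` version in the main solution):

```powershell
# Build the hash set of URNs, as in the main solution.
$hs = [System.Collections.Generic.HashSet[string]] (Get-Content -ReadCount 0 urn.txt)

# Write the filtered lines to a temporary file first...
$tmp = New-TemporaryFile
[IO.File]::ReadLines((Convert-Path records.csv)) |
  Where-Object { $hs.Contains(($_ -split ',')[0].Trim('"')) } |
  Set-Content $tmp

# ...then replace the original file with it.
Move-Item -Force $tmp records.csv
```

This keeps memory use low, because the lines stream through the pipeline one at a time instead of being read into an array.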
Note:

- `[IO.File]::ReadLines((Convert-Path records.csv))` is used purely for performance reasons, because the PowerShell-idiomatic equivalent, `Get-Content records.csv`, is regrettably slow; see GitHub issue #7537 for a discussion and a possible future remedy.
- The assumption is that none of the rows in your CSV file spans more than one line.