I have a large text file with many columns of data. One of those columns is a “ping counter”, which is a hex number that should increase by 1 each line. Skipping a value, repeating a value, or starting over from 0x0000 indicates an error.
Date Time *****_PING [...]
******
06/26/2024 16:48:50.720 00000001 [...]
06/26/2024 16:54:50.720 00000002 [...]
06/26/2024 17:00:50.720 00000003 [...]
06/26/2024 17:06:50.720 00000004 [...]
.
.
.
06/26/2024 18:48:50.720 0000CCD1 [...]
06/26/2024 18:54:50.720 0000CCD2 [...]
06/26/2024 19:00:50.720 0000CCD2 [...] <- Repeated Value ERR
06/26/2024 19:06:50.720 0000CCD3 [...]
.
.
.
06/26/2024 22:48:50.720 000192D1 [...]
06/26/2024 22:54:50.720 000192D2 [...]
06/26/2024 23:00:50.720 000192D4 [...] <- Missed Value ERR
06/26/2024 23:06:50.720 000192D5 [...]
.
.
.
06/26/2024 23:48:50.720 002A0562 [...]
06/26/2024 23:54:50.720 002A0562 [...]
06/27/2024 00:00:50.720 00000000 [...] <- Reset ERR
06/27/2024 00:06:50.720 00000001 [...]
Using AWK, how do I compare the value of the two adjacent lines and print both only if they do not differ by exactly 1? I’m getting stuck with needing to deal with two records at the same time.
datafile=/large/text/file
hexcolumn=3
awk -v hc="$hexcolumn" '
BEGIN { print "ibase=16" }
{ print $hc~/^[0-9A-F]+$/ ? $hc : -1 }
' "$datafile" |
bc 2>/dev/null |
paste - "$datafile" |
awk '
{ n=$1; sub(/^[^\t]*\t/,"") }
NR>1 && p1!=n { print p0 ORS $0 }
{ p0=$0; p1=n+1 }
'
awk|bc: convert hex column into decimal value, or -1 if not hex
paste: prepend generated values to matching line
awk:
- extract generated value from line
- if it is not one more than value from previous line, print both lines
- store line and generated value for next iteration
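To make the first stage concrete, here is a minimal sketch of what the first awk hands to bc. The function name gen_bc_input is hypothetical (it is not part of the pipeline above), and the parentheses around the conditional are only there to keep the print unambiguous across awk implementations.

```shell
# Hypothetical helper: emit the bc script that the first awk stage generates.
# Reads the data on stdin; the hex column number is the first argument (default 3).
gen_bc_input() {
    awk -v hc="${1:-3}" '
        BEGIN { print "ibase=16" }                    # tell bc the input is hex
        { print ($hc ~ /^[0-9A-F]+$/ ? $hc : -1) }    # hex value, or -1 for any other line
    '
}

# Two sample lines: a header line and one data line.
printf '%s\n' 'Date Time PING [...]' \
              '06/26/2024 18:54:50.720 0000CCD2 [...]' | gen_bc_input
# prints:
#   ibase=16
#   -1
#   0000CCD2
```

bc then prints one decimal number per input line (-1 and 52434 here), which is what lets paste line the numbers up with the original file.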
Without a minimal reproducible example it’s a guess, but this might be what you’re trying to do, using GNU awk for strtonum():
$ cat tst.awk
$3 ~ /^[[:xdigit:]]+$/ {
    currPing = strtonum("0x"$3)
    if ( (prev[3] ~ /^[[:xdigit:]]+$/) && (currPing - prevPing) != 1 ) {
        print prev[0]
        print $0
        print "-----"
    }
    prevPing = currPing
}
{
    split($0,prev)
    prev[0] = $0
}
$ awk -f ./tst.awk file
06/26/2024 18:54:50.720 0000CCD2 [...]
06/26/2024 19:00:50.720 0000CCD2 [...] <- Repeated Value ERR
-----
06/26/2024 22:54:50.720 000192D2 [...]
06/26/2024 23:00:50.720 000192D4 [...] <- Missed Value ERR
-----
06/26/2024 23:48:50.720 002A0562 [...]
06/26/2024 23:54:50.720 002A0562 [...]
-----
06/26/2024 23:54:50.720 002A0562 [...]
06/27/2024 00:00:50.720 00000000 [...] <- Reset ERR
-----
It may not need to be that complicated if your real input isn’t as complicated as the input in the question (multi-line header and rows of presumably-to-be-ignored lines of ellipsis between apparent data lines).
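For example, if every line of the real file is a data line with the counter in a fixed column, the check might reduce to something like the sketch below. That is an assumption about your input, not something the question confirms. The check_pings wrapper and the hex2dec() helper are hypothetical names. hex2dec() stands in for strtonum() so the script should also run in awks other than GNU awk.

```shell
# Hypothetical wrapper: check_pings FILE [COLUMN]   (COLUMN defaults to 3)
# Assumes every line of FILE is a data line with a hex counter in COLUMN.
check_pings() {
    awk -v hc="${2:-3}" '
        # Convert a hex string to a number; returns -1 if it is not valid hex.
        function hex2dec(h,    i, d, n) {
            if (h == "") return -1
            h = toupper(h)
            n = 0
            for (i = 1; i <= length(h); i++) {
                d = index("0123456789ABCDEF", substr(h, i, 1)) - 1
                if (d < 0) return -1
                n = n * 16 + d
            }
            return n
        }
        {
            curr = hex2dec($hc)
            # Report the pair when the counter did not advance by exactly 1.
            if (NR > 1 && curr != prev + 1) {
                print prevLine
                print $0
                print "-----"
            }
            # Remember this record for the comparison on the next line.
            prev = curr
            prevLine = $0
        }
    ' "$1"
}
```

Run it as check_pings file (or check_pings file 5 if the counter is in column 5). The only state carried between records is the previous counter value and the previous line, which is all that is needed to work with two records at once.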