I have a tab-delimited file, dataset1:
NC_044998.1 14582 80739 LOC100221041
NC_044998.1 31388 68748 DCBLD2
NC_044998.1 80874 299341 CMSS1
NC_044998.1 112495 297570 FILIP1L
NC_044998.1 287349 289742 LOC116808959
NC_044998.1 300404 343805 TBC1D23
NC_044998.1 333622 344667 NIT2
NC_044998.1 346168 368957 TOMM70
NC_044998.1 371654 380427 LNP1
NC_044998.1 387231 413422 TMEM45A
and another tab-delimited file, dataset2:
1 NC_044998.1 15001 6.040368 2.038993e-04
1 NC_044998.1 25002 0.000000 3.333334e-01
1 NC_044998.1 35003 2.309260 4.638924e-03
1 NC_044998.1 45004 3.438428 5.053365e-03
1 NC_044998.1 55005 1.086369 9.663565e-02
1 NC_044998.1 65006 3.250019 8.298793e-04
1 NC_044998.1 75007 1.081163 8.039542e-03
1 NC_044998.1 85008 0.186722 8.158607e-02
1 NC_044998.1 95009 2.236803 3.256445e-03
1 NC_044998.1 105010 0.089978 2.846438e-01
For each line of dataset2, I want to check it against the lines of dataset1: if the string in column 1 of dataset1 matches the string in column 2 of dataset2, and the value in column 3 of dataset2 is between the values in columns 2 and 3 of dataset1, print the dataset2 line followed by the dataset1 line; otherwise print the dataset2 line followed by “.” in each column.
I'm using:
NR==FNR {
    q[++n] = $0
    f1[n] = $2
    f2[n] = $3
    next
}
# process file2
{
    for (i = 1; i <= n; i++) {
        if ($1 == f1[i] && (f2[i] > $2 && f2[i] < $3))
            print $0"\t"q[i]
        else
            print q[i]"\t"".""\t"".""\t"".""\t""."
    }
}
but it’s not working: it just prints the lines of whichever file is given last, followed by the dot columns.
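
A minimal sketch of the logic described above, not a definitive solution. It assumes dataset1 is passed as the first file and dataset2 as the second, and that "between" means strictly greater than the start and strictly less than the end. The array names (chrom, lo, hi, rec) and the hit flag are illustrative, and only a few of the sample rows are used so that all three outcomes appear. The key difference from the script above is that the dot-padded row is printed once, after the loop, instead of inside the loop's else branch.

```shell
# Recreate a few of the sample rows (file names are assumptions).
printf 'NC_044998.1\t14582\t80739\tLOC100221041\nNC_044998.1\t31388\t68748\tDCBLD2\n' > dataset1
printf '1\tNC_044998.1\t15001\t6.040368\t2.038993e-04\n' > dataset2
printf '1\tNC_044998.1\t35003\t2.309260\t4.638924e-03\n' >> dataset2
printf '1\tNC_044998.1\t85008\t0.186722\t8.158607e-02\n' >> dataset2

result=$(awk 'BEGIN { FS = OFS = "\t" }
# First file (dataset1): store each interval and its full line.
NR == FNR {
    chrom[++n] = $1
    lo[n] = $2 + 0      # force numeric comparison later
    hi[n] = $3 + 0
    rec[n] = $0
    next
}
# Second file (dataset2): test this position against every stored interval.
{
    hit = 0
    pos = $3 + 0
    for (i = 1; i <= n; i++) {
        if ($2 == chrom[i] && pos > lo[i] && pos < hi[i]) {
            print $0, rec[i]
            hit = 1
        }
    }
    # Pad with dots only when no interval matched at all.
    if (!hit)
        print $0, ".", ".", ".", "."
}' dataset1 dataset2)

printf '%s\n' "$result"
```

With this structure each dataset2 line yields either one output row per overlapping dataset1 interval (35003 falls inside both LOC100221041 and DCBLD2, so it appears twice) or a single row padded with four dots (85008 matches neither of the two intervals used here).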