I’m converting a two-column tab-delimited file to JSON, adding a set of keys and the argument sample_id = 'WGNP1000001'.
Here are my input, the expected output format, and the code. I’d appreciate any help.
Python: convert a tab-delimited file to JSON with a supplied argument and additional keys
Tab delimited input file: WGNP1000001.list.txt
insert_size 447.3
insert_size_std 98.2
pct_properly_paired 97.9
pct_mapped 99.63
yield_bp_q30 1996315
mean_autosome_coverage 0.000644
pct_autosomes_15x 0.000016
mad_autosome_coverage 4
Expected output, without double quotes around the values, without turning integer-only values (e.g. mad_autosome_coverage and yield_bp_q30) into floats, and without rewriting pct_autosomes_15x in exponent notation:
{
    "sample": {
        "id": "WGNP1000001"
    },
    "wgs_metrics": {
        "insert_size": 447.3,
        "insert_size_std": 98.2,
        "mad_autosome_coverage": 4,
        "mean_autosome_coverage": 0.000644,
        "pct_autosomes_15x": 0.000016,
        "pct_mapped": 99.63,
        "pct_properly_paired": 97.9,
        "yield_bp_q30": 1996315
    }
}
But the code I tried either writes the values with double quotes (Try1) or turns the integer-only values into floats (Try2).
Try1 – writes the values with double quotes
def raw_data(input_metrics):
    d = {}
    with open(input_metrics) as f:
        for line in f:
            if not line.strip():
                continue
            row = line.split('\t')
            key = row[0]
            value_str = row[1]
            d[key] = value_str.replace("\n", "")
            #try:
            #    value = float(value_str.strip())
            #except ValueError:
            #    value = value_str.strip()
            #d[key] = value
    return d
Output:
{
    "sample": {
        "id": "NA12878"
    },
    "wgs_metrics": {
        "insert_size": "447.3",
        "insert_size_std": "98.2",
        "mad_autosome_coverage": "4",
        "mean_autosome_coverage": "0.000644",
        "pct_autosomes_15x": "0.000016",
        "pct_mapped": "99.63",
        "pct_properly_paired": "97.9",
        "yield_bp_q30": "1996315"
    }
}
Try2 – turns the integer-only values (yield_bp_q30, mad_autosome_coverage) into floats and writes pct_autosomes_15x in exponent notation
def raw_data(input_metrics):
    d = {}
    with open(input_metrics) as f:
        for line in f:
            if not line.strip():
                continue
            row = line.split('\t')
            key = row[0]
            value_str = row[1]
            try:
                value = float(value_str.strip())
            except ValueError:
                value = value_str.strip()
            d[key] = value
    return d
Output:
{
    "sample": {
        "id": "WGNP1000001"
    },
    "wgs_metrics": {
        "insert_size": 447.3,
        "insert_size_std": 98.2,
        "mad_autosome_coverage": 4.0,
        "mean_autosome_coverage": 0.000644,
        "pct_autosomes_15x": 1.6e-05,
        "pct_mapped": 99.63,
        "pct_properly_paired": 97.9,
        "yield_bp_q30": 1996315.0
    }
}
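The 4.0, 1996315.0 and 1.6e-05 in the Try2 output come from how Python represents floats, not from the JSON writer itself. A minimal sketch of that behaviour using only the standard json module (the variable names are illustrative, and the literal strings are taken from the input file above):

```python
import json

# float() turns an integer-looking string into a float, so a ".0" is written.
as_float = json.dumps(float("4"))       # '4.0'

# int() keeps the value an integer, so no ".0" appears.
as_int = json.dumps(int("4"))           # '4'

# Very small floats are written in exponent notation by Python's float repr.
small = json.dumps(float("0.000016"))   # '1.6e-05'

# The exponent form is valid JSON and parses back to the same number.
same_value = json.loads(small) == 0.000016

print(as_float, as_int, small, same_value)
```

So 1.6e-05 and 0.000016 are the same number to any JSON parser; only the spelling differs.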
Updated code:
def raw_data(input_metrics):
    d = {}
    with open(input_metrics) as f:
        for line in f:
            if not line.strip():
                continue
            row = line.split('\t')
            key = row[0]
            value_str = row[1]
            try:
                value = int(value_str)
            except ValueError:
                value = float(value_str.strip())
            d[key] = value
    return d
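For completeness, here is a sketch of how the updated function might be combined with the sample_id argument to produce the expected layout. The helper names parse_number and build_report are hypothetical (they are not in the code above), and the "sample" / "wgs_metrics" structure is copied from the expected output. It assumes every second column is numeric.

```python
import json


def parse_number(text):
    """Return an int for integer-looking text, otherwise a float."""
    text = text.strip()
    try:
        return int(text)
    except ValueError:
        return float(text)  # raises ValueError if the text is not numeric


def raw_data(input_metrics):
    """Read a two-column tab-delimited file into a dict of numbers."""
    d = {}
    with open(input_metrics) as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            key, value_str = line.rstrip("\n").split("\t")[:2]
            d[key] = parse_number(value_str)
    return d


def build_report(sample_id, input_metrics):
    """Wrap the metrics with the sample id (layout from the expected output)."""
    return {
        "sample": {"id": sample_id},
        "wgs_metrics": raw_data(input_metrics),
    }
```

Calling json.dumps(build_report('WGNP1000001', 'WGNP1000001.list.txt'), indent=4, sort_keys=True) should then write mad_autosome_coverage and yield_bp_q30 as integers. pct_autosomes_15x would still appear as 1.6e-05, since that is how the standard json module spells that float.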