I have a .dat file which ahs description column that contains text with commas,semi-colon, double quotes. The inner double quotes is causing a problem while loading the data to temp table. So, I just want to eliminate the column from the file or not load any data from that column.
I am looking for a python program that can help me remove inner double quotes (using replace function) or using pop/del to remove that column from that file.
See my code below, which has issues removing the column
import os
import csv
import sys
from shutil import copyfile
sourcedir=sys.argv[1]
sourcefile=sys.argv[2]
targetdir=sys.argv[3]
targetfile=sys.argv[4]
qualifierIN=sys.argv[5]
delimiterIN=sys.argv[6]
qualifierOUT=sys.argv[7]
delimiterOUT=sys.argv[8]
curDir = os.getcwd()
sep = os.path.sep
if os.path.exists(sourcedir):
print("Source Directory : {x}".format(x=sourcedir))
os.chdir(sourcedir)
if os.path.isfile(sourcefile):
print("Source File : {x}".format(x=sourcefile))
else:
print("Source File Not Found : ", sourcefile)
else:
print("Source Directory Not Found : ",sourcedir)
sys.exit(5)
if os.path.exists(targetdir):
print ("Target Directory : {x}".format(x=targetdir))
else:
print ("Target Directory Not Found :",targetdir)
os.makedirs(targetdir)
try:
with open(sourcefile, 'rt') as fin:
reader = csv.reader(fin, delimiter=delimiterIN, quotechar=qualifierIN )
os.chdir(targetdir)
with open(targetfile, 'wt') as fout:
writer = csv.writer(fout, quoting=csv.QUOTE_ALL, quotechar=qualifierOUT, delimiter=delimiterOUT)
try:
[row.pop(10) for row in reader:]
index = 0
for field in row:
row[index] = str(row[index].replace(delimiterOUT, "")) # delimiter
row[index] = str(row[index].replace(qualifierOUT, "")) # qualifier
row[index] = str(row[index].replace(';', ""))
row[index] = str(row[index].replace('"', '')) # unix newline
#row[index] = str(row[index].replace('rn', "")) # windows newline
index += 1
print(row)
writer.writerow(row)
except csv.Error as e:
print("Error: ", str(e))
sys.exit(25)
finally:
fout.close()
fin.close()
os.chdir(curDir)
except Exception as e:
print("Error : ", str(e))
sys.exit(35)
try:
os.chdir(sourcedir)
os.remove(sourcefile)
copyfile(targetfile, sourcefile)
except Exception as e:
print(str(e))
sys.exit(45)
sys.exit(0)
Data sample from the file:
Header:
invoice number,invoice date,vendor number,vendor site ID,supplier site CODE,invoice description,invoice currency code,invoice total amount,line number,line amount,line description,account code,business unit,business center,department,issue code,project,task number
Input:
315,1992-02-14,2501,641,PHIL,PRINT INT’ T,USD,117,75,71,”PAN – RETRIEVE AND REVIEW REPORT FOR “LIFE LOGO” IN CLASS (2029- 17) RE: REGISTR FEB 8, 2024; REPORT TO CLIENT AND REQUEST INSTRUCTIONS TO PROCEED; UPDATE RECORDS AND ACTION IN DATA.”,6400,5951,51820,1172,08,,
Output after running the above program:
“315”,”1992-02-14″,”2501″,”641″,”PHIL”,”PRINT INT’ T”,”USD”,”117″,”75″,”71″,”PAN – RETRIEVE AND REVIEW REPORT FOR “LIFE LOGO” IN CLASS (2029- 17) RE: REGISTR FEB 8″,”2024; REPORT TO CLIENT AND REQUEST INSTRUCTIONS TO PROCEED; UPDATE RECORDS AND ACTION IN DATA.”””,”6400″,”5951″,”51820″,”1172″,”08″,””,””
“
issue:
The “2024; REPORT TO CLIENT AND REQUEST INSTRUCTIONS TO PROCEED; UPDATE RECORDS AND ACTION IN DATA.” is shifted to next column in the database table.
So, I just want to remove this entire line description column from the file.
Please help this is very urgent
Thanks.
So, I just want to remove this entire line description column from the file. and just replace null or ” in the place of line description for all the rows in the file