I’ve been tasked with auditing all of the company-wide Excel files that might be using SQL Server connections to source data. The goal is to determine the scope of work required to convert legacy ODBC connection strings to a stronger, encrypted framework; I am still researching what the best practice is there.
I tried the Python script below, but it always reports None for the connection string, even though I know there is one in the sample XLSX file I’ve been testing with.
import os
import glob
import zipfile
from lxml import etree


def list_excel_files(folder_path):
    # Collect every .xlsx file directly inside the folder (no recursion).
    return glob.glob(os.path.join(folder_path, "*.xlsx"))


def extract_connection_string_from_custom_xml(file_path):
    # An .xlsx file is a zip archive, so walk its member names.
    with zipfile.ZipFile(file_path, 'r') as z:
        for item in z.namelist():
            # Only members whose path starts with 'connection' are parsed.
            if item.startswith('connection'):
                xml_content = z.read(item)
                tree = etree.XML(xml_content)
                conn_string = search_for_connection_string(tree)
                if conn_string:
                    return conn_string
    return None


def search_for_connection_string(tree):
    # Look for child elements named 'connection' (no namespace given).
    connection_elements = tree.xpath("connection")
    for elem in connection_elements:
        parent = elem.getparent()
        if parent is not None and parent.tag == 'cell':
            return parent.text
    return None


def main(folder_path):
    # Map each workbook path to the connection string found in it (or None).
    excel_files = list_excel_files(folder_path)
    connection_strings = {}
    for file_path in excel_files:
        conn_string = extract_connection_string_from_custom_xml(file_path)
        connection_strings[file_path] = conn_string
    return connection_strings


# Example usage:
folder_path = "G:\\test\\"
connection_strings = main(folder_path)
for file_path, conn_string in connection_strings.items():
    print(f"File: {file_path}\nConnection String: {conn_string}\n")
The part I think I am stuck on is how to locate the actual “connection” keyword in the XML. I also get the feeling that there are different flavours of connection string out there, since some of the Excel files are over 10 years old and most likely use legacy 32-bit ODBC DSN strings. I’m not sure yet, which is why I am asking.
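For reference, here is a minimal sketch of the direction I am now considering. It rests on assumptions I have not fully verified: that the workbook stores its data connections in the package part xl/connections.xml, that the elements live in the SpreadsheetML main namespace, and that the string itself is the connection attribute of a dbPr element. The helper names read_connections and classify are my own, and the flavour check is only a rough keyword heuristic, not a real parser. It uses the standard-library XML parser rather than lxml so it runs without extra installs.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace assumed for SpreadsheetML parts such as xl/connections.xml.
NS = {"m": "http://schemas.openxmlformats.org/spreadsheetml/2006/main"}
# Assumed location of the connections part inside the .xlsx zip package.
CONNECTIONS_PART = "xl/connections.xml"


def read_connections(xlsx_path):
    """Return (name, connection_string) pairs found in one workbook."""
    with zipfile.ZipFile(xlsx_path) as z:
        if CONNECTIONS_PART not in z.namelist():
            return []  # no connections part in this workbook
        root = ET.fromstring(z.read(CONNECTIONS_PART))
    found = []
    # Each <connection> may carry a <dbPr> child holding the actual string.
    for conn in root.findall("m:connection", NS):
        db_pr = conn.find("m:dbPr", NS)
        if db_pr is not None and db_pr.get("connection"):
            found.append((conn.get("name"), db_pr.get("connection")))
    return found


def classify(conn_string):
    """Rough guess at the connection string flavour (keyword heuristic only)."""
    s = conn_string.lower()
    if "dsn=" in s:
        return "ODBC (DSN)"
    if "driver=" in s:
        return "ODBC (DSN-less)"
    if "provider=" in s:
        return "OLE DB"
    return "unknown"
```

Calling read_connections on each path from list_excel_files and passing every string through classify should give a first cut of the audit list. Older binary .xls files are not zip packages, so they would need a different approach.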