I am running a Python script that takes a comma-separated list of URLs and checks each one against a file named “Properties.csv”. If a URL is in the file, the script prints some information from the matching row; if it is not, the script uses the socket library to look up the information instead.
The script works when it only searches the file and pulls the information (the if statement in the script below). When I add the else statement, it fails: even for URLs that are in the file, the if branch is skipped and I keep getting the “Unable to map” message from the except clause inside the else statement.
My input is a list of URLs that look something like this:
url0-function.domain.com,url1.com,url2.subdomain.domain.com,url3.domain.com,url3-function.domain.com,url4.domain.com,url4-function.domain.com,url5.domain.com,url5-function.domain.com
My desired output should be:
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url0-function.domain.com', 'origin_hostname': 'subdomain.function-domain.provider.net', 'origin_ip': '100.10.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url2.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'www.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url2.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'www.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
Unable to map url5.domain.com moving on...
Unable to map url5-function.domain.com moving on...
However, when the else statement is present, I get the undesired output below. The if statement is skipped entirely, even though it should be checked first and does have matches:
Unable to map url0-function.domain.com moving on...
{'url': 'url1.com', 'hostname': 'url1.com', 'ipaddr': '100.20.30.1'}
{'url': 'url1.com', 'hostname': 'url1.com', 'ipaddr': '100.20.30.1'}
{'url': 'url2.subdomain.domain.com', 'hostname': 'url2.subdomain.domain.com', 'ipaddr': '99.99.99.99'}
{'url': 'url3.domain.com', 'hostname': 'url3.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3.domain.com', 'hostname': 'url3.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
{'url': 'url3-function.domain.com', 'hostname': 'url3-function.domain.com', 'ipaddr': '100.30.30.1'}
Unable to map url4.domain.com moving on...
Unable to map url4-function.domain.com moving on...
When I remove the else statement, the file is searched as expected and the if statement works properly, but I get the undesired output below instead, because the socket lookup that I need for the other half of the script is now missing:
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url0-function.domain.com', 'origin_hostname': 'subdomain.function-domain.provider.net', 'origin_ip': '100.10.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url2.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'www.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url2.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'www.domain.com', 'origin_hostname': 'url2.sudbdomain.domain.com', 'origin_ip': '100.20.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url3.domain.com', 'origin_hostname': 'url3.sudomain-function.domain.com.provider.net', 'origin_ip': '100.40.20.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
{'property_name': 'SomeInternalConfigRef', 'property_hostname': 'url4-function.domain.com', 'origin_hostname': 'url4.subdomain-function-domain.com.provider.net', 'origin_ip': '100.50.30.1'}
Any advice on how to make this produce my desired output would be appreciated. Thank you. Here is my script:
import re
import socket
from datetime import datetime
from urllib.parse import urlparse

# Prompt for the comma-separated URLs to look up.
defined_url = input("What is the URL being queried for the mapping? DO NOT include the scheme (e.g., http://), DO NOT include a port, DO NOT include a path (e.g., /somepath/file.php), DO NOT include parameters (e.g., ?param1=foo&param2=bar), and DO NOT include an anchor (e.g., #foobarinthisdoc). Enter each URL separated by a comma and do not use any spaces: ")
defined_url_array = []
start_time = datetime.now()
defined_url_array.append(defined_url)
for defined_urls in defined_url_array:
    url_array = defined_urls.split(",")
    url_array_len = len(url_array)
    index = 0
    properties_file = r"A:\File\Path\Properties.csv"
    # Process one URL per pass of the while loop.
    while index < url_array_len:
        # Search the whole file for the current URL.
        defined_url_regex = re.compile(r"(?P<url>" + url_array[index] + r")")
        with open(properties_file, "r") as file_obj:
            for url_match in defined_url_regex.finditer(file_obj.read()):
                file_obj.seek(0)
                url = url_match.groupdict()
                for row_obj in file_obj:
                    if url["url"] in row_obj:
                        config_ref = row_obj.split(",")[1].strip()
                        hostname_ref = row_obj.split(",")[2].strip()
                        hostname = row_obj.split(",")[3].strip()
                        ipaddr = row_obj.split(",")[4].strip()
                        properties_dictionary = {"config_ref":config_ref,"hostname_ref":hostname_ref,"hostname":hostname,"ipaddr":ipaddr}
                        print(properties_dictionary)
                        break
                    else:
                        hostname_parse = urlparse(url_array[index])
                        try:
                            ipaddr = socket.gethostbyname(hostname_parse.path)
                            mapping_dictionary = {"url":url_array[index],"hostname":hostname_parse.path,"ipaddr":ipaddr}
                            print(mapping_dictionary)
                            break
                        except socket.gaierror:
                            print("Unable to map", hostname_parse.path,"moving on...")
                            break
        index += 1
    else:
        print("Script completed.")
        print("Script reviewed", url_array_len,"URLs.")
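
For reference, here is a minimal sketch of the control flow I am aiming for: search every row for the URL first, and only fall back to the socket lookup when no row matched at all. The names find_rows, map_url, and resolver are hypothetical and not part of my script. The column positions (fields 1 through 4) are taken from the indexes my script already uses. Passing the rows and the resolver as arguments is only an assumption to make the sketch easy to try without the real file or network.

```python
import socket


def find_rows(url, rows):
    """Return the stripped fields of every row that contains the URL."""
    matches = []
    for row in rows:
        if url in row:
            matches.append([field.strip() for field in row.split(",")])
    return matches


def map_url(url, rows, resolver=socket.gethostbyname):
    """Use the file data when any row matches; otherwise try a DNS lookup."""
    matches = find_rows(url, rows)
    if matches:
        # At least one row matched, so the socket lookup is never attempted.
        return [
            {
                "config_ref": fields[1],
                "hostname_ref": fields[2],
                "hostname": fields[3],
                "ipaddr": fields[4],
            }
            for fields in matches
        ]
    # Reached only after ALL rows were checked and none contained the URL.
    try:
        return [{"url": url, "hostname": url, "ipaddr": resolver(url)}]
    except socket.gaierror:
        return [f"Unable to map {url} moving on..."]
```

With something like this, the main loop would call map_url once per entry in url_array and print each item it returns. The key difference from my script is that the fallback decision is made after the whole file has been searched, not on the first row that fails to match.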