I’m working on a Python script to normalize street addresses that include corner street names. I’m using regex patterns to extract the relevant information, but I’m encountering difficulties with certain cases.
For example, I have addresses like these:
"corner hampstead road and east parkway northfield sa 5085",
"burleigh west shopping centre, corner west burleigh road & reedy creek road, burleigh west qld 4219",
"coburn central shopping centre, corner high & melton streets, melton vic 3337",
"corner armadale road & alex wood drive forrestdale wa 6112",
"corner beach road & hanna road, noarlunga centre, noarlunga sa 5168",
"corner bray street & pacific highway, coffs harbour nsw 2450",
"corner bruce hwy & geaney lane",
"corner bruce hwy & highway plaza, mackay qld 4740",
"shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads",
"corner princes fwy and hughes road",
<code>addresses = [
"corner hampstead road and east parkway northfield sa 5085",
"burleigh west shopping centre, corner west burleigh road & reedy creek road, burleigh west qld 4219",
"coburn central shopping centre, corner high & melton streets, melton vic 3337",
"corner armadale road & alex wood drive forrestdale wa 6112",
"corner beach road & hanna road, noarlunga centre, noarlunga sa 5168",
"corner bray street & pacific highway, coffs harbour nsw 2450",
"corner bruce hwy & geaney lane",
"corner bruce hwy & highway plaza, mackay qld 4740",
"shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads",
"61 sallys corner road",
"corner princes fwy and hughes road",
]
</code>
I want to normalize each address into the following format:
<code>corner hampstead road and east parkway northfield sa 5085 -> {'street_number': 'cnr', 'street_name1': 'hampstead', 'street_type1': 'road', 'street_name2': 'east', 'street_type2': 'parkway'}
burleigh west shopping centre, corner west burleigh road & reedy creek road, burleigh west qld 4219 -> {'street_number': 'cnr', 'street_name1': 'west burleigh', 'street_type1': 'road', 'street_name2': 'reedy creek', 'street_type2': 'road'}
coburn central shopping centre, corner high & melton streets, melton vic 3337 -> {'street_number': 'cnr', 'street_name1': 'high', 'street_type1': 'street', 'street_name2': 'melton', 'street_type2': 'street'}
corner armadale road & alex wood drive forrestdale wa 6112 -> {'street_number': 'cnr', 'street_name1': 'armadale', 'street_type1': 'road', 'street_name2': 'alex wood', 'street_type2': 'drive'}
corner beach road & hanna road, noarlunga centre, noarlunga sa 5168 -> {'street_number': 'cnr', 'street_name1': 'beach', 'street_type1': 'road', 'street_name2': 'hanna', 'street_type2': 'road'}
corner bray street & pacific highway, coffs harbour nsw 2450 -> {'street_number': 'cnr', 'street_name1': 'bray', 'street_type1': 'street', 'street_name2': 'pacific', 'street_type2': 'highway'}
corner bruce hwy & geaney lane -> {'street_number': 'cnr', 'street_name1': 'bruce', 'street_type1': 'hwy', 'street_name2': 'geaney', 'street_type2': 'lane'}
corner bruce hwy & highway plaza, mackay qld 4740 -> {'street_number': 'cnr', 'street_name1': 'bruce', 'street_type1': 'hwy', 'street_name2': 'highway', 'street_type2': 'plaza'}
shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads -> {'street_number': 'cnr', 'street_name1': 'broardbeach nerang', 'street_type1': 'roads', 'street_name2': 'chisholm', 'street_type2': 'roads'}
61 sallys corner road -> {'street_number': '61', 'street_name1': 'sallys', 'street_type1': 'road', 'street_name2': 'N/A', 'street_type2': 'N/A'}
corner princes fwy and hughes road -> {'street_number': 'cnr', 'street_name1': 'princes', 'street_type1': 'fwy', 'street_name2': 'hughes', 'street_type2': 'road'}
</code>
I’ve tried regex patterns such as <code>r"\b(?:cn?r|\d+-\d+|\d+)\b"</code>
to match street numbers, and <code>r'\b(?:corner|cn?r|shop|road|drive|highway|street|hwy|lane|plaza|roads|avenue|centre|center|place|square|boulevard|way|trailer park|terrace|crescent|roadhouse|crossing|wharf|court|fwy)\b'</code>
to match street types and other address keywords, but I’m not getting the desired results.
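Running these two patterns on one of the problem addresses shows why they are hard to combine: the number pattern also picks up "2" and "7" from "shop 2 7 eleven", and the keyword pattern puts "shop", "corner" and "roads" in the same bucket, so neither tells you where a street name starts or ends. A minimal sketch, assuming the backslashes shown here are what the original code contained; `NUMBER_RE` and `KEYWORD_RE` are hypothetical names:

```python
import re

# The two patterns from the question, as raw strings so that \b (word
# boundary) and \d (digit) reach the regex engine unchanged.
NUMBER_RE = re.compile(r"\b(?:cn?r|\d+-\d+|\d+)\b")
KEYWORD_RE = re.compile(
    r"\b(?:corner|cn?r|shop|road|drive|highway|street|hwy|lane|plaza|roads|avenue"
    r"|centre|center|place|square|boulevard|way|trailer park|terrace|crescent"
    r"|roadhouse|crossing|wharf|court|fwy)\b"
)

address = "shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads"

print(NUMBER_RE.findall(address))   # ['2', '7']
print(KEYWORD_RE.findall(address))  # ['shop', 'corner', 'roads']
```

Neither result contains "broardbeach nerang" or "chisholm", which is why the names have to come from a single pattern that anchors on the corner marker and the connector ("and" / "&") instead.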
Here is my current code:
<code>import re


def parse_australian_address(address):
    # Normalize common terms for corner addresses
    address = re.sub(r"\b[Cc]orner\b", "cnr", address)

    # Regex pattern to match the required address components
    pattern = re.compile(
        r"""
        (?:.*?\bcnr\b\s)?
        (?P<street_number>cnr|\d+)\s+
        (?P<street_name1>(?:\b\w+\b\s?)+?)\s+
        (?P<street_type1>road|street|avenue|drive|lane|highway|hwy|fwy|parkway|roadways|streets|plaza)\s+
        (?:and|&|near|at)\s+
        (?P<street_name2>(?:\b\w+\b\s?)+?)\s+
        (?P<street_type2>road|street|avenue|drive|lane|highway|hwy|fwy|parkway|roadways|streets|plaza)
        """,
        re.VERBOSE | re.IGNORECASE,
    )
    match = pattern.search(address)

    if not match:
        # If no match found, attempt to parse without the second street
        single_street_pattern = re.compile(
            r"""
            (?P<street_number>\d+)\s+
            (?P<street_name1>(?:\b\w+\b\s?)+?)\s+
            (?P<street_type1>road|street|avenue|drive|lane|highway|hwy|fwy|parkway|roadways|streets|plaza)
            """,
            re.VERBOSE | re.IGNORECASE,
        )
        single_match = single_street_pattern.search(address)
        if not single_match:
            return None
        result = {
            "street_number": single_match.group("street_number"),
            "street_name1": single_match.group("street_name1").strip(),
            "street_type1": single_match.group("street_type1").strip(),
            "street_name2": "N/A",
            "street_type2": "N/A",
        }
        return result

    # Extract matched components
    result = {
        "street_number": match.group("street_number").lower(),
        "street_name1": match.group("street_name1").strip(),
        "street_type1": match.group("street_type1").strip(),
        "street_name2": match.group("street_name2").strip(),
        "street_type2": match.group("street_type2").strip(),
    }
    return result


addresses = [
    "corner hampstead road and east parkway northfield sa 5085",
    "burleigh west shopping centre, corner west burleigh road & reedy creek road, burleigh west qld 4219",
    "coburn central shopping centre, corner high & melton streets, melton vic 3337",
    "corner armadale road & alex wood drive forrestdale wa 6112",
    "corner beach road & hanna road, noarlunga centre, noarlunga sa 5168",
    "corner bray street & pacific highway, coffs harbour nsw 2450",
    "corner bruce hwy & geaney lane",
    "corner bruce hwy & highway plaza, mackay qld 4740",
    "shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads",
    "61 sallys corner road",
    "corner princes fwy and hughes road",
]

for address in addresses:
    print(parse_australian_address(address))
</code>
The current code handles most of the addresses, except these two:
<code>"coburn central shopping centre, corner high & melton streets, melton vic 3337","shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads"
</code>
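Both failing addresses have the same shape: the first street has no type of its own, because a single plural type after the second name ("high & melton streets", "broardbeach nerang & chisholm roads") applies to both streets. The current pattern requires `street_type1` before the connector, so it cannot match them. Below is a minimal sketch of one way to handle this, not a definitive parser. It assumes that corner addresses always contain "corner" or "cnr" followed later by "and" or "&", and that the street-type list shown is sufficient; `STREET_TYPES`, `CORNER_RE`, `SINGLE_RE` and `parse_address` are hypothetical names, not part of the code above.

```python
import re

# Hypothetical list of accepted street types (an assumption; extend as needed).
# Plural forms come first so "roads" is tried before "road".
STREET_TYPES = r"roads|road|streets|street|avenue|drive|lane|highway|hwy|fwy|parkway|plaza"

# Corner addresses: the first street type is optional, so one type shared
# by both streets ("corner high & melton streets") still matches.
CORNER_RE = re.compile(
    rf"""
    \b(?:corner|cnr)\s+
    (?P<street_name1>[a-z]+(?:\s[a-z]+)*?)      # shortest run of words
    (?:\s+(?P<street_type1>{STREET_TYPES})\b)?  # optional first type
    \s+(?:and|&)\s+
    (?P<street_name2>[a-z]+(?:\s[a-z]+)*?)
    \s+(?P<street_type2>{STREET_TYPES})\b
    """,
    re.VERBOSE | re.IGNORECASE,
)

# Numbered addresses with a single street, e.g. "61 sallys corner road".
SINGLE_RE = re.compile(
    rf"""
    \b(?P<street_number>\d+)\s+
    (?P<street_name1>[a-z]+(?:\s[a-z]+)*?)
    \s+(?P<street_type1>{STREET_TYPES})\b
    """,
    re.VERBOSE | re.IGNORECASE,
)


def parse_address(address):
    """Return a dict in the format above, or None if nothing matches."""
    m = CORNER_RE.search(address)
    if m:
        type2 = m.group("street_type2").lower()
        return {
            "street_number": "cnr",
            "street_name1": m.group("street_name1").lower(),
            # No type of its own: borrow the shared type from the second street.
            "street_type1": (m.group("street_type1") or type2).lower(),
            "street_name2": m.group("street_name2").lower(),
            "street_type2": type2,
        }
    m = SINGLE_RE.search(address)
    if m:
        return {
            "street_number": m.group("street_number"),
            "street_name1": m.group("street_name1").lower(),
            "street_type1": m.group("street_type1").lower(),
            "street_name2": "N/A",
            "street_type2": "N/A",
        }
    return None


for address in [
    "coburn central shopping centre, corner high & melton streets, melton vic 3337",
    "shop 2 7 eleven service station. corner broardbeach nerang & chisholm roads",
]:
    print(address, "->", parse_address(address))
```

The shared type is copied verbatim, so "high & melton streets" yields "streets" for both streets rather than the singular "street" in the expected output; a small singularizing map could be added if that is the intended rule. Because the sketch matches "corner" directly instead of rewriting it to "cnr" first, "61 sallys corner road" is handled by the single-street pattern and returns "sallys corner" as the name, treating "corner" as part of that street's name.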
Can someone please help me with the correct regex pattern and logic to extract the street names and types in this format?
Thank you!