I am trying to format fake data from the Faker API, and I am using pandas to store it in a data frame. I have figured out how to strip and replace errors such as missing or improper spacing, parentheses, country codes, and extensions. The main issue is that some numbers come out like xxxxxx-xxxx instead of xxx-xxx-xxxx.
This is my code:
from faker import Faker
import pandas as pd

fake = Faker()
name = []
address = []
number = []
ssn = []
for i in range(100):
    name.append(fake.name())
    address.append(fake.address().replace('\n', " "))
    ssn.append(fake.ssn())
    phone_number = fake.phone_number().split('x')[0].replace('.', "-").replace("(", '').replace(')', '').replace(' ', '')
    # How to deal with the opposite of ^
    if phone_number.startswith('+1'):
        phone_number = phone_number[2:]
    if phone_number.startswith('1'):
        phone_number = phone_number[1:]
    if phone_number.startswith('001'):
        phone_number = phone_number[3:]
    phone_number = phone_number.lstrip('-')
    if phone_number.isdigit() and len(phone_number) == 12:  # 10 or 12 work here for some reason, but I get diff. formats
        phone_number = f"{phone_number[:3]}-{phone_number[3:6]}-{phone_number[6:]}"
    else:
        phone_number = phone_number
    number.append(phone_number)

df = pd.DataFrame({'Name': name, 'Address': address, 'Phone Number': number, 'SSN': ssn})
df
Not too sure where my issue is.
I initially thought it was the length or if it was being read as an int instead of a string, but when I do this:
if phone_number.isdigit() and len(phone_number) == 12:
    phone_number = f"{phone_number[:3]}-{phone_number[3:6]}-{phone_number[6:]}"
elif len(phone_number) != 12:
    phone_number = f"{phone_number[:3]}-{phone_number[3:6]}{phone_number[6:]}"
else:
    phone_number = phone_number
it formats it like xxx-xxxxxxx.
If I do it this way:
if phone_number.isdigit() and len(phone_number) == 10:
    phone_number = f"{phone_number[:3]}-{phone_number[3:6]}-{phone_number[6:]}"
elif len(phone_number) != 10:
    phone_number = f"{phone_number[:3]}-{phone_number[3:6]}{phone_number[6:]}"
else:
    phone_number = phone_number
then the numbers are formatted like xxx--xxx-xxxx.
And if I remove the elif:
if phone_number.isdigit() and len(phone_number) == 10:
    phone_number = f"{phone_number[:3]}-{phone_number[3:6]}-{phone_number[6:]}"
else:
    phone_number = phone_number
then I get this output: xxxxxx-xxxx.
I am pretty much stuck and not sure what to do.
In this code:
if phone_number.startswith('+1'):
    phone_number = phone_number[2:]
if phone_number.startswith('1'):
    phone_number = phone_number[1:]
if phone_number.startswith('001'):
    phone_number = phone_number[3:]
you’re applying each test to the result of the previous assignment of phone_number. So if the phone number starts with +11, the first if block will remove +1, then the second test will succeed and remove the 1.
You should use elif so you’re only testing the original number, not making incremental checks.
if phone_number.startswith('+1'):
    phone_number = phone_number[2:]
elif phone_number.startswith('1'):
    phone_number = phone_number[1:]
elif phone_number.startswith('001'):
    phone_number = phone_number[3:]
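To make the difference concrete, here is a small side-by-side sketch. The function names and the sample number are made up for illustration; only the if/elif logic comes from the code above.

```python
def strip_prefix_if_chain(phone_number):
    # Independent ifs: each test sees the result of the previous assignment.
    if phone_number.startswith('+1'):
        phone_number = phone_number[2:]
    if phone_number.startswith('1'):
        phone_number = phone_number[1:]
    if phone_number.startswith('001'):
        phone_number = phone_number[3:]
    return phone_number


def strip_prefix_elif_chain(phone_number):
    # elif chain: at most one branch runs, so only one prefix is removed.
    if phone_number.startswith('+1'):
        phone_number = phone_number[2:]
    elif phone_number.startswith('1'):
        phone_number = phone_number[1:]
    elif phone_number.startswith('001'):
        phone_number = phone_number[3:]
    return phone_number


sample = '+11234567890'  # made-up number whose first digit after +1 is also 1
print(strip_prefix_if_chain(sample))    # 234567890  (one digit too many removed)
print(strip_prefix_elif_chain(sample))  # 1234567890
```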
Another option is to use a regular expression that removes any of these prefixes:
import re
phone_number = re.sub(r'^(\+1|1|001)', '', phone_number)
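A slightly fuller sketch of the regular-expression approach follows. Note that + is a quantifier in a regular expression, so a literal plus sign has to be escaped as \+; without the backslash, re raises an error. The helper name strip_country_code and the sample numbers are hypothetical, and the trailing lstrip('-') simply mirrors the step already present in the question's code.

```python
import re

# Matches one leading country-code prefix: +1, 001, or a bare 1.
PREFIX = re.compile(r'^(?:\+1|001|1)')


def strip_country_code(phone_number):
    # The ^ anchor means at most one prefix is removed, like the elif chain.
    stripped = PREFIX.sub('', phone_number)
    # Drop the separator that may be left behind after the prefix.
    return stripped.lstrip('-')


print(strip_country_code('+1-234-567-8901'))   # 234-567-8901
print(strip_country_code('001-234-567-8901'))  # 234-567-8901
print(strip_country_code('234-567-8901'))      # 234-567-8901
print(strip_country_code('+11234567890'))      # 1234567890
```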
I haven’t looked through the rest of your code; there may be more problems.
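One further thing that looks worth checking, offered as a guess rather than a confirmed diagnosis: str.isdigit() returns False for any string that still contains a hyphen, so a value like xxxxxx-xxxx never reaches the formatting branch. A possible way around this is to keep only the digits first and then rebuild the number. The sketch below assumes ten-digit North American numbers, which is what the target format xxx-xxx-xxxx suggests; format_us_number is a hypothetical helper, not part of Faker or pandas.

```python
import re


def format_us_number(raw):
    # Drop an extension such as "x123", then keep only the digits.
    digits = re.sub(r'\D', '', raw.split('x')[0])
    # Remove a leading country code only when the length confirms it is one.
    if len(digits) == 13 and digits.startswith('001'):
        digits = digits[3:]
    elif len(digits) == 11 and digits.startswith('1'):
        digits = digits[1:]
    # Anything that is not ten digits is returned unchanged (assumption).
    if len(digits) != 10:
        return raw
    return f"{digits[:3]}-{digits[3:6]}-{digits[6:]}"


print(format_us_number('(123)456-7890'))        # 123-456-7890
print(format_us_number('+1-123-456-7890x123'))  # 123-456-7890
print(format_us_number('001-123-456-7890'))     # 123-456-7890
print(format_us_number('123.456.7890'))         # 123-456-7890
```

Checking the digit count before stripping a prefix avoids removing a leading 1 that is actually part of the ten-digit number.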