Let’s say I have a dataframe like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(
    [[1, np.nan, 2],
     [2, 3, 'I'],
     [3, 4, 'II']],
    columns=['A', 'B', 'C'])
df
How do I identify all the strings, excluding null values, and, if a string is a Roman numeral, convert it to the corresponding integer?
I tried this to locate the coordinates of the strings, but it includes the null values.
res = np.argwhere(df_pre.values.astype('str'))
Mixed dtypes will result in an 'object' column, so the first step is to filter on that; then you can just apply a function that checks the type of each value.
I’m going to ignore the requirement to convert Roman numerals to int; that’s covered elsewhere, e.g. Converting Roman Numerals to integers in python. In place of a function that does that, I’ll use len, which coincidentally gets the right result.
df1 = df.select_dtypes('object').applymap(
    lambda x: len(x) if isinstance(x, str) else x)
df1
C
0 2
1 1
2 2
You can then insert the columns back, e.g.
df.assign(**df1)
A B C
0 1 NaN 2
1 2 3.0 1
2 3 4.0 2
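As a minimal sketch of the same approach with a real converter in place of len: the helper roman_to_int below is hypothetical (it is not part of pandas) and assumes well-formed, upper-case numerals. The per-column Series.map is used here only so the sketch does not depend on a particular pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical helper, not part of pandas: assumes well-formed upper-case numerals.
ROMAN_VALUES = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}

def roman_to_int(s):
    total = 0
    for i, ch in enumerate(s):
        v = ROMAN_VALUES[ch]
        # A smaller symbol before a larger one is subtracted (e.g. IV = 4).
        if i + 1 < len(s) and v < ROMAN_VALUES[s[i + 1]]:
            total -= v
        else:
            total += v
    return total

df = pd.DataFrame(
    [[1, np.nan, 2],
     [2, 3, 'I'],
     [3, 4, 'II']],
    columns=['A', 'B', 'C'])

# Only object columns can hold strings, so restrict the work to those.
obj_cols = df.select_dtypes('object')
converted = obj_cols.apply(
    lambda col: col.map(lambda x: roman_to_int(x) if isinstance(x, str) else x))

# Put the converted columns back alongside the untouched ones.
result = df.assign(**converted)
print(result)
```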
You could map type onto the values and compare to str:
df.map(type) == str
Output:
A B C
0 False False False
1 False False True
2 False False True
To get the integer indexes:
np.argwhere(df.map(type) == str)
Output:
array([[1, 2],
[2, 2]], dtype=int64)
As for converting those strings to numbers, there are many good solutions here.
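For contrast with the attempt in the question, here is a small sketch of why casting to str first picks up the null values: after the cast, NaN is just the string 'nan'. Checking the type of each original value avoids that. The mask is built with a per-column Series.map and isinstance, which should be equivalent to the comparison above for this data and does not rely on DataFrame.map (added in pandas 2.1).

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[1, np.nan, 2],
     [2, 3, 'I'],
     [3, 4, 'II']],
    columns=['A', 'B', 'C'])

# Casting first discards the original types: NaN becomes the string 'nan'.
as_str = df.values.astype('str')
print(as_str[0, 1])  # nan

# Checking the type of each original value keeps NaN (a float) out of the mask.
is_str = df.apply(lambda col: col.map(lambda x: isinstance(x, str)))
coords = np.argwhere(is_str.to_numpy())
print(coords.tolist())  # [[1, 2], [2, 2]]
```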
I would use map together with the roman library.
The code is simple:
import roman
converted_df = df.map(lambda x: roman.fromRoman(x) if type(x) == str else x)
The lambda converts a cell only if the value in it is a str; every other value is returned unchanged. The function that does the conversion is fromRoman.
You can install roman with pip or your favorite tool.
I solved it by doing this:
First step: create a function that takes a Roman numeral as a string and returns the corresponding integer.
Second step: apply romanToInt to every value in the dataframe, leaving non-string values unchanged.
# Define the function to convert Roman numerals to integers
def romanToInt(s):
    values = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
    result = 0
    for i in range(len(s)):
        # A smaller symbol before a larger one is subtracted (e.g. IV = 4)
        if i + 1 < len(s) and values[s[i]] < values[s[i + 1]]:
            result -= values[s[i]]
        else:
            result += values[s[i]]
    return result

# Apply a function element-wise to the entire DataFrame
def convert_value(value):
    if pd.notnull(value) and isinstance(value, str):
        try:
            # Try to make the string an integer
            return romanToInt(value)
        except KeyError:
            # Not a Roman numeral: fall through and return the value unchanged
            pass
    return value

df = df.map(convert_value)
print(df)
A B C
0 1 NaN 2
1 2 3.0 1
2 3 4.0 2
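One caveat with the KeyError check: it only rejects strings containing characters outside IVXLCDM, so a malformed string such as 'IIII' would still be converted. If that matters, a stricter test could be run first. The sketch below uses a hypothetical helper is_roman with a commonly used pattern for numerals from 1 to 3999 in upper case; treat it as an assumption about what should count as valid, not a definitive rule.

```python
import re

# Thousands, hundreds, tens, units in standard subtractive notation (1..3999).
ROMAN_RE = re.compile(
    r'M{0,3}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})')

def is_roman(s):
    # The pattern alone also matches the empty string, so exclude it explicitly.
    return bool(s) and ROMAN_RE.fullmatch(s) is not None

print(is_roman('XIV'))   # True
print(is_roman('IIII'))  # False
print(is_roman('abc'))   # False
print(is_roman(''))      # False
```

With this in place, convert_value could call romanToInt only when is_roman(value) is true and return the value unchanged otherwise.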