I am trying to build a fraud detection model on an imbalanced dataset, with fraud cases far outnumbering non-fraud cases. The dataset is as follows:
RangeIndex: 628343 entries, 0 to 628342
Data columns (total 8 columns):
 #   Column       Non-Null Count   Dtype
---  ------       --------------   -----
 0   Time         628343 non-null  object
 1   Card Number  628343 non-null  float64
 2   merchant     628343 non-null  object
 3   category     628343 non-null  object
 4   Amount       628343 non-null  float64
 5   firstName    628343 non-null  object
 6   lastName     628343 non-null  object
 7   is_fraud     628342 non-null  float64
I first tried to preprocess the data by dropping the merchant, name, category, trans_num, and card number fields, and by converting Time from the format 1/1/2019 0:00 into hour, second, year, month, and day columns and then dropping the original Time column. I then trained a logistic regression model, but the accuracy is low. Any ideas on how to preprocess the data and which algorithm to use?
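For reference, here is a minimal sketch of how this kind of pipeline could look in pandas and scikit-learn. It is not the exact code I ran. Several things in it are assumptions: the helper names (make_synthetic_transactions, add_time_features, fit_weighted_logreg) are hypothetical, the data is synthetic and only reuses the Time, Amount, and is_fraud column names, the row with a missing is_fraud label is simply dropped, and class_weight="balanced" plus average precision are swapped in because plain accuracy can be misleading when one class dominates.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score, classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def make_synthetic_transactions(n_rows=2000, seed=0):
    """Build a made-up frame that only mimics the real column names."""
    rng = np.random.default_rng(seed)
    minutes = rng.integers(0, 60 * 24 * 365, size=n_rows)
    times = pd.Timestamp("2019-01-01") + pd.to_timedelta(minutes, unit="min")
    amount = rng.gamma(shape=2.0, scale=40.0, size=n_rows)
    # Assumed signal for the demo: night-time, high-amount rows are riskier.
    night = np.asarray(times.hour < 6, dtype=float)
    logit = -4.0 + 2.0 * night + 0.01 * amount
    is_fraud = (rng.random(n_rows) < 1.0 / (1.0 + np.exp(-logit))).astype(float)
    return pd.DataFrame({
        "Time": times.strftime("%m/%d/%Y %H:%M"),
        "Amount": amount,
        "is_fraud": is_fraud,
    })


def add_time_features(df, time_col="Time"):
    """Parse strings like '1/1/2019 0:00' and replace them with numeric parts."""
    out = df.copy()
    ts = pd.to_datetime(out[time_col], format="%m/%d/%Y %H:%M")
    out["hour"] = ts.dt.hour
    out["day_of_week"] = ts.dt.dayofweek  # Monday=0 ... Sunday=6
    out["month"] = ts.dt.month
    return out.drop(columns=[time_col])


def fit_weighted_logreg(df, label_col="is_fraud", seed=0):
    """Fit a class-weighted logistic regression on Time-derived features and Amount."""
    df = df.dropna(subset=[label_col])  # drop rows with a missing label
    X = add_time_features(df[["Time", "Amount"]])
    y = df[label_col].astype(int)
    # Stratify so both splits keep the same class ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = make_pipeline(
        StandardScaler(),
        LogisticRegression(class_weight="balanced", max_iter=1000),
    )
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    return model, y_test, scores


if __name__ == "__main__":
    data = make_synthetic_transactions()
    model, y_test, scores = fit_weighted_logreg(data)
    print("average precision:", average_precision_score(y_test, scores))
    print(classification_report(y_test, (scores >= 0.5).astype(int)))
```

On the real data, the same functions would be called on the loaded DataFrame instead of the synthetic one. The per-class precision and recall in the report are probably more informative than a single accuracy number here, whichever class is the minority.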