How to swap the columns according to the data type?

The sample data is as below (The data is fake, not real data):

Key	Death indicator	Date Death	Exact date of death	Death Cause
00	Alive
02	Death hos	Y	25/9/2011	N00
03	Alive
09	Death hos	Y	J189	28/8/2015
07	Death nonhos	12/6/2018	Y	C20

From the table, you can see the types of data within the same columns are not consistent. Date Death should be in date format; Exact date of death should only contain Y, N or blank; Death Cause should be in string (i.e. ICD code).

I forgot to mention one important thing, the date format may not be consistent, e.g. ’01-05-2010′,’01 May 2010′ can also appear in the date columns.
I tried to perform some basic data cleaning:

Python:

import pandas as pd

death_y_n = death['Date Death'][pd.to_datetime(death['Date Death'], 
                                                                 format='%d/%m/%Y',
                                                                 errors = 'coerce')
                                                                 .isnull()]

death_disease_case = death['Exact date of death'][~((death['Exact date of death'].isin(['Y','N']))
                                                      |(death['Exact date of death'].isnull()))]

death['Death Cause'][~pd.to_datetime(
                                      death['Death Cause'], 
                                      format='%d/%m/%Y', errors = 'coerce')
                                      .isnull()] = 
                                      death_disease_case

death['Date Death'][pd.to_datetime(
                                      death['Date Death'], 
                                      format='%d/%m/%Y', errors = 'coerce')
                                      .isnull()] = 
                                      death_to_date[pd.to_datetime(
                                                                   death['Date Death'], 
                                                                   format='%d/%m/%Y', errors = 'coerce')
                                                                   .isnull()]

death['Exact date of death'][~death['Exact date of death'].isin(['Y','N'])] = 
                                  death_y_n[~death['Exact date of death'].isin(['Y','N'])]

death['Death Cause'][pd.to_datetime(
                                      death['Date Death'], 
                                      format='%d/%m/%Y', errors = 'coerce')
                                      .isnull()] = 
                                       death_y_n[pd.to_datetime(
                                      death['Date Death'], 
                                      format='mixed', errors = 'coerce')
                                      .isnull()]

library(tidyverse)
library(magrittr)
library(anytime)
library(Hmisc)

death_to_date = anytime(death$`Death Cause`) %>% as.character

death_y_n = death$`Date Death`[is.na(as_date(death$`Date Death`))]

death_disease_case = death$`Exact date of death`[death$`Exact date of death` %nin% c('Y','N')]

death$`Death Cause`[!is.na(as_date(death$`Death Cause` ))] = death_disease_case[!is.na(as_date(death$`Death Cause` ))]

death$`Date of Registered Death`[is.na(as_date(death$`Date Death`))] = death_to_date[is.na(as_date(death$`Date Death`))]

death$`Exact date of death`[death$`Exact date of death` %nin% c('Y','N')] = death_y_n[death$`Exact date of death` %nin% c('Y','N')]

However, due to multiple formats of date, some date formats cannot be parsed successfully. Is there a method to swap the columns without using to_datetime()/anytime()?

I am new to Python, if there are any mistakes I made, please point them out! Thank you.

Updated:

My python solution:

import pandas as pd

#for death date variable save as exact date of death:'Y'/'N'
death_to_date_index_exact = (death['Date Death'].isin(['Y','N']))
death_to_date_exact = death['Date Death'][death_to_date_index_exact]

#for death cause variable save as date of death
death_cause_index_date = (~death['Death Cause'].str.contains('^[A-Za-z].*[0-9]$',na=True))
death_cause_date = death['Death Cause'][death_cause_index_date]

#for exact date of death variable save as death cause
death_exact_index_cause = (death['Exact date of death'].str.contains('^[A-Za-z].*[0-9]$',na=False))
death_exact_cause = death['Exact date of death'][death_exact_index_cause]

death['Date Death'][death_cause_index_date] = death_cause_date
death['Exact date of death'][death_to_date_index_exact] = death_to_date_exact
death['Death Cause'][death_exact_index_cause] = death_exact_cause

#Convert the date in death cause into empty
death['Death Cause'][~death['Death Cause'].str.contains('^[A-Za-z].*[0-9]$',na=True)] = np.nan

There’s a few different ways you can do this. The simplest way to me, which doesn’t require creating a new vector three times, is to create a named list in R, then unnest it wider, as follows:

library(tidyverse)
  
df |>
  mutate(sorted = pmap(list(Date.Death, Exact.date.of.death, Death.Cause), (...) {out <- list()
  for (x in c(...)) { # for each row of the three columns above
    if (x %in% c("Y", "N", NA)) { # we do this one first, otherwise the NAs throw things off
      out$Exact.date.of.death <- x 
    } else if (grepl("/", x)) {
      out$Date.Death <- x
    } else {
      out$Death.Cause <- x
    }
  }
  out
  }), .keep = "unused") |> # drop the columns we used (i.e. Date.Death, Exact.date.of.death, Death.Cause)
  unnest_wider(sorted) # then unnest the named list column, making new versions of them!

Output:

# A tibble: 5 × 5
  Key   Death.indicator Exact.date.of.death Date.Death Death.Cause
  <chr> <chr>           <chr>               <chr>      <chr>      
1 Alive NA              NA                  NA         NA         
2 Death hos             Y                   25/9/2011  N00        
3 Alive NA              NA                  NA         NA         
4 Death hos             Y                   28/8/2015  J189       
5 Death nonhos          Y                   12/6/2018  C20

Data:

df <- read.table(text="Key  Death.indicator     Date.Death  Exact.date.of.death     Death.Cause
00  Alive NA NA NA NA
02  Death hos   Y   25/9/2011   N00
03  Alive NA NA NA NA       
09  Death hos   Y   J189    28/8/2015
07  Death nonhos    12/6/2018   Y   C20", header=TRUE)

Notes:

You’ve said you’ve had trouble parsing your dates. It would be helpful if you could give some examples of the dates you are struggling to parse.
Your data still needs some cleaning – for instance, "Y" and "N" should be TRUE and FALSE. Also if the data is death data, then it would make sense to remove the Alive ones, they add nothing. Also, blanks should NA.

Dates are tricky, but you might want to check out the lubridate package. What errors does the R code throw? If you’re getting stopped by errors I would suggest starting the dates out as a character column, getting all the data in the same column, then parse the NA values. Once you have everything in the correct format, then coerce the column to date format. Since the character column will accept your changes, you’ll be able to do whatever cleaning you need to do. The problem with coercing to date format on the first step is that often these generic conversion functions like as.date() are very picky. If any one value is in the wrong format, the whole thing will fail and throw an error. So if you’re getting caught up in that, try doing that conversion with anytime in the last step and see if you don’t have better results.

Trang chủ Giới thiệu Sinh nhật bé trai Sinh nhật bé gái Tổ chức sự kiện Biểu diễn giải trí Dịch vụ khác Trang trí tiệc cưới Tổ chức khai trương Tư vấn dịch vụ Thư viện ảnh Tin tức - sự kiện Liên hệ Chú hề sinh nhật Trang trí YEAR END PARTY công ty Trang trí tất niên cuối năm Trang trí tất niên xu hướng mới nhất Trang trí sinh nhật bé trai Hải Đăng Trang trí sinh nhật bé Khánh Vân Trang trí sinh nhật Bích Ngân Trang trí sinh nhật bé Thanh Trang Thuê ông già Noel phát quà Biểu diễn xiếc khỉ Xiếc quay đĩa Dịch vụ tổ chức sự kiện 5 sao Thông tin về chúng tôi Dịch vụ sinh nhật bé trai Dịch vụ sinh nhật bé gái Sự kiện trọn gói Các tiết mục giải trí Dịch vụ bổ trợ Tiệc cưới sang trọng Dịch vụ khai trương Tư vấn tổ chức sự kiện Hình ảnh sự kiện Cập nhật tin tức Liên hệ ngay Thuê chú hề chuyên nghiệp Tiệc tất niên cho công ty Trang trí tiệc cuối năm Tiệc tất niên độc đáo Sinh nhật bé Hải Đăng Sinh nhật đáng yêu bé Khánh Vân Sinh nhật sang trọng Bích Ngân Tiệc sinh nhật bé Thanh Trang Dịch vụ ông già Noel Xiếc thú vui nhộn Biểu diễn xiếc quay đĩa Dịch vụ tổ chức tiệc uy tín Khám phá dịch vụ của chúng tôi Tiệc sinh nhật cho bé trai Trang trí tiệc cho bé gái Gói sự kiện chuyên nghiệp Chương trình giải trí hấp dẫn Dịch vụ hỗ trợ sự kiện Trang trí tiệc cưới đẹp Khởi đầu thành công với khai trương Chuyên gia tư vấn sự kiện Xem ảnh các sự kiện đẹp Tin mới về sự kiện Kết nối với đội ngũ chuyên gia Chú hề vui nhộn cho tiệc sinh nhật Ý tưởng tiệc cuối năm Tất niên độc đáo Trang trí tiệc hiện đại Tổ chức sinh nhật cho Hải Đăng Sinh nhật độc quyền Khánh Vân Phong cách tiệc Bích Ngân Trang trí tiệc bé Thanh Trang Thuê dịch vụ ông già Noel chuyên nghiệp Xem xiếc khỉ đặc sắc Xiếc quay đĩa thú vị

Filed under: Kiến thức lập trình - @ 09:20

Thẻ: pythonrpandasdataframedplyr

Thiết kế website giá rẻ

Danh mục

How to swap the columns according to the data type?