My dataframe looks something like this, only much larger.
d = {'Col_1' : pd.Series(['A', 'B']),
'Col_2' : pd.Series(['B', 'A', 'C']),
'Col_3' : pd.Series(['B', 'A']),
'Col_4' : pd.Series(['C', 'A', 'B', 'D']),
'Col_5' : pd.Series(['A', 'C']),}
df = pd.DataFrame(d)
Col_1 Col_2 Col_3 Col_4 Col_5
A B B C A
B A A A C
NaN C NaN B NaN
NaN NaN NaN D NaN
First, I’m trying to sort each column individually. I’ve tried playing around with something like: df.sort([lambda x: x in df.columns], axis=1, ascending=True, inplace=True)
however have only ended up with errors. How do I sort each column individually to end up with something like:
Col_1 Col_2 Col_3 Col_4 Col_5
A A A A A
B B B B C
NaN C NaN C NaN
NaN NaN NaN D NaN
Second, I’m looking to concatenate the rows within the columns
df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Concatenation']).T])
I can combine everything with the line above after replacing np.nan with ”, but the result comes out smashed (‘AB’) together and would require an additional step to clean (into something like ‘A:B’).
pandas.Series.order
is deprecated since pandas=0.17. Instead, use sort_values
as follows:
for col in df:
df[col] = df[col].sort_values(ignore_index=True)
Here is one way:
>>> pandas.concat([df[col].order().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
11: 0 1 2 3 4
0 A A A A A
1 B B B B C
2 NaN C NaN C NaN
3 NaN NaN NaN D NaN
[4 rows x 5 columns]
However, what you’re doing is somewhat strange. DataFrames aren’t just collections of unrelated columns. In a DataFrame, each row represents a record, so the value in one column is semantically linked to the values in other columns in that same row. By sorting the columns independently, you’re discarding this information, so the rows are now meaningless. That’s why the reset_index
is needed in my example. Also, because of this, there’s no way to do this in-place, which your example suggests you want.
2
If your data were numeric you could use Numpy’s sort function:
df[:] = np.sort(df.values)
But this fails in your case because it can’t handle a mixture of floats and strings.
Here’s one workaround:
df[:] = np.sort(df.fillna('xffxffxff').values)
df = df.replace('xffxffxff', np.nan)
I just replace the NaNs with a string of characters with ASCII value 255 so that they will almost certainly be at the end after the sort. Then I replace them with NaNs.
Output:
Col_1 Col_2 Col_3 Col_4 Col_5
0 A A B B C
1 A A A B C
2 B C NaN NaN NaN
3 D NaN NaN NaN NaN
Another solution would be:
df.apply(lambda x: x.sort_values().reset_index(drop=True), axis=0)
(Thanks @BrenBarn for the remark on reset_index
.)
Using @Andy Hayden’s comment about values
instead of reset_index
leads to
df.apply(lambda x: x.sort_values().values, axis=0)
I don’t know if this is any better, but here are a couple other ways to do it.
pd.DataFrame({key: sorted(value.values(), reverse=True)
for key, value in df.to_dict().iteritems()})
pd.DataFrame({key: sorted(values, reverse=True)
for key, values in df.transpose().iterrows()})