I am trying to sort a list of file names so that it matches the order of a specific column in an Excel sheet. The column contains all of the file names, but in a different order than File Explorer shows them. The files are images, and they need to be in the same order as the column so that each row of data in the Excel sheet lines up with the picture that is displayed. However, Python seems to place only the first 10 files correctly and then rearranges the last 20 or so file names, seemingly at random.
As of right now my code is as follows:
import pandas as pd
import os
image_dir = r"C:\Users\username\PycharmProjects\Project\IMAGES"
file_list = os.listdir(image_dir)
imgarr = []
images = [i for i in (os.listdir(image_dir)) if '.png' in i]
for filename in os.listdir(image_dir):
    if filename.endswith(".jpg") or filename.endswith(".png"):
        imgarr.append(os.path.join(image_dir, filename))
sorted_list = [i for i, i in sorted(zip(df["DESCRIPTION"], file_list))]
This gives me the following output when I print(sorted_list):
[...'10973406 ^ MRE001.png', '10973401 ^ MRE001.png', '10973402 ^ MRE001.png', '10973403 ^ MRE001.png', '10973429 ^ MRE000.png', '10973427 ^ MRE000.png', '10973405 ^ MRE000.png'...]
However, when printing df["DESCRIPTION"], which is the order the list needs to end up in, I get this:
...
10 10973402 ^ MRE001
11 10973403 ^ MRE001
12 10973404 ^ MRE000
13 10973409 ^ MRE001
14 10973408 ^ MRE001
15 10973401 ^ MRE001
16 10973410 ^ MRE001
17 10973418 ^ MRE001
...
Disclaimer: I have only shown the 10th to 17th items of both lists, since there are over 30 file names and including all of them would make this post very long.
Is there anything that can make this sort correctly?
I think it is better to invert the problem and build the list of file names directly from the dataframe, which is already in the order you want.
list_of_files = df["DESCRIPTION"]
list_of_files = [i + ".png" for i in list_of_files]
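To expand on why the original attempt goes wrong: os.listdir returns entries in arbitrary order, so zip(df["DESCRIPTION"], file_list) pairs each description with whatever file happens to sit at the same position, and sorted(...) then orders those pairs alphabetically by description. Neither step matches a file to its own row. Below is a minimal sketch of the dataframe-first approach that also reports descriptions with no file on disk. The helper name ordered_image_paths, the ext parameter, and the sample names in the demo are my own assumptions for illustration. The demo uses a plain list in place of df["DESCRIPTION"] so that it runs on its own; passing the real column should work the same way, since the function only iterates over it.

```python
import os
import tempfile


def ordered_image_paths(descriptions, image_dir, ext=".png"):
    """Return (paths, missing).

    paths   -- full image paths, in the same order as `descriptions`
    missing -- descriptions for which no "<description><ext>" file exists
    """
    # One directory listing, kept as a set for fast membership checks.
    on_disk = set(os.listdir(image_dir))
    paths, missing = [], []
    for name in descriptions:
        filename = str(name).strip() + ext
        if filename in on_disk:
            paths.append(os.path.join(image_dir, filename))
        else:
            missing.append(name)
    return paths, missing


# Demonstration with a throwaway folder and made-up data (hypothetical names).
with tempfile.TemporaryDirectory() as demo_dir:
    for stem in ["10973401 ^ MRE001", "10973402 ^ MRE001", "10973403 ^ MRE001"]:
        open(os.path.join(demo_dir, stem + ".png"), "w").close()

    # Stand-in for df["DESCRIPTION"]; the last entry has no image on disk.
    descriptions = [
        "10973402 ^ MRE001",
        "10973403 ^ MRE001",
        "10973401 ^ MRE001",
        "10973404 ^ MRE000",
    ]

    paths, missing = ordered_image_paths(descriptions, demo_dir)
    demo_names = [os.path.basename(p) for p in paths]
    print(demo_names)  # file names in the order of the descriptions
    print(missing)     # descriptions without a matching file
```

With the real data this would be called as ordered_image_paths(df["DESCRIPTION"], image_dir). If missing comes back non-empty, the usual suspects are stray whitespace in the Excel cells or a different extension such as .jpg.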