Question

I have a bunch of files whose names follow this pattern:

companyname-date_somenumber.txt

I have to sort the files according to company name, then according to date, and copy their content in this sorted order to another text file.

Here's the approach I'm trying:

From each file name, extract the company name and the date, put these two fields in a dictionary, append the dictionary to a list, and then sort the list by company name and then by date.

Once I have the sorted order, I can look up the files in the folder in that order and copy each file's content into a single text file, which gives me the final output.

Here's the code I have so far:

from datetime import date
from os import listdir
from os.path import isfile, join

myfiles = [f for f in listdir(path) if isfile(join(path, f))]
file_list = []

for file1 in myfiles:
    # find the indices of the separators around the date in the file name
    idx1 = file1.index('-', 0)
    idx2 = file1.index('_', idx1)
    company = file1[0:idx1]  # extract company name
    thisdate = file1[idx1+1:idx2]  # extract date, which is in format MMDDYY
    entry = {}  # named entry rather than dict, to avoid shadowing the built-in
    # extract month, day and year from thisdate
    m = thisdate[0:2]
    d = thisdate[2:4]
    y = '20' + thisdate[4:6]
    # convert into a date object
    mydate = date(int(y), int(m), int(d))
    entry['date'] = mydate
    entry['company'] = company
    file_list.append(entry)

I checked the contents of file_list at the end of this block of code and I think I have my list of dicts. Now, how do I sort by company name and then by date? I looked up sorting by multiple keys online, but how would I get increasing order by date?

Is there any other way that I could sort a list by a string and then a date field?

Solution

import os
from datetime import datetime

MY_DIR = 'somedirectory'

# my_files = [ f for f in os.listdir(MY_DIR) if os.path.isfile(os.path.join(MY_DIR,f)) ]
my_files = [
    'ABC-031814_01.txt',
    'ABC-031214_02.txt',
    'DEF-010114_03.txt'
]
file_list = []

for file_name in my_files:
    company,_,rhs = file_name.partition('-')
    datestr,_,rhs = rhs.partition('_')
    file_date = datetime.strptime(datestr,'%m%d%y')
    file_list.append(dict(file_date=file_date,file_name=file_name,company=company))

for row in sorted(file_list,key=lambda x: (x.get('company'),x.get('file_date'))):
    print(row)

The function sorted takes a keyword argument key, a function that is applied to each item in the sequence you're sorting; the items are ordered by the values it returns. If this function returns a tuple, the items are sorted by the first element of the tuple, with ties broken by the second element, and so on.
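
This works because Python compares tuples element by element, so a later element only matters when all earlier ones are equal. A minimal sketch with made-up records (the names records and ordered are illustrative and not part of the question or the answer):

```python
from datetime import date

# Hypothetical sample data: (company, date) pairs in no particular order.
records = [
    ("DEF", date(2014, 1, 1)),
    ("ABC", date(2014, 3, 18)),
    ("ABC", date(2014, 3, 12)),
]

# The key returns a tuple, so records are ordered by company first,
# and by date only among records that share a company.
ordered = sorted(records, key=lambda r: (r[0], r[1]))

for company, day in ordered:
    print(company, day.isoformat())
```

Both ABC records come before the DEF record, and within ABC the earlier date comes first.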

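Here, lambda x: (x.get('company'),x.get('file_date')) makes sorted order the rows by company name and then by date. Because datetime objects compare chronologically and sorted is ascending by default, the dates within each company come out in increasing order.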
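
The remaining step from the question, copying each file's content into one output file in sorted order, could look roughly like the sketch below. It assumes Python 3, file names of the form companyname-MMDDYY_somenumber.txt, and an output file that lives outside the source directory. The helper names sort_key and concatenate_sorted are hypothetical and chosen only for this illustration.

```python
import os
from datetime import datetime


def sort_key(file_name):
    """Return (company, date) parsed from 'company-MMDDYY_number.txt'."""
    company, _, rest = file_name.partition('-')
    date_str, _, _ = rest.partition('_')
    return company, datetime.strptime(date_str, '%m%d%y')


def concatenate_sorted(src_dir, out_path):
    """Write the content of every file in src_dir to out_path, in sorted order."""
    names = [
        f for f in os.listdir(src_dir)
        if os.path.isfile(os.path.join(src_dir, f))
    ]
    with open(out_path, 'w') as out:
        # Sort the file names directly; no intermediate list of dicts is needed.
        for name in sorted(names, key=sort_key):
            with open(os.path.join(src_dir, name)) as src:
                out.write(src.read())
```

A call such as concatenate_sorted('somedirectory', 'combined.txt') would then produce the merged file. Any file name that does not match the assumed pattern makes strptime raise a ValueError, so stray files in the directory would need to be filtered out first.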

Licensed under: CC-BY-SA with attribution