Question

I am trying to rename files so that each name consists of an ID followed by -(int), e.g. 1234567-1.jpg. The files generally arrive in this form, but sometimes they arrive as 1234567-1(crop to bottom).jpg.

I have been trying to use the following code, but my regular expression doesn't seem to have any effect. The reason for the walk is that we have to handle large directory trees with many images.

def fix_length():
    for root, dirs, files in os.walk(path):
        for fn in files:
            path2 = os.path.join(root, fn)
            filename_zero, extension = os.path.splitext(fn)
            re.sub("[^0-9][-]", "", filename_zero)
            os.rename(path2, filename_zero + extension) 

fix_length()

I have inserted print statements for filename_zero before and after the re.sub line and I get the same result both times (i.e. 1234567-1(crop to bottom), which is not what I wanted).

This raises an exception as the rename is trying to create a file that already exists.

I thought perhaps the [-] in the regex was the issue, but after removing it and running again I would expect 12345671.jpg, and that doesn't happen either. My regex is failing me or I have failed the regex.

Any insight would be greatly appreciated.

As a follow up, I have taken all the wonderful help and settled on a solution to my specific problem.

import os
import re
import shutil

path = r'C:\Archive'
errors = r'C:\Test\errors'

def best_sol():
    num_files = []  # occurrences of the ID currently being numbered
    for root, dirs, files in os.walk(path):
        for fn in files:
            filename_zero, extension = os.path.splitext(fn)
            path2 = os.path.join(root, fn)
            # Assumes every filename starts with a digit; otherwise
            # re.match() returns None and .group() raises AttributeError.
            ID = re.match(r'^\d{1,10}', fn).group()
            if len(ID) <= 7:
                if ID not in num_files:
                    num_files = []  # new ID: restart the count
                num_files.append(ID)
                suffix = str(len(num_files))
                os.rename(path2, os.path.join(root, ID + '-' + suffix + extension))
            else:
                # ID too long: move the file aside for investigation
                shutil.copy(path2, errors)
                os.remove(path2)

This code creates an ID from (up to) the first 10 numeric characters in the filename. I then use a list to store the instances of this ID and use the length of the list to append a suffix. The first file gets -1, the second -2, etc.
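The numbering scheme described above can also be sketched without touching the disk. The helper below (`number_by_id` is a hypothetical name, not part of the code above) keeps one counter per ID in a dictionary, so files sharing an ID do not need to be adjacent in the listing. That is a slight departure from the list-based version, and it skips names without leading digits, which is an assumption about the desired behaviour.

```python
import os
import re
from collections import defaultdict


def number_by_id(filenames):
    """Map each filename to 'ID-n.ext', counting occurrences per ID."""
    counts = defaultdict(int)  # ID -> number of files seen with that ID
    renamed = {}
    for fn in filenames:
        match = re.match(r'\d{1,10}', fn)
        if match is None:
            continue  # no leading digits: leave the file alone
        file_id = match.group()
        counts[file_id] += 1
        extension = os.path.splitext(fn)[1]
        renamed[fn] = '{}-{}{}'.format(file_id, counts[file_id], extension)
    return renamed


print(number_by_id(['1234567-1(crop to bottom).jpg',
                    '1234567-2.jpg',
                    '7654321.jpg']))
```

Returning the mapping instead of renaming directly makes it possible to inspect the planned names before any file is touched.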

I am only interested in IDs with a length of 7 (which is all there should be), but I read up to 10 digits to allow for human error in labelling. All files with an ID longer than 7 are moved to a folder where we can investigate.
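The length check can be isolated into a small function, which also makes the "no leading digits at all" case explicit. This is only a sketch: `classify`, `MAX_ID_LENGTH` and the three labels are hypothetical names chosen for illustration.

```python
import re

MAX_ID_LENGTH = 7  # expected ID length, per the description above


def classify(filename):
    """Return 'ok', 'too_long', or 'no_id' for a filename."""
    # Read up to 10 leading digits to catch mislabelled IDs.
    match = re.match(r'\d{1,10}', filename)
    if match is None:
        return 'no_id'
    if len(match.group()) <= MAX_ID_LENGTH:
        return 'ok'
    return 'too_long'
```

Files classified as 'too_long' would be the ones moved to the errors folder.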

Thanks for pointing me in the right direction.


Solution

re.sub() returns the altered string, but you ignore the return value.

You want to re-assign the result to filename_zero:

filename_zero = re.sub(r"[^\d-]", "", filename_zero)

I've corrected your regular expression as well; this removes anything that is not a digit or a dash from the base filename:

>>> re.sub(r'[^\d-]', '', '1234567-1(crop to bottom)')
'1234567-1'

Remember, strings are immutable; you cannot alter them in place.
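A minimal illustration of the difference between discarding and keeping the return value (the variable names `unchanged` and `cleaned` are only for demonstration):

```python
import re

name = '1234567-1(crop to bottom)'

# The result is discarded here, so name still holds the original string.
re.sub(r'[^\d-]', '', name)
unchanged = name

# Binding the result to a name is what actually keeps the cleaned string.
cleaned = re.sub(r'[^\d-]', '', name)

print(unchanged)
print(cleaned)
```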

If all you want is the leading digits, plus optional dash-digit suffix, select the characters to be kept, rather than removing what you don't want:

filename_zero = re.match(r'^\d+(?:-\d)?', filename_zero).group()
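Putting the pieces together, one possible shape for the walk loop is sketched below. It assumes the cleaned file should stay in the same directory as the original (the `os.rename` call in the question omits `root`, so its target is relative to the current working directory), it allows a multi-digit suffix, and it skips files whose name would not change or whose target already exists. `clean_name` and `fix_names` are hypothetical names chosen for illustration.

```python
import os
import re


def clean_name(fn):
    """Return fn reduced to leading digits plus an optional -digits suffix."""
    stem, extension = os.path.splitext(fn)
    match = re.match(r'\d+(?:-\d+)?', stem)
    if match is None:
        return fn  # no leading digits: leave the name as it is
    return match.group() + extension


def fix_names(top):
    """Rename files under top in place; return a list of (old, new) paths."""
    renamed = []
    for root, dirs, files in os.walk(top):
        for fn in files:
            new_fn = clean_name(fn)
            old_path = os.path.join(root, fn)
            new_path = os.path.join(root, new_fn)
            if new_fn == fn or os.path.exists(new_path):
                continue  # nothing to do, or the rename would clobber a file
            os.rename(old_path, new_path)
            renamed.append((old_path, new_path))
    return renamed
```

Checking `re.match()` for `None` before calling `.group()` avoids an `AttributeError` on filenames that do not start with a digit.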

OTHER TIPS

new_filename = re.sub(r'^([0-9]+)-([0-9]+).*$', r'\g<1>-\g<2>', filename_zero)

Try using this regular expression instead. I don't use Python often, so I hope this is how regular expressions work there. The trailing .*$ consumes everything after the ID-number part so that it is dropped from the result. You also appear to have forgotten to assign the value returned by the re.sub call to the filename_zero variable.
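For reference, Python's `re` module writes group references in a replacement string as `\1` or `\g<1>`. The short check below applies both spellings to the sample name from the question; the pattern is one possible choice rather than the only one, and the variable names are illustrative.

```python
import re

stem = '1234567-1(crop to bottom)'
# Capture the ID and the number, then match the rest so it gets replaced too.
pattern = r'^([0-9]+)-([0-9]+).*$'

numbered = re.sub(pattern, r'\1-\2', stem)        # short form
explicit = re.sub(pattern, r'\g<1>-\g<2>', stem)  # unambiguous form

print(numbered, explicit)
```

The `\g<1>` form is useful when the reference is followed directly by a digit, where `\1` would be ambiguous.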

Licensed under: CC-BY-SA with attribution