In Python, how can I scramble a file's name and content, and why did my attempt produce weird results?

https://stackoverflow.com/questions/12452424

02-07-2021

Question

I am trying to make a script that can scramble the names of the files in a folder, and the files' contents, on a Windows machine.

This was my first attempt at scrambling the file names in a folder. I know performance-wise it's probably terrible, and it looks pathetic, but I'm new and teaching myself.

import os
import sys
import re
root = 'C:/Users/Any/Desktop/test'

for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' A', ' ಌ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' B', ' ௷'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' C', ' അ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' D', 'ጯ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' E', 'ᚙ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' F', ' ᚘ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' G', ' ௲ '))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' H', ' ණ '))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' I', ' ┩'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' J', ' ວ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' K', ' ʥ '))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' L', ' ቄ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' M', ' ఈ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' N', '㏁'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' O', ' Ꮄ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' P', '♙'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' Q', ' Ꮬ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' R', ' ꡤ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' S', ' ⏎'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' T', ' ௷'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' U', ' ヌ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' V', ' ஹ '))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' W', '  ̉'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' X', ' ฟ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' Y', ' ॢ'))
for item in os.listdir(root):
    fullpath = os.path.join(root, item)
    os.rename(fullpath, fullpath.replace(' Z', ' ╔'))

The folder content before running the script was:

FILENAMEABCDEFGHIJKLMNOPQRSTUVWSTXYZ.docx
TEST PICTURE.jpg
TEST SCRIPT.bat
TEST TEXT.txt

After running the script:

FILENAMEABCDEFGHIJKLMNOPQRSTUVWSTXYZ.docx
TEST à¯·EXT.txt
TEST âŽCRIPT.bat
TESTâ™™ICTURE.jpg

So what the heck happened? It was supposed to be so simple; how could it produce results like this? How should I go about writing a scrambling script? It doesn't have to be advanced, because I want to understand it.

Solution

There are several problems with your approach.

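Each search string starts with a space, so it only matches a letter that directly follows a space, and the space is replaced along with the letter. That is why FILENAMEABCDEFGHIJKLMNOPQRSTUVWSTXYZ.docx, which contains no spaces, was left unchanged, and why TEST PICTURE.jpg lost its space (your replacement for ' P' has no leading space).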
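Your replacement characters are non-ASCII, but they are not written as Unicode literals (u'...'), so in Python 2 they are byte strings holding whatever encoding your editor saved the file in, most likely UTF-8. Those bytes are handed to Windows as the filename and decoded with the ANSI code page (cp1252 on most Western systems, a superset of latin-1) instead of UTF-8 -- i.e. mojibake. For example, ௷ is the three bytes E0 AF B7 in UTF-8, which cp1252 displays as à¯·.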
You're using a hugely inefficient method of performing the replacements. Use the .translate method of strings and pass in a mapping table of characters to Unicode replacements; then you only have to loop over your files once, and perform the translation using an efficient lookup instead of a lengthy series of replaces. Any time you find yourself needing to copy-paste a piece of code 3 or more times, ask yourself if a loop or some other technique might work better -- there's never any good reason to repeat yourself 26 times.
You import re and sys but don't actually use either of them.
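The garbled names in the question can be reproduced in Python 3 by encoding the intended characters as UTF-8 and decoding the resulting bytes as cp1252. This is a sketch under the assumption that cp1252 was the code page involved; the ™ (byte 0x99) in TESTâ™™ICTURE.jpg points that way, since 0x99 is a control code in strict latin-1. The helper name `mojibake` is hypothetical.

```python
def mojibake(text, errors='strict'):
    """Hypothetical helper: misread UTF-8 bytes as cp1252 (Windows ANSI)."""
    return text.encode('utf-8').decode('cp1252', errors=errors)

# U+0BF7 (the replacement for ' T') becomes the bytes E0 AF B7 -> 'à¯·'
print(mojibake('TEST \u0bf7EXT.txt'))
# U+2659 (the replacement for ' P') becomes E2 99 99 -> 'â™™'
print(mojibake('TEST\u2659ICTURE.jpg'))
# U+23CE (the replacement for ' S') becomes E2 8F 8E; cp1252 has no
# character for 0x8F, so errors='ignore' drops it, leaving 'âŽ'
print(mojibake('TEST \u23ceCRIPT.bat', errors='ignore'))
```

The three printed lines match the three renamed files listed in the question, which supports the UTF-8/cp1252 explanation.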

Here's what I'd write the code as, taking into account all of the notes above:

import os

# str.translate (unicode.translate in Python 2) maps *code points*
# (integers) to replacement strings, so we apply ord to the letters.
# The replacements are written as \u escapes to avoid source-encoding issues.
TRANSTABLE = {
    ord(u'A'): u'\u0123',
    ord(u'B'): u'\u2931',
    # etc
}

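# Unicode literal so that os.listdir returns Unicode filenames in Python 2
# (in Python 3 every str literal is already Unicode). The backslashes are
# doubled because a raw Unicode literal (ur'...') is a syntax error in Python 3.
ROOT = u'C:\\Users\\Any\\Desktop\\test'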

for filename in os.listdir(ROOT):
    newname = filename.translate(TRANSTABLE)
    # Don't translate ROOT (avoids translating e.g. the C in C:\)
    os.rename(os.path.join(ROOT, filename), os.path.join(ROOT, newname))
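Instead of typing all 26 entries of the table by hand, the table can itself be built with a loop. The following is a minimal Python 3 sketch, not the answer's exact code: the function names, the choice of fullwidth letters (U+FF21 to U+FF3A) as replacements, and the fixed seed 42 are all assumptions for illustration. Because the mapping is a shuffled one-to-one table, the reverse table can undo the scramble. It also covers the question's "content" part, but only for plain text files such as TEST TEXT.txt; .docx and .jpg are binary formats and must not be read as text.

```python
import os
import random
import string

# Hypothetical choice: map each uppercase ASCII letter to a fullwidth letter
# (U+FF21..U+FF3A) in a shuffled order. A fixed seed makes the mapping
# repeatable, so the same tables can be rebuilt later to undo the scramble.
SEED = 42

def build_tables(seed=SEED):
    letters = string.ascii_uppercase
    targets = [chr(0xFF21 + i) for i in range(len(letters))]
    random.Random(seed).shuffle(targets)
    forward = {ord(src): dst for src, dst in zip(letters, targets)}
    backward = {ord(dst): src for src, dst in zip(letters, targets)}
    return forward, backward

FORWARD, BACKWARD = build_tables()

def scramble_name(filename, table=FORWARD):
    # Translate only the stem so the extension (.txt, .jpg) keeps working.
    stem, ext = os.path.splitext(filename)
    return stem.translate(table) + ext

def rename_all(root, table=FORWARD, dry_run=True):
    """Rename every entry in root; with dry_run=True only print the plan."""
    for filename in os.listdir(root):
        newname = scramble_name(filename, table)
        if newname == filename:
            continue
        old_path = os.path.join(root, filename)
        new_path = os.path.join(root, newname)
        # Refuse to overwrite an existing file (POSIX rename would silently).
        if os.path.exists(new_path):
            raise FileExistsError(new_path)
        # ascii() escapes non-ASCII so printing works on any console.
        print(ascii(filename), '->', ascii(newname))
        if not dry_run:
            os.rename(old_path, new_path)

def scramble_text_file(path, table=FORWARD, encoding='utf-8'):
    # Read and write with an explicit encoding so the result is not mojibake.
    with open(path, encoding=encoding) as f:
        text = f.read()
    with open(path, 'w', encoding=encoding) as f:
        f.write(text.translate(table))
```

Calling `rename_all(r'C:\Users\Any\Desktop\test')` first only prints what would happen; passing `dry_run=False` performs the renames, and `rename_all(..., table=BACKWARD, dry_run=False)` reverses them. Note that this is obfuscation, not encryption: anyone who knows or guesses the table can undo it.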

OTHER TIPS

Each of your search and replacement strings has a space in front of it, so each search only matches a letter that directly follows a space.
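A quick way to see the effect of the leading space is to run both kinds of search over the file names from the question. This is a minimal sketch; `replace_letters` is a hypothetical helper and '#' is an arbitrary stand-in for the replacement characters.

```python
def replace_letters(name, letters, prefix=''):
    """Hypothetical helper: replace prefix + letter with '#' for each letter."""
    for letter in letters:
        name = name.replace(prefix + letter, '#')
    return name

names = [
    'FILENAMEABCDEFGHIJKLMNOPQRSTUVWSTXYZ.docx',
    'TEST PICTURE.jpg',
    'TEST TEXT.txt',
]

for name in names:
    # ' P' / ' T' only match after a space, and the space is consumed too.
    with_space = replace_letters(name, 'PT', prefix=' ')
    # The bare letters match every uppercase P and T in the name.
    without_space = replace_letters(name, 'PT')
    print(with_space, '|', without_space)
```

For TEST TEXT.txt the space-prefixed search gives TEST#EXT.txt (the space is gone), while the bare search gives #ES# #EX#.txt; the .docx name is not touched by the space-prefixed search at all, which is exactly what the question observed.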

Licensed under: CC-BY-SA with attribution
