Question

Hi, I recently recovered my site's files after a server crash. I had a WordPress site with a large number of uploaded image files that had Persian/Arabic file names. The problem is that the files were renamed during the recovery. The new names look strange, and I can't identify a specific encoding that would let me restore the original names. It looks like hex, but it isn't! Can you help me?

Here's an example recovered file name:

#U06a9#U0627#U0646#U06a9#U0633-#U0633#U0627#U0646#U062f#U0648#U06cc#U0686-#U067e#U0646#U0644-150x103.jpg

Thanks.


Solution

The file name looks like a sequence of Unicode character IDs (code points). Let's look at the first character.

U06a9

  • U: marks a Unicode character.
  • 06a9: the ID (code point) of the character, in hexadecimal.

In this format, the letter a would be U0061.
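
To make the mapping concrete, here is a minimal sketch of decoding a single token, assuming Python 3. The helper name decode_escape is made up for illustration; it only relies on the built-ins int and chr.

```python
def decode_escape(token):
    """Turn one 'Uxxxx' or '#Uxxxx' token into the character it names."""
    hex_digits = token.lstrip("#")[1:]  # drop the optional '#' and then the 'U'
    return chr(int(hex_digits, 16))     # hex string -> code point -> character


print(decode_escape("U0061"))   # prints the Latin letter a
print(decode_escape("#U06a9"))  # prints the first character of the example file name
```

The same two steps (parse the four hex digits, then look up the character with that code point) are all that is needed for every escaped character in the name.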

Here is a Python script that converts such a name back to the original.

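# Requires Python 3.
import re
import sys

obf = sys.argv[1]

# Each escaped character is "#U" followed by four hex digits. The "#" may be
# missing on the very first escape, so a bare "U" is accepted at the start.
pattern = re.compile(r"(?:#|^)U([0-9a-fA-F]{4})")

# Swap every escape for the character with that code point. The hyphens, the
# "150x103" size suffix and the ".jpg" extension pass through unchanged.
recovered = pattern.sub(lambda m: chr(int(m.group(1), 16)), obf)

print(recovered)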

Example usage:

python recover.py U06a9#U0627#U0646#U06a9#U0633-#U0633#U0627#U0646#U062f#U0648#U06cc#U0686-#U067e#U0646#U0644-150x103.jpg

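Note: the input cannot start with an unquoted hash, because the shell treats a word that begins with # as a comment. Either drop the leading # as in the example above, or wrap the whole file name in single quotes.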
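
Since the question involves many files, the following is a rough sketch of applying the same decoding to every file under a directory, assuming Python 3. The names restore_name and restore_tree are hypothetical, and the code assumes only file names (not directory names) were escaped. It defaults to a dry run and skips any file whose restored name already exists, so nothing is overwritten.

```python
import os
import re

# "#U" followed by four hex digits, as in the recovered file names.
ESCAPE = re.compile(r"#U([0-9a-fA-F]{4})")


def restore_name(name):
    """Replace every '#Uxxxx' escape in name with the character it encodes."""
    return ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), name)


def restore_tree(root, dry_run=True):
    """Return (old_path, new_path) pairs; rename on disk only if dry_run is False."""
    planned = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            fixed = restore_name(name)
            if fixed == name:
                continue  # nothing escaped in this name
            old = os.path.join(dirpath, name)
            new = os.path.join(dirpath, fixed)
            if os.path.exists(new):
                continue  # never overwrite an existing file
            planned.append((old, new))
            if not dry_run:
                os.rename(old, new)
    return planned
```

A cautious way to use it is to call restore_tree("wp-content/uploads") first (substituting the actual uploads directory), review the returned pairs, and only then call it again with dry_run=False on a backed-up copy of the files.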

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow