Question

I have a Perl application that parses MediaWiki SQL tables and displays data from multiple wiki pages. I need to be able to re-create the absolute image path in order to display the images, e.g. .../f/fc/Herbs.jpg/300px-Herbs.jpg

From the MediaWiki Manual:

Image_Authorisation: "the [image] path can be calculated easily from the file name and..."

How is the path calculated?

Solution

One possible way would be to calculate the MD5 signature of the file (or the file ID in a database), and then build/find the path based on that.

For example, say we get an MD5 signature like "1ff8a7b5dc7a7d1f0ed65aaa29c04b1e"

The path might look like "/1f/f" or "/1f/f8/a7"

The reason is that you don't want to have all the files in one folder, and you want to be able to "partition" them evenly across different servers, a SAN, or whatever.

The MD5 signature is a string of 16 "hex" characters. So our example of "/1f/f8/a7" gives us 256*256*256 folders to store the files in. That ought to be enough for anybody :)


Update, due to popular demand:

NOTE - I just realized we are talking specifically about how MediaWiki does it. This is not how MediaWiki does it, but another way in which it could have been done.

By "MD5 signature" I mean doing something like this (code examples in Perl):

use Digest::MD5 'md5_hex';
my $sig = md5_hex( $file->id );

$sig is now 32 alpha-numeric characters long: "1ff8a7b5dc7a7d1f0ed65aaa29c04b1e"

Then build a folder structure like this:

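use File::Path 'make_path';

# Append the first three pairs of hex characters to the base directory.
my $path = join '/', '/usr/local/media', $sig =~ m/^(..)(..)(..)/;
make_path($path);    # creates every missing level, like mkdir -p
open my $ofh, '>', "$path/$sig"
  or die "Cannot open '$path/$sig' for writing: $!";
print $ofh "File contents";
close($ofh) or die "Cannot close '$path/$sig': $!";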

Folder structure looks like

/
  usr/
    local/
      media/
        1f/
          f8/
            a7/
              1ff8a7b5dc7a7d1f0ed65aaa29c04b1e

OTHER TIPS

The accepted answer is incorrect:

  • The MD5 sum of a string is 32 hex characters (128 bits), not 16
  • The file path is calculated from the MD5 sum of the filename, not the contents of the file itself
  • The first directory in the path is the first character, and the second directory is the first and second characters. The directory path is not a combination of the first 3 or 6 characters.

The MD5 sum of 'Herbs.jpg' is fceaa5e7250d5036ad8cede5ce7d32d6. The first 2 characters are 'fc', giving the file path f/fc/, which is what is given in the example.
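
As a rough illustration of the rule above, here is a minimal shell sketch. It assumes GNU md5sum is available and that the file name is passed exactly as MediaWiki stores it (as far as I know that means underscores instead of spaces, as in the img_name column). mw_hash_path and mw_thumb_path are hypothetical helper names, not MediaWiki functions, and the thumbnail layout is inferred from the example path in the question.

```shell
#!/bin/bash
# Sketch only: derive the hashed sub-directory MediaWiki uses for an upload.
# Assumes the two-level layout described above (first char / first two chars).

mw_hash_path() {
    local hash
    # printf '%s' hashes the name without the trailing newline echo would add.
    hash=$(printf '%s' "$1" | md5sum | cut -c1-32)
    printf '%s/%s\n' "$(printf '%s' "$hash" | cut -c1)" \
                     "$(printf '%s' "$hash" | cut -c1-2)"
}

# Hypothetical helper: relative path of a thumbnail of the given pixel width,
# following the pattern <hash path>/<name>/<width>px-<name> from the question.
mw_thumb_path() {
    local name=$1 width=$2
    printf '%s/%s/%spx-%s\n' "$(mw_hash_path "$name")" "$name" "$width" "$name"
}

mw_hash_path Herbs.jpg        # f/fc, given the hash quoted above
mw_thumb_path Herbs.jpg 300   # f/fc/Herbs.jpg/300px-Herbs.jpg
```

The same three steps (MD5 of the name, first character, first two characters) port directly to Perl with md5_hex from Digest::MD5 and substr.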

In PHP, inside MediaWiki itself, you can call the following function to get the URL. You may want to look at the PHP code behind it to see how the path is calculated.

$url = wfFindFile(Title::makeTitle(NS_IMAGE, $fileName))->getURL();

I created a small Bash script called reorder.sh which moves files from inside "images" to the specific sub folders:

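#!/bin/bash

cd /opt/mediawiki/mediawiki-cur/images || exit 1

# Read NUL-separated names so files containing spaces are handled correctly.
find . -maxdepth 1 -type f ! -name .htaccess ! -name README ! -name reorder.sh -printf '%f\0' |
while IFS= read -r -d '' i; do
    # Hash the file name (not its contents); printf adds no trailing newline.
    hash=$(printf '%s' "$i" | md5sum)
    path1=${hash:0:1}
    path2=${hash:0:2}
    mkdir -p "$path1/$path2" &&
    mv -- "$i" "$path1/$path2/"
done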
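
Before moving anything, it may be worth previewing where each file would end up. The following is a dry-run sketch under the same assumptions as the script above (GNU md5sum, hash taken from the file name). preview_reorder is a hypothetical helper name, and unlike the script it does not exclude README or reorder.sh, only dotfiles such as .htaccess, which the glob skips.

```shell
#!/bin/bash
# Dry run: print "name -> target path" for each regular file directly inside
# the given directory, without creating directories or moving anything.
preview_reorder() {
    local dir=$1 file name hash d1 d2
    for file in "$dir"/*; do
        [ -f "$file" ] || continue     # skip sub-directories and empty globs
        name=${file##*/}               # strip the leading directory part
        hash=$(printf '%s' "$name" | md5sum | cut -c1-32)
        d1=$(printf '%s' "$hash" | cut -c1)
        d2=$(printf '%s' "$hash" | cut -c1-2)
        printf '%s -> %s/%s/%s\n' "$name" "$d1" "$d2" "$name"
    done
}
```

Calling preview_reorder /opt/mediawiki/mediawiki-cur/images prints one line per file, which can be checked by eye before running the real move.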
Licensed under: CC-BY-SA with attribution