Question

I stumbled across my rather ancient photo objects disks, and sadly found out the company (hemera) doesn't provide support for it anymore. this has left me with a whole pile of .hpi files. Luckily, I found this information on extracting the jpg and png components of the file.

Unfortunately, I haven't been able to get it to work. Can anyone figure out what's wrong with this code? I'd be happy with a PHP or Python solution if Perl isn't your thing. :)

open(I, "$name") || die;
binmode(I);
$_ = <I>;
close(I);

my ($j, $p) = m|^.{32}(.*)(\211PNG.*)$|s;
open(J, ">$name.jpg") &&
    do { binmode(J); print J $j; close J; };
open(P, ">$name.png") &&
    do { binmode(P); print P $p; close P; };

The hexdump of the current test file I snagged off a CD is here, if it helps at all:

0000000 89 48 50 49 0d 0a 1a 0a 64 00 00 00 20 00 00 00
0000010 45 89 00 00 65 89 00 00 0a 21 00 00 00 d0 d0 00
Was it helpful?

Solution

It seems the regexp is wrong. That's why I wrote a little C program to do it for me:

#include <stdio.h>
#include <stdlib.h>

#define MAX_SIZE 1048576

char stuff[MAX_SIZE];

int main (int argc, char **argv)
{
    unsigned int j_off, j_len, p_off, p_len;
    FILE *fp, *jp, *pp;
    fp = fopen (argv[1], "r");
    if (!fp)    goto error;
    if (fseek (fp, 12, SEEK_SET))   goto error;
    if (!fread (&j_off, 4, 1, fp))  goto error;
    if (!fread (&j_len, 4, 1, fp))  goto error;
    if (!fread (&p_off, 4, 1, fp))  goto error;
    if (!fread (&p_len, 4, 1, fp))  goto error;
    fprintf (stderr, "INFO %s \t%d %d %d %d\n",
        argv[1], j_off, j_len, p_off, p_len);
    if (j_len > MAX_SIZE || p_len > MAX_SIZE) {
        fprintf (stderr, "%s: Chunk size too big!\n", argv[1]);
        return EXIT_FAILURE;
    }

    jp = fopen (argv[2], "w");
    if (!jp)    goto error;
    if (fseek (fp, j_off, SEEK_SET))    goto error;
    if (!fread (stuff, j_len, 1, fp))   goto error;
    if (!fwrite (stuff, j_len, 1, jp))  goto error;
    fclose (jp);

    pp = fopen (argv[3], "w");
    if (!pp)    goto error;
    if (fseek (fp, p_off, SEEK_SET))    goto error;
    if (!fread (stuff, p_len, 1, fp))   goto error;
    if (!fwrite (stuff, p_len, 1, pp))  goto error;
    fclose (pp);
    fclose (fp);
    return EXIT_SUCCESS;

error:
    perror (argv[1]);
    return EXIT_FAILURE;
}

It works with the command line parameters input.hpi output.jpg output.png. The error handling is not 100% correct, but it is good enough to always tell you if something's wrong, and most times what it is. For large files, you will have to enlarge MAX_SIZE.

Here is a shell script which you can call with *.hpi:

#!/bin/bash

dest=<destination-folder>

for arg in "$@"
do
  base=`echo $arg | cut -d'.' -f1`
  <executable> $arg $dest/original/$base.jpg $dest/mask/$base.png 2>>$dest/log
  #composite -compose CopyOpacity $dest/mask/$base.png $dest/original/$base.jpg $dest/rgba/$base.png
done

The optional composite command (comes with ImageMagick) will create a new PNG image which has the mask applied as alpha channel. Note that this file will be about 5 times larger than the original files.

Note that some HPI files come without mask. In this case, my program will still work, but give an empty PNG file.

OTHER TIPS

I had a similar problem extracting images from an MS Word document. Here's the program I wrote for that. It only extracts PNGs, though:

#!/usr/bin/perl
use strict;

my $HEADER = "\211PNG";
my $FOOTER = "IEND\xAEB`\x82";

foreach my $file ( @ARGV )
     {
     print "Extracting $file\n";
     (my $image_base = $file) =~ s/(.*)\..*/$1/;

     my $data = do { local $/; open my( $fh ), $file; <$fh> };

     my $count = 0;

     while( $data =~ m/($HEADER.*?$FOOTER)/sg )
        {
        my $image      = $1;
        $count++;
        my $image_name = "$image_base.$count.png";
        open my $fh, "> $image_name" or warn "$image_name: $!", next;
        print "Writing $image_name: ", length($image), " bytes\n";
        print $fh $image;
        close $fh;
        }

    }


__END__

Not a program-your-own solution, but this application, which is freeware for personal use, states that it can convert hpi files.

For those arriving by Google here, I've written a Python script that solves this problem for PNG images only:

#!/usr/bin/python
# -*- coding: utf-8 -*-
import re, sys

def main():
  if len(sys.argv) < 2:
    print """Usage:
  {0} BINARY_FILE PNG_PATH_TEMPLATE
Example:
  {0} bin/program 'imgs/image.{{0:03d}}.png'""".format(__file__)
    return
  binfile, pngpath_tpl = sys.argv[1:3]

  rx = re.compile("\x89PNG.+?IEND\xAEB`\x82", re.S)
  bintext = open(binfile, "rb").read()
  PNGs = rx.findall(bintext)

  for i, PNG in enumerate(PNGs):
    f = open(pngpath_tpl.format(i), "wb") # Simple string format.
    f.write(PNG)
    f.close()

if __name__ == "__main__":
  main()

For .jpeg and .mov files there is recoverjpeg, which I tested on linux (but may be compatible with other platforms).

On some debian systems it's available through apt get install recoverjpeg

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top