Question

Long story short, I need to convert a bytea from PostgreSQL to an int[][] using Java 6 (or SQL.)

If you want context, feel free to read below, but that's really all there is to it.

Context

Somebody who used to work in the team I am in wrote a bean like this:

public class FourPointBean {
  @Column(name = "points", length = "65535")
  private int[][] xy;
  ... other irrelevant stuff
}

Where the int[][] was really just four points (no idea why he chose an int[][]...)

(xy[0][0], xy[0][1]) := (x0, y0)
(xy[1][0], xy[1][1]) := (x1, y1)
(xy[2][0], xy[2][1]) := (x2, y2)
(xy[3][0], xy[3][1]) := (x3, y3)

And apparently for years Hibernate had been throwing an exception that everybody ignored because it was "only a warning" --

WARNING: duplicate class definition bug occured? Please report this :
com/company/blah/SomeBean$JaxbAccessorM_getXy_setXy_Array_Of_[I
java.lang.ClassFormatError: Illegal class name
"com/company/blah/SomeBean$JaxbAccessorM_getXy_setXy_Array_Of_[I" in class file
com/company/blah/SomeBean$JaxbAccessorM_getXy_setXy_Array_Of_[I
      at java.lang.ClassLoader.defineClass1(Native Method)
...

The result was Hibernate inserting the int array as a byta in the table. Eventually this bean definition actually caused some issues, so I changed it --

public class FourPointBean {
  @Type(type = "com.company.blah.PointArrayType")
  @Column(name = "points", length = 65535, columnDefinition = "int[][]")
  private int[][] xy;
  ... other irrelevant stuff
}

And then I changed the DTO used by the UI to:

public class FourPointDTO {

  private List<Point> points = Lists.newArrayListWithCapacity(4);
  ...
}

Which is all fine and dandy, but now I have to write something to migrate all of the messy existing byte arrays in the table in question to a PostgreSQL int[]... using Java 6.

Everything I saw online was involving 1D-arrays that did not translate cleanly... right now I can't even figure out how to turn int[][] into a byte array for testing purposes.

As always, help is much appreciated...


Edit

Here are a couple translations --


(00525bc5f039-2d70-40f0-922c-0ef7060816be)

int[][]

[0][0] := 538
[0][1] := 760
[1][0] := 676
[1][1] := 760
[2][0] := 676
[2][1] := 890
[3][0] := 538
[3][1] := 890

Bytes

 \xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a50200007870000000020000021a000002f87571007e000200000002000002a4000002f87571007e000200000002000002a40000037a7571007e0002000000020000021a0000037a

(005276c1cb74-2476-43bf-856e-43912e969000)

int[][]

[0][0] := 544
[0][1] := 638
[1][0] := 657
[1][1] := 638
[2][0] := 657
[2][1] := 743
[3][0] := 544
[3][1] := 743

Bytes

 \xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a5020000787000000002000002200000027e7571007e000200000002000002910000027e7571007e00020000000200000291000002e77571007e00020000000200000220000002e7
Was it helpful?

Solution

Short version: Looks like Hibernate was saving a serialized Java object of type int[][]. Gross. Deserialize it in Java using a ByteArrayInputStream wrapped in an ObjectInputStream.


To get the raw bytes, a Python snippet like:

points_hex = open("/path/to/points.txt").readlines()
points = [ p[2:-1].strip().decode("hex") for p in points_hex ]

works. I suspected there was a common prefix, so I checked. Courtesty of this nice easy longest-common-prefix algo:

from itertools import takewhile,izip

def allsame(x):
    return len(set(x)) == 1

r = [i[0] for i in takewhile(allsame ,izip(*points))]
''.join(r).encode("hex")

it's confirmed to be:

\xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a50200007870000000020000

The presence of a prefix like this strongly suggests that we're dealing with a serialized Java object, not a bytea representation of a series of points in an array. That's easily confiremed with file, after writing a point's raw binary to a file:

open("/tmp/point","w").write(points[0])

then in the shell:

$ file /tmp/point 
/tmp/point: Java serialization data, version 5

You should decode these points by de-serializing them into an int[][] using Java. It'll be possible for a simple object like int[][] in SQL, but there's no point doing it manually when you can just ask Java to handle it for you.

Update:

I was feeling nice, so here's Java code to decode it:

import java.io.*;
import java.util.Arrays;

public class Deserialize {

    // Credit: https://stackoverflow.com/a/140861/398670
    public static byte[] hexStringToByteArray(String s) {
        int len = s.length();
        byte[] data = new byte[len / 2];
        for (int i = 0; i < len; i += 2) {
            data[i / 2] = (byte) ((Character.digit(s.charAt(i), 16) << 4)
                                 + Character.digit(s.charAt(i+1), 16));
        }
        return data;
    }


    public static void main( String[] args ) throws Exception {
        if (args.length != 1) {
            System.err.println("Usage: java Deserialize aced....hexstring...");
            System.exit(1);
        }

        String hex = args[0];
        if (hex.startsWith("\\x")) {
            hex = hex.substring(2);
        }

        ByteArrayInputStream bis = new ByteArrayInputStream(hexStringToByteArray(hex));
        ObjectInput in = new ObjectInputStream(bis);
        Object obj_read = in.readObject();

        if (obj_read instanceof int[][]) {
            int[][] obj_arr = (int[][]) obj_read;
            System.err.println("Array contents are: " + Arrays.deepToString(obj_arr) );
        }
    }

}

Usage:

$ javac Deserialize.java 
$ java Deserialize '\xaced0005757200035b5b4917f7e44f198f893c020000787000000004757200025b494dba602676eab2a50200007870000000020000021a000002f87571007e000200000002000002a4000002f87571007e000200000002000002a40000037a7571007e0002000000020000021a0000037a'
Array contents are: [[538, 760], [676, 760], [676, 890], [538, 890]]

Of course, in reality you'll be using PgJDBC, which hands you a byte[] directly, so you don't have to do the hex decoding I do above.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top