Question

I need to write a byte array value into Cassandra using Java code. A C++ program will then retrieve that byte array from Cassandra and deserialize it.

The byte array I will be writing into Cassandra is made up of three values, as described below:

short employeeId = 32767;
long lastModifiedDate = 1379811105109L;
byte[] attributeValue = os.toByteArray();

Now I will write employeeId, lastModifiedDate and attributeValue together into a single byte array, and write that resulting byte array into Cassandra. My C++ program will then retrieve that byte array from Cassandra and deserialize it to extract employeeId, lastModifiedDate and attributeValue.

I am not sure whether I should use big-endian byte order here in my Java code while writing to Cassandra, so that the C++ code is simplified when reading it back.

I have made an attempt on the Java side to follow a fixed format (big-endian) while writing everything into a single byte array, which will then be written into Cassandra, but I am not sure whether this is right:

import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public static void main(String[] args) throws Exception {

    String os = "Byte Array Test";
    byte[] attributeValue = os.getBytes();

    long lastModifiedDate = 1379811105109L;
    short employeeId = 32767;

    ByteArrayOutputStream byteOsTest = new ByteArrayOutputStream();
    DataOutputStream outTest = new DataOutputStream(byteOsTest);

    // merging everything into one Byte Array here
    outTest.writeShort(employeeId);
    outTest.writeLong(lastModifiedDate);
    outTest.writeInt(attributeValue.length);
    outTest.write(attributeValue);

    byte[] allWrittenBytesTest = byteOsTest.toByteArray();

    // initially I was writing allWrittenBytesTest into Cassandra...

    ByteBuffer bb = ByteBuffer.wrap(allWrittenBytesTest).order(ByteOrder.BIG_ENDIAN);

    // now what value I should write into Cassandra?
    // or does this even looks right?

    // And now how to deserialize it?

}

Can anyone help me with this ByteBuffer thing here? Thanks..

I might be missing minute details about ByteBuffer here, as this is the first time I am working with it.

  1. First of all, should I be using ByteBuffer at all in my use case?
  2. Secondly, if yes, what's the best way to use it in my use case?

The only thing I am trying to make sure of is that I am writing into Cassandra correctly, following big-endian byte order, so that on the C++ side I don't face any problems while deserializing that byte array.


Solution 2

First of all, I have never used Cassandra, so I will only answer in regard to the ByteBuffer part.

You should put everything into the ByteBuffer before sending the bytes; otherwise you cannot control the endianness of what you are storing, and that is exactly the point of using the ByteBuffer.

To send the bytes use:

int size = 2 + 8 + 4 + attributeValue.length; // short is 2 bytes, long 8 and int 4

ByteBuffer bbuf = ByteBuffer.allocate(size); 
bbuf.order(ByteOrder.BIG_ENDIAN);

bbuf.putShort(employeeId);
bbuf.putLong(lastModifiedDate);
bbuf.putInt(attributeValue.length);
bbuf.put(attributeValue);

bbuf.rewind(); // reset the position to 0 so the whole buffer can be read back

// Avoid bbuf.array() here: it returns the ByteBuffer's internal array,
// so modifying the returned array would directly modify the buffer.
// byte[] bytesToStore = bbuf.array();

// Better approach: copy the buffer's contents into a fresh array
byte[] bytesToStore = new byte[size];
bbuf.get(bytesToStore);

Now you can store bytesToStore by sending it to Cassandra.

To read them back:

byte[] allWrittenBytesTest = magicFunctionToAcquireDataFromCassandra();

ByteBuffer bb = ByteBuffer.wrap(allWrittenBytesTest);
bb.order(ByteOrder.BIG_ENDIAN);
bb.rewind();

short employeeId = bb.getShort();
long lastModifiedDate = bb.getLong();
int attributeValueLen = bb.getInt();
byte[] attributeValue = new byte[attributeValueLen];
bb.get(attributeValue); // read attributeValue from the remaining buffer

You don't even strictly need the stored length field: with the current layout you can recover it as allWrittenBytesTest.length - 14 (14 being the combined size of the other fields: 2 + 8 + 4), and if you dropped the length field entirely, the attribute value would simply be everything remaining after the short and the long.
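For example, a minimal sketch of reading that trimmed-down layout (assuming the writeInt/putInt call is omitted on the write side), where everything left over after the short and the long is the attribute value:

ByteBuffer bb = ByteBuffer.wrap(allWrittenBytesTest).order(ByteOrder.BIG_ENDIAN);
short employeeId = bb.getShort();     // 2 bytes
long lastModifiedDate = bb.getLong(); // 8 bytes
byte[] attributeValue = new byte[bb.remaining()]; // whatever is left over
bb.get(attributeValue);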

Edited the code, I had some typos.

OTHER TIPS

Instead of serializing ByteBuffers for Thrift manually, use the native CQL driver for Cassandra: http://github.com/datastax/java-driver
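For illustration, here is a minimal sketch of storing the serialized bytes as a blob through that driver; the contact point, keyspace name, and table schema below are assumptions, not something from the question:

import java.nio.ByteBuffer;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

// Assumed schema: CREATE TABLE my_keyspace.employee_data (id int PRIMARY KEY, payload blob);
Cluster cluster = Cluster.builder().addContactPoint("127.0.0.1").build();
Session session = cluster.connect("my_keyspace");

PreparedStatement ps = session.prepare(
        "INSERT INTO employee_data (id, payload) VALUES (?, ?)");

// blob columns are bound as java.nio.ByteBuffer in the Java driver
session.execute(ps.bind(32767, ByteBuffer.wrap(bytesToStore)));

cluster.close();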

Endianness doesn't apply to a byte array as such, so as long as Cassandra doesn't try to interpret your data, you can use either big or little endian. Byte order only matters for the multi-byte values inside it.
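To make the multi-byte point concrete, here is a quick sketch showing how the same short, 32767 (0x7FFF), is laid out under each byte order:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

ByteBuffer be = ByteBuffer.allocate(2).order(ByteOrder.BIG_ENDIAN);
be.putShort((short) 32767); // 0x7FFF
System.out.println(Arrays.toString(be.array())); // [127, -1]  i.e. 0x7F 0xFF

ByteBuffer le = ByteBuffer.allocate(2).order(ByteOrder.LITTLE_ENDIAN);
le.putShort((short) 32767);
System.out.println(Arrays.toString(le.array())); // [-1, 127]  i.e. 0xFF 0x7F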

If you are going to use the data with different clients, probably on different platforms, I would recommend agreeing on a convention (big-endian, for instance) and using the same endianness in all of your clients. For instance, the Java client code would look like this:

ByteBuffer bb = ByteBuffer.allocate(attributeValue.length + 14).order(ByteOrder.BIG_ENDIAN);
bb.putShort(employeeId);
bb.putLong(lastModifiedDate);
bb.putInt(attributeValue.length);
bb.put(attributeValue);

You have to use ByteBuffer if you are going to use an API which requires it. For instance, NIO channels work with ByteBuffers, so if you are going to connect using a SocketChannel you can use ByteBuffer directly. You can also use ByteBuffer simply for formatting your multi-byte values correctly. For instance, with the code above you can get the byte array from the buffer and send it through a socket, with the first three fields packed in big-endian order:

sendByteArray(bb.array());
...
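For the channel case, here is a minimal sketch (host and port are hypothetical) of writing the buffer through a SocketChannel:

import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;

try (SocketChannel channel = SocketChannel.open(new InetSocketAddress("example.com", 9000))) {
    bb.rewind(); // move the position back to the start before writing
    while (bb.hasRemaining()) {
        channel.write(bb); // a single write may be partial, so loop until drained
    }
}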
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow