Question

I am currently re-designing an application and stumbled upon a problem serializing some data.

Say I have an array of size m x n

double **data;

that I want to serialize into a

char *dataSerialized

using simple delimiters (one for rows, one for elements).

De-serialization is fairly straightforward, counting delimiters and allocating size for the data to be stored. However, what about the serialize function, say

serialize_matrix(double **data, int m, int n, char **dataSerialized);

What would be the best strategy to determine the size needed by the char array and allocate the appropriate memory for it?

Perhaps using some fixed-width exponential representation of doubles in a string? Is it possible to just convert all the bytes of each double into chars and have a sizeof(double)-aligned char array? How would I keep the accuracy of the numbers intact?
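
To make it concrete, the rough kind of thing I have in mind is sketched below (the ',' and ';' delimiters, the %.17g format, and the return convention are just placeholders): measure each value with a snprintf(NULL, 0, ...) dry run, then allocate and print for real.

#include <stdio.h>
#include <stdlib.h>

/* Sketch only: delimiter-based text serialization.  "%.17g" prints enough
   significant digits for an IEEE 754 double to round-trip exactly;
   ',' separates elements, ';' ends a row. */
int serialize_matrix(double **data, int m, int n, char **dataSerialized)
{
    size_t needed = 1;                               /* trailing NUL */
    for (int i = 0; i < m; i++)
        for (int j = 0; j < n; j++)                  /* value + delimiter */
            needed += (size_t)snprintf(NULL, 0, "%.17g", data[i][j]) + 1;

    char *p = malloc(needed);
    if (p == NULL)
        return -1;
    *dataSerialized = p;

    for (int i = 0; i < m; i++) {
        for (int j = 0; j < n; j++) {
            p += sprintf(p, "%.17g", data[i][j]);
            *p++ = (j == n - 1) ? ';' : ',';
        }
    }
    *p = '\0';
    return 0;
}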

NOTE:

I need the data in a char array, not in binary, not in a file.

The serialized data will be sent over the network using ZeroMQ between a C server and a Java client. Given the array dimensions and sizeof(double), can it always be accurately reconstructed between those two?

Solution

Java has pretty good support for reading raw bytes and converting them into whatever you want. You can decide on a simple wire format, then serialize to it in C and unserialize in Java.

Here's an example of an extremely simple format, with code to serialize and unserialize it.

I've written a slightly larger test program that I can dump somewhere if you want; it creates a random data array in C, serializes it, and writes the serialized string base64-encoded to stdout. The much smaller Java program then reads, decodes, and deserializes this.

C code to serialize:

/*
I'm using this format:
32-bit signed int                   32-bit signed int                   See below
[number of elements in outer array] [number of elements in inner array] [elements]

[elements] is built like
[element(0,0)][element(0,1)]...[element(0,y)][element(1,0)]...

Each element is sent as a 64-bit IEEE 754 "double". If your C compiler/architecture does something different with its doubles, look forward to hours of fun :)

I'm using a couple of non-standard byte-swapping functions here (htobe32/htobe64), originally from a BSD, but present in glibc >= 2.9.
*/

#include <stddef.h>   /* size_t */
#include <stdint.h>   /* int32_t, uint64_t */
#include <string.h>   /* memcpy */
#include <endian.h>   /* htobe32, htobe64 (glibc >= 2.9) */

/* Calculate the bytes required to store a message of x*y doubles */
size_t calculate_size(size_t x, size_t y)
{
    /* The two dimensions of the array, each as a 32-bit int (2 * 4) */
    size_t sz = 8;
    /* a 64-bit IEEE 754 double is by definition 8 bytes long :) */
    sz += ((x * y) * 8);
    /* and a NUL */
    sz++;
    return sz;
}

/* Helpers */
static char* write_int32(int32_t, char*);
static char* write_double(double, char*);
/* Actual conversion. That wasn't so hard, was it? */
void convert_data(double** src, size_t x, size_t y, char* dst)
{
    dst = write_int32((int32_t) x, dst);
    dst = write_int32((int32_t) y, dst);

    /* row-major: all of row 0, then row 1, ... */
    for(size_t i = 0; i < x; i++) {
        for(size_t j = 0; j < y; j++) {
            dst = write_double(src[i][j], dst);
        }
    }
    *dst = '\0';
}


static char* write_int32(int32_t num,  char* c)
{
    char* byte; 
    int i = sizeof(int32_t); 
    /* Convert to network byte order */
    num = htobe32(num);
    byte = (char*) (&num);
    while(i--) {
        *c++ = *byte++;
    }
    return c;
}

static char* write_double(double d, char* c)
{
    /* Here I'm assuming your C programs use IEEE 754 'double' precision natively.
       If you don't, you should be able to convert into this format; a helper library most likely already exists for your platform.
       Note that IEEE 754 endianness isn't defined, but in practice, normal platforms use the same byte order as they do for integers. */
    char* byte;
    int i = sizeof(uint64_t);
    uint64_t num;
    /* memcpy rather than a pointer cast avoids strict-aliasing trouble */
    memcpy(&num, &d, sizeof(num));
    /* convert to network byte order */
    num = htobe64(num);
    byte = (char*) (&num);
    while(i--) {
        *c++ = *byte++;
    }
    return c;
}
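
Putting the pieces together, usage might look something like the sketch below (the helper name send_matrix is made up, and the ZeroMQ call is only illustrative; `sock` is assumed to be a socket created and connected elsewhere):

#include <stdlib.h>
#include <zmq.h>

/* Sketch: allocate, serialize, send, free. */
int send_matrix(void *sock, double **data, size_t m, size_t n)
{
    size_t len = calculate_size(m, n);
    char *buf = malloc(len);
    if (buf == NULL)
        return -1;

    convert_data(data, m, n, buf);

    /* The payload is binary and may contain zero bytes, so pass the
       length explicitly; the trailing NUL need not be sent. */
    int rc = zmq_send(sock, buf, len - 1, 0);
    free(buf);
    return rc;
}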

Java code to unserialize:

/* The raw char array from C has been read into the byte[] `bytes` in Java */
DataInputStream stream = new DataInputStream(new ByteArrayInputStream(bytes));

int dim_x; int dim_y;
double[][] data;

try {   
    dim_x = stream.readInt();
    dim_y = stream.readInt();
    data = new double[dim_x][dim_y];
    for(int i = 0; i < dim_x; ++i) {
        for(int j = 0; j < dim_y; ++j) {
            data[i][j] = stream.readDouble();
        }
    }

    System.out.println("Client:");
    System.out.println("Dimensions: "+dim_x+" x "+dim_y);
    System.out.println("Data:");
    for(int i = 0; i < dim_x; ++i) {
        for(int j = 0; j < dim_y; ++j) {
            System.out.print(" "+data[i][j]);
        }
        System.out.println();
    }


} catch(IOException e) {
    System.err.println("Error reading input");
    System.err.println(e.getMessage());
    System.exit(1);
}

OTHER TIPS

If you are writing a binary file, you should think of a good way to serialize the actual binary data (64-bit) of your double. This could range from directly writing the content of the double to the file (minding endianness) to some more elaborate normalizing serialization scheme (e.g. with a well-defined representation of NaN). That's up to you, really. If you expect to be working mostly among homogeneous architectures, a direct memory dump would probably suffice.
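
For the direct-dump-minding-endianness option, a minimal sketch (assuming the native double is IEEE 754 and a glibc-style htobe64 is available; the function name is made up) could look like this:

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <endian.h>   /* htobe64, glibc >= 2.9 */

/* Write one double to a binary file in big-endian byte order. */
int write_double_be(FILE *f, double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);   /* copy, don't cast, to stay portable */
    bits = htobe64(bits);
    return fwrite(&bits, sizeof bits, 1, f) == 1 ? 0 : -1;
}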

If you want to write to a text file and are looking for an ASCII representation, I would strongly discourage a decimal numerical representation. Instead, you could convert the 64-bit raw data to ASCII using base64 or something like that.
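
A base64 encoder is small enough to sketch here (hand-rolled purely for illustration; in practice you would probably reach for an existing library routine, and the function name is made up):

#include <stdint.h>
#include <stdlib.h>

static const char b64tab[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Encode n raw bytes (e.g. the packed big-endian doubles) as a
   NUL-terminated, malloc'd base64 string. */
char *base64_encode(const unsigned char *src, size_t n)
{
    size_t out_len = 4 * ((n + 2) / 3);
    char *out = malloc(out_len + 1);
    if (out == NULL)
        return NULL;

    size_t i, o = 0;
    for (i = 0; i + 2 < n; i += 3) {            /* full 3-byte groups */
        uint32_t v = (uint32_t)(src[i] << 16 | src[i + 1] << 8 | src[i + 2]);
        out[o++] = b64tab[(v >> 18) & 0x3F];
        out[o++] = b64tab[(v >> 12) & 0x3F];
        out[o++] = b64tab[(v >> 6) & 0x3F];
        out[o++] = b64tab[v & 0x3F];
    }
    if (i < n) {                                /* 1 or 2 leftover bytes */
        uint32_t v = (uint32_t)(src[i] << 16);
        if (i + 1 < n)
            v |= (uint32_t)(src[i + 1] << 8);
        out[o++] = b64tab[(v >> 18) & 0x3F];
        out[o++] = b64tab[(v >> 12) & 0x3F];
        out[o++] = (i + 1 < n) ? b64tab[(v >> 6) & 0x3F] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return out;
}

On the Java side, java.util.Base64 can decode the result before the bytes are handed to the DataInputStream shown above.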

You really want to keep all the precision that you have in your double!
