Write values of different data types in sequence in memory? or, Array with multiple data types?

https://stackoverflow.com/questions/7182402

11-01-2021
|

Question

I am relatively new to writing in C. I have self taught myself using what resources I have found online and in print. This is my first real project in C programming. Gotta love on-the-job training.

I am writing some code in C that is being used on a Texas Instruments C6701 Digital Signal Processor. Specifically, I am writing a set of communication functions to interface through a serial port.

The project I'm on has an existing packet protocol for sending data through the serial port. This works by handing over a pointer to the data to be transmitted and its length in bytes. All I have to do is write in the bytes to be transmitted into an "array" in memory (the transmitter copies that sequence of bytes into a buffer and transmits that).

My question pertains to how best to format the data to be transmitted, the data I have to send is composed of several different data types (unsigned char, unsigned int, float etc...). I can't expand everything up to float (or int) because I have a constrained communication bandwidth and need to keep packets as small as possible.

I originally wanted to use arrays to format the data,

unsigned char* dataTx[10];
dataTx[0]=char1;
dataTx[1]=char2;
etc...

This would work except not all my data is char, some is unsigned int or unsigned short.

To handle short and int I used bit shifting (lets ignore little-endian vs big-endian for now).

unsigned char* dataTx[10];
dataTx[0]=short1>>8;
dataTx[1]=short1;
dataTx[2]=int1>>24;
dataTx[3]=int1>>16;
etc...

However, I believe another (and better?) way to do this is to use pointers and pointer arithmetic.

unsigned char* dataTx[10]
*(dataTx+0) = int1;
*(dataTx+4) = short1;
*(dataTx+6) = char1;
etc...

My question (finally) is, is which method (bit shifting or pointer arithmetic) is the more acceptable method? Also, is one faster to run? (I also have run-time constraints).

My requirement: The data be located in memory serially, without gaps, breaks or padding.

I don't know enough about structures yet to know if a structure would work as a solution. Specifically, I don't know if a structure always allocates memory locations serially and without breaks. I read something that indicates they allocates in 8 byte blocks, and possibly introduce padding bytes.

Right now I'm leaning towards the pointer method. Thanks for reading this far into what seems to be a long post.

Solution

Usually you would use the bit shifting approach, because many chips do not allow you to copy, for example, a 4-byte integer to an odd byte address (or, more accurately, to a set of 4 bytes starting at an odd byte address). This is called alignment. If portability is an issue, or if your DSP does not allow misaligned access, then shifting is necessary. If your DSP incurs a significant performance hit for misaligned access, you might worry about it.

However, I would not write the code with the shifts for the different types done longhand as shown. I would expect to use functions (possibly inline) or macros to handle both the serialization and deserialization of the data. For example:

unsigned char dataTx[1024];
unsigned char *dst = dataTx;

dst += st_int2(short1, dst);
dst += st_int4(int1, dst);
dst += st_char(str, len, dst);
...

In function form, these functions might be:

size_t st_int2(uint16_t value, unsigned char *dst)
{
    *dst++ = (value >> 8) & 0xFF;
    *dst   = value & 0xFF;
    return 2;
}

size_t st_int4(uint32_t value, unsigned char *dst)
{
    *dst++ = (value >> 24) & 0xFF;
    *dst++ = (value >> 16) & 0xFF;
    *dst++ = (value >>  8) & 0xFF;
    *dst   = value & 0xFF;
    return 4;
}

size_t st_char(unsigned char *str, size_t len, unsigned char *dst)
{
    memmove(dst, str, len);
    return len;
}

Granted, such functions make the code boring; on the other hand, they reduce the chance for mistakes too. You can decide whether the names should be st_uint2() instead of st_int2() -- and, indeed, you can decide whether the lengths should be in bytes (as here) or in bits (as in the parameter types). As long as you're consistent and boring, you can do as you will. You can also combine these functions into bigger ones that package entire data structures.

The masking operations (& 0xFF) may not be necessary with modern compilers. Once upon a very long time ago, I seem to remember that they were necessary to avoid occasional problems with some compilers on some platforms (so, I have code dating back to the 1980s that include such masking operations). Said platforms have probably gone to rest in peace, so it may be pure paranoia on my part that they're (still) there.

Note that these functions are passing the data in big-endian order. The functions can be used 'as is' on both big-endian and little-endian machines, and the data will be interpreted correctly on both types, so you can have diverse hardware talking over the wire, using this code, and there will be no miscommunication. If you have floating point values to convey, you have to worry a bit more about the representations over the wire. Nevertheless, you should probably aim to have the data transferred in a platform-neutral format so that interworking between chip types is as simple as possible. (This is also why I used the type sizes with numbers in them; 'int' and 'long' in particular can mean different things on different platforms, but 4-byte signed integer remains a 4-byte signed integer, even if you are unlucky - or lucky - enough to have a machine with 8-byte integers.)

OTHER TIPS

You might want to use an array of unions.

The easiest and most traditional way to handle your problem is to set up the data you want to send, and then pass a pointer to your data on to the transmission routine. The most common example would be the POSIX send() routine:

ssize_t send(int socket, const void *buffer, size_t length, int flags);

Which for your case you can simplify to:

ssize_t send(const void *buffer, size_t length);

And then use something like:

send(&int1, sizeof int1);
send(&short1, sizeof short1);

To send it out. An example (but pretty naive) implementation for your situation might be:

ssize_t send(const void *buffer, size_t length)
{
  size_t i;
  unsigned char *data = buffer;

  for (i = 0; i < length; i++)
  {
     dataTx[i] = data[i];
  }
}

In other words, use the automatic conversion to void * and then back to char * to get byte-wise access to your data, and then send it out appropriately.

Long question, I'll try shorter answer.

Don't go on *(dataTx+4) = short1; etc. because this method may fail because most chips may do read/write only on some aligned positions. You can access by 16bit to positions aligned by 2, and 32bit on positions aligned by 4, but take an example of: "int32 char8 int32" - the second int32 have a position of (dataTx+5) - which is not 4-byte aligned, and you probably get the "bus error" or something like that (depending of CPU you'll use). Hope you understand this issue.

1st way - you can try struct, if you declare:

struct
{
    char a;
    int b;
    char c;
    short d;
};

you are now out-of-trouble, as the compilator itself would take care about struct alignment. Of course, read about alignment-related options in your compiler (if this is gcc, then this is simply called alignment), because there is probably a setting which force some alignment of struct fields or packing of struct fields. The GCC can even define alignment-per-struct (more here).

The other way is to use some "buffer-like approach" - something like in answer-post of Carl Norum (I won't be duplicating that answer), but also considering of use of memcpy() calls when more data is copied (e.g. long long or string), as this may be faster than copying byte-by-byte.

Licensed under: CC-BY-SA with attribution

Not affiliated with StackOverflow