Question

My professor showed us an example of a program that reads in particle structure objects and prints the details of each particle. I understand how the C program works but am confused about the "filea" binary file that contains the "structure objects". How is the data being automatically assigned to the values of the structs in the C program? The filea, being binary, isn't comprehensible so I'm not sure exactly how it is working and when I asked him about it I didn't get a clear answer.

Here is the program:

#include <unistd.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

struct vector{
        float x;
        float y;
        float z;
};

struct particle {
        float mass;
        struct vector pos;
        struct vector vel;
};

int main(int argc, char *argv[]) {
    int cnt = 0;
    int fd, nbytes;
    struct particle *buf = (struct particle *)malloc(sizeof(struct particle)); 
    fd = open("filea",O_RDONLY); 

    while ((nbytes = read(fd,buf,sizeof(struct particle))) > 0){
        printf("Particle %d\n", cnt++);
        printf("\tmass\t%.1f\n",buf->mass);
        printf("\tpos\t(%.1f,%.1f,%.1f)\n",buf->pos.x,buf->pos.y,buf->pos.z);
        printf("\tvel\t(%.1f,%.1f,%.1f)\n",buf->vel.x,buf->vel.y,buf->vel.z);
    }   
    close(fd);
    free(buf);

    return 0;
}

The slide said "Each particle is represented by the structures:"

struct vector {
    float x;
    float y;
    float z;
};

struct particle {
    float mass;
    struct vector pos;
    struct vector vel;
};

Solution

The two structures, vector and particle, are fixed-length structures. vector is 3 floats, so if we assume a 4-byte float, that structure is 12 bytes. particle is made up of 1 float and 2 vectors, so 4 bytes + 2 * 12 bytes, for a total of 28 bytes. Every member is a float, so a typical compiler inserts no padding between them.
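The sizes can be checked on your own machine by asking the compiler directly. This is a minimal sketch that assumes a 4-byte float (enforced with static_assert); the function name print_particle_layout is only an illustrative choice and is not part of the original program.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct vector {
    float x;
    float y;
    float z;
};

struct particle {
    float mass;
    struct vector pos;
    struct vector vel;
};

/* Assumption taken from the answer: float is 4 bytes on this platform. */
static_assert(sizeof(float) == 4, "this sketch assumes a 4-byte float");

/* Print the size of each struct and the byte offset of each member. */
void print_particle_layout(void) {
    printf("sizeof(struct vector)   = %zu\n", sizeof(struct vector));
    printf("sizeof(struct particle) = %zu\n", sizeof(struct particle));
    printf("offset of mass = %zu\n", offsetof(struct particle, mass));
    printf("offset of pos  = %zu\n", offsetof(struct particle, pos));
    printf("offset of vel  = %zu\n", offsetof(struct particle, vel));
}
```

Calling print_particle_layout() from main on a typical platform with 4-byte floats should report sizes of 12 and 28 bytes, with mass at offset 0, pos at offset 4, and vel at offset 16.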

read takes a file descriptor (the int returned by open, not a stdio file stream), a memory address (in this case a buffer the size of a particle), and a size (again, the size of a particle). It returns the number of bytes read (not a count of blocks), which may be fewer than requested.

So, read literally transfers the bytes from the file into the buffer pointed to by buf. buf happens to be typed as a pointer to struct particle, so all of the structure operators conveniently work (as seen in the printf statements).

When read reaches the end of the file, it returns 0 instead of a positive byte count, and that terminates the loop. An error makes it return -1, which also ends the loop.
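Because read may legitimately return fewer bytes than requested, a more defensive version of the loop body keeps reading until a whole record has arrived. The following is a sketch under that assumption; read_particle is a hypothetical helper, not something from the professor's program.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

struct vector {
    float x;
    float y;
    float z;
};

struct particle {
    float mass;
    struct vector pos;
    struct vector vel;
};

/*
 * Read exactly one particle record from fd.
 * Returns 1 if a full record was read, 0 on a clean end of file,
 * and -1 on a read error or a truncated final record.
 */
int read_particle(int fd, struct particle *p) {
    unsigned char *dst = (unsigned char *)p;
    size_t got = 0;

    assert(p != NULL);
    while (got < sizeof *p) {
        ssize_t n = read(fd, dst + got, sizeof *p - got);
        if (n > 0) {
            got += (size_t)n;          /* partial read: keep going */
        } else if (n == 0) {
            return got == 0 ? 0 : -1;  /* EOF is clean only at a record boundary */
        } else if (errno != EINTR) {
            return -1;                 /* genuine read error */
        }
        /* n < 0 with errno == EINTR: interrupted by a signal, so retry */
    }
    return 1;
}
```

With this helper, the loop condition in the question could be written as while (read_particle(fd, buf) == 1).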

The data on the disk must match the internal binary layout of the structures and of the floats within those structures, otherwise you will get garbage data. For example, if you wrote the file on a "little endian" machine and read it on a "big endian" machine, the values would very likely come out corrupted, because the bytes of each float are stored in the opposite order.
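To make the byte-order point concrete: on an IEEE 754 machine the float 1.0f has the bit pattern 0x3F800000, which a little-endian machine stores as the bytes 00 00 80 3F and a big-endian machine stores as 3F 80 00 00. The sketch below shows how to detect the host's byte order and what a value looks like when its bytes are reversed. It assumes a 4-byte float, and all three function names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 4-byte float");

/* Returns 1 on a little-endian host, 0 on a big-endian one. */
int host_is_little_endian(void) {
    const uint32_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the lowest-addressed byte */
    return first == 1;
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t swap32(uint32_t v) {
    return (v >> 24) |
           ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) |
           (v << 24);
}

/* Reinterpret a float as if its bytes had been stored in the opposite order. */
float swap_float_bytes(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    bits = swap32(bits);
    memcpy(&f, &bits, sizeof f);
    return f;
}
```

Reading a file written on a machine of the opposite byte order is equivalent to applying swap_float_bytes to every field, which is why the printed values would be nonsense unless the reader swaps them back.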

This technique is a simple and efficient mechanism for storing and reading data, but it is not portable.

Other tips

It works by reading the variables in exactly the same order as they have been declared. The particle struct starts with a variable called mass of type float, so the first thing that is read is a total of 4 bytes (assuming that a float is 4 bytes) and that is assigned to mass. Then comes a struct vector called pos, and it contains three floats, so these are read next in that same order. That is, the next 4 bytes are assigned to pos.x, then the next 4 bytes are assigned to pos.y, and the next 4 to pos.z. The same thing is repeated for vel.

This is all done in a single step: an entire block the size of struct particle is read and copied to buf, and every value is expected to land in its correct location. This works only as long as the declaration of struct particle is the same as when the file was written. The technique is fast, but it depends heavily on a fixed declaration of the struct. And, because it uses a binary file, it is also machine-dependent.
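The same block copy can be reproduced without any file at all, which may make the mechanism easier to see. In this sketch, particle_from_bytes is a hypothetical helper that does with memcpy what read does with file data; it assumes the struct has no padding, which the static_assert checks.

```c
#include <assert.h>
#include <string.h>

struct vector {
    float x;
    float y;
    float z;
};

struct particle {
    float mass;
    struct vector pos;
    struct vector vel;
};

static_assert(sizeof(struct particle) == 7 * sizeof(float),
              "this sketch assumes struct particle has no padding");

/* Copy one record's worth of raw bytes into a struct, the way read() does. */
struct particle particle_from_bytes(const unsigned char *bytes) {
    struct particle p;
    memcpy(&p, bytes, sizeof p);   /* no per-field parsing, just a block copy */
    return p;
}
```

If the byte buffer holds the seven floats 1 through 7 in order, the copy yields mass 1, pos (2, 3, 4), and vel (5, 6, 7): the fields are filled purely by position, in declaration order.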

This call:

read(fd,buf,sizeof(struct particle))

...fills the memory pointed to by buf with data from the file. buf points to a struct particle, so you should expect filea's contents to be a series of seven single-precision floating point values per record (one value for mass, one coordinate triple for pos, followed by one coordinate triple for vel). The sequence of the values in struct particle is the expected sequence of values in the contents of filea.

Note that this practice is sometimes frowned upon because of portability problems. Endianness (not an issue in this example as long as the file is written and read on the same kind of machine) and alignment/padding can both cause trouble when a struct's in-memory representation doubles as its serialized on-disk (or network) representation. One way to mitigate this is to define a file format that is independent of the in-memory representation. The unfortunate cost is that you must often write each field individually.
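A minimal sketch of such an independent format follows. The format itself is hypothetical: each record is seven 32-bit fields written in little-endian byte order, regardless of the host. The code assumes IEEE 754 single-precision floats and that floats and integers share the same byte order on the host; the names encode_particle, decode_particle, and PARTICLE_RECORD_SIZE are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

struct vector {
    float x;
    float y;
    float z;
};

struct particle {
    float mass;
    struct vector pos;
    struct vector vel;
};

#define PARTICLE_RECORD_SIZE 28   /* 7 fields * 4 bytes in the hypothetical format */

static_assert(sizeof(float) == sizeof(uint32_t),
              "this sketch assumes a 4-byte float");

/* Store a float as 4 bytes, least significant byte first. */
static void put_f32_le(unsigned char *out, float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    out[0] = (unsigned char)(bits & 0xFFu);
    out[1] = (unsigned char)((bits >> 8) & 0xFFu);
    out[2] = (unsigned char)((bits >> 16) & 0xFFu);
    out[3] = (unsigned char)((bits >> 24) & 0xFFu);
}

/* Rebuild a float from 4 bytes stored least significant byte first. */
static float get_f32_le(const unsigned char *in) {
    uint32_t bits = (uint32_t)in[0] |
                    ((uint32_t)in[1] << 8) |
                    ((uint32_t)in[2] << 16) |
                    ((uint32_t)in[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Serialize one particle field by field into a fixed-size record. */
void encode_particle(const struct particle *p,
                     unsigned char out[PARTICLE_RECORD_SIZE]) {
    put_f32_le(out + 0,  p->mass);
    put_f32_le(out + 4,  p->pos.x);
    put_f32_le(out + 8,  p->pos.y);
    put_f32_le(out + 12, p->pos.z);
    put_f32_le(out + 16, p->vel.x);
    put_f32_le(out + 20, p->vel.y);
    put_f32_le(out + 24, p->vel.z);
}

/* Deserialize one fixed-size record back into a particle. */
void decode_particle(const unsigned char in[PARTICLE_RECORD_SIZE],
                     struct particle *p) {
    p->mass  = get_f32_le(in + 0);
    p->pos.x = get_f32_le(in + 4);
    p->pos.y = get_f32_le(in + 8);
    p->pos.z = get_f32_le(in + 12);
    p->vel.x = get_f32_le(in + 16);
    p->vel.y = get_f32_le(in + 20);
    p->vel.z = get_f32_le(in + 24);
}
```

The record produced by encode_particle is what would be passed to write, and the bytes returned by read would go through decode_particle. Struct padding and host byte order no longer affect the file, at the cost of the per-field code.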

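When you declare structs in C, the memory is laid out in a very specific fashion. The members are stored in declaration order; the compiler may insert padding between them for alignment, but here every member is a float, so they sit right next to each other in memory. This answer has a decent illustration.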

Your professor presumably wrote a C program that wrote the structs out to that file. You can then read the file back because you know the size of the struct, and since its members sit next to each other in memory, you recover the entire data structure.
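The writing side is not shown in the question, so the following is only a guess at what it might look like: a sketch that dumps an array of structs to a file as raw bytes. The helper name write_particles and the file permissions are assumptions.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

struct vector {
    float x;
    float y;
    float z;
};

struct particle {
    float mass;
    struct vector pos;
    struct vector vel;
};

/*
 * Write n particles to path as raw struct bytes.
 * Returns 0 on success and -1 on failure.
 */
int write_particles(const char *path, const struct particle *ps, size_t n) {
    const unsigned char *src = (const unsigned char *)ps;
    size_t left = n * sizeof *ps;
    int fd;

    assert(path != NULL && (ps != NULL || n == 0));
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        return -1;
    }
    while (left > 0) {
        ssize_t w = write(fd, src, left);   /* may write fewer bytes than asked */
        if (w < 0) {
            close(fd);
            return -1;
        }
        src += w;
        left -= (size_t)w;
    }
    return close(fd) == 0 ? 0 : -1;
}
```

For instance, write_particles("filea", ps, 2) with a two-element array would produce a file of 2 * sizeof(struct particle) bytes (56 bytes with 4-byte floats and no padding) that the program in the question can read back on the same machine.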

This line reads sizeof(struct particle) bytes from fd, and places them into buf.

read(fd,buf,sizeof(struct particle))

Thus loading the struct that (presumably) your professor wrote to the file.

Because platforms differ in how they implement C's primitive types, this binary file may not be portable to other systems.

Your professor saved the particle data to the file as raw bytes. Since the size of struct particle is fixed, the example code can then read the file in chunks of sizeof(struct particle) bytes, each of which matches the layout of struct particle.

This approach is convenient for an example like this, but it makes too many assumptions about the machine's memory model to be a portable solution.

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow