Matlab read heterogenious binary data

https://stackoverflow.com/questions/21424938

04-10-2022
|

문제

I'd like to read heterogenious binary data into matlab. I do know from the beginning how much it is and in which datatype each segment is. For example:

%double %double %int32 ...

and then this get repeated about a million times. Easy enoug to handle with fread since the know the number of bites for each segment and can therefore calculate the skip value for each row.

But now the data segment looks something like this :

%double %int32%*char %double %double ...

Whereby the int prior to the *char is the length of the said string. This brings the problem that I cannot calculate the skip anymore and I'm stuck be reading in the whole file line by line therefore needing to make a lot more file access and slowing everthing down.

In order to get at least some speed up I wan't to read in all %double %double ... (Around 30 elements) at a time and then use those from a buffer to fill up the array's. In C this would be a rather easy task here, without memcpy and not so direct access to pointers...

Do you know any way to achive this, not using mex files?

해결책

You can't solve the problem that the record size is unknown, and thus you don't know how much to read ahead of time. But you can batch up the reads, and if you have a reasonable max size for the string, you can always read that amount, and ignore the unneeded bytes at the end. typecast is the trick:

readlen = 1024;
buf = fread(fid, readlen, '*uint8');   % the asterisk keeps the returned array as uint8
rec.val1 = typecast(buf(1:8), 'double');
string_len = typecast(buf(9:12), 'int32');
rec.str1 = typecast(buf(13:13+string_len-1), 'uint8');

pos = 13+string_len;
rec.val2 = typecast(buf(pos:pos+8-1), 'double');

You might wrap a simple function around this technique to track the current offset automatically.

라이센스 : CC-BY-SA ~와 함께 속성

제휴하지 않습니다 StackOverflow