I am working on a project where I am logging some signals (1D arrays) into a binary file. I can store large amounts of data (usually several gigabytes). Now I would like to load those files back.

I have two visual components. One is called Overview and the other is just a plain Y/T chart. Overview should give users an idea of what is in the complete (large) file, while the chart shows just a part of the file, usually the part selected in the Overview with a resizable rectangle/band.

Because the files can be really large, loading everything into memory is not practical, so the main idea is to load into memory (max. several MB) only the important/visible data. Loading and displaying is done on user demand: if the user zooms in on the chart, data needs to be reloaded from the file with more data points.

My question is how to draw the Overview component to show the whole content of the file in the best way, without actually loading all samples from the file. Let's say my files are larger than 10 GB, but I can draw at most 16k samples on the Overview component. How do I still give users an idea of what is in the file?

Is there any method of storing additional data (like indexing, smaller data chunks, images...) during logging for later loading and drawing of the Overview component? Currently I am storing only samples, but adding additional data wouldn't be a problem. Do you have any experience with this, and how did you do it?

To get an idea what I am doing:

[screenshot of the Overview and chart components]


Solution

You may use multi-level decimation - just store every Nth, N²th and so on sample (for example, every 10th, 100th, 1000th, 10000th...). When the user changes the window size, choose the appropriate level that yields about 1000 samples in this window, then load and show those samples quickly (1000 points is a reasonable number for a chart on screen).
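A minimal sketch of the idea (function names and the decimation factor are illustrative, not from the question): build progressively coarser copies of the data, then pick the coarsest level that still gives roughly the target point count for the current window.

```python
def build_decimation_levels(samples, factor=10, min_len=1000):
    """Level 0 is the raw data; each further level keeps every `factor`-th
    sample of the previous one, stopping once a level would fall below
    min_len points."""
    levels = [list(samples)]
    while len(levels[-1]) // factor >= min_len:
        levels.append(levels[-1][::factor])
    return levels

def pick_level(num_levels, factor, window_samples, target_points=1000):
    """Choose the finest level whose point count in the visible window
    does not exceed target_points; fall back to the coarsest level."""
    for i in range(num_levels):
        if window_samples // factor**i <= target_points:
            return i
    return num_levels - 1
```

In a real logger the coarser levels would be written to disk alongside the raw samples so the Overview never has to touch the raw data.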

If your data has notable features or peculiar properties, it is possible to take a bigger (one level up) data set and apply the Douglas-Peucker polyline simplification algorithm to preserve those features.
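For reference, a compact recursive implementation of Douglas-Peucker (a sketch; `epsilon` is the maximum allowed perpendicular deviation from the simplified line):

```python
import math

def douglas_peucker(points, epsilon):
    """Simplify a polyline, keeping any point that deviates more than
    epsilon from the chord between the segment endpoints."""
    if len(points) < 3:
        return list(points)
    (x1, y1), (x2, y2) = points[0], points[-1]
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy) or 1.0
    # find the interior point farthest from the chord
    dmax, index = 0.0, 0
    for i in range(1, len(points) - 1):
        x0, y0 = points[i]
        d = abs(dy * x0 - dx * y0 + x2 * y1 - y2 * x1) / norm
        if d > dmax:
            dmax, index = d, i
    if dmax > epsilon:
        # keep the farthest point and recurse on both halves
        left = douglas_peucker(points[: index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right
    return [points[0], points[-1]]
```

Applied to a decimated level, this drops points that lie close to a straight line while keeping spikes that a plain every-Nth decimation could miss.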

Other tips

When I write to the file, I write several data chunks with a fixed size (which depends on the DAQ sample rate). After a fixed count of those chunks has been written, I add one statistics chunk, which holds the number of data chunks written and the Maximum, Minimum, Average and Variance calculated over all of those data chunks together. Then I repeat this... until recording is stopped by the user.

The file structure is:

[File header]
[DataChunk1]
[DataChunk2]
...
[DataChunkN]
[StatsChunk1]
[DataChunkN+1]
[DataChunkN+2]
...
[DataChunkN+..]
[StatsChunk2]
....

When I want to load the file and draw the data, I just recalculate the current data/px ratio given the chart's zoom. There are two situations. If the zoom is far enough in that the data/px ratio is <= 1, I need to load the appropriate amount of data from the file (from the data chunks) and display it on the chart (doing some interpolation if necessary). Drawing is simple... just a line from point to point, because we are showing all the data.
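For this zoomed-in case, mapping the visible sample range to the data chunks that must be read is simple arithmetic; a small sketch, assuming fixed-size data chunks (names are illustrative):

```python
def data_px_ratio(visible_samples, pixel_width):
    """Samples per pixel for the current zoom; <= 1 means raw drawing."""
    return visible_samples / pixel_width

def chunks_for_view(first_sample, last_sample, chunk_samples=1024):
    """Indices of the fixed-size data chunks covering the visible range."""
    return range(first_sample // chunk_samples,
                 last_sample // chunk_samples + 1)
```

With the chunk indices known, each chunk's file offset follows directly from its index and the fixed chunk size, so loading is a few seeks rather than a scan.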

In the case when the ratio is > 1, I load the appropriate count of statistics chunks (instead of data chunks) and use them to draw the chart. First I draw an envelope from Minimum to Maximum (two series with the area between them painted), then I draw one series which is Average +/- std. deviation (sqrt(Variance)). That way I can show an overview of the data in those chunks. Because I am reading only statistics chunks, reading performance is really good (fast). If the file is large enough that the statistics-chunk count / px ratio also gets greater than 1, I can simply decimate the statistics chunks before drawing.
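Turning the loaded statistics chunks into drawable series could look like this (a sketch; the tuple layout mirrors the count/max/min/average/variance fields described above):

```python
import math

def overview_series(stats):
    """From (nchunks, maximum, minimum, mean, variance) tuples, build the
    min/max envelope plus a mean +/- standard-deviation band."""
    upper = [s[1] for s in stats]
    lower = [s[2] for s in stats]
    mean = [s[3] for s in stats]
    std_hi = [s[3] + math.sqrt(s[4]) for s in stats]
    std_lo = [s[3] - math.sqrt(s[4]) for s in stats]
    return upper, lower, mean, std_hi, std_lo

def decimate_stats(stats, max_points):
    """If there are more stats chunks than pixels, keep every kth chunk."""
    step = max(1, math.ceil(len(stats) / max_points))
    return stats[::step]
```

The chart then paints the area between `upper` and `lower`, the band between `std_hi` and `std_lo`, and finally the `mean` line on top.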

As for the count of chunks to load, etc., I have to experiment a bit to see what gives me the best results, but the first trials were really encouraging. Later I will add some comments about the end results and a photo to show them. Thanks for your ideas and contributions.
