Matlab: how to handle abnormal data files
25-09-2019
Question
I am trying to import a large number of files into Matlab for processing. A typical file would look like this:
mass intensity
350.85777 238
350.89252 3094
350.98688 2762
351.87899 468
352.17712 569
352.28449 426
Some text and numbers here, describing the experimental setup, eg
Scan 3763 @ 81.95, contains 1000 points:
The numbers in the two columns are separated by 8 spaces. However, sometimes the experiment will go wrong and the machine will produce a datafile like this one:
mass intensity
Some text and numbers here, describing the experimental setup, eg
Scan 3763 @ 81.95, contains 1000 points:
I found that treating the files as space-separated with a single header row, i.e.
importdata(path_to_file,' ', 1);
works best for the normal files. However, it fails completely on all the abnormal files. What would be the easiest way to fix this? Should I stick with importdata (I have already tried every setting I could find, and it just doesn't work), or should I write my own parser? Ideally, I would like to get the values in an Nx2 matrix for normal files and [0 0] for abnormal files.
Thanks.
Solution
I don't think you need to create your own parser, nor is this all that abnormal. Using textscan is your best option here.
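fid = fopen('input.txt', 'rt');
% Skip the header row, then read one float and one unsigned integer per row
data = textscan(fid, '%f %u', 'Headerlines', 1);
fclose(fid);
mass = data{1};       % double column vector
intensity = data{2};  % uint32 column vector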
Yields:
mass =
350.8578
350.8925
350.9869
351.8790
352.1771
352.2845
intensity =
238
3094
2762
468
569
426
for your first file, and:
mass =
Empty matrix: 0-by-1
intensity =
Empty matrix: 0-by-1
for your empty one.
By default, textscan treats whitespace as a delimiter, and it reads only the fields you specify until the format no longer matches; that is why it ignores the final lines in your file. You can also run a second textscan after this one if you want to pick up those additional fields:
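fid = fopen('input.txt', 'rt');
% First pass: numeric columns, stopping at the first non-numeric line
data = textscan(fid, '%f %u', 'Headerlines', 1);
mass = data{1};
intensity = data{2};
% Second pass: skip the rest of the free-text line, then parse the "Scan ..." line
data = textscan(fid, '%*s %u %*c %f %*c %*s %u %*s', 'Headerlines', 1);
scan = data{1};
level = data{2};
points = data{3};
fclose(fid);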
Along with your mass and intensity data, this gives:
scan =
3763
level =
81.9500
points =
1000
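If you want the single Nx2 matrix (or [0 0] for an abnormal file) that the question asks for, something like the following might serve as a starting point. It is only a sketch: the temporary sample file and the names path_to_file, cols, n and result are made up for illustration. Both columns are read with %f instead of %u here, because concatenating a double column with a uint32 column gives an integer matrix, which would round the masses.

```matlab
% Hypothetical sample file, written to a temporary location for illustration only.
path_to_file = [tempname '.txt'];
fid = fopen(path_to_file, 'wt');
fprintf(fid, 'mass intensity\n');
fprintf(fid, '350.85777        238\n');
fprintf(fid, '350.89252        3094\n');
fprintf(fid, 'Some text and numbers here\n');
fprintf(fid, 'Scan 3763 @ 81.95, contains 1000 points:\n');
fclose(fid);

% Read both columns as doubles so they can share one numeric matrix.
fid = fopen(path_to_file, 'rt');
if fid == -1
    error('Could not open %s', path_to_file);
end
cols = textscan(fid, '%f %f', 'HeaderLines', 1);
fclose(fid);

% Trim to the shorter column in case the last row was only partially read.
n = min(numel(cols{1}), numel(cols{2}));
if n == 0
    result = [0 0];                          % abnormal file: no numeric rows
else
    result = [cols{1}(1:n), cols{2}(1:n)];   % normal file: N-by-2 matrix
end

delete(path_to_file);                        % clean up the sample file
```

For a real batch you would drop the part that writes the sample file and set path_to_file inside your loop over the files.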
OTHER TIPS
What do you mean by 'totally fails on abnormal files'?
You can check whether importdata found any data using, e.g.:
>> imported = importdata(path_to_file,' ', 1);
>> isfield(imported, 'data')
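Building on that check, a rough sketch of a fallback to [0 0] could look like the code below. The temporary sample file and the names path_to_file, imported and result are assumptions for illustration. I have not verified what importdata returns for every abnormal file, so the call is wrapped in try/catch and anything that is not a two-column numeric block is treated as abnormal.

```matlab
% Hypothetical abnormal sample file, written to a temporary location for illustration only.
path_to_file = [tempname '.txt'];
fid = fopen(path_to_file, 'wt');
fprintf(fid, 'mass intensity\n');
fprintf(fid, 'Some text and numbers here\n');
fprintf(fid, 'Scan 3763 @ 81.95, contains 1000 points:\n');
fclose(fid);

result = [0 0];   % default for abnormal files
try
    imported = importdata(path_to_file, ' ', 1);
    % importdata may return a struct, a matrix, or a cell array depending on
    % the file content, so check the type before touching the 'data' field.
    if isstruct(imported) && isfield(imported, 'data') ...
            && ~isempty(imported.data) && size(imported.data, 2) == 2
        result = imported.data;
    end
catch
    % If importdata throws on this file, keep the [0 0] default.
end

delete(path_to_file);   % clean up the sample file
```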