Question

I have a question about reading a txt file in MATLAB where the format is not known in advance, but each row in the txt file always starts like this:

2012-11-01 00:00:00.00 XX YY  00.000s  

After that, different quantities are logged, so the txt file can look different, for example:

Ex1:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm  0.00rpm
Ex2:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm   
Ex3:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00deg 0.00rpm 0.00rpm 0.0deg      
Ex4:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm

I handle this with textscan and use:

fid = fopen('text.txt');
initfrm = {'%s%s%s%s %.3f %s'};
frm = repmat('%.2f %s',1,NCol);
frm = strcat(initfrm, frm);
Tmp = textscan(fid,frm{1});
fclose(fid);

The number of logged columns (NCol) is calculated elsewhere in the file, but that part is not shown here.

But sometimes the text file includes 0.0%, for example:

Ex1:    2012-11-01 00:00:00.00 XX YY  00.000s 000.00deg   0.00rpm  0.00rpm  0.0%

Now '%.2f' won’t work, and I don’t know in advance when the log looks like this. Is there a better way to separate the float from the string when they are printed together? I just want to collect the data (the floats) so I can plot them.

How can I get all float values when the format varies between %.2f and %.1f and the pattern is not known beforehand?


Solution

Importing text like this can be a real pain; usually, this is a good test of your knowledge of string manipulation :)

I believe the following commands will do nicely:

% Read in entire file as string
fid = fopen('yourFile.txt');
    C = textscan(fid, '%s', 'delimiter', '');
fclose(fid);
C = C{1};

% Remove first part (keep everything from column 39 onwards in your example;
% adjust to match your actual data)
C = cellfun(@(x)x(39:end), C, 'UniformOutput',false);

% Remove unwanted junk
% NOTE: this removes all occurrences of 'rpm', 'deg', 
% 's', and the trailing '0.0%'
C = regexprep(C, {'deg' 'rpm' 's' '([0-9]+\.[0-9]+%)$'}, '');

% Tokenize string and convert to double
C = cellfun(@(x)textscan(x, '%f'), C);

I tested this with yourFile.txt:

Ex1:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm  0.00rpm
Ex2:    2012-11-01 00:00:00.00 XX YY  00.000s  000.00deg  0.00rpm   
Ex3:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00deg    0.00rpm  0.00rpm 0.0deg      
Ex3:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00deg    0.00rpm  0.00rpm 0.0deg    0.0%
Ex4:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm
Ex4:    2012-11-01 00:00:00.00 XX YY  00.000s  0.00rpm

The final contents of C with the commands above are

>> C{:}
ans =
     0
     0
     0
     0
ans =
     0
     0
     0
ans =
     0
     0
     0
     0
     0
ans =
     0
     0
     0
     0
     0
ans =
     0
     0
ans =
     0
     0
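
If the percentage value should be kept rather than discarded, one possible variant is to skip the unit stripping and instead pull out every number that is directly followed by a unit. This is only a sketch: the sample line uses made-up values, the offset 39 comes from the example layout above, and the unit list (s, deg, rpm, %) is an assumption based on the lines shown.

```matlab
% Sample line in the same layout as the test file (values are made up)
txt = 'Ex3:    2012-11-01 00:00:00.00 XX YY  12.345s  1.50deg    30.00rpm  31.25rpm 2.5deg    75.0%';

% Drop the fixed prefix; 39 matches the example layout, adjust for real data
payload = txt(39:end);

% Match a number only when one of the assumed units follows it directly
tokens = regexp(payload, '[-+]?\d+\.?\d*(?=s|deg|rpm|%)', 'match');

% Convert the matched strings to a row vector of doubles
values = str2double(tokens);
```

Here values should hold six numbers, with the percentage as the last element. Any unit not in the lookahead list would be skipped silently, so the list needs to cover everything that appears in the real log.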

OTHER TIPS

I am not sure I have interpreted your question correctly. It seems to me that you have a variable number of tokens, either N or N+1 (N+m, perhaps?), in each line of text.

If so, I would suggest an approach based on extracting tokens from each line.

Consider this:

  1. use fgets to extract each line from your file;
  2. use strtok to iteratively separate tokens (i.e., tokenize your string, using ' ' as the token delimiter);
  3. because the initial pattern is fixed, you may want to re-merge the first N tokens and parse them as you already do. Then check whether the token in position N+1 is present and, if so, parse it.
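
The steps above could be sketched roughly as follows. This is an illustration under a few assumptions, not a tested solution for the real log: the sample file and its values are made up, nFixed = 4 assumes that exactly the date, time, XX and YY tokens precede the data, and the sketch simply skips those fixed tokens instead of re-merging them. It also uses fgetl rather than fgets so that the trailing newline is dropped.

```matlab
% Write a small sample file so the sketch is self-contained (made-up values)
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '2012-11-01 00:00:00.00 XX YY  12.345s  1.50deg  30.00rpm\n');
fprintf(fid, '2012-11-01 00:00:01.00 XX YY  13.345s  2.00deg  31.00rpm  75.0%%\n');
fclose(fid);

nFixed = 4;      % assumed: date, time, XX and YY precede the data tokens
values = {};     % one numeric row vector per line of the file

fid = fopen(fname, 'r');
txt = fgetl(fid);                 % step 1: read one line at a time
while ischar(txt)                 % fgetl returns -1 at end of file
    row = [];
    k = 0;
    [tok, rest] = strtok(txt, ' ');
    while ~isempty(tok)           % step 2: split the line on blanks
        k = k + 1;
        if k > nFixed             % step 3: parse only the variable part
            % Keep the leading number of the token and ignore the unit
            num = regexp(tok, '^[-+]?\d+\.?\d*', 'match', 'once');
            if ~isempty(num)
                row(end+1) = str2double(num);
            end
        end
        [tok, rest] = strtok(rest, ' ');
    end
    values{end+1} = row;
    txt = fgetl(fid);
end
fclose(fid);
delete(fname);
```

Because each line is tokenized independently, rows with a different number of logged values end up as vectors of different lengths in values, which is why a cell array is used instead of a matrix.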
Licensed under: CC-BY-SA with attribution