Question

I have a large set of text files (tab-delimited data) I need to parse. They are mostly well formatted. However, there are randomly interspersed rows that contain erroneous characters, like those shown below. The location of the bad rows differs from file to file, but the characters added are always the same.

1   3
2   873
3   46
23  99798
23  1
353 79
"23 ,"  967
35  8028
253 615
"235 ," 3924
345 188
345 579
345 419
56  16835
23  449

importdata(filename) imports all of the data up to the first badly formatted line, then ignores the rest of the file. I think I could do what I am trying to do with a combination of fopen and textscan, but I can't seem to find the right combination of arguments to make it work.

Was it helpful?

Solution

Have a go at using the textread function with the %q format string. Assuming the test data in the question is saved as test.txt:

>> [a, b] = textread('test.txt', '%q %q');

>> a'

ans = 

  Columns 1 through 9

    '1'    '2'    '3'    '23'    '23'    '353'    '23 ,'    '35'    '253'

  Columns 10 through 15

    '235 ,'    '345'    '345'    '345'    '56'    '23'

>> b'

ans = 

  Columns 1 through 9

    '3'    '873'    '46'    '99798'    '1'    '79'    '967'    '8028'    '615'

  Columns 10 through 15

    '3924'    '188'    '579'    '419'    '16835'    '449'

Then you can use str2double to convert the strings in a to numbers, discarding the trailing characters. For example:

>> str2double(a)'

ans =

  Columns 1 through 13

     1     2     3    23    23   353    23    35   253   235   345   345   345

  Columns 14 through 15

    56    23
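
Putting the two steps together, a minimal sketch that yields a clean numeric matrix (assuming, as above, that the data is saved as test.txt):

```matlab
% Read both columns as quoted strings; %q strips the surrounding quotes.
[a, b] = textread('test.txt', '%q %q');

% Convert both columns to numeric and assemble a two-column matrix.
data = [str2double(a), str2double(b)];
```

Note that in newer MATLAB releases textread is deprecated in favour of textscan, which accepts the same %q conversion; a roughly equivalent call would be textscan(fid, '%q %q') on a file id obtained from fopen.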
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow