Question

I have text files containing multiple lines such as

.model sdata1 s tstonefile='../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p' passive=2

I want to extract the text between the single quotes in MATLAB.

Any help would be appreciated.

Solution

To get the text inside every pair of single quotes, regexp can be used with the 'tokens' option:

regexp(txt,'''([^'']*)''','tokens')

Inside a MATLAB character vector a literal ' is written as '', so this pattern matches a ', captures every following character that is not a ', and then matches the closing '. The captured text therefore never contains a ' of its own. For example, consider this text with two lines (the second file name is made up):

>> txt = ['.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2 ', char(10), ...
'.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'' passive=2'];
>> stringCell = regexp(txt,'''([^'']*)''','tokens');
>> stringCell{:}
ans = 
    '../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'
ans = 
    '../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'
>> 
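Since the question is about text files, here is a minimal sketch that applies the same pattern to a whole file read with fileread. The sample file and its temporary name (from tempname) are purely illustrative; the [tok{:}] step flattens the nested cells returned by 'tokens' into one cell array of character vectors.

```matlab
% Write a small sample file (hypothetical content, temporary name).
sampleFile = [tempname, '.txt'];
fid = fopen(sampleFile, 'w');
fprintf(fid, '%s\n', ...
    '.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2', ...
    '.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'' passive=2');
fclose(fid);

% Read the whole file as one character vector and capture each quoted section.
txt = fileread(sampleFile);
tok = regexp(txt, '''([^'']*)''', 'tokens');   % 1xN cell, each element a 1x1 cell
paths = [tok{:}];                               % flatten to a 1xN cell of char

delete(sampleFile);   % clean up the sample file
```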

Trivia:

  • char(10) gives a newline character because 10 is the ASCII code for newline.
  • In most other regex engines, the . metacharacter does not match a newline, which keeps a match from running across lines. In MATLAB, . matches a newline by default; passing 'dotexceptnewline' as the last input argument to regexp turns that off. For well-formed input, where every opening ' is closed on the same line, this makes no difference: matching proceeds left to right and each match consumes its closing quote, so the text between two quoted sections is never captured. It only matters for patterns that use . when a quote is left unclosed.
  • Instead of excluding ' from the capture with [^''], the match can be made non-greedy with ?: regexp(txt,'''(.*?)''','tokens').
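The newline option matters mainly when a quote is left unclosed. This sketch uses a made-up two-line input whose first line has a stray '. It compares the non-greedy pattern with and without 'dotexceptnewline', and shows that the negated class [^''] is not affected by that option (it only changes what . matches), so a newline has to be excluded from the class explicitly.

```matlab
% Hypothetical input: line 1 has an unclosed quote, line 2 is well formed.
txt = sprintf('a=''x\nb=''y''');

% Non-greedy pattern, default options: '.' also matches the newline,
% so the match runs from the stray quote on line 1 into line 2.
lazyDefault = regexp(txt, '''(.*?)''', 'tokens');

% Same pattern with 'dotexceptnewline': the match must stay on one line.
lazyOneLine = regexp(txt, '''(.*?)''', 'tokens', 'dotexceptnewline');

% A negated class still crosses the newline, with or without the option...
classDefault = regexp(txt, '''([^'']*)''', 'tokens', 'dotexceptnewline');

% ...unless the newline is excluded from the class itself.
classOneLine = regexp(txt, '''([^''\n]*)''', 'tokens');
```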

OTHER TIPS

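If you prefer textscan, use ' as the delimiter; the quoted text of the first line is then the second field:

fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','''');
fclose(fid);

output = rawdata{1}{2}   % char: the text between the first pair of quotes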

As in the other answers, a single apostrophe inside a MATLAB character vector is written as two apostrophes (''), which is how the ' delimiter is passed above.
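To see which field holds the path without creating a file, textscan can also be called on a character vector instead of a file identifier. A minimal sketch using the example line from the question:

```matlab
% The example line from the question; '' is an escaped single quote.
str = '.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2';

% textscan accepts a character vector as its first input.
C = textscan(str, '%s', 'Delimiter', '''');
fields = C{1};        % column cell array of the pieces between delimiters
quoted = fields{2};   % the text between the two quotes
```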

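Following up on a comment: to get the quoted text from every line, first read the file as lines, then split each line separately: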

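fid = fopen('data.txt','r');
rawdata = textscan(fid,'%s','delimiter','\n');   % one cell per line
fclose(fid);

fileLines = rawdata{1};
L = numel(fileLines);
output = cell(L,1);
for ii = 1:L
    temp = textscan(fileLines{ii},'%s','delimiter','''');
    fields = temp{1};
    if numel(fields) >= 2            % skip lines without a quoted section
        output{ii} = fields{2};      % second field = text between the quotes
    end
end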
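As an alternative to the per-line textscan loop (a swapped-in technique, not the one used in this answer), regexp accepts a cell array of lines directly, so the loop can be replaced by a single call with 'once'. The cell array below is a stand-in for the lines read from the file.

```matlab
% Stand-in for the lines read from the file (second path is made up).
fileLines = { ...
    '.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2'; ...
    '.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_3port_via_minstub.s00p'' passive=2'};

% With a cell input, regexp returns one result per line; 'once' keeps only
% the first quoted section of each line, as a 1x1 cell.
tok = regexp(fileLines, '''([^'']*)''', 'tokens', 'once');
output = vertcat(tok{:});   % 2x1 cell of char, one path per line
```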

One easy way is to split the text at every single quote and keep the even-numbered pieces, which are the parts between an opening and a closing quote:

str = fileread('test.txt');
out = regexp(str, '''', 'split');
out = out(2:2:end);
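A small in-memory check of the split approach, using made-up text. Splitting at every ' alternates between outside and inside pieces, so the even-numbered pieces are the quoted ones; this relies on every quote being paired, since one stray ' shifts the alternation.

```matlab
% Made-up text with one quoted section per line.
str = sprintf('a=''p1'' b=2\nc=''p2'' d=4');

parts  = regexp(str, '''', 'split');   % 5 pieces: outside, inside, outside, inside, outside
quoted = parts(2:2:end);               % even-numbered pieces: {'p1', 'p2'}
```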

You can do this with a regular expression, assuming each input string contains exactly one quoted section (the greedy .* runs from the first ' to the last one):

% capture all chars between the first and the last single quotation mark
out = regexp(inputString,'''(.*)''','tokens','once');
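The single-occurrence assumption matters because .* is greedy. On a made-up line with two quoted sections, the greedy pattern captures everything from the first ' to the last one, while the non-greedy .*? stops at the first closing quote.

```matlab
s = 'x=''a'' y=''b''';   % made-up line: x='a' y='b'

greedy = regexp(s, '''(.*)''',  'tokens', 'once');   % {'a'' y=''b'}, i.e. a' y='b
lazy   = regexp(s, '''(.*?)''', 'tokens', 'once');   % {'a'}
```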

After identifying the lines you want to extract information from, you could tokenize them or, if they all have the same form, locate the two quotes with strfind and index between them:

test = '.model sdata1 s tstonefile=''../data/s_element/isdimm_rcv_via_2port_via_minstub.s50p'' passive=2';
a = strfind(test,'''');        % indices of the single quotes
test = test(a(1)+1:a(2)-1)     % text strictly between the first two quotes
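The strfind approach extends to lines with several quoted sections by pairing up consecutive quote positions. A minimal sketch on a made-up line, assuming every opening ' has a matching closing one:

```matlab
s = 'x=''a'' y=''bc''';          % made-up line: x='a' y='bc'
q = strfind(s, '''');            % positions of every single quote
nPairs = floor(numel(q) / 2);    % ignore a trailing unpaired quote, if any
out = cell(1, nPairs);
for k = 1:nPairs
    % text strictly between the (2k-1)-th and (2k)-th quotes
    out{k} = s(q(2*k-1)+1 : q(2*k)-1);
end
```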
Licensed under: CC-BY-SA with attribution