How to use regexp to find unique combinations of letters and use them as variables in Matlab?

StackOverflow https://stackoverflow.com/questions/22207470

  •  09-06-2023

Question

I have the file names of four files stored in a cell array called F2000. These files are named:

  • L14N_2009_2000MHZ.txt
  • L8N_2009_2000MHZ.txt
  • L14N_2010_2000MHZ.txt
  • L8N_2010_2000MHZ.txt

Each file contains an m-by-n matrix, where m is the same for every file but n varies from file to file. I'd like to sort these files into two separate cell arrays (in this example, one per year) so I can use dlmread in a for loop to read each text file into its own element of the cell array. To do this, I wrote the following code:

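% regexp returns [] for names that do not match, so negate isempty to flag the 2009 files
idx2009 = ~cellfun('isempty', regexp(F2000, 'L\d{1,2}N_2009_2000MHZ\.txt'));
F2000_2009 = F2000(idx2009);
idx2010 = ~idx2009;                 % every remaining file is a 2010 file
F2000_2010 = F2000(idx2010);
cell2009 = cell(size(F2000_2009));
cell2010 = cell(size(F2000_2010));
for k = 1:numel(F2000_2009)
  cell2009{k} = dlmread(F2000_2009{k});   % one matrix per file
end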

I then repeated a similar for loop for F2000_2010. So far so good. However:

My real data set is much larger than just four files. The total number of files will vary: I won't know it ahead of time (although I do know it will be between 50 and 100), and I won't know the exact file names, but they will always follow the same L\d{1,2}N format. I do know there will be five years of data for each L\d{1,2}N (for instance, L8N_2009, L8N_2010, L8N_2011, L8N_2012, L8N_2013).
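Because the years are not known in advance, one option is to pull the year out of every file name and loop over the unique years instead of hard-coding 2009 and 2010. This is a minimal sketch, assuming every name ends in `_<year>_2000MHZ.txt`; the example list in `F2000` is made up for illustration, and `byYear` is a hypothetical name:

```matlab
% Example file names (illustrative only; in practice F2000 comes from your data)
F2000 = {'L14N_2009_2000MHZ.txt', 'L8N_2009_2000MHZ.txt', ...
         'L14N_2010_2000MHZ.txt', 'L8N_2010_2000MHZ.txt'};

% Four digits preceded by '_' and followed by '_2000MHZ' is the year;
% 'once' makes regexp return one char vector per file name
years  = regexp(F2000, '(?<=_)\d{4}(?=_2000MHZ)', 'match', 'once');
uYears = unique(years);          % each year listed once, sorted

byYear = struct();               % hypothetical container: one field per year
for y = 1:numel(uYears)
    % struct field names must start with a letter, hence the 'Y' prefix
    byYear.(['Y' uYears{y}]) = F2000(strcmp(years, uYears{y}));
end
```

Each field (for example `byYear.Y2009`) then holds the file names for that year, ready for the same dlmread loop.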

In addition to what's already working, I want to find (and count) the unique values of the L\d{1,2}N part of the file names, so that I can further break down F2000_2010 and F2000_2009 from the example above into, for instance, F2000_2010_L8N and F2000_2009_L8N before I start the dlmread loop.
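A minimal sketch of the counting and sub-selection step, assuming the identifier always sits at the start of the name. `unique` with three outputs gives each identifier once plus an index mapping every file back to it, and `accumarray` tallies the files per identifier. The file list is illustrative only:

```matlab
% Example file names (illustrative only)
F2000 = {'L14N_2009_2000MHZ.txt', 'L8N_2009_2000MHZ.txt', ...
         'L14N_2010_2000MHZ.txt', 'L8N_2010_2000MHZ.txt'};

% With 'once', regexp returns one char vector per name instead of a nested cell
ids   = regexp(F2000, '^L\d{1,2}N', 'match', 'once');
years = regexp(F2000, '(?<=_)\d{4}(?=_2000MHZ)', 'match', 'once');

% uIds lists each identifier once; j maps every file to its entry in uIds
[uIds, ~, j] = unique(ids);
nIds   = numel(uIds);            % number of distinct identifiers
counts = accumarray(j(:), 1);    % number of files per identifier

% A sub-list such as F2000_2010_L8N is the AND of two logical masks
F2000_2010_L8N = F2000(strcmp(ids, 'L8N') & strcmp(years, '2010'));
```

Note that `unique` sorts the identifiers as text, so 'L14N' comes before 'L8N'.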

Can I use regexp to build a list of all of my unique L\d{1,2}N occurrences? And can I then use the elements of that list as strings to parse the original file names and build a new name such as L14N_2009, where 14 comes from the \d{1,2} part? I am sure this is a beginner question, but I only discovered regexp yesterday! Any help is much appreciated!
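Capturing groups answer the second part directly: each pair of parentheses in the pattern becomes a token, and the tokens can be glued back together into a label. A minimal sketch, assuming the `L<digits>N_<year>_` prefix shown in the question; `labels` and `nums` are hypothetical names and the file list is illustrative:

```matlab
% Example file names (illustrative only)
F2000 = {'L14N_2009_2000MHZ.txt', 'L8N_2009_2000MHZ.txt', ...
         'L14N_2010_2000MHZ.txt', 'L8N_2010_2000MHZ.txt'};

% Two capturing groups: the identifier and the year.
% With 'tokens' and 'once', each cell holds a 1-by-2 cell {id, year}
tok = regexp(F2000, '^(L\d{1,2}N)_(\d{4})_', 'tokens', 'once');

% Join the tokens with an underscore, e.g. 'L14N_2009'
labels = cellfun(@(t) [t{1} '_' t{2}], tok, 'UniformOutput', false);

% The \d{1,2} part on its own, as a number (strip the leading 'L' and trailing 'N')
nums = cellfun(@(t) str2double(t{1}(2:end-1)), tok);
```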


Solution

Here is some code which might help:

% Find all the files in your directory

files = dir('*2000MHZ.txt');
files = {files.name};

% match identifiers

ids = unique(cellfun(@(x)x{1},regexp(files,'L\d{1,2}N','match'),...
             'UniformOutput',false));

% find all years

years = unique(cellfun(@(x)x{1},regexp(files,'(?<=L\d{1,2}N_)\d{4,}','match'),...
               'UniformOutput',false));

% find the years for each identifier

ids_years = cell(size(ids));   % preallocate: one list of years per identifier
for id_ix = 1:length(ids)
    % There is probably a better way to do this
    list = regexp(files,['(?<=' ids{id_ix} '_)\d{4,}'],'match');
    ids_years{id_ix} = cellfun(@(x)x{1},list(cellfun(...
                               @(x)~isempty(x),list)),'uniformoutput',false);
end

% If you need dynamic naming, I would suggest dynamic struct field names rather than separately named variables:

for ix_id = 1:length(ids)
    for ix_year = 1:length(ids_years{ix_id})
        % the 'Y' prefix is needed because struct field names must start with a letter
        fname = [ids{ix_id} '_' ids_years{ix_id}{ix_year} '_2000MHZ.txt'];
        data.(ids{ix_id}).(['Y' ids_years{ix_id}{ix_year}]) = dlmread(fname);
    end
end
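On the "there is probably a better way" comment: one alternative (not the answer's own method) is to parse each name once with named tokens and then select years per identifier with `strcmp`, avoiding a second regexp pass per identifier. A minimal sketch, assuming the same `L<digits>N_<year>_` prefix; the hard-coded `files` list stands in for the `dir` result so the block runs without any files on disk:

```matlab
% Stand-in for files = {dir('*2000MHZ.txt').name} (illustrative only)
files = {'L14N_2009_2000MHZ.txt', 'L8N_2009_2000MHZ.txt', ...
         'L14N_2010_2000MHZ.txt', 'L8N_2010_2000MHZ.txt'};

% Named tokens + 'once': one struct per file name, with fields id and year
info = regexp(files, '^(?<id>L\d{1,2}N)_(?<year>\d{4})_', 'names', 'once');
info = [info{:}];                % concatenate into a 1-by-N struct array
allIds   = {info.id};
allYears = {info.year};

ids = unique(allIds);
ids_years = cell(size(ids));
for k = 1:numel(ids)
    % years that belong to the k-th identifier
    ids_years{k} = unique(allYears(strcmp(allIds, ids{k})));
end
```

The resulting `ids` and `ids_years` have the same shape as in the answer, so the struct-building loop works unchanged.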

Also, if anyone is interested in mapping keys to values, try looking into the containers.Map class.
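A minimal containers.Map sketch, assuming labels such as 'L14N_2009' are used as keys. Unlike struct field names, map keys may start with any character, so no 'Y' prefix is needed. The stored matrices are placeholders; in practice each value would come from dlmread on the matching file:

```matlab
% Map from char keys (e.g. 'L14N_2009') to arbitrary MATLAB values
m = containers.Map('KeyType', 'char', 'ValueType', 'any');

labels = {'L14N_2009', 'L8N_2009'};   % illustrative keys
for k = 1:numel(labels)
    % placeholder data; in practice: dlmread([labels{k} '_2000MHZ.txt'])
    m(labels{k}) = magic(3) * k;
end

A = m('L8N_2009');        % look up one matrix by its label
allKeys = keys(m);        % cell array of all keys, in sorted order
```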

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow