How to parse the file name and rename in Matlab

https://stackoverflow.com/questions/2456033

20-09-2019

Question

I am reading a .csv file, processing it, and writing the result to an .xls file at the end of my program. I was wondering if someone can help me parse the date out of the file name, since my input file name looks like file_1_2010_03_03.csv

and I want my output file to be

newfile_2010_03_03.xls

Is there a way to do this inside the MATLAB program, so that I do not have to write the command
xlswrite('newfile_2010_03_03.xls', M); manually every time and change the date whenever I input a file with a different date,
like file_2_2010_03_04.csv?

Maybe I was not clear: I am using uigetfile to input 3 different files in the format file_1_2010_03_03.csv, file_2_2010_03_03.csv, file_3_2010_03_03.csv

Now I process the files inside my program and write 4 output files named newfileX_3_2010_03_03.xls, newfileXY_3_2010_03_03.xls, newfileXZ_3_2010_03_03.xls, newfileYZ_3_2010_03_03.xls

So the date is not the current date; I need to take it from the input file name and append it to the new name for xlswrite.

So I was wondering if there is a way to write a generic

xlswrite('xxx', M); which picks the name I want, instead of me having to modify the name 'xxx' every time I input a new file

Thanks

Solution

It looks like I misunderstood what you meant by 'file_1', 'file_2': I thought the numbers 1 and 2 had some kind of importance.

oldFileName = 'something_2010_03_03.csv';
%# extract the date (regexp returns the match in a cell array)
theDate = regexp(oldFileName,'(\d{4}_\d{2}_\d{2})','match');
newFileName = sprintf('newfile_%s.xls',theDate{1});
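A slightly more defensive variant of the same idea, as a sketch: it assumes the input arrives as a folder plus a file name, the way uigetfile returns it (the folder 'data' below is made up). fileparts strips the folder and extension, and regexp's 'once' option returns the match as a plain char array instead of a cell array. If no yyyy_mm_dd date is present, the code stops with a clear error instead of failing later on theDate{1}.

```matlab
% Hypothetical input: folder and file name as uigetfile might return them
inFile = fullfile('data', 'file_1_2010_03_03.csv');

% Drop the folder and the extension: baseName is 'file_1_2010_03_03'
[~, baseName] = fileparts(inFile);

% 'once' returns the first match as a char array ('' if nothing matches)
theDate = regexp(baseName, '\d{4}_\d{2}_\d{2}', 'match', 'once');
if isempty(theDate)
    error('No yyyy_mm_dd date found in "%s".', baseName);
end

newFileName = sprintf('newfile_%s.xls', theDate);
disp(newFileName)
```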

Older Version with Explanations

I assume that the date is the same in all your files. Your program would then go like this:

%# load the files, put the names into a cell array
fileNames = {'file_1_2010_03_03.csv','file_2_2010_03_03.csv','file_3_2010_03_03.csv'};

%# parse the file names for the number and the date
%# This expression looks for the n-digit number (1,2, or 3 in your case) and puts
%# it into the field 'number' in the output structure, and it looks for the date
%# and puts it into the field 'date' in the output structure
%# Specifically, \d finds digits, \d+ finds one or several digits, _\d+_
%# finds one or several digits that are preceded and followed by an underscore
%# _(?<number>\d+)_ finds one or several digits that are preceded and followed 
%# by an underscore and puts them (as a string) into the field 'number' in the 
%# output structure. The date part is similar, except that regexp looks for 
%# specific numbers of digits
tmp = regexp(fileNames,'_(?<number>\d+)_(?<date>\d{4}_\d{2}_\d{2})','names');
nameStruct = cat(1,tmp{:}); %# regexp returns a cell array. Catenate for ease of use

%# Maybe you want to loop, or maybe not (it's not quite clear from the question), but 
%# here's how you'd do it with a loop. Either way, since the information about the file names
%# is conveniently stored in nameStruct, you can access it any way you want.
nFiles = numel(nameStruct);
for iFile = 1:nFiles
   %# do some processing, get the matrix M

   %# and create the output file name
   outputFileX = sprintf('newfileX_%s_%s.xls',nameStruct(iFile).number,nameStruct(iFile).date);
   %# and save
   xlswrite(outputFileX,M)
end
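The older version above assumes the date is the same in all your files. A small sketch that checks that assumption before writing anything, using the same regexp call: the 'names' fields are char arrays, so str2double turns the file numbers into numbers if you need them, and unique collapses the dates so a mismatch can be reported. The error message wording is just an example.

```matlab
% Same input names as in the example above
fileNames = {'file_1_2010_03_03.csv', 'file_2_2010_03_03.csv', 'file_3_2010_03_03.csv'};

tmp = regexp(fileNames, '_(?<number>\d+)_(?<date>\d{4}_\d{2}_\d{2})', 'names');
nameStruct = cat(1, tmp{:});

% The fields are char arrays; convert the file numbers if you need them numerically
fileNumbers = str2double({nameStruct.number});

% Check the assumption that every input file carries the same date
allDates = unique({nameStruct.date});
if numel(allDates) ~= 1
    error('Expected one date in all file names, found: %s', ...
          strjoin(allDates(:).', ', '));
end
commonDate = allDates{1};
```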

See the MATLAB documentation on regular expressions for more details on how to use them. Also, you may be interested in uipickfiles (from the MATLAB File Exchange) to replace uigetfile.
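If you stay with uigetfile, its 'MultiSelect' option lets you pick all three files in one dialog. A sketch of the bookkeeping that usually follows: uigetfile returns 0 when the user cancels, a char array when exactly one file is chosen, and a cell array when several are, so the result is normalized with cellstr before looping. The dialog call is shown as a comment and its output is hard-coded (hypothetical values) so the snippet runs without user interaction.

```matlab
% In an interactive session the names would come from:
%   [names, folder] = uigetfile('*.csv', 'Select input files', 'MultiSelect', 'on');
% Hard-coded here (hypothetical values) so the sketch runs without a dialog:
names  = 'file_1_2010_03_03.csv';   % a single selection comes back as char
folder = pwd;

if isequal(names, 0)                % user pressed Cancel
    error('No input file selected.');
end

names = cellstr(names);             % char or cell array -> cell array of names
fullPaths = cellfun(@(n) fullfile(folder, n), names, 'UniformOutput', false);

for iFile = 1:numel(fullPaths)
    fprintf('Would process %s\n', fullPaths{iFile});
end
```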

OTHER TIPS

I'm not sure whether you want to build the file name from the date or not. If you just want to change the name of the file you read, you can do this:

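filename = 'file_1_2010_03_03.csv';
newfilename = strrep(filename, 'file_1_', 'newfile_');   % 'newfile_2010_03_03.csv'
newfilename = strrep(newfilename, '.csv', '.xls');        % 'newfile_2010_03_03.xls'
xlswrite(newfilename, M)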
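The strrep call above only handles names that start with exactly 'file_1_'. A sketch that covers file_1_, file_2_, file_3_ and so on, assuming the input names always follow the file_<number>_<date>.csv pattern from the question: regexprep swaps the prefix, and fileparts drops the .csv extension so the output name ends in .xls.

```matlab
inNames = {'file_1_2010_03_03.csv', 'file_2_2010_03_04.csv'};
outNames = cell(size(inNames));
for k = 1:numel(inNames)
    [~, baseName] = fileparts(inNames{k});     % e.g. 'file_2_2010_03_04'
    % replace 'file_<digits>_' at the start of the name with 'newfile_'
    outNames{k} = [regexprep(baseName, '^file_\d+_', 'newfile_') '.xls'];
end
disp(outNames)
```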

UPDATE:

To parse the date from the file name:

dtstr = strrep(filename,'file_1_','');
dtstr = strrep(dtstr,'.csv','');
DT = datenum(dtstr,'yyyy_mm_dd');
disp(datestr(DT))
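The strrep version relies on knowing the exact prefix ('file_1_'). A prefix-independent sketch, assuming the date is always written as yyyy_mm_dd somewhere in the name: regexp finds the date, sscanf splits it into year, month and day, and datenum(Y, M, D) builds the serial date number without depending on how a custom format string is parsed.

```matlab
filename = 'file_2_2010_03_04.csv';

% Pull out the yyyy_mm_dd part, whatever the prefix is
dtstr = regexp(filename, '\d{4}_\d{2}_\d{2}', 'match', 'once');

% Split it into year, month, day (column vector) and build a serial date number
ymd = sscanf(dtstr, '%d_%d_%d');
DT = datenum(ymd(1), ymd(2), ymd(3));
disp(datestr(DT))
```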

To build file name based on date (today's for example):

filename = ['file_', datestr(date,'yyyy_mm_dd') '.csv'];

Presumably, all of these files are sitting in a directory somewhere and you'd like to process them in batch. You can use code like this to read the files in a particular directory and find the ones that end in 'csv'. That way, you don't have to change your code at all if you'd like to process a new file -- you just drop it in the directory and run your program.

extension = '.csv';

files = dir();  % e.g. use current directory

% find files with the proper extension (case-insensitive)
extLength = length(extension);
for k = 1:length(files)
    name = files(k).name;
    nameLength = length(name);
    if ~files(k).isdir && nameLength > extLength
        if strcmpi(name((nameLength - extLength + 1):nameLength), extension)
            disp(name)
            % process file here...
        end
    end
end

You can make it more compact by incorporating the regexp processing that Jonas suggested.
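A compact sketch along those lines: dir with a '*.csv' wildcard does the extension filtering, and the regexp from the first solution pulls the date out of each name. The temporary folder and the two empty sample files are created only so the snippet runs on its own; in practice you would point folder at the directory that holds your data.

```matlab
% Demo only: a temporary folder with two empty sample files (hypothetical names)
folder = tempname;
mkdir(folder);
fclose(fopen(fullfile(folder, 'file_1_2010_03_03.csv'), 'w'));
fclose(fopen(fullfile(folder, 'file_2_2010_03_04.csv'), 'w'));

% dir with a wildcard returns only the names ending in .csv
files = dir(fullfile(folder, '*.csv'));

outNames = {};
for k = 1:numel(files)
    theDate = regexp(files(k).name, '\d{4}_\d{2}_\d{2}', 'match', 'once');
    if isempty(theDate)
        continue;                       % no yyyy_mm_dd date: skip this file
    end
    inFile = fullfile(folder, files(k).name);
    outNames{end+1} = sprintf('newfile_%s.xls', theDate); %#ok<AGROW>
    fprintf('%s -> %s\n', inFile, outNames{end});
    % ... read inFile, process it into M, then xlswrite(outNames{end}, M) ...
end

% Demo only: remove the sample files and the folder again
delete(fullfile(folder, '*.csv'));
rmdir(folder);
```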

If your 3 files from UIGETFILE all have the same date in their name, then you can just use one of them to do the following (after you have processed all your data from the 3 files):

fileName = 'file_1_2010_03_03.csv';          %# One of your 3 file names
data = textscan(fileName,'%s',...            %# Split string at '_' and '.'
                'Delimiter',{'_','.'});
fileString = sprintf('_%s_%s_%s.xls',...     %# Make the date part of the name
                     data{1}{(end-3):(end-1)});
xlswrite(['newfileX' fileString],dataX);     %# Output "X" data
xlswrite(['newfileXY' fileString],dataXY);   %# Output "XY" data
xlswrite(['newfileXZ' fileString],dataXZ);   %# Output "XZ" data
xlswrite(['newfileYZ' fileString],dataYZ);   %# Output "YZ" data

The function TEXTSCAN is used to break up the old file name at the points where '_' or '.' characters occur. The function SPRINTF is then used to put the pieces for the date back together.
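On MATLAB R2013a or later, strsplit and strjoin can do the same splitting without textscan. A sketch, assuming the date is always the last three underscore-separated pieces of the name once the extension is removed; it also builds all four output names in one go, with the prefixes taken from the question.

```matlab
fileName = 'file_1_2010_03_03.csv';           % one of the 3 input file names

[~, baseName] = fileparts(fileName);          % drop the '.csv' extension
parts = strsplit(baseName, '_');              % {'file','1','2010','03','03'}
dateStr = strjoin(parts(end-2:end), '_');     % last three pieces form the date

prefixes = {'newfileX', 'newfileXY', 'newfileXZ', 'newfileYZ'};
outNames = strcat(prefixes, '_', dateStr, '.xls');   % one name per prefix
% e.g. xlswrite(outNames{1}, dataX); xlswrite(outNames{2}, dataXY); ...
```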

Licensed under: CC-BY-SA with attribution
