Question

I have the following problem: I have developed some code to search through large data files, but the process has become too slow, and on some computers it even consumes all the available resources.

nodo=str2num(get(handles.nodo,'string'));
PATHNAME = uigetdir('', 'Selecciona el directorio donde están los bfins');
files = dir(fullfile(PATHNAME,'*.bfin'));
curr_folder=pwd;
cd(PATHNAME);
archivo={files.name}';
for i=1:numel(archivo)
    [fid{i}, errmsg]=fopen(files(i).name)
    disp(errmsg);
    Datos{i}=textscan(fid{i}, '%s  %f %s %f %s %f ','Headerlines',2);
    AllNodos{i}=Datos{1,i}{1,2};
    AllTemp{i}=Datos{1,i}{1,4};
end
cd(curr_folder)
for i=1:size(AllNodos,2)
    sets{i}=cat(2, AllNodos{1,i}, AllTemp{1,i});
end
for i=1:size(AllNodos,2)
    vectn{i}=AllNodos{1,i};
    r{i}=find(vectn{i}==nodo);
    Temps{i}=AllTemp{1,i}(r{i});
end
%Write Excel File
[FileName, PathName] = uiputfile('*.xlsx', 'Escribe un archivo excel con las temperaturas...')
savingas=fullfile(PathName,FileName);
a=archivo';
B=cat(1,a,Temps);
xlswrite(savingas,[B])
e = actxserver('Excel.Application'); %# open ActiveX server
ewb = e.Workbooks.Open(savingas); %# open file (enter full path!)
ewb.Worksheets.Item(1).Name = num2str(nodo); %# rename 1st sheet
ewb.Save %# save to the same file
ewb.Close(false)
e.Quit

What the code really does is find the location of a string in the files, then find another variable (just a Ctrl+F operation) and write them to an Excel sheet. Any help regarding this would be appreciated.

EDIT---- Thank you very much for all your comments. I came up with the following code, which saves a lot of the unnecessary time that the old version used to spend storing variables:

for i=1:num_archivo
    [fid(i), errmsg]=fopen(files(i).name)
    disp(errmsg);
    Datos=textscan(fid(i), '%s  %f %s %f','delimiter',',','HeaderLines',hl);
    AllNodos=Datos(1,2);
    AllTemp=Datos(1,4);
    for k=1:numel(nodo)
        r{i,k}=find(AllNodos{1,1}==nodo(k));
        Temps{i,k}=AllTemp{1,1}(r{i,k});
    end
end

Solution

textscan itself is very fast, so it may be what is eating CPU, but you're not likely to improve on it performance-wise. However, it looks like you're not performing any sort of pre-allocation for your cell arrays, which means that MATLAB may be constantly reallocating memory. Use cell to create an empty cell array of the desired size:

num_archivo = numel(archivo);
Datos = cell(1,num_archivo);
AllNodos = cell(1,num_archivo);
AllTemp = cell(1,num_archivo);
for i = 1:num_archivo
    ...
end
...

(Though Datos should probably not be a cell, as @Lazarus pointed out.) You should do the same sort of thing for the other loops with sets, vectn, etc.
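For example, the later loops could be pre-allocated and filled the same way. This is only a sketch, assuming nodo is a scalar as in the question:

n     = numel(AllNodos);
sets  = cell(1,n);   % pre-allocate before the loops so MATLAB doesn't grow the arrays
vectn = cell(1,n);
r     = cell(1,n);
Temps = cell(1,n);
for i = 1:n
    sets{i}  = cat(2, AllNodos{i}, AllTemp{i});
    vectn{i} = AllNodos{i};
    r{i}     = find(vectn{i} == nodo);
    Temps{i} = AllTemp{i}(r{i});
end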

One other thing that I'd be sure to do is call fclose on each file identifier immediately after you're done reading files(i).name, so that you don't leave a bunch of file handles open. Also, there seems to be no reason to save the file id fid to a cell array, as you don't use it outside of the loop. Even if you did, a regular vector (pre-allocated with numel(archivo) elements) would be better.
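A minimal sketch of those two changes, keeping the question's variable names and format string (untested against the actual data files):

fid = zeros(1, numel(archivo));              % plain vector instead of a cell array
for i = 1:numel(archivo)
    [fid(i), errmsg] = fopen(files(i).name);
    disp(errmsg);
    Datos{i} = textscan(fid(i), '%s %f %s %f %s %f', 'HeaderLines', 2);
    fclose(fid(i));                          % release the handle as soon as the file is read
    AllNodos{i} = Datos{1,i}{1,2};
    AllTemp{i}  = Datos{1,i}{1,4};
end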

OTHER TIPS

My answer would depend on how large your datasets are. But the first change I would make is to set Datos{i} to just Datos, since you only use it inside the for loop. This way MATLAB doesn't have to allocate more space.

The second change I'd make is to move the cat and find calls into the first for loop, so that you can also replace AllNodos{i} with AllNodos and save on the same allocation issue.
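Putting those two changes together, the reading loop might look roughly like this; a sketch only, reusing the question's format string and variable names, not tested against the actual files:

sets  = cell(1, numel(archivo));
Temps = cell(1, numel(archivo));
for i = 1:numel(archivo)
    fid = fopen(files(i).name);
    Datos = textscan(fid, '%s %f %s %f %s %f', 'HeaderLines', 2);   % plain variable, reused each pass
    fclose(fid);
    AllNodos = Datos{1,2};
    AllTemp  = Datos{1,4};
    sets{i}  = cat(2, AllNodos, AllTemp);    % cat and find done inside the same loop
    r        = find(AllNodos == nodo);
    Temps{i} = AllTemp(r);
end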

This should help with the resource issue; it may not help with the speed issue if textscan is the limiting factor. Using tic and toc before and after textscan will let you know how long it is taking.
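For instance, around the textscan call inside the loop (a sketch using the question's format string):

t = tic;
Datos = textscan(fid, '%s %f %s %f %s %f', 'HeaderLines', 2);
fprintf('textscan took %.3f s for %s\n', toc(t), files(i).name);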

It depends on how large your "large" files actually are and how many of them there are. Check your code using the profiler:

profile on;
<run your code>
profile viewer

I'd suspect that fopen and the textscan lines are the evil ones...

Generally, in anything involving reading files, the bottleneck is mostly disk I/O, not CPU. In that case you can't really do a lot, except perhaps doing things in parallel, which unfortunately is not that easy in MATLAB if you're not willing to pay.
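If the Parallel Computing Toolbox is available, a parfor loop over the files is one way to try that. This is a rough sketch only; disk I/O may still dominate, and the loop body assumes the same file layout as in the question:

num_archivo = numel(archivo);
AllNodos = cell(1, num_archivo);
AllTemp  = cell(1, num_archivo);
parfor i = 1:num_archivo
    fid = fopen(fullfile(PATHNAME, files(i).name));   % full path, so workers don't depend on cd
    Datos = textscan(fid, '%s %f %s %f %s %f', 'HeaderLines', 2);
    fclose(fid);
    AllNodos{i} = Datos{1,2};    % sliced output variables
    AllTemp{i}  = Datos{1,4};
end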

Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow