質問

I have multiple excel files like the one shown below (hourly data). I want to obtain the daily average (e.g. from 17:00 to 16:00 of next day).I only know a little Matlab. Currently my solution is below but it got some problem.

  1. Read each excel file and store the data in a variable.
  2. Merge all the data in a single variable.
  3. Find the row number (n) with 17:00. For this, I obtained the datenum of each row and then just search (find) the datenum corresponding to 17:00.
  4. Average the data between each two rows (n(i)).

It works fine if the raw data is right. But the problems are below.

  1. Some data at 17:00 are missing. So there will be error when it try to find the row with 17:00.
  2. When I have more than 100 excel file, it runs very very slow.

Can any one give some suggestion how to solve the problem? I prefer to use Matlab since it is the only tool I know a little. Many thanks!

Edit 1: Code provided

Below is the code to obtain daily average. I have combined all the data in the file "summary_file.xls" (the code of this part is not shown.

As the example table shown below.

  1. The average for the time between 17:00, 1 Sep. and 2 Sep. has no problem.
  2. For days with data missing, I would like to obtain the average for all available data before 17:00.
  3. Example: since the data of 17:00 3 Sep. is missing, the average for 2-3 Sep. cannot be obtained. For such situation, I would like to obtain the average for all data before 17:00 (i.e. untill 9:00 3 Sep.)
  4. Then for 3-4 Sep., I would like to calculate the average starting from 18:00.
  5. For days with all data missing, just make all data as 0 or marked as not available.

[num,txt,raw] = xlsread('summary_file.xls'); % read file

date_num = num(:,11); % read the column containing date number
starting = '2003/09/05-17:00'; %  starting time
starting = datenum(starting,formatIn_2); % convert starting time to date number
ending = '2003/09/09-17:00';%  ending time
ending = datenum(ending,formatIn_2); %convert ending time to date number

s = starting:1:ending; % All date number with 17:00

%% find the row number with 17:00
for i = 1:ending-starting+1
    [a(i) b(i)] = find(date_num==s(i));
end
%% Store the averaged data in variable p
for i = 1:ending-starting
    p(i,:) = mean(num(a(i):a(i+1)-1,:));
end

Sample input data -

+------+-------+----------+-------+-------+-------+-------+-------+-------+
| 2003 | 1-Sep |  15:00   | 100.2 |  29   | 70.5  | 3.903 |  728  |   0   |
+------+-------+----------+-------+-------+-------+-------+-------+-------+
| 2003 | 1-Sep | 16:00    | 100.1 | 29.31 | 70.7  | 4.328 | 611.8 | 0     |
| 2003 | 1-Sep | 17:00    | 100.1 | 29.64 | 67.06 | 3.719 | 434.8 | 0     |
| 2003 | 1-Sep | 18:00    | 100.1 | 29.67 | 64.4  | 3.005 | 172.4 | 0     |
| 2003 | 1-Sep | 19:00    | 100.1 | 29.06 | 68.22 | 2.292 | 19.89 | 0     |
| 2003 | 1-Sep | 20:00    | 100.2 | 28.43 | 74.7  | 2.436 | 0.428 | 0     |
| 2003 | 1-Sep | 21:00    | 100.2 | 27.92 | 76.2  | 1.931 | 0.006 | 0     |
| 2003 | 1-Sep | 22:00    | 100.3 | 27.67 | 77.3  | 1.825 | 0.007 | 0     |
| 2003 | 1-Sep | 23:00    | 100.4 | 27.55 | 77.9  | 1.622 | 0.007 | 0     |
| 2003 | 1-Sep | 24:00:00 | 100.4 | 27.69 | 77.8  | 0.863 | 0.008 | 0     |
| 2003 | 2-Sep | 1:00     | 100.4 | 27.55 | 78.3  | 0.879 | 0.008 | 0     |
| 2003 | 2-Sep | 2:00     | 100.3 | 27.05 | 82.1  | 1     | 0.016 | 0.762 |
| 2003 | 2-Sep | 3:00     | 100.3 | 26.41 | 86.8  | 0.805 | 0.006 | 0     |
| 2003 | 2-Sep | 4:00     | 100.2 | 26.6  | 85.5  | 0.522 | 0.011 | 0.508 |
| 2003 | 2-Sep | 5:00     | 100.2 | 25.53 | 83.8  | 2.158 | 0.011 | 0     |
| 2003 | 2-Sep | 6:00     | 100.3 | 24.5  | 86.6  | 2.711 | 0.016 | 0     |
| 2003 | 2-Sep | 7:00     | 100.4 | 24.85 | 86.9  | 2.562 | 0.016 | 4.318 |
| 2003 | 2-Sep | 8:00     | 100.6 | 21.11 | 94    | 8.15  | 9.96  | 26.67 |
| 2003 | 2-Sep | 9:00     | 100.6 | 22.23 | 91.9  | 5.065 | 31.67 | 0.254 |
| 2003 | 2-Sep | 10:00    | 100.6 | 23.51 | 88.8  | 5.742 | 39.16 | 0.254 |
| 2003 | 2-Sep | 11:00    | 100.6 | 24    | 87.7  | 4.494 | 97.8  | 0     |
| 2003 | 2-Sep | 12:00    | 100.6 | 24.69 | 85.3  | 4.709 | 142.2 | 0     |
| 2003 | 2-Sep | 13:00    | 100.5 | 25.57 | 82.8  | 5.66  | 259.1 | 0     |
| 2003 | 2-Sep | 14:00    | 100.4 | 25.69 | 81.9  | 5.634 | 157.5 | 0     |
| 2003 | 2-Sep | 15:00    | 100.3 | 26.18 | 79.1  | 5.564 | 308.2 | 0     |
| 2003 | 2-Sep | 16:00    | 100.3 | 26.08 | 78.3  | 6.283 | 135.3 | 0     |
| 2003 | 2-Sep | 17:00    | 100.3 | 25.75 | 81.2  | 4.595 | 55.68 | 0.762 |
| 2003 | 2-Sep | 18:00    | 100.3 | 25.01 | 84.5  | 4.843 | 55.21 | 1.778 |
| 2003 | 2-Sep | 19:00    | 100.3 | 25.15 | 86.1  | 1.433 | 22.43 | 0     |
| 2003 | 2-Sep | 20:00    | 100.3 | 24.98 | 86.1  | 1.985 | 0.301 | 0     |
| 2003 | 2-Sep | 21:00    | 100.3 | 24.75 | 85.1  | 0.712 | 0.009 | 0     |
| 2003 | 2-Sep | 22:00    | 100.4 | 24.76 | 85.3  | 1.546 | 0.011 | 0     |
| 2003 | 2-Sep | 23:00    | 100.5 | 24.92 | 84.5  | 1.186 | 0.008 | 0     |
| 2003 | 2-Sep | 24:00:00 | 100.5 | 24.96 | 84.9  | 1.31  | 0.007 | 0     |
| 2003 | 3-Sep | 1:00     | 100.5 | 25    | 85.3  | 0.702 | 0.012 | 0     |
| 2003 | 3-Sep | 2:00     | 100.5 | 24.99 | 86    | 0.35  | 0.017 | 0     |
| 2003 | 3-Sep | 3:00     | 100.4 | 25.07 | 86.1  | 0.69  | 0.008 | 0     |
| 2003 | 3-Sep | 4:00     | 100.3 | 24.92 | 86.5  | 1.347 | 0.011 | 0     |
| 2003 | 3-Sep | 5:00     | 100.3 | 25.27 | 85.5  | 0.834 | 0.009 | 0     |
| 2003 | 3-Sep | 6:00     | 100.3 | 24.97 | 86.9  | 0.627 | 0.012 | 0     |
| 2003 | 3-Sep | 7:00     | 100.3 | 24.8  | 87.7  | 0.755 | 0.108 | 0     |
| 2003 | 3-Sep | 8:00     | 100.4 | 25.54 | 85    | 0.202 | 37.11 | 0     |
| 2003 | 3-Sep | 9:00     | 100.4 | 26.72 | 81    | 1.853 | 219.4 | 0     |
| 2003 | 3-Sep | 18:00    | 100.2 | 29.67 | 56.39 | 2.856 | 456.2 | 0     |
| 2003 | 3-Sep | 19:00    | 100.2 | 30.17 | 53.66 | 2.204 | 266   | 0     |
+------+-------+----------+-------+-------+-------+-------+-------+-------+
役に立ちましたか?

解決

The following code might serve your requirements. Again, since you are interested in daily averages, the output would be much be smaller, as it's over a 24 hour period and I am assuming you need that. Also, it takes care of your missing data condition.

Code -

%% Setup params and data
start_hour = 17;
[num,txt,raw] = xlsread('summary_file.xls');

datenums = NaN(size(num,1),1);
for count = 1:size(num,1)
    year1 = cell2mat(raw(count,1));
    date1 = cell2mat(raw(count,2));
    time1 = cell2mat(raw(count,3));
    date_str = strcat( num2str(year1) , '-', date1 );
    datenums(count) = datenum( date_str, 'yyyy-dd-mmm')  + time1(:);
end

%% Take care of conditions
firstdata_start_hour = round(24*cell2mat(raw(1,3)));
if firstdata_start_hour > 17
    start1 =  floor(datenums(1))  + (start_hour/24);
elseif firstdata_start_hour < 17
    start1 =  floor(datenums(1))-1  + (start_hour/24);
else
    start1 = datenums(1);
end

ind1 = floor(datenums-start1) + 1;

%% Start Processing
num_items = size(num,2)-3;
num_days = max(ind1);

bins = NaN(num_days,num_items);
for count1 = 1:size(bins,2)
    for count2 = 1:size(bins,1)
        bins(count2,count1) = mean(num(find(ind1==count2),count1+3));
    end
end
bins(isnan(bins))=0;
average_nums = bins

Output of the averages for some data, compiled by OP on my request -

+-----------+---------------+------+-------------+------------+-----------+--------------+
|   Date    | Pressure(kPa) | Temp | Humidity(%) | W-spd(m/s) | Radiation | Rainfall(mm) |
+-----------+---------------+------+-------------+------------+-----------+--------------+
| 8/10/2009 | 100.1         | 25.8 | 79.1        | 1.4        | 82.6      | 1.7          |
| 8/11/2009 | 100.2         | 27.5 | 75.7        | 1.9        | 173.8     | 0.0          |
| 8/12/2009 | 100.1         | 28.4 | 73.5        | 2.1        | 177.1     | 0.0          |
| 8/13/2009 | 100.0         | 28.4 | 73.2        | 2.5        | 197.4     | 0.0          |
| 8/14/2009 | 100.0         | 28.5 | 73.5        | 2.2        | 151.2     | 0.0          |
| 8/15/2009 | 100.2         | 27.3 | 75.4        | 1.2        | 96.2      | 0.4          |
| 8/16/2009 | 100.2         | 27.1 | 75.5        | 1.4        | 122.6     | 0.0          |
| 8/17/2009 | 100.2         | 27.2 | 75.7        | 1.5        | 158.3     | 0.2          |
| 8/18/2009 | 100.2         | 27.5 | 72.2        | 1.4        | 186.4     | 0.0          |
| 8/19/2009 | 100.3         | 28.4 | 68.4        | 1.9        | 186.9     | 0.0          |
| 8/20/2009 | 100.3         | 28.1 | 69.2        | 2.0        | 184.8     | 0.0          |
| 8/21/2009 | 100.3         | 26.5 | 75.8        | 1.3        | 122.3     | 0.6          |
+-----------+---------------+------+-------------+------------+-----------+--------------+

For multiple excel files, you have to loop over all such files in a loop. There doesn't appear to have any other way out.

他のヒント

When you use Matlab's xlsread it opens and closes the Excel COM Server, so using this with multiple files will slow down execution. Read this post at Matlab Central which explains how can you avoid opening and closing Excel COM Server.

As for the missing data, maybe you could interpolate 17:00 with the data from find(time<17:00,'last') and find(time>17:00,'first')

I love using MATLAB but if you don't mind me asking, have you considered using the R statistical software Package? Its really cool in my opinion you don't have to agree with me.

It is very easy to do the tasks that you are asking for. R can even handle missing data and easily work around it and can read excel files easily and search for the data you want.

Where to get R:

http://cran.r-project.org/

ライセンス: CC-BY-SA帰属
所属していません StackOverflow
scroll top