I have to read in about 300 individual CSVs. I have managed to automate the process using a loop and structured CSV names. However, each CSV has 14-17 lines of rubbish at the start, and the amount varies randomly, so hard-coding a skip parameter in the read.table call won't work. The column names and the number of columns are the same for every CSV.
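For context, the reading loop is along these lines (the file-name pattern here is made up, and the fixed skip is exactly what breaks):

    # Placeholder naming scheme; the real names follow my structured convention
    files <- sprintf("Final_Comp_Zn_%d.csv", 1:300)
    # A hard-coded skip only works when the rubbish is exactly 15 lines
    data_list <- lapply(files, read.csv, skip = 15, header = TRUE)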
Here is an example of what I am up against:
QUICK STATISTICS:
Directory: Data,,,,
File: Final_Comp_Zn_1
Selection: SEL{Ox*1000+Doma=1201}
Weight: None,,,
,,Variable: AG,,,
Total Number of Samples: 450212 Number of Selected Samples: 277
Statistics
VARIABLE,Min slice Y(m),Max slice Y(m),Count,Minimum,Maximum,Mean,Std.Dev.,Variance,Total Samples in Domain,Active Samples in Domain AG,
6780.00, 6840.00, 7, 3.0000, 52.5000, 23.4143, 16.8507, 283.9469, 10, 10 AG,
6840.00, 6900.00, 4, 4.0000, 5.5000, 4.9500, 0.5766, 0.3325, 13, 13 AG,
6900.00, 6960.00, 16, 1.0000, 37.0000, 8.7625, 9.0047, 81.0848, 29, 29 AG,
6960.00, 7020.00, 58, 3.0000, 73.5000, 10.6931, 11.9087, 141.8172, 132, 132 AG,
7020.00, 7080.00, 23, 3.0000, 104.5000, 15.3435, 23.2233, 539.3207, 23, 23 AG,
7080.00, 7140.00, 33, 1.0000, 15.4000, 3.8152, 2.8441, 8.0892, 35, 35 AG,
Basically, I want to start reading from the line that begins VARIABLE,Min slice Y(m),Max slice Y(m),... I can think of a few solutions, but I don't know how I would go about programming them. Is there any way I can:
- Read the CSV first, somehow work out how many lines of rubbish there are, and then re-read it with the correct number of lines to skip? Or
- Tell read.table to start reading when it finds the column names (since these are the same for every CSV) and ignore everything prior to that?
I think solution (2) would be the most appropriate, but I am open to any suggestions!
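For what it's worth, this is the kind of thing I imagine for (2), though I have no idea if it is the right way to go about it. It's an untested sketch: read_one is a placeholder name, the file-name pattern is made up, and I'm assuming the header line always begins with VARIABLE,:

    # Untested sketch: find the header row, then parse from that line onward
    read_one <- function(path) {
      lines <- readLines(path)
      # First line that starts with the known column names
      header_row <- grep("^VARIABLE,", lines)[1]
      # Feed only the header and data lines back to read.csv via text=
      read.csv(text = lines[header_row:length(lines)],
               header = TRUE, strip.white = TRUE, check.names = FALSE)
    }

    files <- sprintf("Final_Comp_Zn_%d.csv", 1:300)  # placeholder naming scheme
    all_data <- lapply(files, read_one)

Solution (1) would presumably be almost identical: instead of the text= call, re-read the file with read.csv(path, skip = header_row - 1, header = TRUE).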