You might be able to use a combination of readLines and grep/grepl to help you figure out which lines to read.
Here's an example. The first part just creates some sample data.
x <- tempfile(pattern="myFile", fileext=".csv")
cat("junk comment strings",
"",
"another junk comment string",
"This,Is,My,Data",
"1,2,3,4",
"5,6,7,8",
"",
"back to comments",
"This,Is,My,Data",
"12,13,14,15",
"15,16,17,18",
"19,20,21,22", file = x, sep = "\n")
Step 1: Use readLines() to get the data into R
In this step, we'll also drop the lines that we are not interested in. We only want to keep lines that look like data, which for a four-column dataset means lines of the form:
something comma something comma something comma something
## Read the data into R
## Replace x with the actual path to your file
A <- readLines(con = x)
## Find and extract the lines where there are "data".
## My example dataset only has 4 columns.
## Modify for your actual dataset.
A <- A[grepl(paste(rep(".*", 4), collapse=","), A)]
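Note that the pattern built above, ".*,.*,.*,.*", matches any line containing at least three commas, so a comment line with several commas, or a line with more than four fields, would also be kept. If that is a concern, one possible alternative is to count the fields on each line directly. This is only a sketch: the sample lines and the names raw, n_fields and keep are made up for illustration, and it assumes no field contains a quoted comma.

```r
# Hypothetical sample lines, similar to the example file
raw <- c("junk comment strings",
         "",
         "This,Is,My,Data",
         "1,2,3,4",
         "back, to comments")

# Count the comma-separated fields on each line
n_fields <- lengths(strsplit(raw, ",", fixed = TRUE))

# Keep only the lines with exactly 4 fields
keep <- raw[n_fields == 4]
keep
```

One caveat: strsplit() drops a trailing empty field, so a line such as "1,2,3," would be counted as three fields rather than four.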
Step 2: Identify the data ranges
## Identify the header rows. Subtract 1 so the values can be used as "skip" in read.csv
HeaderRows <- grep("^This,Is", A)-1
## Identify the number of data rows in each group
## (lines between one header and the next header, or the end of the data)
N <- diff(c(HeaderRows, length(A))) - 1
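To make the arithmetic concrete, here is a small worked check on the filtered sample lines. It is a sketch with made-up variable names (filtered, header_idx, skip, data_rows), not part of the solution itself.

```r
# The sample data after the comment lines have been dropped
filtered <- c("This,Is,My,Data", "1,2,3,4", "5,6,7,8",
              "This,Is,My,Data", "12,13,14,15", "15,16,17,18", "19,20,21,22")

# Line numbers of the header rows
header_idx <- grep("^This,Is", filtered)

# Lines to skip before each header
skip <- header_idx - 1

# Data rows after each header, up to the next header or the end
data_rows <- diff(c(header_idx, length(filtered) + 1)) - 1

data.frame(header_line = header_idx, skip = skip, data_rows = data_rows)
```

The headers sit on lines 1 and 4, so the skip values are 0 and 3, and the groups hold 2 and 3 data rows. Since read.csv treats nrows as a maximum, any value at least as large as the number of remaining rows gives the same result for the last group.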
Step 3: Read the data in
Use the data range information to specify how many lines to skip before reading, and how many lines to read.
myData <- lapply(seq_along(HeaderRows),
                 function(i) read.csv(text = A, header = TRUE,
                                      nrows = N[i], skip = HeaderRows[i]))
myData
# [[1]]
#   This Is My Data
# 1    1  2  3    4
# 2    5  6  7    8
#
# [[2]]
#   This Is My Data
# 1   12 13 14   15
# 2   15 16 17   18
# 3   19 20 21   22
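If the skip/nrows bookkeeping feels fiddly, a possible alternative is to split the filtered lines into groups at each header and read every group on its own. This is a sketch rather than a drop-in replacement: the names filtered, is_header, by_group and myData2 are made up here, and it assumes every group starts with the same header line.

```r
# The sample data after the comment lines have been dropped
filtered <- c("This,Is,My,Data", "1,2,3,4", "5,6,7,8",
              "This,Is,My,Data", "12,13,14,15", "15,16,17,18", "19,20,21,22")

# TRUE on header lines; the running total then labels each group 1, 2, ...
is_header <- grepl("^This,Is", filtered)
by_group <- split(filtered, cumsum(is_header))

# Each group starts with its own header, so it can be read directly
myData2 <- lapply(by_group, function(lines) read.csv(text = lines, header = TRUE))
myData2
```

The result is a list of data.frames named "1" and "2", one per header found.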
If you want all of these in one data.frame instead of a list, use:
final <- do.call(rbind, myData)
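After rbind, you can no longer tell which block a row came from. If that matters, one option is to add a group column before combining. The sketch below rebuilds the two example data.frames by hand so it runs on its own; the column name group and the variable tagged are made up for illustration.

```r
# Stand-ins for the two data.frames read in Step 3
myData <- list(
  data.frame(This = c(1, 5), Is = c(2, 6), My = c(3, 7), Data = c(4, 8)),
  data.frame(This = c(12, 15, 19), Is = c(13, 16, 20),
             My = c(14, 17, 21), Data = c(15, 18, 22))
)

# Prepend a column holding each data.frame's position in the list
tagged <- Map(function(d, i) cbind(group = i, d), myData, seq_along(myData))

# Stack everything into a single data.frame
final <- do.call(rbind, tagged)
final
```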