Question

I am trying to read this file (3.8 MB) using its fixed-width structure, as described in the following link.

This command:

a <- read.fwf('~/ccsl.txt',c(2,30,6,2,30,8,10,11,6,8))

Produces an error:

line 37 did not have 10 elements

After reproducing the issue with different values of the skip option, I figured out that the lines causing the problem all contain the "#" symbol.

Is there any way to get around it?


Solution

As @jverzani already commented, the problem is probably that the # sign is often used as the character that starts a comment. Setting the comment.char argument of read.fwf to something other than # could fix the problem. I'll leave my answer below as a more general approach that you can use for any character that causes problems (e.g. the apostrophe in the Dutch city name 's Gravenhage).
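A minimal sketch of the comment.char fix, using a small made-up file rather than the asker's data. The three-column layout (widths 2, 10, 4), the sample lines and the variable names are all assumptions for illustration; read.fwf forwards comment.char to read.table.

```r
# Hypothetical fixed-width data: country code (2), name (10), id (4).
tmp <- tempfile(fileext = ".txt")
writeLines(c("NLAmsterdam 0001",
             "NLUnit #4   0002",   # contains the troublesome '#'
             "BEBrussels  0003"), tmp)

widths <- c(2, 10, 4)

# With the default comment.char = "#", everything after the '#' on the
# second line is treated as a comment, so that line comes up short.
res_default <- try(read.fwf(tmp, widths), silent = TRUE)

# Disabling comment handling makes '#' an ordinary character.
a <- read.fwf(tmp, widths, comment.char = "", stringsAsFactors = FALSE)
print(a)
```

For the file in the question, the equivalent change would be adding comment.char = "" to the original read.fwf call.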

I've had this problem with other symbols as well. The approach I took was simply to replace the # with either nothing or a character that does not trigger the error. In my case replacing the character was no problem, but that might not be possible in yours.

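So my approach would be to delete the symbol that generates the error, or to replace it with another character. Keep in mind that in a fixed-width file, deleting a character shifts every later column on that line, so a one-for-one replacement is the safer choice. This can be done in a text editor (find and replace), in an R script, or with Linux tools such as grep and sed. If you want to do this in an R script, use scan or readLines to read the lines. Once the text is in memory, you can use sub to replace the character (or gsub to replace every occurrence on a line, since sub only replaces the first).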
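The in-R variant might look roughly like this. It is only a sketch: the sample lines, the "~" replacement character and the file paths are made up for illustration, and it assumes "~" does not otherwise occur in the data.

```r
# Hypothetical input file with one line containing '#'.
tmp <- tempfile(fileext = ".txt")
writeLines(c("NLAmsterdam 0001",
             "NLUnit #4   0002"), tmp)

lines <- readLines(tmp)

# fixed = TRUE treats "#" as a literal string, not a regular expression.
# gsub replaces every occurrence; sub would replace only the first per line.
cleaned <- gsub("#", "~", lines, fixed = TRUE)

# A one-for-one swap keeps every line the same width, so the
# fixed-width column positions are unchanged.
stopifnot(all(nchar(cleaned) == nchar(lines)))

# Write the cleaned copy to a new file that read.fwf can read.
clean_path <- tempfile(fileext = ".txt")
writeLines(cleaned, clean_path)
```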

If you cannot permanently replace the character, I would try the following approach: replace it with a placeholder character that does not generate an error, read the data into R using read.fwf, and finally replace the placeholder with the # character again.
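That round trip could be sketched as follows. The sample lines, the widths and the "~" placeholder are assumptions for illustration; the placeholder must be a single character that does not already appear in the data, which the code checks first.

```r
# Hypothetical fixed-width lines: country code (2), name (10), id (4).
lines <- c("NLAmsterdam 0001",
           "NLUnit #4   0002",
           "BEBrussels  0003")

placeholder <- "~"  # assumed not to occur anywhere in the real data
stopifnot(!any(grepl(placeholder, lines, fixed = TRUE)))

# Step 1: mask the '#' with a same-width placeholder.
masked <- gsub("#", placeholder, lines, fixed = TRUE)

# Step 2: read the masked lines; read.fwf accepts a connection.
con <- textConnection(masked)
a <- read.fwf(con, widths = c(2, 10, 4), stringsAsFactors = FALSE)
close(con)

# Step 3: restore the '#' in the character columns only.
is_chr <- vapply(a, is.character, logical(1))
a[is_chr] <- lapply(a[is_chr], function(col) {
  gsub(placeholder, "#", col, fixed = TRUE)
})
print(a)
```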

OTHER TIPS

Following up on the answer above: to get all characters to be read as literals, use both comment.char="" and quote="" (the latter takes care of @PaulHiemstra's problem with single-quotes in Dutch proper nouns) in the call to read.fwf (this is documented in ?read.table).

Licensed under: CC-BY-SA with attribution