How to split unequal columns in R
10-12-2019
Question
I have a data set that should contain 14 columns, but when I read it into R it comes in as only two columns: everything after the first column is read as a single column, with the header names separated by ".".
I read it in using:
dat <- read.table("/data/GER.female.RAWMACH", header = FALSE, sep = "\t")
Below I have provided the output:
head(dat)
V1     V2
TRAIT  MARKER..........ALLELES..FREQ1....RSQR...EFFECT1..OR......STDERR..WALDCHISQ.PVALUE.....LRCHISQ.LRPVAL.NCASES.NCONTROLS
CASE   rs7 T A .9104 .0001 -3.944 0.019 19.634 0.0403 0.8408 0.0403 0.8409 260 446
CASE   rs6 A C .9114 .0002 -2.552 0.078 14.349 0.0316 0.8589 0.0316 0.8589 260 446
CASE   rs9 C T .8444 .0001 2.772 15.985 15.076 0.0338 0.8541 0.0338 0.8542 260 446
CASE   rs5 G A .9164 .0001 -3.683 0.025 18.039 0.0417 0.8382 0.0417 0.8383 260 446
CASE   rs2 T C .5168 .0001 -2.466 0.085 10.811 0.0520 0.8195 0.0520 0.8196 260 446
CASE   rs1 T G .8229 .0002 -1.727 0.178 12.241 0.0199 0.8878 0.0199 0.8878 260 446
I have tried a few things (rewriting the table, colsplit) with no success. What am I missing?
I appreciate any suggestions you may have!
Solution
You thought you had a tab-separated file, but it isn't one. You also DO have a header. Just use the default white-space separator by dropping sep="\t", and set header=TRUE.
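A minimal sketch of that change, using a small temporary file in place of the real data. The file contents and the names tmp, bad and dat are made up for illustration; the stand-in has one tab after the first field and spaces between the rest, which is only a guess at the real layout.

```r
# Hypothetical stand-in for the real file (assumption, not the actual data):
# a tab after the first field, single spaces between the remaining fields.
tmp <- tempfile()
writeLines(c(
  "TRAIT\tMARKER FREQ1 RSQR",
  "CASE\trs7 .9104 .0001",
  "CASE\trs6 .9114 .0002"
), tmp)

# As in the question: splitting only on tabs yields two columns.
bad <- read.table(tmp, header = FALSE, sep = "\t")
ncol(bad)   # 2

# Default separator (any run of white space) plus header = TRUE.
dat <- read.table(tmp, header = TRUE)
ncol(dat)   # 4
names(dat)  # "TRAIT" "MARKER" "FREQ1" "RSQR"
```

This assumes the header has one name per data field. In the sample output shown in the question, ALLELES appears to cover two values per row (for example T A), so the real file may have one more field per row than the header has names. In that case read.table may need skip = 1 together with explicit col.names.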
OTHER TIPS
It's hard to say for sure without more information, but I'm pretty confident that the best way to solve this is to load the table properly in the first place. Unless the data really is structured the way it appears after loading, you're loading it wrong; look at the documentation for read.table and related functions, in particular the sep and header arguments. I'm guessing this will clear up your issue with the data import without requiring after-the-fact cleanup.
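One way to check which separator a file really uses before choosing sep is to look at a few raw lines and count the fields under each candidate. The sketch below does this on a made-up temporary file; the contents and the names tmp, by_tab and by_space are assumptions for illustration.

```r
# Hypothetical stand-in file (assumption): a tab after the first field,
# spaces between the remaining fields.
tmp <- tempfile()
writeLines(c(
  "TRAIT\tMARKER FREQ1 RSQR",
  "CASE\trs7 .9104 .0001",
  "CASE\trs6 .9114 .0002"
), tmp)

# Inspect the raw lines; print() shows tab characters escaped as "\t".
print(readLines(tmp, n = 2))

# Count fields per line under each candidate separator.
by_tab   <- count.fields(tmp, sep = "\t")  # tabs only
by_space <- count.fields(tmp)              # any white space (the default)
by_tab
by_space
```

If the count under one separator matches the number of columns you expect on every line, that is probably the separator to pass to read.table.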