Question

Using R, what is the best way to read a symmetric matrix from a file that omits the upper triangular part. For example,

1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000

I have tried read.table, but haven't been successful.

Was it helpful?

Solution

Here's a read.table and loopless and *apply-less solution:

txt <- "1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000"
 # Could use clipboard or read this from a file as well.
mat <- data.matrix( read.table(text=txt, fill=TRUE, col.names=paste("V", 1:6))  )
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
> mat
        V1    V2    V3    V4    V5    V6 
[1,] 1.000 0.505 0.569 0.602 0.621 0.603
[2,] 0.505 1.000 0.422 0.467 0.482 0.450
[3,] 0.569 0.422 1.000 0.926 0.877 0.878
[4,] 0.602 0.467 0.926 1.000 0.874 0.894
[5,] 0.621 0.482 0.877 0.874 1.000 0.937
[6,] 0.603 0.450 0.878 0.894 0.937 1.000

OTHER TIPS

I copied your text, and then used tt <- file('clipboard','rt') to import it. For a standard file:

tt <- file("yourfile.txt",'rt')
a <- readLines(tt)
b <- strsplit(a,"  ") #insert delimiter here; can use regex
b <- lapply(b,function(x) {
  x <- as.numeric(x)
  length(x) <- max(unlist(lapply(b,length))); 
  return(x)
})
b <- do.call(rbind,b)
b[is.na(b)] <- 0
#kinda kludgy way to get the symmetric matrix
b <- b + t(b) - diag(b[1,1],nrow=dim(b)[1],ncol=dim(b)[2]

I'm posting but I like Blue Magister's approach wat better. But maybe there's something in this that's of use.

mat <- readLines(n=6)
1.000
.505  1.000
.569  .422  1.000
.602  .467  .926  1.000
.621  .482  .877  .874  1.000
.603  .450  .878  .894  .937  1.000

nmat <- lapply(mat, function(x) unlist(strsplit(x, "\\s+")))
lens <- sapply(nmat, length)
dlen <- max(lens) -lens
bmat <- lapply(seq_along(nmat), function(i) {
    as.numeric(c(nmat[[i]], rep(NA, dlen[i])))
})
mat <- do.call(rbind, bmat)
mat[upper.tri(mat)] <- t(mat)[upper.tri(mat)]
mat

Here is an approach which also works if the dimensions of the matrix are unknown.

# read file as a vector
mat <- scan("file.txt", what = numeric())

# calculate the number of columns (and rows)
ncol <- (sqrt(8 * length(mat) + 1) - 1) / 2

# index of the diagonal values
diag_idx <- cumsum(seq.int(ncol))

# generate split index
split_idx <- cummax(sequence(seq.int(ncol)))
split_idx[diag_idx] <- split_idx[diag_idx] - 1

# split vector into list of rows
splitted_rows <- split(mat, f = split_idx)

# generate matrix
mat_full <- suppressWarnings(do.call(rbind, splitted_rows))
mat_full[upper.tri(mat_full)] <- t(mat_full)[upper.tri(mat_full)]


   [,1]  [,2]  [,3]  [,4]  [,5]  [,6]
0 1.000 0.505 0.569 0.602 0.621 0.603
1 0.505 1.000 0.422 0.467 0.482 0.450
2 0.569 0.422 1.000 0.926 0.877 0.878
3 0.602 0.467 0.926 1.000 0.874 0.894
4 0.621 0.482 0.877 0.874 1.000 0.937
5 0.603 0.450 0.878 0.894 0.937 1.000
Licensed under: CC-BY-SA with attribution
Not affiliated with StackOverflow
scroll top