## Create matrix using pairwise calculations between columns in R

Question

New to R, and in over my head!

I am trying to write code that will combine the following steps:

a) Find the minimum values, per row, between two columns

b) Sum the minimum values found

c) Do this among many columns and construct a pairwise matrix of the results
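Reading steps a to c together, the 32 x 32 result is presumably the matrix $M$ below, where $x_{ri}$ is the value in row $r$ of column $i$ (7 rows here):

$$
M_{ij} = \sum_{r=1}^{7} \min(x_{ri}, x_{rj}), \qquad i, j = 1, \dots, 32
$$

Because $\min(a, b) = \min(b, a)$, $M$ is symmetric. Because $\min(a, a) = a$, its diagonal holds the plain column sums, $M_{ii} = \sum_{r} x_{ri}$.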

Steps a and b are easy enough for two columns at a time, like this:

```r
column1 = c(0.08,   0.20,   0.09,   0.19,   0.25,   0.20,   0.00)
column2 = c(0.07,   0.19,   0.09,   0.21,   0.25,   0.19,   0.00)
ps = data.frame(column1, column2)

sum(pmin(ps$column1, ps$column2))
```
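To make steps a and b concrete, here is the same calculation on the question's data split into its two steps. The printed results were worked out by hand: 0.07 + 0.19 + 0.09 + 0.19 + 0.25 + 0.19 + 0.00 = 0.98.

```r
column1 <- c(0.08, 0.20, 0.09, 0.19, 0.25, 0.20, 0.00)
column2 <- c(0.07, 0.19, 0.09, 0.21, 0.25, 0.19, 0.00)
ps <- data.frame(column1, column2)

# Step a: element-wise (per-row) minimum of the two columns
row_min <- pmin(ps$column1, ps$column2)
row_min
## [1] 0.07 0.19 0.09 0.19 0.25 0.19 0.00

# Step b: add up the per-row minimums
sum(row_min)
## [1] 0.98
```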

But for step c, I am having difficulty writing code that will perform this operation for every pairwise column comparison in a data frame with 7 rows and 32 columns. This is what I've come up with so far:

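```r
# 7 x 32 matrix of random numbers
d <- replicate(32, rnorm(7))
# All column pairs (i < j), one pair per column of c
c <- combn(seq_len(ncol(d)), 2)
mat1 <- matrix(0, ncol = 32, nrow = 32, dimnames = list(colnames(d), colnames(d)))
# For each pair: count the rows where both columns are non-zero
v1 <- unlist(lapply(seq_len(ncol(c)), function(i) {
  d1 <- d[, c[, i]]
  length(which(d1[, 1] != 0 & d1[, 2] != 0))
}))

mat1[lower.tri(mat1)] <- v1
```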

I am pretty sure my issue lies in the function I pass to `lapply()` to compute `v1`, but I'm stumped and could really use a bit of help!

Again, my goal is to have a 32x32 matrix of the summed minimum values between each pairwise column comparison.

Does this make sense?

Thank you so much.


## Answers (2)

1. I think you could try the following (it's a simplistic approach, I have to admit):

```r
column1 = c(0.08,   0.20,   0.09,   0.19,   0.25,   0.20,   0.00)
column2 = c(0.07,   0.19,   0.09,   0.21,   0.25,   0.19,   0.00)
column3 = c(0.05,   0.49,   0.39,   0.1,   0.5,   0.11,   0.01)
ps = data.frame(column1, column2, column3)

# Empty (all-NA) square matrix: one row and one column per column of ps
res <- matrix(nrow = ncol(ps), ncol = ncol(ps))

# Fill the diagonal and upper triangle (j >= i) with the summed per-row minimums
for (i in 1:ncol(ps)) {
  for (j in i:ncol(ps)) {
    res[i, j] <- sum(pmin(ps[, i], ps[, j]))
  }
}
```

Since the matrix is symmetric, you can then copy the upper triangle into the lower one:

```r
res[lower.tri(res)] <- t(res)[lower.tri(res)]
```

(One thing I also learnt, thanks to a comment from @Aaron: `res[lower.tri(res)] <- res[upper.tri(res)]` does not work in general, because R extracts and fills the triangle elements column by column, so the two sides don't line up.)
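To see why the naive copy fails while the `t()` version works, here is a small sketch on a hypothetical 4 x 4 matrix whose upper triangle (and diagonal) holds the values 1 to 10. Both `upper.tri()` and `lower.tri()` return elements in column-major order, so the k-th upper element is generally not the mirror image of the k-th lower element. With three columns, as in the example above, the two orders happen to coincide, so the problem only shows up from four columns on.

```r
# Only the upper triangle and diagonal are filled, mimicking `res` after the loop
m <- matrix(NA_real_, nrow = 4, ncol = 4)
m[upper.tri(m, diag = TRUE)] <- 1:10

# Naive copy: k-th upper element goes to the k-th lower position
bad <- m
bad[lower.tri(bad)] <- bad[upper.tri(bad)]
c(bad[4, 1], bad[1, 4])   # an entry and its mirror
## [1] 5 7
isSymmetric(bad)
## [1] FALSE

# Transposing first lines the elements up: t(good)[r, c] is good[c, r]
good <- m
good[lower.tri(good)] <- t(good)[lower.tri(good)]
isSymmetric(good)
## [1] TRUE
```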

Alternatively (again thanks to Aaron), you can fill both triangles inside the loop and skip that last step:

```r
for (i in 1:ncol(ps)) {
  for (j in i:ncol(ps)) {
    # Write the value into both (i, j) and its mirror (j, i)
    res[i, j] <- res[j, i] <- sum(pmin(ps[, i], ps[, j]))
  }
}
```
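If this is needed more than once, the loop above can be wrapped in a function that also labels rows and columns with the data frame's column names, which the question's `dimnames = list(colnames(d), colnames(d))` seems to be after. The name `pairwise_summin` is hypothetical; this is just a sketch around the same double loop.

```r
# Hypothetical helper: symmetric matrix of summed per-row minimums
pairwise_summin <- function(df) {
  n <- ncol(df)
  # Label rows and columns with the data frame's column names
  res <- matrix(NA_real_, nrow = n, ncol = n,
                dimnames = list(names(df), names(df)))
  for (i in seq_len(n)) {
    for (j in i:n) {
      # Fill (i, j) and its mirror (j, i) in one step
      res[i, j] <- res[j, i] <- sum(pmin(df[[i]], df[[j]]))
    }
  }
  res
}

column1 <- c(0.08, 0.20, 0.09, 0.19, 0.25, 0.20, 0.00)
column2 <- c(0.07, 0.19, 0.09, 0.21, 0.25, 0.19, 0.00)
column3 <- c(0.05, 0.49, 0.39, 0.1, 0.5, 0.11, 0.01)
ps <- data.frame(column1, column2, column3)

r <- pairwise_summin(ps)
r["column1", "column2"]  # same value as r["column2", "column1"]
```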

2. The `outer` function will do this and handle the bookkeeping for you, but you have to pass it a vectorized function.

```r
# Using the two-column ps from the question
summin <- Vectorize(function(i, j) sum(pmin(ps[[i]], ps[[j]])))
outer(seq_len(ncol(ps)), seq_len(ncol(ps)), FUN = summin)
##      [,1] [,2]
## [1,] 1.01 0.98
## [2,] 0.98 1.00
```
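As a sanity check at the question's full size, the sketch below runs the `outer()` version on a random 7 x 32 data frame (a stand-in for the real data, not the asker's values) and checks two properties that should hold for any input: the diagonal equals the column sums, since `min(x, x)` is `x`, and the matrix is symmetric.

```r
set.seed(1)  # reproducible random data for the example
# 7 x 32 data frame of random numbers standing in for the real data
ps <- as.data.frame(replicate(32, rnorm(7)))

summin <- Vectorize(function(i, j) sum(pmin(ps[[i]], ps[[j]])))
m <- outer(seq_len(ncol(ps)), seq_len(ncol(ps)), FUN = summin)
dimnames(m) <- list(names(ps), names(ps))  # V1 ... V32 from as.data.frame()

dim(m)
## [1] 32 32
# Diagonal holds the column sums
all.equal(unname(diag(m)), unname(colSums(ps)))
## [1] TRUE
# min() is symmetric in its arguments, so m is too
isSymmetric(m)
## [1] TRUE
```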

I have no idea what's supposed to be going on in your `v1` code; it counts the rows where both columns are non-zero rather than summing the minimums.
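For comparison, here is a sketch that keeps the question's own `combn()` plus `lower.tri()` structure and only changes what `v1` computes. It relies on `combn()` listing pairs in the same column-major order that `lower.tri()` uses: pairs (1,2), (1,3), ..., (2,3), ... map to lower-triangle cells (2,1), (3,1), ..., (3,2), .... The random data and column names are stand-ins, and `c` is renamed to `pairs` only to avoid reusing the name of base R's `c()`.

```r
set.seed(1)  # reproducible random stand-in for the real 7 x 32 data
d <- replicate(32, rnorm(7))                     # 7 x 32 matrix
colnames(d) <- paste0("col", seq_len(ncol(d)))   # hypothetical names

pairs <- combn(seq_len(ncol(d)), 2)  # 2 x 496 matrix, one column per pair (i < j)
mat1 <- matrix(0, ncol = ncol(d), nrow = ncol(d),
               dimnames = list(colnames(d), colnames(d)))

# v1: summed per-row minimum for each pair, instead of a count of non-zeros
v1 <- vapply(seq_len(ncol(pairs)), function(k) {
  d1 <- d[, pairs[, k]]
  sum(pmin(d1[, 1], d1[, 2]))
}, numeric(1))

# combn() order matches lower.tri() order, so this lands each pair at (j, i)
mat1[lower.tri(mat1)] <- v1
# Mirror into the upper triangle; the diagonal is each column's own sum
mat1[upper.tri(mat1)] <- t(mat1)[upper.tri(mat1)]
diag(mat1) <- colSums(d)
```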

If I were going to loop myself, I'd use `expand.grid` instead of `combn`, since then I get the diagonal and don't have to figure out how to populate both halves of the matrix, at the expense of doing every off-diagonal computation twice. (The computer can do it twice faster than I can figure out how to ask it to do it only once, anyway.) I'd also just build the result as a vector and convert it to a matrix afterwards.

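```r
# All (i, j) index pairs, including i == j; the first column varies fastest
cc <- expand.grid(seq_len(ncol(d)), seq_len(ncol(d)))
out <- sapply(seq_len(nrow(cc)), function(k) {
  i <- cc[k, 1]
  j <- cc[k, 2]
  # d[, i] selects a column from a matrix or a data frame;
  # d[[i]] on a matrix would return a single element instead
  sum(pmin(d[, i], d[, j]))
})
# Filled column by column, so out[k] lands at row cc[k, 1], column cc[k, 2]
out <- matrix(out, ncol = ncol(d))
```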
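A loop-free alternative, which neither answer uses, takes advantage of `pmin()` keeping the dimensions of its first argument and recycling a shorter vector. When `d` is a numeric 7 x 32 matrix, `pmin(d, d[, i])` compares column `i` against every column at once, and `colSums()` then gives all 32 sums for that column. This sketch assumes `d` is a matrix; a data frame would need `as.matrix()` first.

```r
set.seed(1)  # reproducible random stand-in for the real 7 x 32 data
d <- replicate(32, rnorm(7))  # 7 x 32 numeric matrix

# Column i of the result: pmin(d, d[, i]) keeps d's dimensions, and the
# length-7 vector d[, i] is recycled down each column, so column j of that
# matrix holds pmin(d[, j], d[, i]); colSums() then sums each column.
out2 <- sapply(seq_len(ncol(d)), function(i) colSums(pmin(d, d[, i])))
dim(out2)
## [1] 32 32
```

The trade-off is memory for speed: each call builds a full 7 x 32 temporary matrix, which is negligible at this size.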