## calculate ratio of two factors for each visit using dplyr

Question

I'd like to use `dplyr` to calculate a vector of ratios of two factors for each visit of a subject. The mock data can be created below:

``````subj = c(rep("A", 10), rep("B", 4), rep("C", 6))
vist = c(rep(c("C0", "C1", "C2", "C3", "C4"), each=2),
rep(c("C0", "C1"), each=2),
rep(c("C0", "C1", "C2"), each=2))
factor = c(rep(c("L", "N"), 5), rep(c("L", "N"), 2), rep(c("L", "N"), 3))
set.seed(111)
aval = round(rnorm(n = 20, 0, 1), 2)

dat = data.frame(subj, vist, factor, aval, stringsAsFactors = FALSE)
dat
``````

Which looks like:

``````   subj vist factor  aval
1     A   C0      L  0.24
2     A   C0      N -0.33
3     A   C1      L -0.31
4     A   C1      N -2.30
5     A   C2      L -0.17
6     A   C2      N  0.14
7     A   C3      L -1.50
8     A   C3      N -1.01
9     A   C4      L -0.95
10    A   C4      N -0.49
11    B   C0      L -0.17
12    B   C0      N -0.41
13    B   C1      L  1.85
14    B   C1      N  0.39
15    C   C0      L  0.80
16    C   C0      N -1.57
17    C   C1      L -0.09
18    C   C1      N -0.36
19    C   C2      L -1.19
20    C   C2      N  0.36
``````

What is needed is the ratio of value (`aval`) for the factors (`factor`) "N" over "L", for each subject (`subj`) for each visit (`vist`). For example, the first ratio value would be `-1.375`, coming from `-0.33/0.24`. Thanks!

Show source

## Answers to calculate ratio of two factors for each visit using dplyr ( 4 )

1. You can reshape the data with `spread` from the tidyr package, then it's easy to calculate a new column:

``````library(tidyr)
library(dplyr)
dat %>%
mutate(ratio = N/L)

subj vist     L     N      ratio
1     A   C0  0.24 -0.33 -1.3750000
2     A   C1 -0.31 -2.30  7.4193548
3     A   C2 -0.17  0.14 -0.8235294
4     A   C3 -1.50 -1.01  0.6733333
5     A   C4 -0.95 -0.49  0.5157895
6     B   C0 -0.17 -0.41  2.4117647
7     B   C1  1.85  0.39  0.2108108
8     C   C0  0.80 -1.57 -1.9625000
9     C   C1 -0.09 -0.36  4.0000000
10    C   C2 -1.19  0.36 -0.3025210
``````
2. If there is only one `N` and `L` per group, you can do:

``````dat %>%
group_by(subj, vist) %>%
summarise(ratio = aval[factor == "N"]/aval[factor == "L"])

#Source: local data frame [10 x 3]
#Groups: subj [?]

#    subj  vist      ratio
#   <chr> <chr>      <dbl>
#1      A    C0 -1.3750000
#2      A    C1  7.4193548
#3      A    C2 -0.8235294
#4      A    C3  0.6733333
#5      A    C4  0.5157895
#6      B    C0  2.4117647
#7      B    C1  0.2108108
#8      C    C0 -1.9625000
#9      C    C1  4.0000000
#10     C    C2 -0.3025210
``````
3. In base R, you can use `aggregate` to construct the summary of the ratios or `ave` to fill in these ratios into your original data.frame. This assumes that the data.frame is regular and ordered correctly.

``````aggregate(dat\$aval, dat[c("subj", "vist")], FUN=function(x) x[2] / x[1])
subj vist          x
1     A   C0 -1.3750000
2     B   C0  2.4117647
3     C   C0 -1.9625000
4     A   C1  7.4193548
5     B   C1  0.2108108
6     C   C1  4.0000000
7     A   C2 -0.8235294
8     C   C2 -0.3025210
9     A   C3  0.6733333
10    A   C4  0.5157895
``````

or

``````dat\$rat <- ave(dat\$aval, dat\$subj, dat\$vist, FUN=function(x) x[2] / x[1])
``````

will add it as a variable.

4. If these are in the same order and have exactly one pair per each 'subj', 'vist'

``````dat\$ratio <- rep(dat\$aval[c(FALSE, TRUE)]/dat\$aval[c( TRUE, FALSE)], each = 2)
dat\$ratio
#[1] -1.3750000 -1.3750000  7.4193548  7.4193548 -0.8235294 -0.8235294
#[7]  0.6733333  0.6733333  0.5157895  0.5157895  2.4117647  2.4117647
#[13]  0.2108108  0.2108108 -1.9625000 -1.9625000  4.0000000  4.0000000
#[19] -0.3025210 -0.3025210
``````