calculate ratio of two factors for each visit using dplyr

Question

I'd like to use dplyr to calculate a vector of ratios of two factors for each visit of a subject. The mock data can be created below:

subj = c(rep("A", 10), rep("B", 4), rep("C", 6))
vist = c(rep(c("C0", "C1", "C2", "C3", "C4"), each=2),
         rep(c("C0", "C1"), each=2),
         rep(c("C0", "C1", "C2"), each=2))
factor = c(rep(c("L", "N"), 5), rep(c("L", "N"), 2), rep(c("L", "N"), 3))
set.seed(111)
aval = round(rnorm(n = 20, 0, 1), 2)

dat = data.frame(subj, vist, factor, aval, stringsAsFactors = FALSE)
dat

Which looks like:

   subj vist factor  aval
1     A   C0      L  0.24
2     A   C0      N -0.33
3     A   C1      L -0.31
4     A   C1      N -2.30
5     A   C2      L -0.17
6     A   C2      N  0.14
7     A   C3      L -1.50
8     A   C3      N -1.01
9     A   C4      L -0.95
10    A   C4      N -0.49
11    B   C0      L -0.17
12    B   C0      N -0.41
13    B   C1      L  1.85
14    B   C1      N  0.39
15    C   C0      L  0.80
16    C   C0      N -1.57
17    C   C1      L -0.09
18    C   C1      N -0.36
19    C   C2      L -1.19
20    C   C2      N  0.36

What is needed is the ratio of value (aval) for the factors (factor) "N" over "L", for each subject (subj) for each visit (vist). For example, the first ratio value would be -1.375, coming from -0.33/0.24. Thanks!


Show source
| R   | dplyr   2017-01-06 15:01 4 Answers

Answers ( 4 )

  1. 2017-01-06 16:01

    You can reshape the data with spread from the tidyr package, then it's easy to calculate a new column:

    library(tidyr)
    library(dplyr)
    dat %>%
      spread(factor, aval) %>%
      mutate(ratio = N/L)
    
    
       subj vist     L     N      ratio
    1     A   C0  0.24 -0.33 -1.3750000
    2     A   C1 -0.31 -2.30  7.4193548
    3     A   C2 -0.17  0.14 -0.8235294
    4     A   C3 -1.50 -1.01  0.6733333
    5     A   C4 -0.95 -0.49  0.5157895
    6     B   C0 -0.17 -0.41  2.4117647
    7     B   C1  1.85  0.39  0.2108108
    8     C   C0  0.80 -1.57 -1.9625000
    9     C   C1 -0.09 -0.36  4.0000000
    10    C   C2 -1.19  0.36 -0.3025210
    
  2. 2017-01-06 16:01

    If there is only one N and L per group, you can do:

    dat %>% 
           group_by(subj, vist) %>% 
           summarise(ratio = aval[factor == "N"]/aval[factor == "L"])
    
    #Source: local data frame [10 x 3]
    #Groups: subj [?]
    
    #    subj  vist      ratio
    #   <chr> <chr>      <dbl>
    #1      A    C0 -1.3750000
    #2      A    C1  7.4193548
    #3      A    C2 -0.8235294
    #4      A    C3  0.6733333
    #5      A    C4  0.5157895
    #6      B    C0  2.4117647
    #7      B    C1  0.2108108
    #8      C    C0 -1.9625000
    #9      C    C1  4.0000000
    #10     C    C2 -0.3025210
    
  3. 2017-01-06 16:01

    In base R, you can use aggregate to construct the summary of the ratios or ave to fill in these ratios into your original data.frame. This assumes that the data.frame is regular and ordered correctly.

    aggregate(dat$aval, dat[c("subj", "vist")], FUN=function(x) x[2] / x[1])
       subj vist          x
    1     A   C0 -1.3750000
    2     B   C0  2.4117647
    3     C   C0 -1.9625000
    4     A   C1  7.4193548
    5     B   C1  0.2108108
    6     C   C1  4.0000000
    7     A   C2 -0.8235294
    8     C   C2 -0.3025210
    9     A   C3  0.6733333
    10    A   C4  0.5157895
    

    or

    dat$rat <- ave(dat$aval, dat$subj, dat$vist, FUN=function(x) x[2] / x[1])
    

    will add it as a variable.

  4. 2017-01-06 16:01

    If these are in the same order and have exactly one pair per each 'subj', 'vist'

    dat$ratio <- rep(dat$aval[c(FALSE, TRUE)]/dat$aval[c( TRUE, FALSE)], each = 2)
    dat$ratio
    #[1] -1.3750000 -1.3750000  7.4193548  7.4193548 -0.8235294 -0.8235294
    #[7]  0.6733333  0.6733333  0.5157895  0.5157895  2.4117647  2.4117647
    #[13]  0.2108108  0.2108108 -1.9625000 -1.9625000  4.0000000  4.0000000
    #[19] -0.3025210 -0.3025210
    
◀ Go back