Tag Archives: R

Survival analysis in R

Last time I showed how you can use SAS to perform survival analysis, but SAS cost money. Many people are going to R because it is just as powerful and you do not have to pay for it.

OK, here I will use the same data and see if we can create the same results.

  1. External data file. The file should have the following data. Notice that I have not figured out how to do the loop like in SAS so we have to spell out each mite. The data here is mite being injected with dsRNA of GFP or another gene and their survival status (1=dead, o=censored, meaning the mite was still alive at the end of the experiment).

trt    age    dead
gfp    24    1
gfp    48    1
gfp    48    1
gfp    96    1
gfp    96    1
gfp    96    1
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
gfp    96    0
rpl8    24    1
rpl8    24    1
rpl8    24    1
rpl8    24    1
rpl8    24    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    48    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    1
rpl8    96    0
rpl8    96    0
rpl8    96    0

save this file as a csv format and name it “varroa.csv” (the default file path is in “My Document” for windows 7).

2. Inside R,

> library(survival)

This command loads the survival package.

> test <- read.csv(file="varroa.csv",head=TRUE, sep=",")

This line reads the file  and stores the data into “test”.

> survdiff(Surv(age,dead)~trt, data=test, rho=0)

 

“rho=0” is used for Log-Rank or Mantel-Haenszel test, while “rho=1” is used for the Peto & Peto modification of the Gehan-Wilcoxon test.

This produces the output, which is identical to the Log-Rank=28.3 outputed by SAS.

          N Observed Expected (O-E)^2/E (O-E)^2/V
trt=gfp  26        6     20.2      9.99      28.3
trt=rpl8 45       42     27.8      7.27      28.3

 Chisq= 28.3  on 1 degrees of freedom, p= 1.01e-07

 

Changing rho=1 produced a Chisq=24.4 (SAS had Wilcoxon 24.3984).

I also tested the same data set using SPSS. Again the same results. so now I can reassured that the three major software all produce the same results.

The next task is to figure out how to produce the same data by a loop like SAS did, so I do not need to generate 200 lines of numbers in excel for the 200 bees that did not die in an experiment…