Survival analysis in R: time-saving method 2

This post continues from last time: generating the “long data form” required for survival analysis in R (or in SAS or SPSS, for that matter) from the “short data form”, which is easier to type in and less error-prone.

Using “times=N” last time was not very neat, so here is another method: a second loop nested inside the first one, which runs over rows 1 to 8 of the short data. There is no need, for example, to call data.frame or to transpose a matrix.

sa-short-data.csv contains the following data (only 8 lines!).

trt   age    N  censor
gfp    24    1       1
gfp    48    2       1
gfp    96    3       1
gfp    96   20       0
rpl8   24    5       1
rpl8   48   12       1
rpl8   96   25       1
rpl8   96    3       0
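
For anyone who wants to follow along without typing the file by hand, here is a minimal sketch that writes the same eight rows to sa-short-data.csv. It assumes the file is comma-separated with a header row (which is what the read.csv call in this post expects); the object name `short` is only for illustration.

```r
# Short-form data copied from the table above.
short <- data.frame(
  trt    = c("gfp", "gfp", "gfp", "gfp", "rpl8", "rpl8", "rpl8", "rpl8"),
  age    = c(24, 48, 96, 96, 24, 48, 96, 96),
  N      = c(1, 2, 3, 20, 5, 12, 25, 3),
  censor = c(1, 1, 1, 0, 1, 1, 1, 0)
)

# Write a comma-separated file with a header row and no row-name column,
# so that N is still column 3 when the file is read back.
write.csv(short, file = "sa-short-data.csv", row.names = FALSE)

# Number of rows the long form should end up with.
total_n <- sum(short$N)
total_n   # 71
```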


test <- read.csv(file="sa-short-data.csv", header=TRUE, sep=",")

test2=NULL

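for (i in 1:nrow(test))
{
  # Repeat row i "N" times (column 3 of test), keeping only columns 1, 2 and 4
  # (trt, age, censor) in the new data frame test2.
  # seq_len() gives zero iterations if N is 0, whereas 1:0 would loop twice.
  for (j in seq_len(test[i,3])) test2=rbind(test2, test[i,c(1,2,4)])
}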

test2 now contains the correct output:
> test2
     trt age censor
1    gfp  24      1
2    gfp  48      1
21   gfp  48      1
3    gfp  96      1
31   gfp  96      1
32   gfp  96      1
4    gfp  96      0
41   gfp  96      0
42   gfp  96      0
43   gfp  96      0
44   gfp  96      0
45   gfp  96      0
46   gfp  96      0
47   gfp  96      0
48   gfp  96      0
49   gfp  96      0
410  gfp  96      0
411  gfp  96      0
412  gfp  96      0
413  gfp  96      0
414  gfp  96      0
415  gfp  96      0
416  gfp  96      0
417  gfp  96      0
418  gfp  96      0
419  gfp  96      0
5   rpl8  24      1
51  rpl8  24      1
52  rpl8  24      1
53  rpl8  24      1
54  rpl8  24      1
6   rpl8  48      1
61  rpl8  48      1
62  rpl8  48      1
63  rpl8  48      1
64  rpl8  48      1
65  rpl8  48      1
66  rpl8  48      1
67  rpl8  48      1
68  rpl8  48      1
69  rpl8  48      1
610 rpl8  48      1
611 rpl8  48      1
7   rpl8  96      1
71  rpl8  96      1
72  rpl8  96      1
73  rpl8  96      1
74  rpl8  96      1
75  rpl8  96      1
76  rpl8  96      1
77  rpl8  96      1
78  rpl8  96      1
79  rpl8  96      1
710 rpl8  96      1
711 rpl8  96      1
712 rpl8  96      1
713 rpl8  96      1
714 rpl8  96      1
715 rpl8  96      1
716 rpl8  96      1
717 rpl8  96      1
718 rpl8  96      1
719 rpl8  96      1
720 rpl8  96      1
721 rpl8  96      1
722 rpl8  96      1
723 rpl8  96      1
724 rpl8  96      1
8   rpl8  96      0
81  rpl8  96      0
82  rpl8  96      0

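The only oddity is the first column, which does not run from 1 to 71 but shows these weird numbers. They are row names that rbind builds from the original row numbers (2, 21; 3, 31, 32; and so on), not data, so they do not affect the survival analysis.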
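
If the labels are distracting, they can be reset. The sketch below uses a three-row stand-in for the short data and the same nested loop; the names `short` and `long` are made up for this example. Setting the row names to NULL makes R fall back to plain consecutive numbering.

```r
# Stand-in for the first three rows of the short-form data (hypothetical names).
short <- data.frame(
  trt    = c("gfp", "gfp", "gfp"),
  age    = c(24, 48, 96),
  N      = c(1, 2, 3),
  censor = c(1, 1, 1)
)

# Same nested-loop expansion as in the post.
long <- NULL
for (i in seq_len(nrow(short))) {
  for (j in seq_len(short[i, "N"])) {
    long <- rbind(long, short[i, c("trt", "age", "censor")])
  }
}

# Labels derived from the source row numbers.
rownames(long)

# Dropping the row names restores the default consecutive numbering.
rownames(long) <- NULL
rownames(long)
```

The same one-liner, rownames(test2) <- NULL, should work on the full test2.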

> library(survival)   # provides Surv() and survdiff()
> survdiff(Surv(age,censor)~trt, data=test2, rho=0)
Call:
survdiff(formula = Surv(age, censor) ~ trt, data = test2, rho = 0)

          N Observed Expected (O-E)^2/E (O-E)^2/V
trt=gfp  26        6     20.2      9.99      28.3
trt=rpl8 45       42     27.8      7.27      28.3

Chisq= 28.3  on 1 degrees of freedom, p= 1.01e-07 
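
As a quick cross-check, the N and Observed columns of this table can be recomputed straight from the short-form data, without expanding anything. This is only a sketch: it assumes censor = 1 marks an observed event and 0 a censored individual (the usual 0/1 status coding for Surv()), and the object names are made up for the example.

```r
# Short-form data as given at the top of the post.
short <- data.frame(
  trt    = c("gfp", "gfp", "gfp", "gfp", "rpl8", "rpl8", "rpl8", "rpl8"),
  age    = c(24, 48, 96, 96, 24, 48, 96, 96),
  N      = c(1, 2, 3, 20, 5, 12, 25, 3),
  censor = c(1, 1, 1, 0, 1, 1, 1, 0)
)

# Individuals per treatment: should match the N column (26 and 45).
n_by_trt <- tapply(short$N, short$trt, sum)
n_by_trt

# Events per treatment (rows with censor == 1): should match Observed (6 and 42).
events_by_trt <- tapply(short$N * short$censor, short$trt, sum)
events_by_trt
```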
Author: Zachary Huang
