This post continues from last time: generating the “long data form” that R requires (as does any other survival analysis software, such as SAS or SPSS) from the “short data form”, which is easier to type in and less error-prone.
Using “times=N” last time was not very neat, so here is another method: a second loop nested inside the first loop over rows 1 to 8. No need to call data.frame or transpose a matrix, for example.
sa-short-data.csv contains the following data (only 8 lines!).
trt age N censor
gfp 24 1 1
gfp 48 2 1
gfp 96 3 1
gfp 96 20 0
rpl8 24 5 1
rpl8 48 12 1
rpl8 96 25 1
rpl8 96 3 0
test <- read.csv(file="sa-short-data.csv", header=TRUE, sep=",")
test2=NULL
for (i in 1:nrow(test))
{
for (j in 1:test[i,3]) test2=rbind(test2, test[i,c(1,2,4)])
# the inner loop runs N times (column 3 of test) and each time appends columns 1, 2 and 4 (trt, age, censor) of row i to the new data frame test2.
}
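One thing to watch with the loop above: 1:N counts down when N is 0 (1:0 is 1, 0), so a row with zero subjects would be added twice. Below is a minimal sketch of the same nested loop using seq_len instead. It assumes the eight rows are typed inline rather than read from sa-short-data.csv, and it selects columns by name rather than by position; both choices are mine, not part of the original script.

```r
# Same eight rows as sa-short-data.csv, typed inline so no file is needed.
test <- data.frame(
  trt    = c("gfp", "gfp", "gfp", "gfp", "rpl8", "rpl8", "rpl8", "rpl8"),
  age    = c(24, 48, 96, 96, 24, 48, 96, 96),
  N      = c(1, 2, 3, 20, 5, 12, 25, 3),
  censor = c(1, 1, 1, 0, 1, 1, 1, 0)
)

test2 <- NULL
for (i in seq_len(nrow(test))) {
  # seq_len(0) is empty, so a row with N = 0 adds nothing;
  # 1:0 would instead count down (1, 0) and add the row twice.
  for (j in seq_len(test$N[i])) {
    test2 <- rbind(test2, test[i, c("trt", "age", "censor")])
  }
}

nrow(test2) == sum(test$N)   # one long-form row per subject
```

Growing a data frame with rbind inside a loop is slow for large N, but for a table of this size it does not matter.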
test2 now produces the correct output:
>test2
trt age censor
1 gfp 24 1
2 gfp 48 1
21 gfp 48 1
3 gfp 96 1
31 gfp 96 1
32 gfp 96 1
4 gfp 96 0
41 gfp 96 0
42 gfp 96 0
43 gfp 96 0
44 gfp 96 0
45 gfp 96 0
46 gfp 96 0
47 gfp 96 0
48 gfp 96 0
49 gfp 96 0
410 gfp 96 0
411 gfp 96 0
412 gfp 96 0
413 gfp 96 0
414 gfp 96 0
415 gfp 96 0
416 gfp 96 0
417 gfp 96 0
418 gfp 96 0
419 gfp 96 0
5 rpl8 24 1
51 rpl8 24 1
52 rpl8 24 1
53 rpl8 24 1
54 rpl8 24 1
6 rpl8 48 1
61 rpl8 48 1
62 rpl8 48 1
63 rpl8 48 1
64 rpl8 48 1
65 rpl8 48 1
66 rpl8 48 1
67 rpl8 48 1
68 rpl8 48 1
69 rpl8 48 1
610 rpl8 48 1
611 rpl8 48 1
7 rpl8 96 1
71 rpl8 96 1
72 rpl8 96 1
73 rpl8 96 1
74 rpl8 96 1
75 rpl8 96 1
76 rpl8 96 1
77 rpl8 96 1
78 rpl8 96 1
79 rpl8 96 1
710 rpl8 96 1
711 rpl8 96 1
712 rpl8 96 1
713 rpl8 96 1
714 rpl8 96 1
715 rpl8 96 1
716 rpl8 96 1
717 rpl8 96 1
718 rpl8 96 1
719 rpl8 96 1
720 rpl8 96 1
721 rpl8 96 1
722 rpl8 96 1
723 rpl8 96 1
724 rpl8 96 1
8 rpl8 96 0
81 rpl8 96 0
82 rpl8 96 0
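The only oddity is that the first column (the row names) does not run from 1 to 71 but shows these weird numbers, which rbind generates to keep the row names unique. This does not affect the survival analysis, though.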
>survdiff(Surv(age,censor)~trt, data=test2, rho=0)
Call:
survdiff(formula = Surv(age, censor) ~ trt, data = test2, rho = 0)

          N Observed Expected (O-E)^2/E (O-E)^2/V
trt=gfp  26        6     20.2      9.99      28.3
trt=rpl8 45       42     27.8      7.27      28.3

 Chisq= 28.3  on 1 degrees of freedom, p= 1.01e-07
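If the odd row names bother you, they can be reset. The sketch below is one way to check that doing so leaves the log-rank test untouched. It assumes the survival package is installed (the post does not show the library call), types the data inline, and rebuilds the long table by row indexing as a stand-in for the nested loop, so the odd labels look like "2.1" rather than "21".

```r
library(survival)  # provides Surv() and survdiff(); not shown in the post

# The eight short-form rows, typed inline so no file is needed.
test <- data.frame(
  trt    = rep(c("gfp", "rpl8"), each = 4),
  age    = rep(c(24, 48, 96, 96), times = 2),
  N      = c(1, 2, 3, 20, 5, 12, 25, 3),
  censor = rep(c(1, 1, 1, 0), times = 2)
)

# Stand-in for the nested loop: repeat row i N[i] times by indexing.
test2 <- test[rep(seq_len(nrow(test)), times = test$N),
              c("trt", "age", "censor")]
head(rownames(test2))      # non-sequential labels such as "2.1"

fit_before <- survdiff(Surv(age, censor) ~ trt, data = test2, rho = 0)

rownames(test2) <- NULL    # reset the labels to 1, 2, ..., nrow(test2)
fit_after <- survdiff(Surv(age, censor) ~ trt, data = test2, rho = 0)

# Row names are only labels, so the test statistic is unchanged.
all.equal(fit_before$chisq, fit_after$chisq)
```

The reset is purely cosmetic: survdiff only looks at the age, censor and trt columns.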
