This post continues from last time: generating the “long data form” required by R (or by any survival analysis software, for that matter, such as SAS or SPSS) from the “short data form”, which is easier to type in and less error-prone.
Using “times=N” last time was not very neat, so here is another method: a second loop nested inside the first loop over rows 1 to 8. There is no need to use data.frame, for example, or to transpose a matrix.
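sa-short-data.csv contains the following data (only 8 rows!). N is the number of individuals each row stands for, and censor is 1 when the event was observed and 0 when the observation was censored.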
trt age N censor
gfp 24 1 1
gfp 48 2 1
gfp 96 3 1
gfp 96 20 0
rpl8 24 5 1
rpl8 48 12 1
rpl8 96 25 1
rpl8 96 3 0
test <- read.csv(file="sa-short-data.csv", header=TRUE, sep=",")
test2 = NULL
for (i in 1:nrow(test)) {
  # repeat row i "N" times (column 3 of test), keeping columns 1, 2 and 4 in the new data frame test2
  for (j in 1:test[i,3])
    test2 = rbind(test2, test[i,c(1,2,4)])
}

test2 now produces the correct output:

>test2
     trt age censor
1    gfp  24      1
2    gfp  48      1
21   gfp  48      1
3    gfp  96      1
31   gfp  96      1
32   gfp  96      1
4    gfp  96      0
41   gfp  96      0
42   gfp  96      0
43   gfp  96      0
44   gfp  96      0
45   gfp  96      0
46   gfp  96      0
47   gfp  96      0
48   gfp  96      0
49   gfp  96      0
410  gfp  96      0
411  gfp  96      0
412  gfp  96      0
413  gfp  96      0
414  gfp  96      0
415  gfp  96      0
416  gfp  96      0
417  gfp  96      0
418  gfp  96      0
419  gfp  96      0
5   rpl8  24      1
51  rpl8  24      1
52  rpl8  24      1
53  rpl8  24      1
54  rpl8  24      1
6   rpl8  48      1
61  rpl8  48      1
62  rpl8  48      1
63  rpl8  48      1
64  rpl8  48      1
65  rpl8  48      1
66  rpl8  48      1
67  rpl8  48      1
68  rpl8  48      1
69  rpl8  48      1
610 rpl8  48      1
611 rpl8  48      1
7   rpl8  96      1
71  rpl8  96      1
72  rpl8  96      1
73  rpl8  96      1
74  rpl8  96      1
75  rpl8  96      1
76  rpl8  96      1
77  rpl8  96      1
78  rpl8  96      1
79  rpl8  96      1
710 rpl8  96      1
711 rpl8  96      1
712 rpl8  96      1
713 rpl8  96      1
714 rpl8  96      1
715 rpl8  96      1
716 rpl8  96      1
717 rpl8  96      1
718 rpl8  96      1
719 rpl8  96      1
720 rpl8  96      1
721 rpl8  96      1
722 rpl8  96      1
723 rpl8  96      1
724 rpl8  96      1
8   rpl8  96      0
81  rpl8  96      0
82  rpl8  96      0
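except that the first column (the row names) does not run from 1 to 71 but shows these weird numbers: rbind keeps each source row's name and appends a counter to keep duplicates unique, so "41" is the first extra copy of row 4. It does not affect the survival analysis, though.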
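As a minimal sketch of how the row names could be tidied up, the block below repeats the same double loop on an inline copy of the eight rows (so it runs without the CSV file) and then resets the row names. The use of seq_len() in place of 1:n and the final sanity check are my additions, not part of the original method; seq_len() is only there so that a row with N = 0 would be skipped rather than looped over twice.

```r
# Inline copy of the eight rows shown above (assumption: same values as the CSV).
test <- data.frame(
  trt    = c("gfp", "gfp", "gfp", "gfp", "rpl8", "rpl8", "rpl8", "rpl8"),
  age    = c(24, 48, 96, 96, 24, 48, 96, 96),
  N      = c(1, 2, 3, 20, 5, 12, 25, 3),
  censor = c(1, 1, 1, 0, 1, 1, 1, 0)
)

# Same double loop as in the post, with columns picked by name.
# seq_len(0) is empty, whereas 1:0 would count down through 1 and 0.
test2 <- NULL
for (i in seq_len(nrow(test))) {
  for (j in seq_len(test$N[i])) {
    test2 <- rbind(test2, test[i, c("trt", "age", "censor")])
  }
}

# Drop the row names built up by rbind() so they run 1, 2, 3, ...
rownames(test2) <- NULL

# Sanity check: the long form has one row per individual.
stopifnot(nrow(test2) == sum(test$N))
head(test2)
```

Growing a data frame with rbind inside a loop is fine for 71 rows; for much larger tables it gets slow, and collecting the pieces in a list first would be worth considering.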
>library(survival)
>survdiff(Surv(age,censor)~trt, data=test2, rho=0)
Call:
survdiff(formula = Surv(age, censor) ~ trt, data = test2, rho = 0)

          N Observed Expected (O-E)^2/E (O-E)^2/V
trt=gfp  26        6     20.2      9.99      28.3
trt=rpl8 45       42     27.8      7.27      28.3

 Chisq= 28.3  on 1 degrees of freedom, p= 1.01e-07
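Parts of this table can be cross-checked without the long form at all. The sketch below, again on an inline copy of the short data, sums N per treatment to get the N column and sums N over the rows with censor = 1 to get the Observed column, then recomputes the p-value from the printed chi-square statistic. The variable names are mine, and because 28.3 is a rounded value the recomputed p-value will be close to, but not exactly, the one printed above.

```r
# Inline copy of the short data (assumption: same values as the CSV).
short <- data.frame(
  trt    = c("gfp", "gfp", "gfp", "gfp", "rpl8", "rpl8", "rpl8", "rpl8"),
  age    = c(24, 48, 96, 96, 24, 48, 96, 96),
  N      = c(1, 2, 3, 20, 5, 12, 25, 3),
  censor = c(1, 1, 1, 0, 1, 1, 1, 0)
)

# Group sizes: total number of individuals per treatment.
n_by_trt <- tapply(short$N, short$trt, sum)

# Observed events: individuals in rows where censor == 1.
obs_by_trt <- tapply(short$N * short$censor, short$trt, sum)

print(n_by_trt)
print(obs_by_trt)

# Upper-tail chi-square probability for the rounded statistic printed by survdiff().
p <- pchisq(28.3, df = 1, lower.tail = FALSE)
print(p)
```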