
CME 




Year : 2010  Volume
: 35
 Issue : 2  Page : 217221 

Censoring in clinical trials: Review of survival analysis techniques
Shankar Prinja^{1}, Nidhi Gupta^{2}, Ramesh Verma^{3}
^{1} London School of Hygiene and Tropical Medicine, Keppel Street, London, UK ^{2} Department of Radiotherapy, Government Medical College, Chandigarh, India ^{3} Department of Community Medicine, Pt BDS PGIMS, Rohtak, Haryana, India
Date of Submission  25Apr2009 
Date of Acceptance  25Mar2010 
Date of Web Publication  29Jul2010 
Correspondence Address: Shankar Prinja G427, GH2, Sector 5, Mansa Devi Complex, Panchkula  134 109, Haryana, India
Source of Support: None, Conflict of Interest: None  Check 
DOI: 10.4103/09700218.66859
How to cite this article: Prinja S, Gupta N, Verma R. Censoring in clinical trials: Review of survival analysis techniques. Indian J Community Med 2010;35:21721 
How to cite this URL: Prinja S, Gupta N, Verma R. Censoring in clinical trials: Review of survival analysis techniques. Indian J Community Med [serial online] 2010 [cited 2020 Nov 25];35:21721. Available from: https://www.ijcm.org.in/text.asp?2010/35/2/217/66859 
Introduction   
Medical and epidemiological studies are mostly conducted with an interest in measuring the occurrence of an outcome event. Studies conducting survival analysis, however, are focussed toward measuring time to event or outcome. Time to event could vary from time to fatal event i.e. death, or time to occurrence of a clinical endpoint such as disease, or attainment of a biochemical marker. Survival analysis studies originated with the publication of John Graunt's Weekly Bills of Mortality in London. First life table was prepared by Healy. ^{[1]} Survival analysis is also known as lifetime data analysis, time to event analysis, reliability and event history analysis depending on focus and stream where it is used. However, survival analysis is plagued by problem of censoring in design of clinical trials which renders routine methods of determination of central tendency redundant in computation of average survival time. The present essay attempts to highlight different methods of survival analysis used to estimate time to event in studies based on individual patient level data in the presence of censoring. Section 2 highlights types of censoring encountered in a clinical trial, its types and potential statistical solutions. Survival analysis techniques, its assumptions and suitability of methods under different data conditions are illustrated in sections 3 and 4. The next section 5 discusses the importance of techniques to extrapolate estimate of life expectancy derived over a period of time exceeding the duration of trial followup. Lastly, section 6 cites limitations and advantages of different methods and finally concludes by indicating possible future areas of research and practice for health economists and public health professionals. We reviewed articles published in PubMed, Science Direct and Ovid search engines using "censoring in clinical trials", "survival analysis" and "Kaplan Meier method" as key words. After a total of 213 articles retrieved, articles focussing only on methodology aspect were considered for the present review. Original articles were preferred following by subsequent discussion articles, which added substantially to the methodology.
Censoring in Clinical Trials   
Censoring is said to be present when information on time to outcome event is not available for all study participants. Participant is said to be censored when information on time to event is not available due to loss to followup or nonoccurrence of outcome event before the trial end. Broadly classifying two types of censoring are encountered, i.e. point and interval censoring. ^{[2]}
Point censoring is said to occur when despite continuous monitoring of outcome event, the patient is lost to followup or the event does not occur within the study duration. It is also known as right censoring which can be either endofstudy censoring or losstofollowup censoring. An individual is said to be left censored if the patient had been on risk for disease for a period before entering the study. However, left censoring is usually not a problem in clinical trials, since starting point is defined by an event such as entry of patient in trial, randomization or occurrence of a procedure or treatment. Individuals B and C are right censored while individual F is left censored [Figure 1].
Problem of interval censoring arises when time to event may be known only up to a time interval. This situation occurs in case the assessment of monitoring is done at a periodical frequency. This is illustrated by a hypothetical study done to ascertain the incubation period of AIDS after occurrence of HIV infection [Figure 2]. Practically, most observational studies dealing with nonlethal outcomes have periodical examination schedules and are thus interval censored. However, if the periodicity of examination is at a justified frequency, interval censored data can be dealt with as point censored. ^{[3]}
Statisticians have devised various methods to deal with censored data which includes complete data analysis, imputation techniques or analysis based on dichotomized data. ^{[2]} However, these methods are laden with problems and complexities for others. Detailed discussion of each of these methods is beyond the scope of the present essay due to space constraint; however, it is important to bear in mind the techniques available. The more effective methods that are widely used in survival studies encountering censored data are likelihoodbased approaches (survival analysis methods) which adjust for the occurrence of censoring in each observation, and thus are advantageous that it uses all available information.
Survival Analysis Techniques   
Survival analysis techniques used for dealing with censored data can be broadly classified into nonparamteric (Kaplan Meier product limit method), parametric (Weibull and exponential methods) and semiparamteric method (Coxproportional hazards method). The latter two can also be applied as regressionbased models. However, usual likelihoodbased functions of whole sample which are product of individual likelihood functions cannot be applied in the presence of complex censoring mechanisms, especially in the presence of both loss to followup and end of study censoring. In such situation, a joint distribution of survival and censoring times can be done. However, the distribution of survival time in such situation is considered as nonidentifiable. Hence, the assumption of independent censoring is imperative. ^{[2]} This assumption implies that time to censoring and survival times are independent. In other words, censoring is independent of unusual high or low risk for occurrence of event which implies that survival times for censored and uncensored individuals is same and removal of censored individuals from analysis would yield an unbiased estimate of survival time or time to event.
NonParametric Survival Analysis: Kaplan Meier Product Limit Method   
This method computes the probability of dying at a certain point of time conditional to the survival up to that point. ^{[4]} It utilizes the information of censored individuals till the point when the patient is censored. Thus it maximizes utilization of available information on time to event of the study sample. The equation used to derive survival probability at time 't' is derived by the following equation:
where d _{t} /n _{t} represents the probability of dying at time 't' conditional to being at risk (alive) at 't1' time.
The censored individuals are excluded from the denominator of 'at risk' individuals at the point when they are censored, however, are included at each preceding point. They are not included in numerator at any point. [Table 1] and [Table 2] illustrate a hypothetical dataset and computation of survival probability using the Kaplan Meier estimator, respectively.
Censoring affects the shape of survival curve in a situation when a large number of individuals are censored at a single point of time leading to sudden spurious large jumps or large flat section in survival curve. ^{[5]} Other factors leading to such spurious jump includes extremely low number of individuals at risk especially toward the end of the study or prearranged clinic visit schedule. Statistical significance of difference between two or more survival curves is determined by the logrank test.
Reliability of the different portions of survival curve is dependent on the number of individuals at risk at that stage. Majority of studies dealing with analysis of survival time are likely to have some individuals for which outcome event and thus time to event or outcome is not recorded. This is in view of paucity of resources (money and time) to carry forward the study till outcomes (especially fatal outcomes) are recorded for each and every study individual. To have a measure of maturity of data thus becomes imperative to show quality of data in terms of adequacy of individuals for which outcome event and thus time to event is recorded. Simpler measures for maturity include average (median) followup period. However, a more robust graphical technique involves constructing a survival curve by reversing censoring i.e. labelling patients died as censored and viceversa and computing time to followup for this dataset. Thus median time to followup is estimated using this technique which deals with censoring in the same manner as for time to survival. ^{[5]}
Parametric Survival Analysis: Weibull and Exponential Methods   
As in the nonparametric approaches in the analysis of time to event data, the models under parametric approach derive estimates of failure time statistics while accounting for the presence of censoring in the data. The main difference between the two approaches is that the latter attempt to derive estimates using a parametric model, which male specific assumptions about the distribution of failure time through assuming a particular functional form for the hazard rate. This functional form can specify the hazard rate as a function of time or can incorporate covariate information in which case the hazard rate is specified as a function of time and specific covariates. In this way, failure time is related to a set of covariates thus leading to a regression approach.
Parametric methods of survival analysis assume distribution of hazard rate as a function of time besides assumption of independent censoring. ^{[5]} Hazard rate is the instantaneous probability of dying in next short interval conditional upon having survived till time 't'. The expression d _{t} /n _{t} in Kaplan Meier product limit survival equation is hazard rate. The denominator n _{t} includes total followup time till time 't' for both individuals who are atrisk and censored. Numerator d _{t} is the number of individuals who die/ experience outcome event during the short time interval after 't'. Censoring plays a similar role in the models as in the case of the nonparametric hazard and the condition of independent censoring. The censoring is adjusted for in parametric models by incorporation of hazard rate which uses similar Kaplan Meier estimator. When the model incorporates covariates, the condition of independent censoring is assumed in the presence of covariates.
Weibull method assumes that hazard rate changes (increases or decreases) monotonically with time. Hazard rate and other parameters for Weibull distribution are shown in [Table 1]. Since hazard rate is time dependent, it increases for all values of P>1, decreases for p0 <1 and is constant for P=1 [Table 3].
Exponential distribution is a special case of Weibull distribution for which p0 =1, implying that the hazard rate does not vary with time. A constant value of hazard rate leads to a constant probability of dying, unchanged with passage of study duration or time. This highlights the fact that the probability of dying during any given interval is dependent only on the width of time interval, with directly proportional relationship. On the contrary, it is unaffected by the point of beginning and end of time interval.
It is important to ascertain the distribution of hazard rate for any dataset. Since the exponential distribution assumes constant hazard rate, this can be written as;
which can also be written as
Log [S(t)]= λt or  Log [S(t)]= λt. Taking logarithms on each side,
Log{ Log [S(t)]}= logλ + log t
This is an equation of straight line, with slope 'b' equal to 1. Thus complementary log transformation i.e. Log{ Log [S(t)]}, against log t, is an effective tool for assessing constancy of hazard rate implying exponential distribution. ^{[5]}
SemiParametric Survival Analysis Method: CoxProportional Hazard Method   
The parametric models discussed above specify a specific form for the hazard function over time. As such these models fully characterize the distribution of failure time as a function of time (and covariates). If however, interest is primarily in determining how a new treatment affects survival compared to an old treatment, then a functional form for the hazard over time need not be specified. This leads to a semiparametric regression models used to describe survival time in a comparative sense, the most commonly adopted of which is the Coxproportional hazards model.
Coxproportional hazard model is not a pure parametric method since it does not assume any functional form of distribution of hazard rate. However, it assumes that hazard function of any two individuals is proportional with the ratio being determined by the covariates that is constant over time. ^{[6],[7]} Clearly, if one is unsure of the functional form of hazard function, adopting a semiparametric approach would be the preferred alternative to imposing specific parametric assumptions. Considering the semiparametric assumption, the ratio of hazard functions for any two individuals is constant over time and is determined by the individuals' covariate values. The latter is nothing but an expression of relative risk.
The proportionality assumption of semiparametric method can be assessed either graphically by plotting survival curves or by means of a test statistic, ^{[8]} dependent on which the null hypothesis is rejected if the p0 value is less than significance level. In graphic representation, proportionality assumption holds true if the two survival curves do not cross each other. Alternatively, complementary log transformation, Log{ Log [S(t)]}, is useful. Plot of the double log of two survival curves i.e. Log{ Log [S(1)]} and Log{ Log [S(2)]}is a straight line contingent to correctness of proportionality assumption.
Besides, the parametric and semiparametric methods described above, rank regression methods have been used to deal with time to event data in the presence of censoring. Asymptotic normality of estimators is established under certain assumptions. This method makes use of martingale theory and a tightness lemma for stochastic integrals of multiparameter empirical process. ^{[9]}
Suitability of Survival Analysis Methods   
Nonparametric methods i.e. Kaplan Meier product limit method and the Coxproportional hazards model are the most commonly used methods for survival analysis in public health literature. Kaplan Meier method is suitable when no assumption about the functional distribution of hazard rate with time is made. However, since no assumption is made about hazard rate, no extrapolation of study results beyond the study period is possible. Parametric methods overcome this disadvantage and are used for extrapolation of study results beyond the study period discussed in subsequent section. Parametric and semiparametric methods also help in ascertaining the influence of covariates on survival times. However, it is inappropriate to assume an exponential distribution universally, especially in case of survival rates following birth and survival rate after a major surgery, contingent upon knowledge that mortality rates are higher in infancy and in immediate postoperative period, respectively, with gradual decline thereafter. Semiparametric methods can be useful in such situations. Coxproportional method has the additional advantage of being able to ascertain influence of covariates such as demographic factors, nature of diseases and treatment received on hazard/survival rate.
An advantage of Coxproportional hazard model over the logrank test following Kaplan Meier method is the ability to use the former with continuous and categorical (after creation of dummy variable) variables. ^{[5]} On the other hand the logrank test cannot deal with continuous variables.
Extrapolation of Survival Estimates Beyond Study FollowUp   
Epidemiological studies estimating survival are generally conducted for a short followup period which is contingent upon recruitment of sufficient number of individuals to provide adequate study endpoints which powers the study to ascertain difference in influence of covariates on time to event at a given significance level. However, despite this relatively shorter followup, it is relevant in many cases to extrapolate results at longer time duration. This is especially pertinent in economic evaluations where time horizon has to be sufficiently large, in many cases lifetime horizon, to guide decisionmakers for making sound decisions. The abovementioned statement particularly applies to preventive programs where benefits of a program start appearing and are yielded over longer time duration.
Extrapolation of results requires an assumption of how the present results would vary in course of time i.e. function of hazard rate over time. It may assume that the impact of covariate on hazard rate remains the same after the followup period; or it ceases to occur; or an intermediate situation. Alternatively a parametric method may be used to model observed data to extrapolate results. This is an application of Weibull and exponential methods. Lastly, survival curves from other observational studies are matched with present study data. Data from observational study which matches the present survival curve maximally is used to model the results beyond followup period.
Conclusion   
We acknowledge that this article does not cover indepth certain finer points related to methods for survival analysis. Nonetheless, the article can serve as a good note for the beginners who are interested to learn survival analysis. The present essay discusses the role of survival analysis techniques in individual level patient data amidst censoring which have been widely used by health economists, public health professionals, social and behavioral scientists. Right censoring is primarily dealt with by the application of these survival analysis methods, while interval censoring has been dealt with by statisticians using imputation techniques. Left censoring is usually not a problem in thoughtfully designed clinical trials since starting point or beginning of risk period is defined by an event such as randomization or performance of an intervention.
Independence of censoring (or noninformative censoring) is the most important assumption in all methods. However, this assumption may also be null in situations termed as 'frailty' i.e. when time between two recurrent events (for instance repeated infections) is the measure of interest, some individuals may have higher chances or an unknown tendency to have recurrent events. ^{[10]}
Nonparamteric methods such as Kaplan Meier, Log Rank and Wilcoxon tests are the most commonly used methods for survival analysis. Improved versions with application of prognostic covariate information are being applied which make Kaplan Meier estimates much more robust in the presence of censoring. ^{[11]} Impact of covariates affecting the survival rates are commonly dealt with using Coxproportional hazards method. Recent development of specialized softwares has generated greater interest in application of parametric methods which offer additional advantages and allow extrapolation of results beyond the study followup period.
References   
1.  Lee ET, Go OT. Survival analysis in public health research. Annu Rev Public Health 1997;18:10534. [PUBMED] [FULLTEXT] 
2.  Leung KM, Elashoff RM, Afifi AA. Censoring issues in survival analysis. Annu Rev Public Health 1997;18:83104. [PUBMED] [FULLTEXT] 
3.  de Gruttola V, Lagakos SW. Analysis of doubly censored survival data with application to AIDS. Biometrics 1989;45:111. [PUBMED] 
4.  Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc 1958;53:45781. 
5.  Machin D, Cheung YB, Parmar MK. Survival analysis a practical approach. 2 ^{nd} ed. West Sussex: John Wiley and Sons Ltd; 2006. 
6.  Cox DR. Regression models with lifetables (with discussion). J R Stat Soc Series B Stat Methodol 1972;34:26976. 
7.  Breslow NE. Contribution to discussion of paper by DR Cox. J R Stat Soc Series B Stat Methodol 1972;34:2167. 
8.  Grambsch PM, Therneau TM. Proportional hazards tests and diagnostics based on weighted residuals. Biometrika 1994;81:51526. 
9.  Lai L, Ying Z. Rank Regression Methods for left truncated and right censored data. Ann Stat 1991;19:5316. 
10.  McGilchrist CA, Aisbett CW. Regression with frailty in survival analysis. Biometrics 1991;47:4616. [PUBMED] 
11.  Murray S, Tsiatis AA. Nonparamteric survival estimation using prognostic longitudinal covariates. Biometrics 1996;52:13751. [PUBMED] 
[Figure 1], [Figure 2]
[Table 1], [Table 2], [Table 3]
This article has been cited by  1 
Bayesian Perspective on Random Censored Survival Data 

 Chris B. Guure,Samuel Bosomprah   International Scholarly Research Notices. 2014; 2014: 1   [Pubmed]  [DOI]   2 
result 1 Document Reinfection of Dogs with FishBorne Zoonotic Trematodes in Northern Vietnam following a Single Treatment with Praziquantel 

 Nissen, S., Nguyen, L.A.T., Thamsborg, S.M., Dalsgaard, A., Van Johansen, M.   Source of the Document PLoS Neglected Tropical Diseases. 2014;   [Pubmed]  


