What is survival analysis?




















In addition, individual references for the methods are presented throughout the series. Several introductory texts also describe the basis of survival analysis, for example, Altman and Piantadosi. In many medical studies, time to death is the event of interest. However, in cancer, another important measure is the time between response to treatment and recurrence, known as relapse-free survival time (also called disease-free survival time).

It is important to state what the event is and when the period of observation starts and finishes. For example, we may be interested in relapse in the time period between a confirmed response and the first relapse of cancer.

The specific difficulties relating to survival analysis arise largely from the fact that only some individuals have experienced the event and, consequently, survival times will be unknown for a subset of the study group. This phenomenon is called censoring and it may arise in the following ways: (a) a patient has not yet experienced the relevant outcome, such as relapse or death, by the time of the close of the study; (b) a patient is lost to follow-up during the study period; (c) a patient experiences a different event that makes further follow-up impossible.

Such censored survival times underestimate the true (but unknown) time to event. Visualising the survival process of an individual as a time-line, their event (assuming it were to occur) is beyond the end of the follow-up period. This situation is often called right censoring. Censoring can also occur if we observe the presence of a state or condition but do not know when it began. For example, consider a study investigating the time to recurrence of a cancer following surgical removal of the primary tumour.

If the patients were examined 3 months after surgery to determine recurrence, then those who had a recurrence would have a survival time that was left censored because the actual time of recurrence occurred less than 3 months after surgery. Event time data may also be interval censored, meaning that individuals come in and out of observation. If we consider the previous example and patients are also examined at 6 months, then those who are disease free at 3 months and lost to follow-up between 3 and 6 months are considered interval censored.

Most survival data include right censored observations, but methods for interval and left censored data are available (Hosmer and Lemeshow). In the remainder of this paper, we will consider right censored data only. In general, the feature of censoring means that special methods of analysis are needed, and standard graphical methods of data exploration and presentation, notably scatter diagrams, cannot be used.

This data set relates to patients diagnosed with primary epithelial ovarian carcinoma between January and December at the Western General Hospital in Edinburgh.

Follow-up data were available up until the end of December , by which time Figure 1 shows data from 10 patients diagnosed in the early s and illustrates how patient profiles in calendar time are converted to time-to-event (death) data. Figure 1 (left) shows that four patients had a nonfatal relapse, one was lost to follow-up, and seven patients died (five from ovarian cancer).

In the other plot, the data are presented in the format for a survival analysis where all-cause mortality is the event of interest. It is important to note that because overall mortality is the event of interest, nonfatal relapses are ignored, and those who have not died are considered right censored. Figure 1 (right) is specific to the outcome or event of interest. Here, death from any cause, often called overall survival, was the outcome of interest. If we were interested solely in ovarian cancer deaths, then patients 5 and 6 (those who died from nonovarian causes) would be censored.

In general, it is good practice to choose an end-point that cannot be misclassified. All-cause mortality is a more robust end-point than a specific cause of death. If we were interested in time to relapse, those who did not have a relapse (fatal or nonfatal) would be censored at either the date of death or the date of last follow-up.
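The conversion from calendar dates to analysis time can be sketched in a few lines of Python. This is only an illustration, assuming all-cause death is the event of interest: the function name `to_survival_format`, its parameters and the example dates are hypothetical and are not taken from the ovarian cancer data set.

```python
from datetime import date


def to_survival_format(diagnosis, death=None, last_seen=None, study_end=None):
    """Return (time_in_days, event) with all-cause death as the event.

    event is 1 when a death is observed within follow-up and 0 when the
    patient is right censored. Nonfatal relapses are ignored on purpose.
    """
    if death is not None and (study_end is None or death <= study_end):
        return (death - diagnosis).days, 1
    # Not known to have died during follow-up: censor at the earlier of
    # the last contact date and the close of the study.
    candidates = [d for d in (last_seen, study_end) if d is not None]
    if not candidates:
        raise ValueError("need a death, last-contact or study-end date")
    return (min(candidates) - diagnosis).days, 0


# Hypothetical patients (invented dates).
study_end = date(2001, 12, 31)
died = to_survival_format(date(2000, 1, 1), death=date(2000, 12, 31),
                          study_end=study_end)
lost = to_survival_format(date(2000, 3, 1), last_seen=date(2000, 3, 31),
                          study_end=study_end)
alive = to_survival_format(date(2001, 1, 1), study_end=study_end)
print(died, lost, alive)
```

Changing the event of interest (for example, to cause-specific death) would only change which patients receive `event = 1`; the time axis is built in the same way.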

Figure 1: Converting calendar time in the ovarian cancer study to a survival analysis format.

These data originate from a phase III clinical trial of patients with surgically resected non-small cell lung cancer, randomised between and to receive radiotherapy either with or without adjuvant combination platinum-based chemotherapy (Lung Cancer Study Group; Piantadosi). For the purposes of this series, we will focus on the time to first relapse (including death from lung cancer).

Table 1 gives the time of the earliest 15 and latest five relapses for each treatment group, where it can be seen that some patients were alive and relapse-free at the end of the study. The relapse proportions in the radiotherapy and combination arms were However, these figures are potentially misleading as they ignore the duration spent in remission before these events occurred.

Table 1: A sample of times (days) to relapse among patients randomised to receive radiotherapy with or without adjuvant chemotherapy.

Survival data are generally described and modelled in terms of two related probabilities, namely survival and hazard.

The survival probability (which is also called the survivor function) S(t) is the probability that an individual survives from the time origin to a specified future time t. It is fundamental to a survival analysis because survival probabilities for different values of t provide crucial summary information from time to event data. These values describe directly the survival experience of a study cohort.

The hazard, usually denoted h(t), is the probability that an individual who is under observation at time t has an event at that time. Put another way, it represents the instantaneous event rate for an individual who has already survived to time t. Note that, in contrast to the survivor function, which focuses on not having an event, the hazard function focuses on the event occurring. It is of interest because it provides insight into the conditional failure rates and provides a vehicle for specifying a survival model. In summary, the hazard relates to the incident (current) event rate, while survival reflects the cumulative non-occurrence.
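For readers who prefer symbols, the link between the two quantities can be written out as follows. This is the standard continuous-time formulation rather than anything specific to this series; T (the random event time) and H(t) (the cumulative hazard) are notation introduced here for the illustration.

```latex
h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t},
\qquad
H(t) = \int_0^t h(u)\,du,
\qquad
S(t) = P(T > t) = \exp\{-H(t)\}.
```

A high hazard over an interval therefore translates into a steep fall in S(t) over that interval.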

The survival probability can be estimated nonparametrically from observed survival times, both censored and uncensored, using the KM (or product-limit) method (Kaplan and Meier). As events are assumed to occur independently of one another, the probabilities of surviving from one interval to the next may be multiplied together to give the cumulative survival probability.

The value of S(t) is constant between times of events, and therefore the estimated probability is a step function that changes value only at the time of each event. This estimator allows each patient to contribute information to the calculations for as long as they are known to be event-free. Were every individual to experience the event (i.e. no censoring), the estimator would reduce to the proportion of individuals still event-free at time t. Confidence intervals for the survival probability can also be calculated. The KM survival curve, a plot of the KM survival probability against time, provides a useful summary of the data that can be used to estimate measures such as median survival time.
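The multiplication of interval-by-interval probabilities is usually written as a recursion. In the sketch below, the symbols are assumptions introduced for illustration: t_1 < t_2 < ... are the distinct observed event times, n_j is the number of individuals still at risk just before t_j, and d_j is the number of events at t_j.

```latex
\hat{S}(t_j) = \hat{S}(t_{j-1})\left(1 - \frac{d_j}{n_j}\right),
\qquad \hat{S}(0) = 1,
\qquad\text{so that}\qquad
\hat{S}(t) = \prod_{j:\, t_j \le t} \left(1 - \frac{d_j}{n_j}\right).
```

Censored individuals never contribute to d_j, but they do count in n_j for every event time up to the point at which they were censored.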

The large skew encountered in the distribution of most survival data is the reason that the mean is not often used. Table 2 shows the essential features of the KM survival probability.

The estimator at any point in time is obtained by multiplying a sequence of conditional survival probabilities, with the estimate being unchanged between subsequent event times.

For example, the probability of a member of the radiotherapy alone treatment group surviving relapse-free 45 days is the probability of surviving the first 36 days multiplied by the probability of then surviving the interval between 36 and 45 days. The latter is a conditional probability as the patient needs to have survived the first period of time in order to remain in the study for the second.
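The product of conditional probabilities can be checked with a small, dependency-free Python sketch. The function names `kaplan_meier` and `median_survival` and the eight follow-up times below are invented for illustration; they are not the lung cancer trial data. Individuals censored at an event time are assumed to be still at risk at that time.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Return [(t, n_at_risk, n_events, S(t))] for each distinct event time.

    times  : follow-up time for each individual
    events : 1 if the event was observed at that time, 0 if censored
    """
    n_events = Counter(t for t, e in zip(times, events) if e == 1)
    n_leaving = Counter(times)  # events and censorings both leave the risk set
    n_at_risk = len(times)
    survival = 1.0
    table = []
    for t in sorted(n_leaving):
        d = n_events.get(t, 0)
        if d > 0:
            # Conditional probability of surviving past t, given at risk at t.
            survival *= 1 - d / n_at_risk
            table.append((t, n_at_risk, d, survival))
        n_at_risk -= n_leaving[t]
    return table


def median_survival(table):
    """First event time at which the estimate is at or below 0.5, else None."""
    for t, _, _, s in table:
        if s <= 0.5:
            return t
    return None


# Hypothetical times in days; 0 marks a right-censored observation.
times = [36, 45, 50, 61, 61, 70, 88, 90]
events = [1, 1, 0, 1, 1, 0, 1, 0]
for row in kaplan_meier(times, events):
    print(row)
print("median:", median_survival(kaplan_meier(times, events)))
```

With these made-up data, the estimate at 45 days is (7/8) × (6/7) = 0.75, which is exactly the "survive the first interval, then the second" product described above.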

The KM estimator utilises this fact by dividing the time axis up according to event times and estimating the event probability in each division, from which the overall estimate of the survivorship is drawn.

Calculation of the relapse-free survival probability for patients in the lung cancer trial.

There are a total of deaths observed among 5, participants. Descriptive statistics are shown below on the age and sex of participants at the start of the study, classified by whether they die or do not die during the follow-up period.

We now estimate a Cox proportional hazards regression model and relate an indicator of male sex and age, in years, to time to death. The parameter estimates are generated in SAS using the SAS Cox proportional hazards regression procedure 12 and are shown below along with their p-values. Note that there is a positive association between age and all-cause mortality and between male sex and all-cause mortality (i.e. the risk of death is higher for older participants and for men).

Again, the parameter estimates represent the increase in the expected log of the relative hazard for each one unit increase in the predictor, holding other predictors constant. There is a 0. For interpretability, we compute hazard ratios by exponentiating the parameter estimates. For age, exp 0. There is an Similarly, exp 0.

The expected hazard is 1. Suppose we consider additional risk factors for all-cause mortality and estimate a Cox proportional hazards regression model relating an expanded set of risk factors to time to death. The parameter estimates are again generated in SAS using the SAS Cox proportional hazards regression procedure and are shown below along with their p-values. All of the parameter estimates are estimated taking the other predictors into account. After accounting for age, sex, blood pressure and smoking status, there are no statistically significant associations between total serum cholesterol and all-cause mortality or between diabetes and all-cause mortality.

This is not to say that these risk factors are not associated with all-cause mortality; their lack of significance is likely due to confounding interrelationships among the risk factors considered. Notice that for the statistically significant risk factors i.

A prospective cohort study is run to assess the association between body mass index and time to incident cardiovascular disease (CVD).

At baseline, participants' body mass index is measured along with other known clinical risk factors for cardiovascular disease e. Participants are followed for up to 10 years for the development of CVD. In a Cox proportional hazards regression analysis, we find the association between BMI and time to CVD statistically significant with a parameter estimate of 0. If we exponentiate the parameter estimate, we have a hazard ratio of 1. Because we model BMI as a continuous predictor, the interpretation of the hazard ratio for CVD is relative to a one unit change in BMI (recall that BMI is measured as the ratio of weight in kilograms to height in meters squared).
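How a coefficient for a continuous predictor turns into a hazard ratio can be sketched as below. The coefficient value 0.02 is a made-up placeholder, not the estimate from this study, and the helper names `hazard_ratio` and `percent_change` are hypothetical.

```python
import math


def hazard_ratio(beta, delta=1.0):
    """Hazard ratio for a change of `delta` units in a predictor with
    Cox regression coefficient `beta`: exp(beta * delta)."""
    return math.exp(beta * delta)


def percent_change(beta, delta=1.0):
    """The same effect, expressed as a percentage change in the hazard."""
    return 100 * (hazard_ratio(beta, delta) - 1)


beta_bmi = 0.02  # hypothetical coefficient, for illustration only
print(round(hazard_ratio(beta_bmi), 4))        # per 1-unit increase in BMI
print(round(hazard_ratio(beta_bmi, 5), 4))     # per 5-unit increase in BMI
print(round(percent_change(beta_bmi), 2))      # percent change per unit
```

Because the model is linear on the log-hazard scale, the hazard ratio for a 5-unit difference is the one-unit hazard ratio raised to the fifth power, not five times it.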

A one unit increase in BMI is associated with a 2. To facilitate interpretation, suppose we create 3 categories of weight defined by participant's BMI. The numbers of CVD events in each of the 3 groups are shown below. The incidence of CVD is higher in participants classified as overweight and obese as compared to participants of normal weight.

We now use Cox proportional hazards regression analysis to make maximum use of the data on all participants in the study. The latter two models are multivariable models and are performed to assess the association between weight and incident CVD adjusting for confounders. Because we have three weight groups, we need two dummy variables or indicator variables to represent the three groups.

In the models we include the indicators for overweight and obese and consider normal weight the reference group. In the unadjusted model, there is an increased risk of CVD in overweight participants as compared to normal weight and in obese as compared to normal weight participants hazard ratios of 1.
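The two indicator variables can be built as in the following sketch. The BMI cut-offs of 25 and 30 are an assumption based on the widely used convention and are not stated in the text; the function name `weight_indicators` is hypothetical.

```python
def weight_indicators(bmi):
    """Return (overweight, obese) indicators for one participant.

    Normal weight is the reference group and is coded (0, 0).
    Assumed cut-offs: overweight is 25 <= BMI < 30, obese is BMI >= 30.
    For simplicity, any BMI below 25 falls into the reference group.
    """
    if bmi >= 30:
        return (0, 1)
    if bmi >= 25:
        return (1, 0)
    return (0, 0)


# Hypothetical BMI values.
for bmi in (22.0, 27.5, 31.2):
    print(bmi, weight_indicators(bmi))
```

With this coding, the exponentiated coefficient of each indicator is the hazard ratio for that group relative to normal weight.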

The same is true in the model adjusting for age, sex and the clinical risk factors. There are a number of important extensions of the approach that are beyond the scope of this text. In the previous examples, we considered the effect of risk factors measured at the beginning of the study period, or at baseline, but there are many applications where the risk factors or predictors change over time.

Suppose we wish to assess the impact of exposure to nicotine and alcohol during pregnancy on time to preterm delivery. Smoking and alcohol consumption may change during the course of pregnancy.

These predictors are called time-dependent covariates and they can be incorporated into survival analysis models. The Cox proportional hazards regression model with time dependent covariates takes the form: $h(t) = h_0(t)\exp\bigl(b_1 X_1(t) + b_2 X_2(t) + \cdots + b_p X_p(t)\bigr)$. Notice that each of the predictors, $X_1, X_2, \ldots, X_p$, is now written as a function of time t. There are also many predictors, such as sex and race, that are independent of time. Survival analysis models can include both time dependent and time independent predictors simultaneously.

Many statistical computing packages offer options for including time-dependent covariates. A difficult aspect of the analysis of time-dependent covariates is the appropriate measurement and management of these data for inclusion in the models.

A very important assumption for the appropriate use of the log rank test and the Cox proportional hazards regression model is the proportionality assumption.

Specifically, we assume that the hazards are proportional over time, which implies that the effect of a risk factor is constant over time. There are several approaches to assess the proportionality assumption; some are based on statistical tests and others involve graphical assessments. In the statistical testing approach, predictor by time interaction effects are included in the model and tested for statistical significance. If one or more of the predictor by time interactions reaches statistical significance, the assumption of proportional hazards is called into question.
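One common way to manage such data, used by many Cox regression routines, is a "long" layout in which each participant contributes one row per interval over which the covariate is constant. The sketch below assumes that layout; the function name `split_follow_up`, the week-based time scale and the example values are all hypothetical.

```python
def split_follow_up(end_time, event, changes):
    """Split one participant's follow-up into (start, stop, value, event) rows.

    end_time : time of the event or of censoring
    event    : 1 if the event occurred at end_time, 0 if censored
    changes  : [(time, value), ...] sorted by time, starting at time 0;
               each entry gives the covariate value from that time onward
    The event indicator is 1 only on the final interval; earlier rows are
    event-free by construction.
    """
    rows = []
    for i, (start, value) in enumerate(changes):
        if start >= end_time:
            break  # covariate changes after follow-up ended are irrelevant
        next_change = changes[i + 1][0] if i + 1 < len(changes) else end_time
        stop = min(next_change, end_time)
        rows.append((start, stop, value, event if stop == end_time else 0))
    return rows


# Hypothetical pregnancy: smoker (1) from week 0, quit (0) at week 12,
# preterm delivery (event = 1) at week 34.
print(split_follow_up(34, 1, [(0, 1), (12, 0)]))
```

Time-independent predictors such as sex would simply be repeated on every row for that participant.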

An alternative approach to assessing proportionality is through graphical analysis. There are several graphical displays that can be used to assess whether the proportional hazards assumption is reasonable. These are often based on residuals and examine trends (or lack thereof) over time. More details can be found in Hosmer and Lemeshow 1. If either a statistical test or a graphical analysis suggests that the hazards are not proportional over time, then the Cox proportional hazards model is not appropriate, and adjustments must be made to account for non-proportionality.

One approach is to stratify the data into groups such that within groups the hazards are proportional, and different baseline hazards are estimated in each stratum as opposed to a single baseline hazard as was the case for the model presented earlier. Many statistical computing packages offer this option. The competing risks issue is one in which there are several possible outcome events of interest.

For example, a prospective study may be conducted to assess risk factors for time to incident cardiovascular disease. Cardiovascular disease includes myocardial infarction, coronary heart disease, coronary insufficiency and many other conditions. The investigator measures whether each of the component outcomes occurs during the study observation period as well as the time to each distinct event.

The goal of the analysis is to determine the risk factors for each specific outcome and the outcomes are correlated.

Interested readers should see Kalbfleisch and Prentice 10 for more details. Time to event data, or survival data, are frequently measured in studies of important medical and public health issues. Because of the unique features of survival data, most specifically the presence of censoring, special statistical procedures are necessary to analyze these data.

In survival analysis applications, it is often of interest to estimate the survival function, or survival probabilities over time. There are several techniques available; we present here two popular nonparametric techniques called the life table or actuarial table approach and the Kaplan-Meier approach to constructing cohort life tables or follow-up life tables.

Both approaches generate estimates of the survival function which can be used to estimate the probability that a participant survives to a specific time. It is often of interest to assess whether there are statistically significant differences in survival between groups (between competing treatment groups in a clinical trial, or between men and women, or patients with and without a specific risk factor in an observational study).

There are many statistical tests available; we present the log rank test, which is a popular non-parametric test. It makes no assumptions about the survival distributions and can be conducted relatively easily using life tables based on the Kaplan-Meier approach. There are several variations of the log rank statistic as well as other tests to compare survival curves between independent groups.

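We use the following test statistic, which is distributed as a chi-square statistic with k-1 degrees of freedom, where k represents the number of independent comparison groups: $\chi^2 = \sum_{j=1}^{k} \frac{\left(\sum_t O_{jt} - \sum_t E_{jt}\right)^2}{\sum_t E_{jt}}$, where $O_{jt}$ and $E_{jt}$ are the observed and expected numbers of events in group j at event time t. The observed and expected numbers of events are computed for each event time and summed for each comparison group over time.

To compute the log rank test statistic, we compute, for each event time t, the number at risk in each group, $N_{jt}$.

Finally, there are many applications in which it is of interest to estimate the effect of several risk factors, considered simultaneously, on survival.

Cox proportional hazards regression analysis is a popular multivariable technique for this purpose. The Cox proportional hazards regression model is as follows: $h(t) = h_0(t)\exp\bigl(b_1 X_1 + b_2 X_2 + \cdots + b_p X_p\bigr)$, where $h_0(t)$ is the baseline hazard and $b_1, \ldots, b_p$ are the regression coefficients for the predictors $X_1, \ldots, X_p$.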
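The bookkeeping for the log rank statistic can be sketched in plain Python. The sketch assumes the statistic is the sum over groups of (observed − expected)² / expected, with the expected events in a group at each event time taken as that group's share of the risk set multiplied by the total events at that time. The function name `log_rank` and the two small groups are invented for illustration.

```python
def log_rank(groups):
    """Log rank statistic for k independent groups.

    groups : list of (times, events) pairs, one per group, where events is
             1 for an observed event and 0 for a censored observation.
    Returns (chi_square, degrees_of_freedom, observed, expected).
    Assumes every group has a positive expected count.
    """
    k = len(groups)
    event_times = sorted({t for times, events in groups
                          for t, e in zip(times, events) if e == 1})
    observed = [0.0] * k
    expected = [0.0] * k
    for t in event_times:
        # Number still at risk in each group just before time t.
        at_risk = [sum(1 for x in times if x >= t) for times, _ in groups]
        # Number of events in each group exactly at time t.
        deaths = [sum(1 for x, e in zip(times, events) if x == t and e == 1)
                  for times, events in groups]
        total_at_risk, total_deaths = sum(at_risk), sum(deaths)
        for j in range(k):
            observed[j] += deaths[j]
            expected[j] += at_risk[j] * total_deaths / total_at_risk
    chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return chi_square, k - 1, observed, expected


# Hypothetical follow-up times for two groups.
group_a = ([1, 2, 3, 4], [1, 1, 1, 1])
group_b = ([5, 6, 7, 8], [1, 1, 0, 0])
chi_square, df, observed, expected = log_rank([group_a, group_b])
print(round(chi_square, 3), df, observed, [round(e, 3) for e in expected])
```

The resulting value would then be compared with a chi-square distribution on k-1 degrees of freedom. With samples as small as this toy example, that approximation is rough at best.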

The associations between risk factors and survival time in a Cox proportional hazards model are often summarized by hazard ratios. The hazard ratio for a dichotomous risk factor e. For example, in a clinical trial with survival time as the outcome, if the hazard ratio is 0. The KM survival curve, a plot of the KM survival probability against time, provides a useful summary of the data that can be used to estimate measures such as median survival time.

The function survfit() [in the survival package] can be used to compute the Kaplan-Meier survival estimate. Its main arguments include:

By default, the function print() shows a short summary of the survival curves. It prints the number of observations, number of events, the median survival and the confidence limits for the median. The horizontal axis (x-axis) represents time in days, and the vertical axis (y-axis) shows the probability of surviving or the proportion of people surviving. The lines represent survival curves of the two groups. A vertical drop in the curves indicates an event. The vertical tick mark on the curves means that a patient was censored at this time.

The median survival time for each group represents the time at which the survival probability, S(t), is 0.5. There appears to be a survival advantage for females with lung cancer compared with males. However, evaluating whether this difference is statistically significant requires a formal statistical test, a subject that is discussed in the next sections.

Note that the confidence limits are wide at the tail of the curves, making meaningful interpretations difficult. This can be explained by the fact that, in practice, there are usually patients who are lost to follow-up or alive at the end of follow-up.

Thus, it may be sensible to shorten plots before the end of follow-up on the x-axis (Pocock et al). The cumulative hazard is commonly used to estimate the hazard probability. In other words, it corresponds to the number of events that would be expected for each individual by time t if the event were a repeatable process.

As mentioned above, you can use the function summary() to obtain a complete summary of the survival curves. This makes it possible to facet the output of ggsurvplot() by strata or by some combinations of factors.
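One standard way to obtain the cumulative hazard from a survival curve is the relation H(t) = −log S(t). The sketch below applies it to a list of survival probabilities; the relation is assumed here rather than stated in the text, and the function name `cumulative_hazard` and the input values are illustrative only.

```python
import math


def cumulative_hazard(survival_probs):
    """Convert survival probabilities S(t) to cumulative hazards -log S(t).

    A survival probability of 0 maps to an infinite cumulative hazard.
    """
    return [math.inf if s == 0 else -math.log(s) for s in survival_probs]


# Hypothetical survival estimates at successive event times.
for s, h in zip([0.875, 0.75, 0.45], cumulative_hazard([0.875, 0.75, 0.45])):
    print(s, round(h, 3))
```

Unlike a probability, the cumulative hazard is not bounded by 1, which fits its reading as an expected number of events.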

Survival analysis, also known as time-to-event analysis, is a branch of statistics that studies the amount of time it takes before a particular event of interest occurs.

Insurance companies use survival analysis to predict the death of the insured and estimate other important factors such as policy cancellations, non-renewals, and how long it takes to file a claim. Results from such analyses can help providers calculate insurance premiums, as well as the lifetime value of clients.

Survival analysis mainly comes from the medical and biological disciplines, which leverage it to study rates of death, organ failure, and the onset of various diseases. Perhaps, for this reason, many people associate survival analysis with negative events.


