Mean preservation in censored regression using preliminary nonparametric smoothing
In this thesis, we consider the problem of estimating the regression function in location-scale regression models.
Such a model assumes that the random vector (X,Y) satisfies Y = m(X) + s(X)e, where m(.) is an
unknown location function (e.g. conditional mean, median or truncated mean), s(.) is an unknown scale function,
and e is independent of X. The response Y is subject to random right censoring, and the covariate X is completely observed.
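To make the data structure concrete, the following minimal sketch simulates observations from a location-scale model with random right censoring. The particular choices of m, s, the error law and the censoring mechanism are illustrative assumptions, not taken from the thesis, as are the names Z = min(Y, C) for the observed response and delta for the censoring indicator.

```python
import random


def simulate_censored_location_scale(n, seed=0):
    """Draw n triples (x, z, delta) from Y = m(X) + s(X) * e under right censoring.

    z = min(Y, C) is the observed response and delta = 1 when Y is uncensored.
    All functional forms below are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)

    def m(x):
        return 1.0 + 2.0 * x        # assumed location function

    def s(x):
        return 0.5 + 0.5 * x        # assumed scale function, positive on [0, 1]

    data = []
    for _ in range(n):
        x = rng.random()            # covariate, always observed
        e = rng.gauss(0.0, 1.0)     # error term, drawn independently of x
        y = m(x) + s(x) * e         # latent response, not always observed
        # Censoring time: depends on x but is independent of y given x.
        c = m(x) - 1.0 + rng.expovariate(0.5)
        z = min(y, c)
        delta = 1 if y <= c else 0
        data.append((x, z, delta))
    return data
```

Only the triples (x, z, delta) would be available to an estimator; the latent y and the censoring time c are never seen together.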
In the first part of the thesis, we assume that m(x) = E(Y|X=x) follows a polynomial model. A new estimation
procedure for the unknown regression parameters is proposed, which extends the classical least squares procedure to
censored data. The proposed method is inspired by the method of Buckley and James (1979) but, unlike the latter, is
non-iterative thanks to a preliminary nonparametric estimation step. The asymptotic normality of the estimators is established.
Simulations carried out for both methods show that the proposed estimators usually have smaller variance and smaller
mean squared error than the Buckley-James estimators.
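A short calculation shows why replacing censored responses in the Buckley-James manner is compatible with least squares. The notation is an assumption introduced here, not taken from the abstract: C is the censoring variable, Delta = 1{Y <= C}, and Y* is the synthetic response. The sketch also assumes that Y and C are conditionally independent given X, a standard condition that the abstract does not state.

```latex
\begin{align*}
Y^{*} &= \Delta\, Y + (1-\Delta)\, E(Y \mid Y > C,\, X,\, C), \\
E(Y^{*} \mid X, C)
  &= E\bigl(Y\,1\{Y \le C\} \mid X, C\bigr)
   + P(Y > C \mid X, C)\, E(Y \mid Y > C,\, X,\, C) \\
  &= E\bigl(Y\,1\{Y \le C\} \mid X, C\bigr)
   + E\bigl(Y\,1\{Y > C\} \mid X, C\bigr) \\
  &= E(Y \mid X, C) = E(Y \mid X) = m(X).
\end{align*}
```

Averaging over C then gives E(Y*|X) = m(X), so least squares applied to (X, Y*) targets the same regression function as least squares on the uncensored data would. In practice the conditional expectation in the first line is unknown and must be estimated.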
In the second part, we suppose that m(.) = E(Y|.) belongs to some parametric class of
regression functions. A new estimation procedure for the true, unknown vector of parameters is proposed that extends the
classical least squares procedure for nonlinear regression to the case where the response is subject to censoring. The proposed
technique uses new 'synthetic' data points that are constructed by using a nonparametric relation between Y and X.
The consistency and asymptotic normality of the proposed estimator are established, and the estimator is compared via simulations
with the estimator proposed by Stute (1999).
In the third part, we study the nonparametric estimation of the regression function m(.). It is well known that
the completely nonparametric estimator of the conditional distribution F(.|x) of Y given X=x suffers from inconsistency
problems in the right tail (Beran, 1981), and hence the location function m(x) cannot be estimated consistently in a completely
nonparametric way whenever m(x) involves the right tail of F(.|x) (as is the case, e.g., for the conditional mean).
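For readers unfamiliar with the completely nonparametric estimator, the sketch below implements a kernel-weighted Kaplan-Meier (Beran-type) estimate of F(y|x). The Gaussian kernel, the bandwidth h and the function name beran_cdf are assumptions made for illustration; the abstract does not specify them. Data are triples (x, z, delta) with z the possibly censored response and delta = 1 for uncensored observations.

```python
import math


def beran_cdf(x0, y, data, h):
    """Kernel-weighted Kaplan-Meier (Beran-type) estimate of F(y | x0).

    data: list of (x, z, delta), with z = min(Y, C) and delta = 1 if uncensored.
    h: bandwidth of the (assumed) Gaussian kernel.
    """
    # Kernel weights around x0, normalised to sum to one.
    raw = [math.exp(-0.5 * ((x - x0) / h) ** 2) for x, _, _ in data]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("no observation carries weight at x0; increase h")
    w = [r / total for r in raw]

    # Process observations by increasing z; at ties, uncensored ones go first
    # so that a tied censored observation still counts as at risk.
    order = sorted(range(len(data)), key=lambda i: (data[i][1], -data[i][2]))

    surv = 1.0
    at_risk = 1.0  # total weight of observations not yet processed
    for i in order:
        _, z, delta = data[i]
        if z > y:
            break
        if delta == 1 and at_risk > 1e-12:
            # Weighted analogue of the Kaplan-Meier factor (1 - d / n_at_risk).
            surv *= 1.0 - min(1.0, w[i] / at_risk)
        at_risk -= w[i]
    return 1.0 - surv
```

With a very large bandwidth all weights are equal and the function reduces to the ordinary Kaplan-Meier estimator. If the largest observation near x0 is censored, the estimate never reaches 1, however large y is: the estimator carries no information beyond the last uncensored point, which is the right-tail difficulty described above.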
We propose two alternative estimators of m(x) that do not share the above inconsistency problems. The idea is to make use of the
assumed location-scale model in order to improve the estimation of F(.|x), especially in the right tail.
We obtain the asymptotic properties of the two proposed estimators of m(x). Simulations show that the proposed estimators outperform
the completely nonparametric estimator in many cases.
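The way the location-scale model can help in the right tail follows directly from the model equation. Writing F_e for the distribution function of the error e (a symbol introduced here for illustration), independence of e and X gives:

```latex
\begin{equation*}
F(y \mid x) = P\bigl(m(x) + s(x)\,e \le y \mid X = x\bigr)
            = F_e\!\left(\frac{y - m(x)}{s(x)}\right), \qquad s(x) > 0 .
\end{equation*}
```

Because F_e does not depend on x, it can in principle be estimated from residuals pooled over all covariate values, so the tail of F(.|x) at a heavily censored x can borrow information from regions where censoring is lighter. How the thesis estimates m, s and F_e is not described in this abstract.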
School: Université catholique de Louvain
Source Type: Master's Thesis
Keywords: kernel estimation, location scale model, least squares, survival analysis, right censoring, bandwidth selection, linear regression, nonparametric, fatigue life data, censored
Date of Publication: 08/18/2005