# Discontinuities due to survey redesigns: a structural time series approach

Advisor: Jan van den Brakel

School: Universiteit Maastricht

School Location: Netherlands

Source Type: Master's Thesis

Keywords: Structural time series analysis

Date of Publication: 10/03/2011

Discontinuities due to survey redesigns:

a structural time series approach

Master thesis MSc. Econometrics and Operations Research

Maastricht University

By Ibragim Goesjenov

under supervision of

Prof. dr. ir. ing. Jan van den Brakel (Statistics Netherlands / Maastricht University)

Abstract

In this paper, discontinuities in time series which arise from a survey redesign are analyzed using a structural time series framework. This framework is applied to a set of sample surveys conducted consecutively by Statistics Netherlands on subjects including victimization and the number of criminal offenses in the period from 1980 up until 2010. To estimate the discontinuities, the Kalman filter is implemented using the library Ssfpack in OxMetrics. Additionally, in the estimation of the effects due to a survey redesign, this paper also analyzes the effects of explanatory variables such as police registrations.

This project was set up in cooperation with Statistics Netherlands in Heerlen in the form of an internship. I would like to take the opportunity to thank everyone at Statistics Netherlands for helping me collect the data, gather the literature and perform the analysis in order to make this project a success. In particular, I would like to thank my supervisor Jan van den Brakel for the extensive support and feedback during the entire project, and Harrie Huys for assistance in data collection and interpretation.

1 Introduction

National statistical offices conduct surveys in several fields, including transportation, demography and crime. Statistics Netherlands (SN) was established in 1899 and currently has locations in The Hague and Heerlen; it is responsible for collecting statistics on the Dutch economy and society, which among other things serve as a basis for policy makers and politicians. While the main task of SN is to collect data in an appropriate way, the Division of Methodology and Quality (DMQ) is responsible for the whole process of survey sampling, research on proper data collection methodologies and quality control. One of the challenges is producing continuous time series out of a sequence of series which contain discontinuities due to survey redesigns.

More specifically, people are usually interested in particular finite population parameters, such as the percentage of unemployed in a population, which could be the population of a country but also of companies or municipalities. In theory the population parameters could be obtained by collecting data on every individual in the population, which is called a census. However, due to practical and financial limitations, national statistical offices are not able to consider the whole population when collecting data. Interviewing the almost 17 million Dutch citizens every year is not realistic and, on top of that, inefficient.

Alternatively, statistical offices consider samples from a population, meaning that the population parameters of interest are estimated from the data collected. Compared to a census, considering a sample has several advantages: it is less expensive, and the information is usually available much faster since not all individuals in a population have to be interviewed. The major disadvantage of considering a sample is that the estimates based on the sample almost never exactly coincide with the parameters for the entire population. The error in the analysis which can be attributed to the fact that only a sample is considered instead of all individuals in a population is called the sampling error. This is an inevitable implication of using samples for the estimation of population parameters, but it can be kept small by using a sufficiently large sample size. In many cases exact information is not needed; researchers are generally more interested in a confidence interval in which the parameter of interest is likely to lie. However, constructing such a confidence interval is only possible if the sample is selected by some kind of lottery mechanism, i.e. a probability sample; see Särndal et al. (1992) and Renssen (1998). The basic example of such a lottery mechanism is the simple random sample, where every element of the population has the same probability of being selected into the sample (Bethlehem, 2009).

Non-sampling errors, on the other hand, are deviations from the true value of the parameter which cannot be attributed to sampling error. These errors are not necessarily related to a sample survey and are also likely to occur in a census. Whereas the expectation of the sampling error is zero, non-sampling errors generally introduce a bias in the estimate. Non-sampling errors can be divided into non-response, measurement and coverage errors. For example, the data collection method used in a survey, as discussed in de Leeuw (2005), accounts for a part of the measurement error. It appears that especially the presence of an interviewer has a large influence on the interviewee, particularly when the interview questions become more sensitive.

Changes in the survey process might lead to systematic differences in the outcomes of a survey, such that the non-sampling error changes. This in turn might lead to a discontinuity in the series, such that comparing the new series to the series before the change-over is no longer appropriate (van den Brakel and Roels, 2010). Sometimes changes in the survey process are inevitable, for example to improve the quality of the survey; they can also be forced by budget cuts. An example of such a change in the survey process is when the data collection method is changed from a web based interview to personal interviewing. The interview speed of the latter data collection method is usually lower, such that the respondents are given more time to think about their answers. As a result, compared to the series before the redesign, the measurement error in the new series might have changed.

There exist several methods to quantify the discontinuities and then correct the series to obtain a continuous one. First, if the micro data observed under the regular and the new survey are consistent, it is possible to quantify the discontinuities by recalculating the observed series (van den Brakel et al., 2008). For example, if a new classification of publication domains is introduced, it is possible to quantify the discontinuities using a domain indicator and then recalculate the values according to the old classification. However, in many real life situations the micro data are not consistent under the old and the new approach after a survey redesign, and hence recalculation is not applicable. Therefore, other methods have to be used. One method is to continue the regular survey alongside the new survey for a certain time period, called a parallel run. The difference between the outcomes of the two surveys then estimates the discontinuity. This approach might be rather expensive, and therefore it is not always possible to afford an appropriate parallel run.

Another methodology to quantify discontinuities which does not require a parallel run is a structural

time series approach. It assumes that a series can be modeled in a state space model including several

components, namely a trend, cyclic components, an intervention variable and an irregular term. The

intervention variable accounts for the discontinuities, assuming it is known at which time points the new

surveys were introduced. Then, after the state space model is set up, one can apply a filtering technique

called the Kalman filter, which attempts to disentangle the discontinuity from the real development of

the parameter. In this paper a structural time series model as described in Durbin and Koopman (2001)

is applied to model a series of crime victimization.
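The filtering idea can be sketched with a minimal hand-rolled Kalman filter for a local level model with a level-shift intervention variable. This is an illustrative numpy sketch, not the Ssfpack/OxMetrics implementation used in this paper; the simulated series, the size of the shift and the variance settings are assumptions made for the example.

```python
import numpy as np

def kalman_local_level(y, x, sigma2_eps=1.0, sigma2_eta=0.01):
    """Kalman filter for y_t = level_t + beta * x_t + eps_t, where
    level_t = level_{t-1} + eta_t and beta is a time-invariant
    regression coefficient on the intervention dummy x_t."""
    T = len(y)
    a = np.zeros(2)                      # state estimate (level, beta)
    P = np.diag([1e6, 1e6])              # diffuse-like initialization
    Tmat = np.eye(2)                     # transition matrix
    Q = np.diag([sigma2_eta, 0.0])       # state noise: beta is fixed
    filtered = np.zeros((T, 2))
    for t in range(T):
        Z = np.array([1.0, x[t]])        # observation vector
        a = Tmat @ a                     # prediction step
        P = Tmat @ P @ Tmat.T + Q
        F = Z @ P @ Z + sigma2_eps       # innovation variance
        K = P @ Z / F                    # Kalman gain
        a = a + K * (y[t] - Z @ a)       # update step
        P = P - np.outer(K, Z @ P)
        filtered[t] = a
    return filtered

# Simulated series: stable level 10, discontinuity of +3 after t = 50.
rng = np.random.default_rng(0)
x = np.array([0.0] * 50 + [1.0] * 50)
y = 10.0 + 3.0 * x + rng.normal(0.0, 0.3, 100)
filtered = kalman_local_level(y, x)
beta_hat = filtered[-1, 1]               # estimated discontinuity
```

Because the intervention regressor is zero before the change-over, the filter attributes the jump at t = 50 mainly to beta, which is how the discontinuity is disentangled from the (slowly moving) level.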

This paper is set up as follows. First, section 2 describes the relevant sampling theory techniques. The data considered in this paper are discussed in section 3, after which the literature on state space models and the Kalman filter is reviewed in section 4. The application to the data and the results are presented in section 5. The paper concludes with a discussion in section 6.

2 Sampling theory

Ever since the establishment of statistical offices, sampling theory has been a significant area of research. In this section, sampling theory as discussed in Bethlehem (2009), Särndal et al. (1992) and Renssen (1998) is summarized. In short, sampling theory boils down to choosing a sampling strategy, which is the combination of a sampling design and an estimator, in such a way that the estimates for the unknown finite population parameters are as precise as possible given a certain budget. These finite population parameters could be, for example, means, totals or fractions.

In this section, first the basics of survey sampling are briefly discussed. After the main mathematical tools are introduced in subsection 2.2, subsection 2.3 addresses the sampling design, whereas subsection 2.4 explains the use of auxiliary information in the estimator.

2.1 Survey sampling

The objective of survey research is "to draw a sample, carry out measurements on the sampled population elements, and, on basis of that information, draw conclusions about some population parameter" (Bethlehem, 2009). More specifically, the following steps can be distinguished in a survey process.

1. Determining the population of interest, the target variables and the auxiliary variables, and constructing the questionnaire

2. Obtaining a sample frame

3. Determining the sampling design and the estimator

4. Drawing the sample

5. Doing the field work

6. Processing and analysing the sample data

In step 1 it is crucial to define the population from which one is taking a sample and the variables one is interested in, because it is not possible to change these after the collection process has started. Moreover, the design of the survey questionnaire is also part of step 1.

A finite population is a union of units, called sampling units. To draw a sample, a list of sampling units is needed, which is called a sample frame. The sample frame is usually an administrative register from which the sample can be drawn: this could be the phone book if the population consists of Dutch citizens, or the files of the Chamber of Commerce if companies form the population. This can lead to immediate bias in the sample, since one can imagine that not all households are listed in a phone book, for example. On the other hand, some individuals could be included unintentionally, such as non-Dutch citizens having a Dutch phone number. Non-sampling errors which are due to the omission of units belonging to the target population and the inclusion of units not belonging to the population are called coverage errors (Bethlehem, 2009).

Step 3 is the main focus of this section. Here the aforementioned probability samples are set up, depending on the population parameters and the budget of the statistical institute. The combination of the sampling design and the estimator is called the sampling strategy, and in both parts of the strategy auxiliary information can be used to improve the precision of the estimator. First, the sampling design is the selection procedure of the lottery mechanism; in an ideal case every population element is assigned a probability of being selected. More advanced sampling designs use auxiliary information. The estimator, in turn, is closely related to the sampling design: it is the formula which converts the sample measurements into an estimate of the unknown population parameter of interest. Depending on the sampling design, the estimator calculates the best possible value given the observed sample and the available auxiliary information. This implies that the auxiliary information must be related to the target variable. The following subsections deepen the theory on choosing the strategy that makes the sampling error as small as possible.

After the strategy from step 3 is fixed, the sample can be drawn in step 4. This boils down to selecting the individuals from the population by applying the sampling design using a random number generator. Once these individuals are selected, the field work can be performed in step 5. This is again a crucial step in the roadmap, and the errors made here might explain discontinuities due to a redesign. More specifically, the field work is heavily subject to non-sampling errors. First, non-response errors occur when the respondent refuses to cooperate in the survey, e.g. because he or she has no interest in participating in any survey. If a part of the population is systematically left out of the survey due to non-response, this will bias the results. Measurement errors, in turn, occur when the respondent is reached but fails to produce the correct answer. The questionnaire design and the data collection method largely determine the amount of this kind of bias. Hence, these errors account for a bias in the sample, such that the sample parameters differ from the population parameters (Bethlehem, 2009). If the survey designed in steps 1-5 is kept unchanged, these errors do not change either, such that the results of the survey are comparable over time. Things change if the survey is redesigned: the bias changes, making it necessary to quantify this change in order to be able to compare the new results with previous surveys.

Last, the sampled data are analyzed and published in step 6, which is the final step, where the data can be checked for extreme outliers and other mistakes. An example is to sort the data and check for extreme ages in the sample; having someone aged 134 is quite unlikely, but possible (Bethlehem, 2009).
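The kind of plausibility check mentioned in step 6 can be sketched as follows; the records and the age cutoff are hypothetical and only illustrate the idea.

```python
# Hypothetical sample records to be screened in step 6.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": 61},
    {"id": 3, "age": 134},   # unlikely but possible: flag for review
    {"id": 4, "age": -2},    # impossible: flag for review
]

def screen_ages(rows, hard_max=122):
    """Split records into plausible ones and ones flagged for manual
    review; hard_max is an assumed cutoff, not an official rule."""
    ok, flagged = [], []
    for row in rows:
        (ok if 0 <= row["age"] <= hard_max else flagged).append(row)
    return ok, flagged

ok, flagged = screen_ages(records)
```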

2.2 Definitions

In this section the following notational conventions are used. An uppercase letter, e.g. Y or U, always denotes a population quantity, whereas lowercase letters such as y or s refer to sample quantities. Additionally, a bar as in $\bar{Y}$ or $\bar{y}$ denotes a mean, whereas a circumflex as in $\hat{y}$ denotes an estimator.

Let the finite population be represented by U, which consists of N identifiable individuals. Note that the target variable is the variable which is necessary to answer the question that one wants to answer about the population (Bethlehem, 2009). Each element k of U (k = 1, ..., N) is associated with Y_k, the value of the scalar target variable, and a vector X_k which contains the values of the auxiliary variables. The population totals of the target and auxiliary variables are defined as $Y = \sum_{k=1}^{N} Y_k$ and $X = \sum_{k=1}^{N} X_k$, respectively.

Then, let s = (k_1, ..., k_n) be the sample which is drawn from U using a lottery-like mechanism, also called a random sample. Here, k_i is an element of the population U and n is the sample size. The values of the target variable for the selected elements k_1, ..., k_n are denoted y_1, ..., y_n. Note that a random sample is necessary to draw a sample which is representative of the population. The sample s is not unique, meaning that the lottery-like mechanism can produce several different samples from a population. The complete set of all possible samples which can be drawn from U is represented by ∇.

Every sample s ∈ ∇ has a probability p(s) of being selected (Särndal et al., 1992). The function p(s) is called the sampling design and has to fulfill the following constraints:

$$0 < p(s) \leq 1, \qquad \text{and} \qquad \sum_{s \in \nabla} p(s) = 1,$$

such that p(s) assigns a probability to every conceivable sample s from U (Bethlehem, 2009). If s_k represents the number of times element k is included in sample s, then the first order inclusion expectation of k is defined as

$$\pi_k = \sum_{s \in \nabla} s_k p(s),$$

whereas the second order inclusion expectation of elements k and l is

$$\pi_{kl} = \sum_{s \in \nabla} s_k s_l p(s).$$

The first order inclusion expectation π_k can be interpreted as the expected number of times one particular element is included in the sample, whereas π_kl is the analogous expectation for a pair of elements. Note that these inclusion expectations are the crucial determinant of the sampling design; the exact design of the lottery-like mechanism is done here.
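The definitions above can be made concrete by enumerating the set ∇ for a tiny population under equal-probability sampling without replacement; the numbers are purely illustrative.

```python
from itertools import combinations

# Tiny population U and a design of size n in which every sample in
# the set nabla has the same probability p(s) = 1 / |nabla|.
U = [1, 2, 3, 4]
n = 2
nabla = list(combinations(U, n))
p = 1.0 / len(nabla)                     # p(s), identical for every s

# First order inclusion expectations: pi_k = sum of p(s) over the
# samples that contain element k.
pi = {k: sum(p for s in nabla if k in s) for k in U}

# Second order inclusion expectations for pairs (k, l), k < l.
pi_kl = {(k, l): sum(p for s in nabla if k in s and l in s)
         for k in U for l in U if k < l}
```

For this design every π_k equals n/N = 0.5 and every π_kl equals 1/6, matching the closed-form expressions derived for simple random sampling later in the text.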

On the other hand, an estimator $\hat{\theta}(s)$ is a sample statistic that can be used for estimating the value of a population parameter θ using sample s. Here, a sample statistic is a function that depends on the values observed in the sample (Bethlehem, 2009). A basic example of a sample statistic is the sample mean $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$. The expectation and variance of the estimator $\hat{\theta}$ for a population parameter θ equal

$$E(\hat{\theta}) = \sum_{s \in \nabla} \hat{\theta}(s) p(s), \qquad \text{and} \qquad Var(\hat{\theta}) = \sum_{s \in \nabla} \left[ \hat{\theta}(s) - E(\hat{\theta}) \right]^2 p(s).$$

As mentioned before, the combination of sampling design p(s) and estimator $\hat{\theta}$ is called the sampling strategy.

2.3 Sampling design

In this subsection it is assumed that steps 1 and 2 as described in subsection 2.1 are already fixed. While several sampling designs are considered in this subsection, the estimator is kept fixed. In the next subsection the sampling design will be fixed and the estimator discussed further. The basic estimator used in this subsection is the Horvitz-Thompson estimator.

2.3.1 Horvitz-Thompson estimator

Let the population mean $\bar{Y}$ be defined as $\bar{Y} = \frac{1}{N}\sum_{k=1}^{N} Y_k$. Using the first order inclusion expectations, the Horvitz-Thompson (HT) estimator for the population mean is then defined as

$$\hat{\bar{y}}_{HT} = \frac{1}{N} \sum_{k \in s} \frac{y_k}{\pi_k}. \qquad (2.1)$$

Note that $\hat{\bar{y}}_{HT}$ is one of the estimators $\hat{\theta}$ from the previous section. The HT estimator (2.1) is a general estimator, since it is design unbiased for all possible random sampling designs, such that $E(\hat{\bar{y}}_{HT}) = \bar{Y}$. Its variance is

$$Var(\hat{\bar{y}}_{HT}) = \frac{0.5}{N^2} \sum_{k=1}^{N} \sum_{l=1}^{N} (\pi_k \pi_l - \pi_{kl}) \left( \frac{Y_k}{\pi_k} - \frac{Y_l}{\pi_l} \right)^2,$$

which can be estimated by

$$\widehat{Var}(\hat{\bar{y}}_{HT}) = \frac{0.5}{N^2} \sum_{k \in s} \sum_{l \in s} \frac{\pi_k \pi_l - \pi_{kl}}{\pi_{kl}} \left( \frac{y_k}{\pi_k} - \frac{y_l}{\pi_l} \right)^2.$$

Note that the more proportional the inclusion expectations are to the target variables, the smaller the variance of the HT estimator. In the ideal case, where the inclusion expectations are exactly proportional to the target variables, the ratios in the variance formula become constant and hence the variance is zero.

All the estimators discussed in this subsection are special cases of the Horvitz-Thompson estimator.
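The design unbiasedness of the HT estimator can be verified by brute force on a toy population, enumerating every sample an equal-probability design without replacement can produce; the values are assumptions for the example.

```python
from itertools import combinations

# Toy population values Y_k and a without-replacement design of size n
# in which every sample is equally likely.
Y = {1: 10.0, 2: 14.0, 3: 6.0, 4: 18.0}
N, n = len(Y), 2
pi = {k: n / N for k in Y}               # inclusion expectations

def ht_mean(sample):
    """Horvitz-Thompson estimator (2.1) of the population mean."""
    return sum(Y[k] / pi[k] for k in sample) / N

samples = list(combinations(Y, n))       # the set of all samples
p = 1.0 / len(samples)                   # p(s) for every sample
expectation = sum(p * ht_mean(s) for s in samples)
true_mean = sum(Y.values()) / N
```

Averaging the estimator over all samples, weighted by p(s), reproduces the population mean exactly, which is precisely what design unbiasedness means.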

2.3.2 Simple random sampling without replacement

This brings us to the first sampling design, called simple random sampling. Here, let s be a random sample drawn without replacement. In simple random sampling, each population element has the same probability of being drawn. The inclusion expectations are therefore computed easily, since the set ∇ consists of $\binom{N}{n}$ samples, such that there are exactly $\binom{N-1}{n-1}$ samples containing element k, and $\binom{N-2}{n-2}$ samples containing elements k and l (Renssen, 1998). Therefore,

$$\pi_k = \frac{n}{N}, \qquad \text{and} \qquad \pi_{kl} = \frac{n(n-1)}{N(N-1)}.$$

Because the sampling is done without replacement, $\pi_{kl} = \pi_k$ if k = l. Also, the ratio $\frac{n}{N}$ is often called the sample fraction f, since it is the ratio of sample size to population size.

Plugging $\pi_k$ into the HT estimator $\hat{\bar{y}}_{HT}$ leads to the conclusion that the HT estimator for the population mean is equal to the sample mean $\hat{\bar{y}} = \frac{1}{n}\sum_{i=1}^{n} y_i$. The variance of the HT estimator is given by

$$Var(\hat{\bar{y}}_{HT}) = \frac{1-f}{n} \Psi^2,$$

where $f = \frac{n}{N}$ and $\Psi^2$ is the population variance, defined as $\Psi^2 = \frac{1}{N-1}\sum_{k=1}^{N} \left( Y_k - \bar{Y} \right)^2$. Note that given $\Psi^2$, the larger the sample size n, the more precise the estimator gets. However, $\Psi^2$ is in general not available, and therefore $Var(\hat{\bar{y}}_{HT})$ has to be estimated using the sample variance $\hat{\psi}^2 = \frac{1}{n-1}\sum_{i=1}^{n} \left( y_i - \hat{\bar{y}} \right)^2$, which provides the unbiased estimator

$$\widehat{Var}(\hat{\bar{y}}_{HT}) = \frac{1-f}{n} \hat{\psi}^2.$$

For a more detailed discussion, the reader is referred to chapter 3 of Bethlehem (2009).
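A minimal sketch of this estimator and its variance estimate, using a synthetic population with assumed parameters:

```python
import math
import random

random.seed(7)

# Synthetic finite population (illustrative values only).
population = [random.gauss(100, 15) for _ in range(5_000)]
N = len(population)
n = 250
f = n / N                                # sample fraction

sample = random.sample(population, n)    # SRS without replacement
y_bar = sum(sample) / n                  # HT estimator = sample mean
psi2_hat = sum((y - y_bar) ** 2 for y in sample) / (n - 1)
var_hat = (1 - f) / n * psi2_hat         # estimated variance of y_bar
se_hat = math.sqrt(var_hat)
```

The factor (1 − f) is the finite population correction: as n approaches N, the sample exhausts the population and the estimated variance shrinks to zero.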

2.3.3 Stratified simple random sampling

In case there exist homogeneous subpopulations, or strata, in the population of interest, it is possible to use this information as auxiliary information in the sampling design. An example is a population divided into men and women, where it is known that men have on average a higher salary than women. Stratification boils down to drawing mutually independent samples from each stratum, instead of one sample from the whole population as under simple random sampling. As will be shown, the main idea is variance reduction: because the samples are drawn independently, the variance between the subpopulations is eliminated from the sampling error. Hence, in case the target variable is homogeneous within the strata and heterogeneous across the strata, it is valuable to use stratified sampling.

For stratified simple random sampling the population U is divided into M mutually exclusive subpopulations U_1, ..., U_M, which together cover the entire population. If N_h is the size of stratum h (h = 1, ..., M), then it should hold that N_1 + ... + N_M = N. Accordingly, the population values of the target variable in stratum h are denoted $Y_1^{(h)}, \ldots, Y_{N_h}^{(h)}$, with $Y^{(h)} = \sum_{k=1}^{N_h} Y_k^{(h)}$ and $\bar{Y}^{(h)} = \frac{Y^{(h)}}{N_h}$.

The sample is naturally also drawn from these strata, which results in the following notational adjustments. First, the sample also consists of M mutually independent subsamples with sizes n_1, ..., n_M, with n_1 + ... + n_M = n. The target variables observed in the sample for stratum h are denoted $y_1^{(h)}, \ldots, y_{n_h}^{(h)}$.

The inclusion expectations per stratum are similar to those of simple random sampling, except for the adjustment for the strata. Now the set $\nabla_h$ consists of $\binom{N_h}{n_h}$ samples per stratum, such that there are $\binom{N_h-1}{n_h-1}$ samples containing element k, and $\binom{N_h-2}{n_h-2}$ samples containing elements k and l. Therefore,

$$\pi_k^{(h)} = \frac{n_h}{N_h}, \qquad \pi_{kl}^{(h)} = \frac{n_h(n_h-1)}{N_h(N_h-1)}, \qquad \text{and} \qquad \pi_{kl}^{(hh')} = \pi_k^{(h)} \times \pi_l^{(h')},$$

where h and h′ are two different strata. The last inclusion expectation follows from the fact that the strata are drawn independently. Note that the ratio $\frac{n_h}{N_h}$ is called the sample fraction f_h.

Plugging in the inclusion expectations, it follows that the HT estimator per stratum equals

$$\hat{\bar{y}}_{HT}^{(h)} = \frac{1}{n_h} \sum_{i=1}^{n_h} y_i^{(h)} = \hat{\bar{y}}^{(h)}. \qquad (2.2)$$

The variance of this estimator is $Var(\hat{\bar{y}}_{HT}^{(h)}) = \frac{1-f_h}{n_h} \Psi_h^2$, where $\Psi_h^2$ is the variance in stratum h, defined as $\Psi_h^2 = \frac{1}{N_h-1} \sum_{k=1}^{N_h} \left( Y_k^{(h)} - \bar{Y}^{(h)} \right)^2$. Since $\Psi_h^2$ is in general not available, $Var(\hat{\bar{y}}_{HT}^{(h)})$ has to be estimated using the sample variance in stratum h, $\hat{\psi}_h^2 = \frac{1}{n_h-1} \sum_{i=1}^{n_h} \left( y_i^{(h)} - \hat{\bar{y}}^{(h)} \right)^2$, which provides the unbiased estimator

$$\widehat{Var}(\hat{\bar{y}}_{HT}^{(h)}) = \frac{1-f_h}{n_h} \hat{\psi}_h^2.$$

Note that so far the estimator and its variance were discussed at the stratum level. To obtain the HT estimator for the population mean $\bar{Y}$, the estimators at the stratum level are combined in a weighted sum:

$$\hat{\bar{y}}_{STR} = \sum_{h=1}^{M} \frac{N_h}{N} \hat{\bar{y}}^{(h)},$$

and similarly the variance of this HT estimator and the estimate of this variance are obtained as

$$Var(\hat{\bar{y}}_{STR}) = \sum_{h=1}^{M} \frac{N_h^2}{N^2} Var(\hat{\bar{y}}^{(h)}), \qquad \text{and} \qquad \widehat{Var}(\hat{\bar{y}}_{STR}) = \sum_{h=1}^{M} \frac{N_h^2}{N^2} \widehat{Var}(\hat{\bar{y}}^{(h)}).$$

These properties hold because the samples for the strata are drawn independently, and the variance of a sum of independent random variables is the sum of the variances. This is the main strength of stratified simple random sampling: only the variance within the strata enters the variance of the estimator. The variance between the strata is eliminated, which improves precision. Therefore, when looking for auxiliary variables to use as stratification variables, one should look for subpopulations which are as homogeneous as possible.
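A sketch of the stratified estimator with two synthetic strata; the stratum means, sizes and sample sizes are assumptions chosen so that the strata are internally homogeneous but differ from each other.

```python
import random

random.seed(3)

# Two synthetic strata with different means (illustrative values).
strata = {
    "men":   [random.gauss(2500, 300) for _ in range(6_000)],
    "women": [random.gauss(2200, 300) for _ in range(4_000)],
}
n_h = {"men": 120, "women": 80}
N = sum(len(pop) for pop in strata.values())

est, var = 0.0, 0.0
for h, pop_h in strata.items():
    N_h, m = len(pop_h), n_h[h]
    s = random.sample(pop_h, m)                  # independent SRS per stratum
    y_bar_h = sum(s) / m                         # stratum sample mean
    psi2_h = sum((y - y_bar_h) ** 2 for y in s) / (m - 1)
    est += N_h / N * y_bar_h                     # weight N_h / N
    var += (N_h / N) ** 2 * (1 - m / N_h) / m * psi2_h
```

Only the within-stratum sample variances enter `var`; the gap between the two stratum means contributes nothing, which is the variance-elimination property described above.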

The allocation issue concentrates on the distribution of the stratum sample sizes n_1, ..., n_M. Two methods to allocate the total sample size n over the M strata are known as the Neyman allocation and the proportional allocation, respectively:

$$n_h = \frac{N_h \Psi_h}{\sum_{h'=1}^{M} N_{h'} \Psi_{h'}} n, \qquad \text{and} \qquad n_h = \frac{N_h}{N} n.$$

It can be shown that the optimal allocation, i.e. the allocation with minimum variance, is obtained if the Neyman allocation is used. However, this requires that $\Psi_h$ is known. If that is not the case, the proportional allocation can be used, which assumes that the variances of all strata have the same order of magnitude. Note that proportional allocation simplifies the first order inclusion expectations to $\pi_k = \frac{n}{N}$, such that all population elements have the same first order inclusion expectation irrespective of their stratum. A further discussion of stratified sampling is found in Bethlehem (2009).
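Both allocations can be sketched directly from the formulas; the stratum sizes and within-stratum standard deviations are assumed for the example.

```python
# Assumed stratum sizes N_h and within-stratum spreads Psi_h.
N_h = {"A": 6_000, "B": 3_000, "C": 1_000}
Psi_h = {"A": 10.0, "B": 40.0, "C": 5.0}
n = 200
N = sum(N_h.values())

# Proportional allocation: n_h = (N_h / N) * n.
prop = {h: round(N_h[h] / N * n) for h in N_h}

# Neyman allocation: n_h proportional to N_h * Psi_h.
denom = sum(N_h[h] * Psi_h[h] for h in N_h)
neyman = {h: round(N_h[h] * Psi_h[h] / denom * n) for h in N_h}
```

Stratum B is smaller than A but far more variable, so the Neyman allocation assigns it more interviews, whereas the proportional allocation ignores the spreads entirely.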

2.3.4 Two stage sampling

A more general sampling design is known as two stage sampling. Here, in the first stage a sample is drawn from M subpopulations, whereas in the second stage a sample is drawn within the subpopulations selected in the first stage. The units drawn in the first stage are referred to as primary sampling units, whereas those drawn in the second stage are called secondary sampling units. An example is when the primary sampling units are Dutch municipalities and the secondary sampling units are households, such that in the first stage a number of municipalities is drawn and then a sample is taken from the households in the selected municipalities. Note that stratification as discussed previously is a special case of two stage sampling, namely when all primary sampling units are selected and a sample is drawn from the secondary sampling units.

Cluster sampling does exactly the opposite: a sample is drawn from the primary sampling units, and from the selected primary sampling units all secondary sampling units are observed. While stratified sampling is relatively precise compared to simple random sampling, both have a common disadvantage: when taking a sample there is a considerable amount of travel time, because the respondents are located throughout the whole country. Two stage cluster sampling tackles this problem by drawing a random sample from the primary sampling units. Because elements are only observed within the selected clusters, the traveling time is smaller than under stratified sampling. However, this also increases the variance, since the observations under cluster sampling are less spread out over the population, as only a few primary sampling units are selected. For the formulas for cluster sampling and two stage sampling, including the HT estimators, the reader is referred to Särndal et al. (1992).

2.4 Estimators

In the previous subsection several sampling designs were considered, and two ways of improving the precision of the results by incorporating auxiliary information in the design stage were discussed, while the estimator was kept fixed at the Horvitz-Thompson estimator. In this subsection, by contrast, the auxiliary information is incorporated in the estimator, and estimators for the sampling designs considered in the previous subsection are presented.

Following the derivations of Renssen (1998) and Särndal et al. (1992), and using (2.1), the generalized regression estimator is defined as

$$\hat{\bar{y}}_{greg} = \hat{\bar{y}}_{HT} + \hat{b}' \left( \bar{X} - \hat{\bar{x}}_{HT} \right), \qquad \text{where} \qquad \hat{b} = \left( \sum_{k \in s} \frac{x_k \lambda_k x_k'}{\pi_k} \right)^{-1} \sum_{k \in s} \frac{x_k \lambda_k y_k}{\pi_k}. \qquad (2.3)$$
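A sketch of the generalized regression estimator with a single auxiliary variable, λ_k = 1 and simple random sampling without replacement; the synthetic population and the linear relation between y and x are assumptions for the example.

```python
import numpy as np

rng = np.random.default_rng(5)

# Synthetic population in which y is roughly linear in x.
N, n = 20_000, 400
x_pop = rng.uniform(20.0, 60.0, N)
y_pop = 5.0 + 2.0 * x_pop + rng.normal(0.0, 3.0, N)
X_bar = x_pop.mean()                     # known population mean of x

idx = rng.choice(N, size=n, replace=False)
x, y = x_pop[idx], y_pop[idx]
pi = n / N                               # SRS inclusion expectation

# HT estimators of the means (reduce to sample means under SRS).
y_ht = (y / pi).sum() / N
x_ht = (x / pi).sum() / N

# Regression coefficient b_hat from (2.3) with lambda_k = 1.
b_hat = (x * y / pi).sum() / (x * x / pi).sum()
y_greg = y_ht + b_hat * (X_bar - x_ht)
```

The correction term pulls the HT estimate toward what the known auxiliary mean implies, which reduces the variance whenever x is strongly related to y.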