Discontinuities due to survey redesigns : a structural time series approach
Here, $x_1, \ldots, x_n$ are the sample observations of the auxiliary variables and $\hat{\bar{x}}_{HT}$ is the HT estimator of
the known $\bar{X}$. Särndal et al. (1992) proposed $\lambda_k = 1/\sigma_k^2$, where $\sigma_k^2$ can be seen as the variance of the independent
random variable $Y_k$ defined in a superpopulation model ℜ of which the $y_k$ are assumed to be the outcomes.
More specifically, the choice of $\hat{b}$ and $\sigma_k^2$ is based on an assumption about the shape of the finite population
$(Y_k, X_k)$, $k = 1, \ldots, N$.
As proposed by Särndal et al. (1992), this assumption is expressed in terms of model ℜ:
$y_1, \ldots, y_N$ are considered as realized values of independent random variables $Y_1, \ldots, Y_N$,
$E_{\Re}(Y_k) = \beta^t x_k$, and $Var_{\Re}(Y_k) = \sigma_k^2$, $k = 1, \ldots, N$.
Here, $\beta$ and $\sigma_1^2, \ldots, \sigma_N^2$ are the model parameters. In case of a hypothetical complete enumeration of the
entire population, the weighted least squares estimator of $\beta$ would have been
$B = \left( \sum_{k=1}^{N} x_k \lambda_k x_k^t \right)^{-1} \sum_{k=1}^{N} x_k \lambda_k y_k$.
It can be shown that $B$ is the best linear unbiased estimator of $\beta$ under the model. However, the entire
population cannot be observed, so $B$ is an unknown population characteristic. Using sample $s$,
$B$ can be estimated by $\hat{b}$ as defined in (2.3).
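As a minimal sketch of this weighted least squares step, assume a single auxiliary variable without intercept, so that the coefficient is a scalar and the weights are $\lambda_k = 1/\sigma_k^2$. The function name and the toy population below are hypothetical and serve only as illustration.

```python
def wls_coefficient(x, y, sigma2):
    """Weighted least squares slope for E(Y_k) = B * x_k, Var(Y_k) = sigma2_k.

    Scalar special case (one auxiliary variable, no intercept) with
    weights lambda_k = 1 / sigma2_k. Illustrative helper, not from the paper.
    """
    lam = [1.0 / s for s in sigma2]
    numerator = sum(l * xk * yk for l, xk, yk in zip(lam, x, y))
    denominator = sum(l * xk * xk for l, xk in zip(lam, x))
    return numerator / denominator


# Hypothetical complete enumeration of a tiny population.
x_pop = [1.0, 2.0, 3.0, 4.0]
y_pop = [2.1, 3.9, 6.2, 7.8]
sigma2_pop = [1.0, 1.0, 4.0, 4.0]  # assumed known model variances
B = wls_coefficient(x_pop, y_pop, sigma2_pop)
```

Units with a large model variance receive a small weight, so they influence the coefficient less than units with a small variance.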
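The variance of $\hat{\bar{y}}_{greg}$ is
$Var(\hat{\bar{y}}_{greg}) = 0.5 \sum_{k=1}^{N} \sum_{l=1}^{N} (\pi_k \pi_l - \pi_{kl}) \left( \frac{\Xi_k}{\pi_k} - \frac{\Xi_l}{\pi_l} \right)^2$,
which can be estimated by
$\widehat{Var}(\hat{\bar{y}}_{greg}) = 0.5 \sum_{k=1}^{n} \sum_{l=1}^{n} \frac{\pi_k \pi_l - \pi_{kl}}{\pi_{kl}} \left( \frac{\xi_k}{\pi_k} - \frac{\xi_l}{\pi_l} \right)^2$.
Here, $\Xi_k = Y_k - B X_k$ and $\xi_k = y_k - \hat{b} x_k$ are the
population and sample residuals, respectively. Note the difference with the variance formulas of the HT
estimator, where the target variables are used instead of the residuals. Hence the variance of the generalized
regression estimator is lower than the variance of the HT estimator if model ℜ is properly specified,
that is, if the residuals vary less than the target variable (Särndal et al., 1992).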
The estimator (2.3) can also be represented in terms of weights:
$\hat{\bar{y}}_{greg} = \sum_{k=1}^{n} w_k y_k$, (2.4)
where $w_k = \frac{1}{\pi_k} g_k$ and $g_k = 1 + \lambda_k x_k' \left( \sum_{l=1}^{n} x_l \lambda_l x_l' \right)^{-1} (\bar{X} - \hat{\bar{x}}_{HT})$.
Here, $w_k$ are called final weights, $1/\pi_k$ the inclusion weights and $g_k$ the correction weights or simply
g-weights. More on the generalized regression estimator and its alternative expressions can be found in sections
6.4-6.6 of Särndal et al. (1992).
For simple random sampling it holds that
$\hat{\bar{y}}_{greg} = \hat{\bar{y}} + \hat{b}'(\bar{X} - \hat{\bar{x}})$, (2.5)
with $\hat{b}$ as in (2.3). Note that if no auxiliary information is available, (2.5) reduces to the sample
mean $\hat{\bar{y}}$ as discussed in section 2.3.2. For stratified sampling two estimators exist, depending on whether
all strata are estimated separately or at the national level:
$\hat{\bar{y}}_{sepgreg} = \sum_{h=1}^{H} \frac{N_h}{N} \left( \hat{\bar{y}}(h) + \hat{b}_h'(\bar{X}_h - \hat{\bar{x}}(h)) \right)$, or $\hat{\bar{y}}_{comgreg} = \hat{\bar{y}}_{STR} + \hat{b}'(\bar{X} - \hat{\bar{x}}_{STR})$. (2.6)
In equation (2.6), the first estimator is the separate generalized regression estimator, whereas
$\hat{\bar{y}}_{comgreg}$ is the combined generalized regression estimator.
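The regression estimator for simple random sampling can be sketched in a few lines. The sketch assumes a single auxiliary variable and takes the coefficient to be the ordinary least squares slope, which is an assumption made here for illustration (it corresponds to constant weights and a model with an intercept). The function name and the sample values are hypothetical.

```python
def greg_mean_srs(y, x, x_pop_mean):
    """Regression estimate of the population mean under simple random sampling.

    Implements y_bar + b_hat * (X_bar - x_bar) for one auxiliary variable,
    with b_hat taken as the ordinary least squares slope (an assumption).
    """
    n = len(y)
    y_bar = sum(y) / n
    x_bar = sum(x) / n
    s_xy = sum((xk - x_bar) * (yk - y_bar) for xk, yk in zip(x, y))
    s_xx = sum((xk - x_bar) ** 2 for xk in x)
    b_hat = s_xy / s_xx
    return y_bar + b_hat * (x_pop_mean - x_bar)


# Hypothetical sample: y is the target variable, x the auxiliary variable.
y_sample = [2.0, 4.0, 6.0, 8.0]
x_sample = [1.0, 2.0, 3.0, 4.0]
estimate = greg_mean_srs(y_sample, x_sample, x_pop_mean=3.0)
```

When the sample mean of the auxiliary variable equals its known population mean, the correction term vanishes and the estimate equals the sample mean of the target variable.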
3 Data on victimization
3.1 Application on data of Statistics Netherlands
The theory described in the previous section is applied to the data gathered by Statistics Netherlands.
More specifically, Statistics Netherlands has collected crime statistics from 1980 onwards. Since then, five
different survey designs have been used, which are summarized in table 1. The variable considered in this
paper is victimization, and all surveys in table 1 collected data on this variable. Victimization is defined
as the percentage of the Dutch population that has been a victim of a criminal offense at least once.
The reference period of the surveys is the 12 months before the interview date, unless stated otherwise.
The surveys are discussed in greater detail below. Note that people in detention centres and psychological
institutions have been excluded from the target population of all surveys.
CSV survey design
The CSV was conducted yearly from 1981 to 1985 and every two years from 1987 to 1993. The main
challenge was the collection of victimization data on people aged 15 years and older residing in a Dutch
household. The CSV was conducted in the first quarter of the year following the reference year; hence,
in January-March 1991 the data for 1990 were collected (Huys and Rooduijn, 1993). The interviewees
were visited at their homes and, except for 1991 and 1993, the data were collected manually on paper.
In 1991 and 1993 the data were collected using Computer Assisted Personal Interviewing (CAPI). The
sample frame consisted of addresses provided by the Dutch post office. The sampling
design of the CSV was stratified sampling, where the stratification variables gender, age, marital status,
urbanization and region yielded 154 strata in total. The sample size varied between 4000 and 10000
respondents on a yearly basis (Statistics Netherlands, 2011b).

Table 1 – Data Summary
Survey | Collection period | Data collection mode | English / Dutch name
CSV | 1980-1993 | CAPI | Crime Victims Survey / Enquête Slachtoffers Misdrijven
LPSS | 1992-1996 | CAPI | Legal Protection and Safety Survey / Enquête Rechtsbescherming en Veiligheid
PSLC | 1997-2004 | CAPI | Permanent Study Living Conditions / Permanent Onderzoek Leefsituatie
SM | 2005-2007 | CATI/CAPI (mixed mode) | Safety Monitor / Veiligheidsmonitor Rijk
ISM | 2008-current* | CAWI/PAPI or CATI/CAPI (mixed mode) | Integral Safety Monitor / Integrale Veiligheidsmonitor
SMIV | 2008-2010 | CATI/CAPI (mixed mode) | Safety Monitor / Veiligheidsmonitor Rijk
* In this paper the ISM is only considered until 2010.
LPSS survey design
The LPSS was conducted on a yearly basis between 1992 and 1996, so that it ran parallel to the
CSV for one year. Both surveys focused on collecting crime statistics. The target population of the LPSS
consisted of persons aged 15 years and older in private households in the Netherlands, and if possible two
respondents were interviewed per household. The sample frame again consisted of addresses provided
by the Dutch post office. In contrast to the CSV, the LPSS was collected throughout the entire
calendar year. The reference period was changed to 12+i months, where i was the month in which the survey
was taken. Hence, the reference period varied between 13 and 23 months. Furthermore, the entire survey
was collected by CAPI. The LPSS used two-stage stratified sampling as its sampling design,
where four stratification variables yielded 80 strata in total (Huys and Rooduijn, 1993). The
sample size was around 5000 respondents per year (Statistics Netherlands, 2011a).
PSLC survey design
The PSLC survey is a "module-based integrated survey combining various themes concerning living
conditions and quality of life" (van den Brakel et al., 2008). Hence, the PSLC is a more general survey
on living conditions, in which the Justice and Security module (JSM) focused on publishing crime statistics.
The PSLC changed the target population by lowering the minimum age from 15 years
to 12 years. Similar to the LPSS, this survey was collected throughout the calendar year
using the CAPI interview method. The sampling design was two-stage stratified sampling with
Dutch provinces and municipalities as stratification variables. For the PSLC, the proportional allocation
scheme was used, as described in section 2.3.3. Collected from 1997 to 2004, it had a relatively
high yearly sample size of around 10000 (Statistics Netherlands, 2011d).
SM survey design
To reduce the response burden and costs, the JSM of the PSLC and the Population Police Monitor
(PPM) survey, which was conducted by two Dutch ministries and collected data on the same topics, were
merged in 2004 into the SM, to be conducted by Statistics Netherlands. Another purpose of
this merger was to obtain one consistent victimization rate. Whereas the PSLC was a more general survey,
the SM is again focused on collecting crime statistics, so that the questionnaire and the context of the
survey changed compared to the PSLC (van den Brakel et al., 2008). Questions from the JSM were
dropped and questions from the PPM were added. The SM was collected from 2005 to 2008 in
January-March using a mixed-mode survey design. First, all interviewees whose phone number is known
to Statistics Netherlands (from 2007 onwards both landline and mobile numbers) are called in
order to complete a questionnaire using Computer Assisted Telephone Interviewing (CATI); the rest
are visited by interviewers at their homes to complete a questionnaire using CAPI. With the SM the target
population changed to the entire population of persons aged 15 years and older. Stratified two-stage
sampling was used for the SM, with the sample sizes set equal across strata, independent of
the stratum size. Therefore, the sampling error was roughly equal at the stratum level. This allocation deviates from
the allocations described in section 2.3.3, which are optimal for estimation at the national level. The sample
size in 2005 was 5000, whereas in the following years it fluctuated around 20000 (Statistics Netherlands,
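The difference between an equal allocation over strata and a proportional allocation can be illustrated with a short sketch. The stratum names and sizes below are hypothetical; the proportional rule n_h = n * N_h / N is the textbook definition and is assumed to match the scheme referred to in this paper.

```python
def equal_allocation(total_n, stratum_sizes):
    """Give every stratum the same sample size, regardless of its size."""
    share = total_n / len(stratum_sizes)
    return {name: share for name in stratum_sizes}


def proportional_allocation(total_n, stratum_sizes):
    """Allocate the sample in proportion to stratum size: n_h = n * N_h / N."""
    population = sum(stratum_sizes.values())
    return {name: total_n * size / population
            for name, size in stratum_sizes.items()}


# Hypothetical strata with their population sizes N_h.
strata = {"small region": 1000, "large region": 3000}
equal = equal_allocation(400, strata)
proportional = proportional_allocation(400, strata)
```

Under equal allocation both regions receive 200 respondents, which gives similar precision per region. Under proportional allocation the large region receives three times as many respondents as the small one.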
ISM survey design
The ISM has a more complex setup than its predecessors. The national sample is conducted by
Statistics Netherlands, and some local authorities conduct extra samples at the local level. The target
population of the ISM consists of the entire population aged 15 years and older. The data have been
collected from September until December on a yearly basis from 2008 onwards.
The data collection method for the national sample conducted by Statistics Netherlands is set up as follows:
first the interviewees are given the opportunity to fill in the survey electronically using Computer
Assisted Web Interviewing (CAWI) or on paper using Paper and Pencil Interviewing (PAPI). After two
reminders the non-responding interviewees are either called (CATI) or visited at their homes (CAPI),
depending on whether Statistics Netherlands is in possession of their phone numbers. The sample size
of the ISM as conducted by Statistics Netherlands has been constant throughout the years at roughly
Local authorities also contributed to the ISM by local oversampling. Oversampling occurs when extra
samples are drawn in addition to the original sample. The sample size of the ISM conducted by the local authorities
varied from year to year between 20000 and 180000. The amount of oversampling also varied between
the different Dutch municipalities within a particular year. In addition, the allocation of the respondents
over the data collection modes changed from year to year. The data collection modes used by local
authorities are CAWI, PAPI and CATI; CAPI is not used (Statistics Netherlands, 2011c).
The ISM had by far the largest sample size of all surveys considered, ranging from 40000 to 200000
Concerning the sampling design, two-stage stratified sampling has been used for the ISM, as described in
section 2.3.4. The sample sizes were set equal for all strata, independent of stratum size. The oversampling
by the local authorities affects the variance of the HT estimator. As mentioned before, the
more closely the inclusion expectations $\pi_k$ are proportional to
the target variables, the smaller the variance of the HT estimator. However, the oversampling in the ISM
was not spread proportionally over the entire population; it occurred only in some municipalities.
In the variance formula of (2.1) this means that $Var(\hat{\bar{y}}_{HT})$ is not minimized, and therefore the
HT estimator (2.1) is not variance reducing in the case of the ISM.
Additionally, parallel to the ISM, the SM was also conducted between 2008 and 2010 with a smaller
sample size of around 6000 respondents per year. This adjusted SM was collected in the
fourth quarter of the year (September-December) instead of the first quarter (January-March). Therefore,
the SM collected in the period 2008-2010 will be referred to as SMIV. By conducting the SMIV, Statistics
Netherlands always had a backup survey and comparison material in case something went wrong
with the ISM. In this paper the SMIV is used as a parallel run for the ISM, which will be discussed in the section
on state space models.
The estimation of the victimization rates for all surveys discussed in this subsection is done by the
generalized regression estimator discussed in section 2.4. Using this methodology, for every respondent
in the sample a weight is calculated such that the sum of the weighted target variables results in an approximately
unbiased estimator for the unknown population parameters. This so-called weighting scheme
accounts for the differences in response between respondents by using auxiliary variables such as age, sex,
income and data collection mode. For the exact weighting schemes the reader is referred to the yearly
National Reports on the crime surveys of Statistics Netherlands, such as Statistics Netherlands (2008)
for the SM 2008 and Statistics Netherlands (2010) for the ISM 2009.
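To make the role of the final weights concrete, the sketch below computes a weighted victimization rate from respondent-level weights. The weights and victim indicators are invented for illustration, and normalizing by the sum of the weights is an assumption made here so that the result is a percentage; the actual weighting schemes are documented in the national reports cited above.

```python
def weighted_rate(victim_flags, weights):
    """Weighted percentage of victims: 100 * sum(w_k * y_k) / sum(w_k)."""
    total_weight = sum(weights)
    weighted_victims = sum(w * y for w, y in zip(weights, victim_flags))
    return 100.0 * weighted_victims / total_weight


# Hypothetical respondents: 1 = victim in the reference period, 0 = not.
flags = [1, 0, 0, 1]
final_weights = [100.0, 300.0, 400.0, 200.0]
rate = weighted_rate(flags, final_weights)
```

In this toy example the unweighted share of victims is 50 percent, whereas the weighted rate is lower because the two victims carry less weight than the two non-victims.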
3.2 Time series
In this subsection the time series analyzed in this paper are discussed in depth.
3.2.1 Data on victimization
As mentioned before, this paper uses the victimization series as collected by Statistics Netherlands from
1980 onwards. The victimization series are classified in five subcategories or breakdowns, given in table 2.
The sum of the separate breakdowns does not equal the value for total victimization, due to the definition
of victimization. If a respondent has been a victim of, for example, both a property offense and vandalism, he
or she is counted only once in total victimization. In this way, multiple crimes are counted
only once in the total. This means that the sum of all breakdowns is necessarily higher
than or equal to the total value. The complete list of crimes belonging to each breakdown, as well as the
questionnaires belonging to the respective surveys, can be found on www.cbs.nl/en-gb or in the national
reports and report tables which are published yearly by Statistics Netherlands.
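The counting rule can be illustrated with a small sketch. The respondents and breakdown names below are hypothetical and the rates are unweighted for simplicity; the point is only that a respondent with several offense types is counted once in the total.

```python
# Hypothetical respondents with one 0/1 indicator per breakdown.
respondents = [
    {"violence": 0, "property": 1, "vandalism": 1},
    {"violence": 0, "property": 0, "vandalism": 0},
    {"violence": 1, "property": 0, "vandalism": 0},
    {"violence": 0, "property": 0, "vandalism": 0},
]


def breakdown_rates(sample):
    """Unweighted percentage of victims per breakdown."""
    n = len(sample)
    return {kind: 100.0 * sum(r[kind] for r in sample) / n
            for kind in sample[0]}


def total_rate(sample):
    """Unweighted percentage of respondents with at least one offense."""
    n = len(sample)
    return 100.0 * sum(1 for r in sample if any(r.values())) / n


rates = breakdown_rates(respondents)
total = total_rate(respondents)
```

Here the first respondent contributes to two breakdowns but only once to the total, so the breakdown rates add up to more than total victimization.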
The victimization rates are also classified in subpopulations, such that it is possible to investigate whether
there are any differences between population classes. Three classifications are represented in table 3. The
subpopulation classification used in this paper is the gender classification, since for the age subpopulation
the classes are different in the LPSS and PSLC and the urbanization subpopulation was not available.
For the ISM, the numbers of the subpopulations for the year 2010 are not available yet.
Table 2 – Breakdowns in types of victimization
Dutch | English
Geweldsdelicten | Violence offenses
Vermogensdelicten | Property offenses
Doorrijden na een aanrijding | Failure to stop after an accident
Overige delicten* | Other offences*
* Not available for the subpopulations in table 3

Table 3 – Subpopulations of victimization
Subpopulation type | Number of classes | Classes
Gender | 2 | Male, Female
Urbanization | 5 | Very highly urban, Highly urban, Moderate urban, Little urban, Not urban
Age* | 8 | 15-18, 18-25, 25-35, 35-45, 45-55, 55-65, 65-75, 75+
* This subpopulation is different for the LPSS and the PSLC
For the CSV, the first survey design, it appeared that the definition of the breakdowns differed
substantially from the definition of the breakdowns in the other four surveys, such that it was not
possible to compare them in any way. Therefore, it was decided to disregard the CSV survey from the
analysis. Additionally, the ISM did not report the values for Failure to stop after an accident.
[Figure 1: Victimization data : 1992-2010. Recoverable panel title: Failure to stop; horizontal axis: years 1995 to 2010. A vertical line represents the introduction of a new survey.]
[Figure 2: Total victimization series and police series : 1992-2010. Recoverable panel title: Total victimization series; horizontal axis: years 1995 to 2010.]
Table 12 on the first page of the Appendix shows the victimization series from 1992 to 2010 with the
breakdowns used in this paper; the same series are plotted in figure 1 in this section. Each vertical line in
figure 1 marks the introduction of a new survey: the PSLC in 1997, the SM in 2005
and the ISM in 2008. The parallel run of the SMIV with the ISM in 2008-2010 is displayed with dots. The
series appears to jump most of the times a new survey is introduced.
3.2.2 Police series
Additionally, the Dutch police also provides data on victimization. The police series is not a survey but
a registration. It is the number of criminal offenses reported at Dutch police stations, and therefore
it does not contain any sampling error. The police series was redesigned in 2005 after a new computer
system was introduced, which also gives rise to a possible discontinuity. The police reports the series
as the number of offenses per 1000 inhabitants. It is available for the entire time span from 1992 to
2010. The series is shown together with the total victimization series in figure 2. Note the similar
movements of both series and the rise of the police series in 2005, the year the series was redesigned.
Looking at figure 1 it seems that, especially for total victimization, the value
of victimization tends to increase every time a new survey is introduced. However, it is not clear whether
these increases are due to the development of the victimization variable itself or to the introduction of a
new survey. As discussed in section 3.1, no two surveys considered in this paper have had exactly
the same survey design. More specifically, several elements have varied across surveys, including the
target population, the data collection mode, the exact period in which the survey was taken and whether or
not the survey was combined with another survey. These factors could lead to the observed differences
in figure 1 in the following ways.
1. Differences between target populations
The main difference in the target population in the series considered is the larger target population
of the PSLC, which starts at 12 years instead of 15 years of age. As such, it is possible that the rates
are different because the target population is larger.
2. Differences in data collection mode
The differences in data collection modes are probably one of the main reasons behind the discontinuities.
As shown by de Leeuw (2005) and Dillman and Christian (2005),
so-called mode effects can have a large influence on the way the interviewee
answers the questionnaire. For example, compared to CATI, the interview speed with CAPI is
lower, which might result in a lower measurement error. Another example is that with electronic
questionnaires, i.e. CAPI, CAWI and CATI, more advanced questionnaire setups are possible,
because once the respondent has answered "No" some questions can be skipped. This is not possible
with PAPI. Furthermore, under CAPI fewer socially desirable answers are obtained due to the personal
contact with the interviewer.
3. Differences between data collection periods
The data collection period varied across the surveys considered in this paper, ranging from a short
period of a couple of months (SM, ISM) to the entire year (LPSS, PSLC). Moreover, the SM has
been conducted from January to March, while the ISM is conducted from September till December.
Note that since the respondent is asked about criminal offenses in the past year, seasonal effects
should not be present.
4. Differences between context of surveys
An example of a survey redesign which could have an effect on the parameter values is the changeover
from the PSLC to the SM. The PSLC also contained topics not directly related to criminal
offenses, like health care and living conditions, whereas the SM was explicitly focused on criminal
offenses. Because the context of the survey changed, a victim of a criminal offense may be more
likely to participate in the SM than in the PSLC. Note that this effect can be seen as a response
5. Differences in questionnaire design
In general, with every new survey design the questionnaire changed as well. This was less the case
with the PSLC, where the questions in the JSM stayed more or less the same as in the LPSS.
Continuing the previous example, the questionnaire of the PSLC focused on more general
subjects, whereas in the SM only questions related to crime were asked. Additionally, with the ISM
the questions on car theft and vandalism changed substantially compared to the SM. These redesigns
might have systematic effects on the outcomes of these surveys.
Another example of a change in the questionnaire is adding or deleting questions in order to reduce so-called
telescoping effects, which occur when respondents misplace past events in time. More specifically,
if a person is asked about criminal offenses in the past 12 months and experienced a criminal offense
13 months ago, he or she might still report that offense. To minimize such telescoping effects,
the first questions of the questionnaire should structure the respondent's mind, and only then should the
questions focused on events in the last 12 months be asked.
4 Quantifying the discontinuities
The preceding effects of a survey redesign might result in discontinuities. Users of official statistics
are usually interested not only in the value of a target parameter at one time point, but also in its
development over the years. When a series contains such a discontinuity, this comparison is not possible.
Several methods have been developed to quantify these discontinuities.
The recalculation method can be applied if the micro data of the survey are consistent after the redesign.
This is the case if, for example, only the classification of publication domains of the victimization variable has
changed. One can then still use the observed data and simply apply the new classification
system using a domain indicator variable (van den Brakel et al., 2008).
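A minimal sketch of such a recalculation is given below. It assumes that the micro data hold a final weight and a victim indicator per respondent, and that the redesign only changed the boundaries of an age domain. The record layout, the domain boundaries and the function names are hypothetical.

```python
def domain_estimate(records, domain_indicator):
    """Weighted victimization percentage within a publication domain.

    domain_indicator(record) returns True when the record belongs to the
    domain; it plays the role of the domain indicator variable.
    """
    selected = [r for r in records if domain_indicator(r)]
    weight_sum = sum(r["weight"] for r in selected)
    victims = sum(r["weight"] * r["victim"] for r in selected)
    return 100.0 * victims / weight_sum


# Hypothetical micro data that remain usable after the redesign.
micro_data = [
    {"age": 16, "weight": 120.0, "victim": 1},
    {"age": 22, "weight": 100.0, "victim": 0},
    {"age": 30, "weight": 80.0, "victim": 1},
    {"age": 41, "weight": 100.0, "victim": 0},
]


def old_domain(record):
    return 15 <= record["age"] < 25   # assumed old classification


def new_domain(record):
    return 15 <= record["age"] < 35   # assumed new classification


old_estimate = domain_estimate(micro_data, old_domain)
new_estimate = domain_estimate(micro_data, new_domain)
```

Because the same micro data are reused, the historical figures can be recomputed under the new classification without collecting any new observations.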
However, in many real-life situations the micro data collected are not consistent after a redesign, for one
of the reasons mentioned in section 3.3. In that case two other methods can be applied. One approach is
to conduct an experiment in which the regular and the new survey are run concurrently for some period. This
allows the researcher to estimate the main survey parameters under both survey designs and to test whether
these parameters differ significantly from each other. Another feature of the experimental approach
is that, in case the new survey fails to produce an appropriate estimate for the parameter of interest,
there is always a backup survey which still produces the correct estimate. More on the experimental
approach can be found in van den Brakel et al. (2008). A major drawback of the experimental approach is
that two surveys have to be conducted, which is not always possible since statistical offices are generally
bound by a limited financial budget.
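The comparison of two parallel surveys can be sketched as a simple two-sided z-test on the difference between the two estimates. The sketch assumes that both samples are independent and that the estimates are approximately normally distributed; the estimates and standard errors are invented, and the test actually used in an experiment may well be more elaborate.

```python
import math


def discontinuity_z_test(est_old, se_old, est_new, se_new):
    """Two-sided z-test for the difference between two independent estimates.

    Returns the estimated discontinuity, the z statistic and the p-value.
    """
    difference = est_new - est_old
    se_difference = math.sqrt(se_old ** 2 + se_new ** 2)
    z = difference / se_difference
    # Two-sided p-value of a standard normal: 2 * (1 - Phi(|z|)).
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return difference, z, p_value


# Hypothetical victimization rates (in percent) from a parallel run.
diff, z_stat, p = discontinuity_z_test(est_old=25.0, se_old=0.3,
                                       est_new=28.0, se_new=0.4)
```

A small p-value indicates that the difference between the two designs is unlikely to be due to sampling error alone, and the difference itself is then an estimate of the discontinuity.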
An alternative to the experimental approach is the structural time series approach. With this approach
no parallel run is necessary. This approach is discussed further in this section and will be used to analyze
the victimization series discussed in section 3.2.1.
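As a rough illustration of the idea, and not necessarily the specification used later in this paper, a structural time series model can absorb a redesign through a level intervention. The sketch below assumes a local level trend and a single redesign at a hypothetical time point $t_R$; the symbols $\delta_t$, $\theta$, $\varepsilon_t$ and $\eta_t$ are introduced here for illustration only.

```latex
\begin{align}
y_t &= \mu_t + \theta\,\delta_t + \varepsilon_t, & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2), \\
\mu_{t+1} &= \mu_t + \eta_t, & \eta_t &\sim N(0, \sigma_\eta^2), \\
\delta_t &= \begin{cases} 0 & \text{if } t < t_R, \\ 1 & \text{if } t \geq t_R. \end{cases}
\end{align}
```

Under these assumptions the coefficient $\theta$ measures the size of the level shift at the moment of the redesign, while the trend $\mu_t$ captures the underlying development of the series.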
First, section 4.1 will discuss the notations used in this section. Then, in section 4.2 the structural time
series framework and the corresponding state space representation as discussed by Durbin and Koopman
(2001) will be addressed. Section 4.3 will present the application of the Kalman filter and the parameter
4.1 Notational conventions
The notation as discussed in section 2.2 and used in section 2 is applied in this section as well.
Hence, an uppercase letter will refer to a population parameter, and lowercase letters will refer to sample
parameters. The circumflex in ˆy is used to denote estimators.
In this section a special font is used to denote only matrices. For example, a k by k identity matrix is
represented by Ik and the system matrices in the state space model will be represented by Zt and Tt.
Exact definitions of these matrices will be provided at their first use in the discussion.
4.2 Time series analysis
4.2.1 Structural time series models for sampling surveys
Structural time series models as described by Durbin and Koopman (2001) propose that a series yt
can be decomposed in a trend component µt, a seasonal component ιt, other cyclic components γt and