
Discontinuities due to survey redesigns : a structural time series approach


4.3.3 Missing values

Whether the analysis is based on filtering or smoothing, one almost always has to deal with the
practical problem of missing data. For several reasons it is difficult to obtain complete datasets,
often because of the processing time statisticians need before publishing the data. The Kalman filter
handles this problem in a natural way.
Suppose that the data ŷ_j, with j ∈ {t_m1, . . . , t_m2 − 1}, are missing for 1 < t_m1 < t_m2 ≤ t. To obtain the
equations for smoothing and filtering at these missing values, note that a missing observation implies
that the innovations v_t and the corresponding inverse variance matrix F_t^{−1} are set equal to zero.
Applying this to the Kalman filter in (4.18) leads to

a_{t+1} = T a_t,   P_{t+1} = T P_t T' + Q_t,   t = t_m1, . . . , t_m2 − 1, (4.20)

and similarly the smoothing recursions in box (4.19) become for the missing observations

w_{t−1} = T' w_t,   N_{t−1} = T' N_t T,   t = t_m2 − 1, . . . , t_m1. (4.21)

Other relevant formulas for smoothing do not change and can be applied directly.
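The treatment of missing observations in (4.20) amounts to replacing the update step by a pure prediction step. A minimal univariate local level sketch in Python (the paper's actual implementation uses Ox/SsfPack; the matrices, variances and data below are hypothetical):

```python
import numpy as np

# Univariate local level model: y_t = alpha_t + eps_t, alpha_{t+1} = alpha_t + eta_t.
# T, Z, H, Q are hypothetical scalars; NaN marks a missing observation.
T, Z, H, Q = 1.0, 1.0, 0.5, 0.1
y = np.array([1.0, 1.2, np.nan, np.nan, 1.5, 1.4])

a, P = 0.0, 1e7          # diffuse-style starting values
filtered = []
for yt in y:
    if np.isnan(yt):
        # Missing observation: v_t and F_t^{-1} are set to zero, so the
        # recursion collapses to the prediction step of (4.20).
        a_next = T * a
        P_next = T * P * T + Q
    else:
        v = yt - Z * a               # innovation
        F = Z * P * Z + H            # innovation variance
        K = T * P * Z / F            # Kalman gain
        a_next = T * a + K * v
        P_next = T * P * (T - K * Z) + Q
    filtered.append(a_next)
    a, P = a_next, P_next
```

During the gap the state prediction is simply propagated (here unchanged, since T = 1), exactly as (4.20) prescribes, while the state variance grows by Q each period.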

4.3.4 Diffuse priors

In the previous sections it was assumed that the distribution parameters a_1 and P_1 of α_1 were
known, so that the recursive formulas (4.18) and (4.19) could be applied. However, in many real-life
applications this is an unrealistic assumption; usually the distribution of α_1 is not known. Durbin and
Koopman (2001) propose to represent α_1 as having a diffuse prior density. That means that one should
fix a_1 at some value and let P_1 → ∞. The Kalman filter then determines the values starting from t = 2,
after which the standard recursive formulas for filtering and smoothing can be applied. This process
is called diffuse initialization of the Kalman filter.

4.3.5 Parameter estimation

The covariance matrices H_t and Q_t in (4.17) are not known, since they cannot be observed. The
Kalman filter, however, assumes these parameters, also called hyperparameters, to be known; therefore
they have to be estimated using maximum likelihood. Depending on whether the initial conditions
are known or diffuse, several forms of the likelihood function exist. Because the innovations are assumed
to be Normal and i.i.d., the likelihood function for the Normal distribution can be


used. Then, let the classical likelihood be represented by

L(ŷ) = p(ŷ_1, . . . , ŷ_t) = p(ŷ_1) ∏_{t=2}^{t} p(ŷ_t | Ŷ_{t−1}),

where Ŷ_{t−1} = {ŷ_1, . . . , ŷ_{t−1}} is the information set for ŷ up until t − 1. Instead of the original likelihood,
usually the loglikelihood is considered:

log L(ŷ) = ∑_{t=1}^{t} log p(ŷ_t | Ŷ_{t−1}),

where p(ŷ_1 | Ŷ_0) = p(ŷ_1).

Relating to the model considered in this paper, first assume that the initial state vector has a known
distribution N(a_1, P_1). Note that E(ŷ_t | Ŷ_{t−1}) = Z_t a_t, v_t = ŷ_t − Z_t a_t and F_t = Var(ŷ_t | Ŷ_{t−1});
substituting N(Z_t a_t, F_t) for p(ŷ_t | Ŷ_{t−1}) in the loglikelihood function above yields

log L(ŷ) = − (t × k)/2 log(2π) − (1/2) ∑_{t=1}^{t} ( log |F_t| + v_t' F_t^{−1} v_t ), (4.22)

where Ft is nonsingular. Because the quantities Ft and vt are computed by the Kalman filter, the log
likelihood is easily computed from the Kalman filter output.
In case not all elements of the initial state vector are known, the loglikelihood changes as well,
since it has to be adapted for the diffuse elements. More specifically, log L(ŷ) becomes log L(ŷ)_d to
account for d unknown diffuse elements:

log L(ŷ)_d = − (t × k)/2 log(2π) − (1/2) ∑_{t=d+1}^{t} ( log |F_t| + v_t' F_t^{−1} v_t ).

Hence, Koopman et al. (2008) assume that for diffuse priors the summation in (4.22), which runs from
1 to t, should start at d + 1, since the first d terms of the sum are approximately zero. Once the
hyperparameters have been estimated, the Kalman filter can be applied. The details of the derivations
and extensions are discussed in chapter 7 of Durbin and Koopman (2001).
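The estimation loop can be sketched as follows: run the Kalman filter for a trial (σ²_ε, σ²_η), accumulate the loglikelihood with the summation starting at d + 1, and hand the negative loglikelihood to a quasi-Newton optimizer. This Python/SciPy sketch for a univariate local level model stands in for the Ox/SsfPack implementation used in the paper; the data and starting values are hypothetical:

```python
import numpy as np
from scipy.optimize import minimize

def neg_loglik(theta, y, d=1):
    """Negative Gaussian loglikelihood of a local level model, with the
    summation starting at d + 1 as in the diffuse form of (4.22).
    Variances enter as exp(theta) to keep them positive."""
    sigma2_eps, sigma2_eta = np.exp(theta)
    a, P = 0.0, 1e7                       # diffuse-style start
    ll = 0.0
    for t, yt in enumerate(y, start=1):
        v = yt - a                        # innovation
        F = P + sigma2_eps                # innovation variance
        if t > d:                         # skip the first d diffuse terms
            ll -= 0.5 * (np.log(2.0 * np.pi) + np.log(F) + v * v / F)
        K = P / F
        a += K * v
        P = P * (1.0 - K) + sigma2_eta
    return -ll

rng = np.random.default_rng(0)
y = np.cumsum(rng.normal(0.0, 0.3, 200)) + rng.normal(0.0, 1.0, 200)
res = minimize(neg_loglik, x0=[0.0, 0.0], args=(y,), method="BFGS")
sigma2_eps_hat, sigma2_eta_hat = np.exp(res.x)
```

The exp-parametrization is a common trick to keep the optimizer unconstrained while guaranteeing positive variance estimates.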

4.3.6 Diagnostic Tests

The core of the foregoing Gaussian state space model relies on the assumption that the disturbances ϕ_t
and η_t are Normally distributed and serially independent with constant variances. To check these
assumptions, this section provides several tests. First, note that the standardised one-step forecast errors,
defined as

u_t = v_t / f_t,   t = 1, . . . , t, (4.23)


also have to be normally distributed and serially independent. Here, the standard error of v_t is
obtained from the corresponding entry of the matrix F_t and denoted by f_t. Note that in the diffuse
case, (4.23) starts at t = 2 for all tests
considered in this section. The following tests can be applied.
Normality
The first four moments are defined as

m_1 = (1/t) ∑_{t=1}^{t} u_t,   m_q = (1/t) ∑_{t=1}^{t} (u_t − m_1)^q,   q = 2, 3, 4.

Then, the standard statistics skewness S and kurtosis K are defined as

S = m_3 / m_2^{3/2},   K = m_4 / m_2^2.

When these assumptions are valid, S should be around 0 and K around 3, or

S ∼ N(0, 6/t),   K ∼ N(3, 24/t).

Usually, for convenience, the excess kurtosis EK = K − 3 is considered instead of the kurtosis.
Standard statistical tests can be performed to check the significance of skewness and kurtosis. The
confidence intervals for S and EK are:

CI_S = [−1.96 √(6/t), 1.96 √(6/t)],   and   CI_EK = [−1.96 √(24/t), 1.96 √(24/t)]. (4.24)

If the skewness and kurtosis are within the boundaries of their confidence intervals, there is no reason to
suspect them of violating the normality assumption.
Additionally, the so-called QQ-plot is a graphical method to compare two distributions. In this paper,
the ordered residuals of the standardized innovations are plotted against their theoretical quantiles under
a Normal distribution. Under perfect circumstances the match is one-to-one and is represented by a
45-degree line. Hence, the further the QQ-plot deviates from the 45-degree line, the worse the match of
the ordered residuals.
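The moment-based checks above translate directly into code; `u` would be the standardized one-step forecast errors, here replaced by a simulated Normal sample:

```python
import numpy as np

def normality_check(u, z=1.96):
    """Skewness S and excess kurtosis EK of the standardized forecast
    errors, with the confidence bounds implied by (4.24)."""
    u = np.asarray(u, dtype=float)
    t = len(u)
    m1 = u.mean()
    m = {q: np.mean((u - m1) ** q) for q in (2, 3, 4)}
    S = m[3] / m[2] ** 1.5          # skewness
    EK = m[4] / m[2] ** 2 - 3.0     # excess kurtosis
    s_ok = abs(S) <= z * np.sqrt(6.0 / t)
    ek_ok = abs(EK) <= z * np.sqrt(24.0 / t)
    return S, EK, s_ok, ek_ok

rng = np.random.default_rng(1)
S, EK, s_ok, ek_ok = normality_check(rng.normal(size=500))
```

For a genuinely Normal sample of this size, S and EK should fall well inside their bounds about 95% of the time.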
Heteroscedasticity test
The distribution of the standardized forecast errors u_t is assumed to be homoscedastic, that is, to
have the same finite variance over time. A simple test for heteroskedasticity is

HSK(b) = ∑_{t=t−b+1}^{t} u_t^2 / ∑_{t=1}^{b} u_t^2,

which follows an F distribution with (b, b) degrees of freedom under the null of homoscedasticity. b has
to be chosen in such a way that the standardized innovations are roughly divided into three parts. In
this paper


b = 7, such that it results in the following confidence interval:

CI_HSK = [F_{7,7}(a/2), F_{7,7}(1 − a/2)]. (4.25)

Here F refers to the F distribution and a is the significance level, which is equal to 0.05 in this paper.
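The HSK statistic and its F bounds can be sketched with SciPy; the error series below is hypothetical, constructed so its variance jumps in the second half:

```python
import numpy as np
from scipy.stats import f

def hsk_test(u, b, alpha=0.05):
    """HSK(b): ratio of the last b to the first b squared standardized
    errors, compared against the F(b, b) bounds of (4.25)."""
    u = np.asarray(u, dtype=float)
    stat = np.sum(u[-b:] ** 2) / np.sum(u[:b] ** 2)
    lo, hi = f.ppf(alpha / 2, b, b), f.ppf(1 - alpha / 2, b, b)
    return stat, (lo, hi), lo <= stat <= hi

# Hypothetical errors whose variance jumps in the second half
u = np.array([0.5, -0.3, 0.8, -0.6, 0.4, 0.7, -0.5,
              0.2, 0.1, -0.4, 0.3, 0.6, -0.2, 0.5,
              2.5, -3.0, 2.8, -2.6, 3.1, -2.9, 2.7])
stat, (lo, hi), homoskedastic = hsk_test(u, b=7)
```

With b = 7 and a = 0.05 the bounds are approximately the [0.20, 5.00] interval used later in this paper; the constructed series lands far above the upper bound, so homoscedasticity is rejected.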

Serial correlation
Durbin-Watson (DW) tests have been performed to check for serial correlation:

DW = ∑_{t=2+d}^{t} (u_t − u_{t−1})^2 / ∑_{t=1+d}^{t} u_t^2,

where d is the number of innovations skipped due to the initialization of the Kalman filter; in this paper
d = 1. It should hold that DW ∼ N(2, 4/t). The confidence interval is computed as

CI_DW = [2 − 1.96 √(4/t), 2 + 1.96 √(4/t)], (4.26)

t t

with a significance level a = 0.05 (van den Brakel et al., 2011). Additionally, ACF graphs can be used
to check for serial correlation as well. The confidence intervals for the ACF graphs are computed as

[−2/√t, 2/√t]. (4.27)
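The DW statistic and its interval can be sketched as follows; the alternating error series is hypothetical, chosen so that negative autocorrelation pushes DW above 2:

```python
import numpy as np

def durbin_watson(u, d=1, z=1.96):
    """DW statistic on the standardized errors u, skipping the first d
    innovations lost to initialization, with the N(2, 4/t) interval (4.26)."""
    u = np.asarray(u, dtype=float)
    t = len(u)
    dw = np.sum(np.diff(u[d:]) ** 2) / np.sum(u[d:] ** 2)
    half = z * np.sqrt(4.0 / t)
    return dw, (2.0 - half, 2.0 + half)

# Hypothetical alternating errors: negative autocorrelation pushes DW above 2
u = np.array([0.3, -0.5, 0.4, -0.2, 0.6, -0.4, 0.1, -0.3, 0.5, -0.6])
dw, (lo, hi) = durbin_watson(u)
```

Note that for such a short series the N(2, 4/t) interval is very wide, so even a DW near 3 can remain inside the bounds.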

One-step forecast errors
Finally, the one-step forecast errors u_t can be plotted against the time span, so that graphical outliers
can be detected. Even though this is not a formal test, it might help in case an extreme observation
distorts the analysis of the entire sample. Generally, the values of u_t should lie between −2 and 2.

5 Application and results

In this section, the structural time series approach described in section 4 is applied to the victimization
data set described in section 3. The programming language used to build the model is Ox, and
the Kalman filter is implemented using the specialized Ox library SsfPack 3.0. The reference
for this library is Koopman et al. (2008), where all the necessary functions and input requirements are
described. Note that since no information was available on the starting values, diffuse priors have been
used. The system matrices T and Z_t as described in the previous section are stacked in one matrix Φ,
and the covariance matrices H_t and Q_t are stacked in one matrix called Ω. If the target parameters are
not correlated, Ω is a diagonal matrix; if they correlate, there are coefficients at the respective off-diagonal
entries. Missing values, such as the values for Failure to stop after an accident for the ISM, can be
modeled by leaving the entries empty in the original dataset.


The maximization procedure described in 4.3.5 is already implemented as the numerical maximization
function MaxBFGS in the standard Ox library maximize, and can therefore be applied directly.
The MaxBFGS function, named after Broyden, Fletcher, Goldfarb and Shanno, also returns the degree of
convergence, ranging from "no convergence" to "strong convergence".
This section is set up as follows. First, the basic models used in this section will be described in section
5.1. Some improvements for these models are presented in section 5.2. Then, section 5.3 will present the
estimated discontinuities obtained using the Kalman filtering and smoothing recursions. Section 5.4 in
turn discusses the diagnostic tests performed on these models, whereas section 5.5 will present the final
model choice.

5.1 Description of the models

For the victimization time series four models have been developed, starting from a basic one and
building up to more complex ones. As mentioned before, it is assumed that the series can be decomposed
into a stochastic trend, a level intervention variable and an irregular term. Additionally, explanatory
variables can be added. No seasonal or cyclical components are included since the data is observed
yearly, as discussed in section 4.2.2. Next to these four models, the subpopulations have been investigated
as well, as discussed in section 4.2.4. The dataset consists of the surveys LPSS, PSLC, SM and ISM,
conducted in the years 1992 to 2010. As such, there have been three survey redesigns, which are denoted
in this section by

– Intervention 1: LPSS to PSLC,
– Intervention 2: PSLC to SM,
– Intervention 3: SM to ISM.
The estimated coefficient β̂_k of the intervention variable models the difference between the introduced
survey, hence the PSLC, the SM or the ISM, and the LPSS.
The first model, called M1, consists of the original victimization series. Its time span is from 1992 to
2010, containing the surveys LPSS, PSLC, SM and ISM. The state space representation of M1 is given
in (4.7), with its entries as defined in (4.9). The total victimization series is divided into five breakdowns.
As a result, ŷ_t contains six series, such that k = 6.
The second model M2 coincides with M1, except that in the period 2008 - 2010 instead of the ISM the

SMIV is used. Recall from section 3.1 that the SMIV only differs from the SM by a lower sample size.

Therefore, M2 can be seen as a model check since it is expected that there are no discontinuities between
SM and SMIV . Also for M2, k = 6.
The third model M3 again contains the surveys LPSS, PSLC, SM and ISM. Additionally, in M3 an
explanatory series is introduced in the form of a time varying intervention variable as defined in (4.11).
The state space representation of M3 is given in (4.7), with its entries as defined in (4.9), except for
Intervention 3, where δ^i_t is defined in (4.11) with ŷ_dif,t = ŷ_ISM,t − ŷ_SMIV,t. Here, the SMIV is defined in
section 3.1. The reason for choosing the difference between the ISM and the SMIV as an explanatory
variable in the intervention variable is that this difference denotes the discontinuity between the ISM
and the SMIV. This might result in better model fits. Also for M3, k = 6.
Model four, denoted as M4, expands M3 by adding the police series as described in section 3.2.2 to the
model. The state space representation of M4 is given in (4.7), with its entries as defined in (4.12), where
there is correlation between the irregular terms of the trends of the total victimization series and the
police series. Similar to M3, the difference ŷ_dif,t = ŷ_ISM,t − ŷ_SMIV,t is also implemented in the
intervention variable for Intervention 3 as defined in (4.11). Note that for M4 the police series is treated
as an explanatory variable, such that k = 7. The implementation of M4 has been discussed in detail in
section 4.2.4.
Finally, an extension to subpopulations has been implemented in M5. The subpopulations classified on
age were not comparable due to different age classes between surveys, and the data for urbanization
were not available. Therefore, the gender subpopulations male and female have been implemented, to
check for differences in discontinuities between subpopulations. Recall from section 3.2.1 that the data
for the subpopulations range from 1992 to 2009, since the data for 2010 are not available. The model
applied to M5 is (4.7), with its entries as defined in (4.15). As discussed in section 3.2.1, the gender
series differ from the original victimization series: instead of five there are only four breakdowns, since
the breakdown "Other offences" is left out. For the gender subpopulations H = 2, such that, as discussed
in section 4.2.4, k = 5 × 3 = 15.

Several transformations have been applied to the original victimization series as provided by Statistics
Netherlands. First, the original dataset has been divided by 100 to prevent numerical overflows in
the maximization procedure. The presented results have been transformed back by multiplication with
100. Then, within the models M1 to M4, a log and a logit transformation have been applied to the
series, namely ln(ŷ_t) and ln(ŷ_t / (100 − ŷ_t)). The logit transformation is developed for proportions,
such as the victimization series. For log and logit transformations, M3 and M4 have to be adjusted in
the following way. To get the difference ŷ_dif,t = ŷ_ISM,t − ŷ_SMIV,t into the intervention variable (4.11),
it has to be put as a ratio: ŷ_dif,t = ŷ_ISM,t / ŷ_SMIV,t. Taking the log of this ratio when implementing
it in the model results in the difference ln(ŷ_dif,t) = ln(ŷ_ISM,t) − ln(ŷ_SMIV,t).
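The two transformations and their back-transformations can be sketched as follows; the percentage values are hypothetical:

```python
import numpy as np

def logit_pct(y):
    """Logit transform for a percentage series: ln(y / (100 - y))."""
    y = np.asarray(y, dtype=float)
    return np.log(y / (100.0 - y))

def inv_logit_pct(z):
    """Back-transform from the logit to the percentage scale."""
    return 100.0 / (1.0 + np.exp(-np.asarray(z, dtype=float)))

y = np.array([25.0, 30.5, 28.2])   # hypothetical victimization percentages
z = logit_pct(y)
y_back = inv_logit_pct(z)          # recovers y up to floating-point error
```

On the log scale the ratio of the ISM and SMIV series becomes the difference of their logs, which is why the intervention variable can carry the ratio directly.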
As discussed in section 3.1, the amount of oversampling by local authorities in the ISM fluctuates
sharply between regions. Also, CAPI was left out as a data collection method. In this case, the
assumption that variance estimates are inversely proportional to sample size might be invalid. Therefore,
for models M1, M3, M4 and M5, next to the original file for sample sizes, an adapted file has been
used in which the sample sizes of the ISM are adjusted to 19,000 for all years, marked by . This is the
sample size of the part of the ISM conducted by Statistics Netherlands, as discussed in section 3.1. It is


5.2 Model improvements

5.2.1 Likelihood ratio tests

This section discusses the applied Likelihood Ratio Tests as introduced in section 4.2.5. Recall that in
M1 - M5 every variance of an irregular term is considered separately. That is, for every breakdown k,
both the variances of the irregular terms of the transition equation and the measurement equation
are estimated independently of each other. The number of hyperparameters to be estimated thus
becomes 12 for models M1, M2 and M3, 15 for model M4 and 10 for M5. To lower the number of
hyperparameters to be estimated, it has been considered to assume that the variances of the irregular
terms of either the measurement equation or the transition equation are equal among the k breakdowns.
These constraints are represented in (4.16). They can easily be implemented in the multivariate state
space model described in section 4 by setting the hyperparameters equal to each other in the estimation
process. To test this assumption, the classical Likelihood Ratio Test (LRT) can be used. Here, the
variances of the irregular terms are set equal to each other under the null, and left unrestricted under
the alternative. The ratio

LR = −2 ln( likelihood(null) / likelihood(alternative) ) = −2 ( ln(likelihood(null)) − ln(likelihood(alternative)) )

is then chi-square distributed, with 5 degrees of freedom for M1-M4 and 4 degrees of freedom for M5. Note
that for M4 the variances of the police series have not been restricted, such that also for M4 there are
5 degrees of freedom. The results are provided in table 5.

Table 5 – Results Likelihood Ratio Tests

Model   LR measurement equation   LR transition equation@
M1      29.60                     0.46
M1      43.60                     1.10
M2      36.66                     2.56
M3      54.14                     9.98
M3      51.56                     Out of memory
M4      50.44                     5.48
M4      50.96                     5.48
M5      Out of memory             0.00
M5      Out of memory             0.00

- the variances of the irregular terms of the measurement equation are set equal under the null.
@ - the variances of the irregular terms of the transition equation are set equal under the null.

Bold value - the null that the variances of the irregular terms are equal can be rejected.

The results in table 5 point out that the null hypothesis of setting the variances of the measurement
equation equal can be rejected. Hence, setting these variances equal would worsen the fit of the model.


Note that since M5 and M5 ran out of memory with the applied restriction, the LRT could not be
performed for these models. The null hypothesis of setting the variances of the transition equation equal
could not be rejected for any model. The LRT could not be performed on M3 since the program ended
with an out of memory message. Following the results of the Likelihood Ratio Tests, it has been decided
to set the variances of the irregular terms of the transition equation equal to each other in Q_t.
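The decision rule behind table 5 can be sketched with hypothetical loglikelihood values (the chi-square critical value for 5 degrees of freedom at a = 0.05 is about 11.07):

```python
from scipy.stats import chi2

def likelihood_ratio_test(loglik_null, loglik_alt, df, alpha=0.05):
    """Classical LRT: LR = -2(ln L_null - ln L_alt) ~ chi2(df) under the null."""
    lr = -2.0 * (loglik_null - loglik_alt)
    crit = chi2.ppf(1.0 - alpha, df)
    return lr, crit, lr > crit       # True means: reject equal variances

# Hypothetical restricted (null) and unrestricted (alternative) loglikelihoods
lr, crit, reject = likelihood_ratio_test(-152.3, -137.5, df=5)
```

Here the restricted fit loses about 14.8 loglikelihood points, giving LR = 29.6 > 11.07, so the restriction would be rejected.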

The models M1-M5 as described in section 5.1 have been implemented with the restriction on the
variances of the transition equation as discussed in section 5.2.1. The resulting diagnostic tests showed
severe signs of heteroskedasticity for the Property series for almost all models. When looking at figure 1,
it seems that a change in the variance of Property in 2005 could cause this heteroskedasticity. Therefore
it has been decided to model the variance of the measurement equation for Property separately in the
years 2005-2010, and check whether this adjustment solves the heteroskedasticity problem. The adjusted
variance of the measurement equation in Ht is implemented as

Var(ϕ_t,Property) =
  (1/n_t) σ²_{ϕ,Property1}   if 1992 ≤ t ≤ 2004,
  (1/n_t) σ²_{ϕ,Property2}   if 2005 ≤ t ≤ 2010, (5.1)

for M1, M2, M3 and M4. For the series on the national level of M5 it holds that

Var(ϕ_t,Property) =
  (1/n_t) σ²_{ϕ,Property1}   if 1992 ≤ t ≤ 2004,
  (1/n_t) σ²_{ϕ,Property2}   if 2005 ≤ t ≤ 2009, (5.2)

and for the male and female subpopulations of M5 it holds that

Var(ϕ_t,Property,h) =
  (1/n_{t,h}) σ²_{ϕ,Property1}   if 1992 ≤ t ≤ 2004,
  (1/n_{t,h}) σ²_{ϕ,Property2}   if 2005 ≤ t ≤ 2009. (5.3)
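The piecewise variance structure of (5.1)-(5.3) amounts to a simple helper; the parameter names and values below are illustrative, not estimates from the paper:

```python
def property_variance(t, n_t, sigma2_p1, sigma2_p2, split_year=2005):
    """Variance of the Property measurement error: one variance up to 2004
    and another from split_year on, both scaled by the inverse sample size."""
    sigma2 = sigma2_p1 if t < split_year else sigma2_p2
    return sigma2 / n_t

v_early = property_variance(2000, 10000, sigma2_p1=4.0, sigma2_p2=9.0)
v_late = property_variance(2007, 19000, sigma2_p1=4.0, sigma2_p2=9.0)
```

Splitting the variance in two at 2005 adds exactly one extra hyperparameter per affected series while leaving the inverse-sample-size scaling intact.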

The results of the heteroskedasticity tests under both the original and the adjusted variance of the
irregular term of the measurement equation for the Property series are given in table 6. Note that the
confidence interval for the heteroskedasticity test as defined in section 4.3.6 is equal to

CI_HSK = [F_{7,7}(0.05/2), F_{7,7}(1 − 0.05/2)] = [0.20, 5.00].

Following the results in table 6, it can be concluded that there are no signs of heteroskedasticity in
the Property series with the adjusted variance model. Therefore, it has been decided to implement this
adjustment in all models to estimate the discontinuities as described in section 4.3. Note that with
this assumption no "out of memory" messages occur. The heteroskedasticity test and the remaining
diagnostic tests will be discussed in depth in section 5.4.

Table 6 – Results for the heteroskedasticity tests for M1-M5

Model     Heterosked. test   Violence  Property  Vandalism  No stop*  Other  Total
M1        No adjustment      1.16      10.17     5.12       0.30      0.71   4.46
M1        No adjustment      0.71      8.35      2.80       0.30      0.50   2.77
M2        No adjustment      0.66      5.57      1.31       0.28      0.40   2.07
M3        No adjustment      0.77      8.36      4.67       0.29      0.46   3.00
M3        No adjustment      Out of memory.
M4        No adjustment      0.77      8.35      4.66       0.29      0.45   3.08 (0.29)
M4        No adjustment      0.66      7.96      2.98       0.29      0.46   2.65 (0.29)
National                     1.32      12.39     4.19       0.59      -      4.25
Men                          2.97      8.70      2.49       0.76      -      5.41
Women                        0.69      3.60      3.63       0.79      -      2.62
National                     0.91      11.15     3.36       0.59      -      3.42
Men                          1.98      7.13      1.28       0.76      -      3.41
Women                        0.65      3.40      3.42       0.79      -      2.44
M1        Adjusted var.@     1.16      2.73      5.12       0.30      0.71   4.46
M1        Adjusted var.@     0.71      2.82      2.80       0.30      0.50   2.77
M2        Adjusted var.@     0.66      1.31      1.31       0.28      0.40   2.07
M3        Adjusted var.@     0.77      3.14      4.67       0.29      0.46   3.00
M3        Adjusted var.@     0.66      2.95      2.99       0.30      0.46   2.58
M4        Adjusted var.@     0.77      3.34      4.67       0.29      0.46   3.08 (0.29)
M4        Adjusted var.@     0.66      2.94      2.98       0.30      0.45   2.64 (0.29)
National                     1.32      4.46      4.19       0.59      -      4.25
Men$                         2.97      2.22      2.49       0.76      -      5.41
Women$                       0.69      1.87      3.63       0.79      -      2.62
National                     0.91      4.13      3.36       0.59      -      3.42
Men$                         1.98      2.25      1.28       0.76      -      3.41
Women$                       0.65      1.68      3.42       0.79      -      2.44

* - "Failure to stop after an accident". - Sample size of the ISM is adjusted to 19,000.
- Police series in brackets. @ - The variance of the Property series is adjusted to (5.1).
- The variance of the Property series is adjusted to (5.2).
$ - The variance of the Property series is adjusted to (5.3).
Bold value - the outcome is outside the confidence interval for the statistic.

The results of the Likelihood Ratio Tests for the models with the adjusted variance of the measurement
equation of the Property series are presented in table 7. These results also show that the null of setting
the variances of the transition equation equal to each other could not be rejected for any model. Therefore,
also in the models with the adjusted variance for the Property series, the variances of the irregular terms
of the transition equation are set equal in the estimation process described in section 4.3.5.


Table 7 – Results Likelihood Ratio Tests with adjusted variance for Property

Model   LR transition equation@
M1 1.45
M1 0.44
M2 0.88
M3 9.84
M3 3.92
M4 10.51
M4 5.86
M5 0.00
M5 0.00
@ - the variances of the irregular terms of the transition equation are set equal under the null.

Bold value - The null that the variances of the irregular terms are equal can be rejected.

5.3 Estimated discontinuities

The estimated discontinuities β̂_k are presented in tables 8, 9 and 10. Within the tables, the
discontinuities which are significantly different from zero are shown in bold.

Transformation None refers to the untransformed series. The log and the logit models are reported
on the log and logit scale, respectively. Initially, it was planned to apply the logit transformation
to all models. However, after the results for M1 and M2 pointed out that they were almost identical
to the results with the log transformation, it has been decided to disregard the logit transformation
for M3 and M4. Furthermore, for all models discussed in this paper the parameter estimation reached
"strong convergence", meaning that the model fits the data well.

5.3.1 Estimated discontinuities M1-M4

The estimated discontinuities for the models M1 - M4 are provided in tables 8 and 9. For these models
it is directly seen that for the breakdowns Violence, Failure to stop after an accident and Other there
were no significant discontinuities. Hence, it appears that the series Violence, Failure to stop after an
accident and Other are less sensitive to survey redesigns. These are exactly the three series with the
lowest percentages in the dataset. This might explain why the coefficients of the intervention variables
for these series are not significant, since series with lower percentages are also expected to have lower
discontinuities, such that they are less likely to be significant.

The breakdowns Property, Vandalism and Total victimization, on the contrary, appear to be more
sensitive to survey redesigns. The first two series are the series with the highest rates, and the latter
one is the total victimization percentage. For Intervention 1, from LPSS to PSLC, only the Vandalism
series is significant for some models. Recall from section 3 that the main difference between these surveys
was that the LPSS focused on victimization, whereas the PSLC was a more general survey. However,
most questions from the LPSS remained the same in the PSLC. This might explain why this redesign
