Process segmentation and modelling applied to time series featuring the response of biological materials to toxic agents
in environmental and military fields. In this framework, the department of Microbiology
and Biochemistry at Oregon State University has discovered that fish living cells
are promising indicators of the presence of a wide range of toxins. Thus, an interdisciplinary
project called "SOS Cytosensor" was launched to create an autonomous and
mobile device to detect such toxins using these living cells.
After exposing a cell culture to a specific biological or chemical agent, a sequence
of cell images is recorded. The extraction of features from the experimental sequences
of images results in time series that have to be modelled and classified in order to prove
useful in toxin detection. The chosen models should give a representation of time series
that supports accurate classification and clustering and that would also make storage
and transmission more efficient. There are many techniques for dimensionality reduction
of time series data in the literature, such as Fourier transforms, but segmentation is
the most popular technique for extracting structures from time series. Segmentation
algorithms can be classified as batch or online. Given a time
series Y, segmentation seeks the best representation using a number K of segments
that is not fixed in advance, such that the combined error of all segments is less than
a user-specified threshold and the maximum error of any segment does not exceed a
user-specified local threshold. First, we modelled each time series using a single ARX model
with regularly spaced breakpoints. Then, we considered improving the result by placing
the breakpoints dynamically. As a pre-analysis of the curves, we performed a piecewise
linear segmentation, thus tracking changes in the behaviour of the time series and placing
breakpoints at those locations. Piecewise linear regression refers to the approximation of
a time series Y, of length N, with K straight lines. Because K is typically much smaller
than N, this representation makes the storage, transmission and computation of data
more efficient. Piecewise linear regression is also commonly used for change point detection,
which is our goal in this study.
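As an illustration of the piecewise linear segmentation described above, the following Python sketch merges adjacent segments bottom-up and reports the resulting breakpoints. The abstract does not state which segmentation algorithm was used, so the bottom-up strategy, the function names and the single per-segment threshold `max_segment_error` are assumptions made for this example; the global error threshold is left out for brevity, and the input series is synthetic.

```python
def line_fit_sse(y, start, end):
    """Sum of squared errors of the least-squares line through y[start:end]."""
    n = end - start
    if n <= 2:
        return 0.0  # one or two points are fitted exactly by a line
    xs = range(start, end)
    mean_x = sum(xs) / n
    mean_y = sum(y[start:end]) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y[x] - mean_y) for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return sum((y[x] - (slope * x + intercept)) ** 2 for x in xs)


def bottom_up_segments(y, max_segment_error):
    """Return segment boundaries [0, b1, ..., len(y)] (end-exclusive).

    Assumed strategy: start from two-point segments and repeatedly merge
    the adjacent pair whose merged line fit has the smallest error, until
    the cheapest merge would exceed max_segment_error.
    """
    bounds = list(range(0, len(y), 2))
    bounds.append(len(y))
    while len(bounds) > 2:
        costs = [line_fit_sse(y, bounds[i], bounds[i + 2])
                 for i in range(len(bounds) - 2)]
        best = min(range(len(costs)), key=costs.__getitem__)
        if costs[best] > max_segment_error:
            break
        del bounds[best + 1]  # drop the boundary between the merged pair
    return bounds


# Synthetic series: a linear rise followed by a linear fall.
series = [float(i) for i in range(10)] + [float(9 - i) for i in range(1, 10)]
boundaries = bottom_up_segments(series, max_segment_error=0.5)
print("breakpoints:", boundaries[1:-1])
```

Interior boundaries mark the detected change points; lowering `max_segment_error` gives a finer approximation with more segments.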
Because segmentation into several simple AR models did not fit the data
satisfactorily, we combined this concept with the piecewise linear segmentation
discussed above. Instead of modelling the time series by a single ARX model
using breakpoints determined by the segmentation algorithm or by several AR models,
we model each segment with a different ARX model. We use the sum of squared errors or the
residual error as a measure of the cost of merging segments. Computation speed has been
increased by presegmenting the time series with a fine piecewise linear approximation.
It also enables the user to predefine the number of final segments for classification and
clustering purposes. The final state can be detected by extracting the last segment from
the segmentation process.
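The merging step described above can be sketched as follows, under stated assumptions. The abstract gives neither the ARX orders nor the exogenous input, so this example substitutes a first-order autoregressive model with a constant term, y[t] = a*y[t-1] + c, fitted by least squares. The function names, the presegmentation boundaries and the synthetic series are hypothetical. The cost of merging two neighbouring segments is taken to be the sum of squared errors of one model fitted to their union, and merging stops at a user-chosen number of segments `k`.

```python
def fit_ar1(y, start, end):
    """Least-squares fit of y[t] = a*y[t-1] + c on y[start:end].

    Returns (a, c, sse). A stand-in for the ARX fit of the thesis.
    """
    prev = y[start:end - 1]
    curr = y[start + 1:end]
    n = len(curr)
    if n == 0:
        raise ValueError("segment needs at least two samples")
    mean_p = sum(prev) / n
    mean_c = sum(curr) / n
    spp = sum((p - mean_p) ** 2 for p in prev)
    spc = sum((p - mean_p) * (c - mean_c) for p, c in zip(prev, curr))
    a = spc / spp if spp > 0 else 0.0
    c0 = mean_c - a * mean_p
    sse = sum((c - (a * p + c0)) ** 2 for p, c in zip(prev, curr))
    return a, c0, sse


def merge_to_k(y, bounds, k):
    """Merge presegmented pieces until k segments remain.

    bounds are end-exclusive boundaries such as [0, 10, 20, 30, 40].
    Returns the final boundaries and one (a, c) model per segment.
    """
    bounds = list(bounds)
    while len(bounds) - 1 > k:
        costs = [fit_ar1(y, bounds[i], bounds[i + 2])[2]
                 for i in range(len(bounds) - 2)]
        best = min(range(len(costs)), key=costs.__getitem__)
        del bounds[best + 1]
    models = [fit_ar1(y, bounds[i], bounds[i + 1])[:2]
              for i in range(len(bounds) - 1)]
    return bounds, models


# Synthetic series with two regimes: growth towards a plateau, then decay.
series = [0.0]
for _ in range(19):
    series.append(0.9 * series[-1] + 1.0)
for _ in range(20):
    series.append(0.5 * series[-1])

final_bounds, models = merge_to_k(series, [0, 10, 20, 30, 40], k=2)
last_segment_model = models[-1]  # parameters describing the final state
print(final_bounds, last_segment_model)
```

Because the number of segments is fixed by `k`, every run yields a parameter vector of the same length, which is convenient for the classification and clustering steps.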
Finally, classification and clustering are essential steps in the analysis of the experimental
time series. The Cytosensor project required both a numerical and a non-numerical representation
of the experimental data. The approach adopted in this study is a soft
classification approach, which allows a better understanding and eases decision making,
thus complementing numerical features. Using calibration runs and the resulting model
parameters, we build a database of tight clusters representing scenarios. Then, we calculate
the probabilistic distances between an operational cluster and each of the calibration
clusters, which allows an operational run to be assigned to a specific scenario.
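One way to picture the soft classification step is sketched below. The abstract does not name the probabilistic distance, so this example assumes the Bhattacharyya distance between Gaussian clusters with diagonal covariance, and turns the distances into memberships that sum to one. The scenario names, parameter vectors and function names are invented for illustration and are not data from the project.

```python
import math


def cluster_stats(points):
    """Per-dimension mean and sample variance of a cluster of vectors."""
    n = len(points)
    dims = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dims)]
    # A small constant keeps the variances strictly positive.
    variances = [sum((p[d] - means[d]) ** 2 for p in points) / (n - 1) + 1e-9
                 for d in range(dims)]
    return means, variances


def bhattacharyya(stats_a, stats_b):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    (mean_a, var_a), (mean_b, var_b) = stats_a, stats_b
    distance = 0.0
    for ma, va, mb, vb in zip(mean_a, var_a, mean_b, var_b):
        distance += 0.25 * (ma - mb) ** 2 / (va + vb)
        distance += 0.5 * math.log((va + vb) / (2.0 * math.sqrt(va * vb)))
    return distance


def soft_assign(operational, calibration):
    """Map each scenario name to a membership in [0, 1], summing to one."""
    op_stats = cluster_stats(operational)
    overlap = {name: math.exp(-bhattacharyya(op_stats, cluster_stats(pts)))
               for name, pts in calibration.items()}
    total = sum(overlap.values())
    if total == 0.0:  # every scenario is too far away to rank
        return {name: 1.0 / len(overlap) for name in overlap}
    return {name: value / total for name, value in overlap.items()}


# Hypothetical calibration clusters of model parameters (a, c) per scenario.
calibration = {
    "control": [(0.90, 1.0), (0.92, 1.1), (0.88, 0.9)],
    "toxin": [(0.50, 0.0), (0.52, 0.1), (0.48, -0.1)],
}
operational = [(0.51, 0.05), (0.49, -0.05), (0.50, 0.0)]

memberships = soft_assign(operational, calibration)
print(max(memberships, key=memberships.get), memberships)
```

Reporting the full membership table rather than a single label is what makes the classification soft: a run that sits between two scenarios shows up as comparable memberships instead of a forced choice.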
Advisor: Temes, Gabor C.
School: Oregon State University
School Location: USA - Oregon
Source Type: Master's Thesis
Keywords: biosensors, computer programs, toxicity testing
ISBN:
Date of Publication: 07/16/2004