Metaheuristic Multiple Sequence Alignment Optimisation
Abstract (Summary)
The development of good tools and algorithms to analyse genome and protein data in
the form of amino acid sequences coded as strings is one of the major subjects of Bioinformatics.
Multiple Sequence Alignment (MSA) is one of the basic yet most important
techniques to analyse data of more than two sequences. From a computer scientiest
point of view, MSA is merely a combinatorial optimisation problem where strings have
to arrange in a tabular form to optimise an objective function. The di culty with MSA
is that it is NP {complete, and thus impossible to solve optimally in reasonable time
under the assumption that P 6= NP .
The abilitytotackle NP {hard problems has been greatly extended bytheintroduction
of Metaheuristics (see Blum
&
Roli (2003)) for a summary of most Metaheuristics,
general problem-independent optimisation algorithms extending the hill-climbing local
search approach to escape local minima. One of these algorithms is Iterated Local
Search (ILS) (Lourenco et al., 2002; Stutzle, 1999a, p.25 ), a recent easy to implement
but powerful algorithm with results comparable or superior to other state-of-the-art
methods for manycombinatorial optimisation problems, among them TSP and QAP.ILS
iteratively samples local minima by modifying the current local minimum and restarting
a local search porcedure on this modi ed solution.
This thesis will show howILS can be implemented for MSA. After that, ILS will be
evaluated and compared to other MSA algorithms by BAliBASE (Thomson et al., 1999),
a set of manually re ned alignments used in most recent publications of algorithms and
in at least two MSA algorithm surveys. The runtime-behaviour will be evaluated using
runtime-distribution.
The quality of alignments produced by ILS is at least as good as the best algorithms
available and signi cantly superiour to previously published Metaheuristics for MSA,
Tabu Search and Genetic Algorithm (SAGA). On the average, ILS performed best in
ve out of eight test cases, second for one test set and third for the remaining two.
A drawback of all iterative methods for MSA is the long runtime needed to produce
good alignments. ILS needs considerably less runtime than Tabu Search and SAGA,
but can not compete with progressive or consistency based methods, e.g. ClustalW or
T{COFFEE.
keywords: Combinatorial Optimisation, Metaheuristics, Multiple Sequence
Alignment, Iterated Local Search
Bibliographical Information:
Advisor:
School:Högskolan i Skövde
School Location:Sweden
Source Type:Master's Thesis
Keywords:metaheuristic multiple sequence alignment optimisation
ISBN:
Date of Publication:02/27/2008