Metaheuristic Multiple Sequence Alignment Optimisation

by Auer, Jens

Abstract (Summary)
The development of good tools and algorithms to analyse genome and protein data in the form of amino acid sequences coded as strings is one of the major subjects of Bioinformatics. Multiple Sequence Alignment (MSA) is one of the basic yet most important techniques to analyse data of more than two sequences. From a computer scientiest point of view, MSA is merely a combinatorial optimisation problem where strings have to arrange in a tabular form to optimise an objective function. The di culty with MSA is that it is NP {complete, and thus impossible to solve optimally in reasonable time under the assumption that P 6= NP . The abilitytotackle NP {hard problems has been greatly extended bytheintroduction of Metaheuristics (see Blum & Roli (2003)) for a summary of most Metaheuristics, general problem-independent optimisation algorithms extending the hill-climbing local search approach to escape local minima. One of these algorithms is Iterated Local Search (ILS) (Lourenco et al., 2002; Stutzle, 1999a, p.25 ), a recent easy to implement but powerful algorithm with results comparable or superior to other state-of-the-art methods for manycombinatorial optimisation problems, among them TSP and QAP.ILS iteratively samples local minima by modifying the current local minimum and restarting a local search porcedure on this modi ed solution. This thesis will show howILS can be implemented for MSA. After that, ILS will be evaluated and compared to other MSA algorithms by BAliBASE (Thomson et al., 1999), a set of manually re ned alignments used in most recent publications of algorithms and in at least two MSA algorithm surveys. The runtime-behaviour will be evaluated using runtime-distribution. The quality of alignments produced by ILS is at least as good as the best algorithms available and signi cantly superiour to previously published Metaheuristics for MSA, Tabu Search and Genetic Algorithm (SAGA). On the average, ILS performed best in ve out of eight test cases, second for one test set and third for the remaining two. A drawback of all iterative methods for MSA is the long runtime needed to produce good alignments. ILS needs considerably less runtime than Tabu Search and SAGA, but can not compete with progressive or consistency based methods, e.g. ClustalW or T{COFFEE. keywords: Combinatorial Optimisation, Metaheuristics, Multiple Sequence Alignment, Iterated Local Search
Bibliographical Information:


School:Högskolan i Skövde

School Location:Sweden

Source Type:Master's Thesis

Keywords:metaheuristic multiple sequence alignment optimisation


Date of Publication:02/27/2008

© 2009 All Rights Reserved.