The use of evolutionary information in protein alignments and homology identification
For the vast majority of proteins, only the sequence is known; no experimental information about the three-dimensional structure is available. The easiest way to gain some understanding of the structure and function of these proteins is therefore to relate them to well-studied proteins, which can be done by searching for homologous proteins. A homologous sequence is easy to identify if the sequence identity is above 30%. If the sequence identity drops below 30%, more sophisticated methods have to be used. These methods often use evolutionary information about the sequences, which makes it possible to identify homologous sequences with low sequence identity.

To build a three-dimensional model of a sequence from a known protein structure, the two sequences have to be aligned. The aligned residues then serve as a first approximation of the structure.

This thesis focuses on the development of fold recognition and alignment methods based on evolutionary information. The use of evolutionary information for both query and target proteins was shown to improve both recognition and alignments. In a benchmark of profile-profile methods, the probabilistic methods performed best, although the differences between several of the methods were quite small once optimal gap penalties were used. ProfNet, an alignment method based on an artificial neural network, was shown to be at least as good as the best profile-profile method, and by adding information from a self-organising map and predicted secondary structure we were able to improve ProfNet further.
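
The 30% sequence identity threshold mentioned in the abstract can be made concrete with a small sketch. This is not code from the thesis. The function name `percent_identity`, the gap character `-`, and the choice of denominator (columns where both sequences have a residue) are assumptions made for illustration; other definitions of sequence identity divide by the alignment length or by the length of the shorter sequence and give different values.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity of two pre-aligned sequences.

    Assumption: only columns where both sequences carry a residue
    (no gap on either side) are counted in the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")

    matches = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns that contain a gap
        compared += 1
        if a == b:
            matches += 1

    # Return 0.0 when no residue pair could be compared.
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical toy alignment, not real protein data.
identity = percent_identity("MKT-AYIAK", "MRTQAY-AK")
easy_case = identity > 30.0  # above the threshold quoted in the abstract
```

In this toy alignment, seven columns are gap-free and six of them match, so the pair falls well above the 30% threshold. Pairs below it are the ones for which the profile-based methods described in the abstract are intended.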
Source Type: Doctoral Dissertation
Keywords: NATURAL SCIENCES; Chemistry; Theoretical chemistry; Protein alignment homology sequence profile
Date of Publication: 01/01/2006