Classification of protein targeting sequences by self-organizing feature maps

Schneider, G.

F. Hoffmann-La Roche, Pharmaceuticals Division, Molecular Design & Bioinformatics, CH-4070 Basel, Switzerland

Much of the information that determines a protein's targeting pathway is contained in the N-terminal part of its precursor sequence. To better understand and visualize the characteristic features of these sequences, N-terminal targeting signals that direct newly synthesized proteins to organelles or into the secretory pathway were analyzed. For this purpose, self-organizing neural networks were developed that perform a non-linear projection of the high-dimensional sequence space onto a two-dimensional map. The resulting maps show a clear separation of the different types of targeting sequences and allow the features extracted by the network to be interpreted. The analysis indicates that physicochemical properties of the N-terminal fragments of protein precursors contribute to the targeting signal. This finding was made possible by a new sequence representation based on correlations of residue properties. The technique is presented, and the specific advantages and limitations of pattern recognition with self-organizing networks are discussed.
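The two ingredients described above, a property-correlation encoding of sequences and a self-organizing map that projects the encoded vectors onto a 2-D grid, can be sketched as follows. This is a minimal illustration under stated assumptions, not the method used in this work: the residue property (Kyte-Doolittle hydropathy), the autocorrelation definition A(d) = mean over i of h(i)·h(i+d), the maximum lag, the map size and the learning schedule are all illustrative choices, and the names `autocorrelation`, `best_matching_unit` and `train_som` are hypothetical. The toy fragments are invented and are not real precursor sequences.

```python
import math
import random

# Kyte-Doolittle hydropathy values: an ASSUMED choice of residue property.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(seq, scale=KD, max_lag=5):
    """Encode a sequence as A(d) = mean_i h(i) * h(i + d) for d = 1..max_lag.

    The vector length is fixed, so fragments of different length become
    comparable points in the same feature space.
    """
    h = [scale[aa] for aa in seq]
    vec = []
    for d in range(1, max_lag + 1):
        pairs = len(h) - d
        if pairs <= 0:
            vec.append(0.0)  # sequence too short for this lag
        else:
            vec.append(sum(h[i] * h[i + d] for i in range(pairs)) / pairs)
    return vec


def best_matching_unit(weights, x):
    """Return the grid position whose weight vector is closest to x (squared Euclidean)."""
    return min(weights, key=lambda u: sum((w - v) ** 2 for w, v in zip(weights[u], x)))


def train_som(data, rows=4, cols=4, epochs=200, lr0=0.5, sigma0=2.0, seed=0):
    """Train a rectangular Kohonen map; returns {(row, col): weight vector}."""
    rng = random.Random(seed)
    # Initialise each unit with a copy of a randomly chosen training vector.
    weights = {(r, c): list(rng.choice(data)) for r in range(rows) for c in range(cols)}
    steps = epochs * len(data)
    t = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = t / steps
            lr = lr0 * (1.0 - frac)  # linearly decaying learning rate
            sigma = max(0.5, sigma0 * (1.0 - frac))  # shrinking neighbourhood radius
            bmu = best_matching_unit(weights, x)
            for unit, w in weights.items():
                # Gaussian neighbourhood measured on the 2-D grid, not in feature space.
                grid_d2 = (unit[0] - bmu[0]) ** 2 + (unit[1] - bmu[1]) ** 2
                g = math.exp(-grid_d2 / (2.0 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * g * (x[k] - w[k])
            t += 1
    return weights


# Invented toy fragments for illustration only; they are NOT real precursor sequences.
toy = {
    "toy_mito_1": "MLRSALRRLARSA",
    "toy_mito_2": "MFRRLSRAARLAS",
    "toy_signal_1": "MKLLVLALLLAAA",
    "toy_signal_2": "MKAILVLLLAVSA",
}
vectors = {name: autocorrelation(s) for name, s in toy.items()}
som = train_som(list(vectors.values()), rows=4, cols=4, epochs=100)
for name, v in vectors.items():
    print(name, "->", best_matching_unit(som, v))
```

Because every fragment is reduced to a vector of the same length, sequences of different lengths can be placed on one map, and fragments with similar property-correlation patterns tend to land on nearby grid units. A realistic analysis would standardize the features and use far more sequences and a larger map.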

Location: Lecture Hall II
Date: Sunday, April 5
Time: 06:20 pm