Shalom Rackovsky, PhD
- PROFESSORIAL LECTURER | Pharmacology and Systems Therapeutics
BA, Yeshiva University
PhD, Massachusetts Institute of Technology
Alignment-free methods for protein homology identification.
Identification of homologs to a protein sequence of interest is one of the basic tasks in bioinformatics. Standard methods make use of sequence alignments, which have certain intrinsic disadvantages.
- Insertions and deletions are accounted for by gap penalty functions, which are arbitrary and have no visible physical basis.
- The alignment of multiple sequences simultaneously is an NP-hard problem, and therefore large groups of sequences can only be aligned using approximations.
We are developing methods for detecting sequence similarity that require no alignment and are optimized to detect structural similarity. They use the optimized reduced alphabets we have developed, together with sequence comparisons based on N-gram distributions, an approach we have used previously in protein structure comparisons.
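As a rough illustration of alignment-free comparison with N-gram distributions, the Python sketch below counts the overlapping N-grams in each sequence and measures the Euclidean distance between the resulting frequency distributions. The function names, the choice N = 2, the distance measure, and the example sequences are all assumptions made for illustration; they are not the group's published method.

```python
from collections import Counter
from math import sqrt


def ngram_distribution(seq, n=2):
    """Return the normalized frequencies of overlapping n-grams in seq."""
    if len(seq) < n:
        raise ValueError("sequence is shorter than the n-gram length")
    counts = Counter(seq[i:i + n] for i in range(len(seq) - n + 1))
    total = sum(counts.values())
    return {gram: count / total for gram, count in counts.items()}


def distribution_distance(p, q):
    """Euclidean distance between two n-gram frequency dictionaries."""
    grams = set(p) | set(q)
    return sqrt(sum((p.get(g, 0.0) - q.get(g, 0.0)) ** 2 for g in grams))


# Made-up sequences of different lengths; no alignment or gap penalty is needed.
seq_a = "MKVLAAGIVKLLAAG"
seq_b = "MKVLAGIVKLLG"
seq_c = "DDEEDDKKRRDDEE"

dist_ab = distribution_distance(ngram_distribution(seq_a), ngram_distribution(seq_b))
dist_ac = distribution_distance(ngram_distribution(seq_a), ngram_distribution(seq_c))
print(f"a vs b: {dist_ab:.3f}")
print(f"a vs c: {dist_ac:.3f}")
```

Because each sequence is summarized by a fixed set of frequencies, sequences of any length can be compared directly, and many sequences can be compared pairwise without a multiple alignment.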
Development and optimization of reduced amino acid alphabets.
Reduced amino acid alphabets are an important bioinformatic tool. They are used both explicitly, in folding studies, and implicitly, in homology searches and fold identification studies. The use of reduced alphabets improves database statistics, but eliminates information inherent in the complete sequence. One can therefore ask whether it is possible to construct reduced alphabets which retain the maximum possible amount of structural information. We have shown that this can be done, and other investigators have shown that the alphabets we developed do indeed give greatly improved results in structural homology searches. We are developing reduced alphabets which optimally encode other types of information, and developing uses for the resulting alphabets in various informatic applications.
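To make the idea of a reduced alphabet concrete, the sketch below maps the 20 standard amino acids onto six classes and rewrites a sequence in the smaller alphabet. The grouping shown is a generic physicochemical one chosen purely for illustration; it is an assumption, not one of the optimized alphabets described above, and the names `GROUPS` and `reduce_sequence` are hypothetical.

```python
# Illustrative six-letter grouping (an assumption, not an optimized alphabet).
GROUPS = {
    "h": "AVLIMC",  # aliphatic / hydrophobic
    "a": "FWY",     # aromatic
    "p": "STNQ",    # polar, uncharged
    "+": "KRH",     # basic
    "-": "DE",      # acidic
    "g": "GP",      # glycine and proline
}

# Invert the grouping into a residue -> class lookup table.
RESIDUE_TO_GROUP = {
    residue: label for label, members in GROUPS.items() for residue in members
}


def reduce_sequence(seq, unknown="x"):
    """Rewrite seq in the reduced alphabet; unrecognized symbols become `unknown`."""
    return "".join(RESIDUE_TO_GROUP.get(res, unknown) for res in seq.upper())


print(reduce_sequence("MKVLDEFG"))
```

With six classes, the number of possible 3-grams falls from 20^3 = 8000 to 6^3 = 216, which is the sense in which a reduced alphabet improves database statistics while discarding some of the information in the full sequence.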
Detection of architecture signals in protein sequences.
It was long believed that structural similarity between proteins implied an evolutionary relationship. In recent years, however, it has become clear that proteins which exhibit no discernible sequence relationship can assume the same fold. This is perhaps the deepest observation in protein science, because it demonstrates that we do not actually understand the sequence signals which determine protein architecture. We have developed new methods to isolate these signals, applying signal processing techniques to groups of proteins which are unrelated by sequence but assume the same architecture. Significant periodic signals have been detected, and we are studying their relationship to the distribution of amino acid physical properties in sequences.
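One generic way to look for periodic signals in a sequence is to convert it into a numerical property profile and examine the power spectrum of that profile. The sketch below does this with a plain discrete Fourier transform. It is an assumption-laden illustration rather than the group's method: the binary hydrophobic/non-hydrophobic profile, the residue set `HYDROPHOBIC`, and all function names are hypothetical choices.

```python
import cmath

# Illustrative assumption: a crude binary hydrophobicity property.
HYDROPHOBIC = set("AVLIMFWC")


def property_profile(seq):
    """Encode seq as 1.0 for hydrophobic residues and 0.0 otherwise."""
    return [1.0 if res in HYDROPHOBIC else 0.0 for res in seq.upper()]


def power_spectrum(values):
    """Return {k: power} for DFT frequencies k = 1 .. n // 2 of the centered profile."""
    n = len(values)
    if n < 2:
        raise ValueError("profile must contain at least two values")
    mean = sum(values) / n
    centered = [v - mean for v in values]  # remove the zero-frequency component
    spectrum = {}
    for k in range(1, n // 2 + 1):
        coeff = sum(
            c * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, c in enumerate(centered)
        )
        spectrum[k] = abs(coeff) ** 2 / n
    return spectrum


def dominant_period(values):
    """Period, in residues, of the strongest spectral component."""
    spectrum = power_spectrum(values)
    k_best = max(spectrum, key=spectrum.get)
    return len(values) / k_best


# A made-up sequence with one hydrophobic residue in every three positions.
profile = property_profile("LKK" * 8)
print(dominant_period(profile))  # 3.0
```

In practice one would average spectra over many sequences that share an architecture and test the peaks for statistical significance; the sketch omits both steps.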
We are interested in computational proteomics and protein bioinformatics. We have a special interest in the ways in which folding information is encoded in protein sequences, and in the development of methods for utilizing that information. A number of projects are active in the group at the moment. Our research is mathematical and computational in nature, and we use a wide range of analytic and algorithmic techniques.
Solis AD, Rackovsky S. Property-based sequence representations do not adequately encode local protein folding information. Proteins 2007 Jun 1; 67(4): 785-788.
Solis AD, Rackovsky S. Improvement of statistical potentials and threading score functions using information maximization. Proteins 2006 Mar 1; 62(4): 892-908.
Rackovsky S. Characterization of architecture signals in proteins. J Phys Chem B Condens Matter Mater Surf Interfaces Biophys 2006 Sep 28; 110(38): 18771-18778.