Publications
List of publications
2022
- J. Struct. Biol. Unusual commonality in active site structural features of substrate promiscuous and specialist enzymes. D. Thakur and S.B. Pandit. Journal of Structural Biology 2022
Enzyme promiscuity is the ability of (some) enzymes to perform alternate reactions or catalyze non-cognate substrate(s). The latter is referred to as substrate promiscuity, widely studied for its biotechnological applications and understanding enzyme evolution. Insights into the structural basis of substrate promiscuity would greatly benefit the design and engineering of enzymes. Previous studies on some enzymes have suggested that flexibility, hydrophobicity, and active site protonation state could play an important role in enzyme promiscuity. However, it is not known yet whether substrate promiscuous enzymes have distinctive structural characteristics compared to specialist enzymes, which are specific for a substrate. To address this, we have systematically compared substrate/catalytic binding site structural features of substrate promiscuous enzymes with those of specialist enzymes. For this, we have carefully constructed a dataset of substrate promiscuous and specialist enzymes. On careful analysis, surprisingly, we found that substrate promiscuous and specialist enzymes are similar in various binding/catalytic site structural features such as flexibility, surface area, hydrophobicity, depth, and secondary structures. Recent studies have also suggested that promiscuity is widespread among enzymes. Based on these observations, we propose that substrate promiscuity could be defined as a continuum feature that varies from a narrow (specialist) to a broad range of substrate preferences. Moreover, the diversity of conformational states of an enzyme accessible for ligand binding may possibly regulate its substrate preferences. © 2022 Elsevier Inc.
- Front. Microbiol. Curcumin Inhibits Membrane-Damaging Pore-Forming Function of the β-Barrel Pore-Forming Toxin Vibrio cholerae Cytolysin. M. Singh, N. Rupesh, S.B. Pandit, and 1 more author. Frontiers in Microbiology 2022
Vibrio cholerae cytolysin (VCC) is a β-barrel pore-forming toxin (β-PFT). Upon encountering the target cells, VCC forms heptameric β-barrel pores and permeabilizes the cell membranes. Structure-function mechanisms of VCC have been extensively studied in the past. However, the existence of any natural inhibitor for VCC has not been reported yet. In the present study, we show that curcumin can compromise the membrane-damaging activity of VCC. Curcumin is known to modulate a wide variety of biological processes and functions. However, the application of curcumin in the physiological scenario often gets limited due to its extremely poor solubility in the aqueous environment. Interestingly, we find that VCC can associate with the insoluble fraction of curcumin in the aqueous medium and thus gets separated from the solution phase. This, in turn, reduces the availability of VCC to attack the target membranes and thus blocks the membrane-damaging action of the toxin. We also observe that the soluble aqueous extract of curcumin, generated by the heat treatment, compromises the pore-forming activity of VCC. Interestingly, in the presence of such soluble extract of curcumin, VCC binds to the target membranes and forms the oligomeric assembly. However, such oligomers appear to be non-functional, devoid of the pore-forming activity. The ability of curcumin to bind to VCC and neutralize its membrane-damaging activity suggests that curcumin has the potential to act as an inhibitor of this potent bacterial β-PFT. Copyright © 2022 Singh, Rupesh, Pandit and Chattopadhyay.
2021
- ACS Bio. Sci. Eng. Optimal Protein Sequence Design Mitigates Mechanical Failure in Silk β-Sheet Nanocrystals. P. Verma, B. Panda, K.P. Singh, and 1 more author. ACS Biomaterials Science and Engineering 2021
The excellent mechanical strength and toughness of spider silk are well characterized experimentally and understood atomistically using computational simulations. However, little attention has been focused on understanding whether the amino acid sequence of β-sheet nanocrystals, which is the key to rendering strength to silk fiber, is optimally chosen to mitigate molecular-scale failure mechanisms. To investigate this, we modeled β-sheet nanocrystals of various representative small/polar/hydrophobic amino acid repeats for determining the sequence motif having superior nanomechanical tensile strength and toughness. The constant velocity pulling of the central β-strand in the nanocrystal, using steered molecular dynamics, showed that homopolymers of small amino acid (alanine/alanine-glycine) sequence motifs, occurring in natural silk fibroin, have better nanomechanical properties than other modeled structures. Further, we analyzed the hydrogen bond (HB) and β-strand pull dynamics of modeled nanocrystals to understand the variation in their rupture mechanisms and explore sequence-dependent mitigating factors contributing to their superior mechanical properties. Surprisingly, the enhanced side-chain interactions in homopoly-polar/hydrophobic amino acid models are unable to augment backbone HB cooperativity to increase mechanical strength. Our analyses suggest that nanocrystals of pristine silk sequences most likely achieve superior mechanical strength by optimizing side-chain interaction, packing, and main-chain HB interactions. Thus, this study suggests that the nanocrystal β-sheet sequence plays a crucial role in determining the nanomechanical properties of silk, and the evolutionary process has optimized it in natural silk. This study provides insight into the molecular design principle of silk with implications in the genetically modified artificial synthesis of silk-like biomaterials. © 2021 American Chemical Society.
- Mol. Microbiol. Tyrosine in the hinge region of the pore-forming motif regulates oligomeric β-barrel pore formation by Vibrio cholerae cytolysin. A.K. Mondal, P. Verma, N. Sengupta, and 3 more authors. Molecular Microbiology 2021
β-barrel pore-forming toxins perforate cell membranes by forming oligomeric β-barrel pores. The most crucial step is the membrane-insertion of the pore-forming motifs that create the transmembrane β-barrel scaffold. The molecular mechanism that regulates structural reorganization of these pore-forming motifs during β-barrel pore-formation still remains elusive. Using Vibrio cholerae cytolysin as an archetypical example of the β-barrel pore-forming toxin, we show that a key tyrosine residue (Y321) in the hinge region of the pore-forming motif plays a crucial role in this process. Mutation of Y321 abrogates oligomerization of the membrane-bound toxin protomers, and blocks subsequent steps of pore-formation. Our study suggests that the presence of Y321 in the hinge region of the pore-forming motif is crucial for the toxin molecule to sense membrane-binding, and to trigger essential structural rearrangements required for the subsequent oligomerization and pore-formation process. Such a regulatory mechanism of pore-formation by V. cholerae cytolysin has not been documented earlier in the structurally related β-barrel pore-forming toxins. © 2020 John Wiley & Sons Ltd
2020
- BMB Educ. An inquiry-based approach in large undergraduate labs: Learning, by doing it the “wrong” way. A.K. Bachhawat, S.B. Pandit, I. Banerjee, and 4 more authors. Biochemistry and Molecular Biology Education 2020
Undergraduate laboratory courses, owing to their larger sizes and shorter time slots, are often conducted in highly structured modes. However, this approach is known to interfere with students’ engagement in the experiments. To enhance students’ engagement, we propose an alternative mode of running laboratory courses by creating some “disorder” in a previously adopted structure. After performing an experiment in the right way, the students were asked to repeat the experiment but with a variation at certain steps leading to the experiment being done the “wrong” way. Although this approach led to fewer experiments being conducted in a semester, it significantly enhanced the students’ involvement. This was also reflected in the students’ feedback. The majority of students preferred repeating an experiment with a variant protocol to performing a new experiment. Although we have tested this inquiry-based approach only for an undergraduate laboratory course in molecular biology, we believe such an approach could also be extended to undergraduate laboratory courses of other subjects. © 2020 International Union of Biochemistry and Molecular Biology
2019
- PLoS ONE. Unraveling the structural landscape of intra-chain domain interfaces: Implication in the evolution of domain-domain interactions. R. Verma and S.B. Pandit. PLoS ONE 2019
Intra-chain domain interactions are known to play a significant role in the function and stability of multidomain proteins. These interactions are mediated through a physical interaction at domain-domain interfaces (DDIs). To understand the evolution of interfaces, we have investigated similarities among DDIs. Even though interfaces of protein-protein interactions (PPIs) have been previously studied by structurally aligning interfaces, similar analyses have not yet been performed on DDIs of either multidomain proteins or PPIs. For studying the structural landscape of DDIs, we have used iAlign to structurally align intra-chain domain interfaces. The interface alignment of spatially constrained domains (due to inter-domain linkers) showed that 88% of these could identify a structurally matching interface with similar C-alpha geometry and contact pattern, even though the aligned domain pairs are not structurally related. Moreover, the mean interface similarity score (IS-score) is 0.307, which is higher than the average random IS-score (0.207), suggesting that domain interfaces are not random. The structural space of DDIs is highly connected, as 84% of all possible directed edges among interfaces are found to have a path length of at most 8 when the IS-score threshold is 0.26. At this threshold, 83% of interfaces form the largest strongly connected component. This suggests that the structural space of intra-chain domain interfaces is degenerate and highly connected, as has been found for PPI interfaces. Interestingly, searching for structural neighbors of inter-chain interfaces among intra-chain interfaces showed that 86% could find a statistically significant match to an intra-chain interface with a mean IS-score of 0.311. This implies that domain interfaces are degenerate whether formed within a protein or between proteins. The interface degeneracy is most likely due to the limited number of possible ways of packing secondary structures. In principle, interface similarities can be exploited to accurately model domain interfaces in structure prediction of multidomain proteins. © 2019 Verma, Pandit. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
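The connectivity analysis in this abstract (thresholding directed interface matches at an IS-score of 0.26, then finding the largest strongly connected component) can be illustrated with a minimal sketch. It does not call iAlign; the interface names and scores below are made up for illustration, and Kosaraju's algorithm is used here as one standard way to find strongly connected components, not necessarily the method used in the paper.

```python
from collections import defaultdict

IS_SCORE_CUTOFF = 0.26  # threshold quoted in the abstract


def threshold_edges(scores, cutoff):
    """Keep directed edges (query -> match) whose score is at least `cutoff`."""
    return [(a, b) for (a, b), s in scores.items() if s >= cutoff]


def strongly_connected_components(nodes, edges):
    """Kosaraju's algorithm with explicit stacks; returns a list of node sets."""
    forward, backward = defaultdict(list), defaultdict(list)
    for a, b in edges:
        forward[a].append(b)
        backward[b].append(a)

    # Pass 1: record nodes in order of DFS completion on the forward graph.
    seen, finish_order = set(), []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [(start, iter(forward[start]))]
        while stack:
            node, neighbours = stack[-1]
            for nxt in neighbours:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(forward[nxt])))
                    break
            else:
                finish_order.append(node)
                stack.pop()

    # Pass 2: sweep the reversed graph in decreasing finish time.
    assigned, components = set(), []
    for start in reversed(finish_order):
        if start in assigned:
            continue
        assigned.add(start)
        component, todo = {start}, [start]
        while todo:
            node = todo.pop()
            for prev in backward[node]:
                if prev not in assigned:
                    assigned.add(prev)
                    component.add(prev)
                    todo.append(prev)
        components.append(component)
    return components


# Hypothetical interfaces and directed IS-scores (not data from the paper).
interfaces = ["A", "B", "C", "D", "E"]
is_scores = {
    ("A", "B"): 0.31, ("B", "C"): 0.28, ("C", "A"): 0.30,
    ("C", "D"): 0.27, ("D", "E"): 0.22, ("E", "D"): 0.35,
}

edges = threshold_edges(is_scores, IS_SCORE_CUTOFF)
components = strongly_connected_components(interfaces, edges)
largest = max(components, key=len)
largest_fraction = len(largest) / len(interfaces)
```

In this toy graph the D to E match falls below the cutoff, so D and E each end up in their own component while A, B and C stay mutually reachable.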
2017
- BMC Bioinform. CSmetaPred: A consensus method for prediction of catalytic residues. P. Choudhary, S. Kumar, A.K. Bachhawat, and 1 more author. BMC Bioinformatics 2017
Background: Knowledge of catalytic residues can play an essential role in elucidating mechanistic details of an enzyme. However, experimental identification of catalytic residues is a tedious and time-consuming task, which can be expedited by computational predictions. Despite significant development in active-site prediction methods, one of the remaining issues is the ranked position of putative catalytic residues among all ranked residues. In order to improve the ranking of catalytic residues and their prediction accuracy, we have developed a meta-approach-based method, CSmetaPred. In this approach, residues are ranked based on the mean of normalized residue scores derived from four well-known catalytic residue predictors. The mean residue score of CSmetaPred is combined with predicted pocket information to improve prediction performance in the meta-predictor CSmetaPred_poc. Results: Both meta-predictors are evaluated on two comprehensive benchmark datasets and three legacy datasets using Receiver Operating Characteristic (ROC) and Precision Recall (PR) curves. The visual and quantitative analysis of ROC and PR curves shows that the meta-predictors outperform their constituent methods and that CSmetaPred_poc is the best of the evaluated methods. For instance, on the CSAMAC dataset, CSmetaPred_poc (CSmetaPred) achieves the highest Mean Average Specificity (MAS), a scalar measure for the ROC curve, of 0.97 (0.96). Importantly, the median predicted rank of catalytic residues is the lowest (best) for CSmetaPred_poc. Considering residues ranked ≤20 as true positives in binary classification, CSmetaPred_poc achieves a prediction accuracy of 0.94 on the CSAMAC dataset. Moreover, on the same dataset, CSmetaPred_poc predicts all catalytic residues within the top 20 ranks for 73% of enzymes. Furthermore, benchmarking of prediction on comparative modelled structures showed that models result in better prediction than sequence-only predictions. These analyses suggest that CSmetaPred_poc is able to rank putative catalytic residues at lower (better) ranked positions, which can facilitate and expedite their experimental characterization. Conclusions: The benchmarking studies showed that employing a meta-approach to combine residue-level scores derived from well-known catalytic residue predictors can improve prediction accuracy as well as provide improved ranked positions of known catalytic residues. Hence, such predictions can assist experimentalists in prioritizing residues for mutational studies in their efforts to characterize catalytic residues. Both meta-predictors are available as a web server at: http://14.139.227.206/csmetapred/. © 2017 The Author(s).
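The consensus step described in this abstract (ranking residues by the mean of normalized scores from several predictors) can be sketched as follows. This is an illustration only: the abstract does not say which normalization is used, so min-max scaling is an assumption here, and the residue labels and scores are invented. The pocket-based refinement of CSmetaPred_poc is not modelled.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale a {residue: score} mapping to [0, 1] (assumed normalization).

    A predictor that gives every residue the same score maps to 0.0.
    """
    lo, hi = min(scores.values()), max(scores.values())
    span = hi - lo
    return {res: (s - lo) / span if span else 0.0 for res, s in scores.items()}


def consensus_rank(predictor_scores):
    """Rank residues by mean normalized score; the first entry is rank 1 (best)."""
    normalized = [min_max_normalize(s) for s in predictor_scores]
    # Only residues scored by every predictor are ranked.
    residues = sorted(set.intersection(*(set(n) for n in normalized)))
    consensus = {r: mean(n[r] for n in normalized) for r in residues}
    ordered = sorted(residues, key=lambda r: (-consensus[r], r))
    return [(r, consensus[r]) for r in ordered]


# Hypothetical raw scores from four predictors on different scales.
predictors = [
    {"H57": 0.9, "D102": 0.7, "S195": 0.8, "G43": 0.1},
    {"H57": 12.0, "D102": 9.0, "S195": 11.0, "G43": 2.0},
    {"H57": 0.6, "D102": 0.5, "S195": 0.7, "G43": 0.2},
    {"H57": 3.0, "D102": 2.5, "S195": 2.8, "G43": 0.5},
]

ranking = consensus_rank(predictors)
for rank, (residue, score) in enumerate(ranking, start=1):
    print(rank, residue, round(score, 3))
```

Normalizing before averaging keeps a predictor with a wide raw score range from dominating the consensus, which is the usual motivation for this kind of meta-predictor.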
- Biochem. J. Disulphide bond restrains the C-terminal region of thermostable direct hemolysin during folding to promote oligomerization. N. Kundu, S. Tichkule, S.B. Pandit, and 1 more author. Biochemical Journal 2017
Pore-forming toxins (PFTs) are typically produced as water-soluble monomers, which upon interacting with target cells assemble into transmembrane oligomeric pores. Vibrio parahaemolyticus thermostable direct hemolysin (TDH) is an atypical PFT that exists as a tetramer in solution, prior to membrane binding. The TDH structure highlights a core β-sandwich domain similar to those found in the eukaryotic actinoporin family of PFTs. However, the TDH structure harbors an extended C-terminal region (CTR) that is not documented in the actinoporins. This CTR remains tethered to the β-sandwich domain through an intra-molecular disulphide bond. Part of the CTR is positioned at the interprotomer interface in the TDH tetramer. Here we show that the truncation, as well as mutation, of the CTR compromise tetrameric assembly, and the membrane-damaging activity of TDH. Our study also reveals that intra-protomer disulphide bond formation during the folding/assembly process of TDH restrains the CTR to mediate its participation in the formation of inter-protomer contact, thus facilitating TDH oligomerization. However, once tetramerization is achieved, disruption of the disulphide bond does not affect oligomeric assembly. Our study provides critical insights regarding the regulation of the oligomerization mechanism of TDH, which has not been previously documented in the PFT family. © 2017 The Author(s); published by Portland Press Limited on behalf of the Biochemical Society.
2014
- Nucleic Acids Res. XTMS: Pathway design in an eXTended metabolic space. P. Carbonell, P. Parutto, J. Herisson, and 2 more authors. Nucleic Acids Research 2014
As metabolic engineering and synthetic biology progress toward reaching the goal of a more sustainable use of biological resources, the need to increase the number of value-added chemicals that can be produced in industrial organisms becomes more imperative. However, exploring the vast space of pathways amenable to engineering through heterologous gene expression in a chassis organism is complex and unattainable manually. Here, we present XTMS, a web-based pathway analysis platform available at http://xtms.issb.genopole.fr, which provides full access to the set of pathways that can be imported into a chassis organism such as Escherichia coli through the application of an Extended Metabolic Space modeling framework. The XTMS approach consists of determining the set of biochemical transformations that can potentially be processed in vivo as modeled by molecular signatures, a specific coding system for derivation of reaction rules for metabolic reactions and enumeration of all the corresponding substrates and products. The most promising routes are described in terms of metabolite exchange, maximum allowable pathway yield, toxicity and enzyme efficiency. By answering such critical design points, XTMS not only paves the road toward the rationalization of metabolic engineering, but also opens new processing possibilities for non-natural metabolites and novel enzymatic transformations. © 2014 The Author(s).
- Mol. Biosyst. Evolutionarily conserved and conformationally constrained short peptides might serve as DNA recognition elements in intrinsically disordered regions. N. Tayal, P. Choudhary, S.B. Pandit, and 1 more author. Molecular BioSystems 2014
Despite recent advances, it is not yet clear how intrinsically disordered regions in proteins recognize their targets without any defined structures. Short linear motifs have been proposed to mediate molecular recognition by disordered regions; however, the underlying structural prerequisite remains elusive. Moreover, the role of short linear motifs in DNA recognition has not been studied. We report a repertoire of short evolutionarily Conserved Recognition Elements (CoREs) in long intrinsically disordered regions, which have very distinct amino-acid propensities from those of known motifs, and exhibit a strong tendency to retain their three-dimensional conformations compared to adjacent regions. The majority of CoREs directly interact with the DNA in the available 3D structures, which is further supported by literature evidence, analyses of ΔΔG values of DNA-binding energies and threading-based prediction of DNA binding potential. CoREs were enriched in cancer-associated missense mutations, further strengthening their functional nature. Significant enrichment of glycines in CoREs and the preference of glycyl φ-Ψ values within the left-handed bridge range in the l-disallowed region of the Ramachandran plot suggest that Gly-to-nonGly mutations within CoREs might alter the backbone conformation and consequently the function, a hypothesis that we reconciled using available mutation data. We conclude that CoREs might serve as bait for DNA recognition by long disordered regions and that certain mutations in these peptides can disrupt their DNA binding potential and consequently the protein function. We further hypothesize that the preferred conformations of CoREs and of glycyl residues therein might play an important role in DNA binding. The highly ordered nature of CoREs hints at a therapeutic strategy to inhibit malicious molecular interactions using small molecules mimicking CoRE conformations. © 2014 the Partner Organisations.
2012
- BMC Syst. Biol. Enumerating metabolic pathways for the production of heterologous target chemicals in chassis organisms. P. Carbonell, D. Fichera, S.B. Pandit, and 1 more author. BMC Systems Biology 2012
Background: We consider the possibility of engineering metabolic pathways in a chassis organism in order to synthesize novel target compounds that are heterologous to the chassis. For this purpose, we model metabolic networks through hypergraphs where reactions are represented by hyperarcs. Each hyperarc represents an enzyme-catalyzed reaction that transforms a set of substrate compounds into product compounds. We follow a retrosynthetic approach in order to search in the metabolic space (hypergraphs) for pathways (hyperpaths) linking the target compounds to a source set of compounds. Results: To select the best pathways to engineer, we have developed an objective function that computes the cost of inserting a heterologous pathway in a given chassis organism. In order to find minimum-cost pathways, we propose in this paper two methods based on steady state analysis and network topology that are, to the best of our knowledge, the first to enumerate all possible heterologous pathways linking a target compound to a source set of compounds. In the context of metabolic engineering, the source set is composed of all naturally produced chassis compounds (endogenous chassis metabolites) and the target set can be any compound of the chemical space. We also provide an algorithm for identifying precursors which can be supplied to the growth media in order to increase the number of ways to synthesize specific target compounds. Conclusions: We find the topological approach to be faster by several orders of magnitude than the steady state approach. Yet both methods are generally scalable in time with the number of pathways in the metabolic network. Therefore this work provides a powerful tool for pathway enumeration with direct application to biosynthetic pathway design. © 2012 Carbonell et al.; licensee BioMed Central Ltd.
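The hypergraph model in this abstract, where each hyperarc turns a set of substrates into a set of products, can be illustrated with a small reachability sketch. This is not the enumeration or cost-minimization method of the paper: it only checks which compounds become producible from a source set, and the reaction names and compounds are hypothetical.

```python
def producible_compounds(sources, reactions):
    """Fixed-point closure over hyperarcs.

    `reactions` maps a reaction name to (substrates, products), both sets.
    A hyperarc can fire only when all of its substrates are available,
    which is what distinguishes a hypergraph from an ordinary graph.
    Returns the set of available compounds and the firing order.
    """
    available = set(sources)
    fired = []
    changed = True
    while changed:
        changed = False
        for name, (substrates, products) in reactions.items():
            if name not in fired and substrates <= available:
                fired.append(name)
                available |= products
                changed = True
    return available, fired


# Hypothetical chassis metabolites and heterologous reactions.
chassis_metabolites = {"a", "b"}
reactions = {
    "r1": ({"a", "b"}, {"c"}),      # needs both a and b
    "r2": ({"c"}, {"target"}),      # becomes usable once r1 has fired
    "r3": ({"x"}, {"target2"}),     # x is never available, so r3 never fires
}

available, fired = producible_compounds(chassis_metabolites, reactions)
print(sorted(available), fired)
```

A precursor search in the spirit of the abstract could reuse the same function by adding a candidate compound such as `x` to the source set and checking whether new targets become reachable.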
2011
2010
- Proteins. TASSER_low-zsc: An approach to improve structure prediction using low z-score-ranked templates. S.B. Pandit and J. Skolnick. Proteins: Structure, Function and Bioinformatics 2010
In a variety of threading methods, often poorly ranked (low z-score) templates have good alignments. Here, a new method, TASSER_low-zsc, that identifies these low z-score-ranked templates to improve protein structure prediction accuracy, is described. The approach consists of clustering of threading templates by affinity propagation on the basis of structural similarity (thread_cluster) followed by TASSER modeling, with final models selected by using a TASSER_QA variant. To establish the generality of the approach, templates provided by two threading methods, SP3 and SPARKS2, are examined. The SP3 and SPARKS2 benchmark datasets consist of 351 and 357 medium/hard proteins (those with moderate to poor quality templates and/or alignments) of length ≤250 residues, respectively. For SP3 medium and hard targets, using thread_cluster, the TM-scores of the best template improve by 4 and 9% over the original set (without low z-score templates) respectively; after TASSER modeling/refinement and ranking, the best model improves by 7 and 9% over the best model generated with the original template set. Moreover, TASSER_low-zsc generates 22% (43%) more foldable medium (hard) targets. Similar improvements are observed with low-ranked templates from SPARKS2. The template clustering approach could be applied to other modeling methods that utilize multiple templates to improve structure prediction. © 2010 Wiley-Liss, Inc.
- Bioinformatics. PSiFR: An integrated resource for prediction of protein structure and function. S.B. Pandit, M. Brylinski, H. Zhou, and 3 more authors. Bioinformatics 2010
In the post-genomic era, the annotation of protein function facilitates the understanding of various biological processes. To extend the range of function annotation methods to the twilight zone of sequence identity, we have developed approaches that exploit both protein tertiary structure and/or protein sequence evolutionary relationships. To serve the scientific community, we have integrated the structure prediction tools, TASSER, TASSER-Lite and METATASSER, and the functional inference tools, FINDSITE, a structure-based algorithm for binding site prediction, Gene Ontology molecular function inference and ligand screening, EFICAz2, a sequence-based approach to enzyme function inference and DBD-hunter, an algorithm for predicting DNA-binding proteins and associated DNA-binding residues, into a unified web resource, Protein Structure and Function prediction Resource (PSiFR). Availability and implementation: PSiFR is freely available for use on the web at http://psifr.cssb.biology.gatech.edu/. Contact: skolnick@gatech.edu © The Author 2010. Published by Oxford University Press.
- BOOK. [Book Chap.] Tasser-Based Protein Structure Prediction. Shashi Bhushan Pandit, Hongyi Zhou, and Jeffrey Skolnick. 2010
2009
- BOOK. [Book Chap.] Dynamics of protein-protein interaction network in Plasmodium falciparum. S. Mohanty, S.B. Pandit, and N. Srinivasan. 2009
Integration of organism-wide protein interactome data with information on expression of genes, cellular localization of proteins and their functions has proved extremely useful in developing biologically intuitive interaction networks. This chapter highlights the dynamics in the protein interaction network across different stages in the lifecycle of Plasmodium falciparum, a malarial parasite, and the implication of the network dynamics in different physiological processes. The main focus of the chapter is the integration of information on experimentally derived interactions of P. falciparum proteins with expression data and analysis of the implications of interactions in different cellular processes. Extensive analysis has been made to quantify the interaction dynamics across various stages, as well as correlating it with the dynamics of the cellular pathways involving the interacting proteins. The authors’ analysis demonstrates the power of strategic integration of genome-wide datasets in extracting information on dynamics of biological pathways and processes. © 2009, IGI Global.
- Proteins. Performance of the Pro-sp3-TASSER server in CASP8. H. Zhou, S.B. Pandit, and J. Skolnick. Proteins: Structure, Function and Bioinformatics 2009
The performance of the protein structure prediction server pro-sp3-TASSER in CASP8 is described. Compared to CASP7, the major improvement in prediction is in the quality of input models to TASSER. These improvements are due to the PRO-SP3 threading method, the improved quality of contact predictions provided by TASSER-2.0, multiple short TASSER simulations for building the full-length model, and the accuracy of model selection using the TASSER-QA quality assessment method. Finally, we analyze the overall performance and highlight some successful predictions of the pro-sp3-TASSER server. © 2009 WILEY-LISS, INC.
2008
- BMC Bioinform. Fr-TM-align: A new protein structural alignment method based on fragment alignments and the TM-score. S.B. Pandit and J. Skolnick. BMC Bioinformatics 2008
Background: Protein tertiary structure comparisons are employed in various fields of contemporary structural biology. Most structure comparison methods involve generation of an initial seed alignment, which is extended and/or refined to provide the best structural superposition between a pair of protein structures as assessed by a structure comparison metric. One such metric, the TM-score, was recently introduced to provide a combined structure quality measure of the coordinate root mean square deviation between a pair of structures and coverage. Using the TM-score, the TM-align structure alignment algorithm was developed that was often found to have better accuracy and coverage than the most commonly used structural alignment programs; however, there were a number of situations when this was not true. Results: To further improve structure alignment quality, the Fr-TM-align algorithm has been developed where aligned fragment pairs are used to generate the initial seed alignments that are then refined using dynamic programming to maximize the TM-score. For the assessment of the structural alignment quality from Fr-TM-align in comparison to other programs such as CE and TM-align, we examined various alignment quality assessment scores such as PSI and TM-score. The assessment showed that the structural alignment quality from Fr-TM-align is better in comparison to both CE and TM-align. On average, the structural alignments generated using Fr-TM-align have a higher TM-score (∼9%) and coverage (∼7%) in comparison to those generated by TM-align. Fr-TM-align uses an exhaustive procedure to generate initial seed alignments. Hence, the algorithm is computationally more expensive than TM-align. Conclusion: Fr-TM-align, a new algorithm that employs fragment alignment and assembly, provides better structural alignments in comparison to TM-align. The source code and executables of Fr-TM-align are freely downloadable at: http://cssb.biology.gatech.edu/skolnick/files/FrTMalign/. © 2008 Pandit and Skolnick; licensee BioMed Central Ltd.
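Since this abstract relies on the TM-score as the quantity being maximized, a minimal sketch of how the score is evaluated for one fixed superposition may help. The formula follows the standard published definition of the TM-score; the function names are hypothetical, the 0.5 Å floor on d0 is an assumed guard for very short chains, and the search over superpositions and alignments that Fr-TM-align performs is not shown.

```python
def tm_d0(target_length):
    """Length-dependent distance scale d0 (in angstroms) used by the TM-score."""
    # max(..., 0) avoids a complex result for chains shorter than 15 residues;
    # the 0.5 floor is an assumed guard for very short chains.
    cube_root = max(target_length - 15, 0) ** (1.0 / 3.0)
    return max(1.24 * cube_root - 1.8, 0.5)


def tm_score(aligned_distances, target_length):
    """TM-score of one superposition.

    `aligned_distances` holds the C-alpha distance (angstroms) of every
    aligned residue pair after superposition; unaligned target residues
    contribute nothing, so lower coverage lowers the score.
    """
    if target_length <= 0:
        raise ValueError("target_length must be positive")
    d0 = tm_d0(target_length)
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in aligned_distances)
    return total / target_length


# A perfect superposition covering half of a 100-residue target.
half_coverage = tm_score([0.0] * 50, 100)
# Full coverage with every pair exactly d0 apart.
all_at_d0 = tm_score([tm_d0(100)] * 100, 100)
print(half_coverage, all_at_d0)
```

Both toy cases score about 0.5, which shows how the measure combines accuracy and coverage into one number, as the abstract describes.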
2007
- Proteins. Analysis of TASSER-based CASP7 protein structure prediction results. H. Zhou, S.B. Pandit, Y.L. Seung, and 4 more authors. Proteins: Structure, Function and Genetics 2007
An improved TASSER (Threading/ASSEmbly/Refinement) methodology is applied to predict the tertiary structure for all CASP7 targets. TASSER employs template identification by threading, followed by tertiary structure assembly by rearranging continuous template fragments, where conformational space is searched via Parallel Hyperbolic Monte Carlo sampling with an optimized force-field that includes knowledge-based statistical potentials and restraints derived from threading templates. The final models are selected by clustering structures from the low temperature replicas. Improvements in TASSER over CASP6 involve use of better templates from 3D-jury applied to three threading programs, PROSPECTOR_3, SP3, and SPARKS, and a fragment comparison method for better model ranking. For targets with no reliable templates, a variant of TASSER (chunk-TASSER) is also applied with potentials and restraints extracted from ab initio folded supersecondary chunks of the target to build full-length models. For all 124 CASP targets/domains, the average root-mean-square-deviation (RMSD) from native and alignment coverage of the best initial threading models from 3D-jury are 6.2 Å and 93%, respectively. Following TASSER reassembly, the average RMSD of the best model in the template aligned region decreases to 4.9 Å and the average TM-score increases from 0.617 for the template to 0.678 for the best full-length model. Based on target difficulty, the average TM-scores of the final model to native are 0.904, 0.671, and 0.307 for high-accuracy template-based modeling, template-based modeling, and free modeling targets/domains, respectively. For the more difficult targets, TASSER with modest human intervention performed better in comparison to its server counterpart, MetaTASSER, which used a limited time simulation. © 2007 Wiley-Liss, Inc.
- Biochemistry. Phosphoprotein and phosphopeptide interactions with the FHA domain from Arabidopsis kinase-associated protein phosphatase. Z. Ding, H. Wang, X. Liang, and 6 more authors. Biochemistry 2007
FHA domains are phosphoThr recognition modules found in diverse signaling proteins, including kinase-associated protein phosphatase (KAPP) from Arabidopsis thaliana. The kinase-interacting FHA domain (KI-FHA) of KAPP targets it to function as a negative regulator of some receptor-like kinase (RLK) signaling pathways important in plant development and environmental responses. To aid in the identification of potential binding sites for the KI-FHA domain, we predicted (i) the structure of a representative KAPP-binding RLK, CLAVATA1, and (ii) the functional surfaces of RLK kinase domains using evolutionary trace analysis. We selected phosphopeptides from KAPP-binding Arabidopsis RLKs for in vitro studies of association with KI-FHA from KAPP. Three phosphoThr peptide fragments from the kinase domain of CLV1 or BAK1 were found to bind KI-FHA with KD values of 8-20 μM, by NMR or titration calorimetry. Their affinity is driven by favorable enthalpy and solvation entropy gain. Mutagenesis of these three threonine sites suggests Thr546 in the C-lobe of the BAK1 kinase domain to be a principal but not sole site of KI-FHA binding in vitro. The brassinosteroid receptor BRI1 and KAPP are shown to associate in vivo and in vitro. Further genetic studies indicate that KAPP may be a negative regulator of the BRI1 signaling transduction pathway. 15N-Labeled KI-FHA was titrated with the GST-BRI1 kinase domain and monitored by NMR. BRI1 interacts with the same 3/4, 4/5, 6/7, 8/9, and 10/11 recognition loops of KI-FHA, with similar affinity as the phosphoThr peptides. © 2007 American Chemical Society.
2006
- Biophys. J. TASSER-Lite: An automated tool for protein comparative modeling. S.B. Pandit, Y. Zhang, and J. Skolnick. Biophysical Journal 2006
This study involves the development of a rapid comparative modeling tool for homologous sequences by extension of the TASSER methodology, developed for tertiary structure prediction. This comparative modeling procedure was validated on a representative benchmark set of proteins in the Protein Data Bank composed of 901 single domain proteins (41-200 residues) having sequence identities of 35-90% with respect to the template. Using a Monte Carlo search scheme with the length of runs optimized for weakly/nonhomologous proteins, TASSER often provides appreciable improvement in structure quality over the initial template. However, on average, this requires ∼29 h of CPU time per sequence. Since homologous proteins are unlikely to require the extent of conformational search needed for weakly/nonhomologous proteins, TASSER’s parameters were optimized to reduce the required CPU time to ∼17 min, while retaining TASSER’s ability to improve structure quality. Using this optimized TASSER (TASSER-Lite), we find an average improvement in the aligned region of ∼10% in root mean-square deviation from native over the initial template. Comparison of TASSER-Lite with the widely used comparative modeling tool MODELLER showed that TASSER-Lite yields final models that are closer to the native. TASSER-Lite is provided on the web at http://cssb.biology.gatech.edu/skolnick/webservice/tasserlite/index.html. © 2006 by the Biophysical Society.
2005
- In Silico Biol. A new domain family in the superfamily of alkaline phosphatases. R. Bhadra, N. Srinivasan, and S.B. Pandit. In Silico Biology 2005
During the course of our large-scale genome analysis, a conserved domain, currently detectable only in the genomes of Drosophila melanogaster, Caenorhabditis elegans and Anopheles gambiae, has been identified. The function of this domain is currently unknown and no functional annotation is provided for it in the publicly available genomic, protein family and sequence databases. A search for homologues of this domain in the non-redundant sequence database using PSI-BLAST identified a distant relationship between this family and the alkaline phosphatase-like superfamily, which includes families of aryl sulfatase, N-acetylgalactosamine-4-sulfatase, alkaline phosphatase and 2,3-bisphosphoglycerate-independent phosphoglycerate mutase (iPGM). The fold recognition procedures showed that this new domain could adopt a 3-D fold similar to that of this superfamily. Most of the phosphatases and sulfatases of this superfamily are characterized by functional residues Ser and Cys, respectively, in the topologically equivalent positions. This functionally important site aligns with Ser/Thr in the members of the new family. Additionally, the set of residues responsible for a metal binding site in phosphatases and sulfatases is conserved in the new family. The in-depth analysis suggests that the new family could possess phosphatase activity. © 2005 IOS Press. All rights reserved.
- Nucleic Acids Res. PRODOC: A resource for the comparison of tethered protein domain architectures with in-built information on remotely related domain families. O. Krishnadev, N. Rekha, S.B. Pandit, and 5 more authors. Nucleic Acids Research 2005
PROtein Domain Organization and Comparison (PRODOC) comprises several programs that enable convenient comparison of proteins as a sequence of domains. The in-built dataset currently consists of ∼698 000 proteins from 192 organisms with complete genomic data, and all the SWISSPROT proteins obtained from the Pfam database. All the entries in PRODOC are represented as a sequence of functional domains, assigned using hidden Markov models, instead of as a sequence of amino acids. On average 69% of the proteins in the proteomes and 49% of the residues are covered by functional domain assignments. Software tools allow the user to query the dataset with a sequence of domains and identify proteins with the same or a jumbled or circularly permuted arrangement of domains. As it is proposed that proteins with jumbled or the same domain sequences have similar functions, this search tool is useful in assigning the overall function of a multi-domain protein. Unique features of PRODOC include the generation of alignments between multi-domain proteins on the basis of the sequence of domains and in-built information on distantly related domain families forming superfamilies. It is also possible using PRODOC to identify domain sharing and gene fusion events across organisms. An exhaustive genome-genome comparison tool in PRODOC also enables the detection of successive domain sharing and domain fusion events across two organisms. The tool permits the identification of gene clusters involved in similar biological processes in two closely related organisms. The URL for PRODOC is http://hodgkin.mbu.iisc.ernet.in/prodoc. © 2005 Oxford University Press.
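The domain-architecture query described in the PRODOC abstract (same, circularly permuted, or jumbled arrangement of domains) can be illustrated with a minimal sketch. This is not PRODOC's implementation: the function names, the category labels, and the example domain names are assumptions chosen for illustration, and each protein is assumed to be given as an ordered list of domain-family names.

```python
from typing import Sequence


def is_circular_permutation(a: Sequence[str], b: Sequence[str]) -> bool:
    """True if b is a rotation of a (same domains, cyclically shifted order)."""
    if len(a) != len(b):
        return False
    if not a:
        return True
    doubled = list(a) + list(a)  # every rotation of a is a window of a + a
    n = len(a)
    target = list(b)
    return any(doubled[i:i + n] == target for i in range(n))


def classify_architecture(query: Sequence[str], target: Sequence[str]) -> str:
    """Compare two domain architectures (hypothetical labels, not PRODOC output).

    Returns one of: "same", "circular permutation", "jumbled", "different".
    """
    if list(query) == list(target):
        return "same"
    if is_circular_permutation(query, target):
        return "circular permutation"
    # Same multiset of domains in any other order counts as jumbled.
    if sorted(query) == sorted(target):
        return "jumbled"
    return "different"


if __name__ == "__main__":
    # Illustrative architecture only; the domain names are placeholders.
    query = ["SH3", "SH2", "Pkinase"]
    for target in (
        ["SH3", "SH2", "Pkinase"],
        ["SH2", "Pkinase", "SH3"],
        ["SH2", "SH3", "Pkinase"],
        ["SH3", "PH", "Pkinase"],
    ):
        print(target, "->", classify_architecture(query, target))
```

Checking rotations against the doubled list keeps the comparison simple; a real resource searching hundreds of thousands of proteins would presumably index architectures rather than compare them pairwise.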
2004
- In Silico Biol. Recognition of remotely related structural homologues using sequence profiles of aligned homologous protein structures. S. Namboori, N. Srinivasan, and S.B. Pandit. In Silico Biology 2004
In order to bridge the gap between proteins with three-dimensional (3-D) structural information and those without 3-D structures, extensive experimental and computational efforts for structure recognition are being invested. One of the rapid and simple computational approaches for structure recognition makes use of sequence profiles with sensitive profile matching procedures to identify remotely related homologous families. While adopting this approach we used profiles that are generated from structure-based sequence alignment of homologous protein domains of known structures integrated with sequence homologues. We present an assessment of this fast and simple approach. About one year ago, using this approach, we identified structural homologues for 315 sequence families that were not known to have any 3-D structural information. The subsequent experimental structure determination for at least one of the members in 110 of the 315 sequence families allowed a retrospective assessment of the correctness of structure recognition. We demonstrate that correct folds are detected with an accuracy of 96.4% (106/110). Most (81/106) of the associations are made correctly to the specific structural family. For 23/106, the structure associations are valid at the superfamily level. Thus, profiles of protein families of known structure, when used with a sensitive profile-based search procedure, result in structure associations of high confidence. Further assignment at the level of superfamily or family would provide clues to probable functions of new proteins. Importantly, the public availability of these profiles from us could enable one to perform genome-wide structure assignment on a local machine in a fast and accurate manner. © 2004 - IOS Press and Bioinformation Systems e.V. and the authors. All rights reserved.
- In Silico Biol. Identification and analysis of a new family of bacterial serine proteinases. S.B. Pandit and N. Srinivasan. In Silico Biology 2004
A family of hypothetical proteins, identified predominantly from archaeal genomes, has been analyzed in order to understand its functional characteristics. Using extensive sequence similarity searches, it is inferred that this family is remotely related (best sequence identity is 19%) to ClpP proteinases, which belong to the serine proteinase class. This family of hypothetical proteins is referred to as the SDH proteinase family based on the conserved sequential order of Ser, Asp and His residues and predicted serine proteinase activity. Results of fold recognition of SDH family sequences confirmed the remote relationship between SDH proteinases and Clp proteinases and revealed similar tertiary location of putative catalytic triad residues critical for serine proteinase function. However, the best sequence alignment we could obtain suggests that while the catalytic Ser is conserved across Clp and SDH proteinases, the locations of the other catalytic triad residues, namely His and Asp, are swapped in their amino acid alignment positions and hence in 3-D structure. The evidence of a conserved catalytic triad suggests that SDH could be a new family of serine proteinases with the fold of Clp proteinase, while sharing the catalytic triad order of the carboxypeptidase clan. A signal peptide sequence identified at the N-terminus of some of the homologues suggests that these might be secretory serine proteinases involved in cleavage of extracellular proteins, while the remote homologues, ClpP proteinases, are known to work in an intracellular environment. © 2004 - IOS Press and Bioinformation Systems e.V. and the authors. All rights reserved.
- IUBMB Life. Structural and functional characterization of gene products encoded in the human genome by homology detection. S.B. Pandit, S. Balaji, and N. Srinivasan. IUBMB Life 2004
Availability of the human genome data has enabled the exploration of a huge amount of biological information encoded in it. There are extensive ongoing experimental efforts to understand the biological functions of the gene products encoded in the human genome. However, computational analysis can aid immensely in the interpretation of biological function by associating known functional/structural domains to the human proteins. In this article we have discussed the implications of such associations. The association of structural domains to human proteins could help in prioritizing the targets for structure determination in the structural genomics initiatives. The protein kinase family is one of the most frequently occurring protein domain families in the human proteome while P-loop hydrolase, which comprises many GTPases and ATPases, is a highly represented superfamily. Using the superfamily relationships between families of unknown and known structures we could increase structural information content of the human genome by about 5%. We could also make new associations of domain families to 33 human proteins that are potentially linked to genetically inherited diseases.
- BMC Bioinform. SUPFAM: A database of sequence superfamilies of protein domains. S.B. Pandit, R. Bhadra, V.S. Gowri, and 3 more authors. BMC Bioinformatics 2004
Background: The SUPFAM database is a compilation of superfamily relationships between protein domain families of either known or unknown 3-D structure. In SUPFAM, sequence families from Pfam and structural families from SCOP are associated, using profile matching, to result in sequence superfamilies of known structure. Subsequently all-against-all family profile matches are made to deduce a list of new potential superfamilies of yet unknown structure. Description: The current version of SUPFAM (release 1.4) corresponds to significant enhancements and major developments compared to the earlier and basic version. In the present version we have used RPS-BLAST, which is robust and sensitive, for profile matching. The reliability of connections between protein families is ensured better than before by use of benchmarked criteria involving a strict e-value cut-off and a minimal alignment length condition. An e-value based indication of reliability of connections is now presented in the database. Web access to an RPS-BLAST-based tool to associate a query sequence to one of the family profiles in SUPFAM is available with the current release. In terms of the scientific content, the present release of SUPFAM is entirely reorganized with the use of 6190 Pfam families and 2317 structural families derived from SCOP. Due to a steep increase in the number of sequence and structural families used in SUPFAM, the details of scientific content in the present release are almost entirely complementary to the previous basic version. Of the 2286 families, we could relate 245 Pfam families with apparently no structural information to families of known 3-D structures, thus resulting in the identification of new families in the existing superfamilies. Using the profiles of 3904 Pfam families of yet unknown structure, an all-against-all comparison involving sequence-profile match resulted in clustering of 96 Pfam families into 39 new potential superfamilies.
Conclusion: SUPFAM presents many non-trivial superfamily relationships of sequence families involved in a variety of functions and hence the information content is of interest to a wide scientific community. The grouping of related proteins without a known structure in SUPFAM is useful in identifying priority targets for structural genomics initiatives and in the assignment of putative functions. © 2004 Pandit et al; licensee BioMed Central Ltd.
- J. Biosci. Enhanced functional and structural domain assignments using remote similarity detection procedures for proteins encoded in the genome of Mycobacterium tuberculosis H37Rv. S. Namboori, N. Mhatre, S. Sujatha, and 2 more authors. Journal of Biosciences 2004
The sequencing of the Mycobacterium tuberculosis (MTB) H37Rv genome has facilitated deeper insights into the biology of MTB, yet the functions of many MTB proteins are unknown. We have used sensitive profile-based search procedures to assign functional and structural domains to infer functions of gene products encoded in MTB. These domain assignments have been made using a compendium of sequence and structural domain families. Functions are predicted for 78% of the encoded gene products. For 69% of these, functions can be inferred by domain assignments. The functions for the rest are deduced from their homology to proteins of known function. Superfamily relationships between families of unknown and known structures have increased structural information by ∼11%. Remote similarity detection methods have enabled domain assignments for 1325 ’hypothetical proteins’. The most populated families in MTB are involved in lipid metabolism, and in entry and survival of the bacillus in the host. Interestingly, for 353 proteins, which we refer to as MTB-specific, no homologues have been identified. Numerous previously unannotated hypothetical proteins have been assigned domains, and some of these could be possible chemotherapeutic targets. MTB-specific proteins might include factors responsible for virulence. Importantly, these assignments could be valuable for experimental endeavors. The detailed results are publicly available at http://hodgkin.mbu.iisc.ernet.in/dots.
2003
- Proteins. Survey for G-proteins in the prokaryotic genomes: Prediction of functional roles based on classification. S.B. Pandit and N. Srinivasan. Proteins: Structure, Function and Genetics 2003
The members of the family of G-proteins are characterized by their ability to bind and hydrolyze guanosine triphosphate (GTP) to guanosine diphosphate (GDP). Despite a common biochemical function of GTP hydrolysis shared among the members of the family of G-proteins, they are associated with diverse biological roles. The current work describes the identification and detailed analysis of the putative G-proteins encoded in the completely sequenced prokaryotic genomes. Inferences on the biological roles of these G-proteins have been obtained by their classification into known functional subfamilies. We have identified 497 G-proteins in 42 genomes. Seven small GTP-binding protein homologues have been identified in prokaryotes with at least two of the diagnostic sequence motifs of G-proteins conserved. The translation factors have the largest representation (234 sequences) and are found to be ubiquitous, which is consistent with their critical role in protein synthesis. The GTP_OBG subfamily comprises 79 sequences in our dataset. A total of 177 sequences belong to the subfamily of GTPases of unknown function, and 154 of these could be associated with domains of known functions such as cell cycle regulation and tRNA modification. The large GTP-binding proteins and the α-subunit of heterotrimeric G-proteins are not detected in the genomes of the prokaryotes surveyed. © 2003 Wiley-Liss, Inc.
- Nucleic Acids Res. Integration of related sequences with protein three-dimensional structural families in an updated version of PALI database. V.S. Gowri, S.B. Pandit, P.S. Karthik, and 2 more authors. Nucleic Acids Research 2003
The database of Phylogeny and ALIgnment of homologous protein structures (PALI) contains three-dimensional (3-D) structure-dependent sequence alignments as well as structure-based phylogenetic trees of protein domains in various families. The latest updated version (Release 2.1) comprises 844 families of homologous proteins involving 3863 protein domain structures, with each of these families having at least two members. Each member in a family has been structurally aligned with every other member in the same family using two proteins at a time. In addition, an alignment of multiple structures has also been performed using all the members in a family. Every family with at least three members is associated with two dendrograms, one based on a structural dissimilarity metric and the other based on similarity of topologically equivalenced residues for every pairwise alignment. Apart from these multi-member families, there are 817 single member families in the updated version of PALI. A new feature in the current release of PALI is the integration, with 3-D structural families, of sequences of homologues from the sequence databases. Alignments between homologous proteins of known 3-D structure and those without an experimentally derived structure are also provided for every family in the enhanced version of PALI. The database with several web interfaced utilities can be accessed at: http://pauling.mbu.iisc.ernet.in/pali.
2002
- Nucleic Acids Res. SUPFAM - A database of potential protein superfamily relationships derived by comparing sequence-based and structure-based families: Implications for structural genomics and function annotation in genomes. S.B. Pandit, D. Gosar, S. Abhiman, and 5 more authors. Nucleic Acids Research 2002
Members of a superfamily of proteins could result from divergent evolution of homologues with insignificant similarity in the amino acid sequences. A superfamily relationship is detected commonly after the three-dimensional structures of the proteins are determined using X-ray analysis or NMR. The SUPFAM database described here relates two homologous protein families in a multiple sequence alignment database of either known or unknown structure. The present release (1.1), which is the first version of the SUPFAM database, has been derived by analysing Pfam, which is one of the commonly used databases of multiple sequence alignments of homologous proteins. The first step in establishing SUPFAM is to relate Pfam families with the families in PALI, which is an alignment database of homologous proteins of known structure that is derived largely from SCOP. The second step involves relating Pfam families which could not be associated reliably with a protein superfamily of known structure. The profile matching procedure, IMPALA, has been used in these steps. The first step resulted in identification of 1280 Pfam families (out of 2697, i.e. 47%) which are related, either by close homologous connection to a SCOP family or by distant relationship to a SCOP family, potentially forming new superfamily connections. Using the profiles of 1417 Pfam families with apparently no structural information, an all-against-all comparison involving a sequence-profile match using IMPALA resulted in clustering of 67 homologous protein families of Pfam into 28 potential new superfamilies. Expansion of groups of related proteins of yet unknown structural information, as proposed in SUPFAM, should help in identifying ’priority proteins’ for structure determination in structural genomics initiatives to expand the coverage of structural information in the protein sequence space. 
For example, we could assign 858 distinct Pfam domains in 2203 of the gene products in the genome of Mycobacterium tuberculosis. Fifty-one of these Pfam families of unknown structure could be clustered into 17 potentially new superfamilies forming good targets for structural genomics. The SUPFAM database can be accessed at http://pauling.mbu.iisc.ernet.in/supfam.
- Protein Eng. Proteomics analysis of carbon-starved Mycobacterium smegmatis: Induction of Dps-like protein. S. Gupta, S.B. Pandit, N. Srinivasan, and 1 more author. Protein Engineering 2002
Mycobacterium tuberculosis is a globally successful pathogen, infecting more than one third of the world’s population. These bacteria have the remarkable ability to persist in the host for long periods of time unrecognized by the immune system and then to re-emerge later in life, causing the disease. The physiology of such persistent or dormant bacilli is not very well characterized. Some evidence suggests that the dormant bacilli survive in a nutrient-deprived state that is similar to the stationary phase of the bacteria with respect to gene expression and physiology. Under this assumption we have studied the survival of Mycobacterium smegmatis in carbon starvation conditions as a model for mycobacterial persistence. M. smegmatis, being a fast-growing strain, serves as a good model to study starvation responses. Using the two-dimensional electrophoresis-based proteomics approach, we identified a protein which was found to be expressed preferentially under starvation conditions. This protein is homologous to a family of proteins called Dps (DNA binding Protein from Starved cells), which are known to protect DNA under various kinds of environmental stress; its existence has, so far, not been reported in mycobacteria. Upon expression and purification of this protein, we observed that it has non-specific DNA-binding ability. Formation of a cage-like dodecamer structure is a characteristic feature of Dps. Using comparative modelling we were able to show that Dps from M. smegmatis could form a dodecamer structure similar to the crystal structure of Dps from Escherichia coli.