Novel approach for selecting the best predictor for identifying the binding sites in DNA binding proteins
Raju Nagarajan,
Published in 2013
PMID: 23788679
Volume: 41
Issue: 16
Pages: 7606-7614
Abstract
Protein-DNA complexes play vital roles in many cellular processes through the interactions of amino acids with DNA. Several computational methods have been developed for predicting the interacting residues in DNA-binding proteins using sequence and/or structural information. These methods show different levels of accuracy, which may depend on the choice of data sets used in training, the feature sets selected for developing a predictive model, the ability of the models to capture information useful for prediction, or a combination of these factors. In many cases, different methods are likely to produce similar results, whereas in others, the predictors may return contradictory predictions. In this situation, a priori estimates of prediction performance applicable to the system being investigated would help biologists choose the best method for designing their experiments. In this work, we have constructed unbiased, stringent and diverse data sets for DNA-binding proteins based on various biologically relevant considerations: (i) seven structural classes, (ii) 86 folds, (iii) 106 superfamilies, (iv) 194 families, (v) 15 binding motifs, (vi) single/double-stranded DNA, (vii) DNA conformation (A, B, Z, etc.), (viii) three functions and (ix) disordered regions. These data sets were culled to be non-redundant at sequence identities of 25% and 40% and used to evaluate the performance of 11 different methods for which online services or standalone programs are available. We observed that the best performing methods for each of the data sets showed significant biases toward the data sets selected for their benchmark. Our analysis revealed important data set features, which could be used to estimate these context-specific biases and hence suggest the best method to be used for a given problem. We have developed a web server, which considers these features on demand and displays the best method that the investigator should use. The web server is freely available at http://www.biotech.iitm.ac.in/DNA-protein/. Further, we have grouped the methods based on their complexity and analyzed their performance. The information gained in this work could be effectively used to select the best method for designing experiments. © 2013 The Author(s).
About the journal
Journal: Nucleic Acids Research
ISSN: 0305-1048
Open Access: No
Concepts (30)
  •  DNA binding protein
  •  Double stranded DNA
  •  Single stranded DNA
  •  DNA
  •  Article
  •  Binding site
  •  Computer prediction
  •  Computer program
  •  DNA binding motif
  •  DNA conformation
  •  DNA protein complex
  •  DNA structure
  •  Information service
  •  Internet
  •  Priority journal
  •  Protein structure
  •  Biology
  •  Chemistry
  •  Classification
  •  Conformation
  •  Metabolism
  •  Methodology
  •  Protein folding
  •  Protein motif
  •  Amino acid motifs
  •  Binding sites
  •  Computational biology
  •  DNA-binding proteins
  •  Nucleic acid conformation
  •  Software