Review Article
Biological Databases and Resources for Sequence Analysis
P Kavita, Swarnali and Anamika Singh*
Corresponding Author: Anamika Singh, Department of Botany, Maitreyi College, University of Delhi.
Received: December 21, 2022; Revised: January 20, 2023; Accepted: January 23, 2023; Available Online: January 31, 2023
Citation: Kavita P, Swarnali & Singh A. (2023) Biological Databases and Resources for Sequence Analysis. J Pharm Drug Res, 6(2): 704-710.
Copyrights: ©2023 Kavita P, Swarnali & Singh A. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
This is the era of bioinformatics, a field that applies mathematical algorithms to tasks ranging from DNA sequence alignment to protein structure prediction and drug design, making biological research easier. Biological research has generated a large number of databases, whether of gene sequences, protein sequences or metabolic pathways, across various species. These databases are huge, and without bioinformatics one cannot make sense of them. Bioinformatics provides a wide range of tools for addressing the questions raised in biological research. In this article we have tried to compile such freely available tools and biological databases so that researchers can access them easily and apply them in biological research.

Keywords: Biological databases, Tools for sequence analysis, Protein structure, Bioinformatics, Genomics
INTRODUCTION

Data are increasing day by day in every field of research, but problems arise with their organization, future utility, accessibility to others and prospects for use in research and its applications. Collecting experimental data is essential for further work, and the growing amount of data generated by genomic projects has made computer databases a necessity for rapid assimilation. Using different programming languages and programs, these data can be further analyzed and used for research. With the remarkable increase in results produced by biological research, the amount of information stored in databases is doubling every 14-15 months [1], which creates a huge demand for analysis and interpretation of these data [2].

Bioinformatics organizes data in a way that allows researchers to access existing information and to submit new entries as they are produced, e.g., the Protein Data Bank for 3D macromolecular structures [3,4]. It also aims to develop tools and resources that help in the analysis of data [5]. Bioinformatics deals with the acquisition, processing, storage, distribution, analysis and interpretation of biological information, using tools and techniques from mathematics, computer science and biology [1]. With the help of bioinformatics, biologists can more easily access data from the internet and determine the composition of biological molecules such as nucleic acids and proteins. Bioinformatics has many branches (Figure 1), and biological sequence information is collected in different biological databases. The two most important kinds of biological sequence databases are protein databases and nucleic acid databases, while structural databases are separate and hold 3D structural information. The main use of bioinformatics tools is the sequence analysis of DNA and proteins with the help of the programs and databases available on the web [2].
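To give a sense of what a doubling time of 14-15 months implies, the fold increase over t months is 2^(t/d), where d is the doubling time. The short Python sketch below works this out. The function name and the five-year horizon are illustrative assumptions, not values taken from the cited sources, and steady exponential growth is assumed throughout.

```python
def growth_factor(months: float, doubling_months: float) -> float:
    """Fold increase after `months`, assuming steady exponential doubling."""
    return 2 ** (months / doubling_months)


# Hypothetical horizon: five years (60 months), evaluated at both ends
# of the 14-15 month doubling range quoted in the text.
for d in (14, 15):
    print(f"doubling every {d} months -> x{growth_factor(60, d):.1f} in 5 years")
```

Under these assumptions a database would grow roughly 16- to 20-fold in five years, which illustrates why manual inspection of the data quickly becomes impractical.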

Bioinformatics can be used for the analysis of gene expression, the detection of gene regulation networks, and the analysis of gene and protein structure and function [2], as well as for the prospection of genomic and transcriptomic data [6]. It spans a wide range of scientific disciplines and genomic analysis [7].

The two main classes of nucleic acids, DNA and RNA, function as carriers of genetic information. The double-helical structure of DNA is well known and has a defined function: the information it holds is copied and passed on to the next generation [8]. Deoxyribonucleic acid (DNA) is one of the most important molecules in living cells. It is a polymer composed of monomeric units known as nucleotides. A nucleotide is made up of a 5-carbon sugar (deoxyribose), a nitrogenous base and one or more phosphate groups; because the phosphate group is acidic, the name nucleic acid was coined [8].

RNA is chemically very similar to DNA, as it is also a chain of nucleotide monomers, although its sugar is ribose and it uses uracil in place of thymine. There are three main types of RNA: mRNA, tRNA and rRNA. RNA molecules are required at all stages of protein synthesis. Messenger RNA transmits the code that specifies the amino acid sequence of the protein; transfer RNA molecules translate the code word for word into protein; and ribosomal RNAs in the ribosome provide part of the machinery that performs the synthesis. Protein structure is known to be more highly conserved than sequence: large variations in sequence within a protein family can still result in very similar three-dimensional structures [9]. The structure of a protein molecule helps determine its function [10]. There are many such important findings, and they can be analyzed using the tools of bioinformatics. Here we discuss some of the most common biological databases and tools used in the sequence analysis of proteins and nucleic acids.
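The flow from DNA to mRNA to protein described above can be illustrated with a minimal Python sketch. It assumes that the input is the coding strand of the DNA, that the reading frame starts at the first base, and that the standard genetic code applies; the function names and the example sequence are hypothetical and chosen only for illustration.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
# '*' marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna: str) -> str:
    """Return the mRNA for a DNA coding strand (T is replaced by U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """Read the mRNA codon by codon and stop at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical nine-base example: ATG (Met), GCC (Ala), TAA (stop).
mrna = transcribe("ATGGCCTAA")
print(mrna, "->", translate(mrna))
```

Reading one codon at a time mirrors the word-for-word decoding performed by tRNA. Real sequence analysis tools also handle alternative reading frames, ambiguous bases and non-standard genetic codes, which this sketch does not attempt.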






DISCUSSION

Bioinformatics is a discipline of biology that has grown extensively in the last few years. Sequence analysis and the identification of new genes, proteins and structures are a few of its important applications [54]. One of the most important applications is the modeling of 3D structures of proteins whose structures could not be determined by nuclear magnetic resonance (NMR) or crystallographic methods because of the bulky size of the protein and other limitations [55]. Genome analysis and the sequencing of the genomes of new varieties are possible only because of the extensive computational applications of bioinformatics. In this article we have tried to compile most of the resources related to proteins and nucleic acids that give new insight into biological research [56-58].

CONCLUSION

Bioinformatics is a young discipline that is widely used for genome analysis, prediction of protein and gene structures, cell modeling, analysis of molecular pathways, etc. To meet the requirements of these tasks, various tools like the ones mentioned in this paper have been developed and curated, and they have made something as complex as genome sequencing much easier to work with. These tools can be used for tasks such as retrieval of structures, prediction and building of new structures, and comparison of different structures, which can be helpful in research on a new macromolecule. All these tools are easy to use and free to access.

 

 

  1. Benton D (1996) Bioinformatics - principles and potential of a new multidisciplinary tool. Trends Biotechnol 14(8): 261-272.
  2. Bayat A (2002) Science, medicine, and the future: Bioinformatics. BMJ 324(7344): 1018-1022.
  3. Bernstein FC, Koetzle TF, Williams GJ, Meyer EF Jr, Brice MD, et al. (1977) The Protein Data Bank. A computer-based archival file for macromolecular structures. Eur J Biochem 80(2): 319-324.
  4. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, et al. (2000) The Protein Data Bank. Nucleic Acids Res 28(1): 235-242.
  5. Luscombe NM, Greenbaum D, Gerstein M (2001) What is bioinformatics? An introduction and overview. Yearb Med Inform 1: 83-99.
  6. Jorge NA, Ferreira CG, Passetti F (2012) Bioinformatics of Cancer ncRNA in High Throughput Sequencing: Present State and Challenges. Front Genet 3:
  7. Zagursky RJ, Russel D (2001) Bioinformatics: Use in Bacterial Vaccine Discovery. Biotechniques 31: 636-659.
  8. Minchin S, Lodge J (2019) Understanding biochemistry: Structure and function of nucleic acids. Essays Biochem 63(4): 433-456.
  9. Al-Karadaghi S (2022) Basic Principles of Protein Three-Dimensional Structure. Available online at: https://proteinstructures.com/structure/introduction/
  10. Alberts B, Johnson A, Lewis J, Raff M, Roberts K, et al. (2022) The Shape and Structure of Proteins. Available online at: https://www.ncbi.nlm.nih.gov/books/NBK26830/
  11. Nielsen M, Lundegaard C, Lund O, Petersen TN (2010) CPHmodels-3.0--remote homology modeling using structure-guided sequence profiles. Nucleic Acids Res 38: W576-W581.
  12. Lambert C, Léonard N, De Bolle X, Depiereux E (2002) ESyPred3D: Prediction of proteins 3D structures. Bioinformatics (Oxford, England) 18(9): 1250-1256.
  13. Kelley LA, Mezulis S, Yates CM, Wass MN, Sternberg MJ (2015) The Phyre2 web portal for protein modeling, prediction and analysis. Nat Protoc 10(6): 845-858.
  14. Guex N, Peitsch MC, Schwede T (2009) Automated comparative protein structure modeling with SWISS-MODEL and Swiss-PdbViewer: A historical perspective. Electrophoresis 30 Suppl 1: S162-S173.
  15. Roy A, Kucukural A, Zhang Y (2010) I-TASSER: A unified platform for automated protein structure and function prediction. Nat Protoc 5(4): 725-738.
  16. Kim DE, Chivian D, Baker D (2004) Protein structure prediction and analysis using the Robetta server. Nucleic Acids Res 32: W526-W531.
  17. Lamiable A, Thévenet P, Rey J, Vavrusa M, Derreumaux P, et al. (2016) PEP-FOLD3: Faster de novo structure prediction for linear peptides in solution and in complex. Nucleic Acids Res 44(W1): W449-W454.
  18. Chen CC, Hwang JK, Yang JM (2006) (PS)2: Protein structure prediction server. Nucleic Acids Res 34: W152-W157.
  19. Zemla A, Zhou CE, Slezak T, Kuczmarski T, Rama D, et al. (2005) AS2TS system for protein structure modeling and analysis. Nucleic Acids Res 33: W111-W115.
  20. Källberg M, Wang H, Wang S, Peng J, Wang Z, et al. (2012) Template-based protein structure modeling using the RaptorX web server. Nat Protoc 7(8): 1511-1522.
  21. Fiser A, Do RK, Sali A (2000) Modeling of loops in protein structures. Protein Sci 9(9): 1753-1773.
  22. Deprez C, Lloubès R, Gavioli M, Marion D, Guerlesquin F, et al. (2005) Solution structure of the E. coli TolA C-terminal domain reveals conformational changes upon binding to the phage g3p N-terminal domain. J Mol Biol 346(4): 1047-1057.
  23. Berjanskii M, Tang P, Liang J, Cruz JA, Zhou J, et al. (2009) GeNMR: A web server for rapid NMR-based protein structure determination. Nucleic Acids Res 37: W670-W677.
  24. Ye Y, Godzik A (2003) Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics (Oxford, England) 19 Suppl 2: ii246-ii255.
  25. Maiti R, Van Domselaar GH, Zhang H, Wishart DS (2004) SuperPose: A simple server for sophisticated structural superposition. Nucleic Acids Res 32: W590-W594.
  26. Buchan DW, Minneci F, Nugent TC, Bryson K, Jones DT (2013) Scalable web services for the PSIPRED Protein Analysis Workbench. Nucleic Acids Res 41: W349-W357.
  27. Gelly JC, Brevern AG, Hazout S (2006) Protein peeling: An approach for splitting a 3D protein structure into compact fragments. Bioinformatics 22(2): 129-133.
  28. Negi S, Schein C, Oezguen N, Power T, Braun W (2007) InterProSurf: A web server for predicting interacting sites on protein surfaces. Bioinformatics 23(24): 3397-3399.
  29. Léonard S, Joseph A, Srinivasan N, Gelly J, de Brevern A (2013) mulPBA: An efficient multiple protein structure alignment method based on a structural alphabet. J Biomol Struct Dyn 32(4): 661-668.
  30. Vriend G (1990) WHAT IF: A molecular modeling and drug design program. J Mol Graph 8(1): 52-56.
  31. Ilinkin I, Ye J, Janardan R (2010) Multiple structure alignment and consensus identification for proteins. BMC Bioinformatics 11: 71.
  32. Gelly JC, Joseph AP, Srinivasan N, de Brevern AG (2011) iPBA: A tool for protein structure comparison using sequence alignment strategies. Nucleic Acids Res 39: W18-W23.
  33. Kulkarni-Kale U, Bhosle S, Kolaskar AS (2005) CEP: A conformational epitope prediction server. Nucleic Acids Res 33: W168-W171.
  34. Milburn D, Laskowski RA, Thornton JM (1998) Sequences annotated by structure: A tool to facilitate the use of structural information in sequence analysis. Protein Eng 11(10): 855-859.
  35. Ghouzam Y, Postic G, Guerin PEG, de Brevern AG, Gelly JC (2016) ORION: A web server for protein fold recognition and structure prediction using evolutionary hybrid profiles. Sci Rep 6(1): 28268.
  36. Masso M, Vaisman II (2010) AUTO-MUTE: Web-based tools for predicting stability changes in proteins due to single amino acid replacements. Protein Eng Des Sel 23(8): 683-687.
  37. Maiti R, van Domselaar GH, Wishart DS (2005) Movie Maker: A web server for rapid rendering of protein motions and interactions. Nucleic Acids Res 33: W358-W362.
  38. Pires DE, Ascher DB, Blundell TL (2014) DUET: A server for predicting effects of mutations on protein stability using an integrated computational approach. Nucleic Acids Res 42: W314-W319.
  39. Sievers F, Wilm A, Dineen D, Gibson TJ, Karplus K, et al. (2011) Fast, scalable generation of high-quality protein multiple sequence alignments using Clustal Omega. Mol Syst Biol 7(1): 539.
  40. Sagendorf JM, Berman HM, Rohs R (2017) DNAproDB: An interactive tool for structural analysis of DNA-protein complexes. Nucleic Acids Res 45: W89-W97.
  41. Gruber AR, Lorenz R, Bernhart SH, Neuböck R, Hofacker IL (2008) The Vienna RNA website. Nucleic Acids Res 36: W70-W74.
  42. Lu X, Bussemaker H, Olson W (2015) DSSR: An integrated software tool for dissecting the spatial structure of RNA. Nucleic Acids Res 43(21): e142.
  43. Lai D, Proctor JR, Zhu JY, Meyer IM (2012) R-CHIE: A web server and R package for visualizing RNA secondary structures. Nucleic Acids Res 40(12): e95.
  44. Sarver M, Zirbel CL, Stombaugh J, Mokdad A, Leontis NB (2008) FR3D: Finding local and composite recurrent structural motifs in RNA 3D structures. J Math Biol 56(1-2): 215-252.
  45. Kirillova S, Tosatto SC, Carugo O (2010) FRASS: The web-server for RNA structural comparison. BMC Bioinformatics 11: 327.
  46. Capriotti E, Marti-Renom MA (2009) SARA: A server for function annotation of RNA structures. Nucleic Acids Res 37(2): W260-W265.
  47. Lu XJ, Olson WK (2003) 3DNA: A software package for the analysis, rebuilding and visualization of three-dimensional nucleic acid structures. Nucleic Acids Res 31(17): 5108-5121.
  48. Chen VB, Arendall WB 3rd, Headd JJ, Keedy DA, Immormino RM, et al. (2010) MolProbity: All-atom structure validation for macromolecular crystallography. Acta Crystallogr D Biol Crystallogr 66(Pt 1): 12-21.
  49. Čech P, Svozil D, Hoksza D (2012) SETTER: Web server for RNA structure comparison. Nucleic Acids Res 40(W1): W42-W48.
  50. Petrov AS, Bernier CR, Hershkovits E, Xue Y, Waterbury CC, et al. (2013) Secondary structure and domain architecture of the 23S and 5S rRNAs. Nucleic Acids Res 41(15): 7522-7535.
  51. Rahrig RR, Petrov AI, Leontis NB, Zirbel CL (2013) R3D Align web server for global nucleotide to nucleotide alignments of RNA 3D structures. Nucleic Acids Res 41: W15-W21.
  52. Yang H, Jossinet F, Leontis N, Li C, Westbrook J, et al. (2003) Tools for the automatic identification and classification of RNA base pairs. Nucleic Acids Res 31(13): 3450-3460.
  53. Kerpedjiev P, Hammer S, Hofacker IL (2015) Forna (force-directed RNA): Simple and effective online RNA secondary structure diagrams. Bioinformatics (Oxford, England) 31(20): 3377-3379.
  54. Robinson SW, Leader DP (2014) Bioinformatics: Concepts, Methods, and Data, in Handbook of Pharmacogenomics and Stratified Medicine.
  55. Fowler NJ, Sljoka A, Williamson MP (2020) A method for validating the accuracy of NMR protein structures. Nat Commun 11(1): 6321.
  56. Raux E, Schubert HL, Roper JM, Wilson KS, Warren MJ (1999) Vitamin B12: Insights into Biosynthesis’s Mount Improbable. Bioorg Chem 27(2): 100-118.
  57. Al-Karadaghi S, Hansson M, Nikonov S, Jönsson B, Hederstedt L (1997) Crystal structure of ferrochelatase: The terminal enzyme in heme biosynthesis. Structure 5(11): 1501-1510.
  58. Li F, Ryvkin P, Childress D, Valladares O, Gregory B, et al. (2012) SAVoR: A server for sequencing annotation and visualization of RNA structures. Nucleic Acids Res 40(W1): W59-W64.