Standard ICS number: 01.040.01 General, Terminology, Standardization, Documentation (Vocabulary)
Standard classification number: A22 Terms and Symbols (General, Basic Standards)
Publication information
Publishing house: China Standards Press
Publication date: 2014-04-15
Other information
Drafters: Qi Fei, Sun Guangzhi, Zhang Yurun, Jiang Zhou, Ren Guanhua, Zhang Rui, Wang Jun, Li Yingrui, Yang Ling.
Drafting units: China National Institute of Standardization, Shenzhen BGI Genomics Institute, Tianjin BGI Genomics Technology Co., Ltd.
Focal point unit: China National Institute of Standardization
Proposing unit: China National Institute of Standardization
Publishing departments: General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China; Standardization Administration of China
Competent authority: China National Institute of Standardization
Standard number: GB/T 29859-2013
Standard name: Bioinformatics terms
English name: Bioinformatics terms
Standard format: PDF
Release time: 2013-11-12
Implementation time: 2014-04-15
Standard size: 990K
Standard introduction: This standard specifies the basic terms and definitions in the field of bioinformatics.
This standard is applicable to the unification and coordination of relevant concepts in the field of bioinformatics, as well as academic exchanges and knowledge dissemination.
2 Terms and definitions
2.1 Bioinformatics
2.1.1
Bioinformatics
An interdisciplinary subject that applies the methods and techniques of information science and related disciplines to study and analyze the storage, processing and transmission of information in biological systems and biological processes.
2.2 Biological sequence alignment
2.2.1
Sequence alignment
The process of comparing the similarity between two or more nucleotide or amino acid sequences.
2.2.2
Structural alignment
The process of comparing the similarity between the spatial structures of two or more protein or nucleic acid molecules.
This standard was drafted in accordance with the rules given in GB/T 1.1-2009.
This standard was proposed and managed by the China National Institute of Standardization.
The drafting units of this standard: China National Institute of Standardization, Shenzhen BGI Genomics Institute, Tianjin BGI Genomics Technology Co., Ltd.
The main drafters of this standard are Qi Fei, Sun Guangzhi, Zhang Yurun, Jiang Zhou, Ren Guanhua, Zhang Rui, Wang Jun, Li Yingrui and Yang Ling.
Foreword
1 Scope
2 Terms and definitions
Index
References
Some standard content:
2.2.3
Basic local alignment search tool; BLAST
A sequence database search tool based on local alignment.
2.2.4
Nucleic acid sequence
The order in which (deoxy)nucleotides are arranged in a nucleic acid molecule, usually described using the IUPAC nucleotide character set.
2.2.5
Structural domain
A unit in a protein or nucleic acid molecule with a specific folding structure and function.
2.2.6
Untranslated region; UTR
Sequences located on both sides of the coding sequence in messenger RNA that are not translated into protein.
2.2.7
Conserved sequence
A nucleotide or amino acid sequence that remains essentially unchanged during evolution.
2.2.8
Exon
A part of a eukaryotic gene that is retained in the mature RNA molecule after splicing.
2.2.9
Expressed sequence tag; EST
A fragment of an expressed gene sequence under specific conditions.
2.2.10
Sequence-tagged site; STS
A specific locus in the genome that can be used as a marker for chromosome positioning when drawing a physical map of the genome.
2.2.11
Tandem repeat sequence
A sequence fragment that is repeated multiple times, with the copies joined end to end on a chromosome.
2.2.12
Gene family
A group of genes that exist in multiple species and originate from a common ancestor, or a group of genes in the same species that are similar in structure and function and have closely related evolutionary origins.
2.2.13
Gene mapping
The process of determining the relative position and distance of genes in a DNA chromosome or chromatin.
2.2.14
Genetic linkage map
A map showing the relative positions of genes and specific polymorphic DNA in the genome.
2.2.15
Physical map
A map showing the precise location of DNA in the genome.
2.2.16
Restriction map
A map describing the frequency of occurrence of the specific recognition sequences of restriction endonucleases on a DNA strand and their relative positions.
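Note (illustrative, not part of the standard text): the raw data behind a restriction map (2.2.16) is the list of positions at which a recognition sequence occurs. The sketch below is a minimal example under stated assumptions: the function name `find_sites` and the example sequence are hypothetical, the input is a plain DNA string, and only the given strand is scanned. GAATTC is the recognition sequence of the enzyme EcoRI.

```python
def find_sites(dna: str, recognition: str) -> list[int]:
    """Return the 0-based start position of every occurrence of
    `recognition` in `dna`, including overlapping occurrences."""
    dna, recognition = dna.upper(), recognition.upper()
    positions = []
    start = dna.find(recognition)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping sites are not missed.
        start = dna.find(recognition, start + 1)
    return positions


ECORI = "GAATTC"  # EcoRI recognition sequence
example = "AAGAATTCGGTTGAATTCAA"  # hypothetical sequence with two sites
sites = find_sites(example, ECORI)
# Distances between consecutive sites give their relative positions.
spacings = [b - a for a, b in zip(sites, sites[1:])]
```

For the example sequence, `sites` holds the two site positions and `spacings` the single distance between them.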
2.2.17
Gene prediction
The process of using algorithms to search for specific regions in DNA sequences, such as the start and end regions of genes, in order to predict potential genes, or of predicting new genes based on similarity to known genes.
2.2.18
Transcription
The process of copying the genetic information of DNA into the genetic information of RNA.
2.2.19
Transcription factor
A protein that is involved in recognizing promoters, enhancers or other specific sequences in DNA and in regulating gene expression.
2.2.20
Intron
A part of a eukaryotic gene that is not retained in the mature RNA molecule after splicing.
2.2.21
Codon
The basic coding unit of messenger RNA (mRNA), composed of three adjacent nucleotides, which determines the primary structure of the amino acid chain generated during translation and the start and end of translation.
2.2.22
Transposon
A fragment of DNA that can move within the genome.
2.2.23
Open reading frame; ORF
A nucleotide sequence from the start codon to the stop codon that can encode a complete amino acid sequence, with no stop codon interrupting translation.
2.2.24
Operon
A transcriptional functional unit in prokaryotes composed of regulatory genes, promoters and structural genes.
2.2.25
Promoter
A sequence on a DNA molecule that can bind RNA polymerase and form a transcription initiation complex.
2.2.26
Terminator
A DNA sequence located at the end of a gene or operon that stops RNA polymerase during transcription and terminates RNA synthesis.
2.2.27
Noncoding RNA
A ribonucleic acid sequence that can be transcribed but does not encode protein.
2.2.28
Homology
The property of biological characteristics of two or more organisms having originated from a common ancestor.
2.2.29
Homologous gene
A gene with homologous characteristics.
2.2.30
Homologs
Sequences with homologous characteristics.
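Note (illustrative, not part of the standard text): the open reading frame of 2.2.23 can be located by scanning each reading frame for a start codon followed, in the same frame, by the first stop codon. The sketch below rests on several assumptions: the input is the DNA coding strand (so T appears instead of the U of mRNA), ATG is the only start codon, TAA, TAG and TGA are the stop codons of the standard genetic code, and only the forward strand is scanned. The function name `find_orfs` and its `min_codons` parameter are hypothetical.

```python
START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code, DNA alphabet


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of ORFs on the forward strand.

    `end` is exclusive and includes the stop codon, so dna[start:end]
    runs from the start codon through the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the currently open start codon, if any
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START_CODON:
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close this ORF and look for the next start
    return orfs


example = "CCATGAAATTTTAGGG"  # hypothetical sequence with one ORF
orfs = find_orfs(example)
orf_sequences = [example[s:e] for s, e in orfs]
```

A start codon with no in-frame stop codon after it is not reported, which matches the requirement that the frame run from start codon to stop codon.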
2.2.31
Paralogous gene
Genes with a common origin that are produced by gene duplication within the same species.
2.2.32
Orthologous gene
Genes with a common origin that are found in different species.
2.2.33
Single nucleotide polymorphism; SNP
At the genomic level, a polymorphism in the DNA sequence caused by variation (substitution, insertion or deletion) at a single nucleotide site.
2.2.34
Conserved domain
A domain that remains essentially unchanged during evolution.
2.2.35
Synteny
The property of loci being arranged in the same order in two or more genomes (or genome fragments).
2.2.36
Gene locus
The specific location of a gene on a chromosome.
2.2.37
Needleman-Wunsch algorithm
An algorithm that uses dynamic programming to perform global alignment of sequences.
2.2.38
Smith-Waterman algorithm
An algorithm that uses dynamic programming to perform local alignment of sequences.
2.2.39
Sequence similarity
In sequence alignment, the proportion of identical nucleotide bases or amino acid residues between the query sequence and the target sequence.
2.2.40
Global alignment
An alignment whose result covers all sites along the full length of the compared sequences.
2.2.41
Local alignment
An alignment of local segments with high levels of similarity.
2.2.42
Silencer
A negative regulatory element that regulates gene transcription in eukaryotes.
2.2.43
Enhancer
A positive regulatory element that regulates gene transcription in eukaryotes.
2.2.44
Donor site
The junction sequence at the 5′ end of an intron during RNA splicing.
2.2.45
Acceptor site
The junction sequence at the 3′ end of an intron during RNA splicing.
2.3 Molecular phylogeny
2.3.1
Neighbor-joining method
A method of molecular phylogenetic analysis based on distances.
2.3.2
Phylogenetic taxonomy
A method of classifying species according to their genetic relationships during evolution.
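Note (illustrative, not part of the standard text): the difference between the global alignment of 2.2.37 and the local alignment of 2.2.38 shows up directly in the dynamic programming recurrences. The sketch below computes alignment scores only, without traceback. The scoring scheme (match +1, mismatch -1, linear gap penalty -1) and the function names are assumptions chosen for the example, not values given by the standard.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best global alignment of a and b (full lengths)."""
    # Row 0: aligning a prefix of b against nothing costs one gap per base.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]  # bottom-right cell covers both full sequences


def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Score of the best local alignment between any segments of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)  # borders are 0: an alignment may start anywhere
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # The extra 0 lets a poor alignment be abandoned and restarted.
            score = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best  # best cell anywhere in the matrix


global_score = needleman_wunsch_score("ACGT", "AGT")
local_score = smith_waterman_score("TTTACGTAAA", "CCACGTCC")
```

The only structural differences are the zero borders, the floor at zero, and taking the maximum over all cells instead of the final cell.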
2.3.3
Phylogenetic diagram
A schematic diagram of a phylogenetic tree that uses branch length to represent evolutionary time.
2.3.4
Phylogenetic tree
A schematic diagram used to represent the process of phylogeny.
2.3.5
Clade
A branch on an evolutionary tree in the course of biological evolution.
2.3.6
Genetic drift
Random drift of genes that is unrelated to selection pressure.
2.3.7
Cladogram
A tree diagram representing the most recent common ancestor of the branches at each node of an evolutionary tree.
2.3.8
Bootstrap test
A quantitative test of the degree of confidence.
2.3.9
Synonymous substitution
A nucleotide substitution, at the level of the DNA coding sequence, that does not change the expressed amino acid sequence.
2.3.10
Non-synonymous substitution
A nucleotide substitution, at the level of the DNA coding sequence, that changes the expressed amino acid sequence.
2.3.11
Positive selection
The phenomenon in which the probability of a certain mutation being fixed is significantly increased in species or biological molecules under environment-specific selection pressure.
2.3.12
Negative selection; purifying selection
The phenomenon in which the probability of a certain mutation being fixed is significantly reduced in species or biological molecules under environment-specific selection pressure.
2.3.13
Substitution ratio; Ka/Ks; dN/dS
The ratio between the non-synonymous substitution rate (Ka or dN) and the synonymous substitution rate (Ks or dS), used to determine whether selection pressure is acting on a gene.
2.4 Omics related
2.4.1
Comparative genomics
A discipline that uses computers and laboratory high-throughput screening methods to compare the genomic information of various organisms in order to understand biological processes and phenomena at the genomic level.
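Note (illustrative, not part of the standard text): the distinction between the synonymous and non-synonymous substitutions of 2.3.9 and 2.3.10 can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code in the DNA alphabet and counts differing codons between two aligned, in-frame coding sequences of equal length. These raw counts are not Ka or Ks: the rates in 2.3.13 additionally require normalization by the numbers of synonymous and non-synonymous sites, which is not shown. The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated with bases in the order T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa
               for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_substitution(codon_before: str, codon_after: str) -> str:
    """Classify a codon change by whether the encoded amino acid changes."""
    same = CODON_TABLE[codon_before.upper()] == CODON_TABLE[codon_after.upper()]
    return "synonymous" if same else "non-synonymous"


def count_substitutions(seq1: str, seq2: str) -> tuple[int, int]:
    """Count (synonymous, non-synonymous) codon differences between two
    aligned coding sequences of equal length, read in frame."""
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be aligned, in frame, equal length")
    synonymous = non_synonymous = 0
    for i in range(0, len(seq1), 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 == c2:
            continue  # unchanged codon
        if classify_substitution(c1, c2) == "synonymous":
            synonymous += 1
        else:
            non_synonymous += 1
    return synonymous, non_synonymous


counts = count_substitutions("ATGGAATTT", "ATGGAGTTA")
```

In the example, GAA to GAG leaves glutamate unchanged, while TTT to TTA replaces phenylalanine with leucine.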
2.4.2
Transcriptomics
A discipline that studies, at the overall level, the transcription of genes in cells and the rules of transcriptional regulation.
2.4.3
Transcriptome sequencing technology; RNA-Seq
A method of sequencing the transcriptome in cells using high-throughput sequencing technology.
2.4.4
Biochips
Microprocessors prepared by or using biotechnology.
2.4.5
Microarrays
Ordered arrays of DNA probes or protein probes fixed at known positions on a solid substrate.
2.4.6
Functional genomics
A discipline that studies the functions of each gene in the genome, including the expression of genes and their regulatory patterns.
2.4.7
Proteomics
A discipline that analyzes, from a holistic perspective, the dynamic changes of protein composition, expression levels and modification states in cells, in order to understand the interactions and connections between proteins and to reveal the functions of proteins and the laws of cellular life activities.
2.4.8
Serial analysis of gene expression; SAGE
An experimental technique that detects, on a large scale, which genes are expressed and their abundance by constructing short expressed sequence tags.
2.4.9
Genomics
A discipline that uses whole-genome sequence information and high-throughput gene technology to study, at the genome level, the molecular mechanisms of the structure, function and evolution of biological systems.
2.4.10
Structural genomics
A discipline that studies the genomic structure of genes and proteins at the genome level using genome maps, sequencing, composition and protein structure identification.
2.4.11
Genome annotation
The process of identifying and marking the functional units of genome sequences and special signals on the genome.
2.4.12
Gene ontology; GO
A semantic vocabulary that defines and describes the functions of genes and proteins.
2.4.13
Sequencing
The process of determining the sequence of amino acids or nucleic acids.
2.4.14
Contig
A long DNA fragment without deletions, formed from DNA fragments obtained by sequencing on the basis of their overlaps.
2.4.15
Shotgun sequencing
A method of breaking a genome into DNA fragments and sequencing them.
2.4.16
Epigenomics
A branch of genomics that studies epigenetic variation.
2.4.17
Histone modification
A chemical modification that occurs on histones, which are components of chromosomes.
2.4.18
Chromatin immunoprecipitation-chip technology; ChIP-chip
A technique that combines chromatin immunoprecipitation (ChIP) with chip technology and can quickly determine the exact segments of the target genome's chromosomes that are bound by specific DNA-binding proteins.
2.4.19
Chromatin immunoprecipitation technology; ChIP
A technical method for studying the interaction between proteins and DNA in vivo.
2.4.20
Chromatin immunoprecipitation-sequencing; ChIP-Seq
A technique that combines chromatin immunoprecipitation (ChIP) with high-throughput sequencing technology and can quickly determine the exact segments of the target genome's chromosomes that are bound by specific DNA-binding proteins.
2.4.21
Systems biology
A discipline that integrates omics data at different levels to understand how biological systems function.
2.4.22
Pharmacogenomics
A discipline that studies, at the genomic level, the differences in drug responses among individuals and populations, and explores personalized medication and the development of new drugs for special populations.
2.4.23
Gene regulatory network
A network of interactions formed by the DNA, RNA, proteins and metabolic intermediates involved in the regulation of gene expression in cells.
2.5 Structure and prediction of biological macromolecules
2.5.1
Structural biology
A discipline that studies, by biophysical and biochemical methods, the three-dimensional structure of biological macromolecules and the relationship between structure and the corresponding function.
2.5.2
Motifs
Small and highly conserved regions in protein or nucleic acid sequences that are involved in interactions.
2.5.3
Protein secondary structure prediction
A method for predicting the secondary structure that an amino acid sequence may form.
2.5.4
Protein tertiary structure prediction
A method for predicting the tertiary structure that an amino acid sequence may form.
2.5.5
Homology modeling method
A method for finding a homologous protein of known structure through homology analysis and then using the structure of that protein as a template to build a structural model for a protein of unknown structure.
2.5.6
Molecular docking
A method for simulating the recognition process between two or more molecules.
2.6 Biological database mining
2.6.1
Biological data mining
The process of revealing hidden, previously unknown and valuable information from a large amount of data in a biological database.
2.6.2
Hidden Markov model; HMM
A probability model, represented by parameters, for describing the statistical characteristics of a random process, consisting of a Markov chain and a general random process.
Note: It is often used for motif search and recognition in bioinformatics.
2.6.3
Support vector machines; SVM
A method of using hyperplanes, supported by data points, to separate points in a high-dimensional space.
Note: It is often used for pattern recognition and classification in bioinformatics.
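Note (illustrative, not part of the standard text): the two components named in 2.6.2, a Markov chain over hidden states and a random process that emits observed symbols, can be made concrete with the forward algorithm, which computes the probability of an observed sequence under the model. The two-state model below ("AT_rich" and "GC_rich") and all of its probabilities are hypothetical values chosen for the example; the function name is likewise an assumption.

```python
def forward_probability(observations, states, start_p, trans_p, emit_p):
    """Probability of `observations` under an HMM (forward algorithm).

    start_p[s]     : probability that the chain starts in state s
    trans_p[r][s]  : probability of moving from state r to state s
    emit_p[s][x]   : probability that state s emits symbol x
    """
    if not observations:
        return 1.0  # the empty sequence is observed with certainty
    # alpha[s] = P(symbols seen so far, chain currently in state s)
    alpha = {s: start_p[s] * emit_p[s][observations[0]] for s in states}
    for symbol in observations[1:]:
        alpha = {
            s: emit_p[s][symbol] * sum(alpha[r] * trans_p[r][s] for r in states)
            for s in states
        }
    return sum(alpha.values())


# Hypothetical toy model: regions of a DNA sequence are AT-rich or GC-rich.
STATES = ("AT_rich", "GC_rich")
START_P = {"AT_rich": 0.5, "GC_rich": 0.5}
TRANS_P = {
    "AT_rich": {"AT_rich": 0.9, "GC_rich": 0.1},
    "GC_rich": {"AT_rich": 0.1, "GC_rich": 0.9},
}
EMIT_P = {
    "AT_rich": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
    "GC_rich": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4},
}

p_gc = forward_probability("GCGC", STATES, START_P, TRANS_P, EMIT_P)
p_mixed = forward_probability("GAGA", STATES, START_P, TRANS_P, EMIT_P)
```

Because the hidden states tend to persist, a sequence consistent with a single state (`p_gc`) is more probable under this toy model than one that alternates between the two compositions (`p_mixed`).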
2.6.4
Expectation maximization algorithm; EM
An iterative algorithm for calculating maximum likelihood estimates or posterior distributions in the case of incomplete data.
Note: It is often used for estimating model parameters in bioinformatics.
2.6.5
Principal component analysis; PCA
A statistical analysis method that concentrates the information scattered over a set of variables into a few comprehensive indicators (principal components).
Note: It is often used for the analysis of multidimensional data in bioinformatics.