Bioinformatics terms

Basic Information

Standard ID: GB/T 29859-2013

Standard Name: Bioinformatics terms

Chinese Name: 生物信息学术语

Standard category: National Standard (GB)

state: in force

Date of Release: 2013-11-12

Date of Implementation: 2014-04-15

standard classification number

Standard ICS number: General, Terminology, Standardization, Documentation>>Vocabulary>>01.040.01 General, Terminology, Standardization, Documentation (Vocabulary)

Standard Classification Number: General>>Basic Standards>>A22 Terms and Symbols

associated standards

Publication information

publishing house: China Standards Press

Publication date: 2014-04-15

other information

drafter: Qi Fei, Sun Guangzhi, Zhang Yurun, Jiang Zhou, Ren Guanhua, Zhang Rui, Wang Jun, Li Yingrui, Yang Ling.

Drafting unit: China National Institute of Standardization, Shenzhen BGI Genomics Institute, Tianjin BGI Genomics Technology Co., Ltd.

Focal point unit: China National Institute of Standardization

Proposing unit: China National Institute of Standardization

Publishing department: General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China; Standardization Administration of China

competent authority: China National Institute of Standardization

Introduction to standards:

Standard number: GB/T 29859-2013
Standard name: Bioinformatics terms
English name: Bioinformatics terms
Standard format: PDF
Release time: 2013-11-12
Implementation time: 2014-04-15
Standard size: 990K
Standard introduction: This standard specifies the basic terms and definitions in the field of bioinformatics.
This standard is applicable to the unification and coordination of relevant concepts in the field of bioinformatics, as well as academic exchanges and knowledge dissemination.
2 Terms and definitions
2.1 Bioinformatics
2.1.1
Bioinformatics
An interdisciplinary subject that applies the methods and techniques of information science and related disciplines to study and analyze the storage, processing and transmission of information in biological systems and biological processes.
2.2 Biological sequence alignment
2.2.1
Sequence alignment
The process of comparing the similarity between two or more nucleotide or amino acid sequences.
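For two sequences, such a comparison is commonly carried out by dynamic programming. The standard later lists the Needleman-Wunsch algorithm (global alignment) and the Smith-Waterman algorithm (local alignment) as terms of their own. The following is a minimal Python sketch of both scoring recurrences. The function names and the scoring values (match +1, mismatch -1, linear gap penalty -1) are illustrative assumptions and are not taken from the standard.

```python
def needleman_wunsch_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -1) -> int:
    """Best global alignment score of a and b (scoring values are assumptions)."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    # Aligning a prefix against the empty string costs one gap per character.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def smith_waterman_score(a: str, b: str, match: int = 1,
                         mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score of a and b (scoring values are assumptions)."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Clamping at 0 lets a local alignment restart anywhere.
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


print(needleman_wunsch_score("ACGT", "ACT"))          # 2
print(smith_waterman_score("TTTACGTTT", "GGACGGG"))   # 3
```

The only difference between the two recurrences is the clamp at zero and the use of the best cell anywhere in the table, which is what turns a full-length comparison into a search for the best-matching local segment.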
2.2.2
Structural alignment
The process of comparing the similarity between the spatial structures of two or more protein or nucleic acid molecules.
This standard specifies the basic terms and their definitions in the field of bioinformatics. This standard is applicable to the unification and coordination of relevant concepts in the field of bioinformatics, as well as academic exchanges and knowledge dissemination.
This standard was drafted in accordance with the rules given in GB/T 1.1-2009.
This standard was proposed and managed by the China National Institute of Standardization.
The drafting units of this standard: China National Institute of Standardization, Shenzhen BGI Genomics Institute, Tianjin BGI Genomics Technology Co., Ltd.
The main drafters of this standard are Qi Fei, Sun Guangzhi, Zhang Yurun, Jiang Zhou, Ren Guanhua, Zhang Rui, Wang Jun, Li Yingrui and Yang Ling.

Foreword
1 Scope
2 Terms and definitions
Index
References

Some standard content:

ICS 01.040.01
National Standard of the People's Republic of China
GB/T 29859-2013
Bioinformatics terms
Issued 2013-11-12
Implemented 2014-04-15
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China and the Standardization Administration of China
Foreword
This standard was drafted in accordance with the rules given in GB/T 1.1-2009.
This standard was proposed and managed by the China National Institute of Standardization.
Drafting organizations of this standard: China National Institute of Standardization, Shenzhen BGI Genomics Institute, Tianjin BGI Genomics Technology Co., Ltd.
Main drafters of this standard: Qi Fei, Sun Guangzhi, Zhang Yurun, Jiang Zhou, Ren Guanhua, Zhang Rui, Wang Jun, Li Yingrui, Yang Ling.
Bioinformatics terms
1 Scope
This standard specifies the basic terms and definitions in the field of bioinformatics.
This standard is applicable to the unification and coordination of relevant concepts in the field of bioinformatics, as well as academic exchanges and knowledge dissemination.
2 Terms and definitions
2.1 Bioinformatics
2.1.1
Bioinformatics
An interdisciplinary subject that applies the methods and techniques of information science and related disciplines to study and analyze the storage, processing and transmission of information in biological systems and biological processes.
2.2 Biological sequence alignment
2.2.1
Sequence alignment
The process of comparing the similarity between two or more nucleotide or amino acid sequences.
2.2.2
Structural alignment
The process of comparing the similarity between the spatial structures of two or more protein or nucleic acid molecules.
2.2.3
Basic local alignment search tool; BLAST
A sequence database search tool based on local alignment.
2.2.4
Nucleic acid sequence
The order of arrangement of (deoxy)nucleotides in a nucleic acid molecule, usually described using the IUPAC nucleotide character set.
2.2.5
Structural domain
A unit in a protein or nucleic acid molecule with a specific folding structure and function.
2.2.6
Untranslated region; UTR
Sequences located on both sides of the coding sequence in messenger RNA that are not translated into protein.
2.2.7
Conserved sequence
A nucleotide or amino acid sequence that remains basically unchanged during evolution.
2.2.8
Exon
A part of a eukaryotic gene that is retained in the mature RNA molecule after splicing.
2.2.9
Expressed sequence tag; EST
A fragment of an expressed gene sequence obtained under specific conditions.
2.2.10
Sequence-tagged site; STS
A specific locus in the genome that can be used as a marker for chromosome positioning when drawing a physical map of the genome.
2.2.11
Tandem repeat sequence
A sequence fragment that is repeated multiple times, with the copies joined end to end on a chromosome.
2.2.12
Gene family
A group of genes that exist in multiple species and originate from a common ancestor, or a group of genes in the same species that are similar in structure and function and have closely related evolutionary origins.
2.2.13
Gene mapping
The process of determining the relative position and distance of genes in a DNA chromosome or chromatin.
2.2.14
Genetic linkage map
A map showing the relative positions of genes and specific polymorphic DNA in the genome.
2.2.15
Physical map
A map showing the precise location of DNA in the genome.
2.2.16
Restriction map; restriction enzyme map
A map describing the frequency of occurrence of the specific recognition sequences of restriction endonucleases on a DNA chain and their relative positions.
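As an illustration of what a restriction map records, the Python sketch below locates every occurrence of one recognition sequence in a DNA string and reports the spacing between neighbouring sites. The function name `find_sites` and the example sequence are hypothetical. GAATTC is the recognition sequence of the enzyme EcoRI. The sketch searches one strand only and ignores cut positions within the site, so it is a simplification and not a full mapping procedure.

```python
def find_sites(dna: str, site: str) -> list[int]:
    """Return the 0-based start positions of every occurrence of site in dna."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        # Advance by one so that overlapping occurrences are also reported.
        start = dna.find(site, start + 1)
    return positions


seq = "AAGAATTCGGGAATTCTT"               # made-up example sequence
sites = find_sites(seq, "GAATTC")         # EcoRI recognition sequence
spacing = [b - a for a, b in zip(sites, sites[1:])]
print(sites)     # [2, 10]
print(spacing)   # [8]
```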
2.2.17
Gene prediction
The process of using algorithms to search DNA sequences for specific regions, such as the start and end regions of genes, in order to predict potential genes, or of predicting new genes from their similarity to known genes.
2.2.18
Transcription
The process of copying the genetic information of DNA into the genetic information of RNA.
2.2.19
Transcription factor
A protein that is involved in recognizing promoters, enhancers or other specific sequences in DNA and in regulating gene expression.
2.2.20
Intron
A part of a eukaryotic gene that is not retained in the mature RNA molecule after splicing.
2.2.21
Codon
The basic coding unit of messenger RNA (mRNA), composed of three adjacent nucleotides, which determines the primary structure of the amino acid chain generated during translation and the start and end of translation.
2.2.22
Transposon
A DNA fragment that can move within the genome.
2.2.23
Open reading frame; ORF
A nucleotide sequence running from a start codon to a stop codon that can encode a complete amino acid sequence, with no internal stop codon interrupting translation.
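The definition above can be sketched as a simple scan, shown here in Python. The sketch assumes a DNA string read on the forward strand only, ATG as the only start codon, and the three standard stop codons TAA, TAG and TGA. The function name `find_orfs` and the `min_codons` parameter are hypothetical, and real gene finders apply many more criteria.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 1) -> list[tuple[int, int]]:
    """Return (start, end) index pairs of forward-strand ORFs in all three frames.

    start is the index of the A in ATG; end is one past the stop codon.
    min_codons counts codons from the start codon up to, but excluding, the stop.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # open a reading frame
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None                   # close it at the first stop codon
    return orfs


seq = "CCATGAAATAGGG"                          # made-up example sequence
for begin, end in find_orfs(seq):
    print(begin, end, seq[begin:end])          # 2 11 ATGAAATAG
```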
2.2.24
Operon
A transcriptional functional unit in prokaryotes composed of regulatory genes, promoters and structural genes.
2.2.25
Promoter
A sequence on a DNA molecule that can bind to RNA polymerase and form a transcription initiation complex.
2.2.26
Terminator
A DNA sequence located at the end of a gene or operon that stops transcription by RNA polymerase and thereby terminates RNA synthesis.
2.2.27
Noncoding RNA
A ribonucleic acid sequence that can be transcribed but cannot encode protein.
2.2.28
Homology
Biological characteristics of two or more organisms that originated from a common ancestor.
2.2.29
Homologous gene
A gene with homologous characteristics.
2.2.30
Homologs
Sequences with homologous characteristics.
2.2.31
Paralogous gene
Genes with a common origin that are produced by gene duplication within the same species.
2.2.32
Orthologous gene
Genes with a common origin in different species.
2.2.33
Single nucleotide polymorphism; SNP
At the genomic level, a polymorphism in the DNA sequence caused by a variation (substitution, insertion or deletion) at a single nucleotide site.
2.2.34
Conserved domain
A domain that remains essentially unchanged during evolution.
2.2.35
Synteny
The property of loci being arranged in the same order in two or more genomes (or genome fragments).
2.2.36
Gene locus
The specific location of a gene on a chromosome.
2.2.37
Needleman-Wunsch algorithm
An algorithm that uses dynamic programming to perform global alignment of sequences.
2.2.38
Smith-Waterman algorithm
An algorithm that uses dynamic programming to perform local alignment of sequences.
2.2.39
Sequence similarity
In sequence alignment, the proportion of identical deoxyribonucleic acid bases or amino acid residues between the test sequence and the target sequence.
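Read literally, this definition is a ratio of identical positions to compared positions. The Python sketch below computes it for two sequences that have already been aligned, with "-" marking gaps. The function name `percent_identity` is hypothetical. Dividing by the number of alignment columns is one of several conventions in use, and the standard does not say which denominator it intends, so that choice is an assumption.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Fraction of alignment columns in which both rows hold the same residue."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    identical = 0
    compared = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == gap and y == gap:
            continue                # a column of two gaps carries no information
        compared += 1
        if x == y:                  # both are residues here, and they are equal
            identical += 1
    return identical / compared if compared else 0.0


# Made-up aligned pair: 4 identical columns out of 6.
print(f"{percent_identity('ACGT-A', 'ACCTTA'):.3f}")
```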
2.2.40
Global alignment
An alignment whose result covers all sites over the full length of the compared sequences.
2.2.41
Local alignment
An alignment of local segments with a high level of similarity.
2.2.42
Silencer
A negative regulatory element that regulates gene transcription in eukaryotes.
2.2.43
Enhancer
A positive regulatory element that regulates gene transcription in eukaryotes.
2.2.44
Donor site
The junction sequence at the 5′ end of an intron during RNA splicing.
2.2.45
Acceptor site
The junction sequence at the 3′ end of an intron during RNA splicing.
2.3 Molecular phylogeny
2.3.1
Neighbor-joining method
A distance-based method of molecular phylogenetic analysis.
2.3.2
Phylogenetic taxonomy
A method of classifying species according to their genetic relationships during evolution.
2.3.3
Phylogenetic diagram
A schematic diagram of a phylogenetic tree in which branch length represents evolutionary time.
2.3.4
Phylogenetic tree
A schematic diagram used to represent the process of phylogeny.
2.3.5
Clade
A branch on an evolutionary tree in the course of biological evolution.
2.3.6
Genetic drift
Random drift of genes that is unrelated to selection pressure.
2.3.7
Cladogram
A tree diagram representing the most recent common ancestor of each node branch in an evolutionary tree.
2.3.8
Bootstrap test
A quantitative test of the degree of confidence.
2.3.9
Synonymous substitution
A nucleotide substitution in a DNA coding sequence that does not change the encoded amino acid sequence.
2.3.10
Non-synonymous substitution
A nucleotide substitution in a DNA coding sequence that changes the encoded amino acid sequence.
2.3.11
Positive selection
The phenomenon in which the probability of a certain mutation being fixed is significantly increased in species or biological molecules under environment-specific selection pressure.
2.3.12
Negative selection; purifying selection
The phenomenon in which the probability of a certain mutation being fixed is significantly reduced in species or biological molecules under environment-specific selection pressure.
2.3.13
Substitution ratio; Ka/Ks; dN/dS
The ratio of the non-synonymous substitution rate (Ka or dN) to the synonymous substitution rate (Ks or dS), used to determine whether selection pressure is acting on a gene.
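By the usual convention, a ratio above 1 is read as a sign of positive selection, a ratio below 1 as a sign of negative (purifying) selection, and a ratio near 1 as no clear selection signal. The Python sketch below applies that reading to rates that have already been estimated. The function name `classify_selection` and the `tolerance` band around 1 are hypothetical, and in practice the interpretation rests on a statistical test and not on a fixed cutoff.

```python
def classify_selection(ka: float, ks: float,
                       tolerance: float = 0.05) -> tuple[float, str]:
    """Return (Ka/Ks, rough interpretation). The tolerance band is an assumption."""
    if ks <= 0:
        raise ValueError("Ks must be positive for the ratio to be defined")
    omega = ka / ks
    if omega > 1 + tolerance:
        return omega, "positive selection"
    if omega < 1 - tolerance:
        return omega, "negative (purifying) selection"
    return omega, "no clear selection signal"


# Made-up rates for illustration only.
omega, label = classify_selection(ka=0.02, ks=0.10)
print(f"Ka/Ks = {omega:.2f}: {label}")
```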
2.4 Omics related
2.4.1
Comparative genomics
A discipline that uses computers and laboratory high-throughput screening methods to compare the genomic information of various organisms in order to understand biological processes and phenomena at the genomic level.
2.4.2
Transcriptomics
A discipline that studies, at the overall level, the transcription of genes in cells and the rules of transcriptional regulation.
2.4.3
Transcriptome sequencing technology; RNA-Seq
A method of sequencing the transcriptome of cells using high-throughput sequencing technology.
2.4.4
Biochips
Microprocessors prepared by or using biotechnology.
2.4.5
Microarrays
Ordered arrays of DNA probes or protein probes fixed at known positions on a solid substrate.
2.4.6
Functional genomics
A discipline that studies the functions of each gene in the genome, including gene expression and its regulatory patterns.
2.4.7
Proteomics
A discipline that analyzes the dynamic changes of protein components, expression levels and modification states in cells from a holistic perspective, understands the interactions and connections between proteins, and reveals the functions of proteins and the laws of cell life activities.
2.4.8
Serial analysis of gene expression; SAGE
An experimental technique that detects, on a large scale, the types of expressed genes and their abundance by constructing short expressed sequence tags.
2.4.9
Genomics
A discipline that uses whole genome sequence information and high-throughput gene technology to study the molecular mechanisms of the structure, function and evolution of biological systems at the genome level.
2.4.10
Structural genomics
A discipline that studies the genomic structure of genes and proteins at the genome level using genome maps, sequencing, composition and protein structure identification.
2.4.11
Genome annotation
The process of identifying and marking functional units of genome sequences and special signals on the genome.
2.4.12
Gene ontology; GO
A semantic vocabulary that defines and describes the functions of genes and proteins.
2.4.13
Sequencing
The process of determining the sequence of amino acids or nucleic acids.
2.4.14
Contig
A long DNA fragment without gaps, formed from DNA fragments obtained by sequencing on the basis of their overlaps.
2.4.15
Shotgun sequencing
A method of breaking a genome into DNA fragments and sequencing them.
2.4.16
Epigenomics
A branch of genomics that studies epigenetic variation.
2.4.17
Histone modification
A chemical modification that occurs on histones, which are components of chromosomes.
2.4.18
Chromatin immunoprecipitation-chip technology; ChIP-chip
A technique that combines chromatin immunoprecipitation (ChIP) with chip technology and can quickly determine the exact binding segments of specific DNA-binding proteins in the chromosomes of the target genome.
2.4.19
Chromatin immunoprecipitation technology; ChIP
A technical method for studying the interaction between proteins and DNA in vivo.
2.4.20
Chromatin immunoprecipitation-sequencing; ChIP-Seq
A technique that combines chromatin immunoprecipitation (ChIP) with high-throughput sequencing technology and can quickly determine the exact binding segments of specific DNA-binding proteins in the chromosomes of the target genome.
2.4.21
Systems biology
A discipline that integrates omics data at different levels to understand how biological systems function.
2.4.22
Pharmacogenomics
A discipline that studies the differences in drug responses among individuals and populations at the genomic level, and explores personalized medication and the development of new drugs for special populations.
2.4.23
Gene regulatory network
A network of interactions formed by DNA, RNA, proteins and metabolic intermediates involved in the regulation of gene expression in cells.
2.5 Structure and prediction of biological macromolecules
2.5.1
Structural biology
A discipline that studies the three-dimensional structure of biological macromolecules, and the relationship between structure and the corresponding function, by biophysical and biochemical methods.
2.5.2
Motifs
Small, highly conserved regions in protein or nucleic acid sequences that are involved in interactions.
2.5.3
Protein secondary structure prediction
A method for predicting the secondary structure that an amino acid sequence may form.
2.5.4
Protein tertiary structure prediction
A method for predicting the tertiary structure that an amino acid sequence may form.
2.5.5
Homology modeling method
A method for finding a homologous protein of known structure through homology analysis, and then using the structure of that protein as a template to build a structural model for a protein of unknown structure.
2.5.6
Molecular docking
A method for simulating the recognition process between two or more molecules.
2.6 Biological database mining
2.6.1
Biological data mining
The process of revealing hidden, previously unknown and valuable information from the large amount of data in a biological database.
2.6.2
Hidden Markov model; HMM
A parameterized probability model for describing the statistical characteristics of a random process, consisting of a Markov chain and a general random process.
Note: It is often used for motif search and recognition in bioinformatics.
2.6.3
Support vector machine; SVM
A method that uses hyperplanes, supported by data points, to separate points in a high-dimensional space.
Note: It is often used for pattern recognition and classification in bioinformatics.
2.6.4
Expectation maximization algorithm; EM
An iterative algorithm for calculating maximum likelihood estimates or posterior distributions in the case of incomplete data.
Note: It is often used for estimating model parameters in bioinformatics.
2.6.5
Principal component analysis; PCA
A statistical analysis method that concentrates the information scattered over a set of variables into a few comprehensive indicators (principal components).
Note: It is often used for multidimensional data analysis in bioinformatics.
Chinese index
Conserved domain, 2.2.34
Conserved sequence, 2.2.7
Comparative genomics, 2.4.1
Expressed sequence tag, 2.2.9
Epigenomics, 2.4.16
Operon, 2.2.24
Sequencing, 2.4.13
Silencer, 2.2.42
Tandem repeat sequence, 2.2.11
Purifying selection, 2.3.12
Single nucleotide polymorphism, 2.2.33
Protein secondary structure prediction, 2.5.3
Protein tertiary structure prediction, 2.5.4
Proteomics, 2.4.7
Contig, 2.4.14
Noncoding RNA, 2.2.27
Untranslated region, 2.2.6
Non-synonymous substitution, 2.3.10
Molecular docking, 2.5.6
Negative selection, 2.3.12
Functional genomics, 2.4.6
Donor site, 2.2.44
Nucleic acid sequence, 2.2.4
Basic local alignment search tool, 2.2.3
Gene ontology, 2.4.12
Serial analysis of gene expression, 2.4.8
Gene family, 2.2.12
Gene regulatory network, 2.4.23
Gene prediction, 2.2.17
Genomics, 2.4.9
Genome annotation, 2.4.11
Gene mapping, 2.2.13
Gene locus, 2.2.36
Structural alignment, 2.2.2
Structural genomics, 2.4.10
Structural biology, 2.5.1
Structural domain, 2.2.5
Cladogram, 2.3.7
Clade, 2.3.5
Local alignment, 2.2.41
Open reading frame, 2.2.23
Neighbor-joining method, 2.3.1
Codon, 2.2.21
Needleman-Wunsch algorithm, 2.2.37
Intron, 2.2.20
Shotgun sequencing, 2.4.15
Paralogous gene, 2.2.31
Expectation maximization algorithm, 2.6.4
Promoter, 2.2.25
Tip: This page shows only an excerpt of the complete standard.