Specification of description of term component database

Basic Information

Standard ID: GB/T 19102-2003

Standard Name: Specification of description of term component database

Chinese Name: 术语部件库的信息描述规范

Standard category: National Standard (GB)

Status: In force

Date of Release: 2003-05-14

Date of Implementation: 2003-12-01

Standard classification number

Standard ICS number: General, Terminology, Standardization, Documentation >> 01.020 Terminology (Principles and Coordination)

Standard Classification Number: General >> Basic Standards >> A22 Terms and Symbols

Publication information

Publishing house: China Standards Press

Book number: 155066.1-19897

Publication date: 2003-12-01

Other information

Release date: 2003-05-14

Review date: 2004-10-14

Drafters: Ye Sheng, Wu Yunfang, Song Min, Sui Zhifang, Cheng Yonghong, Hu Junfeng, Xiao Yujing

Drafting unit: China Standards Research Center

Focal point unit: National Technical Committee on Terminology Standardization

Proposing unit: National Technical Committee on Terminology Standardization

Publishing department: General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China

Competent authority: National Standardization Administration

Introduction to the standard:

This standard specifies the information description of a term component database. It applies to the research, development, maintenance and related management of term component databases, and can also serve as a reference in the field of information retrieval.


Some standard content:

ICS 01.020
National Standard of the People's Republic of China
GB/T 19102-2003
Specification for description of term component database
Issued on 2003-05-14
Implemented on 2003-12-01
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China
1 Scope
2 Normative references
3 Terms and definitions
4 Information description of term component database
5 Construction of term component database
Appendix A (informative) Structural semantic information description of term components
Foreword
This standard is one of the series of national standards on terminology databases. The standards issued so far in the series are:
GB/T 13725-2001 General principles and methods for establishing terminology databases
GB/T 13726-1992 Magnetic tape format for recording and exchanging terminology and dictionary entries
GB/T 15387.1-2001 Guide for the preparation of terminology database development documents
GB/T 15387.2-2001 Guide for the development of terminology databases
GB/T 15625-2001 Guide for the technical evaluation of terminology databases
GB/T 16785-1997 Terminology work: Harmonization of concepts and terms
GB/T 16786-1997 Terminology work: Computer applications: Data categories
GB/T 17532-1998 Terminology work: Computer applications: Vocabulary
GB/T 18155-2000 Terminology work: Computer applications: Machine-readable terminology interchange format (MARTIF), negotiated interchange
GB/T 19101-2003 General principles and methods for establishing terminology corpora
Appendix A of this standard is an informative appendix.
This standard was proposed by the National Technical Committee on Terminology Standardization.
This standard is under the jurisdiction of the China Standards Research Center.
This standard was drafted by the China Standards Research Center, the Institute of Computational Linguistics of Peking University, and other units.
The main drafters of this standard are: Ye Sheng, Wu Yunfang, Song Min, Sui Zhifang, Cheng Yonghong, Hu Junfeng, Xiao Yujing.
Introduction
A term component database is a knowledge base rich in information. This information supports research such as the automatic discovery of new terms, the automatic definition of terms, and the establishment of term concept systems.
Specification of description of term component database
1 Scope
This standard specifies the information description of a term component database.
This standard applies to the research, development, maintenance and related management of term component databases, and can also serve as a reference in the field of information retrieval.
2 Normative references
The clauses of the following documents become clauses of this standard through reference in this standard. For dated references, subsequent amendments (excluding errata) and revisions do not apply to this standard; however, parties reaching agreements based on this standard are encouraged to investigate whether the latest versions of these documents can be used. For undated references, the latest version applies to this standard.
GB/T 13715 Standard for the segmentation of modern Chinese words for information processing
GB/T 13725 General principles and methods for establishing terminology databases
GB/T 15237.1-2000 Terminology work: Vocabulary, Part 1: Theory and application (eqv ISO 1087-1:2000)
GB/T 17532-1998 Terminology work: Computer applications: Vocabulary (eqv ISO/DIS 1087-2-2:1996)
3 Terms and definitions
The terms and definitions established in GB/T 15237.1-2000 and GB/T 17532-1998 apply to this standard. For ease of use, some of them are repeated here.
3.1
Term
A verbal designation of a general concept in a specific professional field. [GB/T 15237.1-2000, 3.4.3]
3.2
Terminological database
A database containing terminological data. [GB/T 17532-1998, 7.6]
3.3
Single-word term
A term consisting of a single word.
3.4
Multi-word term
A term consisting of multiple words.
3.5
Term component
A word that forms part of a multi-word term. Language fragments that are closely combined, highly productive, and used stably in a specific professional field can also be regarded as term components; for example, "ultra-large scale" and "optical coupling" in the field of information science and technology.
3.6
Term component database
A database that stores term component information.
3.7
Domain specificity
The property of being unique to a specific professional field and closely related to its subject matter.
3.8
Domain-specific component
A term component that has domain specificity in a specific professional field, generally a single-word term of that field. For example, "semiconductor" in "semiconductor materials".
4 Information description of term component database
4.1 Information structure of the term component database
A term component database can be described from four aspects:
a) basic information of term components;
b) statistical information related to the position of term components;
c) grammatical information of term components;
d) semantic information of term components.
A term component database may cover different aspects depending on its application goals. All of the above information is obtained from the terminology database of a specific professional field.
4.2 Basic information description of term components
4.2.1 Main entry term component
Gives the term component itself.
4.2.2 Abbreviation
Indicates whether the term component is an abbreviation.
4.2.3 Full name
Gives the full name corresponding to the term component (when the main entry term component is an abbreviation).
4.2.4 Domain-specific annotation
Indicates whether the term component is a domain-specific component.
4.2.5 Source
Gives the source language and the corresponding original form of the term component.
Example: software (English, software)
4.2.6 Other basic information
Other relevant information can be added according to research needs.
4.3 Description of statistical information related to the position of term components
4.3.1 Occurrence count description
Describes the number of times a term component occurs in each position within terms, including:
a) the number of times the term component occurs independently as a terminology entry;
b) the number of times the term component occurs at the beginning of a term;
c) the number of times the term component occurs in the middle of a term;
d) the number of times the term component occurs at the end of a term.
Example: register (1, 63, 87, 786) means that in the terminology database the term component "register" occurs once as a terminology entry on its own, 63 times at the beginning of a term, 87 times in the middle, and 786 times at the end.
4.3.2 Relative frequency description
Describes the relative frequency with which a term component occurs in each position within terms, including:
a) the relative frequency with which the term component occurs independently as a terminology entry;
b) the relative frequency with which the term component occurs at the beginning of a term;
c) the relative frequency with which the term component occurs in the middle of a term;
d) the relative frequency with which the term component occurs at the end of a term.
Example: register (0.1%, 7%, 9%, 84%) means that 0.1% of the occurrences of the term component "register" in the terminology database are as a terminology entry on its own, 7% are at the beginning of a term, 9% are in the middle, and 84% are at the end. These values correspond to the counts in the example of 4.3.1 divided by their total of 937.
4.3.3 Other statistical information
Other statistical information can be described according to research needs.
4.4 Description of grammatical information of term components
4.4.1 Part of speech
Indicates the part of speech of the term component.
An appropriate part-of-speech tag set can be chosen according to research needs. The part of speech of a term component is determined by its use in the terminology database. The same term component may be tagged with more than one part of speech; that is, it may belong to several word classes.
4.4.2 Part-of-speech sequence formed by the term component and other components when it constitutes a term
Indicates the part-of-speech sequence that the term component forms with other components when it constitutes a term. The sequence is determined by the behaviour of the term component in the terminology database. A term component may form many different part-of-speech sequences. An appropriate marking method can be chosen as needed, for example:
a) mark only the part-of-speech sequence with the highest frequency;
Example: circuit (noun + circuit)
Note: "+" denotes linear combination; the same applies below.
b) mark all part-of-speech sequences with their occurrence counts;
Example: circuit (noun + circuit 280; circuit + noun 105; verb + circuit 20)
c) mark all part-of-speech sequences with their relative frequencies.
Example: circuit (noun + circuit 69%; circuit + noun 26%; verb + circuit 5%)
4.4.3 Other grammatical information
Other grammatical information can be added according to research needs.
4.5 Semantic information description of term components
Describes the semantic information of term components.
The description can be made from different perspectives according to research needs, for example:
a) define a semantic classification system and describe the position of each term component within it, that is, assign each term component a suitable semantic class tag. The semantic classification system should be field-specific;
b) in conjunction with the establishment of the term concept system, define a set of semantic relationships and describe how the semantic relationship between terms changes when term components combine to form terms. See Appendix A for details.
5 Construction of term component database
The term component database is built alongside the terminology database of a specific professional field and serves research such as the automatic discovery of new terms in that field and the establishment of a term concept system. A term component database should state its professional field, its application goals, and the date of its latest update.
The construction of a term component database should comply with the relevant national regulations on information system construction and be coordinated with term corpora and terminology databases, so as to allow information exchange and resource sharing.
The basic process of term component database construction is shown in Figure 1.
Figure 1 Basic process of building a term component database (flowchart: terminology database; segmentation and annotation of terminology entries; term components; analysis of basic, position, grammatical and semantic information; description of basic, position, grammatical and semantic information; generation of the term component database)
5.1 Terminology database
The terminology database is the basis for building a term component database and the main source of term component information. It should reach a certain scale and be domain-specific.
5.2 Segmentation and annotation of terminology entries
Segmentation and annotation of terminology entries is a necessary prerequisite for obtaining term components accurately. In principle, terminology entries should be segmented according to the word segmentation specification GB/T 13715. Language fragments that are closely combined, highly productive, and used stably in a specific professional field should also be treated as single segmentation units. The part-of-speech tag set used for terminology entries should be consistent with the one used for tagging components. In practice, mature word segmentation and part-of-speech tagging software can be used for automatic segmentation and tagging first, followed by manual proofreading.
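Once entries have been segmented and tagged as described above, the part-of-speech sequences of 4.4.2 can be tallied mechanically. The following is a minimal sketch under stated assumptions: the `(word, tag)` pair representation, the tag names and the functions `pos_sequence_patterns` and `as_percentages` are hypothetical and not defined by this standard, and the toy entries are invented.

```python
from collections import Counter

# A segmented, tagged entry is modelled as a list of (word, tag) pairs.
# This representation and the tag names are assumptions for illustration.
TaggedEntry = list[tuple[str, str]]


def pos_sequence_patterns(component: str, entries: list[TaggedEntry]) -> Counter:
    """Count the part-of-speech sequences in which `component` occurs.

    The component is kept as-is and every other unit is replaced by its
    tag, giving patterns such as "noun + circuit" (compare 4.4.2).
    """
    patterns: Counter = Counter()
    for entry in entries:
        words = [word for word, _ in entry]
        if component not in words or len(entry) < 2:
            continue  # skip entries without the component and single-word terms
        pattern = " + ".join(
            word if word == component else tag for word, tag in entry
        )
        patterns[pattern] += 1
    return patterns


def as_percentages(patterns: Counter) -> dict[str, float]:
    """Convert pattern counts into percentages of all counted patterns."""
    total = sum(patterns.values())
    if total == 0:
        return {}
    return {pattern: 100.0 * n / total for pattern, n in patterns.items()}


# Invented toy entries, already segmented and tagged.
entries = [
    [("logic", "noun"), ("circuit", "noun")],
    [("memory", "noun"), ("circuit", "noun")],
    [("circuit", "noun"), ("board", "noun")],
    [("register", "noun")],
]
counts = pos_sequence_patterns("circuit", entries)
print(counts.most_common())
print(as_percentages(counts))
```

Applying `as_percentages` to the counts given in the example of 4.4.2 (280, 105 and 20) yields values that round to the 69%, 26% and 5% quoted there.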
5.3 Extraction of term components
Term components are extracted on the basis of the segmented and annotated terminology entries.
5.4 Information analysis of term components
Based on the terminology database, the basic information, position information, grammatical information and semantic information of the extracted term components are counted and analysed item by item. The statistics should be computed automatically, with the participation and guidance of experts.
5.5 Information description of term components
Based on the analysis of term component information, the basic information, position information, grammatical information and semantic information of term components are described item by item.
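The analysis and description steps can be illustrated with a small sketch that tallies position information from segmented entries and stores it in one record per component. The record layout `ComponentRecord`, its field names and the function `count_positions` are assumptions made for illustration; the standard prescribes the kinds of information (4.1 to 4.5), not a data structure. The toy entries are invented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComponentRecord:
    """One term component described along the four aspects of 4.1."""

    main_entry: str                      # basic information (4.2.1)
    is_abbreviation: bool = False        # basic information (4.2.2)
    domain_specific: bool = False        # basic information (4.2.4)
    # Occurrence counts (4.3.1): [independent, beginning, middle, end]
    position_counts: list[int] = field(default_factory=lambda: [0, 0, 0, 0])
    pos_tags: set[str] = field(default_factory=set)   # grammatical (4.4.1)
    semantic_class: Optional[str] = None              # semantic (4.5)

    def position_frequencies(self) -> list[float]:
        """Relative frequencies in percent (4.3.2), derived from the counts."""
        total = sum(self.position_counts)
        if total == 0:
            return [0.0, 0.0, 0.0, 0.0]
        return [100.0 * n / total for n in self.position_counts]


def count_positions(segmented_terms: list[list[str]]) -> dict[str, ComponentRecord]:
    """Tally, for every component, where it occurs within each segmented term."""
    records: dict[str, ComponentRecord] = {}
    for units in segmented_terms:
        last = len(units) - 1
        for i, unit in enumerate(units):
            record = records.setdefault(unit, ComponentRecord(main_entry=unit))
            if last == 0:
                slot = 0   # the component is a terminology entry on its own
            elif i == 0:
                slot = 1   # beginning of the term
            elif i == last:
                slot = 3   # end of the term
            else:
                slot = 2   # middle of the term
            record.position_counts[slot] += 1
    return records


# Invented toy entries, already segmented into components.
terms = [
    ["register"],
    ["shift", "register"],
    ["register", "file"],
    ["general", "purpose", "register"],
]
records = count_positions(terms)
print(records["register"].position_counts)  # [1, 1, 0, 2]

# The counts from the example in 4.3.1, converted as in 4.3.2.
register = ComponentRecord("register", position_counts=[1, 63, 87, 786])
print([round(f, 1) for f in register.position_frequencies()])
```

Rounded to whole percentages, the second result gives 7%, 9% and 84% for the beginning, middle and end positions, in line with the example in 4.3.2.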
5.6 Generation of the term component database
Based on the above work, a complete term component database is generated. It should be a structured system in which data can easily be accessed, retrieved, modified, deleted, updated and supplemented. For the construction of the database, refer to the relevant provisions of GB/T 13725.
Appendix A
(Informative Appendix)
Description of structural semantic information of term components
A.1 Terms and definitions
A.1.1
Left component
If a term can be split into a term component and another term of the same professional field, and the component is located on the left of the term, the component is called the left component of the term.
For example, in "message packet exchange", "message" is the left component of the term; in "packet exchange", "packet" is the left component of the term.
A.1.2
Right component
If a term can be split into another term of the same professional field and a term component, and the component is located on the right of the term, the component is called the right component of the term.
For example, in "decoder", "device" is the right component of the term; in "virtual space", "space" is the right component of the term.
A.2 Description of structural semantic information of term components
The structural semantic information of a term component describes the semantic relationship between the term component, as the left or right component of a term, and the component of the term that it modifies. The description of structural semantic information is related to the classification criteria set in the term concept system and can be used to guide the positioning of new terms within that system. The content to be described can also be set according to the needs of applications such as the automatic discovery of new terms and the automatic definition of terms.
The same term component generally has different structural semantic properties when it appears as a left component and when it appears as a right component in a specific terminology entry. The description is therefore divided into the following two aspects.
A.2.1 Description of the structural semantic information of the left component
The structural semantic information of the left component refers to the structural semantic characteristics of the term component when it appears as a left component. It can be defined as an N-tuple, where N equals the number of relations contained in the selected concept system. Each value can be a probability obtained from statistics, or a 0-1 attribute value obtained under a set threshold.
Example: The conceptual relationships in the field of information science and technology include two classification criteria, "methods and techniques" and "materials used". When only these two relations are examined, the structural semantic information of the left component can be described as a two-tuple Q (h, p). Parallel (1, 0) means that when the left component "parallel" is attached to an original term X to form term Y, the concept designated by Y is a specific (subordinate) concept of the concept designated by X, and Y can be formally defined as: a type of X that uses "parallel" technology. It cannot, however, be defined as: a type of X that uses "parallel" materials.
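The N-tuple described above can be derived from relation labels in either of the two ways mentioned, as probabilities or as 0-1 values under a threshold. The sketch below assumes that each term containing the component as its left component has already been labelled with one relation, for instance by an expert; the function `structural_tuple`, the constant `RELATIONS` and the label counts are hypothetical.

```python
from collections import Counter

# The two relations examined in the example of A.2.1; their order fixes
# the layout of the tuple Q (h, p).
RELATIONS = ("methods and techniques", "materials used")


def structural_tuple(observed, relations=RELATIONS, threshold=None):
    """Build the N-tuple for one component from observed relation labels.

    `observed` holds one relation label per term in which the component
    occurs as left component. Without a threshold the tuple contains
    probabilities; with a threshold it contains 0-1 attribute values.
    """
    counts = Counter(observed)
    total = sum(counts[r] for r in relations)
    if total == 0:
        probs = tuple(0.0 for _ in relations)
    else:
        probs = tuple(counts[r] / total for r in relations)
    if threshold is None:
        return probs
    return tuple(1 if p >= threshold else 0 for p in probs)


# Invented labels for terms that start with the left component "parallel".
labels = ["methods and techniques"] * 8 + ["materials used"] * 2
print(structural_tuple(labels))                 # (0.8, 0.2)
print(structural_tuple(labels, threshold=0.5))  # (1, 0)
```

With the threshold applied, the invented data reproduce the form of the entry Parallel (1, 0) in the example; the threshold value 0.5 is an arbitrary choice for the sketch.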
A.2.2 Description of the structural semantic information of the right component
The structural semantic information of the right component refers to the structural semantic characteristics of the term component when it appears as a right component. It can also be defined as an N-tuple, where N equals the number of relations contained in the corresponding concept system. Each value can be a probability obtained from statistics, or a 0-1 attribute value obtained under a set threshold.
Example: In the field of information science and technology, the right component can often indicate the classification relationships "equipment, device" and "operation". Some commonly used words and suffixes, such as "processing", "device" and "machine", when used as the right component of a term, can often indicate the conceptual relationship "equipment, device". In other words, if a term Y can be analysed as "X + device", it can generally be inferred that Y is a device (equipment) that produces (completes) X.
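The inference in this example can be written as a simple lookup on the right component of a segmented term. The sketch is illustrative only: the table `RIGHT_COMPONENT_RELATION` lists just two of the components named above, and both the table and the function `analyse_right_component` are hypothetical rather than part of this standard.

```python
from typing import Optional

# Hypothetical table: right component -> conceptual relation it tends to indicate.
RIGHT_COMPONENT_RELATION = {
    "device": "equipment, device",
    "machine": "equipment, device",
}


def analyse_right_component(units: list[str]) -> Optional[tuple[str, str]]:
    """Analyse a segmented term Y as "X + right component".

    Returns the indicated relation and a draft definition of Y, or None
    when the term has no right component or the component is not listed.
    """
    if len(units) < 2:
        return None  # a single unit cannot be split into X + right component
    *rest, right = units
    relation = RIGHT_COMPONENT_RELATION.get(right)
    if relation is None:
        return None
    x = " ".join(rest)
    return relation, f"a device (equipment) that produces (completes) {x}"


print(analyse_right_component(["decoding", "device"]))
print(analyse_right_component(["virtual", "space"]))  # None
```

A table of 0-1 values like this one corresponds to the thresholded form of the N-tuple; a statistical variant would store a probability per relation instead.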
People's Republic of China
National Standard
Specification of description of term component database
GB/T 19102-2003
Published by China Standards Press
No. 16, Sanlihebei Street, Fuxingmenwai, Beijing
Postal code: 100045
Tel: 68523946, 68517548
Printed by China Standards Press Huangdao Printing Factory
Issued by Xinhua Bookstore Beijing Distribution Office
Sold by Xinhua Bookstores in various places
Format: 880×1230, 1/16
Printing sheets: 3/4
Word count 15 words
First edition: October 2003
First printing: October 2003
Print run: 1-1500
Book number: 155066.1-19897
Website: bzcbs.com
Copyright reserved
Infringements will be investigated
Report phone number: (010) 68533533