GB/T 19102-2003 Information Description Specification of Terminology Component Library

This standard specifies the information description specification for terminology component libraries. It applies to the research, development, maintenance and related management of terminology component libraries, and can also serve as a reference in the field of information retrieval.
ICS 01.020

National Standard of the People's Republic of China

GB/T 19102-2003

Specification for description of term component database

Issued on 2003-05-14. Implemented on 2003-12-01.
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China.

Contents

1 Scope
2 Normative references
3 Terms and definitions
4 Information description of term component database
5 Construction of term component database
Appendix A (informative) Structural semantic information description of term component

Foreword

This standard is one of the series of national standards for terminology databases. The standards in the series issued so far are:

GB/T 13726-1992 Magnetic tape format for recording and exchanging terminology and dictionary entries
GB/T 16785-1997 Terminology work - Harmonization of concepts and terms
GB/T 16786-1997 Terminology work - Computer applications - Data categories
GB/T 17532-1998 Vocabulary for computer applications of terminology work
GB/T 18155-2000 Computer applications in terminology - Machine-readable terminology interchange format (MARTIF) - Negotiated interchange
GB/T 13725-2001 General principles and methods for establishing terminology databases
GB/T 15387.1-2001 Guide for the preparation of terminology database development documents
GB/T 15387.2-2001 Guide for the development of terminology databases
GB/T 15625-2001 Guide for the technical evaluation of terminology databases
GB/T 19101-2003 General principles and methods for establishing terminology corpora

Appendix A of this standard is informative.

This standard was proposed by the National Technical Committee for Terminology Standardization.

This standard is under the jurisdiction of the China Standards Research Center.

This standard was drafted by the China Standards Research Center, the Institute of Computational Linguistics of Peking University, and other organizations.

The main drafters of this standard are: Ye Sheng, Wu Yunfang, Song Min, Sui Zhifang, Cheng Yonghong, Hu Junfeng, Xiao Yujing.
Introduction

A term component library is a knowledge base containing rich information. This information supports research such as the automatic discovery of new terms, the automatic definition of terms, and the establishment of term concept systems.

Information description specification of term component library

1 Scope

This standard specifies the information description specification for term component libraries.

This standard applies to the research, development, maintenance and related management of term component libraries, and can also serve as a reference in the field of information retrieval.

2 Normative references

The clauses of the following documents become clauses of this standard through reference in this standard. For dated references, subsequent amendments (excluding errata) and revisions do not apply to this standard; however, parties that reach agreements based on this standard are encouraged to investigate whether the latest versions of these documents can be used. For undated references, the latest version applies to this standard.

GB/T 13715 Standard for the segmentation of modern Chinese words for information processing
GB/T 13725 General principles and methods for establishing terminology databases
GB/T 15237.1-2000 Vocabulary for terminology work - Part 1: Theory and application (eqv ISO 1087-1:2000)
GB/T 17532-1998 Vocabulary for computer applications of terminology work (eqv ISO/DIS 1087-2-2:1996)

3 Terms and definitions

The terms and definitions established in GB/T 15237.1-2000 and GB/T 17532-1998 apply to this standard. For ease of use, some of them are repeated here.

3.1 term
The verbal designation of a general concept in a specific professional field. [GB/T 15237.1-2000, 3.4.3]

3.2 terminological database
A database containing terminological data. [GB/T 17532-1998, 7.6]

3.3 single-word term
A term consisting of a single word.
3.4 multi-word term
A term consisting of multiple words.

3.5 term component
A word that forms part of a multi-word term. Language fragments that are closely combined, highly productive, and used stably in a specific professional field can also be regarded as term components; for example, "ultra-large scale" and "optical coupling" in the field of information science and technology.

3.6 term component library (term component database)
A database that stores term component information.

3.7 domain specificity
Features that are unique to a specific professional field and closely related to its subject matter.

3.8 domain-specific component
A term component that has domain specificity in a specific professional field, generally a single-word term in that field. For example, "semiconductor" in "semiconductor materials".

4 Information description of term component library

4.1 Information structure of term component library

The information in a term component library can be described from four aspects:
a) basic information of the term component;
b) statistical information related to the position of the term component;
c) grammatical information of the term component;
d) semantic information of the term component.

A term component library built for a particular application may select among these aspects as needed. The information listed above is obtained from a terminology database of the specific professional field.

4.2 Basic information description of term components

4.2.1 Main entry term component
Indicates the term component itself.

4.2.2 Abbreviation
Indicates whether the term component is an abbreviation.
4.2.3 Full name
Indicates the full form corresponding to the term component (when the main entry term component is an abbreviation).

4.2.4 Domain-specific annotation
Indicates whether the term component is a domain-specific component.

4.2.5 Source
Indicates the source language of the term component and the corresponding original-language form.
Example: software (English, software)

4.2.6 Other basic information
Other relevant information can be added according to research needs.

4.3 Description of statistical information related to the position of term components

4.3.1 Frequency (count) description
Describes the number of times a term component occurs in each position of a term, including:
a) the number of times the term component occurs independently as a term entry;
b) the number of times the term component occurs at the beginning of a term;
c) the number of times the term component occurs in the middle of a term;
d) the number of times the term component occurs at the end of a term.
Example: register (1, 63, 87, 786) means that, in the terminology database, the term component "register" occurs once as an independent term entry, 63 times at the beginning of a term, 87 times in the middle, and 786 times at the end.

4.3.2 Relative frequency description
Describes the proportion of occurrences of a term component in each position of a term, including:
a) the proportion of occurrences as an independent term entry;
b) the proportion of occurrences at the beginning of a term;
c) the proportion of occurrences in the middle of a term;
d) the proportion of occurrences at the end of a term.
Example: register (0.1%, 7%, 9%, 84%) means that, in the terminology database, 0.1% of the occurrences of the term component "register" are as an independent term entry, 7% are at the beginning of a term, 9% are in the middle, and 84% are at the end.

4.3.3 Other statistical information
Other statistical information can be described according to research needs.
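The relation between the count example in 4.3.1 and the percentage example in 4.3.2 can be checked with a short calculation. The following Python sketch is illustrative only: the names `PositionCounts` and `relative_frequencies` are hypothetical, and it assumes (the standard does not say so explicitly) that each percentage is the position count divided by the sum of the four counts. Under that assumption the "register" counts reproduce the published percentages after rounding.

```python
from typing import NamedTuple, Tuple


class PositionCounts(NamedTuple):
    """Occurrence counts of one term component, in the order used in 4.3.1."""
    standalone: int  # a) occurs independently as a term entry
    initial: int     # b) occurs at the beginning of a term
    medial: int      # c) occurs in the middle of a term
    final: int       # d) occurs at the end of a term


def relative_frequencies(counts: PositionCounts) -> Tuple[float, ...]:
    """Return each position count as a share of all occurrences (assumed denominator)."""
    total = sum(counts)
    if total == 0:
        raise ValueError("component has no recorded occurrences")
    return tuple(c / total for c in counts)


# Counts taken from the "register" example in 4.3.1.
register = PositionCounts(standalone=1, initial=63, medial=87, final=786)
shares = relative_frequencies(register)
print([f"{s:.1%}" for s in shares])  # ['0.1%', '6.7%', '9.3%', '83.9%']
```

Rounded to whole percentages, the last three values are 7%, 9% and 84%, which agrees with the example in 4.3.2.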
4.4 Description of grammatical information of term components

4.4.1 Part of speech
Indicates the part of speech of the term component. An appropriate part-of-speech tag set can be chosen according to research needs. The part of speech of a term component is determined by its use in the terminology database. The same term component may be assigned more than one part of speech; that is, components belonging to several categories are allowed.

4.4.2 Part-of-speech sequence formed by the term component and other components when constituting a term
Indicates the part-of-speech sequence that the term component forms with other components when it constitutes a term. The sequence is determined by the behaviour of the term component in the terminology database. A term component may form many different part-of-speech sequences. An appropriate marking method can be chosen as needed, for example:
a) mark only the part-of-speech sequence with the highest frequency;
Example: circuit (noun + circuit)
Note: "+" denotes linear combination. The same applies below.
b) mark all part-of-speech sequences and attach their counts;
Example: circuit (noun + circuit 280; circuit + noun 105; verb + circuit 20)
c) mark all part-of-speech sequences and attach their relative frequencies.
Example: circuit (noun + circuit 69%; circuit + noun 26%; verb + circuit 5%)

4.4.3 Other grammatical information
Other grammatical information can be added according to research needs.

4.5 Semantic information description of term components
Describes the semantic information of term components. The description can take different perspectives according to research needs, for example:
a) set up a semantic classification system and describe the position of each term component in it, that is, assign each term component a suitable semantic class tag.
The semantic classification system should be domain-specific;
b) in combination with the establishment of the term concept system, define a set of semantic relationships that describe how the semantic relationship between terms changes when term components combine to form terms. See Appendix A for details.

5 Construction of term component library

The construction of a term component library is combined with the construction of a terminology database in a specific professional field, and serves research work in that field such as the automatic discovery of new terms and the establishment of a term concept system.

A term component library should state clearly its professional field, its application goals, and its latest update date.

The construction of a term component library should comply with the relevant national regulations on information system construction, and should be coordinated with term corpora and terminology databases so as to achieve information exchange and resource sharing.

The basic process of term component library construction is shown in Figure 1.

[Figure 1: Basic process of building a term component library. Steps in order: terminology database; segmentation and annotation of term entries; term components; analysis of basic, position, grammatical and semantic information; description of basic, position, grammatical and semantic information; generation of the term component library.]

5.1 Terminology database
The terminology database is the basis for building a term component library and the main source of term component information. It should reach a certain scale and be domain-specific.

5.2 Segmentation and annotation of term entries
The segmentation and annotation of term entries is a necessary prerequisite for accurately obtaining term components.
In principle, the segmentation of term entries should follow the word segmentation specification of GB/T 13715. Language fragments that are closely combined, highly productive, and used stably in a specific professional field should also be treated as a single segmentation unit.

The part-of-speech tag set used for term entries should be consistent with the tag set used for term components.

In practice, mature word segmentation and part-of-speech tagging software can be used to segment and tag the entries automatically first, followed by manual proofreading.

5.3 Extraction of term components
Term components are extracted on the basis of the segmented and annotated term entries.

5.4 Information analysis of term components
Based on the terminology database, the basic information, position information, grammatical information and semantic information of the extracted term components are counted and analysed item by item. The statistics should be computed automatically, with the participation and guidance of experts.

5.5 Information description of term components
Based on the analysis in 5.4, the basic information, position information, grammatical information and semantic information of the term components are described item by item.

5.6 Generation of term component library
Based on the work above, a complete term component library is generated. The library should be a structured system in which data can easily be accessed, retrieved, modified, deleted, updated and supplemented. For database construction, refer to the relevant provisions of GB/T 13725.
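Steps 5.3 and 5.4 can be illustrated with a minimal Python sketch. It is not part of the standard: the function name `build_component_library`, the record layout, the tags "n" and "v", and the sample entries are all hypothetical, and the input is assumed to be term entries that have already been segmented and part-of-speech tagged as described in 5.2. The sketch extracts each component and tallies the four position counts of 4.3.1 together with the observed parts of speech of 4.4.1.

```python
from typing import Dict, List, Tuple

# One term entry after segmentation and tagging: a list of (component, POS tag).
SegmentedTerm = List[Tuple[str, str]]


def build_component_library(terms: List[SegmentedTerm]) -> Dict[str, dict]:
    """Extract term components and tally position counts and POS tags."""
    library: Dict[str, dict] = {}
    for term in terms:
        last = len(term) - 1
        for i, (component, pos) in enumerate(term):
            entry = library.setdefault(component, {
                "pos": set(),
                "counts": {"standalone": 0, "initial": 0, "medial": 0, "final": 0},
            })
            entry["pos"].add(pos)
            if last == 0:
                position = "standalone"  # single-word term entry
            elif i == 0:
                position = "initial"
            elif i == last:
                position = "final"
            else:
                position = "medial"
            entry["counts"][position] += 1
    return library


# Hypothetical, hand-segmented sample entries (English stand-ins).
sample_terms: List[SegmentedTerm] = [
    [("register", "n")],
    [("shift", "v"), ("register", "n")],
    [("register", "n"), ("file", "n")],
    [("shift", "v"), ("register", "n"), ("file", "n")],
]

library = build_component_library(sample_terms)
print(library["register"]["counts"])
# {'standalone': 1, 'initial': 1, 'medial': 1, 'final': 1}
```

A plain dictionary is used here only to keep the sketch short; 5.6 asks for a structured system that supports retrieval and updating, which in practice would usually mean a database.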
Appendix A (informative)
Description of structural semantic information of term components

A.1 Terms and definitions

A.1.1 left component
If a term can be split into a term component and another term of the same professional field, and the component is located on the left of the term, the component is called the left component of the term. For example, in "message packet exchange", "message" is the left component of the term; in "packet exchange", "packet" is the left component of the term.

A.1.2 right component
If a term can be split into another term of the same professional field and a term component, and the component is located on the right of the term, the component is called the right component of the term. For example, in "decoder", "device" is the right component of the term; in "virtual space", "space" is the right component of the term.

A.2 Description of structural semantic information of term components

The structural semantic information of a term component describes the semantic relationship between the term component, acting as the left or right component of a term, and the part of the term that it modifies. The description is related to the classification criteria set in the term concept system, and can be used to guide the positioning of new terms in that system. The content to be described can also be set according to the application requirements of automatic discovery of new terms and automatic definition of terms. The same term component generally has different structural semantic properties when it appears as a left component and when it appears as a right component in a term entry.
Therefore, the description is divided into the following two aspects:

A.2.1 Description of the structural semantic information of the left component

The structural semantic information of the left component refers to the structural semantic characteristics of a term component when it appears as a left component. It can be defined as an N-tuple, where N equals the number of relations contained in the selected concept system. Each value in the tuple can be a probability obtained from statistics, or a 0-1 attribute value obtained by applying a set threshold.

Example: The concept relations in the field of information science and technology include two classification criteria: "methods and techniques" and "materials used". When only these two relations are examined, the structural semantic information of the left component can be described as a two-tuple Q (h, p). parallel (1, 0) means that when the left component "parallel" is attached to an original term X to form a term Y, the concept designated by Y is a subordinate (specific) concept of the concept designated by X, and Y can be formally defined as: a type of X that uses "parallel" technology. It cannot, however, be defined as: a type of X that uses "parallel" material.

A.2.2 Description of the structural semantic information of the right component

The structural semantic information of the right component refers to the structural semantic characteristics of a term component when it appears as a right component. It can also be defined as an N-tuple, where N equals the number of relations contained in the corresponding concept system. Each value in the tuple can be a probability obtained from statistics, or a 0-1 attribute value obtained by applying a set threshold.
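The N-tuple description used in A.2.1 and A.2.2 can be sketched in Python as follows. This is an illustration under stated assumptions, not a method prescribed by the standard: the function names, the counts for "parallel" (18 of 20 terms) and the threshold of 0.5 are hypothetical, and the "probability obtained from statistics" is assumed to be the share of terms in which the component bears a given relation.

```python
from typing import Dict, Sequence, Tuple

# The two classification criteria named in the A.2.1 example.
RELATIONS = ("methods and techniques", "materials used")


def structural_tuple(relation_counts: Dict[str, int],
                     relations: Sequence[str],
                     total_terms: int) -> Tuple[float, ...]:
    """Share of terms in which the component expresses each relation (assumed statistic)."""
    if total_terms <= 0:
        raise ValueError("total_terms must be positive")
    return tuple(relation_counts.get(r, 0) / total_terms for r in relations)


def binarize(probabilities: Sequence[float], threshold: float) -> Tuple[int, ...]:
    """Turn probabilities into 0-1 attribute values under a set threshold."""
    return tuple(1 if p >= threshold else 0 for p in probabilities)


# Hypothetical statistics: "parallel" is a left component in 20 terms and
# expresses "methods and techniques" in 18 of them, "materials used" in none.
probs = structural_tuple({"methods and techniques": 18}, RELATIONS, total_terms=20)
print(probs)                 # (0.9, 0.0)
print(binarize(probs, 0.5))  # (1, 0)
```

With these made-up figures the thresholded tuple is (1, 0), the same form as the two-tuple given for "parallel" in A.2.1.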
Example: In the field of information science and technology, the right component can often indicate the classification relations "equipment, device" and "operation". Some commonly used words and suffixes, such as "processing", "device" and "machine", when used as the right component of a term, can often indicate the concept relation "equipment, device". In other words, if a term Y can be analysed as "X + device", it can generally be inferred that Y is a device (piece of equipment) that produces (performs) X.

National Standard of the People's Republic of China
Information Description Specification of Terminology Component Library
GB/T 19102-2003

Published by China Standards Press, No. 16, Sanlihebei Street, Fuxingmenwai, Beijing. Postal code: 100045. Tel: 68523946, 68517548
China Standards Press. Printed by Huangdao Printing Factory.
Issued by Xinhua Bookstore Beijing Distribution Office. Sold by Xinhua Bookstores in various places.
Format 880×1230 1/16. Printing sheet 3/4. Word count: 15 thousand characters.
First edition October 2003. First printing October 2003. Print run 1-1500.
Book number: 155066: 1-19897
Website: bzcbs.com
Copyright reserved. Infringements will be investigated. Report phone number: (010) 68533533