General principles and methods for establishing terminology database
ICS 01.020
National Standard of the People's Republic of China
GB/T 13725-2001
Replaces GB/T 13725-1992
Published 2001-11-14
Implemented 2002-06-01
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China
2 Normative references
3 Terms and definitions
4 Construction of terminology database
5 Types and information flow of terminology database
6 Basic requirements of terminology database system
7 Basic process of establishing terminology database
8 Generation and use of terminology database
9 Management and maintenance of terminology database system
10 Information resource sharing among terminology databases
Appendix A (Normative Appendix) Entity-relationship diagram of terminology database
This standard replaces GB/T 13725-1992 "General principles and methods for establishing terminology database".
This standard makes the following modifications to GB/T 13725-1992 "General principles and methods for establishing terminology database":
- Added the following normative references:
a) GB/T 15237.1-2000 Vocabulary of terminology work Part 1: Theory and application
b) GB/T 16786-1997 Data categories for computer application of terminology work
c) GB/T 17532-1998 Vocabulary of computer application of terminology work
d) GB/T 18155-2000 Computer application of terminology work Machine-readable terminology interchange format (MARTIF) Negotiated interchange
- Added a chapter on terms and definitions;
- Replaced the obsolete input and output devices in Chapter 5 with devices currently in use; added normalization processing in Chapter 5;
- Modified the content of 6.1.8;
- Added multimedia information such as images and sounds in 6.2.1.1; added high reliability and networking function in 6.2.2; changed the grammatical information in 6.3.2.1 to word class and other grammatical information;
- Modified the table;
- Added reviewing user reports and reviewing new search reports in 7.5.2.2;
- Added content on network and multimedia technology;
- Corrected inappropriate wording, and revised and adjusted the layout of GB/T 13725-1992 so that it meets the requirements of GB/T 1.1-2000 "Guidelines for Standardization Work Part 1: Structure and Writing Rules of Standards".
This standard is one of the series of national standards for terminology databases. The standards in the series issued so far are:
- GB/T 13726-1992 Magnetic tape format for the exchange of records of terminology and dictionary entries
- GB/T 15387.1-2001 Guidelines for the preparation of terminology database development documents
- GB/T 15387.2-2001 Guidelines for the development of terminology databases
- GB/T 15625-2001 Guidelines for the technical evaluation of terminology databases
- GB/T 16785-1997 Coordination of concepts and terms in terminology work
- GB/T 16786-1997 Data categories for computer application of terminology work
- GB/T 17532-1998 Vocabulary of computer application of terminology work
- GB/T 18155-2000 Computer application of terminology work Machine-readable terminology interchange format (MARTIF) Negotiated interchange
Appendix A of this standard is a normative appendix.
This standard was proposed by the National Technical Committee for Terminology Standardization.
This standard is under the jurisdiction of the China Standards Research Center.
The drafting units of this standard are: China Standards Research Center, and the Scientific Research and Planning Institute of the State Post Bureau.
This standard is interpreted by the National Technical Committee for Terminology Standardization.
The main drafters of this standard are Ben Mingfei, Ye Sheng, Zhang Zhiyun, Xiao Yujing, Lu Lili and Xu Junrong.
The previous versions of the standards replaced by this standard are: GB/T 13725-1992.
1 Scope
This standard specifies the general principles and methods for establishing terminology databases (abbreviated as "termbases").
This standard applies to the research, development, maintenance and related management of terminology databases. It can also be used as a reference in other work involving terminology data processing.
2 Normative references
The clauses in the following documents become clauses of this standard through reference in this standard. For dated references, none of the subsequent amendments (excluding corrigenda) or revisions apply to this standard. However, parties to agreements based on this standard are encouraged to study whether the latest versions of these documents can be used. For undated references, the latest version applies to this standard.
GB 4943 Safety of information technology equipment (including electrical business equipment) (GB 4943-1995, IEC 950:1991, IDT)
GB/T 10112 Principles and methods of terminology work (GB/T 10112-1999, ISO 704:1997, NEQ)
GB/T 15237.1-2000 Vocabulary of terminology work Part 1: Theory and application (ISO 1087-1:2000, EQV)
GB/T 16786-1997 Data categories for computer application of terminology work (ISO/DIS 12620:1996, EQV)
GB/T 17532-1998 Vocabulary of computer application of terminology work (ISO/DIS 1087-2-2:1996, EQV)
GB/T 18155 Computer application of terminology work Machine-readable terminology interchange format (MARTIF) Negotiated interchange (GB/T 18155-2000, ISO 12200:1999, EQV)
GB/T 20001.1-2001 Standard preparation rules Part 1: Terminology (ISO 10241:1992 International terminology standards - Preparation and layout, NEQ)
3 Terms and definitions
The terms and definitions established in GB/T 15237.1-2000, GB/T 16786-1997 and GB/T 17532-1998 apply to this standard. For convenience, this standard repeats some of those terms and definitions.
3.1
term
The verbal designation of a general concept in a specific professional field. [GB/T 15237.1-2000, 3.4.3]
3.2
terminological database
A database that stores terminological data.
Note: modified from GB/T 17532-1998, 7.6.
3.3
data element
A data unit with distinguishing characteristics in a certain context. [GB/T 17532-1998, 7.11]
3.4
data field
A variable-length or fixed-length part of a record that stores a specific data element. [GB/T 17532-1998, 7.12]
3.5
data category
data element type
A description of the type of a given data field. [GB/T 17532-1998, 7.14]
3.6
terminological entry
Terminological data about a concept contained in a data set. [GB/T 17532-1998, 3.22]
4 Construction of terminology database
4.1 The terminology database can be divided into three levels:
a) National standardization terminology database;
b) Professional field terminology database;
c) Grassroots terminology database.
4.1.1 The national standardization terminology database manages China's standardized terminology.
4.1.2 Professional field terminology databases should have a clear division of labor by specialty, define their scope, coordinate well with one another, and avoid duplication, omission and waste.
4.1.3 Relevant units can establish grassroots terminology databases according to their work needs.
4.2 The construction of terminology databases should follow the relevant national regulations on information system construction, coordinate well with other terminology databases, and achieve information exchange and resource sharing.
5 Types and information flow of terminology databases
5.1 Types of terminology databases
5.1.1 Concept-oriented terminology database
It emphasizes the rationality and hierarchy of the concept system, contains strict definitions of concepts, and holds terms with authoritative definitions.
5.1.2 Translation-oriented terminology database
It contains the equivalents of terms in two or more languages as needed, and contains more linguistic information (such as part of speech, context, usage examples, etc.).
5.1.3 Target-oriented terminology database
It is a terminology database established as a component of an expert system, knowledge base system, machine translation system, etc. to meet various specific target requirements.
5.1.4 Other special-purpose terminology databases
It is a terminology database designed according to actual needs, such as a word bank.
5.2 Information flow of the terminology database system
The information flow of the terminology database system is shown in Figure 1.
Figure 1 Flowchart of terminology database system information (labels recovered from the figure: terminology information sources, normalization, terminology database, other terminology databases)
5.2.1 Terminology information sources
Terminology information can come from national standards, industry standards and other standard documents; from authoritative dictionaries, encyclopedias and other reference books and documents; from the definitions and references of new concepts provided by experts, scholars and users; or from other terminology databases, through networking or the exchange of terminology data on recording media.
5.2.2 Normalization processing
Terminology information obtained from the various channels is processed according to the established standard format or rules.
5.2.3 Input
After the original terminology information has been normalized, it is entered into the terminology database system through input devices such as keyboards, text recognition devices and voice recognition devices.
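The normalization step described in 5.2.2 can be illustrated with a minimal Python sketch. The record layout (`term`, `definition`, `source`) and the rules applied (Unicode NFKC normalization, whitespace collapsing, dropping unknown keys) are assumptions made for illustration only; the standard requires only that information be processed according to an established format or rules.

```python
import unicodedata

# Assumed fixed layout for a normalized record (not prescribed by the standard).
FIELDS = ("term", "definition", "source")

def normalize_text(value: str) -> str:
    """Apply NFKC normalization and collapse runs of whitespace."""
    return " ".join(unicodedata.normalize("NFKC", value).split())

def normalize_record(raw: dict) -> dict:
    """Map a raw record onto the assumed layout, dropping unknown keys."""
    return {name: normalize_text(str(raw.get(name, ""))) for name in FIELDS}

raw = {
    "term": "  data   element ",
    "definition": "a data unit\twith distinguishing characteristics",
    "source": "GB/T 17532-1998",
    "note": "not part of the assumed layout",
}
record = normalize_record(raw)
```

Records from different channels end up with the same keys and the same text conventions, which is what lets the input step treat them uniformly.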
5.2.4 Terminology database system
The terminology database system processes the input information (data) and stores it in storage, where it can be easily accessed, retrieved, modified, deleted, updated and supplemented.
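The operations named in 5.2.4 (access, retrieval, modification, deletion, supplementation) can be sketched as a toy in-memory store. The class name `TermBase` and its methods are hypothetical; a real system would sit on a database management system as described in 6.2.3.

```python
class TermBase:
    """Toy in-memory termbase keyed by a numeric record identifier."""

    def __init__(self):
        self._entries = {}
        self._next_id = 1

    def add(self, term, definition):
        """Supplement the termbase with a new entry and return its identifier."""
        entry_id = self._next_id
        self._entries[entry_id] = {"term": term, "definition": definition}
        self._next_id += 1
        return entry_id

    def get(self, entry_id):
        """Access one entry by identifier."""
        return self._entries[entry_id]

    def search(self, text):
        """Retrieve identifiers of entries whose term contains the given text."""
        needle = text.lower()
        return [eid for eid, e in self._entries.items() if needle in e["term"].lower()]

    def update(self, entry_id, **changes):
        """Modify fields of an existing entry."""
        self._entries[entry_id].update(changes)

    def delete(self, entry_id):
        """Remove an entry."""
        del self._entries[entry_id]


tb = TermBase()
first = tb.add("data element", "a data unit with distinguishing characteristics")
second = tb.add("data field", "part of a record for a specific data element")
tb.update(second, definition="a part stored in a record for a specific data element")
hits = tb.search("DATA")
tb.delete(first)
```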
5.2.5 Output
Users use the information in the terminology database through output devices such as screen displays, printers, floppy disk drives, CD-ROM drives, voice equipment, microfilm equipment and phototypesetting equipment.
5.2.6 Users
Users of the terminology database include: drafters and editors of standards, translators, lexicographers, editors, educators, phonetics workers, scientific and technological workers, students and other users.
5.2.7 Information sharing
Information resources can be shared with other terminology database systems through networking, the exchange of data recording media, etc.
6 Basic requirements for the terminology database system
6.1 Design principles and quality requirements
6.1.1 Purpose
Investigate and analyze user needs in all respects, and develop the terminology database according to the requirements of most users regarding its functions, performance, data, etc., taking full account of social and economic benefits. The development of the terminology database should meet the needs of actual use.
6.1.2 Scientificity
The theories and technologies of the various disciplines involved in developing the terminology database should be fully studied, and the terminology database should be developed on a scientific basis using systems engineering methods.
6.1.3 Usability
The system should be simple to learn and easy to use.
6.1.4 Economy
A design scheme that is technologically advanced and economically reasonable should be selected.
6.1.5 Reliability
The hardware configuration and software selected should ensure the high reliability of the terminology database.
6.1.6 Maintainability
a) Preventive maintenance, to keep the system in good working condition and prevent failures before they occur;
b) Corrective maintenance, to correct faults;
c) Adaptive maintenance, to enable the software product to continue to be used in a changed environment;
d) Perfective maintenance, to improve performance, etc.
6.1.7 Security
a) The hardware design and installation of the system should be carried out in accordance with the requirements of GB 4943;
b) A tiered management code should be formulated to ensure the security of the terminology database system equipment;
c) Provisions should be made for the access rights of the various users to data of various scopes under different conditions;
d) The system should be able to prevent the intrusion of computer viruses that may appear during data exchange, and have effective measures to detect and remove viruses;
e) Protection mechanisms and confidentiality measures should be provided for special data as needed.
6.1.8 Scalability
It should be easy to extend or reduce system functions as needs change.
6.2 Requirements for computer systems
6.2.1 Basic requirements
6.2.1.1 The terminology database computer system should have strong text processing capabilities and support Chinese character information processing. It should be able to support multiple languages, scripts, symbols, formulas, graphics, images, sounds and other multimedia information as needed.
6.2.1.2 A large-scale terminology database system should be able to share information resources, through the Internet, with other large-scale terminology databases in the same country and with major terminology databases in the world.
6.2.2 Basic requirements for hardware
a) Select an appropriate computer according to the system design requirements;
b) The host and peripherals can be matched easily;
c) There is sufficient memory and external storage space;
d) The data processing speed and the system input and output capabilities should meet the requirements of the types of business and the number of users;
e) The system should have good compatibility and be convenient to maintain;
f) The system is secure and highly reliable;
g) It has a networking function;
h) It is highly expandable and can easily be upgraded on site.
6.2.3 Basic requirements for software
a) It should be complete and form a coherent system, including system software, Chinese character support software, database management software, communication control software, a network management system, security and confidentiality software, and other application software;
b) It should have good flexibility and portability, and adapt well to the operating environment;
c) It should be highly expandable and able to be upgraded as needed;
d) It should have good human-computer interaction capabilities;
e) The database management system should be powerful and make it easy to access, check, supplement, modify and delete data;
f) It should have good security and confidentiality;
g) It should use the character sets specified by national standards and relevant international standards. The character set should be as expandable as possible, so that special characters can be accessed directly, and compatibility among multiple languages should be considered as needed.
6.2.4 Requirements for the communication system
As needed, it should support advanced computer network communication, support open systems interconnection, and allow database access through the network.
6.3 Requirements for terminology data
6.3.1 Basic requirements
6.3.1.1 Correctness
Terminology data in the database should be verified to be correct and valid.
6.3.1.2 Consistency
Inconsistencies due to different sources of terminology data should be eliminated.
6.3.1.3 Integrity
The integrity of terminology data elements, data categories and data structures should be ensured.
6.3.1.4 Independence
Data should be independent of the computer system, storage method and access method.
6.3.1.5 Timeliness
Terminology data should be updated in a timely manner.
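Two of the requirements above, consistency (6.3.1.2) and integrity (6.3.1.3), lend themselves to automated checks. The following Python sketch is one possible approach under stated assumptions: records are plain dictionaries with `term`, `definition` and `source` keys, and the function names are hypothetical.

```python
from collections import defaultdict

def find_inconsistent_terms(records):
    """Return terms that carry more than one distinct definition."""
    definitions = defaultdict(set)
    for rec in records:
        definitions[rec["term"]].add(rec["definition"])
    return {term: sorted(defs) for term, defs in definitions.items() if len(defs) > 1}

def find_incomplete(records, required=("term", "definition", "source")):
    """Return indexes of records in which any required field is missing or empty."""
    return [i for i, rec in enumerate(records) if any(not rec.get(f) for f in required)]

records = [
    {"term": "data field", "definition": "part of a record", "source": "source A"},
    {"term": "data field", "definition": "a column in a table", "source": "source B"},
    {"term": "data element", "definition": "a data unit", "source": ""},
]
conflicts = find_inconsistent_terms(records)
incomplete = find_incomplete(records)
```

The checks only flag candidates; deciding which definition is authoritative remains a matter for expert review.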
6.3.2 Selection of data categories
Data categories should first be selected from the following five categories:
6.3.2.1 Data describing terms
Mainly include:
- main entry term (Chinese);
- abbreviation (acronym);
- full name (when the main entry term is an abbreviation);
- synonym;
- corresponding words in other languages;
- symbol;
- word class;
- other grammatical information;
- phonetic notation;
- term annotation;
- cross-reference (see).
6.3.2.2 Data describing concepts
Mainly include:
- definition of concepts;
- description of concepts;
- context;
- examples, formulas, tables, graphics, etc.
6.3.2.3 Data describing the concept system
Mainly include:
- classification (taxonomy);
- descriptors (thesaurus);
- hypernym;
- broader term (if the hypernym is unclear);
- hyponym;
- narrower term (if the hyponym is unclear);
a servative.
6.3.2.4 Data used for management
Mainly include:
- Record identification;
- Language code;
- Document source code;
- Record creation date;
- Data revision date;
- Responsible person code;
- Regional limitation of use;
- Standardized or non-standardized;
- Current usage or obsolete usage;
- Preferred, admitted, rejected or replaced;
- Industry terms (industry scope);
- Terms used within an organization;
- Portability code.
6.3.2.5 Data representing documents
Mainly include:
- Type of document (such as standard, dictionary, encyclopedia, manual, etc.);
- Document information:
  - Author (editor);
  - Title;
  - Date of publication;
  - Publishing organization;
  - Volume and issue number of the publication;
  - Recommendation number;
  - Page number of the term information in the document.
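The five groups of data categories in 6.3.2.1 to 6.3.2.5 can be pictured as one record type per group, combined into an entry. The Python sketch below selects only a few categories from each group; all class and field names are hypothetical, and the record identifier `R0001` is an invented placeholder.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class TermData:                      # 6.3.2.1 data describing terms
    main_entry: str
    abbreviation: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    equivalents: Dict[str, str] = field(default_factory=dict)  # language code -> term
    word_class: Optional[str] = None

@dataclass
class ConceptData:                   # 6.3.2.2 data describing concepts
    definition: str
    context: Optional[str] = None
    examples: List[str] = field(default_factory=list)

@dataclass
class ConceptSystemData:             # 6.3.2.3 data describing the concept system
    classification: Optional[str] = None
    hypernym: Optional[str] = None
    hyponyms: List[str] = field(default_factory=list)

@dataclass
class ManagementData:                # 6.3.2.4 data used for management
    record_id: str
    language_code: str
    standardized: bool = False
    obsolete: bool = False

@dataclass
class DocumentData:                  # 6.3.2.5 data representing documents
    document_type: str
    title: str
    page: Optional[str] = None

@dataclass
class TerminologicalEntry:
    term: TermData
    concept: ConceptData
    management: ManagementData
    document: DocumentData
    concept_system: ConceptSystemData = field(default_factory=ConceptSystemData)

entry = TerminologicalEntry(
    term=TermData(main_entry="data element"),
    concept=ConceptData(
        definition="a data unit with distinguishing characteristics in a certain context"
    ),
    management=ManagementData(record_id="R0001", language_code="en", standardized=True),
    document=DocumentData(document_type="standard", title="GB/T 17532-1998"),
)
```

Keeping each group in its own type makes it straightforward to add or omit categories for a particular kind of terminology database.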
6.3.2.6 Other data items
It should be taken into account that different types of terminology databases require different data categories, and that different user groups (such as students, translators, subject-field experts) need different types of information. A versatile terminology database should be flexible and allow new data categories to be added.
6.3.3 Data structure
When conducting data analysis, a data structure model should be established.
6.3.3.1 Relationships between terminology data elements
Terminology data can be concept-oriented, repeatable or non-repeatable, and can be composed of other data elements. The data elements of an entry in the terminology database can be information related to the concept (such as definitions, descriptions, etc.) or information related to the term (such as grammatical information, context, etc.), as shown in Figure 2.
Figure 2 Concept-related and term-related information (labels recovered from the figure: discipline (specialty), source data, terms (corresponding words in other languages), grammar data)
6.3.3.2 Multilingual correspondence of terms
The correspondence between terms for the same concept in different languages can be of the following three types:
a) Complete correspondence
The concept systems are established independently in the two languages. The definition of the concept expressed by the term and the position of the concept in the concept system are exactly the same in both languages.
b) Incomplete correspondence
When the concepts in the two languages do not match completely, but the difference can be bridged by several terms for the two concepts, these terms should be listed under the entry, with an annotation pointing out the differences and similarities.
c) No correspondence
When a concept has no corresponding term in the other language, a translated term may be supplied (or the position left blank), and the position should be specially marked in the terminology database.
6.3.3.3 Description of data structure
The data structure can be described using an entity-relationship diagram (E-R diagram); see Appendix A. The E-R diagram of the terminology database should isolate each data element and describe the logical connections between the different data elements in the terminology database.
6.3.3.4 Modification of data structure
- Add a field;
- Add a hierarchy level;
- Change the order of fields;
- Subdivide and/or merge fields;
- Change field names;
- Change field data types;
- Other modifications.
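Several of the modifications listed in 6.3.3.4 can be sketched on records held as plain dictionaries. This is only an illustration under that assumption; the helper names are hypothetical, and in a real terminology database these changes would be carried out through the database management system.

```python
def add_field(records, name, default=None):
    """Add a field with a default value to every record."""
    return [{**rec, name: default} for rec in records]

def rename_field(records, old, new):
    """Change a field name while keeping the field order."""
    return [{(new if k == old else k): v for k, v in rec.items()} for rec in records]

def change_field_type(records, name, convert):
    """Change a field's data type by applying a conversion function."""
    return [{**rec, name: convert(rec[name])} for rec in records]

def merge_fields(records, names, new, sep=" "):
    """Merge several fields into one new field, removing the originals."""
    merged = []
    for rec in records:
        rest = {k: v for k, v in rec.items() if k not in names}
        rest[new] = sep.join(str(rec[n]) for n in names)
        merged.append(rest)
    return merged

records = [{"term": "data field", "volume": "3", "issue": "2"}]
step1 = add_field(records, "word_class", "noun")
step2 = rename_field(step1, "term", "main_entry")
step3 = change_field_type(step2, "volume", int)
step4 = merge_fields(records, ("volume", "issue"), "volume_issue", sep="/")
```

Each helper returns new records and leaves its input untouched, so a modification can be checked before it replaces the stored data.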
6.4 Requirements for term information sources
6.4.1 Scientificity
The concepts, definitions and terms in the database should comply with the provisions of GB/T 10112.
6.4.2 Authority
Terms should be selected from authoritative literature and approved by relevant experts.
6.4.3 Systematicity
The selection and collection of terms should be carried out systematically, and the integrity of the concept system should be guaranteed.
6.4.4 Consistency
When reviewing the terms in the terminology database, avoid having one concept in a professional field expressed by multiple terms, or one term referring to multiple concepts; in particular, avoid inconsistent definitions of the same concept.
6.5 Service methods of the terminology database
The service methods should be convenient for users, and can be selected according to needs when building the database. For example:
- query;
- screen display;
- printing;
- disk and CD-ROM recording;
- typesetting;
- microfilm;
- online retrieval;
- download through the Internet;
- other available data exchange methods.
7 Basic process of establishing a terminology database
7.1 Basic process and required documents for terminology database development
The basic process and required documents for terminology database development are shown in Table 1.
Table 1 Basic process and required documents for terminology database development
Initial period
  Planning and demonstration: Project application; User requirements report; Feasibility study report; System development task book; Work plan and task allocation document
Development period
  Requirement analysis: Data requirements specification; Domain analysis report; Functional requirements specification; Hardware requirements specification; Software requirements specification
  System design: Conceptual model design; Logical model design; Physical model design; Design review; System design detailed plan and work flow chart
  System implementation: Technical report; Programming specification; Data entry rules; Entry work sheet; Testing plan
  Review and acceptance: User report; Check report; Review report; Acceptance report; New search report
Operation period
  System operation: Operation manual; User manual; Maintenance manual; Data dictionary; (Operation) management rules; Complete system file archive and project work summary report
Note: The work process and documents described in Table 1 are necessary for establishing a high-quality, large-scale terminology database; they can be selected according to the scale and specific needs of system development.
7.2 Planning and demonstration stage
7.2.1 Preliminary preparation
Based on extensive and focused investigation and analysis of user needs, the project application, user requirements report and feasibility study report are submitted, addressing both need and practical feasibility. After approval by the relevant departments, a task book (contract, agreement) is drawn up and a project working group (organization) is formed to formally start system development.
7.2.2 Develop a work plan and task assignment
After the project is approved, a detailed work plan should be developed first, including: the objectives of each stage of the project; the work arrangements and completion dates of each stage; the division of labor, etc.
7.3 Requirements analysis stage
The requirements analysis should be detailed and specific, and should produce the necessary work documents.
7.3.1 Investigate and study the terminology data requirements in detail; collect, select and record the original data. Determine the scope of the terms to be included (which can be arranged by stage), and clarify the requirements for data types and data structures, data processing, input and output, etc. Complete the data requirements specification and the functional requirements specification.
7.3.2 Based on the data requirements analysis, set out the requirements for system functions and performance, including hardware requirements, software requirements, quality requirements, etc. Clarify the goals of the system to be developed, analyze the existing conditions, draw up a list of the software and hardware to be purchased or developed, and propose a plan for modifying and expanding the original system. Complete the software and hardware requirements specifications.
7.4 System design stage
7.4.1 Conceptual model design
Establish the entity-relationship diagram of the terminology database and write the conceptual model design specification.
7.4.2 Logical model design
Determine the logical model of the database based on the entity-relationship diagram and the type of database management system used, and write the logical model design specification.
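As an illustration of deriving a logical model from an entity-relationship diagram, the sketch below maps two assumed entities, concept and term, with a one-to-many relationship, onto relational tables using Python's built-in `sqlite3` module. The entities, table names and columns are assumptions; the actual diagram in Appendix A is not reproduced here, and the choice of a relational database management system is likewise only an example.

```python
import sqlite3

# Assumed logical model: one concept has many terms (one per language or synonym).
SCHEMA = """
CREATE TABLE concept (
    concept_id INTEGER PRIMARY KEY,
    definition TEXT NOT NULL
);
CREATE TABLE term (
    term_id    INTEGER PRIMARY KEY,
    concept_id INTEGER NOT NULL REFERENCES concept(concept_id),
    language   TEXT NOT NULL,
    term       TEXT NOT NULL
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

conn.execute(
    "INSERT INTO concept (concept_id, definition) VALUES (1, ?)",
    ("A database that stores terminological data.",),
)
conn.executemany(
    "INSERT INTO term (concept_id, language, term) VALUES (?, ?, ?)",
    [(1, "en", "terminological database"), (1, "zh", "术语数据库")],
)

# Retrieve all terms attached to concept 1, ordered by language code.
rows = conn.execute(
    "SELECT t.language, t.term FROM term AS t "
    "JOIN concept AS c ON c.concept_id = t.concept_id "
    "WHERE c.concept_id = 1 ORDER BY t.language"
).fetchall()
```

Placing the definition on the concept table and the designations on the term table keeps the model concept-oriented: adding a language adds rows, not columns.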
7.4.3 Physical model design
According to the functions provided by the database management system, map the logical model onto the system implementation and write the physical model design specification.
7.5 System implementation stage
7.5.1 Programming
Complete the programming work according to the requirements of the terminology database system and write the comprehensive program specification.
7.5.2 Data processing
Compile the entry rules according to the determined data types and data structures, and design the entry work sheet. Select, analyze, organize and review the terms, definitions, descriptions, examples, etc. to be entered into the database. Choose the corresponding foreign languages according to need and feasibility. If necessary, first establish the concept correspondence between Chinese and the other languages, and complete the normalization preprocessing of the data.
7.5.3 System debugging
Install and debug the hardware and software; test the system's functions, performance, quality, etc.; make design improvements to address the problems found; and, once the system has been refined, establish a simulation database and carry out a trial run.
7.5.4 Data entry
After the terminology information to be entered into the database has been normalized, large-scale data entry is carried out.
7.6 Review and acceptance
Review and acceptance should be carried out in a planned and organized manner.
7.6.1 Review
Review should run through all stages of the construction of the terminology database and accompany each work step.
7.6.1.1 Reviewers
The composition of the reviewers should take the following into account:
a) Experts in terminology, standardization, computing, linguistics, etc. related to the terminology database;
b) Users.
7.6.1.2 Review methods
a) Document review;
b) Meeting review;
c) System testing.
7.6.1.3 Review content
Review the corresponding work items of 7.1 to 7.4 in accordance with the requirements of Chapter 6 and other relevant national standards.
7.6.1.4 Review report
The review results should be recorded in writing, including:
a) review time;
b) review method;
c) review content;
d) reviewers;
e) review conclusions and opinions, etc.
For problems found in the review, handling opinions should be put forward, or the relevant personnel should be instructed to put them forward. If necessary, design improvements or corresponding measures should be made before a re-review, or a designated person should be assigned to carry out a follow-up review.
7.6.2 Acceptance
After the terminology database is completed, the design and development unit should submit an acceptance application report, and the relevant units shall organize the acceptance.
7.6.2.1 Formal acceptance should be conducted openly. The main organizer and person in charge shall be appointed by the relevant departments, but they should not be direct participants in the development of the database.
7.6.2.2 An acceptance meeting should be held for formal acceptance, and at least the following procedures should be followed: