Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet
ICS 01.020
National Standard of the People's Republic of China
GB/T 19708-2005/ISO 12199:2000 Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet (ISO 12199:2000, IDT)
Published on March 23, 2005
Issued by the General Administration of Quality Supervision, Inspection and Quarantine of the People's Republic of China and the Standardization Administration of the People's Republic of China
Implemented on October 1, 2005
This standard is identical to ISO 12199:2000 "Alphabetical ordering of multilingual terminological and lexicographical data represented in the Latin alphabet" (English version).
Appendices A and G of this standard are normative appendices; Appendices B, C, D, E and F are informative appendices.
This standard was proposed by the National Technical Committee for Standardization of Terminology.
This standard is under the jurisdiction of the National Technical Committee for Standardization of Terminology.
This standard was drafted by the China National Institute of Standardization.
The main drafters of this standard are: Yu Xinli, Ye Sheng, Chen Yuzhong, Zhang Zhiyun, Cheng Yonghong, Xu Junrong, Xiao Xia, Song Min, and Lu Lili.
Introduction
When terminology collections and dictionary material are compiled (in print or in database form), the internationally recognized rules for sorting terminology and dictionary material by the Latin alphabet need to be followed so that users can retrieve entries easily. The same rules also facilitate the exchange of terminology and dictionary material, so it is necessary to adopt the Latin alphabetical sorting rules specified in the International Standard.
Rules for alphabetical sorting of multilingual terminology and dictionary material using the Latin alphabet
1 Scope
This standard specifies rules for the alphabetical sorting of multilingual terminology and dictionary material (terms, term components or words) using the Latin alphabet. It takes into account the character sets of the various languages in which terminology and dictionary material are recorded using the Latin alphabet, as well as the character sets used when material in other languages is transcribed into the Latin alphabet in accordance with international conventions.
The alphabetical sorting rules given in this standard apply only where several languages are used simultaneously; they do not affect the alphabetical sorting used within each individual language.
The main text of this standard specifies the method for sorting character strings letter by letter. Appendix A gives rules for word-by-word sorting, which has been widely used in the past. Appendix B introduces two additional rules that help in sorting dictionary entries and terms. Appendix C gives rules for sorting chemical names. Appendix D lists the various characters of the Latin alphabet, and Appendix E lists the languages that use the Latin alphabet. Appendix F gives some languages using the Latin alphabet whose alphabetical order deviates from the order specified in this standard. Appendix G provides a normative specification of the rules of the main text in accordance with ISO/IEC 14651.
2 Normative references
The provisions of the following documents become provisions of this standard through reference in this standard. For dated references, subsequent amendments (excluding corrigenda) and revisions do not apply to this standard; however, parties to agreements based on this standard are encouraged to investigate whether the latest editions of these documents can be used. For undated references, the latest edition applies to this standard.
GB/T 4880-1991 Code for language names (ISO 639:1988)
GB/T 4880.2-2000 Code for language names Part 2: Three-letter code (ISO 639-2:1998)
GB 13000.1 Information technology Universal multiple-octet coded character set (UCS) Part 1: Architecture and Basic Multilingual Plane (GB 13000.1-1993, idt ISO/IEC 10646-1:1993)
GB/T 15237.1-2000 Terminology work Vocabulary Part 1: Theory and application (ISO 1087-1:2000)
GB/T 17532-1998 Terminology work Computer applications Vocabulary (ISO/DIS 1087-2.2:1996)
ISO/IEC 14651 Information technology International string ordering and comparison Method for comparing character strings and description of the common template tailorable ordering
3 Terms and definitions
The terms and definitions given in GB/T 15237.1-2000 and GB/T 17532-1998 and the following apply to this standard.
3.1
character
An element of a set of elements used to organize, control or represent data.
3.2
letter
A graphic symbol used alone or in combination with other letters to represent a phonological unit of spoken language.
3.3
digit
A character used to represent a numerical value or a number.
3.4
special character
A graphic symbol in the character set that is neither a letter, nor a digit, nor a spacer.
Example: The space character is a special character.
3.5
ligature
A symbol formed by joining two or more letters.
Note: In some cases, a ligature is regarded as a single letter.
3.6
polygraph
A sequence of two or more letters that, for some purposes, can be regarded as a single letter.
Note: Polygraphs consisting of two or three letters are called digraphs and trigraphs respectively.
3.7
diacritical mark
A mark that is placed above, below or through a letter or a group of letters, but is not a letter itself.
3.8
ordering
The operation of arranging strings into a definite order according to string comparison rules.
4 Preparation
In alphabetical sorting, characters are compared according to fixed rules. This standard specifies the rules for alphabetical order, but it does not cover how the strings to be sorted are selected, nor which modifications to the strings may be required for a given purpose. Before these sorting rules are applied, the following preparations may therefore be needed, depending on the situation:
- the strings relevant to the subject may first have to be selected, for example by extracting the terms related to the subject from a corpus;
- the strings may have to be modified, for example by changing a sentence-initial capital letter to lowercase or by changing a plural form to the singular;
- leading zeros may have to be added, for example in strings containing numbers, or elements of a string may have to be inverted;
- polygraphs may have to be resolved into sequences of independent letters.
When sorting rules are applied, data can in principle be sorted by several methods, with several independent comparisons determining the order. This standard, however, recommends using only one rule (character comparison) for sorting.
When sorting, only the characters that appear in the string and their order are considered. Apart from the sorting rules, no other knowledge about the words in the string is used; for example, no reference data or rules on grammar, pronunciation or semantics are used.
5 First-level sorting
5.1 First-level sorting value
When the strings to be sorted are compared, their first-level sorting values are considered first. Only when the first-level sorting values of two or more strings are equal is the next-level sorting value considered. For sorting in multiple languages, the following sorting rules should be used (for word-by-word sorting, see Appendix A).
5.2 First-level order
Numbers and letters have the following sorting values:
a) Numbers: 0 1 2 3 4 5 6 7 8 9
Note 1: If numbers are compared digit by digit from left to right, the following order is obtained: 1 10 100 11 110 111 12 19 190 2 21 3.
Note 2: If leading zeros have been added, the following order is obtained: 0001 0002 0003 0010 0011 0012 0019 0021 0100 0110 0111 0190.
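The two notes can be reproduced with a short Python sketch. Plain string comparison stands in here for the digit-by-digit comparison, and the padding width of four is taken from Note 2; the variable names are illustrative only.

```python
# The numbers from Notes 1 and 2, held as character strings.
numbers = ["1", "2", "3", "10", "11", "12", "19", "21", "100", "110", "111", "190"]

# Note 1: strings are compared digit by digit from left to right.
left_to_right = sorted(numbers)

# Note 2: the same strings after leading zeros are added (width 4).
padded = sorted(n.zfill(4) for n in numbers)

print(" ".join(left_to_right))
print(" ".join(padded))
```

Only the padded strings come out in numerical order, which is why adding leading zeros is listed among the preparations in Chapter 4.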
b) Basic alphabet:
aA bB cC dD eE fF gG hH iI jJ kK lL mM nN oO pP qQ rR sS tT uU vV wW xX yY zZ
Note 1: In order to use this order in a multilingual environment, it is necessary to avoid conflicts with individual languages. Appendix F lists examples of alphabetical sorting rules for some languages that deviate from this order.
In the first level of sorting, uppercase and lowercase letters are considered equal (not equal in the third level of sorting, see Chapter 7). Latin letters with diacritical marks are considered equal to the corresponding basic Latin letters (see Chapter 6). Special Latin letters and basic Latin letters are considered equal according to Table 1 of 5.3.
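As a minimal sketch of this first-level equality, the Python function below folds case and removes diacritical marks before comparison. It relies on Unicode decomposition from the standard `unicodedata` module rather than on the tables of this standard, so it is an approximation: special Latin letters such as those in Table 1 are not decomposed and would need an explicit mapping. The function name `first_level_key` and the word list are illustrative assumptions.

```python
import unicodedata

def first_level_key(text: str) -> str:
    """Reduce a string to a first-level form: no case, no diacritical marks."""
    # NFD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (Unicode category Mn) and keep the base letters.
    base = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    # Uppercase and lowercase letters are equal at this level.
    return base.lower()

words = ["Zebra", "éclair", "Eclipse", "apple", "Äpfel"]
print(sorted(words, key=first_level_key))
```

Strings whose keys are equal at this level (for example "role" and "Rôle") are separated only by the later sorting levels.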
Turkish distinguishes between ı/I and i/İ, whereas other languages have only the pair i/I. To sort multilingual data that includes Turkish text, i/I is extended as follows:
ı/I  U0131/U0049  dotless Latin letter I (Turkish)
i/I  U0069/U0049  Latin letter I (non-Turkish)
i/İ  U0069/U0130  dotted Latin letter I (Turkish)
It should also be noted that the dotless Latin small letter I with acute accent is represented in ordinary print by í (U00ED, Latin small letter I with acute accent). When sorting, however, it is regarded as equal to i (U0069, Latin small letter I) at the first level.
Note 2: In this standard, UXXXX indicates the position of the corresponding character in GB 13000.1, where each X is a hexadecimal digit. The names of Latin letters normally begin with "Latin small letter ..." or "Latin capital letter ...". When both the lowercase and the uppercase letter are meant, the term "Latin letter" is used. The words "Latin letter" may even be omitted where this causes no misunderstanding.
c) Letters of other alphabets
Letters of other alphabets follow their own order. The order among non-Latin alphabets shall be: Greek letters, Cyrillic letters, other letters:
Note: The order of letters of non-Latin alphabets is outside the scope of this standard. The order of letters of the Greek letters is as follows: EAM
All other characters, such as punctuation marks, are not considered at this level (see Chapter 8).
5.3 Equivalence between special Latin letters and basic letters
Special Latin letters are regarded as equal to basic Latin letters in accordance with Table 1. Uppercase and lowercase letters are likewise regarded as equal.
Table 1 Equivalence between special Latin letters and basic Latin letters
Character name in GB 13000.1
Latin letter AE
Latin letter B with hook
Latin letter C with hook
Latin letter D with hook
Latin letter ETH (Icelandic)
Latin letter G with hook
Latin letter H with bar
Latin letter K with hook
Latin small letter KRA (Greenlandic) (no corresponding uppercase letter)
Latin letter L with bar
Latin letter ENG (Sami)
Latin letter O with bar
Latin ligature OE
Latin small letter sharp S (German) (no corresponding uppercase letter)
Latin letter T with bar
6 Second-level sorting
6.1 Second-level sorting value
If the first-level sorting values of two character strings are equal, the second-level sorting values shall be used as specified in 6.2. The comparison proceeds from left to right.
6.2 Special Latin letters and letters with diacritical marks
Special Latin letters that are regarded as equal to basic Latin letters in Table 1 shall be ordered as given in Table 1. Diacritical marks shall be ordered as given in Table 2.
Note: In order to use this order in a multilingual environment, conflicts with the alphabetical order of individual languages need to be avoided. Appendix F lists examples of alphabetical orders in some languages that deviate from this order.
Table 2 Order of diacritical marks
Diacritical mark
Acute
Grave
Breve
Breve and acute
Breve and grave
Breve and hook above
Breve and tilde
Breve and dot below
Breve and comma
Circumflex
Circumflex and acute
Circumflex and grave
Circumflex and hook above
Circumflex and tilde
Circumflex and dot below
Circumflex below
Caron
balance person character derivative and 2-shaped character
Ring above
Ring above and acute
Diaeresis
Diaeresis and dot below
Diaeresis and macron
Double acute
upper add tree
Tilde
Dot above
Dot below (U0323)
add yi-shaped character
F: add and lower Add a late sign
Ogonek (small tail)
Macron (U0304)
Macron and dot below
Add a withdrawal sign before
Add a withdrawal sign after
Horn and acute
Horn and grave
Horn and hook above
Horn and tilde
Horn and dot below
Comma above and comma below the base letter (U0313 and U0326)
7 Third-level sorting
7.1 Third-level sorting values
If the first-level and second-level sorting values of two strings are equal, the third-level sorting values shall be used in accordance with 7.2. The comparison order shall be from left to right.
7.2 Order of lowercase and uppercase letters
Lowercase letters shall be placed before the corresponding uppercase letters (see 5.2 b)).
Note: The terms "lowercase letter" and "uppercase letter" refer, for example, to "a" and "A" respectively.
8 Fourth-level sorting
8.1 Fourth-level sorting value
If the first-level, second-level and third-level sorting values of two strings are equal, the fourth-level sorting values shall be used in accordance with 8.2. The comparison proceeds from left to right.
8.2 Sorting by special characters
Special characters shall be sorted according to the order of the default template in ISO/IEC 14651. For most special characters, this is the order in which they are listed in GB 13000.1.
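The interplay of the four levels can be sketched in Python as a key made of four parts that are compared one after another. This is an illustration under stated assumptions, not the normative method of Appendix G: the mark ranks in `MARK_RANK` are a hypothetical excerpt standing in for Table 2, letters without marks are assumed to sort before letters with marks, and code point order stands in for the ISO/IEC 14651 default template at the fourth level.

```python
import unicodedata

# Hypothetical, abbreviated ranks for a few combining marks (acute, grave,
# breve, circumflex); the real order is the one given in Table 2.
MARK_RANK = {"\u0301": 1, "\u0300": 2, "\u0306": 3, "\u0302": 4}
FALLBACK_RANK = 99  # assumption: unlisted marks share one rank

def sort_key(text):
    """Build a four-part key; the parts are compared in order, left to right."""
    base, marks, case, special = [], [], [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.category(ch) == "Mn":
            # Level 2: a combining mark is attached to the preceding letter.
            if marks:
                marks[-1].append(MARK_RANK.get(ch, FALLBACK_RANK))
        elif ch.isalnum():
            base.append(ch.lower())                # level 1: case and marks ignored
            marks.append([])                       # level 2: filled in above
            case.append(1 if ch.isupper() else 0)  # level 3: lowercase first
        else:
            # Level 4: special characters, by position and code point.
            special.append((len(base), ord(ch)))
    return (base, marks, case, special)

words = ["Rôle", "co-op", "role", "résumé", "Role", "coop", "rôle", "resume"]
print(sorted(words, key=sort_key))
```

With these assumptions the list comes out as coop, co-op, resume, résumé, role, Role, rôle, Rôle: a later level is consulted only when all earlier levels are equal.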
Note: In word-by-word sorting (see Appendix A), the space character and other characters may have a special function, namely as keyword separators.
Appendix A
(Normative Appendix)
Word-by-word sorting rules
A.1 Background
The main text of this standard only specifies rules for sorting character strings letter by letter. Word-by-word sorting is another widely used ordering system that can replace it. Table A.1 illustrates the difference between alphabetical (letter-by-letter) sorting and word-by-word sorting.
Table A.1 The difference between alphabetical sorting and word-by-word sorting
Alphabetical sorting    Word-by-word sorting
adhesive                ad hoc
ad hoc                  ad infinitum
ad infinitum            adhesive
adipose                 adipose
A.2 Multi-keyword sorting
The main text of this standard describes the sorting rules for a single keyword. When sorting by multiple keywords, the entries are first sorted by the first keyword according to these rules; entries that remain equal are then sorted by the next keyword, and so on, until all keywords have been considered or a unique order has been established.
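Multi-keyword sorting maps naturally onto tuple comparison in Python, where a later element is consulted only if all earlier elements are equal. The sketch below treats each space-separated word of the Table A.1 entries as one keyword and contrasts this with sorting on a single key that ignores spaces. `str.lower` is only a stand-in for the single-keyword rules of the main text, and the function names are illustrative assumptions.

```python
entries = ["adhesive", "ad hoc", "ad infinitum", "adipose"]

def single_key(word):
    # Stand-in for the single-keyword rules of the main text.
    return word.lower()

def letter_by_letter(entry):
    # One keyword per entry; the space character is ignored.
    return single_key(entry.replace(" ", ""))

def word_by_word(entry):
    # One keyword per word; the space character only separates keywords.
    return tuple(single_key(word) for word in entry.split(" "))

print(sorted(entries, key=letter_by_letter))
print(sorted(entries, key=word_by_word))
```

The first line prints the entries in the alphabetical order of Table A.1, the second in its word-by-word order.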
Note: A typical example of multi-keyword sorting is a list of delegates at a conference. The first keyword may be the name of the delegate's country, the second the delegate's surname, and the third the delegate's given name. In this example, if a country has only one delegate, that delegate's surname does not need to be considered.
A.3 Separators
In word-by-word sorting, the space character is generally used as the keyword separator (other characters can also be specified as separators). Separators only separate keywords and do not themselves take part in sorting. Once a string has been divided into a series of keywords, the sorting rules of this standard are applied to each keyword individually.
Note 1: In addition to the space character, other punctuation characters can be defined as keyword separators. Likewise, only some space characters may be defined as keyword separators, while other space characters are still treated as special characters within keywords. The choice depends on the type of string.
Note 2: If both the space character and the hyphen are used as keyword separators, the string "Word-by-word ordering as multiple-key ordering" is divided into the following keywords: <Word> <by> <word> <ordering> <as> <multiple> <key> <ordering>. Each keyword is placed between "<" and ">"; spaces have been added to improve readability.
A.4 Simple word-by-word sorting
If the text to be sorted using word-by-word sorting does not contain many special letters and diacritical marks, the provisions of this standard can be extended as follows:
- In the first-level sorting (see 5.2), the space character is added as item a), so that the original items a), b) and c) become items b), c) and d).
- In the fourth-level sorting, the space character is not regarded as a special character (see Chapter 8).
Note: Depending on the language and the type of string to be sorted, other special characters (such as the hyphen) may also be treated in the same way as the space character.
Appendix B
(Informative Appendix)
Special rules for sorting dictionaries and terms
B.1 Background
In dictionaries and terminology collections, additional sorting rules are sometimes required beyond those described in this standard. The features described in this appendix are not easy to express in the form specified in ISO/IEC 14651.
B.2 Position relative to the baseline
When sorting, it is sometimes necessary to distinguish the position of characters relative to the baseline, as in m², m and m₂. It is recommended to make this distinction together with case at the third level of sorting (see Chapter 7). The order of positions relative to the baseline can be determined from Table B.1.
Table B.1 Position of characters relative to the baseline
Characters on the baseline
Characters above the baseline (superscript characters)
Characters below the baseline (subscript characters)
B.3 Sorting by font
If the first to fourth levels of sorting cannot produce a unique order, the typeface may be used as a fifth level of sorting. Sorting by font can follow Table B.2.
Table B.2 Font order
Font name    Example
Roman        abcdefghij
Bold         abcdefghij
Italic       abcdefghij
Appendix C
(Informative)
Ordering rules for chemical names
C.1 Background
There is currently no generally accepted ordering rule for chemical names. If necessary, this standard may be used in conjunction with the extended rules for word-by-word ordering described in Appendix A to order chemical names. However, some indexes and databases, particularly those of the Chemical Abstracts Service (CAS), use a specially designed multi-key field ordering system. The main features of this system are outlined below.
C.2 Division into three key fields
C.2.1 Parent name
The first key field contains the parent name, which usually consists of Roman letters and blank characters and may be interspersed with italic letters, Greek letters, digits or special characters (such as punctuation marks).
C.2.2 First elements
The second key field contains the first elements, that is, all characters before the first Roman letter.
C.2.3 Other elements
The third key field contains the non-first elements, that is, all remaining characters.
Note: The name "2-Butanone-1,1,1-d3, 3,3-dimethyl-" can be divided into the following key fields: <Butanone dimethyl> <2-> <-1,1,1-d3, 3,3->.
C.3 Sorting rules within each key field
The first key field is sorted according to the rules of the main text of this standard. In the second and third key fields, the order is as follows:
- letters of the Latin alphabet (in italics), in the order specified in 5.2 b);
- letters of the Greek alphabet, in the order specified in 5.2 c);
- numbers, in numerical order.
C.4 Sorting result example
Table C.1 compares the results obtained by sorting according to the rules described in this appendix with the results obtained by sorting according to the rules of the main text of this standard.
Table C.1 Comparison of the results of sorting under the two rules
According to the order in Appendix C
Bromine fluoride (BrF, )
Bromine fluoride (BrF,)
2-Butanol, (R)-
2-Butanol, (S)-
2-Butanol, sodium salt, (S)-
2 Bntael.1 hlt
[-But.anonr:
According to the general order
1-Butanone, 1-phenyl-
2-Butanol, 2-chloro-
2-Butanol, 4-(trimethylstannyl)-
2-Butanol, (R)-
2-Butanol, (S)-