GB 18030-2000 Information technology - Extension of the basic set of Chinese coded character sets for information exchange

Basic Information

Standard ID: GB 18030-2000

Standard Name: Information technology - Extension of the basic set of Chinese coded character sets for information exchange

Chinese Name: 信息技术 信息交换用汉字编码字符集基本集的扩充

Standard category: National Standard (GB)

Status: Abolished

Date of Release: 2000-03-01

Date of Implementation: 2000-07-01

Date of Expiration: 2006-05-01

Standard classification number

Standard ICS number: Information technology, office machinery and equipment >> 35.040 Character sets and information coding

Standard Classification Number: Electronic Components and Information Technology >> Information Processing Technology >> L71 Coding, Character Set, Character Recognition

Associated standards

Replacement status: Replaced by GB 18030-2005

Publication information

Publishing house: China Standards Press

ISBN: 155066.1-17504

Publication date: 2004-03-26

Other information

Review date: 2004-10-14

Drafting unit: Electronic Industry Standardization Institute of Ministry of Information Industry

Focal point unit: National Information Technology Standardization Technical Committee

Publishing department: State Administration of Quality and Technical Supervision

Competent authority: National Standardization Administration

Introduction to standards:

This standard applies to the processing, exchange, storage, transmission, display, input and output of graphic character information.

Some standard content:

GB18030—2000
This standard, as a coded character set standard of the GB/T 2311 system, specifies the basic graphic characters for information interchange and the hexadecimal representation of their binary codes.
This standard is applicable to the processing, exchange, storage, transmission, display, input and output of graphic character information. This standard is an extension of GB 2312.
This standard specifies in detail the single-byte and double-byte encoding of graphic characters and lays down the architecture of the four-byte encoding. Appendix A, Appendix B, Appendix C, Appendix D and Appendix E are standard appendices of this standard. From its date of entry into force, this standard replaces the guiding technical document "Chinese Character Internal Code Extension Specification (GBK)" Version 1.0, which was jointly issued and implemented by the Standardization Department of the former State Technical Supervision Bureau and the Science and Technology and Quality Supervision Department of the former Ministry of Electronics Industry as Technical Supervision Letter [1995] No. 229.
This standard was proposed by the Ministry of Information Industry of the People's Republic of China. This standard is under the jurisdiction of the Electronic Industry Standardization Institute of the Ministry of Information Industry. This standard was drafted by: Electronic Industry Standardization Institute of the Ministry of Information Industry, Institute of Computer Technology of Peking University, Founder Group of Peking University, Beijing Founder New World Network Technology Co., Ltd., Sizhu Group, China Science and Technology Software Institute, Great Wall Software Company, Sitong Lifang Company, Chinasoft Corporation, Kingsoft Software Company, Lenovo. The main drafters of this standard are Chen Wangqiu, Huang Jiang, Hu Wanjin, Zhang Jianguo and Chen Zhuang.
National Standard of the People's Republic of China
Information technology - Chinese ideograms coded character set for information interchange - Extension for the basic set
GB 18030-2000
1 Scope
This standard, as a coded character set standard of the GB/T 2311 system, specifies the graphic characters for information interchange and the hexadecimal representation of their binary codes.
This standard applies to the processing, exchange, storage, transmission, display, input and output of graphic character information.
2 Referenced standards
The following standards contain provisions which, through reference in this standard, constitute provisions of this standard. At the time of publication, the editions indicated were valid. All standards are subject to revision, and parties using this standard should investigate the possibility of applying the most recent editions of the standards listed below.
GB/T 2311-1990 Information processing - 7-bit and 8-bit coded character sets - Code extension techniques (eqv ISO 2022:1986)
GB 2312-1980 Chinese coded character set for information interchange - Basic set
GB/T 11383-1989 Information processing - 8-bit code for information interchange - Structure and encoding rules (idt ISO 4873:1986)
GB/T 12345-1990 Chinese coded character set for information interchange - Supplementary set
GB 13000.1-1993 Information technology - Universal multiple-octet coded character set (UCS) - Part 1: Architecture and basic multilingual plane (idt ISO/IEC 10646-1:1993)
3 Principles
This standard is downward compatible with the de facto internal code standard that corresponds to the national standard GB 2312 information processing interchange code. This standard supports all Chinese, Japanese and Korean (CJK) unified Chinese characters in GB 13000.1, including all characters of CJK Unified Ideographs Extension A.
4 Definitions
This standard adopts the following definitions.
4.1 Repertoire
A specified set of characters represented by a coded character set.
4.2 Character
An element of a set of elements used to organize, control or represent data.
4.3 Coded character
A character together with its coded representation.
4.4 Reserved zone
An area reserved in this standard for future international standardization.
Approved by the State Administration of Quality and Technical Supervision on March 17, 2000. Implemented on March 17, 2000
5 Character repertoire
The characters included in this standard are encoded as single-byte, double-byte and four-byte codes.
5.1 Single-byte part
In this standard, the single-byte part includes all 128 characters from 0x00 to 0x7F of GB/T 11383.
5.2 Double-byte part
In this standard, the double-byte part includes the following:
a) all CJK unified Chinese characters of GB 13000.1-1993;
b) 21 Chinese characters selected from the CJK compatibility area of GB 13000.1-1993;
c) 139 graphic characters used in the Taiwan region of China that are included in GB 13000.1-1993 but not in GB 2312;
d) 31 other characters included in GB 13000.1-1993;
e) the non-Chinese characters in GB 2312-1980;
f) 19 vertical punctuation marks in GB/T 12345-1990;
g) 10 lowercase Roman numerals not included in GB 2312-1980;
h) 5 Chinese phonetic letters with tones not included in GB 2312-1980, and the letters ɑ and ɡ;
i) the Chinese numeral "〇";
j) 13 ideographic description characters;
k) 80 Chinese characters and radicals/components added to supplement GB 13000.1-1993;
l) the double-byte encoded Euro symbol.
5.3 Four-byte part
The four-byte part of this standard includes all characters in GB 13000.1, including CJK Unified Ideographs Extension A, except those already encoded in the double-byte part above.
6 Overall structure
In this standard, characters are encoded in three ways: as single-byte, double-byte and four-byte codes. Every byte in this standard is a string of eight bits, and any eight-bit value is written in hexadecimal notation from 0x00 to 0xFF. The single-byte part adopts the coding structure and rules of GB/T 11383 and uses code positions 0x00 to 0x7F. The double-byte part represents a character with two bytes: the first byte ranges from 0x81 to 0xFE, and the second byte ranges from 0x40 to 0x7E or from 0x80 to 0xFE. The four-byte part uses 0x30 to 0x39, which GB/T 11383 does not adopt, as the suffix that extends the double-byte code. The extended four-byte codes range from 0x81308130 to 0xFE39FE39. See Table 1 and Figure 1.
Table 1 Code position allocation
Number of bytes | Code space | Number of code positions
Single byte | 0x00~0x7F | 128 code positions
Double byte | First byte 0x81~0xFE; second byte 0x40~0x7E, 0x80~0xFE | 23940 code positions
Four bytes | First byte 0x81~0xFE; second byte 0x30~0x39; third byte 0x81~0xFE; fourth byte 0x30~0x39 | 1587600 code positions
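The byte ranges in Table 1 are enough to decide how long each code is from its first two bytes. The following is a minimal sketch of that idea, not part of the standard: the function names `sequence_length` and `split_codes` and the three constants are illustrative assumptions, and the sketch checks only the byte ranges, not whether a code position is actually assigned to a character.

```python
def sequence_length(data: bytes, pos: int = 0) -> int:
    """Return 1, 2 or 4: the length of the code starting at data[pos]."""
    b1 = data[pos]
    if b1 <= 0x7F:                       # single-byte part: 0x00~0x7F
        return 1
    if not 0x81 <= b1 <= 0xFE:           # 0x80 and 0xFF are not first bytes
        raise ValueError(f"invalid first byte 0x{b1:02X}")
    if pos + 1 >= len(data):
        raise ValueError("truncated sequence")
    b2 = data[pos + 1]
    if 0x40 <= b2 <= 0x7E or 0x80 <= b2 <= 0xFE:
        return 2                         # double-byte part
    if 0x30 <= b2 <= 0x39:               # second byte announces a four-byte code
        if pos + 3 >= len(data):
            raise ValueError("truncated four-byte sequence")
        b3, b4 = data[pos + 2], data[pos + 3]
        if 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39:
            return 4
    raise ValueError(f"invalid sequence at offset {pos}")


def split_codes(data: bytes) -> list:
    """Split a byte string into its single-, double- and four-byte codes."""
    codes, pos = [], 0
    while pos < len(data):
        n = sequence_length(data, pos)
        codes.append(data[pos:pos + n])
        pos += n
    return codes


# Code position counts derived from the ranges in Table 1.
SINGLE_POSITIONS = 0x7F - 0x00 + 1
DOUBLE_POSITIONS = (0xFE - 0x81 + 1) * ((0x7E - 0x40 + 1) + (0xFE - 0x80 + 1))
FOUR_POSITIONS = (0xFE - 0x81 + 1) * 10 * (0xFE - 0x81 + 1) * 10
```

The three constants reproduce the counts in the last column of Table 1, which is a quick consistency check on the reconstructed byte ranges.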
Four-byte codes are enumerated with the fourth byte varying first, over code positions 0x30 to 0x39; then the third byte, over 0x81 to 0xFE; then the second byte, over 0x30 to 0x39; and finally the first byte, over 0x81 to 0xFE. That is:
0x81308130 to 0x81308139;
0x81308230 to 0x81308239;
...
0x8130FE30 to 0x8130FE39;
0x81318130 to 0x81318139;
...
0x8131FE30 to 0x8131FE39;
...
0x82308130 to 0x82308139;
...
0x8230FE30 to 0x8230FE39;
...
0xFE308130 to 0xFE308139;
...
0xFE39FE30 to 0xFE39FE39.
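The enumeration order above amounts to a mixed-radix number whose digits have 126, 10, 126 and 10 possible values. As a sketch under that reading, the hypothetical helpers below (the names `four_byte_index` and `four_byte_from_index` are not from the standard) convert between a four-byte code and its zero-based position in the sequence.

```python
def four_byte_index(code: bytes) -> int:
    """Zero-based position of a four-byte code in the enumeration order."""
    if len(code) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = code
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a valid four-byte code")
    # The fourth byte varies fastest, the first byte slowest.
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def four_byte_from_index(index: int) -> bytes:
    """Inverse of four_byte_index."""
    if not 0 <= index < 126 * 10 * 126 * 10:
        raise ValueError("index out of range")
    index, d4 = divmod(index, 10)    # fourth byte digit
    index, d3 = divmod(index, 126)   # third byte digit
    d1, d2 = divmod(index, 10)       # first and second byte digits
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])
```

With this indexing, 0x81308130 is position 0, 0x81318130 is position 1260, and each first-byte value covers 12,600 consecutive positions.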
Note: In this standard, numbers prefixed with 0x are expressed in hexadecimal, and numbers without the 0x prefix are expressed in decimal.
Figure 1 Overall structure diagram (panels: single-byte structure; double-byte structure; four-byte overall structure)
7 Character arrangement sequence
7.1 Arrangement sequence of characters in the single-byte part
All characters in the single-byte part of this standard are arranged in the order of the corresponding characters in GB/T 11383. See Figure 2.
7.2 Arrangement sequence of characters in the double-byte part
The order of characters in the double-byte part of this standard is given in Appendix A.
7.3 Arrangement sequence of characters in the four-byte part
The 50,400 code positions from 0x81308130 to 0x8439FE39 correspond to all GB 13000.1 characters not included in the double-byte part of this standard. They are arranged in the order of the corresponding characters in GB 13000.1, and the remaining code positions are reserved.
The 12,600 code positions from 0x85308130 to 0x8539FE39 are reserved by this standard for future expansion of Chinese characters.
The 126,000 code positions from 0x86308130 to 0x8F39FE39 are reserved by this standard for future expansion of Chinese characters.
The 1,058,400 code positions from 0x90308130 to 0xE339FE39 correspond to the 16 supplementary planes of GB 13000.1. The characters are arranged strictly in the code position order of those 16 planes, and the remaining code positions are reserved.
The 315,000 code positions from 0xE4308130 to 0xFC39FE39 are reserved for future expansion of the standard.
The 25,200 code positions from 0xFD308130 to 0xFE39FE39 are user-defined areas.
8 Code position allocation
8.1 Code position allocation of the single-byte part
In this standard, the code position allocation of the single-byte part follows GB/T 11383. See Figure 2.
Figure 2 Code position diagram of the single-byte area
8.2 Code position allocation of the double-byte part
In this standard, the code positions of the double-byte part fall into two ranges, 0x8140 to 0xFE7E and 0x8180 to 0xFEFE, with a total of 23940 code positions. See Figure 3 and Table 2.
Figure 3 Coding space structure of the double-byte part
Graphic symbol area (1038 code positions): double-byte area 1, 846 code positions; double-byte area 5 (first bytes 0xA8-0xA9), 192 code positions
Chinese character area (21008 code positions): double-byte area 2, 6768 code positions; double-byte area 3, 6080 code positions; double-byte area 4, 8160 code positions
User-defined area (1894 code positions): double-byte user area 1, 564 code positions; double-byte user area 2, 658 code positions; double-byte user area 3, 672 code positions
Total: 23940 code positions
Table 2 Code position arrangement of the double-byte part
Category | Area | Code position range | Number of code positions
Symbol area | Double-byte area 1 | A1A1~A9FE | 846
Symbol area | Double-byte area 5 | A840~A9A0 | 192
Chinese character area | Double-byte area 2 | B0A1~F7FE | 6768
Chinese character area | Double-byte area 3 | 8140~A0FE | 6080
Chinese character area | Double-byte area 4 | AA40~FEA0 | 8160
User-defined area | Double-byte user area 1 | AAA1~AFFE | 564
User-defined area | Double-byte user area 2 | F8A1~FEFE | 658
User-defined area | Double-byte user area 3 | A140~A7A0 | 672
In this standard, within the double-byte Chinese character area (double-byte areas 2, 3 and 4), the CJK unified Chinese characters come first and the supplementary Chinese characters follow. The Chinese characters of GB 2312 are arranged in double-byte area 2. The 21 CJK compatibility Chinese characters selected from GB 13000.1 are encoded in double-byte area 4, from 0xFD9C to 0xFDA0 and from 0xFE40 to 0xFE4F. The 80 additional Chinese characters and radicals/components are also encoded in double-byte area 4. The 139 graphic characters used in the Taiwan region of China that are included in GB 13000.1 but not in GB 2312, the Chinese numeral "〇" and the 13 ideographic description characters are encoded in double-byte area 5. The non-Chinese characters of GB 2312, the 5 Chinese phonetic letters with tones not included in GB 2312, the letters ɑ and ɡ, the 10 lowercase Roman numerals not included in GB 2312, the 19 vertical punctuation marks of GB/T 12345 and the double-byte encoded Euro symbol (code 0xA2E3) are encoded in double-byte area 1.
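The division of the double-byte space into areas can be checked mechanically. The sketch below assumes that each code position range in Table 2 denotes a rectangular block (a first-byte range combined with a second-byte range, always skipping 0x7F); that reading is an inference from the code position counts rather than something the excerpt states outright, and the names `AREAS`, `area_size` and `area_of` are illustrative only.

```python
# (name, first-byte low, first-byte high, second-byte low, second-byte high)
AREAS = [
    ("double-byte area 1",      0xA1, 0xA9, 0xA1, 0xFE),
    ("double-byte area 2",      0xB0, 0xF7, 0xA1, 0xFE),
    ("double-byte area 3",      0x81, 0xA0, 0x40, 0xFE),
    ("double-byte area 4",      0xAA, 0xFE, 0x40, 0xA0),
    ("double-byte area 5",      0xA8, 0xA9, 0x40, 0xA0),
    ("double-byte user area 1", 0xAA, 0xAF, 0xA1, 0xFE),
    ("double-byte user area 2", 0xF8, 0xFE, 0xA1, 0xFE),
    ("double-byte user area 3", 0xA1, 0xA7, 0x40, 0xA0),
]


def is_second_byte(b: int) -> bool:
    """Second bytes run over 0x40~0x7E and 0x80~0xFE; 0x7F is excluded."""
    return 0x40 <= b <= 0x7E or 0x80 <= b <= 0xFE


def area_size(f_lo: int, f_hi: int, s_lo: int, s_hi: int) -> int:
    """Number of code positions in one rectangular block."""
    seconds = sum(1 for b in range(s_lo, s_hi + 1) if is_second_byte(b))
    return (f_hi - f_lo + 1) * seconds


def area_of(code: bytes) -> str:
    """Name of the area that contains a double-byte code."""
    if len(code) != 2 or not is_second_byte(code[1]):
        raise ValueError("not a valid double-byte code")
    b1, b2 = code
    for name, f_lo, f_hi, s_lo, s_hi in AREAS:
        if f_lo <= b1 <= f_hi and s_lo <= b2 <= s_hi:
            return name
    raise ValueError("first byte outside 0x81~0xFE")


SIZES = {name: area_size(*block) for name, *block in AREAS}
```

Under this assumption the eight block sizes match the per-area counts and add up to the 23940 code positions of the double-byte part, and the Euro symbol code 0xA2E3 lands in double-byte area 1 as the text says.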
8.3 Code position allocation of the four-byte part
For the code position allocation of the four-byte part, see 7.3.
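Because every first-byte value covers 10 × 126 × 10 = 12,600 four-byte codes, the zones listed in 7.3 can be identified from the first byte alone. The following sketch illustrates this; the names `ZONES`, `zone_size`, `zone_of` and `supplementary_code_point` are hypothetical. The last function also assumes that 0x90308130 corresponds to the first code position of the first supplementary plane (U+10000 in Unicode terms), which matches the ordering rule in 7.3 but is not spelled out in this excerpt.

```python
PER_FIRST_BYTE = 10 * 126 * 10   # four-byte codes sharing one first byte

# (first-byte low, first-byte high, use), following 7.3
ZONES = [
    (0x81, 0x84, "GB 13000.1 characters not in the double-byte part"),
    (0x85, 0x85, "reserved for future expansion of Chinese characters"),
    (0x86, 0x8F, "reserved for future expansion of Chinese characters"),
    (0x90, 0xE3, "16 supplementary planes of GB 13000.1"),
    (0xE4, 0xFC, "reserved for future expansion of the standard"),
    (0xFD, 0xFE, "user-defined"),
]


def zone_size(lo: int, hi: int) -> int:
    """Number of code positions whose first byte lies in lo..hi."""
    return (hi - lo + 1) * PER_FIRST_BYTE


def _check(code: bytes) -> None:
    if len(code) != 4:
        raise ValueError("expected exactly four bytes")
    b1, b2, b3, b4 = code
    if not (0x81 <= b1 <= 0xFE and 0x30 <= b2 <= 0x39
            and 0x81 <= b3 <= 0xFE and 0x30 <= b4 <= 0x39):
        raise ValueError("not a valid four-byte code")


def zone_of(code: bytes) -> str:
    """Use of the zone that contains a four-byte code."""
    _check(code)
    for lo, hi, use in ZONES:
        if lo <= code[0] <= hi:
            return use
    raise AssertionError("unreachable: the zones cover 0x81~0xFE")


def supplementary_code_point(code: bytes) -> int:
    """Code point for a code in 0x90308130~0xE339FE39 (assumed linear mapping)."""
    _check(code)
    b1, b2, b3, b4 = code
    if not 0x90 <= b1 <= 0xE3:
        raise ValueError("outside the supplementary-plane zone")
    offset = (((b1 - 0x90) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)
    if offset >= 16 * 0x10000:       # positions past the 16 planes are reserved
        raise ValueError("reserved code position")
    return 0x10000 + offset
```

The zone sizes computed this way reproduce the counts in 7.3 (50,400; 12,600; 126,000; 1,058,400; 315,000; 25,200) and sum to 1,587,600. The 16 planes need only 1,048,576 of the 1,058,400 positions in their zone, which is why the remainder is reserved.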
Appendix A
(Standard Appendix)
Double-byte character code table
Table A1 gives the GB 13000.1 codes and the corresponding glyphs for all characters in the double-byte part of this standard. The example that explains how to read the table carries the following labels: first byte; second byte (high); second byte (low); area (for example, double-byte area 3); GB 13000.1 glyph; GB 13000.1 code.