Character Sets and Character Encodings #200

Open
teazean opened this issue Jan 2, 2018 · 0 comments

teazean commented Jan 2, 2018

Reading: https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/

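  • Character set (charset): the repertoire of characters, each identified by a number (its code point)
  • Character encoding: how the characters of a character set are actually stored as bytes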
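The distinction between the two terms can be sketched in a few lines of Python. This is only an illustration, not something from the linked article: the sample character "中" and the variable names are assumptions chosen for the example. The character has a single code point in the Unicode character set, but each encoding stores it as different bytes.

```python
# A minimal sketch: one character, one code point, several byte representations.
ch = "中"  # sample character, chosen only for illustration

# The character set assigns the character a number (its code point).
code_point = ord(ch)

# An encoding decides how that character is stored as bytes.
utf8_bytes = ch.encode("utf-8")       # 3 bytes in UTF-8
utf16_bytes = ch.encode("utf-16-be")  # 2 bytes in UTF-16 (big-endian, no BOM)
gbk_bytes = ch.encode("gbk")          # 2 bytes in GBK, a different character set

print(f"code point: U+{code_point:04X}")
print("utf-8 :", utf8_bytes.hex())
print("utf-16:", utf16_bytes.hex())
print("gbk   :", gbk_bytes.hex())
```

Note that the GBK bytes are not derived from the Unicode code point at all; GBK has its own numbering, which is why text must be decoded with the same encoding it was written in.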

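At first there was only the ASCII character set, which defines just 128 characters (codes 0 to 127). People then began using the remaining 128 values of a byte (128 to 255) for all sorts of extensions, known as code pages. There were many of these code pages, and they were not compatible with one another. The approach broke down completely for Asian scripts, which have far more characters than a single byte can hold, so languages such as Chinese ended up with their own character sets, for example GBK. To unify all of these character sets, the Unicode character set was later created as an international standard. But storing every Unicode character in a fixed number of bytes would waste a great deal of space, which led to encodings such as UTF-8 and UTF-16.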
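Two of the points above are easy to check with a short Python sketch. The byte value 0xE9 and the sample strings are assumptions picked for illustration. The first part shows the same byte meaning different characters under two code pages; the second compares a fixed-width encoding (UTF-32) with the variable-width UTF-8 and UTF-16.

```python
# Part 1: one byte above 127 means different things in different code pages.
raw = bytes([0xE9])
as_latin1 = raw.decode("latin-1")  # Western European: 'é'
as_cp1251 = raw.decode("cp1251")   # Cyrillic (Windows-1251): 'й'
print(as_latin1, as_cp1251)

# Part 2: bytes needed per encoding for ASCII-only and for Chinese text.
def encoded_sizes(text: str) -> dict:
    """Return the byte length of `text` in a few Unicode encodings."""
    # The "-be" variants are used so that no byte-order mark is added.
    return {
        "utf-8": len(text.encode("utf-8")),
        "utf-16": len(text.encode("utf-16-be")),
        "utf-32": len(text.encode("utf-32-be")),
    }

ascii_sizes = encoded_sizes("hello")
chinese_sizes = encoded_sizes("中文")
print(ascii_sizes)
print(chinese_sizes)
```

For the ASCII-only string, the fixed four bytes per character of UTF-32 take four times the space of UTF-8, which is the waste the paragraph above refers to.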
