Should the UtfCodepoint type be renamed?
#4107
-
Another data point: the Gleam compiler error for an invalid Unicode code point says "Invalid Unicode codepoint" rather than "Invalid UTF codepoint".
(That error was generated on the Gleam tour page.)
-
Hello! Thank you for this thorough write-up. I agree with you that the current name is a mistake, and it would be good to correct it. I spoke to the team and we're not in agreement that it's worth the deprecation. The old name would continue to exist until Gleam v2 (which may never happen), emitting a warning if used, and that's baggage the language ideally would not carry. Do you have thoughts on this?
-
I wonder if the UtfCodepoint type and its associated functions in the stdlib string module should be renamed. I haven't really seen the term "UTF codepoint" used outside of Gleam's stdlib and prelude. That doesn't necessarily mean the term is technically incorrect, but it does suggest that it may not be the best term to use.
The UtfCodepoint type and its functions in the stdlib deal with what is generally called a "Unicode code point", or just a "code point". For example, in these docs the thing that Gleam calls UtfCodepoint is called a Unicode code point, or simply a code point, rather than a UTF code point. I've pulled a couple of definitions from the Unicode glossary, for reference.
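To make the distinction concrete: a code point is a single number in the Unicode codespace (0 to 0x10FFFF), while UTF-8, UTF-16 and UTF-32 are different ways of encoding that number as bytes. The sketch below uses Python purely for brevity (it is not Gleam's API), and `describe` is a hypothetical helper. It shows the same code point taking a different number of bytes under each encoding, which is why "UTF" names an encoding rather than the code point itself.

```python
# Sketch: one Unicode code point, several UTF encodings.
# The code point is just an integer; UTF-8/16/32 are byte encodings of it.

def describe(ch: str) -> dict:
    """Return the code point of a one-code-point string and its encoded sizes."""
    # In Python, len() on a str counts code points, not bytes or code units.
    if len(ch) != 1:
        raise ValueError("expected exactly one code point")
    cp = ord(ch)  # the code point as an integer
    return {
        "code_point": f"U+{cp:04X}",
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-le")),  # -le variant: no byte-order mark
        "utf32_bytes": len(ch.encode("utf-32-le")),
    }


# Escapes avoid any ambiguity about precomposed vs. decomposed characters.
for ch in ["A", "\u00e9", "\U0001F600"]:
    print(describe(ch))
```

Each input is exactly one code point, yet U+00E9 needs 2 bytes in UTF-8 and U+1F600 needs 4 bytes in both UTF-8 and UTF-16 (a surrogate pair), while UTF-32 always uses 4.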
Potential Alternative Names
So it seems that a better name may be something like:
- UnicodeCodepoint (or UnicodeCodePoint): explicit, and consistent with accepted definitions, but potentially awkward-sounding
- Codepoint (or CodePoint): shorter and nicer to type, but it assumes we're working in the Unicode codespace (which is probably a fine assumption for Gleam)
- Ucp: short and looks nice in the function names, but introducing an acronym like this is maybe not very Gleamy

Current functions in the stdlib:
They could look like this with one of those new names:
or
or
Summary
UtfCodepoint does not seem to follow the general usage of the terms involved (Unicode, UTF, code point).