Independent vowels are confusing #95
Labels
doc:guru
gap
i:encoding
Characters & encoding
l:pa
Punjabi, Gurmukhi script
p:advanced
s:guru
Gurmukhi script
x:guru
Like other Indic scripts, Gurmukhi has independent vowels that can be visualised as a sequence of 2 code points, although Unicode provides a precomposed code point for each independent vowel. The precomposed code points and the decomposed sequences that may render identically are not canonically equivalent in Unicode, so normalization does not unify them. This can cause problems for users who are unaware of the difference.
This is particularly pronounced for Gurmukhi because in principle independent vowels are (visually) a vowel carrier plus a vowel sign. For more information see Standalone vowels.
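The lack of canonical equivalence can be checked directly. The following is a minimal sketch using Python's standard `unicodedata` module; the helper name `equal_under` is made up for illustration. It compares ਅ + ਾ (U+0A05 U+0A3E) with ਆ (U+0A06) under each normalization form, and none of the forms should make the two strings equal.

```python
import unicodedata

# Decomposed spelling: GURMUKHI LETTER A + GURMUKHI VOWEL SIGN AA
decomposed = "\u0A05\u0A3E"
# Precomposed spelling recommended by Unicode: GURMUKHI LETTER AA
precomposed = "\u0A06"


def equal_under(form: str) -> bool:
    """Return True if both spellings normalize to the same string (hypothetical helper)."""
    return (unicodedata.normalize(form, decomposed)
            == unicodedata.normalize(form, precomposed))


for form in ("NFC", "NFD", "NFKC", "NFKD"):
    # U+0A06 has no decomposition mapping, so the spellings stay distinct.
    print(form, equal_under(form))
```

Because normalization alone does not help, any matching has to be done with script-specific rules.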
Searching Google for the word ਅਾਲੂ (potato), where the initial 'a' sound is written as 2 code points (U+0A05 U+0A3E) rather than the precomposed code point recommended by Unicode (U+0A06), returns 2,570 pages, compared to 361,000 for the spelling with the precomposed character. While this is small in comparison (0.7%), it is large enough to indicate an issue.
Browsers should be able to recognise the decomposed sequences and treat them as equivalent to the precomposed code points for searching, sorting, and other matching operations.
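As a rough illustration of what such equivalence could look like, here is a sketch that folds decomposed sequences into the precomposed letters before comparing strings. The mapping is my reading of the "do not use" sequences listed for Gurmukhi vowel letters in the Unicode Standard and should be checked against that table; the names `FOLD` and `fold_independent_vowels` are hypothetical, and this is not how any browser currently behaves.

```python
import re

# Assumed mapping: decomposed sequence -> precomposed independent vowel.
# Verify against the Gurmukhi vowel letter table in the Unicode Standard.
FOLD = {
    "\u0A05\u0A3E": "\u0A06",  # ਅ + ਾ -> ਆ
    "\u0A72\u0A3F": "\u0A07",  # ੲ + ਿ -> ਇ
    "\u0A72\u0A40": "\u0A08",  # ੲ + ੀ -> ਈ
    "\u0A73\u0A41": "\u0A09",  # ੳ + ੁ -> ਉ
    "\u0A73\u0A42": "\u0A0A",  # ੳ + ੂ -> ਊ
    "\u0A72\u0A47": "\u0A0F",  # ੲ + ੇ -> ਏ
    "\u0A05\u0A48": "\u0A10",  # ਅ + ੈ -> ਐ
    "\u0A73\u0A4B": "\u0A13",  # ੳ + ੋ -> ਓ
    "\u0A05\u0A4C": "\u0A14",  # ਅ + ੌ -> ਔ
}

# One alternation over all decomposed sequences.
_PATTERN = re.compile("|".join(re.escape(seq) for seq in FOLD))


def fold_independent_vowels(text: str) -> str:
    """Replace each decomposed sequence with its precomposed letter."""
    return _PATTERN.sub(lambda m: FOLD[m.group(0)], text)


# Both spellings of "potato" compare equal after folding.
decomposed_word = "\u0A05\u0A3E\u0A32\u0A42"   # ਅ ਾ ਲ ੂ
precomposed_word = "\u0A06\u0A32\u0A42"        # ਆ ਲ ੂ
print(fold_independent_vowels(decomposed_word) == precomposed_word)
```

Folding at match time rather than rewriting the stored text leaves the author's original code points untouched, which may matter if the content is later edited or copied.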
Many fonts produce a dotted circle or fail to correctly align the glyphs of the decomposed sequence, which helps reduce this issue by making the incorrect spelling visible. However, some fonts do not (such as the Gurmukhi MN Mac system font).