Independent vowels are confusing #95

Open
r12a opened this issue Feb 5, 2020 · 3 comments
Labels
doc:guru gap i:encoding Characters & encoding l:pa Punjabi, Gurmukhi script p:advanced s:guru Gurmukhi script x:guru

Comments

@r12a
Contributor

r12a commented Feb 5, 2020

Like other Indic scripts, Gurmukhi has independent vowels that can be visualised as sequences of 2 code points, whereas Unicode provides a precomposed code point for each independent vowel. The precomposed code points and the decomposed sequences, which may render identically, are not canonically equivalent in Unicode. Text that looks the same can therefore fail to match in search, sorting, and comparison, which is a problem for users who are unaware of the difference.

This is particularly pronounced for Gurmukhi because in principle independent vowels are (visually) a vowel carrier plus a vowel sign. For more information see Standalone vowels.
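The lack of canonical equivalence can be checked with a minimal sketch like the one below, which uses Python's standard `unicodedata` module. The helper name `canonically_equivalent` and the choice of two sample pairs are illustrative assumptions; the ਲ਼ pair is included only as a control, since that letter does have a canonical decomposition.

```python
import unicodedata

# Precomposed independent vowel -> visually similar carrier + vowel sign sequence.
PAIRS = {
    "\u0A06": "\u0A05\u0A3E",  # ਆ vs ਅ + ਾ
    "\u0A09": "\u0A73\u0A41",  # ਉ vs ੳ + ੁ
}


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent if their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


# Normalization does not unify either pair.
results = {pre: canonically_equivalent(pre, seq) for pre, seq in PAIRS.items()}

# Control: ਲ਼ (U+0A33) is canonically equivalent to ਲ + nukta (U+0A32 U+0A3C).
control = canonically_equivalent("\u0A33", "\u0A32\u0A3C")
```

Here every value in `results` is `False` while `control` is `True`, so applying NFC or NFD before comparing strings is not enough to make the two spellings match.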

Searching Google for the word ਅਾਲੂ (potato), where the initial 'a' sound is composed of 2 code points, rather than the precomposed code point recommended by Unicode, produces 2,570 pages, compared to 361,000 using the precomposed character. While this is small in comparison (0.7%), it is large enough to indicate an issue.

Browsers should be able to recognise the decomposed sequences and treat them as equivalent to the precomposed code points for sorting, search, collation, etc.
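As a rough sketch of what such equivalence handling could look like at the application level, the snippet below folds decomposed sequences to the precomposed code points before comparison. The mapping is an assumption based on the table of independent-vowel sequences in the Gurmukhi section of the Unicode Standard, and `fold_independent_vowels` is a hypothetical helper, not an existing browser or library API or a standard normalization form.

```python
import re

# Assumed mapping: carrier + vowel sign -> recommended precomposed vowel.
FOLD = {
    "\u0A05\u0A3E": "\u0A06",  # ਅ + ਾ -> ਆ
    "\u0A72\u0A3F": "\u0A07",  # ੲ + ਿ -> ਇ
    "\u0A72\u0A40": "\u0A08",  # ੲ + ੀ -> ਈ
    "\u0A73\u0A41": "\u0A09",  # ੳ + ੁ -> ਉ
    "\u0A73\u0A42": "\u0A0A",  # ੳ + ੂ -> ਊ
    "\u0A72\u0A47": "\u0A0F",  # ੲ + ੇ -> ਏ
    "\u0A05\u0A48": "\u0A10",  # ਅ + ੈ -> ਐ
    "\u0A73\u0A4B": "\u0A13",  # ੳ + ੋ -> ਓ
    "\u0A05\u0A4C": "\u0A14",  # ਅ + ੌ -> ਔ
}

_PATTERN = re.compile("|".join(re.escape(seq) for seq in FOLD))


def fold_independent_vowels(text: str) -> str:
    """Replace each decomposed sequence with its precomposed code point."""
    return _PATTERN.sub(lambda m: FOLD[m.group(0)], text)


decomposed = "\u0A05\u0A3E\u0A32\u0A42"  # 'potato' spelled with ਅ + ਾ
precomposed = "\u0A06\u0A32\u0A42"       # 'potato' spelled with ਆ
folded = fold_independent_vowels(decomposed)
```

After folding, `folded == precomposed`, so a search index or comparison routine that folds both the query and the indexed text would treat the two spellings of the word as the same.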

Many fonts produce a dotted circle or fail to correctly align the glyphs of the decomposed sequence, which helps reduce this issue because the incorrect sequence is then visible to the author. However, some fonts do not (such as the Gurmukhi MN Mac system font).

@r12a
Contributor Author

r12a commented Feb 5, 2020

The first comment in this issue contains text that will automatically appear in the Gurmukhi gap-analysis document as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

@lianghai

lianghai commented Mar 7, 2020

This is not a Gurmukhi-specific issue. All Indic scripts encoded with the ISCII model suffer from this issue, for example, Devanagari आ ≠ अ + ा. It is a mismatch between phonetic segmentation (the basis of the ISCII model) and graphic segmentation of text.

The only special aspect in Gurmukhi is that the three vowel-sign carriers (not “independent vowels”) recognized by the native analysis as letters are not all used as independent vowels (ie, ੳ and ੲ are native letters that act as vowel-sign carriers but not as independent vowels).

But from a confusability point of view, ੳ and ੲ are just two directly encoded letters (and thus accessible to users when inputting). Gurmukhi ਉ ≠ ੳ + ੁ isn’t really different from Malayalam ഓ ≠ ഒ + ാ.
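The cross-script point can be illustrated by inspecting the decomposition mapping of each precomposed vowel, again with Python's `unicodedata` module. This is only a sketch: the dictionary labels are informal names chosen here, and the result depends on the Unicode data shipped with the Python version in use.

```python
import unicodedata

# One precomposed independent vowel from each script mentioned above.
PRECOMPOSED = {
    "Devanagari AA": "\u0906",  # आ
    "Gurmukhi U": "\u0A09",     # ਉ
    "Malayalam OO": "\u0D13",   # ഓ
}

# unicodedata.decomposition() returns an empty string when a character
# has no decomposition mapping in the Unicode Character Database.
mappings = {label: unicodedata.decomposition(ch) for label, ch in PRECOMPOSED.items()}

# Control: Gurmukhi ਲ਼ (U+0A33) does have a mapping, to U+0A32 U+0A3C.
control = unicodedata.decomposition("\u0A33")
```

Every entry in `mappings` is empty, while `control` is `"0A32 0A3C"`, which is consistent with the issue being a property of the encoding model rather than of Gurmukhi alone.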

@r12a r12a added the x:guru label May 17, 2021
@r12a
Contributor Author

r12a commented Feb 6, 2023

Rewrote Kulpreet's original text. Will add links to the upcoming Gurmukhi layout page when it is available.

@r12a r12a added the l:pa Punjabi, Gurmukhi script label May 1, 2024
@r12a r12a moved this to Issue identified, needing investigation in Gap-analysis pipeline Jun 20, 2024
@r12a r12a added the s:guru Gurmukhi script label Jul 2, 2024
Projects
Status: Issue identified, needing investigation

2 participants