Tamil conjuncts are not selected as a single unit when styling initials #116

Open
r12a opened this issue Mar 30, 2021 · 2 comments

r12a commented Mar 30, 2021

When the start of a line contains a consonant cluster that uses a conjunct (rather than a visible virama), ::first-letter should highlight the whole cluster. Modern Tamil normally has only two of these conjuncts, but one of them can be created in two ways (making a total of 3 clusters to test).

This doesn't work well if segmentation relies on Unicode grapheme clusters, since a conjunct with two consonants will be parsed as two grapheme clusters (the first ending after the virama, and the second starting with the second consonant and including any following vowel-signs or other combining characters).

For these situations it is necessary to tailor the segmentation algorithm, so that it recognises the whole consonant cluster plus any attached vowel-signs or combining characters as a single unit. This is a particular issue for Tamil, since all other clusters are typically decomposed and show the virama.
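As a rough illustration of the tailoring described above, here is a minimal Python sketch. It is not how any browser implements ::first-letter. The function names and the `TAMIL_CONJUNCTS` table are hypothetical, the "base character plus following combining marks" loop is only an approximation of default grapheme cluster segmentation, and the three sequences listed are my assumption about the encodings of the KSHA conjunct and the two forms of SHRI. Leading punctuation, which ::first-letter also has to handle, is ignored.

```python
import unicodedata

# Assumed code point sequences for the conjuncts under test (hypothetical table).
TAMIL_CONJUNCTS = (
    "\u0B95\u0BCD\u0BB7",        # KA + pulli + SSA (KSHA)
    "\u0BB6\u0BCD\u0BB0\u0BC0",  # SHA + pulli + RA + vowel sign II (SHRI)
    "\u0BB8\u0BCD\u0BB0\u0BC0",  # SA + pulli + RA + vowel sign II (SHRI, alternative)
)


def _extend_over_marks(text, end):
    """Advance `end` past any combining marks (general category M*)."""
    while end < len(text) and unicodedata.category(text[end]).startswith("M"):
        end += 1
    return end


def default_first_unit(text):
    """Approximate the first default grapheme cluster: one base + its marks."""
    if not text:
        return ""
    return text[:_extend_over_marks(text, 1)]


def tailored_first_unit(text):
    """Treat a known conjunct, plus any attached marks, as a single unit."""
    for seq in TAMIL_CONJUNCTS:
        if text.startswith(seq):
            return text[:_extend_over_marks(text, len(seq))]
    return default_first_unit(text)


# The example word from this issue, written with escapes to keep the encoding explicit.
SRINAGAR = "\u0BB6\u0BCD\u0BB0\u0BC0\u0BA8\u0B95\u0BB0\u0BCD"

print(len(default_first_unit(SRINAGAR)))   # 2 code points: consonant + pulli
print(len(tailored_first_unit(SRINAGAR)))  # 4 code points: the whole conjunct
```

The untailored function stops after the pulli, which matches the behaviour reported for Blink and WebKit below, while the tailored one returns the whole conjunct. A fixed lookup table seems sufficient here only because so few conjuncts are involved; scripts with productive conjunct formation would need a rule-based approach instead.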

Specs:

css-text-3: CSS uses the concept of 'typographic character unit', rather than grapheme cluster, in its specs, with the explanation that the cases just described go beyond the scope of the grapheme cluster concept and that implementations should provide appropriate support. The spec doesn't provide details about the support needed for each language.

The Unicode Consortium has made some attempts to address this issue, but these have so far not yielded results. CLDR now flags up a few scripts for which conjuncts are common. Tamil is not among them.

Tests & results:
Interactive test: When ::first-letter is applied to Tamil, the browser will select the KSHA and SHRI conjuncts as a single unit.

Gecko produces the expected result. Blink and WebKit select only the first consonant+pulli.

Browser bug reports:
Chromium, WebKit

Priority:
The priority here is advanced, since the impact on the user of the failures cited here is likely to be very small, especially since authors can resort to markup in the rare cases where the conjuncts are not properly handled. Not many words begin with the conjuncts tested. (One example is ஶ்ரீநகர்.)


r12a commented Mar 30, 2021

The first comment in this issue contains text that will automatically appear in one or more gap-analysis documents as a subsection with the same title as this issue. Any edits made to that comment will be immediately available in the document. Proposals for changes or discussion of the content can be made in comments below this point.

Relevant gap analysis documents include:
Tamil

@r12a r12a added p:advanced and removed p:basic labels Mar 30, 2021
@r12a r12a changed the title from "Conjuncts are not selected as a single unit when styling initials" to "Tamil conjuncts are not selected as a single unit when styling initials" May 18, 2021

xfq commented Dec 2, 2021

Added links to bug reports.

@r12a r12a added the l:ta Tamil language & script label May 1, 2024
@r12a r12a moved this to Bug in discussion in Gap-analysis pipeline Jun 20, 2024
@r12a r12a added the s:taml Tamil script label Jul 2, 2024