Merge pull request n8willis#83 from n8willis/base-clarification
Base clarification in Indic. This updates instances of the term "base consonant" to either "syllable base" or to "base consonant or syllable base", whichever is more appropriate in the context.

Notably, it also changes `POS_BASE_CONSONANT` to `POS_SYLLABLE_BASE` in the sort-ordering list.

The result ought to be more flexible for treating consonant-based and independent-vowel–based (IV-based) syllables identically. 

This affects all Indic script docs linked to from the "indic-general" document. Note that in some situations, there may not actually be any attestation that a particular feature will occur in an IV-based syllable in a particular language, but we are aiming for consistency across the whole script set, and carving out exceptions does not make things simpler.

Also note that the current change set does _not_ affect "Brahmic"-derived scripts that aren't handled by the old Indic/Indic2 shaper models (such as Myanmar and Khmer). In all likelihood we'll make the same change there, too, but since there are other major differences, we're working on them separately.
n8willis authored Feb 11, 2020
2 parents f25925e + 2fd7fb4 commit 7c7f49e
Showing 11 changed files with 1,398 additions and 736 deletions.
230 changes: 145 additions & 85 deletions opentype-shaping-bengali.md

Large diffs are not rendered by default.

215 changes: 140 additions & 75 deletions opentype-shaping-devanagari.md

Large diffs are not rendered by default.

203 changes: 134 additions & 69 deletions opentype-shaping-gujarati.md

Large diffs are not rendered by default.

223 changes: 141 additions & 82 deletions opentype-shaping-gurmukhi.md

Large diffs are not rendered by default.

116 changes: 71 additions & 45 deletions opentype-shaping-indic-general.md
@@ -120,6 +120,11 @@ consonants. Some of these substitutions create **above-base** or
**below-base** forms. The **Reph** form of the consonant "Ra" is an
example.

Syllables may also begin with an **independent vowel** instead of a
consonant. In these syllables, the independent vowel is rendered in
full-letter form, not as a matra, and the independent vowel serves as the
syllable base, similar to a base consonant.

Where possible, using the standard terminology is preferred, as the
use of a language-specific term necessitates choosing one language
over all of the others that share a common script.
@@ -219,12 +224,12 @@ the shaping process.

There are four basic _mark-placement subclasses_ for dependent vowels
(matras). Each corresponds to the visual position of the matra with
respect to the base consonant to which it is attached:
respect to the syllable base to which it is attached:

- `LEFT_POSITION` matras are positioned to the left of the base consonant.
- `RIGHT_POSITION` matras are positioned to the right of the base consonant.
- `TOP_POSITION` matras are positioned above the base consonant.
- `BOTTOM_POSITION` matras are positioned below base consonant.
- `LEFT_POSITION` matras are positioned to the left of the syllable base.
- `RIGHT_POSITION` matras are positioned to the right of the syllable base.
- `TOP_POSITION` matras are positioned above the syllable base.
- `BOTTOM_POSITION` matras are positioned below the syllable base.

These positions may also be referred to elsewhere in shaping documents as:

@@ -327,7 +332,7 @@ syllable are reordered during the shaping process,
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT

POS_BASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN

POS_ABOVEBASE_CONSONANT
@@ -350,17 +355,17 @@ could reduce an implementation's code size and complexity.

The basic positions (left to right) are "Reph" (`POS_RA_TO_BECOME_REPH`), dependent
vowels (matras) and consonants positioned before the base
consonant (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base
consonant (`POS_BASE_CONSONANT`), above-base consonants
consonant or syllable base (`POS_PREBASE_MATRA` and `POS_PREBASE_CONSONANT`), the base
consonant or syllable base (`POS_SYLLABLE_BASE`), above-base consonants
(`POS_ABOVEBASE_CONSONANT`), below-base consonants
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base consonant
(`POS_BELOWBASE_CONSONANT`), consonants positioned after the base consonant or syllable base
(`POS_POSTBASE_CONSONANT`), syllable-final consonants (`POS_FINAL_CONSONANT`),
and syllable-modifying or Vedic signs (`POS_SMVD`).
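
As a rough illustration, the ordering positions can be modeled as an ordered enumeration, with each syllable sorted by its tags. This is a minimal sketch, not taken from any shaping engine: the names `Pos` and `sort_syllable` are hypothetical, and the slots chosen for the secondary positions (`POS_AFTER_MAIN`, `POS_BEFORE_SUBJOINED`, and so on) are inferred from their descriptions rather than copied from the complete sort-order list.

```python
from enum import IntEnum


class Pos(IntEnum):
    """Hypothetical enumeration of ordering positions, lowest sorts first."""
    RA_TO_BECOME_REPH = 0
    PREBASE_MATRA = 1
    PREBASE_CONSONANT = 2
    SYLLABLE_BASE = 3
    AFTER_MAIN = 4           # assumed slot: immediately after the base
    ABOVEBASE_CONSONANT = 5
    BEFORE_SUBJOINED = 6     # assumed slot: before below-base consonants
    BELOWBASE_CONSONANT = 7
    AFTER_SUBJOINED = 8      # assumed slot: after below-base consonants
    BEFORE_POST = 9          # assumed slot: before post-base consonants
    POSTBASE_CONSONANT = 10
    AFTER_POST = 11          # assumed slot: after post-base consonants
    FINAL_CONSONANT = 12
    SMVD = 13


def sort_syllable(tagged):
    """Sort (glyph, Pos) pairs by position.

    sorted() is stable, so glyphs sharing a tag keep their original
    relative order (for example, a base and the marks it "owns").
    """
    return sorted(tagged, key=lambda item: item[1])
```

A stable sort is used here so that a mark tagged with the same position as its carrier stays next to it; whether a given engine sorts this way is an implementation choice.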

In addition, several secondary positions are defined to handle various
reordering rules that deal with relative, rather than absolute,
positioning. `POS_AFTER_MAIN` means that a character must be
positioned immediately after the base consonant. `POS_BEFORE_SUBJOINED`
positioned immediately after the base consonant or syllable base. `POS_BEFORE_SUBJOINED`
and `POS_AFTER_SUBJOINED` mean that a character must be positioned
before or after any below-base consonants, respectively. Similarly,
`POS_BEFORE_POST` and `POS_AFTER_POST` mean that a character must be
@@ -388,8 +393,8 @@ track. These include:
- How "Reph" is encoded or requested in a syllable.

- Whether the below-base forms feature is applied only to consonants
after the base consonant, or to consonants before the base
consonant and those after the base consonant.
after the base consonant or syllable base, or to consonants before the base
consonant and those after the base consonant or syllable base.

- The ordering positions for dependent vowels
(matras). Specifically, the ordering for left-side, right-side,
@@ -493,7 +498,7 @@ table:
- immediately before the first subjoined (below-base) consonant, in
the ordering position `POS_BEFORE_SUBJOINED`.

- immediately after the base consonant, in the ordering position `POS_AFTER_MAIN`.
- immediately after the base consonant or syllable base, in the ordering position `POS_AFTER_MAIN`.

- immediately after the last subjoined (below-base) consonant, in
the ordering position `POS_AFTER_SUBJOINED`.
@@ -577,10 +582,10 @@ table:

Below-base consonant forms (the `blwf` feature) may be applied:

- Only to consonants after the base consonant. This is designated
- Only to consonants after the base consonant or syllable base. This is designated
`BLWF_MODE_POST_ONLY`.

- To consonants occurring before or after the base consonant. This is
- To consonants occurring before or after the base consonant or syllable base. This is
designated `BLWF_MODE_PRE_AND_POST`.
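
The two modes can be illustrated with a small selection function. This is a sketch under stated assumptions: `BlwfMode` and `blwf_candidates` are hypothetical names, and a syllable is reduced to a list of consonant indices plus the index of the syllable base.

```python
from enum import Enum


class BlwfMode(Enum):
    POST_ONLY = "BLWF_MODE_POST_ONLY"
    PRE_AND_POST = "BLWF_MODE_PRE_AND_POST"


def blwf_candidates(consonant_indices, base_index, mode):
    """Return the consonant indices eligible for the `blwf` feature."""
    if mode is BlwfMode.POST_ONLY:
        # Only consonants that follow the syllable base qualify.
        return [i for i in consonant_indices if i > base_index]
    # PRE_AND_POST: every consonant except the base itself qualifies.
    return [i for i in consonant_indices if i != base_index]
```

Because the comparison is against the index of the syllable base, the same function applies whether the base is a consonant or an independent vowel.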


@@ -655,7 +660,7 @@ Above-base matras may be positioned:
- immediately before the first subjoined (below-base) consonant, in
the ordering position `POS_BEFORE_SUBJOINED`.

- immediately after the base consonant, in the ordering position `POS_AFTER_MAIN`.
- immediately after the base consonant or syllable base, in the ordering position `POS_AFTER_MAIN`.

- immediately after the last subjoined (below-base) consonant, in
the ordering position `POS_AFTER_SUBJOINED`.
@@ -741,8 +746,8 @@ combining with the base consonant (e.g., "_str_", "_pl_") but they
do not add a vowel sound.

Some Indic scripts also include special consonants that can occur after the
base consonant. These post-base consonants and final consonants will
also be separated from the base consonant by a "Halant" mark; the
base consonant or syllable base. These post-base consonants and final consonants will
also be separated from the base consonant or syllable base by a "Halant" mark; the
algorithm for correctly identifying the base consonant includes a test
to recognize these sequences and not mis-identify the base consonant.

@@ -836,6 +841,12 @@ _other_ = `OTHER` | `NUMBER` | `MODIFYING_LETTER`
> cantillation marks in the Devanagari Extended block do not belong to
> the _consonant_ identification class.
> Note: The _placeholder_ identification class includes codepoints
> that are often used in place of vowels or consonants when a document
> needs to display a matra, mark, or special form in isolation or
> in another context beyond a standard syllable. Examples include
> hyphens and non-breaking spaces.
> Note: The _other_ identification class includes codepoints that
> do not interact with adjacent characters for shaping purposes. Even
> though some of these codepoints (such as `MODIFYING_LETTER`) can
@@ -983,7 +994,7 @@ The final sort order of the ordering categories should be:
POS_PREBASE_MATRA
POS_PREBASE_CONSONANT

POS_BASE_CONSONANT
POS_SYLLABLE_BASE
POS_AFTER_MAIN

POS_ABOVEBASE_CONSONANT
@@ -1010,7 +1021,7 @@ the [sort ordering](#sort-ordering) section of this document.
#### 2.1: Base consonant ####

The first step is to determine the base consonant of the syllable, if
there is one, and tag it as `POS_BASE_CONSONANT`.
there is one, and tag it as `POS_SYLLABLE_BASE`.

The algorithm used to find the base consonant varies according to the
base-consonant shaping characteristic of the script.
@@ -1125,7 +1136,7 @@ matched later in the shaping process.

#### 2.5: Pre-base consonants ####

Fifth, consonants that occur before the base consonant must be tagged
Fifth, consonants that occur before the syllable base must be tagged
with `POS_PREBASE_CONSONANT`. Excluding initial "Ra,Halant" sequences
that will become "Reph"s:

@@ -1174,33 +1185,40 @@ Marks in the `BINDU`, `VISARGA`, `AVAGRAHA`, `CANTILLATION`,
be tagged with `POS_SMVD`.

All "Nukta"s must be tagged with the same positioning tag as the
preceding consonant.
preceding consonant, independent vowel, placeholder, or dotted circle.
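
One possible way to express the "Nukta" rule is shown below. The category strings (`"VOWEL_INDEPENDENT"`, `"PLACEHOLDER"`, `"DOTTED_CIRCLE"`) and the `tag_nuktas` helper are hypothetical names for this sketch. It also assumes that "preceding" means the most recent carrier seen so far in the syllable.

```python
# Hypothetical category names for characters that can carry a Nukta.
CARRIERS = {"CONSONANT", "VOWEL_INDEPENDENT", "PLACEHOLDER", "DOTTED_CIRCLE"}


def tag_nuktas(glyphs):
    """Copy the positioning tag of the preceding carrier onto each Nukta.

    `glyphs` is a list of dicts with "cat" and "pos" keys; the list is
    modified in place and also returned.
    """
    carrier_pos = None
    for glyph in glyphs:
        if glyph["cat"] in CARRIERS:
            carrier_pos = glyph["pos"]
        elif glyph["cat"] == "NUKTA" and carrier_pos is not None:
            glyph["pos"] = carrier_pos
    return glyphs
```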

All remaining marks (not in the `POS_SMVD` category and not "Nukta"s)
must be tagged with the same positioning tag as the closest non-mark
character the mark has affinity with, so that they move together
during the sorting step.

There are two possible cases: those marks before the base consonant
and those marks after the base consonant.
There are two possible cases: those marks before the syllable base
and those marks after the syllable base.

1. Initially, all remaining marks should be tagged with the same
positioning tag as the closest preceding consonant.

2. For each consonant after the base consonant (such as post-base
2. For each consonant after the syllable base (such as post-base
consonants, below-base consonants, or final consonants), all
remaining marks located between that current consonant and any
previous consonant should be tagged with the same positioning tag as
the current (later) consonant.

In other words, all consonants preceding the base consonant "own" the
marks that follow them, while all consonants after the base consonant
In other words, all consonants preceding the syllable base "own" the
marks that follow them, while all consonants after the syllable base
"own" the marks that come before them. When a syllable does not have
any consonants after the base consonant, the base consonant should
any consonants after the syllable base, the syllable base should
"own" all the marks that follow it.

With these steps completed, the syllable can be sorted into the final sort order.
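
The two mark-tagging steps above could be sketched as follows. This is an illustrative outline only: `tag_remaining_marks`, the dict layout, and the use of `pos is None` to mean "not yet tagged" are all assumptions made for the example. It does not model the exceptions recorded in the comments that follow.

```python
def tag_remaining_marks(glyphs, base_index):
    """Assign positioning tags to marks that are still untagged.

    `glyphs` is a list of dicts with "is_mark" (bool) and "pos" keys.
    Marks already tagged (matras, POS_SMVD marks, Nuktas) are skipped.
    """
    remaining = {
        i for i, g in enumerate(glyphs) if g["is_mark"] and g["pos"] is None
    }

    # Step 1: every remaining mark takes the tag of the closest
    # preceding non-mark character.
    last_pos = None
    for i, g in enumerate(glyphs):
        if not g["is_mark"]:
            last_pos = g["pos"]
        elif i in remaining and last_pos is not None:
            g["pos"] = last_pos

    # Step 2: after the syllable base, each consonant instead "owns"
    # the remaining marks between it and the previous non-mark character.
    previous = base_index
    for i in range(base_index + 1, len(glyphs)):
        if glyphs[i]["is_mark"]:
            continue
        for j in range(previous + 1, i):
            if j in remaining:
                glyphs[j]["pos"] = glyphs[i]["pos"]
        previous = i
    return glyphs
```

When no consonant follows the syllable base, step 2 changes nothing, so the base keeps every mark that step 1 assigned to it.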

<!--- EXCEPTION: Uniscribe does NOT move a halant with a preceding -->
<!--left-matra. HarfBuzz follows suit, for compatibility reasons. --->

<!--- HarfBuzz also tags everything between a post-base consonant or -->
<!--matra and another post-base consonant as belonging to the latter -->
<!--post-base consonant. --->


### 3: Applying the basic substitution features from GSUB ###

@@ -1243,10 +1261,10 @@ pages for further script-specific information.

The final reordering stage repositions marks, dependent-vowel (matra)
signs, and "Reph" glyphs to the appropriate location with respect to
the base consonant. Because multiple substitutions may have occurred
during the application of the basic-shaping features in the preceding
stage, these repositioning moves could not be performed during the
initial reordering stage.
the base consonant or syllable base. Because multiple substitutions
may have occurred during the application of the basic-shaping features
in the preceding stage, these repositioning moves could not be
performed during the initial reordering stage.

Like the initial reordering stage, the steps involved in this stage
occur on a per-syllable basis.
@@ -1260,16 +1278,24 @@ because it was almost certainly lost in the preceding GSUB stage.
#### 4.1: Base consonant ####

The final reordering stage, like the initial reordering stage, begins
with determining the base consonant of each syllable, following the
with determining the syllable base of each syllable, following the
same algorithm used in stage 2, step 1.

The codepoint of the underlying base consonant will not change between
the search performed in stage 2, step 1, and the search repeated
here. However, the application of GSUB shaping features in stage 3
means that several ligation and many-to-one substitutions may have
taken place. The final glyph produced by that process may, therefore,
be a conjunct or ligature form — in most cases, such a glyph will not
have an assigned Unicode codepoint.
In a syllable that begins with an independent vowel, the independent
vowel will always serve as the syllable base. In a standalone sequence or
other syllable that begins with a placeholder or a dotted circle, the
placeholder or dotted circle will always serve as the syllable base.

In a syllable that begins with a consonant, the shaping engine must
repeat the base-consonant search algorithm used in stage 2, step 1.
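
The dispatch described above might look like the following sketch. The category strings and the `find_syllable_base` name are hypothetical, and the script-specific base-consonant search is passed in as a callable (`find_base_consonant`) rather than implemented here.

```python
# Hypothetical category names for non-consonant syllable starters.
FIXED_BASES = {"VOWEL_INDEPENDENT", "PLACEHOLDER", "DOTTED_CIRCLE"}


def find_syllable_base(categories, find_base_consonant):
    """Return the index of the syllable base.

    `categories` lists the category of each character in the syllable.
    `find_base_consonant` stands in for the stage 2, step 1 search and
    is only called for syllables that begin with a consonant.
    """
    if not categories:
        raise ValueError("empty syllable")
    if categories[0] in FIXED_BASES:
        # An initial independent vowel, placeholder, or dotted circle
        # always serves as the syllable base.
        return 0
    return find_base_consonant(categories)
```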

The codepoint of the underlying base consonant or syllable base will
not change between the search performed in stage 2, step 1, and the
search repeated here. However, the application of GSUB shaping
features in stage 3 means that several ligation and many-to-one
substitutions may have taken place. The final glyph produced by that
process may, therefore, be a conjunct or ligature form — in most
cases, such a glyph will not have an assigned Unicode codepoint.

#### 4.2: Pre-base matras ####

@@ -1286,8 +1312,8 @@ position is defined as:

This means that the matra will move to the right of all explicit
"consonant,Halant" subsequences, but will stop to the left of the base
consonant, all conjuncts or ligatures that contains the base
consonant, and all half forms.
consonant or syllable base, all conjuncts or ligatures that contain
the base consonant or syllable base, and all half forms.

#### 4.3: Reph ####

@@ -1298,7 +1324,7 @@ position. The correct final position depends on the script's
#### 4.4: Pre-base reordering consonants ####

Any pre-base-reordering consonants must be moved to immediately before
the base consonant.
the base consonant or syllable base.


#### 4.5: Initial matras ####
@@ -1373,14 +1399,14 @@ The old Indic shaping model also did not recognize the
`BLWF_MODE_PRE_AND_POST` shaping characteristic. Instead, all scripts
were treated as if they followed the `BLWF_MODE_POST_ONLY`
characteristic. In other words, below-base form substitutions were
only applied to consonants after the base consonant.
only applied to consonants after the base consonant or syllable base.

In addition, left-side dependent vowel marks
(matras) were not repositioned during the final reordering
stage. For `<deva>`, `<beng>`, `<gujr>`, `<guru>`, `<knda>`,
`<orya>`, and `<telu>` text, the left-side matra was always positioned
at the beginning of the syllable. For `<mlym>` and `<taml>` text, the
left-side matra was positioned immediately before the base consonant.
left-side matra was positioned immediately before the base consonant or syllable base.


### Advice for handling fonts with old Indic features only ###