Default script shaping in OpenType

This document details the default shaping procedure needed to display text runs in non-complex scripts. It may also be used as a fallback model for unrecognized scripts.

Table of Contents

General information
Terminology
Glyph classification
Normalization
The default shaping model

General information

The default OpenType shaping model is used for scripts that are considered non-complex from the shaper's perspective. This designation means that shaping a text run does not involve glyph reordering, contextual joining behavior, or the substitution of context-dependent forms for linguistic or orthographic correctness.

Text runs in non-complex scripts may, however, involve ligature substitution, Unicode normalization, mark positioning, kerning, and the application of other features from the active font's GSUB and GPOS tables.

The non-complex scripts covered by this model include Latin, Cyrillic, Greek, Armenian, Georgian, Ethiopic, Cherokee, Tifinagh, and many others.

Terminology

Many of these scripts support diacritics and other marks. Unicode may contain precomposed mark-and-base codepoints for some or all combinations of marks and base letters in the script. For combinations without a codepoint, the desired form can be achieved by following the base letter with a combining mark codepoint.

The primary concern for the shaping engine is processing the text run into the correct normalized form, so that the best glyphs from the active font can be selected from among the available precomposed and combining alternatives.

Fonts for non-complex scripts might not include a GSUB or GPOS table at all.

However, GSUB and GPOS may also be used to implement a variety of OpenType smart features, including several classes of ligature, contextual alternate, or contextual positioning rules. Because these features are not required in order to render the text run orthographically correct, the features are not considered shaping features. Nevertheless, the shaping engine may be expected to apply these features in order to simplify the overall text-rendering architecture of the implementation.

Normalization

Unicode defines algorithms for normalizing a sequence of input codepoints into either a canonical composed form or a canonical decomposed form. The purpose of these algorithms and of the defined normalization forms is to determine equivalent representations of input sequences regardless of variations in the input sequences.

For example, a base letter with an attached mark might exist in Unicode as a single codepoint, but an input sequence might consist of the base letter codepoint followed by the combining mark codepoint. Unicode normalization can be used to determine that the "letter, mark" sequence is equivalent to the single codepoint. This simplifies sorting, searching, string comparison, and many other common tasks.
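
The equivalence described above can be checked with Python's standard `unicodedata` module. This is only an illustration of canonical equivalence; the helper name `canonically_equivalent` is a hypothetical one introduced here.

```python
import unicodedata

decomposed = "e\u0301"   # LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE

def canonically_equivalent(a, b):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

The two strings differ codepoint by codepoint, yet `canonically_equivalent(decomposed, precomposed)` reports them as the same text.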

OpenType shaping utilizes Unicode normalization, but OpenType shaping has a distinctly different goal: to select the best or most appropriate representation of the input codepoint sequence that is available in the active font. A full description of the algorithm is available in the normalization document.
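
As a rough illustration of that goal, and not the algorithm from the normalization document, the sketch below prefers whichever normalization form the font can fully cover. It assumes a hypothetical `supported` set holding the codepoints mapped by the active font, and it works on a whole string, whereas a real shaper would more likely decide per base-and-mark sequence.

```python
import unicodedata

def best_form(text, supported):
    """Pick a normalization form whose codepoints are all in the font.

    `supported` is an assumed set of integer codepoints covered by the
    active font. Composed form is tried first, then decomposed form.
    """
    for form in ("NFC", "NFD"):
        candidate = unicodedata.normalize(form, text)
        if all(ord(ch) in supported for ch in candidate):
            return candidate
    # Neither form is fully covered; fall back to the composed form.
    return unicodedata.normalize("NFC", text)
```

With a font that covers only the base letter and the combining mark, `best_form("\u00e9", {0x65, 0x301})` yields the decomposed sequence.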

Shaping some complex scripts involves explicit composition or decomposition steps. The default shaping model does not involve any such steps, but it does proceed with the general assumption that text runs have been normalized as part of input sanitization.

For convenience, shaping engines may choose to implement a single normalization routine for all scripts, default and complex. If normalization is done before the shaping-model–specific processing is done, then there may be no work required in certain shaping steps (such as the processing of ccmp substitutions from GSUB). However, these steps will always be described in the relevant script's shaping document.

The default shaping model

Processing a run of text in the default shaping model involves three top-level stages:

Applying the basic substitution features from GSUB
Applying typographic substitution features from GSUB
Applying the positioning features from GPOS

Together, these stages cover the application of all GSUB and GPOS features that are required or that have been defined by OpenType as being on by default.
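
The three stages can be sketched as a simple plan. The feature tags are those listed for each stage in this document; the function name `shaping_plan` and the `font_features` mapping (table tag to the set of feature tags present in the font) are assumptions made for illustration.

```python
# Stage order and feature tags as listed in this document.
STAGES = [
    ("GSUB", ("locl", "ccmp", "rlig")),           # 1: basic substitution
    ("GSUB", ("rclt", "calt", "clig", "liga")),   # 2: typographic substitution
    ("GPOS", ("curs", "dist", "kern", "mark", "mkmk")),  # 3: positioning
]

def shaping_plan(font_features):
    """Return, per stage, the default features that the font actually provides.

    `font_features` is an assumed dict such as {"GSUB": {"liga"}, "GPOS": {"kern"}}.
    A missing table simply produces an empty stage.
    """
    plan = []
    for table, tags in STAGES:
        present = font_features.get(table, set())
        plan.append((table, [tag for tag in tags if tag in present]))
    return plan
```

A font without a GSUB or GPOS table produces three empty stages, which matches the case of a font that relies on its character map alone.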

For convenience, shaping engines may also choose to apply any optional or off-by-default OpenType features that have been activated for the text run (including those that have been enabled by the user and those that have been enabled at the application level). However, the order in which such features should be applied and how they should interact with OpenType shaping features is beyond the scope of this document.

The default shaping model does not involve syllable identification, word identification, or other preprocessing of the input sequence. Shaping engines may choose how to segment longer text runs for processing, or may choose to rely on higher-level applications to make segmentation decisions.

1: Applying the basic substitution features from GSUB

The basic-substitution stage applies mandatory substitution features using the rules in the font's GSUB table. In preparation for this stage, glyph sequences should be tagged for possible application of GSUB features.

These substitutions include those features designed to provide linguistic and orthographic correctness.

The order in which these features are applied is not canonical; they should be applied in the order in which they appear in the GSUB table in the font.

locl
ccmp
rlig

The locl feature replaces default glyphs with any language-specific variants, based on examining the language setting of the text run.

Note: Strictly speaking, the use of localized-form substitutions is not part of the shaping process, but of the localization process, and could take place at an earlier point while handling the text run. However, shaping engines are expected to complete the application of the locl feature before applying the subsequent GSUB substitutions in the following steps.

The ccmp feature allows a font to substitute mark-and-base sequences with a pre-composed glyph including the mark and the base, or to substitute a single glyph into an equivalent decomposed sequence of glyphs.

If present, these composition and decomposition substitutions must be performed before applying any other GSUB lookups, because those lookups may be written to match only the ccmp-substituted glyphs.

Note: The ccmp feature may perform compositions or decompositions of glyph sequences that do not have a canonical decomposition defined in Unicode.
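
A minimal sketch of both directions of ccmp follows. The glyph names and the two rule tables are hypothetical, and the single left-to-right pass is a simplification: an actual font expresses these rules as GSUB lookups, which the shaper applies one lookup at a time.

```python
def apply_ccmp(glyphs, compose, decompose):
    """Apply toy composition and decomposition rules in one pass.

    `compose` maps a (base, mark) glyph pair to one precomposed glyph.
    `decompose` maps one glyph to a list of replacement glyphs.
    Both tables are illustrative assumptions, not data from a real font.
    """
    out = []
    i = 0
    while i < len(glyphs):
        pair = tuple(glyphs[i:i + 2])
        if len(pair) == 2 and pair in compose:
            out.append(compose[pair])       # two glyphs become one
            i += 2
        elif glyphs[i] in decompose:
            out.extend(decompose[glyphs[i]])  # one glyph becomes several
            i += 1
        else:
            out.append(glyphs[i])
            i += 1
    return out
```

For instance, with `compose = {("a", "acutecomb"): "aacute"}` and `decompose = {"odieresis": ["o", "dieresiscomb"]}`, the run `["a", "acutecomb", "b", "odieresis"]` becomes `["aacute", "b", "o", "dieresiscomb"]`.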

The rlig feature substitutes glyph sequences with mandatory ligatures. Substitutions made by rlig cannot be disabled by application-level user interfaces.
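
The ordering rule for this stage can be sketched as follows. One reading, assumed here, is that "the order in which they appear" means ascending lookup index in the font's lookup list. The `feature_lookups` mapping from feature tag to lookup indices is a hypothetical stand-in for data read from the font.

```python
def ordered_lookups(feature_lookups, stage_features):
    """Collect the lookups used by a stage's features, in font order.

    `feature_lookups` is an assumed dict of feature tag -> lookup indices.
    A lookup shared by several features is applied only once.
    """
    indices = set()
    for tag in stage_features:
        indices.update(feature_lookups.get(tag, ()))
    return sorted(indices)
```

Given `{"rlig": [4, 1], "ccmp": [0, 4], "liga": [2]}` and the stage features `("locl", "ccmp", "rlig")`, the result is `[0, 1, 4]`: the order comes from the font, not from the order in which the features are listed above.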

2: Applying typographic substitution features from GSUB

The typographic-substitution stage applies all remaining substitution features using the rules in the font's GSUB table. In preparation for this stage, glyph sequences should be tagged for possible application of GSUB features.

These substitutions include those features designed to provide typographic consistency and correctness.

The order in which these features are applied is not canonical; they should be applied in the order in which they appear in the GSUB table in the font.

rclt
calt
clig
liga

The rclt feature substitutes glyphs with contextual alternate forms. In general, the rclt feature is used to perform such substitutions that are required by the orthography of the active script and language. Substitutions made by rclt cannot be disabled by application-level user interfaces.

The calt feature substitutes glyphs with contextual alternate forms. In general, the calt feature performs substitutions that are not mandatory for orthographic correctness. Unlike those made by rclt, the substitutions made by calt can be disabled by application-level user interfaces.

The clig feature substitutes optional ligatures that are on by default, but which are activated only in certain contexts. Substitutions made by clig may be disabled by application-level user interfaces.

The liga feature substitutes standard, optional ligatures that are on by default. Substitutions made by liga may be disabled by application-level user interfaces.
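
The distinction between features that a user interface may and may not switch off can be sketched as below. The function name and the `user_disabled` parameter are assumptions for illustration; only the split between rclt and the three optional features comes from the descriptions above.

```python
TYPOGRAPHIC = ("rclt", "calt", "clig", "liga")
# Per the descriptions above, rclt cannot be disabled; the others can.
USER_DISABLEABLE = {"calt", "clig", "liga"}

def active_typographic_features(user_disabled=()):
    """Return the stage-2 features left on after honoring user requests.

    Requests to disable a feature outside USER_DISABLEABLE are ignored.
    """
    off = set(user_disabled) & USER_DISABLEABLE
    return [tag for tag in TYPOGRAPHIC if tag not in off]
```

A request such as `active_typographic_features({"liga", "rclt"})` drops liga but keeps rclt.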

3: Applying the positioning features from GPOS

The positioning stage adjusts the positions of mark and base glyphs. In preparation for this stage, glyph sequences should be tagged for possible application of GPOS features.

The order in which these features are applied is not canonical; they should be applied in the order in which they appear in the GPOS table in the font.

curs
dist
kern
mark
mkmk

The curs feature performs cursive positioning. Each glyph has an entry point and exit point; the curs feature positions glyphs so that the entry point of the current glyph meets the exit point of the preceding glyph.
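
The geometry of cursive attachment can be sketched as below. This is a simplified left-to-right model with hypothetical glyph records (dicts with an `advance` and optional `entry` and `exit` anchors in glyph coordinates); it ignores how GPOS actually encodes the adjustment.

```python
def cursive_origins(glyphs):
    """Compute an origin for each glyph so entry anchors meet exit anchors.

    When either anchor is missing, the glyph is placed after the previous
    glyph's advance on the baseline (y = 0).
    """
    origins = []
    for i, glyph in enumerate(glyphs):
        if i == 0:
            origins.append((0, 0))
            continue
        prev = glyphs[i - 1]
        prev_x, prev_y = origins[-1]
        if "exit" in prev and "entry" in glyph:
            exit_x, exit_y = prev["exit"]
            entry_x, entry_y = glyph["entry"]
            # Solve origin + entry == prev_origin + exit for the new origin.
            origins.append((prev_x + exit_x - entry_x, prev_y + exit_y - entry_y))
        else:
            origins.append((prev_x + prev["advance"], 0))
    return origins
```

With a first glyph whose exit anchor is `(480, 100)` and a second glyph whose entry anchor is `(20, 0)`, the second glyph's origin lands at `(460, 100)`, so both anchors coincide at `(480, 100)`.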

The dist feature adjusts the horizontal positioning of glyphs. Unlike kern, adjustments made with dist do not require the application or the user to enable any software kerning features, if such features are optional.

The kern feature adjusts glyph spacing between pairs of adjacent glyphs.
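
Pair-based spacing adjustment can be sketched as below. The `pairs` table and the font-unit values are made up for illustration, and the adjustment is folded into the first glyph's advance, which is one common convention rather than something this document specifies.

```python
def apply_kerning(glyphs, advances, pairs):
    """Return advances adjusted by a table of (left, right) pair values.

    `pairs` is an assumed dict mapping a glyph-name pair to an adjustment
    in font units; pairs that are not listed are left unchanged.
    """
    adjusted = list(advances)
    for i in range(len(glyphs) - 1):
        adjusted[i] += pairs.get((glyphs[i], glyphs[i + 1]), 0)
    return adjusted
```

For the run `["A", "V", "A"]` with advances of 600 each and pair values of -80 for `("A", "V")` and -70 for `("V", "A")`, the adjusted advances are `[520, 530, 600]`.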

The mark feature positions marks with respect to base glyphs.

The mkmk feature positions marks with respect to preceding marks, providing proper positioning for sequences of marks that attach to the same base glyph.
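
Both mark and mkmk come down to aligning an anchor on the attaching glyph with an anchor on the glyph it attaches to. The sketch below shows that arithmetic with made-up anchor coordinates; the helper name `attach` is hypothetical.

```python
def attach(parent_origin, parent_anchor, child_anchor):
    """Return the child origin that makes its anchor coincide with the parent's.

    Anchors are given in each glyph's own coordinate space; origins are in
    run coordinates.
    """
    return (
        parent_origin[0] + parent_anchor[0] - child_anchor[0],
        parent_origin[1] + parent_anchor[1] - child_anchor[1],
    )

# mark: attach the first mark to the base glyph's top anchor.
first_mark = attach((0, 0), (250, 500), (100, 480))
# mkmk: attach the second mark to the first mark's own top anchor.
second_mark = attach(first_mark, (100, 700), (100, 480))
```

Here `first_mark` is `(150, 20)` and `second_mark` is `(150, 240)`, so the second mark stacks above the first instead of colliding with it on the base glyph's anchor.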
