How is te-kerende used? #28

Open
r12a opened this issue Mar 13, 2023 · 22 comments
Labels
i:segmentation Grapheme/word segmentation & selection l:nqo N'Ko script & language question Further information is requested s:nkoo

Comments

@r12a
Contributor

r12a commented Mar 13, 2023

The Unicode Proposal for inclusion of te-kerende describes it as:

a character used to link compounds together.

and gives the following examples:

[Screenshot: te-kerende examples from the Unicode proposal]

It has been suggested that 'link compounds together' is not related to compound nouns, but is rather a special kind of distributional construction that N’ko authors sometimes mark this way. Can anyone explain this usage in a little more detail or provide me with some better wording for the lreq doc?

@r12a r12a added question Further information is requested s:nkoo i:words labels Mar 13, 2023
@donaldsoncd

donaldsoncd commented Mar 14, 2023

I was the one that brought this up.

I don't know how you should word things in the document, but these constructions are not Manding "compounds" in a linguistic sense.

Do you want a linguistic explanation of the "distributive construction", or are you looking for an explanation of the orthographic convention used to represent it?

In Latin-based Bambara, there is no special way to mark a distributive construction like there is in the N'ko based tradition. For instance, in the Latin-based tradition:

ko o ko

'each and every affair' [NOTE: You glossed it differently in your examples image]

But in the N'ko orthographic tradition it would be done like this:

ߞߏ_ߏ_ߞߏ߫
{Kó ̀_ó ̀_kó}
Ko o ko
'each and every affair'

[NOTE: I also see that your vowel length isn't right on mɔɔ ɔ mɔɔ. You also don't include tonal diacritics. Not sure what your convention for transliterating or interpreting N'ko in Latin-based orthography is in the document.]

@r12a
Contributor Author

r12a commented Mar 14, 2023

[Just to be clear, the image containing examples above is screen snapped from the Unicode proposal for adding a character to the N'Ko block. The transcriptions are nothing to do with me.]

Looks like i did attempt some transcriptions at https://r12a.github.io/scripts/nkoo/nqo.html#word though (which may indeed need to be changed - i don't remember the source of those). In those notes i assume that one is supposed to use U+07FA NKO LAJANYALAN to represent the te-kerende. Any thoughts on that?

@donaldsoncd do you have a pointer to an explanation of how these kinds of linguistic devices are used? Searching doesn't seem to yield anything useful. I think i get the general idea, but is it just used for a handful of words, or can one generate one's own te-kerende linked sequences?

@DD-fwd

DD-fwd commented Mar 15, 2023

I would love to hear @donaldsoncd's response. I am commenting on @r12a's question, "... is it just used for a handful of words, or ...?" I think this is a common expression in Manding languages to express the individuality, repetitiveness, or infinity of a related action. For example, su-u-su 'every night' in your posting has parallels such as soma-a-soma (soma soma) 'every morning', tele-e-tele (tele tele) 'every afternoon', and wura-a-wura (wura wura) 'every evening'. The same is possible for mo-o-mo 'everyone', ke-e-ke (ke ke) 'every man', and moso-o-moso (moso moso) 'every woman'; you get the idea.
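In the Latin transcriptions above, the pattern looks mechanical: the noun is repeated, and the linking vowel copies the noun's final vowel (su-u-su, soma-a-soma, mo-o-mo). The sketch below generates that harmonized form alongside the unharmonized `N o N` form used in Bambara and Jula. It only illustrates the surface pattern under stated assumptions: Latin transcription, tone ignored entirely (even though tone plays a real role in the construction), and a noun that ends in a plain vowel. The function names are hypothetical.

```python
# Vowels recognised as a noun's final vowel in this toy Latin transcription.
VOWELS = "aeiouɛɔ"

def distributive_plain(noun: str) -> str:
    """Unharmonized distributive: noun + 'o' + noun (tone ignored)."""
    return f"{noun} o {noun}"

def distributive_harmonized(noun: str) -> str:
    """Harmonized distributive: the linking vowel copies the noun's final vowel.

    Assumes a vowel-final Latin transcription; tone is not represented.
    """
    final = noun[-1]
    if final not in VOWELS:
        raise ValueError(f"expected a vowel-final noun, got {noun!r}")
    return f"{noun}-{final}-{noun}"

for noun in ["su", "soma", "tele", "wura", "mo", "moso"]:
    print(distributive_plain(noun), "|", distributive_harmonized(noun))
```

This is a description of the examples in this thread, not a grammar of the construction; the reference grammars cited later in the discussion are the place to check real tonal and vowel behaviour.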

@donaldsoncd

In Bambara and Jula, the distributive construction is built by inserting an o between two nouns like in all the examples given. It is infinitely productive. You can do it with any noun and it then means 'each/every/any X' depending on the context. For instance:

Cɛ o cɛ

'Each and every man'

Baara o baara

'Any (line of) work'

In some varieties of Manding (and in N'ko orthography), the vowel that is o in the first example above actually changes to harmonize with the noun (in other cases it is basically elided, but its tone [which I have ignored here, both in writing and in its role in the grammatical construction itself] remains and influences the tonal realization of the two nouns). That is why we have @DD-fwd's examples:

kɛ o kɛ / kɛ ɛ kɛ / kɛ-kɛ
man DIST man

'each/every man'

In N'ko orthography the convention is to always write this grammatical construction with the vowel-harmony option (as well as the appropriate tonal diacritics, since they play a role too) PLUS the te-kerende underscore line.

For more details on this construction across Manding varieties, you could consult linguistic reference grammars such as:

  • Creissels, Denis. 2009. Le Malinké de Kita: Un Parler Mandingue de l’ouest Du Mali. Mande Languages and Linguistics 9. Cologne, Germany: Rüdiger Köppe Verlag.
  • Dumestre, Gérard. 2003. Grammaire fondamentale du bambara. Paris, France: Karthala.
  • Vydrin, Valentin. 2019. Cours de grammaire bambara. Paris, France: Presses de l’INALCO.

@r12a
Contributor Author

r12a commented Mar 15, 2023

Very helpful, @donaldsoncd. So would you agree that this is written using U+07FA NKO LAJANYALAN?

@NeilSureshPatel

As far as I can tell, the te-kerende was never encoded. The lajanyalan is different since it connects to the letters on both sides. The te-kerende should not connect to the letters.

@r12a
Contributor Author

r12a commented Mar 15, 2023

@NeilSureshPatel i concur that there is no separate encoding for te-kerende, but i didn't find any rationale, or any indication of what should be used instead. I'll see whether i can get some enlightenment from the Unicode Editorial folks on Thursday.

In my examples i've been using lajanyalan surrounded by spaces to create the appearance.
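The sequence r12a describes can be produced mechanically. Assuming text was first typed with a Latin underscore as a placeholder (as in the ߞߏ_ߏ_ߞߏ߫ example earlier in the thread), this minimal Python sketch swaps each underscore for <space><lajanyalan><space> and lists the resulting code points, so the otherwise invisible spaces can be checked. The helper name is hypothetical, and the sketch deliberately does not decide which kind of space should be used.

```python
import unicodedata

LAJANYALAN = "\u07FA"  # NKO LAJANYALAN

def underscores_to_lajanyalan(text: str) -> str:
    """Replace each '_' placeholder with <space><lajanyalan><space>."""
    return text.replace("_", f" {LAJANYALAN} ")

converted = underscores_to_lajanyalan("ߞߏ_ߏ_ߞߏ߫")

# Print every code point so spaces and tone marks are visible.
for ch in converted:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
```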

@NeilSureshPatel

@r12a I was just submitting an issue on the Noto N'ko repo and saw another issue, notofonts/nko#5, that may hint at why the te-kerende wasn't encoded.

Part way down Denis says the following:
"A resolution would be to add contextual positioning when 07FD NKO DANTAYALAN is next to a U+07F8 NKO COMMA, U+2010 HYPHEN, U+2011 NON-BREAKING HYPHEN. Note: U+2010 HYPHEN and U+2011 NON-BREAKING HYPHEN sit on the baseline in NKo, they need to be added to Noto Sans NKo."

This seems to suggest that the plan for N'ko is to use standard hyphens that are moved to the baseline. This seems a bit odd though.

@donaldsoncd

> Very helpful, @donaldsoncd. So would you agree that this is written using U+07FA NKO LAJANYALAN?

I know nothing about the encoding of this. I just do Latin underscores if I have to write it.

@r12a
Contributor Author

r12a commented Mar 17, 2023

Debbie Anderson pointed me to a discussion at the UTC in 2016. See point 11 at https://www.unicode.org/L2/L2016/16037-script-rec.pdf.

> The first character that is proposed, TE‐KERENDE, can be represented using U+2010 HYPHEN or U+2011 NON‐BREAKING HYPHEN, but would need to be designed in a font on the baseline. Note that U+2010 HYPHEN is used in such a way in Arabic text. The other three characters are well‐documented and straight‐forward.

This may be the source of the comments by @moyogo. I also think it's a bit odd. I took a look at the few resources i have to hand that provide selectable online text and found the following.

Wikipedia uses both hyphen and lajanyalan on the same page: the former looks like an ordinary hyphen (mid height, no surrounding spaces), while the latter (surrounded by spaces) is used for what look like te-kerende, e.g.
ߖߌ߰ ߺ ߡߊ߬ ߺ ߖߊ߲߬ߝߊ߬ߓߊ߫
ߊߟߏ، ߋ-ߖߘߍ߬ߘߊ߲ߘߊ،ߌ-
So it may be important to not make hyphens drop to the baseline and grow in size. As long as lajanyalan is surrounded by spaces (which appears to be the expected use), it seems to work more intuitively, visually.

Silabosoona at http://cormand.huma-num.fr/maninkabiblio/periodiques/silabosoona5.pdf also uses lajanyalan for te-kerende, but also for general phrase separators, eg.

ߊ߭ߜߊߘߡߝ ߺ ߋ ߺ ߋ߫ߝ߸ߋ߫ߦ
ߏߟߍߘ ߍߞ ߺ ߂߂ ߀߂
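The candidate characters in this discussion belong to quite different Unicode categories, which helps explain why they behave differently. A quick lookup with Python's standard `unicodedata` module, sketched below with Arabic tatweel (U+0640) added for comparison, shows the two hyphens as dash punctuation (`Pd`) and lajanyalan, like tatweel, as a modifier letter (`Lm`). This is only a property lookup; it says nothing about how a given font draws or positions these characters.

```python
import unicodedata

# Candidates for te-kerende discussed here, plus Arabic tatweel for comparison.
CANDIDATES = ["\u2010", "\u2011", "\u07FA", "\u0640"]

for ch in CANDIDATES:
    # General category: "Pd" = dash punctuation, "Lm" = modifier letter.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```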

@NeilSureshPatel

This certainly is a bit messy. The standard mid height hyphen is used with numbers. This can be seen on page 1 of Silabosoona.

߆߂߁-߇߄-߀߃-߀߀

My guess is that the regular hyphen used in text on Wikipedia is more of a workaround than a preference. It is intuitive to use since it is used in the Latin orthography, whereas typing spaces around a lajanyalan is less convenient.

The use of the lajanyalan with spaces does come with other problems. The lajanyalan is really wide compared to a te-kerende. This is exacerbated by the fact that it has negative side bearings for its normal joining behavior: when you add spaces, the extra length becomes exposed. The other problem is that the parts of the lajanyalan that overlap with adjacent letters may not have square edges. This varies by font, but there are times when the bottoms need to be curved or chamfered so that they don't punch through the join between an adjacent letter and its baseline stroke. This can be more extreme if any negative kerning is used. For example:

[Image: lajanyalan with a chamfered edge where it overlaps an adjacent letter]

If the edge were squared off, the corner would make the join uneven.

[Image: the same join with a squared-off edge]

It is subtle in this example, but if one were to apply effects such as outlined text, this could become more obvious and problematic.

[Image: the join with an outlined-text effect applied]

Without separate encoding, I think the best way to handle this is to have an alternate N'ko hyphen that is pushed down to the baseline which is replaced contextually (when nested between or following N'ko letters) in the font via rclt. This way the presentation can be controlled (squared edges, positive side bearings, narrower width, etc). If a font fails to do this you end up with a standard hyphen, without having to change the way the text is input. From what I recall, rclt works for N'ko shaping in all shaping engines. This is how I would be inclined to handle it anyway.
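The contextual rule Neil describes can be simulated at the character level. The Python sketch below is not OpenType feature code: it walks a string and, wherever U+2010 or U+2011 immediately follows a N'Ko letter or tone mark (assumed here to be U+07CA..U+07F3), emits a hypothetical baseline-alternate glyph name (`hyphen.nko`, `nbhyphen.nko`) instead of the default. Hyphens after N'Ko digits, as in the Silabosoona numeric example, keep the default glyph. All glyph names are invented for illustration.

```python
# Default glyph names for the two hyphens (hypothetical, for illustration).
HYPHENS = {"\u2010": "hyphen", "\u2011": "nbhyphen"}
BASELINE_SUFFIX = ".nko"  # hypothetical suffix for the baseline alternate

def is_nko_letter_or_mark(ch: str) -> bool:
    # N'Ko letters U+07CA..U+07EA plus combining tone marks U+07EB..U+07F3.
    # N'Ko digits (U+07C0..U+07C9) are deliberately excluded.
    return 0x07CA <= ord(ch) <= 0x07F3

def glyphs_with_rclt(text: str) -> list[str]:
    """Simulate an rclt-style substitution: a hyphen that follows a N'Ko
    letter (or a tone mark on one) becomes its baseline alternate."""
    out = []
    for i, ch in enumerate(text):
        if ch in HYPHENS:
            name = HYPHENS[ch]
            if i > 0 and is_nko_letter_or_mark(text[i - 1]):
                name += BASELINE_SUFFIX
            out.append(name)
        else:
            out.append(f"U+{ord(ch):04X}")  # other characters pass through
    return out

print(glyphs_with_rclt("\u07DE\u07CF\u2010\u07DE\u07CF"))  # hyphen between letters
print(glyphs_with_rclt("\u07C1\u2010\u07C2"))              # hyphen between digits
```

The appeal of this design, as Neil notes, is graceful fallback: a font without the rule still shows an ordinary hyphen, and the input text never changes.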

@r12a
Contributor Author

r12a commented Mar 17, 2023

Hmm. Another part of the messiness is whether or not other people will be inclined to use the hyphen with the expectation that it will magically change position and shape in the required contexts, or will they (as they seem to be doing) just go for the thing that looks to them as if it's what they want to see on the page (ie. the lajanyalan). I looked at a number of other online resources, and those that contain te-kerende and dashes that separate phrases all use lajanyalan, so it seems it may have already become the de facto way of doing this.

I wonder whether it makes sense to do the opposite of what you're suggesting @NeilSureshPatel: ie. to fix the font so that the lajanyalan is the right width and has the right shaping when it appears between spaces. This may be an easier context to detect, given that spaces are, it seems, always present, and the joining behaviour is not relevant if spaces are on either side?
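A survey like the one r12a describes could be partly automated. This Python sketch tallies, for a piece of text, how often each candidate representation mentioned in this thread appears. It is deliberately naive: it counts strings without looking at context, so a HYPHEN-MINUS in a number range (like the Silabosoona example) is counted the same as one used for te-kerende. The labels and the sample text are illustrative assumptions, not data from any real survey.

```python
from collections import Counter

LAJANYALAN = "\u07FA"  # NKO LAJANYALAN

# Candidate representations mentioned in this thread (labels are illustrative).
PATTERNS = {
    "space+lajanyalan+space": f" {LAJANYALAN} ",
    "U+002D HYPHEN-MINUS": "-",
    "U+2010 HYPHEN": "\u2010",
    "U+2011 NON-BREAKING HYPHEN": "\u2011",
    "U+005F LOW LINE": "_",
}

def tally(text: str) -> Counter:
    """Count non-overlapping occurrences of each candidate (no context check)."""
    return Counter({label: text.count(pattern) for label, pattern in PATTERNS.items()})

# Synthetic sample: two te-kerende-style lajanyalans and one digit-range hyphen.
sample = f"\u07DE\u07CF {LAJANYALAN} \u07CF {LAJANYALAN} \u07DE\u07CF \u07C1-\u07C2"
for label, count in tally(sample).most_common():
    print(f"{label}: {count}")
```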

@NeilSureshPatel

NeilSureshPatel commented Mar 17, 2023

Ahh, yes good point @r12a. I guess once a workaround gets normalized we kind of have to work with it. I can see what you are suggesting working. The lajanyalan can be narrowed, squared off and have zero or near zero sidebearings. When strung together for justification it should still make a solid line.

A related approach is to take advantage of the fact that the lajanyalan can have positional forms. The isolated form can then be tuned for use as a te-kerende, while the positional forms keep more flexibility in design depending on the font. Spaces will break the shaping and default to the isolated form, as you say. A thin space would be preferable to a word space, but this can be handled in a handful of different ways.
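Neil's point that spaces break the shaping can be illustrated with a toy joining model. The Python sketch below picks the positional form each lajanyalan would receive from its neighbours, skipping N'Ko tone marks. With a space on both sides it comes out isolated, which is the form a font could tune for te-kerende. Assumptions: all N'Ko letters (taken as U+07CA..U+07EA) join on both sides, tone marks are U+07EB..U+07F3, and everything else breaks the join. This is a simplification, not a shaping engine.

```python
from typing import Optional

LAJANYALAN = "\u07FA"  # NKO LAJANYALAN

def joins(ch: str) -> bool:
    # Simplification: N'Ko letters and lajanyalan itself join; spaces,
    # digits and punctuation break the connection.
    return 0x07CA <= ord(ch) <= 0x07EA or ch == LAJANYALAN

def is_tone_mark(ch: str) -> bool:
    # N'Ko combining tone marks are skipped when looking for neighbours.
    return 0x07EB <= ord(ch) <= 0x07F3

def neighbour(text: str, i: int, step: int) -> Optional[str]:
    """Nearest non-tone-mark character before (step=-1) or after (step=1) index i."""
    j = i + step
    while 0 <= j < len(text) and is_tone_mark(text[j]):
        j += step
    return text[j] if 0 <= j < len(text) else None

# (joins to preceding char, joins to following char) -> positional form
FORMS = {(True, True): "medi", (True, False): "fina",
         (False, True): "init", (False, False): "isol"}

def lajanyalan_forms(text: str) -> list[str]:
    """Positional form each lajanyalan in text would get under this toy model."""
    forms = []
    for i, ch in enumerate(text):
        if ch != LAJANYALAN:
            continue
        prev_ch, next_ch = neighbour(text, i, -1), neighbour(text, i, 1)
        joins_prev = prev_ch is not None and joins(prev_ch)
        joins_next = next_ch is not None and joins(next_ch)
        forms.append(FORMS[(joins_prev, joins_next)])
    return forms

print(lajanyalan_forms(f"\u07DE\u07CF {LAJANYALAN} \u07CF"))     # ['isol']
print(lajanyalan_forms(f"\u07DE\u07CF{LAJANYALAN}\u07DE\u07CF"))  # ['medi']
```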

@jfkthame

jfkthame commented May 3, 2023

One issue with using lajanyalan surrounded by spaces is that this will tend to allow a line-break to happen either side of it, whereas my understanding is that if a line-break is needed, it should always occur after the te-kerende. In theory, if the preceding space were a non-breaking space, that wouldn't be a problem, but in practice users will inevitably type normal spaces most of the time.
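The difference between an ordinary space and a no-break space before the te-kerende can be illustrated with Python's standard `textwrap` module, used here as a crude stand-in for a real line breaker. `textwrap` only breaks at ASCII whitespace, so U+00A0 NO-BREAK SPACE acts as glue, which is also how UAX #14 treats it. The sketch swaps the space before each space-surrounded lajanyalan for NBSP, giving <nbsp><lajanyalan><space>, and wraps both versions at a narrow width. The helper names and the width are assumptions for illustration; real layout engines apply the full UAX #14 rules.

```python
import textwrap

LAJANYALAN = "\u07FA"  # NKO LAJANYALAN
NBSP = "\u00A0"        # NO-BREAK SPACE

def bind_te_kerende(text: str) -> str:
    """Replace the ordinary space before a space-surrounded lajanyalan with NBSP."""
    return text.replace(f" {LAJANYALAN} ", f"{NBSP}{LAJANYALAN} ")

def wrap(text: str, width: int) -> list[str]:
    # Break only at ASCII whitespace, and never split a chunk mid-way,
    # so NBSP-joined chunks stay together on one line.
    return textwrap.wrap(text, width=width, break_on_hyphens=False,
                         break_long_words=False)

sample = f"\u07DE\u07CF {LAJANYALAN} \u07CF {LAJANYALAN} \u07DE\u07CF"
for label, text in [("plain spaces", sample), ("nbsp before", bind_te_kerende(sample))]:
    lines = wrap(text, 3)
    starts = any(line.startswith(LAJANYALAN) for line in lines)
    print(f"{label}: a line starts with lajanyalan? {starts}")
```

With plain spaces, the narrow wrap can leave the lajanyalan at the start of a line; with the NBSP it always stays attached to the preceding word, which is the behaviour jfkthame describes as correct.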

@r12a
Contributor Author

r12a commented May 15, 2023

I brought this up with the Script AdHoc (SAH) Unicode committee and consensus was reached that it is ok to use lajanyalan for te-kerende and certain other hyphen-like uses where the glyph is expected to look like a baseline extension surrounded by spaces.

@NeilSureshPatel

Thanks for the update @r12a. I was curious to know where the discussion landed on the matter. What did the SAH say about the line-breaking concern that @jfkthame brought up? I think from a font production standpoint, I would still substitute a lajanyalan nested between spaces with an alternate form just to make it narrower and remove any modeling of the overlapping parts of the stroke.

@r12a
Contributor Author

r12a commented May 16, 2023

@NeilSureshPatel The line-breaking discussion was put off for another day. A proposal would need to be submitted. Personally, i'm not so worried about that – just as with dashes in English, such as the one i just typed, people can use a nbsp if needed. I think the problem of handling line breaks around punctuation that is separated from the preceding text is a lot bigger than just N'Ko (think of dandas, French question marks, Mongolian commas, etc. etc.) and may need a more generalised solution.

I think that the proposal to shape the lajanyalan appropriately makes sense. I was planning to raise an issue in the Noto repo – would you prefer to do that? (You're better qualified than me to put the right points.)

Btw, i'm about to raise then close a gap report about this in our gap analysis framework, so that we can make the progress visible.

@NeilSureshPatel

That makes sense, thanks. I'll take a look at the Noto design again to see if it would need to be adjusted and how. Noto uses very simple connections so it may only need a width adjustment. I'll raise an issue in the repo with the proper recommendation.

@jfkthame

jfkthame commented May 16, 2023

> @NeilSureshPatel The line-breaking discussion was put off for another day. A proposal would need to be submitted. Personally, i'm not so worried about that – just as with hyphens, such as the one i just typed, people can use a nbsp if needed.

I wasn't part of the background discussion here, so may be missing lots of context. But personally, I think the conclusion is unfortunate, from a serving-the-users point of view.

Judging from the examples in Figure 5 of the Unicode proposal document, I don't think users would perceive the te-kerende as being separated by spaces from the surrounding words, so the natural instinct will be to type it without spaces. When they notice that this produces a joined form (because that's how lajanyalan behaves), they're just as likely to try something else such as a generic HYPHEN-MINUS or LOW LINE as to figure out that they should put spaces each side of it (and depending on the font in use, the result of adding spaces may look so bad — because lajanyalan is too long — that they reject that and go for HYPHEN-MINUS or even borrow the Arabic-script KASHIDA instead).

My suspicion is that "correct" use of <nbsp> <lajanyalan> <space> to represent te-kerende, along with a "smart" font that shapes lajanyalan appropriately for this context, will be an exotic rarity.

(The "hyphen" comparison isn't very persuasive, IMO. I notice that what's actually in your comment is not a hyphen but an en-dash, perhaps thanks to an autocorrect feature? When a punctuation dash with surrounding spaces is used in English, breaking the line before the dash (so that it appears at start-of-line) is much less jarring than breaking before te-kerende would be. The Latin-script analogue to N'Ko te-kerende would be a hyphen without any surrounding spaces, which does not permit a preceding break.)

@r12a
Contributor Author

r12a commented May 16, 2023

> My suspicion is that "correct" use of <nbsp> <lajanyalan> <space> to represent te-kerende, along with a "smart" font that shapes lajanyalan appropriately for this context, will be an exotic rarity.

One of the things driving this discussion was that i looked at a number of online texts to figure out what users do, and they all used <space><lajanyalan><space> for the te-kerende (and for various other hyphen/dash-like places).

It may be better to move this discussion to a separate issue focused on line-breaking for te-kerende.

(in my earlier comment i have just changed 'hyphens' to 'dashes in English', which i intend to cover hyphens and other dashes.)

@r12a
Contributor Author

r12a commented May 16, 2023

> ... even borrow the Arabic-script KASHIDA instead

I'm not sure why they would do that, or why they would try not to use spaces. The lajanyalan is the N'Ko equivalent of the Arabic tatweel (which i assume you mean by kashida). And using it without spaces would immediately produce incorrect results, because (a) it would join with the adjacent characters (as would the tatweel), and (b) it wouldn't produce the gaps either side which always appear with te-kerende. So i don't think that users are likely to omit the spaces. (That said, for fine typography, they may perhaps choose slightly smaller spaces.)

@NeilSureshPatel

NeilSureshPatel commented May 16, 2023

These things are always weird. I think if the te-kerende had been encoded from the get-go, it would have been used readily. However, without it the most convenient thing to do is <space><lajanyalan><space>, which has made it the typical method. Probably what should have happened is that the lajanyalan should not have been encoded, since the tatweel can be used for this purpose, and the te-kerende should have been encoded instead.

@r12a r12a added i:segmentation Grapheme/word segmentation & selection l:nqo N'Ko script & language and removed i:words labels May 3, 2024
5 participants