This repository has been archived by the owner on Nov 2, 2020. It is now read-only.

Localization concept needs improvement #40

Open
tofi86 opened this issue Feb 12, 2017 · 9 comments
Labels
ISO Schematron Related to the ISO standard

Comments

@tofi86
Contributor

tofi86 commented Feb 12, 2017

Hey,

after attending the first ever Schematron Users Meetup at XML Prague this year, I'm thrilled to see that schematron is coming back to life — thanks @rjelliffe, @AndrewSales and @tgraham-antenna for your work!

As a contributor to the EpubCheck project (EPUB validation) and the SQF Schematron QuickFix project, I'd like to open up this issue and start a discussion about improvements to the Schematron localization concepts — or at least for the Skeleton implementation.

The EpubCheck project uses Java properties files for localization, but also has several Schematron checks which cannot be localized at the moment because the official Skeleton implementation used by Jing validator does not support this. There has been discussion about this since October 2014 at issue w3c/epubcheck#474

And more recently, the SQF project struggled with this as well in schematron-quickfix/sqf#1.

Annex G of the ISO Schematron specification defines the use of multilingual Schematron as follows:

Diagnostics in multiple languages may be supported by using a different diagnostic element for each language, with the appropriate xml:lang language attribute, and referencing all the unique identifiers of the diagnostic elements in the diagnostics attribute of the assertion.
Annex G gives a simple example of a multi-lingual schema.

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en" >
    <sch:title>Example of Multi-Lingual Schema</sch:title>
    <sch:pattern>
        <sch:rule context="dog">
            <sch:assert test="bone" diagnostics="d1 d2">A dog should have a bone.</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:diagnostics>
        <sch:diagnostic id="d1" xml:lang="en">A dog should have a bone.</sch:diagnostic>
        <sch:diagnostic id="d2" xml:lang="de">Ein Hund sollte ein Bein haben.</sch:diagnostic>
    </sch:diagnostics>
</sch:schema>

However, this never worked in the original Skeleton implementation, as it would display both messages and not only the one from the current locale.

oXygen XML has implemented a workaround for this issue by tweaking the original Skeleton implementation so that it only shows the message for the current locale. Possibly they can contribute this change as a pull request.

However, there's another shortcoming of the diagnostic-based localization concept: the developer has to actively reference every language with a separate ID in the diagnostics attribute, which makes it hard to add new localizations.

At XML Prague, Octavian from oXygen XML (@octavianN), Nico from the SQF project (@nkutsche), Patrik (@PStellmann) & Vanessa (@vanessakastmann) from the DITA-SEMIA project and I sat together to discuss the SQF issue schematron-quickfix/sqf#1, but we quickly came to the conclusion that the localization support in the Schematron standard or the Skeleton implementation needs to be improved in order to properly resolve issues like the EpubCheck or SQF one.

We discussed the following solutions, which I want to outline here as a basis for discussion. You should also know that we discussed this with the use case of externalizing the messages to separate files (e.g. for Translation Memory Systems) in mind.

Solution 1: Fix the Skeleton

The Skeleton should be fixed to at least support the Annex G example properly: Only output the message in the current locale and not ALL diagnostic elements.
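
To make the intended behaviour concrete, here is a minimal Python sketch of the selection step, not the Skeleton's actual XSLT. The function name `diagnostics_for_locale` is made up for illustration, and xml:lang inheritance is simplified to "fall back to the language on the schema root".

```python
import xml.etree.ElementTree as ET

SCH = "{http://purl.oclc.org/dsdl/schematron}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The Annex G example as quoted in this issue
SCHEMA = """\
<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en">
    <sch:pattern>
        <sch:rule context="dog">
            <sch:assert test="bone" diagnostics="d1 d2">A dog should have a bone.</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:diagnostics>
        <sch:diagnostic id="d1" xml:lang="en">A dog should have a bone.</sch:diagnostic>
        <sch:diagnostic id="d2" xml:lang="de">Ein Hund sollte ein Bein haben.</sch:diagnostic>
    </sch:diagnostics>
</sch:schema>
"""


def diagnostics_for_locale(schema_root, assertion, locale):
    """Return the texts of the referenced diagnostics whose language equals locale."""
    default_lang = schema_root.get(XML_LANG)
    by_id = {d.get("id"): d for d in schema_root.iter(SCH + "diagnostic")}
    selected = []
    for ref in assertion.get("diagnostics", "").split():
        diagnostic = by_id.get(ref)
        if diagnostic is None:
            continue  # dangling reference, ignored in this sketch
        # Simplification: a diagnostic without xml:lang takes the schema root's language
        if diagnostic.get(XML_LANG, default_lang) == locale:
            selected.append((diagnostic.text or "").strip())
    return selected


root = ET.fromstring(SCHEMA)
assertion = root.find(f".//{SCH}assert")
print(diagnostics_for_locale(root, assertion, "de"))
```

With the locale set to "de" only the German diagnostic is returned; a locale with no matching diagnostic yields an empty list, so a real implementation would still need a fallback policy.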

Solution 2: Remove ID/IDREF constraint from Schematron schema

This is more like a long-term solution as the standardized schema would need to be changed.

What we would like to achieve is something like this:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en" >
    <sch:title>Example of Multi-Lingual Schema</sch:title>
    <sch:pattern>
        <sch:rule context="dog">
            <sch:assert test="bone" diagnostics="d1">(Optional) Fallback message.</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:diagnostics>
        <sch:diagnostic id="d1" xml:lang="en">English message.</sch:diagnostic>
        <sch:diagnostic id="d1" xml:lang="de">German message.</sch:diagnostic>
    </sch:diagnostics>
</sch:schema>
  1. Only reference the message ID (which isn't of datatype ID anymore) once and let the Skeleton or any other implementation choose the proper diagnostic element.
  2. Schematron rule: Enforce the xml:lang attribute with different values when two or more diagnostic elements with the same id are present.

Current status: This does not validate because of the ID/IDREF datatypes.
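
The two points above could be checked outside the schema roughly as follows. This is only a Python sketch under the assumption that the id is a plain string; `index_diagnostics` is a hypothetical helper, not part of any Schematron implementation.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

SCH = "{http://purl.oclc.org/dsdl/schematron}"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DIAGNOSTICS = """\
<sch:diagnostics xmlns:sch="http://purl.oclc.org/dsdl/schematron">
    <sch:diagnostic id="d1" xml:lang="en">English message.</sch:diagnostic>
    <sch:diagnostic id="d1" xml:lang="de">German message.</sch:diagnostic>
</sch:diagnostics>
"""


def index_diagnostics(diagnostics_root):
    """Map each message id to {language: text}, rejecting ambiguous duplicates."""
    index = defaultdict(dict)
    for diagnostic in diagnostics_root.iter(SCH + "diagnostic"):
        message_id = diagnostic.get("id")
        lang = diagnostic.get(XML_LANG)
        if lang in index[message_id]:
            # Same id and same language twice (or two entries without xml:lang)
            raise ValueError(f"duplicate diagnostic {message_id!r} for xml:lang {lang!r}")
        index[message_id][lang] = (diagnostic.text or "").strip()
    for message_id, by_lang in index.items():
        # Point 2: once an id is shared, every entry must carry xml:lang
        if len(by_lang) > 1 and None in by_lang:
            raise ValueError(f"diagnostic {message_id!r} is missing xml:lang")
    return index


index = index_diagnostics(ET.fromstring(DIAGNOSTICS))
print(index["d1"]["de"])
```

An implementation would then look up `index[id][locale]` for the single id referenced by the assertion.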

Solution 3a: Do it the Java way (hacky)

In Java you just reference a messages.properties file, and the resource bundle mechanism (java.util.ResourceBundle) takes care of resolving the current Locale. In a German environment, for example, Java would automatically look for messages_de.properties, although this file isn't referenced in the Java class.

Schematron could do this as follows:

dog.sch

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en" >
    <sch:title>Example of Multi-Lingual Schema</sch:title>
    <sch:pattern>
        <sch:rule context="dog">
            <sch:assert test="bone" diagnostics="d1">(Optional) Fallback message.</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:include href="messages.sch"/>
</sch:schema>

messages.sch:

<sch:diagnostics xml:lang="en">
    <sch:diagnostic id="d1">A dog should have a bone.</sch:diagnostic>
</sch:diagnostics>

messages_de.sch:

<sch:diagnostics xml:lang="de">
    <sch:diagnostic id="d1">Ein Hund sollte ein Bein haben.</sch:diagnostic>
</sch:diagnostics>
  1. The Skeleton would need to be changed to look for {include}_{locale}.sch every time it resolves an include.
  2. That's a bit hacky

Current status: dog.sch would validate without errors, but some of our group had reservations because of the misuse of the include element and also because the German message file messages_de.sch isn't referenced anywhere within the SCH. Personally(!) I could live well with the last one, as it's Java style...
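
For illustration, the {include}_{locale}.sch lookup could work like the Python sketch below. The function names are made up, and the fallback order (most specific locale first, unlocalized file last) is an assumption loosely modelled on how Java resource bundles fall back.

```python
from pathlib import Path


def localized_candidates(href, locale):
    """List file names to try, most specific first, ending with the plain href."""
    path = Path(href)
    parts = [p for p in locale.replace("-", "_").split("_") if p]
    candidates = []
    # "de_AT" -> messages_de_AT.sch, then messages_de.sch
    for end in range(len(parts), 0, -1):
        suffix = "_".join(parts[:end])
        candidates.append(path.with_name(f"{path.stem}_{suffix}{path.suffix}"))
    candidates.append(path)
    return candidates


def resolve_include(href, locale, base_dir):
    """Return the first candidate that exists under base_dir, or None."""
    for candidate in localized_candidates(href, locale):
        full = Path(base_dir) / candidate
        if full.is_file():
            return full
    return None


print([c.name for c in localized_candidates("messages.sch", "de_AT")])
```

For the locale de_AT this tries messages_de_AT.sch, then messages_de.sch, then messages.sch, so the unlocalized file acts as the default.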

Solution 3b: Do it the Java way (properly)

To address the issue about misusing the include element from solution 3a, I'd like to introduce either a new element for message file references:

<sch:messages href="messages.sch"/>

which would require a diagnostics root element

or at least an additional attribute on the include element:

<sch:include href="messages.sch" type="localization"/>

which would advise Skeleton and any other implementation to look for localized files as well (in the Java form of {include}_{locale}.sch).

Solution 4: Work with business rules for the referenced IDs

In my personal opinion this can't be more than a temporary hack, but it was heavily discussed in the group:

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en" >
    <sch:title>Example of Multi-Lingual Schema</sch:title>
    <sch:pattern>
        <sch:rule context="dog">
            <sch:assert test="bone" diagnostics="d1">(Optional) Fallback message.</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:diagnostics>
        <sch:diagnostic id="d1">English message.</sch:diagnostic>
        <sch:diagnostic id="d1_de">German message.</sch:diagnostic>
    </sch:diagnostics>
</sch:schema>
  1. The Skeleton would need to be changed to look for an ID {id}_{locale} diagnostic element if the current locale does not match xml:lang on the root element.
  2. That's more than hacky

Current status: The schematron would validate well.
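
The lookup rule from point 1 is small enough to sketch in a few lines of Python. `resolve_diagnostic_id` is a hypothetical name, and falling back to the plain id when no localized one exists is my assumption.

```python
def resolve_diagnostic_id(ref, locale, schema_lang, known_ids):
    """Pick the diagnostic id to report for ref under the {id}_{locale} convention."""
    if locale == schema_lang:
        return ref  # the schema's own language uses the plain id
    localized = f"{ref}_{locale}"
    # Assumed fallback: use the plain id when no localized diagnostic exists
    return localized if localized in known_ids else ref


known = {"d1", "d1_de"}
print(resolve_diagnostic_id("d1", "de", "en", known))
```

One thing this makes visible: an existing id that happens to end in something like _de would be picked up as a translation by accident, which is part of why this feels like a hack.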


I laid out the different solutions we discussed at our SQF meeting, and the more I think about it, the better it would have been to discuss this two days earlier at the Schematron Users Meetup... Anyways...

This should only be a basis for further discussion, and I hope I have made clear why we need improvements to either the standard or the Skeleton.

Kind regards,
Tobias

on behalf of Octavian, Nico, Patrik and Vanessa

@tofi86
Contributor Author

tofi86 commented Feb 12, 2017

P.S.: I just wrote this down off the top of my head after a long Prague weekend, so I hope I haven't forgotten something. Octavian, Nico, Patrik, Vanessa, please add to the discussion if I missed something!

@rjelliffe
Member

rjelliffe commented Feb 13, 2017 via email

@tofi86
Contributor Author

tofi86 commented Feb 17, 2017

> Yes, schematron needs to select the correct language for diagnostics. If there is a bug, I will fix it.

Yeah, at the moment, the default skeleton is not picking up the xml:lang attribute.

Perhaps Octavian from oXygen XML (@octavianN) would be willing to contribute their fixed version?

I think the hardest part would probably be to get the fixed version into third party tools like Jing...

> Allow {} like {concat ('file://xxxx/diagnostics_', $lang, '.sch')}

That's also a nice idea of dynamically referencing the external language files.

@tgraham-antenna
Member

tgraham-antenna commented Feb 17, 2017 via email

@georgebina

Just to clarify: internally Jing uses an approach similar to the Skeleton, but it is a different implementation. Also, it only supports pre-ISO Schematron; its support for ISO Schematron is very limited. I just enabled it by supporting the new namespace, but nothing is implemented in terms of ISO-specific functionality.

@georgebina

The oXygen implementation is available under oXygen/frameworks/schematron/impl/ with the same license as the skeleton - it is a fork we made many years ago, so you can surely get whatever update we made back into the skeleton implementation.

@octavianN

I added a pull request with the multilingual support that we have in oXygen, based on diagnostics.
The messages are generated automatically in the language specified by the "langCode" parameter. If there are no messages in that language, all the messages will be generated, each prefixed by its language.
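
As a rough illustration of that selection logic (a Python sketch, not the XSLT from the pull request): `render_diagnostics` and the `[lang]` prefix format are assumptions made for this example only.

```python
def render_diagnostics(messages, lang_code):
    """messages: list of (lang, text) pairs attached to one failed assertion."""
    matching = [text for lang, text in messages if lang == lang_code]
    if matching:
        return matching
    # Nothing in the requested language: emit everything, labelled by language
    return [f"[{lang}] {text}" for lang, text in messages]


messages = [("en", "A dog should have a bone."), ("de", "Ein Hund sollte ein Bein haben.")]
print(render_diagnostics(messages, "de"))
print(render_diagnostics(messages, "fr"))
```

Asking for "de" gives just the German text, while asking for "fr" falls back to both messages with their language prefix.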

@tofi86
Contributor Author

tofi86 commented Mar 31, 2018

PR #63

Awesome, thanks Octavian! 👍 Looking forward to seeing this merged!

@dmj
Member

dmj commented Mar 6, 2019

Solution 5

Use one diagnostic per message and wrap localizations in a foreign element with @xml:lang.

<sch:schema xmlns:sch="http://purl.oclc.org/dsdl/schematron" xml:lang="en" >
    <sch:title>Example of Multi-Lingual Schema</sch:title>
    <sch:pattern>
        <sch:rule context="dog">
            <sch:assert test="bone" diagnostics="d1">(Optional) Fallback message.</sch:assert>
        </sch:rule>
    </sch:pattern>
    <sch:diagnostics>
        <sch:diagnostic id="d1">
          <p xmlns="http://www.w3.org/1999/xhtml">English message.</p>
          <p xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">German message.</p>
       </sch:diagnostic>
    </sch:diagnostics>
</sch:schema>
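
A processor could pick the right child roughly like this. It is a Python sketch only; `localized_text` is a hypothetical name, and treating children without xml:lang as being in the schema's language ("en" here) is an assumption.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

DIAGNOSTIC = """\
<sch:diagnostic xmlns:sch="http://purl.oclc.org/dsdl/schematron" id="d1">
    <p xmlns="http://www.w3.org/1999/xhtml">English message.</p>
    <p xmlns="http://www.w3.org/1999/xhtml" xml:lang="de">German message.</p>
</sch:diagnostic>
"""


def localized_text(diagnostic, locale, default_lang="en"):
    """Return the child text for locale, else the default-language text, else None."""
    fallback = None
    for child in diagnostic:
        # Assumption: a child without xml:lang is in the schema's default language
        lang = child.get(XML_LANG, default_lang)
        text = "".join(child.itertext()).strip()
        if lang == locale:
            return text
        if lang == default_lang and fallback is None:
            fallback = text
    return fallback


diagnostic = ET.fromstring(DIAGNOSTIC)
print(localized_text(diagnostic, "de"))
```

The assertion still references a single id, and an unknown locale falls back to the default-language child.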

@tgraham-antenna tgraham-antenna added the ISO Schematron Related to the ISO standard label Feb 25, 2020