PROPOSAL: improve encoding of XPath and cardinality #707

Open
carlwilson opened this issue Sep 7, 2023 · 10 comments
@carlwilson
Collaborator

Currently, the METS Profiles for the E-ARK specifications use some of the "free form" XHTML fields allowed in requirement descriptions to encode information that appears in the published specifications: specifically, the XPath expression for the element or attribute, and the "Cardinality", i.e. how many occurrences are permitted. These are recorded as adapted definition-list term (<dt>) and description (<dd>) pairs, like so:

<requirement ID="SIP19" REQLEVEL="MAY" EXAMPLES="metsHdrElementExample1 metsHdrAgentExample2">
    <description>
        <head>Submitting agent additional information</head>
        <p xmlns="http://www.w3.org/1999/xhtml">The submitting agent has a note providing a unique identification code for the archival creator.</p>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>metsHdr/agent/note</dd>
            <dt>Cardinality</dt><dd>0..1</dd>
        </dl>
    </description>
</requirement>
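
For reference, the <dt>/<dd> pairs above can be read back mechanically, which is roughly what a publication step has to do today. The following is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`; the `read_dl_pairs` helper and the trimmed `REQUIREMENT` sample are illustrative only and are not part of any E-ARK tooling.

```python
import xml.etree.ElementTree as ET

# XHTML namespace used by the <dl>, <dt> and <dd> elements in the profile.
XHTML = "{http://www.w3.org/1999/xhtml}"

# Trimmed copy of the SIP19 requirement, kept un-namespaced for brevity.
REQUIREMENT = """
<requirement ID="SIP19" REQLEVEL="MAY">
    <description>
        <head>Submitting agent additional information</head>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>metsHdr/agent/note</dd>
            <dt>Cardinality</dt><dd>0..1</dd>
        </dl>
    </description>
</requirement>
"""


def read_dl_pairs(requirement):
    """Return the <dt>/<dd> pairs found in a requirement as a dict."""
    pairs = {}
    for dl in requirement.iter(XHTML + "dl"):
        terms = dl.findall(XHTML + "dt")
        definitions = dl.findall(XHTML + "dd")
        # Pair each term with the definition in the same position.
        for dt, dd in zip(terms, definitions):
            pairs[(dt.text or "").strip()] = (dd.text or "").strip()
    return pairs


pairs = read_dl_pairs(ET.fromstring(REQUIREMENT))
print(pairs)
```

Nothing in this form ties the XPath or the cardinality to a validator, so both remain documentation only.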

The METS Profile schema already provides, with a bias towards XML, a dedicated way to record automated tests for a requirement. The tests element is a container for test elements, which can hold XPath tests and indeed Schematron validation rules. Taking some of the CSIP requirements as examples, the current encoding is:

<requirement ID="CSIP10" REQLEVEL="MUST">
    <description>
        <head>Agent</head>
        <p xmlns="http://www.w3.org/1999/xhtml">A mandatory agent element records the software used to create the package. Other uses of agents may be described in any local implementations that extend the profile.</p>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>mets/metsHdr/agent</dd>
            <dt>Cardinality</dt><dd>1..n</dd>
        </dl>
    </description>
</requirement>
<requirement ID="CSIP11" REQLEVEL="MUST" EXAMPLES="metsHdrElementExample1">
    <description>
        <head>Agent role</head>
        <p xmlns="http://www.w3.org/1999/xhtml">The mandatory agent element MUST have a `@ROLE` attribute with the value “CREATOR”.</p>
        <dl xmlns="http://www.w3.org/1999/xhtml">
            <dt>METS XPath</dt><dd>mets/metsHdr/agent[@ROLE='CREATOR']</dd>
            <dt>Cardinality</dt><dd>1..1</dd>
        </dl>
    </description>
</requirement>

These could be encoded so:

<requirement ID="CSIP10" REQLEVEL="MUST">
    <description>
        <head>Agent</head>
        <p xmlns="http://www.w3.org/1999/xhtml">A mandatory agent element records the software used to create the package. Other uses of agents may be described in any local implementations that extend the profile.</p>
    </description>
    <tests>
        <test ID="TEST10-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR" and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE']</testXML>
            </testWrap>
        </test>
        <test ID="TEST10-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr">
                    <iso:assert id="CSIP10" role="ERROR" test="count(mets:agent)&gt;=1">The metsHdr element MUST contain an agent element that records the software used to create the package.</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
    </tests>
</requirement>
<requirement ID="CSIP11" REQLEVEL="MUST" EXAMPLES="metsHdrElementExample1">
    <description>
        <head>Agent role</head>
        <p xmlns="http://www.w3.org/1999/xhtml">The mandatory agent element MUST have a `@ROLE` attribute with the value “CREATOR”.</p>
    </description>
    <tests>
        <test ID="TEST11-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR"]</testXML>
            </testWrap>
        </test>
        <test ID="TEST11-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr">
                    <iso:assert id="CSIP11" role="ERROR" test="count(mets:agent[@ROLE = 'CREATOR'])=1">The agent element MUST have a ROLE attribute with the value "CREATOR".</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
    </tests>
</requirement>
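
To illustrate how a tool might consume this encoding, here is a minimal sketch, assuming Python's standard `xml.etree.ElementTree`, that lists each recorded test with its language and expression. The `<profile>` wrapper, the un-namespaced profile elements, and the `collect_tests` helper are simplifications made for the example and are not taken from the METS Profile schema.

```python
import xml.etree.ElementTree as ET

# ISO Schematron namespace, as given in TESTLANGUAGEURI above.
ISO = "{http://purl.oclc.org/dsdl/schematron}"

# Hypothetical wrapper element so that the iso prefix is declared.
PROFILE = """
<profile xmlns:iso="http://purl.oclc.org/dsdl/schematron">
  <requirement ID="CSIP11" REQLEVEL="MUST">
    <tests>
      <test ID="TEST11-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
        <testWrap><testXML>/mets/metsHdr/agent[@ROLE="CREATOR"]</testXML></testWrap>
      </test>
      <test ID="TEST11-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO">
        <testWrap><testXML>
          <iso:rule context="/mets:mets/mets:metsHdr">
            <iso:assert id="CSIP11" role="ERROR" test="count(mets:agent[@ROLE = 'CREATOR'])=1">The agent element MUST have a ROLE attribute with the value "CREATOR".</iso:assert>
          </iso:rule>
        </testXML></testWrap>
      </test>
    </tests>
  </requirement>
</profile>
"""


def collect_tests(requirement):
    """Return (test ID, language, expression) tuples for one requirement."""
    found = []
    for test in requirement.iter("test"):
        test_xml = test.find("testWrap/testXML")
        if test_xml is None:
            continue
        asserts = test_xml.findall(ISO + "rule/" + ISO + "assert")
        if asserts:
            # Schematron: the expression lives in the assert's test attribute.
            for item in asserts:
                found.append((test.get("ID"), test.get("TESTLANGUAGE"), item.get("test")))
        else:
            # Plain XPath: the expression is the text content of testXML.
            found.append((test.get("ID"), test.get("TESTLANGUAGE"), (test_xml.text or "").strip()))
    return found


root = ET.fromstring(PROFILE)
tests = collect_tests(root.find("requirement"))
for entry in tests:
    print(entry)
```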
@carlwilson
Collaborator Author

Note that the actual requirement and cardinality might be more succinctly put as:

  <testXML>
      <iso:rule context="/mets:mets/mets:metsHdr">
      <iso:assert id="CSIP10" role="ERROR" test="count(mets:agent[@ROLE='CREATOR' and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE'])=1">The metsHdr element MUST contain an agent element that records the software used to create the package.</iso:assert>
      </iso:rule>
  </testXML>

This might help avoid the ambiguity expressed in #705, where the note element XPath is explicit:

  <requirement ID="CSIP16" REQLEVEL="MUST" RELATEDMAT="VocabularyNoteType" EXAMPLES="metsHdrElementExample1">
      <description>
          <head>Classification of the agent additional information</head>
          <p xmlns="http://www.w3.org/1999/xhtml">The mandatory agent element's note child has a `@csip:NOTETYPE` attribute with a fixed value of "SOFTWARE VERSION".</p>
      </description>
...
        <test ID="TEST16-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
            <testWrap>
                <testXML>/mets/metsHdr/agent[@ROLE="CREATOR" and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE']/note[@csip:NOTETYPE='SOFTWARE VERSION']</testXML>
            </testWrap>
        </test>
        <test ID="TEST16-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
            <testWrap>
                <testXML>
                    <iso:rule context="/mets:mets/mets:metsHdr/mets:agent[@ROLE = 'CREATOR' and @TYPE='OTHER' and @OTHERTYPE='SOFTWARE']">
                    <iso:assert id="CSIP16" role="ERROR" test="count(mets:note[@csip:NOTETYPE='SOFTWARE VERSION'])=1">The mandatory agent element’s note child has a @csip:NOTETYPE attribute with a fixed value of “SOFTWARE VERSION”.</iso:assert>
                    </iso:rule>
                </testXML>
            </testWrap>
        </test>
...
  </requirement>
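
As a rough illustration of what the CSIP10 and CSIP16 assertions check, the sketch below applies the same conditions to a small, made-up METS header using Python's standard `xml.etree.ElementTree`. It is not a Schematron processor. The `SAMPLE` document and both helper functions are hypothetical, and the namespace URI bound to the `csip` prefix is an assumption that should be checked against the CSIP schema.

```python
import xml.etree.ElementTree as ET

METS = "{http://www.loc.gov/METS/}"
# Assumed namespace URI for the csip prefix; verify against the CSIP schema.
CSIP = "{https://DILCIS.eu/XML/METS/CSIPExtensionMETS}"

# Made-up METS header used only to exercise the two checks.
SAMPLE = """
<mets xmlns="http://www.loc.gov/METS/"
      xmlns:csip="https://DILCIS.eu/XML/METS/CSIPExtensionMETS">
  <metsHdr>
    <agent ROLE="CREATOR" TYPE="OTHER" OTHERTYPE="SOFTWARE">
      <name>Example packager</name>
      <note csip:NOTETYPE="SOFTWARE VERSION">1.0</note>
    </agent>
    <agent ROLE="ARCHIVIST" TYPE="ORGANIZATION">
      <name>Example archive</name>
    </agent>
  </metsHdr>
</mets>
"""


def software_agents(mets_root):
    """Agents matching the predicate used in the CSIP10 and CSIP16 tests."""
    header = mets_root.find(METS + "metsHdr")
    if header is None:
        return []
    return [
        agent
        for agent in header.findall(METS + "agent")
        if agent.get("ROLE") == "CREATOR"
        and agent.get("TYPE") == "OTHER"
        and agent.get("OTHERTYPE") == "SOFTWARE"
    ]


def version_notes(agent):
    """Notes on an agent whose csip:NOTETYPE is SOFTWARE VERSION."""
    return [
        note
        for note in agent.findall(METS + "note")
        if note.get(CSIP + "NOTETYPE") == "SOFTWARE VERSION"
    ]


root = ET.fromstring(SAMPLE)
agents = software_agents(root)
csip10_ok = len(agents) == 1  # mirrors count(mets:agent[...])=1
csip16_ok = all(len(version_notes(agent)) == 1 for agent in agents)
print(csip10_ok, csip16_ok)
```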

@karinbredenberg
Contributor

karinbredenberg commented Sep 8, 2023

The issue is going to be discussed by the DILCIS Board

@jmaferreira

jmaferreira commented Sep 13, 2023

This issue was discussed on the DILCIS Board (2023-09-13).

It is not clear how the cardinality information will be rendered in the output document. The cardinality used to be concise and explicitly written in the profile. If we move it into testWrap, it is not clear how the same simple, human-readable output is going to be produced.

Can we maintain both options in the same requirement element? One would serve the user documentation and the other would serve automatic validation.

@carlwilson Can you provide more information on this?

@stephenmackey

I understand the point made by @jmaferreira, but from the profile creation and maintenance perspective, having two different ways of encoding the same information for each requirement in a single profile is very problematic. I would also like the project managers to comment on how much time is available on each specification for reworking the METS profiles; this seems like a lot of unplanned work.

@jmaferreira

I agree... having two ways of encoding the same thing is a risk.
If the testWrap encoding can be used to produce the previous human-readable output, I'm all for it.

@stephenmackey

Agreed.

@karinbredenberg
Contributor

@carlwilson could you add some clarifications?

@carlwilson
Collaborator Author

Hi @jmaferreira, you've spotted the possible issue. My belief is that I can derive the cardinality for publication on the website and PDF from the recorded tests. I'll admit that there might be an issue or two with this in practice. If that's the case, we will retain the cardinality mark-up so that it's explicit. I don't think that changing the form of the XPath and adding the Schematron rules is a problem. @stephenmackey is also working on this.
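
One way the derivation could work is sketched below. This is a minimal sketch under stated assumptions, not the publication pipeline's actual method: it only recognises Schematron test expressions of the form `count(...)` followed by `=`, `>=` or `<=` and an integer, and the `cardinality_from_test` helper and its regular expression are hypothetical. It expects the attribute value after XML parsing, so `&gt;=` has already become `>=`.

```python
import re

# Matches "count(<path>) <op> <n>" where <op> is =, >= or <=.
COUNT_TEST = re.compile(r"^\s*count\((?P<path>.+)\)\s*(?P<op>>=|<=|=)\s*(?P<n>\d+)\s*$")


def cardinality_from_test(test):
    """Map a count()-style assertion to an 'm..n' string, or None if unrecognised."""
    match = COUNT_TEST.match(test)
    if match is None:
        return None
    n = int(match.group("n"))
    op = match.group("op")
    if op == "=":
        return f"{n}..{n}"
    if op == ">=":
        return f"{n}..n"
    return f"0..{n}"  # op is "<="


print(cardinality_from_test("count(mets:agent)>=1"))
print(cardinality_from_test("count(mets:agent[@ROLE = 'CREATOR'])=1"))
print(cardinality_from_test("@OBJID"))
```

Assertions written in any other form, such as a bare attribute test, return `None` here. That is the kind of case where explicit cardinality mark-up would still be needed.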

@karinbredenberg
Contributor

karinbredenberg commented Jan 22, 2024

The suggestion is:

  • Update how the cardinality and XPath are stated in the METS profiles
  • Current way of giving cardinality and XPath:
<dl xmlns="http://www.w3.org/1999/xhtml">
  <dt>METS XPath</dt><dd>mets/@OBJID</dd>
  <dt>Cardinality</dt><dd>1..1</dd>
</dl>
  • XPath in updated version:
<test ID="TEST1-1" TESTLANGUAGE="XPath" TESTLANGUAGEVERSION="3.1">
   <testString>/mets:mets/@OBJID</testString>
</test>
  • Cardinality will need some more work and the alternative to use will be decided by the publication process
  • Alternative 1: Cardinality in the form of a test for the validators
<test ID="TEST1-2" TESTLANGUAGE="Schematron" TESTLANGUAGEVERSION="ISO" TESTLANGUAGEURI="http://purl.oclc.org/dsdl/schematron">
   <testWrap>
      <testXML>
         <iso:rule context="/mets:mets">
            <iso:assert id="CSIP1" role="ERROR" test="@OBJID">The mets/@OBJID attribute is mandatory, its value is a string identifier for the METS document. For the package METS document, this should be the name/ID of the package, i.e. the name of the package root folder. For a representation level METS document this value records the name/ID of the representation, i.e. the name of the top-level representation folder.</iso:assert>
         </iso:rule>
      </testXML>
   </testWrap>
</test>
  • Alternative 2: Keep the current way
<dl xmlns="http://www.w3.org/1999/xhtml">
  <dt>Cardinality</dt><dd>1..1</dd>
</dl>

Board members acknowledgment of the issue:
Tick the box in front of your name to indicate that you have looked at the suggestion.

  • Karin Bredenberg (Kommunalförbundet Sydarkivera, chair)
  • Anders Bo Nielsen (National Archives of Denmark)
  • Anja Paulič (National Archives of Slovenia)
  • Arne-Kristian Groven (National Archives of Norway)
  • Gregor Zavrsnik (Geoarh)
  • Janet Anderson (Highbury Research & Development Ltd.)
  • Maya Bangerter (Swiss Federal Archives)
  • Miguel Ferreira (KEEP Solutions)
  • Stephen Mackey (Penwern Limited)
  • Sven Schlarb (Austrian Institute of Technology)

Voting
(Decision making will be carried out on the basis of majority voting by all eligible members of the Board. In the case of a tied vote, decisions will be made at the discretion of the Chair)

Tick the box in front of your name to say yes to the suggestion.

  • Karin Bredenberg (Kommunalförbundet Sydarkivera, chair)
  • Anders Bo Nielsen (National Archives of Denmark)
  • Anja Paulič (National Archives of Slovenia)
  • Arne-Kristian Groven (National Archives of Norway)
  • Gregor Zavrsnik (Geoarh)
  • Janet Anderson (Highbury Research & Development Ltd.)
  • Maya Bangerter (Swiss Federal Archives)
  • Miguel Ferreira (KEEP Solutions)
  • Stephen Mackey (Penwern Limited)
  • Sven Schlarb (Austrian Institute of Technology)

@karinbredenberg karinbredenberg moved this from In progress to Candidates for voting in E-ARK Specification Updates Jan 22, 2024
@karinbredenberg
Contributor

7 DILCIS Board members have acknowledged the issue
7 DILCIS Board members agree with the solution

The suggestion of updated encoding will be part of the next release of the specifications

@karinbredenberg karinbredenberg added this to the CSIP version 2.2 milestone Feb 14, 2024
@karinbredenberg karinbredenberg moved this from Candidates for voting to In progress in E-ARK Specification Updates Feb 14, 2024