Implement legacy path pattern: IETF Internet-Drafts (bibxml3, bibxml-id) #13

Open
Tracked by #11
ronaldtse opened this issue Nov 12, 2021 · 15 comments
Labels
xml2rfc Related to APIs for compatibility with xml2rfc consumers

Comments

@ronaldtse
Collaborator

ronaldtse commented Nov 12, 2021

IETF Internet-Drafts (bibxml3, bibxml-id)

Legacy pattern(s) to implement:

  • Pattern 1: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.{example-name}.xml
  • Pattern 2: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.draft-{example-name}-{draft-number}.xml

We need to parse requests matching these patterns and return the appropriate BibXML content.
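A minimal sketch of matching both patterns from the ticket description, assuming the hostname can be ignored and draft numbers are always two digits. Pattern 2 is tried first because any pattern-2 filename would also match pattern 1. The function name `parse_legacy_id_url` and the `example.org` host are illustrative only, not part of the service.

```python
import re
from urllib.parse import urlsplit

# Directory shared by both legacy patterns; the hostname is ignored.
PREFIX = "/public/rfc/bibxml-ids/"

# Pattern 2: reference.I-D.draft-{example-name}-{draft-number}.xml
# Assumption: draft numbers are exactly two digits, e.g. "09".
VERSIONED_RE = re.compile(r"^reference\.I-D\.draft-(?P<name>[^/]+)-(?P<number>\d{2})\.xml$")

# Pattern 1: reference.I-D.{example-name}.xml
UNVERSIONED_RE = re.compile(r"^reference\.I-D\.(?P<name>[^/]+)\.xml$")


def parse_legacy_id_url(url):
    """Return (pattern, name, draft_number), or None if no pattern matches."""
    path = urlsplit(url).path
    if not path.startswith(PREFIX):
        return None
    filename = path[len(PREFIX):]
    # Try pattern 2 first: every pattern-2 filename would also match pattern 1.
    m = VERSIONED_RE.match(filename)
    if m:
        return ("pattern2", m.group("name"), m.group("number"))
    m = UNVERSIONED_RE.match(filename)
    if m:
        return ("pattern1", m.group("name"), None)
    return None
```

For example, `https://example.org/public/rfc/bibxml-ids/reference.I-D.draft-ietf-stir-passport-rcd-09.xml` parses as pattern 2 with name `ietf-stir-passport-rcd` and draft number `09`.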

@strogonoff
Collaborator

@ronaldtse where is the legacy XML data repository for bibxml3 in https://github.com/ietf-ribose?

@ronaldtse
Collaborator Author

@strogonoff the data source is not available yet for bibxml3/bibxml-ids.

@ronaldtse
Collaborator Author

ronaldtse commented Nov 25, 2021

The legacy path specification is described here: ietf-ribose/bibxml-data-ids#1

The bibxml3 endpoint matches URLs this way:

    url(r'^bibxml3/%(name)s(?:-%(rev)s)?.xml$' % settings.URL_REGEXPS, views_doc.document_bibxml),

(though an upcoming release will likely escape the “.” before “xml”.)

So, yes, you want to be looking at names like https://datatracker.ietf.org/doc/bibxml3/draft-ietf-stir-passport-rcd-09.xml

Once you've generated something with a version, it won't change, but you will also need to be able to respond to version-less requests, such as:

https://datatracker.ietf.org/doc/bibxml3/draft-ietf-stir-passport-rcd.xml
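To see how the quoted route handles versioned and version-less names, it can be reproduced with Python's `re` module. The two `URL_REGEXPS` entries below are hypothetical approximations, not Datatracker's actual settings; the key assumption is that `name` is non-greedy, so a trailing `-NN` is left for `rev`. The sketch also shows why escaping the “.” matters.

```python
import re

# Hypothetical stand-ins for Datatracker's settings.URL_REGEXPS; the real
# values are defined in the Datatracker codebase and may differ.
URL_REGEXPS = {
    "name": r"(?P<name>[A-Za-z0-9._+-]+?)",  # non-greedy: leaves "-NN" for rev
    "rev": r"(?P<rev>[0-9]{2})",             # assumed two-digit revision
}

# The route as quoted: the unescaped "." before "xml" matches any character.
UNESCAPED_RE = re.compile(r"^bibxml3/%(name)s(?:-%(rev)s)?.xml$" % URL_REGEXPS)

# The same route with "." escaped, as the upcoming release is expected to do.
BIBXML3_RE = re.compile(r"^bibxml3/%(name)s(?:-%(rev)s)?\.xml$" % URL_REGEXPS)


def parse_bibxml3(path):
    """Return (name, rev) for a bibxml3 path; rev is None if version-less."""
    m = BIBXML3_RE.match(path)
    return (m.group("name"), m.group("rev")) if m else None
```

With these assumptions, `bibxml3/draft-ietf-stir-passport-rcd-09.xml` yields `rev` `09`, while the version-less `bibxml3/draft-ietf-stir-passport-rcd.xml` yields the same name with no `rev`, so a handler would have to pick the latest revision itself.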

@strogonoff
Collaborator

Note: we don’t parse name or rev from the data. Currently the only formatting variable available in a legacy path pattern is {ref}, which represents our canonical reference obtained from the filename.

Support for more formatting variables will be filed separately.
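With {ref} as the only variable, a legacy path is the filename-derived reference wrapped in a fixed prefix and suffix, and resolving a request means stripping them off again, with no name/rev split. A minimal sketch, assuming a hypothetical pattern string `reference.I-D.{ref}.xml` and hypothetical helper names:

```python
import re

# Hypothetical legacy path pattern; {ref} is the only formatting variable,
# standing for the canonical reference obtained from the data filename.
LEGACY_PATTERN = "reference.I-D.{ref}.xml"


def format_legacy_path(ref):
    """Build the legacy filename for a reference."""
    return LEGACY_PATTERN.format(ref=ref)


def ref_from_legacy_path(filename):
    """Invert the pattern: recover {ref} from a requested filename, or None."""
    prefix, suffix = LEGACY_PATTERN.split("{ref}")
    m = re.fullmatch(re.escape(prefix) + r"(?P<ref>.+)" + re.escape(suffix), filename)
    return m.group("ref") if m else None
```

Here `ref_from_legacy_path("reference.I-D.draft-ietf-stir-passport-rcd-09.xml")` returns the whole `draft-ietf-stir-passport-rcd-09` as a single reference.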

@strogonoff
Collaborator

strogonoff commented Nov 26, 2021

@ronaldtse

  1. The first pattern in the ticket description does not match the second pattern in your last comment.

  2. If we use the second pattern,

    1. I need to know which Relaton fields correspond to “rev” and “name” in this pattern.

      We don’t have Relaton data for bibxml-id, but we can use NIST as an example: http://34.229.41.119:8000/api/v1/ref/nist/NISTIR_4790/

      What is “rev” there?

      Note: if “rev” can be missing for some citations, those citations may be inaccessible by their legacy paths.

    2. The “reference.” prefix is shared by all legacy paths. If it shouldn’t be shared for bibxml-id, let me know.

@ronaldtse
Collaborator Author

  • The first pattern in ticket description does not match the second pattern in your last comment.

Let me clarify:

The patterns in the original post are the legacy paths for the BibXML service, currently defined here:
https://svn.ietf.org/svn/tools/xml2rfc/website/rfcs/bibxml/bibxml-ids/gen-bibxml-ids

The pattern in my last comment is code from the Datatracker service, provided by @rjsparks:

url(r'^bibxml3/%(name)s(?:-%(rev)s)?.xml$' % settings.URL_REGEXPS, views_doc.document_bibxml),

The source is:
https://github.com/ietf-svn-conversion/ietfdb-final/blob/c6fc13a38ef66d2c2b6d4931627ffd1cbdb4aa98/ietf/doc/urls.py#L89-L90

The Datatracker service is the "authoritative" endpoint for I-D documents.

@strogonoff
Collaborator

strogonoff commented Nov 26, 2021

  1. This doesn’t answer which pattern we should match. Is Datatracker another external system we need to support? Do we need to support multiple patterns for different legacy systems? Or is Datatracker of interest only to the GHA that prepares authoritative data for indexing, and not to this public legacy API service?
  2. See also note 2.ii, which I edited in, on the “reference.” prefix.

@ronaldtse
Copy link
Collaborator Author

  1. We should implement the legacy patterns in the original post. The Datatracker system is for indexing and updating purposes.
  2. Yes, the "reference." prefix is used for all legacy paths.

@strogonoff
Collaborator

Ah, great. I think that means #28 is unnecessary for now.

@strogonoff
Copy link
Collaborator

strogonoff commented Nov 26, 2021

That said, if filenames in our future bibxml-data-ids dataset don’t contain the “draft” prefix or the “draft-number” suffix, the extra flexibility might still be required to support the specified path patterns.
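To illustrate the concern: with {ref} as the only formatting variable, a pattern-2 path can be produced only if the filename-derived reference already carries the “draft-” prefix and the draft number. A minimal sketch, assuming a hypothetical pattern string and two-digit draft numbers; `produces_pattern2` is an illustrative helper, not service code.

```python
import re

# Assumptions: {ref} is taken verbatim from the data filename, and draft
# numbers are two digits.
LEGACY_PATTERN = "reference.I-D.{ref}.xml"
PATTERN2_RE = re.compile(r"^reference\.I-D\.draft-[^/]+-\d{2}\.xml$")


def produces_pattern2(ref):
    """True if formatting LEGACY_PATTERN with ref yields a pattern-2 filename."""
    return PATTERN2_RE.match(LEGACY_PATTERN.format(ref=ref)) is not None
```

A reference such as `draft-ietf-stir-passport-rcd-09` fits, whereas `ietf-stir-passport-rcd` or `draft-ietf-stir-passport-rcd` would need additional formatting variables.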

@ronaldtse
Copy link
Collaborator Author

ronaldtse commented Nov 27, 2021

2. If we use the second pattern, I need to know which Relaton fields correspond to “rev” and “name” in this pattern.
We don’t have Relaton data for bibxml-id, but we can use NIST for example: http://34.229.41.119:8000/api/v1/ref/nist/NISTIR_4790/

The Relaton models for IETF I-Ds and NIST differ a lot, so let's not make that comparison.

ronaldtse moved this to High priority in BibXML on Dec 2, 2021
strogonoff added the xml2rfc (Related to APIs for compatibility with xml2rfc consumers) label on Dec 17, 2021
@strogonoff
Copy link
Collaborator

Here is a report for a random subset of 128 paths (out of 90k+ total) under bibxml3:
bibxml3-random-subset.zip

Most paths seem to fall back to the original xml2rfc data; others resolve automatically to the correct new bibitems in relaton-data-ids, in which case the XML differs and diffs are included in the report. The diffs seem manageable.

Testing all paths would take a while and would incur many requests to Datatracker (which is part of the path resolution logic) and to xml2rfc tools (for reference comparison), but it could be done.
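The two resolution outcomes in the report (a fallback to xml2rfc data, or a new bibitem whose XML is diffed against the original) can be sketched as follows. This assumes a dictionary-like index of new bibitems and a legacy archive, both keyed by reference; `resolve_reference` and `reference_diff` are hypothetical helpers, not the service's actual resolution code, which also queries Datatracker.

```python
import difflib


def resolve_reference(ref, new_index, legacy_archive):
    """Prefer a new bibitem; otherwise fall back to the original xml2rfc XML.

    Hypothetical: the real service also consults Datatracker to resolve paths.
    """
    if ref in new_index:
        return ("relaton-data-ids", new_index[ref])
    return ("xml2rfc", legacy_archive.get(ref))


def reference_diff(legacy_xml, new_xml):
    """Unified diff between xml2rfc output and new output ("" if identical)."""
    return "".join(difflib.unified_diff(
        legacy_xml.splitlines(keepends=True),
        new_xml.splitlines(keepends=True),
        fromfile="xml2rfc",
        tofile="bibxml-service",
    ))
```

Only references that resolve to relaton-data-ids need a diff; fallback references return the original XML unchanged.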

Repository owner moved this from High priority to Done in BibXML Jun 25, 2022
@rjsparks
Copy link
Member

If needed, we could build a self-contained test instance with all the needed components (dev instance of the datatracker, etc.) and do a walk of the entire dataset without affecting the production datatracker, and (I assume) without needing significant other external I/O.

@strogonoff
Copy link
Collaborator

strogonoff commented Jun 28, 2022

If needed, we could build a self-contained test instance with all the needed components (dev instance of the datatracker, etc) and do walk of the entire dataset without affecting the production datatracker, and (I assume) not needing significant other external I/O.

Absolutely, this could help.

Right now, using a URL other than “https://datatracker.ietf.org” as the Datatracker API root requires a code change (datatracker.request.BASE_DOMAIN), but it’s straightforward to edit the file before running docker-compose. (I could move this value to configuration or an environment variable if warranted.)
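Moving the value to the environment could look roughly like this. `DATATRACKER_BASE_DOMAIN` is a hypothetical variable name, not an existing setting of the BibXML service, and the real `datatracker.request` module may be structured differently.

```python
import os

# Production Datatracker API root, used when no override is set.
DEFAULT_BASE_DOMAIN = "https://datatracker.ietf.org"


def get_base_domain(environ=None):
    """Read the API root from DATATRACKER_BASE_DOMAIN (hypothetical name)."""
    env = os.environ if environ is None else environ
    # Strip a trailing slash so callers can append "/api/..." paths safely.
    return env.get("DATATRACKER_BASE_DOMAIN", DEFAULT_BASE_DOMAIN).rstrip("/")


BASE_DOMAIN = get_base_domain()
```

A local test run could then set `DATATRACKER_BASE_DOMAIN=http://localhost:8000` in the docker-compose environment instead of editing the file.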

Otherwise, there should be no issues. The test script can be given the URL of a local BibXML service instance:

mkdir -p reports && \
    python test_paths.py \
    http://localhost:8000/public/rfc \
    /path/to/local/bibxml-data-archive \
    --dirname bibxml3 --verbosity 2 --reports-dir reports --randomize

strogonoff reopened this on Jun 28, 2022
Repository owner moved this from Done to In Progress in BibXML Jun 28, 2022
@rjsparks
Copy link
Member

For the datatracker, you can build a local dev copy quickly: just clone the datatracker repo and run (cd docker; ./run).
There's more on the GitHub project page.

4 participants