Implement legacy path pattern: IETF Internet-Drafts (`bibxml3`, `bibxml-id`) #13

ronaldtse · 2021-11-12T15:36:56Z

IETF Internet-Drafts (bibxml3, bibxml-id)

(previous location) http://xml2rfc.tools.ietf.org/public/rfc/bibxml3/
Pattern 1: http://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.example-name.xml
Pattern 2: http://xml2rfc.ietf.org/public/rfc/bibxml-ids/reference.I-D.draft-example-name-99.xml

Legacy pattern(s) to implement:

Pattern 1: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.{example-name}.xml
Pattern 2: https://{hostname}/public/rfc/bibxml-ids/reference.I-D.draft-{example-name}-{draft-number}.xml

We need to parse the pattern to return the appropriate BibXML content.
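
A minimal sketch of how the two patterns above might be told apart when parsing a request path. The regexes, the `LegacyDraftRef` type, and the assumption that a draft number is always two digits are illustrative only and are not taken from the BibXML service code.

```python
import re
from typing import NamedTuple, Optional

# Hypothetical regex covering the path part of both legacy patterns.
LEGACY_ID_PATH = re.compile(
    r"^/public/rfc/bibxml-ids/reference\.I-D\."
    r"(?P<ref>[A-Za-z0-9._+-]+)\.xml$"
)
# Pattern 2 refs look like draft-{example-name}-{draft-number}.
# Assumption: the draft number is exactly two digits.
VERSIONED_REF = re.compile(r"^draft-(?P<name>.+)-(?P<rev>\d{2})$")


class LegacyDraftRef(NamedTuple):
    name: str            # draft name without "draft-" prefix or revision
    rev: Optional[str]   # draft number, or None for Pattern 1


def parse_legacy_path(path: str) -> Optional[LegacyDraftRef]:
    """Return the parsed reference, or None if the path matches neither pattern."""
    m = LEGACY_ID_PATH.match(path)
    if m is None:
        return None
    ref = m.group("ref")
    versioned = VERSIONED_REF.match(ref)
    if versioned:
        return LegacyDraftRef(versioned.group("name"), versioned.group("rev"))
    return LegacyDraftRef(ref, None)
```

The parsed name and optional revision would then be used to look up the matching BibXML content; that lookup is outside the scope of this sketch.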


strogonoff · 2021-11-15T18:02:17Z

@ronaldtse where is the legacy/XML data repository for bibxml3 in https://github.com/ietf-ribose?

ronaldtse · 2021-11-15T18:20:42Z

@strogonoff the data source is not available yet for bibxml3/bibxml-ids.


ronaldtse · 2021-11-25T10:24:46Z

Legacy path specification described here: ietf-ribose/bibxml-data-ids#1

The bibxml3 endpoint matches URLs this way:
    url(r'^bibxml3/%(name)s(?:-%(rev)s)?.xml$' % settings.URL_REGEXPS, views_doc.document_bibxml),
(though an upcoming release will likely escape the . before xml.)

so, yes, you want to be looking at names that look like https://datatracker.ietf.org/doc/bibxml3/draft-ietf-stir-passport-rcd-09.xml

Once you've generated something with a version, it won't change, but you will also need to be able to respond to version-less requests, such as:
https://datatracker.ietf.org/doc/bibxml3/draft-ietf-stir-passport-rcd.xml
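
To illustrate the matching described above, here is a rough sketch in plain Python. The `URL_REGEXPS` values are stand-ins assumed for this example (the real ones live in the Datatracker settings), and `resolve` with its `known_revs` mapping is a hypothetical helper showing one way to answer version-less requests with the latest known revision.

```python
import re
from typing import Dict, List, Optional

# Assumed stand-ins for settings.URL_REGEXPS; not the real Datatracker values.
URL_REGEXPS = {
    "name": r"(?P<name>[A-Za-z0-9._+-]+?)",
    "rev": r"(?P<rev>[0-9]{2})",
}

# The pattern as quoted above: the "." before "xml" matches any character.
UNESCAPED = re.compile(r"^bibxml3/%(name)s(?:-%(rev)s)?.xml$" % URL_REGEXPS)
# The same pattern with the "." escaped.
ESCAPED = re.compile(r"^bibxml3/%(name)s(?:-%(rev)s)?\.xml$" % URL_REGEXPS)


def resolve(path: str, known_revs: Dict[str, List[str]]) -> Optional[str]:
    """Map a request path to a versioned draft name, or None if unknown.

    known_revs is a hypothetical index of draft name -> zero-padded revisions.
    """
    m = ESCAPED.match(path)
    if m is None:
        return None
    name, rev = m.group("name"), m.group("rev")
    revs = known_revs.get(name, [])
    if rev is None:
        # Version-less request: fall back to the latest known revision.
        return f"{name}-{max(revs)}" if revs else None
    return f"{name}-{rev}" if rev in revs else None
```

With the unescaped pattern, a path such as `bibxml3/draft-fooXxml` would also match, which is why escaping the dot matters.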

strogonoff · 2021-11-26T09:36:10Z

Note: we don’t parse name or rev from data. The only formatting variable currently available in a legacy path pattern is {ref}, representing our canonical reference obtained from the filename.

Support for more formatting variables will be filed separately.
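
As a rough illustration of a single `{ref}` variable, the sketch below turns a path template into a regex and extracts the reference from a filename. The helper names and the example template are assumptions made for this sketch and do not reflect the service's actual implementation.

```python
import re
from typing import Optional


def compile_ref_pattern(template: str) -> "re.Pattern[str]":
    """Turn a template containing a single {ref} placeholder into a regex."""
    prefix, sep, suffix = template.partition("{ref}")
    if not sep:
        raise ValueError("template must contain {ref}")
    # Everything except the placeholder is matched literally.
    return re.compile(f"^{re.escape(prefix)}(?P<ref>.+){re.escape(suffix)}$")


def extract_ref(template: str, filename: str) -> Optional[str]:
    """Return the {ref} part of filename, or None if it does not fit the template."""
    m = compile_ref_pattern(template).match(filename)
    return m.group("ref") if m else None
```

For example, `extract_ref("reference.I-D.{ref}.xml", "reference.I-D.draft-example-name-99.xml")` yields the whole `draft-example-name-99` string, with no separate name or revision.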

strogonoff · 2021-11-26T09:47:47Z

@ronaldtse

The first pattern in ticket description does not match the second pattern in your last comment.
If we use the second pattern,
1. I need to know which Relaton fields correspond to “rev” and “name” in this pattern.
  
  We don’t have Relaton data for bibxml-id, but we can use NIST for example: http://34.229.41.119:8000/api/v1/ref/nist/NISTIR_4790/
  
  What is “rev” there?
  
  Note: if “rev” can be missing for some citations, those citations may be inaccessible by their legacy paths.
2. The reference. prefix is shared for all legacy paths. If it shouldn’t be shared for bibxml-id, let me know.

ronaldtse · 2021-11-26T10:12:47Z

> The first pattern in ticket description does not match the second pattern in your last comment.

Let me clarify:

These are the legacy paths for the BibXML service, currently defined here:
https://svn.ietf.org/svn/tools/xml2rfc/website/rfcs/bibxml/bibxml-ids/gen-bibxml-ids

This is code from the Datatracker service given by @rjsparks:

    url(r'^bibxml3/%(name)s(?:-%(rev)s)?.xml$' % settings.URL_REGEXPS, views_doc.document_bibxml),

The source is:
https://github.com/ietf-svn-conversion/ietfdb-final/blob/c6fc13a38ef66d2c2b6d4931627ffd1cbdb4aa98/ietf/doc/urls.py#L89-L90

The Datatracker service is the "authoritative" endpoint for I-D documents.

strogonoff · 2021-11-26T10:14:02Z

This doesn’t answer which pattern we should match. Is Datatracker another external system we need to support? Do we need to support multiple patterns for different legacy systems? Or is Datatracker of interest to the GHA that prepares authoritative data for indexing, and not to this public/legacy API service?
See also the note (2ii) I edited in about the `reference.` prefix.

ronaldtse · 2021-11-26T10:36:34Z

We should implement the legacy pattern in the original post. The Datatracker system is for indexing and updating purposes.
Yes, the "reference." prefix is used for all legacy paths.

strogonoff · 2021-11-26T10:39:36Z

Ah, great… I think that means #28 would be unnecessary so far.

strogonoff · 2021-11-26T11:05:06Z

Although, if filenames in our future bibxml-data-ids dataset don’t contain the “draft” prefix or “draft-number” suffix, the extra flexibility might still be required to support specified path patterns.

ronaldtse · 2021-11-27T01:11:13Z

> 2. If we use the second pattern, I need to know which Relaton fields correspond to “rev” and “name” in this pattern.
> We don’t have Relaton data for bibxml-id, but we can use NIST for example: http://34.229.41.119:8000/api/v1/ref/nist/NISTIR_4790/

The Relaton models for IETF ID and NIST differ a lot. So let's not make that comparison.

strogonoff · 2022-06-25T10:48:51Z

Here is a report for a random subset of 128 paths (out of 90k+ total) under bibxml3:
bibxml3-random-subset.zip

Most paths seem to fall back to the original xml2rfc data. Others resolve automatically to the correct new bibitems in relaton-data-ids, in which case the XML is different and diffs are available in the report. The diffs seem to be manageable.

Testing all paths would take a while and incur many requests to Datatracker (part of path resolution logic) and xml2rfc tools (for reference comparison), but could be done.

rjsparks · 2022-06-27T15:58:29Z

If needed, we could build a self-contained test instance with all the needed components (a dev instance of the Datatracker, etc.) and do a walk of the entire dataset without affecting the production Datatracker, and (I assume) without needing significant other external I/O.

strogonoff · 2022-06-28T06:35:36Z

> If needed, we could build a self-contained test instance with all the needed components (a dev instance of the Datatracker, etc.) and do a walk of the entire dataset without affecting the production Datatracker, and (I assume) without needing significant other external I/O.

Absolutely, this could help.

Right now, using a URL other than “https://datatracker.ietf.org” as the Datatracker API root requires a change in the code (datatracker.request.BASE_DOMAIN), but it’s straightforward to edit the file before running docker-compose. (I could move this value to configuration or the environment if warranted.)
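
Moving the API root to the environment could look roughly like the sketch below. The `DATATRACKER_BASE_DOMAIN` variable name and the `get_base_domain` helper are hypothetical and only illustrate the idea; they are not existing settings of the service.

```python
import os
from typing import Mapping

# Default taken from the current hard-coded value mentioned above.
DEFAULT_BASE_DOMAIN = "https://datatracker.ietf.org"


def get_base_domain(environ: Mapping[str, str] = os.environ) -> str:
    """Return the Datatracker API root, preferring an environment override.

    DATATRACKER_BASE_DOMAIN is a hypothetical variable name.
    """
    value = environ.get("DATATRACKER_BASE_DOMAIN", "").strip()
    # Drop any trailing slash so that paths can be appended consistently.
    return (value or DEFAULT_BASE_DOMAIN).rstrip("/")
```

A local dev instance could then be selected by setting that variable in the docker-compose environment instead of editing the file.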

Otherwise there should be no issues. The test script can be passed a local BibXML service instance’s URL:

    mkdir -p reports && \
        python test_paths.py \
        http://localhost:8000/public/rfc \
        /path/to/local/bibxml-data-archive \
        --dirname bibxml3 --verbosity 2 --reports-dir reports --randomize

rjsparks · 2022-06-28T20:59:11Z

For the Datatracker, you can build a local dev copy quickly. Just clone the datatracker repo and run (cd docker; ./run).
There's more at the GitHub project page.

ronaldtse mentioned this issue Nov 12, 2021

Support legacy path patterns from xml2rfc.tools.ietf.org #11

Open

13 tasks

ronaldtse added this to BibXML Nov 12, 2021

ronaldtse assigned strogonoff and yablokov Nov 12, 2021

strogonoff added a commit that referenced this issue Nov 18, 2021

chore: switch to generic legacy ref handling

e92fdb7

#12, #13, #14, #15, #16, #17, #18, #19, #20, #22

strogonoff referenced this issue Nov 18, 2021

chore: configire bibxml-id as alias to ids

d13e2a6

strogonoff unassigned strogonoff and yablokov Nov 18, 2021

ronaldtse assigned strogonoff Nov 26, 2021

strogonoff mentioned this issue Nov 26, 2021

Support data variables in legacy path patterns #28

Closed

ronaldtse moved this to High priority in BibXML Dec 2, 2021

strogonoff added the xml2rfc Related to APIs for compatibility with xml2rfc consumers label Dec 17, 2021

ronaldtse mentioned this issue Jan 4, 2022

Draft versions need to relate to each other ietf-tools/relaton-data-ids#6

Closed

strogonoff mentioned this issue Mar 29, 2022

Return 404 for versioned I-D xml2rfc-style paths #157

Closed

strogonoff closed this as completed Jun 25, 2022

Repository owner moved this from High priority to Done in BibXML Jun 25, 2022

strogonoff reopened this Jun 28, 2022

Repository owner moved this from Done to In Progress in BibXML Jun 28, 2022
