Proposal: Structured Previews #182

Open
wants to merge 1 commit into
base: master

@Abbe98 (Member) commented Oct 24, 2024

Abstract

This proposal introduces a JSON-based alternative to the existing HTML previews with the intent to allow clients to control the user experience and presentation of said previews. A secondary intent is to make it possible to utilize previews in environments where HTML rendering might not be convenient (terminals, etc).

Status of this Proposal

This proposal reflects our experimental usage rather than the expected end result. For concerns we already have in mind, see the open questions below. If the group considers this a worthy proposal we intend to keep our experiments and this proposal in sync.

Background and Motivation

We use a few hundred reconciliation services, and even though we run all of them ourselves using a few shared frameworks, the look and feel of the HTML previews has diverged over the years.

In addition to the issue above, the client has no control (in a sane and safe way) over the look and functionality of the embedded HTML preview, causing user settings related to keyboard shortcuts, dark mode, font size, etc. to differ between the client and the preview.

We can avoid some of these issues because we run all services ourselves and have our own clients; however, new problems arise with third-party clients. For example, many of our HTML previews now support dark mode natively but OpenRefine(.org) does not, causing previews to use a different theme than the client. With structured previews, we want to give clients like OpenRefine control over theming and much more.

Another lesser use case is our wish to use the information provided by preview endpoints in our CLI, where HTML rendering is not an option.

Upstreaming this extension would allow our public reconciliation services to remain compatible with OpenRefine and other clients even as we deprecate HTML previews.

Open Questions and Known Issues

  • The endpoint/capability is not currently announced by the service manifest.
  • The image property should possibly be media and contain additional information needed to render it safely.
  • One could argue that HTML previews should be deprecated altogether given the burden of sandboxing and CSPs. However, they are powerful and might still have their use cases.
  • We imagine that there is a need for custom properties and that such an extension mechanism should maybe be a part of the specification.
  • JSON schemas are missing.
  • Error responses and error schemas.

netlify bot commented Oct 24, 2024

Deploy Preview for reconciliation-api-specs ready!

  • 🔨 Latest commit: 60befb7
  • 🔍 Latest deploy log: https://app.netlify.com/sites/reconciliation-api-specs/deploys/671a8ac70dd9b30008488e7f
  • 😎 Deploy Preview: https://deploy-preview-182--reconciliation-api-specs.netlify.app

@thadguidry (Contributor) commented

  • The image property should possibly be media and contain additional information needed to render it safely.

Hmm, can you speak more about the additional information and what this might look like? And what safety concerns are you thinking of?

Media is a very wide net, and on the Web, there's been much evolution and will continue to be so. As evidence, in Schema.org we've already experienced several iterations of https://schema.org/MediaObject
I could envision some clients wanting previews of Video, Audio, 3D models, hell, even ASCII Art.

@Abbe98 (Member Author) commented Oct 25, 2024

Hmm, can you speak more about the additional information and what this might look like? And what safety concerns are you thinking of?

By "safety" I do not mean "securely" here, but rather whether you can determine enough about the resource to render it correctly. From a purely web perspective, one should be able to determine which element the browser should use to display the resource. I imagine this could be done by simply providing both the resource URL and a MIME type for that resource.
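A minimal sketch of that idea, assuming a hypothetical `media` object with `url` and `mimeType` keys (neither name is part of any spec yet): a web client picks an HTML element from the MIME type's top-level type and falls back to a plain link when it cannot tell.

```python
from html import escape
from typing import Optional

# Hypothetical mapping from a MIME top-level type to the HTML element a web
# client might use to render the resource.
ELEMENT_FOR_TOP_LEVEL_TYPE = {
    "image": "img",
    "video": "video",
    "audio": "audio",
}


def element_for(mime_type: str) -> Optional[str]:
    """Return the HTML element name for a MIME type, or None if unknown."""
    # Drop parameters such as "; codecs=..." and normalize case.
    essence = mime_type.split(";", 1)[0].strip().lower()
    top_level = essence.split("/", 1)[0]
    return ELEMENT_FOR_TOP_LEVEL_TYPE.get(top_level)


def render_media(media: dict) -> str:
    """Render a hypothetical {"url": ..., "mimeType": ...} object as HTML."""
    url = escape(media["url"], quote=True)
    element = element_for(media.get("mimeType", ""))
    if element == "img":
        return f'<img src="{url}" alt="">'
    if element in ("video", "audio"):
        return f'<{element} src="{url}" controls></{element}>'
    # Unknown or missing MIME type: don't guess, just link to the resource.
    return f'<a href="{url}">{url}</a>'
```

A `model/...` type (e.g. a 3D model) falls through to a link here, which is the "render it correctly" problem in miniature: without the MIME type, the client has no basis for doing better.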

@thadguidry (Contributor) commented Oct 31, 2024

I think it makes sense that structured previews would use existing structure standards of the Linked Data web. That way, structured data to power previews could actually come from many systems around the world simply through... a link. Linked Data allows an app to start at one piece of Linked Data and follow and retrieve additional data through embedded links and JSON-LD is a lightweight syntax that allows existing JSON to be interpreted as Linked Data with minimal changes.

We also automatically get strings annotated with their language (we don't have to do backflips).
Any terms can then simply be mapped to IRIs through the use of @context, if desired.
I could go on and on.

It's also easy enough to generate expanded, compacted, table, etc. forms through existing JSON-LD client libraries.
It's dead simple for preview clients to use existing libraries and show full expanded context, or simply a nice tabular nested record, or anything in-between.

The world of Linked Data then automatically helps with preview generation through a simple link; in many cases not even a query has to be formed. And there are SPARQL endpoints around the world that even do that: generate a JSON-LD serialization of their results.

A Few Examples:

Your example could thus look like this (where I've also added a context and "sameAs" as an expanded example).
You can paste this into https://json-ld.org/playground and scroll down to view the different previews of it.

JSON-LD (with a Schema.org context):

{
  "@context": "https://schema.org",
  "@id": "http://www.wikidata.org/entity/Q2",
  "name": "Earth",
  "description": "third planet from the Sun in the Solar System",
  "url": "https://www.wikidata.org/wiki/Q2",
  "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
  "additionalType": [
    "http://www.wikidata.org/entity/Q128207",
    "http://www.wikidata.org/entity/Q3504248",
    "terrestrial planet",
    "inner planet of the Solar System"
  ],
  "sameAs": "https://science.nasa.gov/earth"
}
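As a rough illustration that such a document stays usable without any JSON-LD tooling, here is a sketch of a client that ignores `@context`, treats the document as plain JSON, and flattens it into key/value rows, e.g. for a terminal. It does not perform JSON-LD expansion, so terms are not resolved to IRIs; the function names are hypothetical.

```python
import json
from typing import Any, Iterator, Tuple


def flatten(node: Any, prefix: str = "") -> Iterator[Tuple[str, str]]:
    """Yield (path, value) rows for a JSON value, skipping "@context" etc."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@") and key != "@id":
                continue  # JSON-LD keywords like "@context" aren't user-facing
            yield from flatten(value, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from flatten(item, f"{prefix}[{index}]")
    else:
        yield prefix, str(node)


def render_table(document: str) -> str:
    """Render a JSON(-LD) document as aligned plain-text rows."""
    rows = list(flatten(json.loads(document)))
    width = max((len(path) for path, _ in rows), default=0)
    return "\n".join(f"{path.ljust(width)}  {value}" for path, value in rows)
```

Applied to the Earth example above, this yields one row per leaf value: the `name` row holds `Earth`, and the list becomes `additionalType[0]` through `additionalType[3]`.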

@Abbe98 (Member Author) commented Oct 31, 2024

I couldn't agree more @thadguidry, and JSON-LD based previews would actually make some of our services compatible simply by pointing to the URIs of our various data services (example: https://fornpunkt.se/lamning/bjyWKNz.jsonld). Not to mention that the reconciliation specification would benefit from RDF's native extension support, alignment with RDF-based service/data discovery vocabularies, native multilingual support, and much more.

However, I wouldn't want to make this proposal dependent on such a larger effort, as I see it as beneficial on its own and easily merged with a JSON-LD effort later, as illustrated by your example.

@wetneb (Member) commented Nov 12, 2024

Thanks for this proposal!

Previews are one of the main things I would prioritize improving if I wanted to continue developing my Wikibase reconciliation service, as people have been asking for adding much more data to them. I would essentially add all statements of the Wikibase item, displayed in a more compact way (perhaps with a way to prioritize certain properties if there is a sensible way to do so). For all the corresponding data to fit in a response like the one you're proposing, it would essentially amount to returning all entity data in the response.

So, I see the interest in returning structured data instead of HTML from an API purity point of view, but I wonder what makes those structured previews different from fetching the data of the entity, for instance via content negotiation on the URI or via the data extension API. To me, it's not clear that there is sufficient difference between the two use cases to really justify having a separate API endpoint in the specs for that.

I would also be reluctant to deprecate HTML previews, because of the freedom they give to the service to present its data in a suitable way. The data model of the reconciliation API is really basic, so in a lot of cases services will need to coerce their data into it at a loss. I think those previews are a welcome opportunity for the service to render its data in its "native" form, with the appropriate context and rendering conventions.

@Abbe98 (Member Author) commented Nov 12, 2024

So, I see the interest in returning structured data instead of HTML from an API purity point of view, but I wonder what makes those structured previews different from fetching the data of the entity, for instance via content negotiation on the URI or via the data extension API. To me, it's not clear that there is sufficient difference between the two use cases to really justify having a separate API endpoint in the specs for that.

I see three challenges with using content negotiation on the URI (which to some extent also apply to the data extension API):

  1. It's incredibly hard to render RDF (or JSON) in a user-friendly way if the data model is not known by the client, even if you resolve contexts and have your client ingest the most common vocabularies. You bring up the tricky question of the importance/ranking of statements, but beyond that, even knowing the predicates is hard. Imagine you want to render schema:image as an image: first, you have to be lucky enough to be working with an ontology your client is aware of; then other problems start. That schema:image predicate can point to a schema:ImageObject node, a URI resolving to a schema:ImageObject node, or a plain URL pointing to an image. The common solution to this is to define profiles of data models, at which point we are back to this proposal and future work on defining a schema.org subset for use in a preview service.

  2. Not all services are RDF-enabled: CSVs, OpenRefine projects, some third-party service you don't control, etc.

  3. The majority of reconciliation services in the wild are third-party ones. If you add a dependency on URIs, you end up in a situation in which you need to proxy through fake (third-party) URIs (which would bring you back to this proposal). Consider for example the GeoNames service over at fornpunkt.se: even if a client supported previewing geonames.org URIs, we would need to set up our own "fake" URIs so as not to rely on geonames.org, or, if we were okay with relying on geonames.org, we would need to open up our CSP to geonames.org.
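To make the burden in point 1 concrete, here is a sketch of what a client would need just to get an image URL out of `schema:image`, assuming compacted JSON-LD with the schema.org context. `resolve` is a hypothetical callback that dereferences a node URI; in practice it would be an HTTP request, which is exactly the dependency point 3 is about.

```python
from typing import Any, Callable, Optional

# Hypothetical: dereferences a node IRI to a JSON-LD node (dict), or None.
Resolver = Callable[[str], Optional[dict]]


def image_url(value: Any, resolve: Resolver) -> Optional[str]:
    """Extract a displayable image URL from a compacted schema:image value."""
    if isinstance(value, list):
        # schema:image may be repeated; use the first value that works.
        for item in value:
            url = image_url(item, resolve)
            if url:
                return url
        return None
    if isinstance(value, str):
        # A plain URL. (It could also be the IRI of an ImageObject node; the
        # client can't tell without fetching it, so this sketch assumes a URL.)
        return value
    if isinstance(value, dict):
        # An embedded schema:ImageObject node.
        if "contentUrl" in value or "url" in value:
            return value.get("contentUrl") or value.get("url")
        # A reference to a node that has to be dereferenced first.
        if "@id" in value:
            node = resolve(value["@id"])
            if node:
                return node.get("contentUrl") or node.get("url")
    return None
```

Even this sketch has to guess in the plain-string case, which is the kind of ambiguity a fixed preview profile avoids.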

I would also be reluctant to deprecate HTML previews, because of the freedom they give to the service to present its data in a suitable way. The data model of the reconciliation API is really basic, so in a lot of cases services will need to coerce their data into it at a loss. I think those previews are a welcome opportunity for the service to render its data in its "native" form, with the appropriate context and rendering conventions.

I completely agree and I argue for them not to be deprecated :-)

@acka47 (Member) commented Nov 22, 2024

We discussed this in the last meeting and I realized I have some reservations with regard to standardizing structured previews:

  1. The purpose of the entity information in structured previews is to enable identifying the right match in a set of more or less similar entities.
  2. Thus, I need those properties that actually help with identification.
  3. Depending on the type of entities a reconciliation service provides, the properties needed for identification differ, e.g. birth/death dates and places for persons, geo information for places, etc.

Given these three statements, it is probably neither possible nor sensible to standardize identifying properties for previews at the protocol level.

@thadguidry (Contributor) commented

@acka47 exactly: don't standardize at the protocol level, but instead around the basic structure of a preview, as @Abbe98 and I are encouraging. That way, any service can put as many "disambiguating properties" as it wants into the structured preview to help clients disambiguate between entities.

@Abbe98 (Member Author) commented Nov 22, 2024

We discussed this in the last meeting and I realized I have some reservations with regard to standardizing structured previews:

1. The purpose of the entity information in structured previews is to enable identifying the right match in a set of more or less similar entities.

2. Thus, I need those properties that actually help with identification.

3. Depending on the type of entities a reconciliation service provides, the properties needed for identification differ, e.g. birth/death dates and places for persons, geo information for places, etc.

Given these three statements, it is probably neither possible nor sensible to standardize identifying properties for previews at the protocol level.

I think that the off-topic discussion about JSON-LD might have led to some confusion, because this is exactly what this proposal addresses (and I probably contributed to that confusion).

A preview "description", "title", etc. is not conceptually the same as that of an entity. Consider a location as an entity: the entity would have properties such as the name of the location, a higher-level administrative area, etc. The preview service would pull several layers of administrative areas from various entities into the description to help with identification; it might also provide a dynamically generated map thumbnail.
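A sketch of that distinction, using made-up data and hypothetical field names (`name`, `partOf`): the preview's description is not a property of the location entity itself, but is assembled from several related entities by walking up the administrative hierarchy.

```python
from typing import Optional

# Hypothetical entity store with made-up identifiers and names. Each entry
# points to the administrative area it is part of, if any.
ENTITIES = {
    "loc:1": {"name": "Old Town", "partOf": "adm:3"},
    "adm:3": {"name": "Example Municipality", "partOf": "adm:2"},
    "adm:2": {"name": "Example County", "partOf": "adm:1"},
    "adm:1": {"name": "Example Country", "partOf": None},
}


def build_preview(entity_id: str, entities: dict, max_depth: int = 10) -> dict:
    """Build a structured preview whose description lists the parent areas."""
    entity = entities[entity_id]
    parents = []
    parent_id: Optional[str] = entity.get("partOf")
    seen = {entity_id}
    # Walk up the hierarchy, pulling a name from each related entity and
    # guarding against cycles and overly deep chains.
    while parent_id and parent_id not in seen and len(parents) < max_depth:
        seen.add(parent_id)
        parent = entities.get(parent_id)
        if parent is None:
            break
        parents.append(parent["name"])
        parent_id = parent.get("partOf")
    return {"id": entity_id, "name": entity["name"], "description": ", ".join(parents)}
```

`build_preview("loc:1", ENTITIES)` gives the description "Example Municipality, Example County, Example Country", text that no single entity in the store holds.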

Given the negative summary from the meeting, I wonder if the actual use cases this proposal addresses were discussed, and if the group considers any alternative approaches to solve them?

@fsteeg (Member) commented Nov 22, 2024

Given the negative summary from the meeting, I wonder if the actual use cases this proposal addresses were discussed, and if the group considers any alternative approaches to solve them?

I think the basic idea in the meeting was that such structured previews would be a kind of entity data access, for which we don't have an API yet, but which might make sense to add. The discussion started like this: "we have no access to the content of an entity, like data extension, but without having to ask for specific properties", and the action item was "comment about possible entity API". (The discussion moved into ways of configuring such an API, which perhaps moved the original "without having to ask for specific properties" part into the background.) So from my understanding, a possible solution would be to add a new kind of entity API, which in the simple, no-config case would return data suitable for previews?

Since it's been a bit hidden in the discussion here (it's in the diff only), this is the example from the original proposal (by @Abbe98):

{
  "id": "http://www.wikidata.org/entity/Q2",
  "name": "Earth",
  "description": "third planet from the Sun in the Solar System",
  "url": "https://www.wikidata.org/wiki/Q2",
  "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
  "tags": [
    ["terrestrial planet", "http://www.wikidata.org/entity/Q128207"],
    ["inner planet of the Solar System", "http://www.wikidata.org/entity/Q3504248"]
  ]
}

If we remove the url (available via view.url), and replace the tags with type, we only have the additional image diverging from what an entity is according to the spec:

{
  "id": "http://www.wikidata.org/entity/Q2",
  "name": "Earth",
  "description": "third planet from the Sun in the Solar System",
  "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
  "type": [
    {"name": "terrestrial planet", "id": "http://www.wikidata.org/entity/Q128207"},
    {"name": "inner planet of the Solar System", "id": "http://www.wikidata.org/entity/Q3504248"}
  ]
}

This could then be returned e.g. for a simple (no-config) GET /entity/{id}.
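The mapping described above is mechanical enough to sketch. Assuming the proposal's preview shape (with `tags` as `[name, url]` pairs, where the URL may be missing), a conversion to a spec-style entity could look like the following; the function name is hypothetical, and keeping `image` is the one extension.

```python
from typing import Any, Dict


def preview_to_entity(preview: Dict[str, Any]) -> Dict[str, Any]:
    """Map a structured preview (proposal shape) onto an entity-like dict.

    - "url" is dropped; clients can build it from the manifest's view.url.
    - each [name, url] tag becomes a {"name": ..., "id": ...} type object.
    - "image" is kept; it is the one field with no entity counterpart.
    """
    entity = {k: preview[k] for k in ("id", "name", "description", "image") if k in preview}
    types = []
    for tag in preview.get("tags", []):
        if isinstance(tag, str):
            tag = [tag]  # assumption: a bare string is a tag without a URL
        name = tag[0]
        url = tag[1] if len(tag) > 1 else None
        if url:
            types.append({"name": name, "id": url})
        # A tag without a URL has no identifier, so it cannot become a type
        # object; this sketch simply drops it.
    if types:
        entity["type"] = types
    return entity
```

Tags without a URL are the interesting case: they have no identifier, so they do not map onto a type object, which is one concrete way in which the two shapes differ.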

When requesting specific properties, these could be added in an additional properties field (basically a non-batch data extension API):

{
  "id": "http://www.wikidata.org/entity/Q2",
  "name": "Earth",
  "description": "third planet from the Sun in the Solar System",
  "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
  "type": [
    {"name": "terrestrial planet", "id": "http://www.wikidata.org/entity/Q128207"},
    {"name": "inner planet of the Solar System", "id": "http://www.wikidata.org/entity/Q3504248"}
  ],
  "properties": [{
    "id": "P138",
    "name": "named after",
    "values": [
      {"name": "soil", "id": "Q36133"},
      {"name": "land", "id": "Q11081619"},
      {"name": "ball", "id": "Q838611"}
    ]
  }]
}
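Whichever shape the response takes, the `GET /entity/{id}` idea has one practical wrinkle: identifiers are often full URIs, so they must be percent-encoded to fit into a single path segment. A sketch, where the base URL and the `/entity/` path are assumptions taken from the comment above rather than from the spec:

```python
from urllib.parse import quote


def entity_url(service_base: str, entity_id: str) -> str:
    """Build the URL for a hypothetical GET /entity/{id} request."""
    # safe="" also encodes "/" and ":", so a full URI stays one path segment.
    return f"{service_base.rstrip('/')}/entity/{quote(entity_id, safe='')}"
```

Without the encoding, the slashes inside `http://www.wikidata.org/entity/Q2` would be read as extra path segments.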

@Abbe98 (Member Author) commented Nov 22, 2024

I think the basic idea in the meeting was that such structured previews would be a kind of entity data access, for which we don't have an API yet, but which might make sense to add. The discussion started like this: "we have no access to the content of an entity, like data extension, but without having to ask for specific properties", and the action item was "comment about possible entity API". (The discussion moved into ways of configuring such an API, which perhaps moved the original "without having to ask for specific properties" part into the background.) So from my understanding, a possible solution would be to add a new kind of entity API, which in the simple, no-config case would return data suitable for previews?

Thank you @fsteeg for the background. This proposal is only about presenting an entity for identification just like with the existing preview feature, with the difference being the format and a few standardized properties meant to move control of display and accessibility from the reconciliation endpoint to the client.

If we remove the url (available via view.url), and replace the tags with type, we only have the additional image diverging from what an entity is according to the spec:

In hindsight, I maybe shouldn't have chosen such similar property names, nor used Wikidata content as the example. The properties are not intended to represent entity data 1:1, but information that can be used for previewing and identification. To illustrate, here are the descriptions of tags and description from my proposal:

  • description An optional description which can be provided to disambiguate namesakes, providing more context.
  • tags An optional list of tags, with an optional URL associated with each tag. Tags can represent entity types and categories, among other things.
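Since part of the motivation is using previews outside the browser (e.g. in a CLI), here is a sketch of how a client might render these fields as plain text, assuming the preview shape from the original example (`name`, `description`, and optional `tags` given as `[name]` or `[name, url]`). The layout is entirely the client's choice, which is the point of the proposal; `render_text` is a hypothetical name.

```python
import textwrap


def render_text(preview: dict, width: int = 72) -> str:
    """Render a structured preview as plain text, e.g. for a CLI."""
    lines = [preview["name"], "=" * min(len(preview["name"]), width)]
    if preview.get("description"):
        lines.extend(textwrap.wrap(preview["description"], width=width))
    tags = []
    for tag in preview.get("tags", []):
        # Each tag is [name] or [name, url]; show the URL only if present.
        name = tag[0]
        url = tag[1] if len(tag) > 1 and tag[1] else None
        tags.append(f"{name} <{url}>" if url else name)
    if tags:
        lines.append("")
        lines.extend(f"  - {tag}" for tag in tags)
    return "\n".join(lines)
```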

I would like to emphasize that my proposal and initial PR do not mention data access, nor are they in conflict with potential efforts around entity data access or JSON-LD. The proposal is solely about ensuring clients can control the display and accessibility of previews, and about making it possible to use previews in non-web environments.

@thadguidry (Contributor) commented Nov 23, 2024

In my current service prototype, I have disambiguating properties (a few properties used for identification to clearly expose differences from other entities). Previews have a high need for showcasing disambiguating properties (not all properties of an entity; Wikidata has P1963, but it's not purely for a limited set of disambiguating properties, instead covering all common properties for a type). My need is for describing my limited set of properties directly within a structured preview.

Thus, the further problem that I have is when I need to give more information about those disambiguating properties themselves. For each preview's disambiguating property, I was hoping to avoid data duplication and instead use a graph (preferably JSON-LD) so that I could describe the disambiguating properties once at the beginning of the preview.

Ex. 1 Disambiguating Property "affiliation" without using JSON-LD syntax or context, but just descriptors for universal understanding

{
    "affiliation": {
        "alias" : [
            "ally",
            "companion",
            "cohort"
        ],
        "description": "a loose alliance with another organization",
        "about": {
            "type" : "Relationship",
            "alias" : "Connection"
        },
        "sameAs" : "http://www.w3.org/ns/org#memberOf"
    }
}

My service would deal mostly with Organizations and have disambiguating properties such as "leader", "affiliation", "location", and "industry".

Preview:

{ "id" : "1234",
  "name" : "Affiliated Metals",
  "affiliation" : "Reliance Steel & Aluminum Co.",
  "location" : "Salt Lake City, Utah",
  "industry" : "423510"
}
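A sketch of how a client could combine the two, assuming (hypothetically) that the service ships a map of property name to descriptor like Ex. 1 alongside the preview, and that `id` and `name` are shown separately as the title: described properties get their description as a hint, and undescribed ones are still shown as-is.

```python
from typing import Any, Dict, List, Tuple

# Property descriptors as in Ex. 1 (abridged), described once per preview.
DESCRIPTORS: Dict[str, Dict[str, Any]] = {
    "affiliation": {
        "alias": ["ally", "companion", "cohort"],
        "description": "a loose alliance with another organization",
        "sameAs": "http://www.w3.org/ns/org#memberOf",
    },
}

# Assumption: keys every preview has, shown separately as the title.
RESERVED = {"id", "name"}


def disambiguating_rows(
    preview: Dict[str, Any], descriptors: Dict[str, Dict[str, Any]]
) -> List[Tuple[str, str, str]]:
    """Return (property, value, hint) rows for each disambiguating property."""
    rows = []
    for key, value in preview.items():
        if key in RESERVED:
            continue
        # Properties without a descriptor are still shown, just without a hint.
        hint = descriptors.get(key, {}).get("description", "")
        rows.append((key, str(value), hint))
    return rows
```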

@wetneb (Member) commented Dec 4, 2024

I noticed that the proposed contents of such structured previews is quite close to the output of the Suggest Entities endpoint:
https://reconciliation-api.github.io/specs/1.0-draft/#suggest-responses

I wonder if your use case could be satisfied by adding any missing fields to that response (for instance, it could be nice to have images in those auto-complete widgets). You could then call this endpoint with the full entity id as prefix. We could even specify it in the specs that if a valid entity identifier is provided as prefix, the service is expected to return it in the list (which is what a lot of services already do anyway).

To me, it would have the advantage of consolidating an existing endpoint, meaning less implementation effort for service authors.

@Abbe98 (Member Author) commented Dec 4, 2024

@wetneb I like that idea a lot; the suggest service has, in my opinion, a similar intent to that of a preview (compared to, for example, data fetching). I also like the idea of just reusing the prefix parameter.

One issue that comes to mind is that an identifier might also be a legitimate search term, so it might be worth introducing a new parameter, even though you lose some backwards compatibility.

@thadguidry (Contributor) commented

@Abbe98 can you give an example of a "legitimate search"?

@Abbe98 (Member Author) commented Dec 10, 2024

@Abbe98 can you give an example of a "legitimate search"?

The case I imagine as most common would be a search for a number in a dataset using numeric identifiers, but as a real-world case, consider a Wikidata search for Q1. In the case of Wikidata, one could of course switch the identifiers to the HTTP URIs, but that is a luxury few reconciliation services have.

@fsteeg (Member) commented Dec 12, 2024

We could even specify it in the specs that if a valid entity identifier is provided as prefix, the service is expected to return it in the list

To use it for previews though, we would need to be able to pick a specific result, right? So maybe it would have to be not just in the list, but the first element? But would we then still need a way for clients to know if the first element is actually a preview for the given identifier (to decide if it wants to show the preview or not)?

@wetneb (Member) commented Dec 12, 2024

To use it for previews though, we would need to be able to pick a specific result, right?

Well, the identifier of the entity is included in the response, so I think it shouldn't be too hard for the client to filter the list of candidates, checking if their identifier is identical to the one they requested, no?
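The client-side filtering is indeed short. A minimal sketch, assuming the suggest response wraps its candidates in a `result` array as in the Suggest Entities responses linked above; `pick_preview` is a hypothetical helper name.

```python
from typing import Optional


def pick_preview(suggest_response: dict, requested_id: str) -> Optional[dict]:
    """Return the candidate whose id equals the requested id, if any."""
    for candidate in suggest_response.get("result", []):
        if candidate.get("id") == requested_id:
            return candidate
    # The entity may not exist, or may simply not be among these candidates.
    return None
```

Returning `None` does not tell the client whether the entity does not exist or was simply not among the returned candidates.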

@fsteeg (Member) commented Dec 12, 2024

Well, the identifier of the entity is included in the response, so I think it shouldn't be too hard for the client to filter the list of candidates, checking if their identifier is identical to the one they requested, no?

Yes, right. So the Wikidata search for Q1 example would still be doable that way: when using it for previews, the client would pick the exact Q1 match, for an actual suggest search we'd still get / use the full result list.

Here's the original example as a suggest response result (with added image):

{
  "id": "http://www.wikidata.org/entity/Q2",
  "name": "Earth",
  "description": "third planet from the Sun in the Solar System",
  "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/5/5b/The_Blue_Marble_%285052124705%29.jpg/330px-The_Blue_Marble_%285052124705%29.jpg",
  "notable": [
    {"name": "terrestrial planet", "id": "http://www.wikidata.org/entity/Q128207"},
    {"name": "inner planet of the Solar System", "id": "http://www.wikidata.org/entity/Q3504248"}
  ]
}

So indeed the only missing field would be image, right?

@Abbe98 (Member Author) commented Dec 12, 2024

To use it for previews though, we would need to be able to pick a specific result, right?

Well, the identifier of the entity is included in the response, so I think it shouldn't be too hard for the client to filter the list of candidates, checking if their identifier is identical to the one they requested, no?

I don't think we should push such complexity onto the client for things like previews, not to mention that the requested entity might not be on the first page of suggested entities.

I think it would be beneficial to know for sure if the value is an identifier, as it would allow service implementations to make quick identifier lookups rather than searches.

@wetneb (Member) commented Jan 14, 2025

I have opened two PRs with the aim to satisfy the use case expressed here without introducing a new endpoint:

@wetneb (Member) commented Jan 14, 2025

I think it would be beneficial to know for sure if the value is an identifier, as it would allow service implementations to make quick identifier lookups rather than searches.

I understand why you'd want that as a service implementer, but given that we already ask services to recognize identifiers when supplied in the prefix field (because it makes sense from an end user perspective), I think it would be redundant to have a different parameter to supply an identifier.
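On the service side, recognizing an identifier in the prefix field can be as simple as an exact lookup placed ahead of the normal search results. A sketch with a hypothetical in-memory index keyed by identifier, and a naive case-insensitive substring match standing in for the service's real search:

```python
from typing import Dict, List


def suggest(prefix: str, index: Dict[str, dict]) -> Dict[str, List[dict]]:
    """Answer a suggest request, promoting an exact identifier match."""
    results = []
    exact = index.get(prefix)
    if exact is not None:
        # The prefix is a known identifier: return that entity first.
        results.append(exact)
    # Naive stand-in for the service's real search: case-insensitive
    # substring match on the entity name.
    needle = prefix.casefold()
    for entity in index.values():
        if entity is not exact and needle in entity["name"].casefold():
            results.append(entity)
    return {"result": results}
```

With numeric identifiers, `suggest("1", ...)` returns entity "1" first but also every entity whose name contains "1"; from the request alone, the service cannot tell whether the caller wanted a single preview or a list.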

@Abbe98 (Member Author) commented Jan 14, 2025

I think it would be beneficial to know for sure if the value is an identifier, as it would allow service implementations to make quick identifier lookups rather than searches.

I understand why you'd want that as a service implementer, but given that we already ask services to recognize identifiers when supplied in the prefix field (because it makes sense from an end user perspective), I think it would be redundant to have a different parameter to supply an identifier.

For our use this is fine, as the vast majority of our identifiers are URIs; however, I imagine this will be rather painful to implement for services using less verbose identifiers such as numbers, as the service can't know the intent of the call (either a single preview or a suggest list).

@Abbe98 (Member Author) commented Jan 16, 2025

I think it would be beneficial to know for sure if the value is an identifier, as it would allow service implementations to make quick identifier lookups rather than searches.

I understand why you'd want that as a service implementer, but given that we already ask services to recognize identifiers when supplied in the prefix field (because it makes sense from an end user perspective), I think it would be redundant to have a different parameter to supply an identifier.

For our use this is fine, as the vast majority of our identifiers are URIs; however, I imagine this will be rather painful to implement for services using less verbose identifiers such as numbers, as the service can't know the intent of the call (either a single preview or a suggest list).

We discussed this today and wonder if a compromise would be to have both #189 and a new limit parameter. The latter has been requested many times on OpenRefine's side for existing suggestion interfaces, and a limit of 1 would help us detect the intent to use the response in a preview.
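A sketch of how a service might use such a `limit` parameter. Its name and semantics are not specified yet; this assumes it simply caps the number of results, and that `limit=1` together with an exact identifier match is read as a preview request that can skip the text search.

```python
from typing import Dict, List, Optional


def suggest_with_limit(
    prefix: str, index: Dict[str, dict], limit: Optional[int] = None
) -> Dict[str, List[dict]]:
    """Suggest entities, honoring a hypothetical `limit` parameter."""
    if limit == 1 and prefix in index:
        # Read as a preview request: a direct identifier lookup is enough,
        # so the (potentially expensive) text search is skipped entirely.
        return {"result": [index[prefix]]}
    results = [index[prefix]] if prefix in index else []
    # Naive stand-in for the real search: case-insensitive substring match.
    needle = prefix.casefold()
    for entity_id, entity in index.items():
        if entity_id != prefix and needle in entity["name"].casefold():
            results.append(entity)
    if limit is not None:
        results = results[: max(limit, 0)]
    return {"result": results}
```

The result for `limit=1` is the same as truncating the full list; the difference is only that the service can answer it with a quick lookup instead of a search.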

@wetneb (Member) commented Jan 16, 2025

I think introducing a limit parameter would make sense.


5 participants