Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add WebAPI for fetching torrent metadata #21015

Open
wants to merge 5 commits into
base: master
Choose a base branch
from

Conversation

Piccirello
Copy link
Member

@Piccirello Piccirello commented Jul 1, 2024

This PR implements two new APIs for fetching a torrent's metadata. The APIs accept a magnet URI, torrent hash, .torrent URL, or uploaded .torrent file, and return the torrent's associated metadata. This PR also modifies the /torrents/add API to support downloading a torrent whose metadata has been previously fetched. The ultimate goal is for the WebUI to provide an Add Torrent experience equivalent to that of the GUI, where content can be reprioritized/unchecked before the torrent is added.

/fetchMetadata API

HTTP request

To request metadata for a torrent, specify the torrent in the source query parameter of a GET request to /api/v2/torrents/fetchMetadata. Torrents are supported in the following formats:

  • magnet URI (e.g. magnet:?xt=urn:btih:a8eeefc8a0dc402b24686ddfd775a409fe4b00e0&dn=example)
  • hash (e.g. a8eeefc8a0dc402b24686ddfd775a409fe4b00e0)
  • .torrent URL (e.g. https://example.com/example.torrent)

HTTP response

Given the asynchronous nature of retrieving metadata, there are two successful HTTP status codes used.

When metadata is requested for a torrent that requires asynchronous background work (i.e. connecting to DHT/peers), the client will receive a 202. A 202 indicates that the request was successful, but additional background work must be completed before a meaningful response can be provided. The response will contain the info hashes, if available, or an empty object.

GET /api/v2/torrents/fetchMetadata?source=abc
HTTP/1.1 202 OK
{
  "hash": string,
  "infohash_v1": string,
  "infohash_v2": string
} | {}

When metadata is available for the torrent, either because the torrent exists in the transfer list or because the metadata has been retrieved from a prior request, the client will receive a 200.

GET /api/v2/torrents/fetchMetadata?source=abc
HTTP/1.1 200 OK
{
  "comment": string,
  "created_by": string,
  "creation_date": int,
  "files": [],
  "hash": string,
  "infohash_v1": string,
  "infohash_v2": string,
  "name": string,
  "piece_size": int,
  "pieces_num": int,
  "private": bool,
  "total_size": int,
  "trackers": [],
  "webseeds": []
}

Retrieved metadata will be cached in the current web session. Subsequent requests performed within the same web session will return the metadata immediately, while other web sessions will be required to reretrieve the torrent's metadata from peers. Once a torrent is added using the cached metadata, the metadata is removed from the cache.

/parseMetadata API

HTTP request

To request metadata for one to several .torrent file(s), you may upload the files to the /parseMetadata API. To do so, submit the file(s) as multipart MIME data. You may use any key for the uploaded value.

To test file upload using curl, specify the -F flag (e.g. curl https://127.0.0.1:8080/api/v2/torrents/parseMetadata --get -F file=@"/root/example.torrent").

HTTP response

This API always responds to successful requests (i.e. valid torrent file(s)) with a 200. The response will contain the full metadata for the uploaded torrent(s). The response object is keyed off of the uploaded file's name.

GET /api/v2/torrents/parseMetadata
HTTP/1.1 200 OK
{
  "example.torrent": {
    "comment": string,
    "created_by": string,
    "creation_date": int,
    "files": [],
    "hash": string,
    "infohash_v1": string,
    "infohash_v2": string,
    "name": string,
    "piece_size": int,
    "pieces_num": int,
    "private": bool,
    "total_size": int,
    "trackers": [],
    "webseeds": []
  }
}

/add API

The existing /add API now supports using the metadata cache that's populated by the new metadata APIs. When specifying a url and/or torrent file to download, the metadata cache is first checked for the torrent. If found, the metadata is used directly from the cache, rather than needing to re-retrieve it. Note that when specifying the name of a .torrent file uploaded via the parseMetadata API, you must prepend file: to the file name. For example, if you uploaded example.torrent to the parseMetadata API, you can add this torrent via the /add API by specifying a url of file:example.torrent.

When metadata is retrieved directly from the cache, you may also specify a new filePriorities parameter. This parameter allows for specifying the file priority of each file in the torrent. This parameter may only be specified when adding a single torrent.

Alternatives:

I explored having the metadata API leave the request open until the metadata was available. Once the metadata was fetched, it would be returned directly in the response of the original request. One downside of this approach is that metadata retrieval can take an arbitrary long amount of time. This could result in torrents whose metadata could never be retrieved via this API (e.g. due to the retrieval taking longer than the client's/reverse proxy's request timeout). This approach would also require some further modification of qBittorrent's web application layer to suppress the default behavior of returning a blank response.

Future work:

  • Modify the WebUI to make use of the new /fetchMetadata API. This will likely mean splitting the current Add/Upload Torrent dialog into two dialogs. The first dialog will support specifying the URL(s)/.torrent file(s) to submit, while the second dialog will display the torrent's metadata and allow for modification of file priorities.
  • Support downloading the retrieved metadata as a .torrent file (as supported in the GUI)

Closes #20966.

@glassez glassez self-assigned this Jul 1, 2024
@glassez glassez added the WebAPI WebAPI-related issues/changes label Jul 1, 2024
Copy link
Member

@glassez glassez left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only preliminary comments after brief review.

src/webui/api/apicontroller.h Outdated Show resolved Hide resolved
src/webui/api/apicontroller.h Outdated Show resolved Hide resolved
src/webui/api/apicontroller.h Outdated Show resolved Hide resolved
src/webui/api/apicontroller.h Outdated Show resolved Hide resolved
src/webui/api/serialize/serialize_torrent.h Outdated Show resolved Hide resolved
src/webui/api/torrentscontroller.cpp Outdated Show resolved Hide resolved
src/webui/api/torrentscontroller.cpp Outdated Show resolved Hide resolved
src/webui/api/torrentscontroller.cpp Outdated Show resolved Hide resolved
src/webui/api/torrentscontroller.h Outdated Show resolved Hide resolved
@glassez
Copy link
Member

glassez commented Jul 1, 2024

@Piccirello
I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists.
I believe that if we had used one, we would have got a more universal solution. Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent. As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

@Piccirello
Copy link
Member Author

Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent.

I'm not convinced that's the approach we'll eventually take. I can imagine sending the .torrent file once, returning the metadata to the client, and then allowing the torrent to be added without needing to re-send the file (likely by transmitting the info hash).

As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

To me the session seems like an appropriate place to store this. I don't think the client should be responsible for parsing this data. It would also mean each client (official and unofficial) would need to implement it.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

I agree with this. I'll try to sketch out what the next steps would look like and how this API would be used.

@Piccirello Piccirello force-pushed the metadata-api branch 2 times, most recently from cec83e2 to f857170 Compare July 1, 2024 18:38
@NikcN22
Copy link

NikcN22 commented Jul 1, 2024

@Piccirello I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists. I believe that if we had used one, we would have got a more universal solution. Since users often add local (for the WebUI) .torrent files, we would not have to send it to the server first for parsing, and then again to add the torrent. As for magnet links, it would also look easier if you sent raw metadata to the client and forget about it. Otherwise, how do you propose to add such torrents (after WebUI user selects file priorities etc.)? Of course, you can store all this metadata in a session... But this seems to be a more cumbersome implementation.

In any case, this part of the API should be thought out, implying the subsequent addition of torrents. Otherwise, we may end up with something little useful in practice.

I think there is no need to embed the Bencode decoder directly into the client API. This is quite easy to do directly in the “graphical” part. More interesting is the need to provide the ability to assign priority to files in the add method.

@Piccirello Piccirello force-pushed the metadata-api branch 2 times, most recently from 1490741 to f736e62 Compare July 1, 2024 20:22
@Chocobo1
Copy link
Member

Chocobo1 commented Jul 2, 2024

I wonder if it's difficult to implement client side bencode parser in JS? It is very possible that one (or more) already exists.

FYI, there certainly exists bencode encoder/decoder library in JS however I'm not aware that they are compatible with bittorrent v2. In the past, I had to mod one to suit my need.

@glassez
Copy link
Member

glassez commented Jul 3, 2024

FYI, there certainly exists bencode encoder/decoder library in JS however I'm not aware that they are compatible with bittorrent v2.

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too. Or do you refer to torrent file specific parsers?

@Chocobo1
Copy link
Member

Chocobo1 commented Jul 3, 2024

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too.

They had deficiencies in their implementations. Not fully conform with the spec.

@glassez
Copy link
Member

glassez commented Jul 3, 2024

"bencode" format is independent from BitTorrent so generic "bencode" decoder should not depend on it too.

They had deficiencies in their implementations. Not fully conform with the spec.

It seems to be the same problem as with bencode editors. I couldn't find BitTorrent independent editor for Linux.

@Piccirello
Copy link
Member Author

I ended up exploring how retrieved metadata would tie into the /add API, resulting in some changes to the /metadata API. Namely, the /metadata API now supports processing multiple sources at once. I've also made the necessary changes to the /add API to support downloading a torrent whose metadata has been previously retrieved via /metadata. This allows for adding the torrent with custom file priorities, which will enable a future PR to modify the WebUI's Add Torrent experience to mimic that of the GUI. PR description has been modified with the full changes.

src/webui/api/torrentscontroller.cpp Fixed Show resolved Hide resolved
@Piccirello Piccirello marked this pull request as ready for review July 9, 2024 21:42
@Piccirello Piccirello requested a review from a team July 9, 2024 21:42
@glassez
Copy link
Member

glassez commented Jul 12, 2024

sources can be delineated by a (non-url encoded) comma

I suppose you're talking about percent-encoding, right?
I don't believe it is valid requirement. IIRC, in URL some of characters MUST be percent-encoded and other CAN be percent-encoded.

@glassez
Copy link
Member

glassez commented Jul 12, 2024

When metadata is requested for multiple torrents, a 202 will be returned if any of the torrents requires asycnhronous background work. This means that a 202 may include data for some of the request torrents.

It would look much simpler and more convenient if "metadata" endpoint accepted only single source.

@Piccirello
Copy link
Member Author

sources can be delineated by a (non-url encoded) comma

I suppose you're talking about percent-encoding, right? I don't believe it is valid requirement. IIRC, in URL some of characters MUST be percent-encoded and other CAN be percent-encoded.

, is a reserved character and so it must be percent encoded.

It would look much simpler and more convenient if "metadata" endpoint accepted only single source.

This was my initial approach but I changed it because of how it will integrate with the Add Torrent dialogs in the WebUI. With proper documentation and status codes, I think this API will be easy for consumers/clients to understand.

@glassez
Copy link
Member

glassez commented Jul 12, 2024

, is a reserved character and so it must be percent encoded.

Then why do you suggest using it non-encoded?

I changed it because of how it will integrate with the Add Torrent dialogs in the WebUI.

Do "Add Torrent dialogs in the WebUI" really require API endpoint to allow accepting multiple sources at once?

With proper documentation and status codes, I think this API will be easy for consumers/clients to understand.

If "metadata" accepts only single source then 202 status would definitely mean that the response does not contain metadata, and 200 - on the contrary, that the metadata is available in the response.
If "metadata" accepts multiple sources then 202 status is ambiguous and you will still have to check the response body to understand which metadata is immediately available and which is not. In addition, it is possible that one of the sources is invalid. What should you do in this case? In single source approach you could just return error status...

@Piccirello
Copy link
Member Author

My question is - is this JSON structure a standard or just some arbitrary decision? If it's arbitrary, why is one structure better than the other? AFAIK there is no standard for representing Bencode encoding in JSON.

@glassez
Copy link
Member

glassez commented Nov 13, 2024

representing Bencode encoding in JSON

It makes no sense to compare the data structure (e.g. tree) and the way it is encoded (e.g. Bencode, JSON, XML).

@NikcN22
Copy link

NikcN22 commented Nov 13, 2024

My question is - is this JSON structure a standard or just some arbitrary decision? If it's arbitrary, why is one structure better than the other? AFAIK there is no standard for representing Bencode encoding in JSON.

Yes, this is an arbitrary decision. But this decision is based on the analysis of existing torrent files. I assume that qBittorrent already uses this model because otherwise it uses custom decoding, which is generally wrong.
But something else is more important. Uniformity, inheritance and simplicity. The skeleton of the data model should repeat the skeleton of the torrent file. This will simplify the introduction of possible changes in the future. Let's just assume that for some reason the structure of the torrent file changes. It will be easier to make the necessary changes to the model.

@glassez
Copy link
Member

glassez commented Nov 13, 2024

The skeleton of the data model should repeat the skeleton of the torrent file.

Well, actually we don't want to fully reflect the structure of the .torrent file, as it is too low-level for our purposes. But it seems to me a good idea to separate the "info" section content from another fields, since this is a key aspect of the torrent metadata structure.

(Just my opinion. I don't insist. This PR has more important problems.)

@glassez
Copy link
Member

glassez commented Nov 13, 2024

@NikcN22
FYI, "private" is part of "info" section.

@NikcN22
Copy link

NikcN22 commented Nov 13, 2024 via email

@glassez
Copy link
Member

glassez commented Nov 13, 2024

@NikcN22 FYI, "private" is part of "info" section.

I just mean, that it is part of "info" unlike your example where it is outside of it.

@Piccirello
Copy link
Member Author

The API response I'm now proposing is the following. It's based on the Torrent File spec, which Bencode Online seems to be based off of (cc @Chocobo1).

Some notable deviations from the Torrent File spec, which I'm happy to revisit:

  • announce-list renamed to trackers (and slight format deviation)
  • url-list renamed to webseeds
{
  "comment": string,
  "created_by": string,
  "creation_date": int,
  "hash": string,
  "info": {
    "files": [
      {
        "index": int,
        "path": string,
        "length": int
      }
    ],
    "length": int,
    "name": string,
    "piece_length": int,
    "pieces_num": int,
    "private": bool
  },
  "infohash_v1": string,
  "infohash_v2": string,
  "trackers": [
    {
      "url": string,
      "tier": int
    }
  ],
  "webseeds": [string]
}

@Chocobo1
Copy link
Member

Chocobo1 commented Nov 19, 2024

It's based on the Torrent File spec, which Bencode Online seems to be based off of (cc @Chocobo1).

Just to clarify. That app doesn't care about "Torrent File" spec. You get what you feed. The app only tries its best to convert bencoded data to/from JSON. And for the normal user, the most common bencoded data is .torrent file.
Note that bencode supports dictionaries so it can have structure similar to JSON.

@glassez
Copy link
Member

glassez commented Nov 19, 2024

"index": int,

IMO, it is redundant. I would just use correctly ordered "files" list in order to have "natural" indexes.

@Piccirello
Copy link
Member Author

IMO, it is redundant. I would just use correctly ordered "files" list in order to have "natural" indexes.

I will switch to this approach.

@Piccirello Piccirello force-pushed the metadata-api branch 2 times, most recently from fcb94a1 to 773b344 Compare December 14, 2024 20:17
@Piccirello
Copy link
Member Author

I've updated the API response to the following format:

{
  "comment": string,
  "created_by": string,
  "creation_date": int,
  "hash": string,
  "info": {
    "files": [
      {
        "path": string,
        "length": int
      }
    ],
    "length": int,
    "name": string,
    "piece_length": int,
    "pieces_num": int,
    "private": bool
  },
  "infohash_v1": string,
  "infohash_v2": string,
  "trackers": [
    {
      "url": string,
      "tier": int
    }
  ],
  "webseeds": [string]
}

@Piccirello
Copy link
Member Author

@glassez this PR hasn't made progress in a while, and I would like to get it merged. What objections remain about its implementation? I'm hoping to focus primarily on the API's interface. As you noted earlier:

Nothing prevents we from fixing the shortcomings of its implementation later (if any). The main thing in this matter is its basic design (and interface).

Signed-off-by: Thomas Piccirello <thomas@piccirello.com>
Signed-off-by: Thomas Piccirello <thomas@piccirello.com>
Signed-off-by: Thomas Piccirello <thomas@piccirello.com>
Signed-off-by: Thomas Piccirello <thomas@piccirello.com>
@Piccirello
Copy link
Member Author

@glassez I have some other PRs that bring massive performance improvements to the WebUI, but they're currently blocked on this PR. I would love to get this PR moving along.

@glassez
Copy link
Member

glassez commented Jan 7, 2025

I have some other PRs that bring massive performance improvements to the WebUI, but they're currently blocked on this PR.

It is doubtful that "performance improvements to the WebUI" could be blocked on this PR.
In any case, I'll give you another portion of my review when it comes time. I am very limited in my time that I can provide for the project. (For some reason, many people believe that I have to devote myself solely to reviewing someone else's PRs). Otherwise, you could have called in some other member with merge rights.

@Piccirello
Copy link
Member Author

AFAIK there are really only two active devs with merge rights. I personally don't care who approves this, I just want it approved after six months of review.

@glassez
Copy link
Member

glassez commented Jan 8, 2025

@glassez this PR hasn't made progress in a while, and I would like to get it merged. What objections remain about its implementation?

The objections are still the same. These are your "cache" classes. I can't move past them unless I close my eyes and pretend they don't exist at all. They look too obviously confusing to me. Here are some examples:

metadataCache.add(infoHash, torrentDescr); // can actually replace existing item which is unexpected for method named as `add`.
metadataCache.get(infoHash); // can return `nullopt` which is obviously confusing since I just inserted it.
metadataCache.contains(infoHash); // can return `false` which is obviously confusing since I just inserted it.
metadataCache.update(infoHash, torrentInfo);
// What is the status of this operation? Has the item been updated?
// We can't even check in advance if the item we're going to update exists
// because of the confusing behavior of `contains()`.

Comment on lines +18 to +19
torrentmetadatacache.h
torrentsourcecache.h
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would move them to "api" subfolder.

#include "base/bittorrent/infohash.h"
#include "base/bittorrent/torrentdescriptor.h"

class TorrentMetadataCache : public QObject
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see any good reason for this class (and its satellite) to inherit from QObject.

public:
using QObject::QObject;

std::optional<BitTorrent::TorrentDescriptor> get(const BitTorrent::InfoHash &infoHash);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is unlikely that it can change the state of the class. So it should be constant:

Suggested change
std::optional<BitTorrent::TorrentDescriptor> get(const BitTorrent::InfoHash &infoHash);
std::optional<BitTorrent::TorrentDescriptor> get(const BitTorrent::InfoHash &infoHash) const;

public:
using QObject::QObject;

std::optional<BitTorrent::InfoHash> get(const QString &source);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::optional<BitTorrent::InfoHash> get(const QString &source);
std::optional<BitTorrent::InfoHash> get(const QString &source) const;

else if (const auto cachedInfoHash = m_torrentSourceCache.get(source))
infoHash = cachedInfoHash.value();

if (infoHash != BitTorrent::InfoHash {})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if (infoHash != BitTorrent::InfoHash {})
if (infoHash.isValid())

Everywhere.

@glassez
Copy link
Member

glassez commented Jan 8, 2025

@glassez this PR hasn't made progress in a while, and I would like to get it merged. What objections remain about its implementation?

The objections are still the same. These are your "cache" classes. I can't move past them unless I close my eyes and pretend they don't exist at all. They look too obviously confusing to me. Here are some examples:

metadataCache.add(infoHash, torrentDescr); // can actually replace existing item which is unexpected for method named as `add`.
metadataCache.get(infoHash); // can return `nullopt` which is obviously confusing since I just inserted it.
metadataCache.contains(infoHash); // can return `false` which is obviously confusing since I just inserted it.
metadataCache.update(infoHash, torrentInfo);
// What is the status of this operation? Has the item been updated?
// We can't even check in advance if the item we're going to update exists
// because of the confusing behavior of `contains()`.

I'll still review the rest of the code to see how these flaws affect it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
WebAPI WebAPI-related issues/changes
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Load torrent metadata by magnet, hash... in qBittorrent-nox
4 participants