Fix search engine API response handling #22075

biskweet · 2024-12-28T02:23:00Z

Added manual escaping of HTML entities before automatic replacement.
Now, "Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)" becomes "Ubuntu 22.04.5 LTS (\"Jammy Jellyfish\")" instead of "Ubuntu 22.04.5 LTS ("Jammy Jellyfish")".
Closes #22074.

glassez · 2024-12-28T05:22:13Z

"Fix issue #22074" is an incorrect commit/PR title according to the qBittorrent contribution guidelines.

glassez · 2024-12-28T05:54:56Z

IMO, the problem is that htmlentitydecode() is automatically applied in retrieve_url() regardless of the retrieved data format. Either retrieve_url() should check the data format before applying it, or applying it should be left to the plugins themselves.
@Chocobo1, what do you think?
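
To make the failure mode concrete, here is a minimal sketch. It uses Python's standard `html.unescape` as a stand-in for the `htmlentitydecode()` helper (an assumption; the helper's exact behavior may differ), and the JSON body is a made-up sample modeled on the title quoted in the PR description.

```python
import json
from html import unescape  # stand-in for htmlentitydecode(); an assumption


def survives_entity_decoding(raw_json: str) -> bool:
    """Return True if the text is still valid JSON after entity decoding."""
    decoded = unescape(raw_json)
    try:
        json.loads(decoded)
    except json.JSONDecodeError:
        return False
    return True


# Hypothetical API response: valid JSON whose string value contains &quot;
sample = '{"name": "Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)"}'

# Decoding turns &quot; into a bare ", which ends the JSON string early.
print(survives_entity_decoding(sample))  # prints False
```

Entities that do not decode to JSON syntax characters, such as `&amp;`, pass through this check without breaking the document.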

Chocobo1 · 2024-12-28T07:56:03Z

> Either retrieve_url() should check the data format before applying it, or applying it should be left to the plugins themselves.

I agree with the latter option.
Ideally dataStr = htmlentitydecode(dataStr) should be removed but I'm not sure whether it will break existing plugins. If it will, then the function could add a parameter to control whether htmlentitydecode(dataStr) is invoked and it defaults to True. The plugin can choose to turn it off.
Or we can just remove htmlentitydecode(dataStr) and require plugins to update/follow.
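
A rough sketch of the opt-out parameter idea, for illustration only. The function name `decode_response`, the parameter name `unescape_html_entities`, and the use of `html.unescape` in place of `htmlentitydecode()` are all assumptions, not the actual helpers.py code.

```python
from html import unescape  # stand-in for htmlentitydecode(); an assumption


def decode_response(raw: bytes, charset: str = "utf-8",
                    unescape_html_entities: bool = True) -> str:
    """Hypothetical final step of a retrieve_url()-style helper."""
    text = raw.decode(charset, errors="replace")
    if unescape_html_entities:
        # The default preserves the current behavior for existing plugins.
        text = unescape(text)
    return text


body = b'{"name": "a &quot;b&quot;"}'
print(decode_response(body))                                # entities decoded
print(decode_response(body, unescape_html_entities=False))  # body left as-is
```

A plugin that expects JSON would pass `unescape_html_entities=False` and handle entities itself after parsing.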

biskweet · 2024-12-28T13:02:21Z

> If it will, then the function could add a parameter to control whether htmlentitydecode(dataStr) is invoked and it defaults to True. The plugin can choose to turn it off.

If we are willing to change the function signature, then maybe it would be simpler to add a boolean argument that defaults to False and determines whether to manually escape &quot; before invoking htmlentitydecode. Something like:

def retrieve_url(url: str, custom_headers: Mapping[str, Any] = {}, request_data: Optional[Any] = None, should_escape_quotes=False) -> str:
    # ...

    if should_escape_quotes:
        dataStr = dataStr.replace('&quot;', '\\"')
    dataStr = htmlentitydecode(dataStr)
    return dataStr

Thus allowing plugins to control this behavior. I don't think applying htmlentitydecode is a problem per se since it's generally needed. It's just that in this particular case, it breaks JSON correctness.
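
The escape-then-decode step can be tried in isolation. This is a sketch under assumptions: `html.unescape` stands in for `htmlentitydecode`, the function name is hypothetical, and the sample body is invented.

```python
import json
from html import unescape  # stand-in for htmlentitydecode; an assumption


def escape_quotes_then_decode(data: str) -> str:
    """Turn &quot; into a JSON-escaped quote, then decode remaining entities."""
    data = data.replace("&quot;", '\\"')
    return unescape(data)


sample = ('{"name": "Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)", '
          '"tag": "Tom &amp; Jerry"}')
parsed = json.loads(escape_quotes_then_decode(sample))
print(parsed["name"])  # prints Ubuntu 22.04.5 LTS ("Jammy Jellyfish")
print(parsed["tag"])   # prints Tom & Jerry
```

Note that this only covers the named entity: a numeric reference such as `&#34;` also decodes to a double quote and would not be caught by the replacement.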

glassez · 2024-12-28T14:09:17Z

> It's just that in this particular case, it breaks JSON correctness.

For me, the need to decode HTML entities is a special case (although it may be more common than others), and in the general case it can retrieve data in arbitrary format (not only HTML), which should keep HTML entities as-is.
Anyway, any assumptions about the retrieved data are a matter for the specific plugins. Even if 90% of plugins would call htmlentitydecode() after retrieve_url(), that doesn't seem bad to me.

> the function could add a parameter to control whether htmlentitydecode(dataStr) is invoked and it defaults to True. The plugin can choose to turn it off.
> Or we can just remove htmlentitydecode(dataStr) and require plugins to update/follow.

I would choose the first option for v5.0.x and v5.1.x, and the second one for v5.2.x and above.

biskweet · 2024-12-28T14:41:30Z

> For me, the need to decode HTML entities is a special case (although it may be more common than others), and in the general case it can retrieve data in arbitrary format (not only HTML), which should keep HTML entities as-is.

I agree it should be a toggleable option; however, I don't think it should be all-or-nothing. Almost any request to apibay.org yields JSON results containing HTML entities (at least &amp;; try a query for Moana 2, warning: NSFW results). I believe it is important to correctly escape quotes while still parsing entities.

Another, maybe better option would be to parse these entities later in the process, when using the JSON data in the UI. That would make more sense since parsing entities is just a matter of making data human-readable.
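
Decoding after parsing could look roughly like the sketch below. The function name and the `name`/`seeders` fields are assumptions for illustration, and `html.unescape` stands in for whatever decoder the consumer would use.

```python
import json
from html import unescape
from typing import Any


def parse_results(raw_json: str) -> list[dict[str, Any]]:
    """Parse the untouched JSON first, then decode entities per string field."""
    results = json.loads(raw_json)
    for item in results:
        for key, value in item.items():
            if isinstance(value, str):
                item[key] = unescape(value)
    return results


raw = '[{"name": "Ubuntu 22.04.5 LTS (&quot;Jammy Jellyfish&quot;)", "seeders": "12"}]'
print(parse_results(raw)[0]["name"])  # prints Ubuntu 22.04.5 LTS ("Jammy Jellyfish")
```

Because the entities are decoded only inside already-parsed string values, a decoded quote can no longer interfere with the JSON syntax.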

Chocobo1 · 2024-12-29T06:44:25Z

> I would choose the first option for v5.0.x and v5.1.x, and the second one for v5.2.x and above.

Just a side note: let's limit the change to >= 5.1.x. Backporting doesn't seem like a good idea.

glassez · 2024-12-29T07:40:57Z

> Backporting doesn't seem like a good idea.

Why, considering that existing plugins are not supposed to be affected?
However, I don't mind (as long as the v5.1 release doesn't take too long).

Chocobo1 · 2024-12-29T07:58:20Z

> Why, considering that existing plugins are not supposed to be affected?

The fastest way to fix #22074 is for the plugin to fetch the web data by itself (duplicate/copy the code) and not rely on qbt helpers. Either backporting to v5.0 or releasing v5.1 will still take a lot of time to reach users.
Also I would like to avoid breaking anything (in v5.0) if something goes wrong.
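
For illustration, a plugin-side fetch that bypasses the shared helper might look like the sketch below. It is not the change that was eventually made to the plugin: the function name, User-Agent string, and timeout are assumptions, and only standard-library calls are used.

```python
import json
import urllib.request
from typing import Any


def fetch_json(url: str, timeout: float = 10.0) -> Any:
    """Fetch `url` and parse the body as JSON, leaving HTML entities untouched."""
    request = urllib.request.Request(
        url, headers={"User-Agent": "example-plugin/0.1"}  # hypothetical value
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        text = response.read().decode(charset, errors="replace")
    # No entity decoding here: the JSON is parsed exactly as the server sent it.
    return json.loads(text)
```

The plugin can then decode entities in individual fields after parsing, where a decoded quote cannot affect the JSON structure.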

Chocobo1 · 2025-01-04T16:08:10Z

> The fastest way to fix #22074 is for the plugin to fetch the web data by itself (duplicate/copy the code) and not rely on qbt helpers.

@biskweet
It seems that we (main contributors) agreed on this course. Would you mind making the changes to the plugin directly instead? The affected plugin is located in another repo: https://github.com/qbittorrent/search-plugins/blob/master/nova3/engines/piratebay.py
As for helpers.py, I'll take care of it.

biskweet · 2025-01-07T14:13:05Z

Created a new pull request on the qbittorrent/search-plugins repo.

Fix issue #22074

06cb542

glassez requested a review from Chocobo1 December 28, 2024 05:20

glassez added the "Search engine" label (Issues related to the search engine/search plugins functionality) Dec 28, 2024

biskweet changed the title ~~Fix issue #22074~~ Fix seach engine API response handling Dec 28, 2024

xavier2k6 changed the title ~~Fix seach engine API response handling~~ Fix search engine API response handling Dec 29, 2024

biskweet closed this Jan 7, 2025

biskweet deleted the Fix branch January 7, 2025 13:06

biskweet mentioned this pull request Jan 7, 2025

Fix apibay.org search engine API response handling qbittorrent/search-plugins#331

Open
