src: implement whatwg's URLPattern spec #56452

Open: anonrig wants to merge 1 commit into main

anonrig
Member

@anonrig anonrig commented Jan 3, 2025

Work in progress. Opening to receive some early response. Not ready to land.

Co-authored-by: Daniel Lemire (@lemire)

Blocked

This is blocked from landing due to the old macOS machines we use in our infrastructure (cc @nodejs/build)

Notes:

  • Ada now requires C++20
  • URLPattern is now a global class.
  • URLPattern is also exposed in node:url module
  • Ada now enables exceptions just like V8. This is done because std::regex, the default regex library of C++, does not offer a non-throwing API surface the way std::filesystem does. The alternative, which would avoid enabling exceptions, is to bundle Ada with a regex library or to implement its own regex parser, which is too much work for URLPattern at this stage. Future Ada releases can support such changes to disable exceptions.
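
To illustrate the std::regex point in the notes above, here is a minimal sketch of why exceptions have to be enabled: the std::regex constructor reports an invalid pattern only by throwing std::regex_error, so a non-throwing surface has to be built on top with try/catch. The helper names `try_compile` and `safe_match` are hypothetical and are not part of Ada or Node.js.

```cpp
#include <optional>
#include <regex>
#include <string>

// Hypothetical helper (not Ada's API): compile a pattern without letting
// std::regex_error escape to the caller.
std::optional<std::regex> try_compile(const std::string& pattern) {
  try {
    return std::regex(pattern, std::regex::ECMAScript);
  } catch (const std::regex_error&) {
    // The constructor has no error-code overload, so catching the
    // exception is the only way to observe an invalid pattern.
    return std::nullopt;
  }
}

// Hypothetical helper: full-match `input` against `pattern`. For brevity,
// an invalid pattern is reported as "no match".
bool safe_match(const std::string& pattern, const std::string& input) {
  std::optional<std::regex> re = try_compile(pattern);
  if (!re.has_value()) return false;
  try {
    return std::regex_match(input, *re);
  } catch (const std::regex_error&) {
    // Matching itself may also throw (for example on complexity limits).
    return false;
  }
}
```

This only compiles when the translation unit is built with exceptions enabled, which is the constraint the note describes.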

TODOs

  • Pass all web-platform tests
  • Release Ada v3 before landing this PR
  • Make sure to split all changes into multiple commits
  • Add @lemire as co-author to all commits
  • Land upstream pull-request implement URLPattern ada-url/ada#785
  • Add documentation for global and node:url module declarations.

cc @nodejs/cpp-reviewers

Fixes #40844

@anonrig anonrig requested review from jasnell and RafaelGSS January 3, 2025 16:07
@nodejs-github-bot
Collaborator

Review requested:

  • @nodejs/gyp
  • @nodejs/security-wg
  • @nodejs/startup
  • @nodejs/url
  • @nodejs/web-standards

@nodejs-github-bot nodejs-github-bot added lib / src Issues and PRs related to general changes in the lib or src directory. needs-ci PRs that need a full CI run. labels Jan 3, 2025
@targos targos added the semver-major PRs that contain breaking changes and should be released in the next major version. label Jan 3, 2025
@anonrig anonrig added macos Issues and PRs related to the macOS platform / OSX. blocked PRs that are blocked by other issues or PRs. build-agenda labels Jan 3, 2025
@targos
Member

targos commented Jan 3, 2025

> Ada now enables exceptions just like UV and V8

Can you elaborate? libuv is a C library so I don't think exceptions exist there, and I'm pretty sure V8 is built with exceptions disabled.

@anonrig
Member Author

anonrig commented Jan 3, 2025

> Ada now enables exceptions just like UV and V8
>
> Can you elaborate? libuv is a C library so I don't think exceptions exist there, and I'm pretty sure V8 is built with exceptions disabled.

My bad, UV does not enable exceptions. Referencing the v8.gyp file:

{
  'target_name': 'torque_base',
  'type': 'static_library',
  'toolsets': ['host', 'target'],
  'sources': [
    '<!@pymod_do_main(GN-scraper "<(V8_ROOT)/BUILD.gn"  "\\"torque_base.*?sources = ")',
  ],
  'dependencies': [
    'v8_shared_internal_headers',
    'v8_libbase',
  ],
  'defines!': [
    '_HAS_EXCEPTIONS=0',
    'BUILDING_V8_SHARED=1',
  ],
  'cflags_cc!': ['-fno-exceptions'],
  'cflags_cc': ['-fexceptions'],
  'xcode_settings': {
    'GCC_ENABLE_CPP_EXCEPTIONS': 'YES',  # -fexceptions
  },
  'msvs_settings': {
    'VCCLCompilerTool': {
      'RuntimeTypeInfo': 'true',
      'ExceptionHandling': 1,
    },
  },
}

@targos
Member

targos commented Jan 3, 2025

This is not really V8. It's a build-time executable (torque) used to generate code for V8

@anonrig anonrig requested a review from Qard January 3, 2025 16:27
src/node_url_pattern.cc
ada::url_pattern_options options{};
Local<Value> ignore_case;
if (obj->Get(env->context(),
FIXED_ONE_BYTE_STRING(env->isolate(), "ignoreCase"))
Member

consider adding to env-properties.h


MaybeLocal<Value> URLPattern::Hash() const {
auto context = env()->context();
return ToV8Value(context, url_pattern_.get_hash());
Member

I think the key challenge here is that this will copy the string on every call. Any chance of memoizing the string once created?
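
As a rough illustration of the memoization suggested here, the sketch below caches the result of a conversion on first access and reuses it afterwards. It is plain C++ with a hypothetical `CachedComponent` class; in the PR the cached value would presumably be the converted V8 value rather than a std::string, and that part is not shown.

```cpp
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-in for a getter whose result is expensive to build.
class CachedComponent {
 public:
  explicit CachedComponent(std::string source) : source_(std::move(source)) {}

  // Returns the converted value, running the conversion only on first use.
  const std::string& Get() {
    if (!cache_.has_value()) {
      cache_ = Convert(source_);
      ++conversions_;
    }
    return *cache_;
  }

  // Number of times the conversion actually ran (for demonstration only).
  int conversions() const { return conversions_; }

 private:
  // Placeholder for the costly step (an assumption: a plain string copy).
  static std::string Convert(const std::string& s) { return std::string(s); }

  std::string source_;
  std::optional<std::string> cache_;
  int conversions_ = 0;
};
```

Repeated calls to `Get()` return a reference to the same cached string, so the copy happens once per object instead of once per call.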

URLPattern::URLPattern(Environment* env,
Local<Object> object,
ada::url_pattern&& url_pattern)
: BaseObject(env, object) {
Member

We likely should introduce this as experimental in the first release, even if it graduates from experimental quickly. There should likely be a warning emitted on the first construction.

@@ -1571,6 +1572,7 @@ module.exports = {
toPathIfFileURL,
installObjectURLMethods,
URL,
URLPattern,
Member

Needs docs added.

Comment on lines +398 to +462
if (!url_pattern->Protocol().ToLocal(&result)) {
return;
}
info.GetReturnValue().Set(result);
Member

Suggested change
- if (!url_pattern->Protocol().ToLocal(&result)) {
-   return;
- }
- info.GetReturnValue().Set(result);
+ if (url_pattern->Protocol().ToLocal(&result)) {
+   info.GetReturnValue().Set(result);
+ }

a bit more compact to invert the checks on these.

src/node_url_pattern.cc
void URLPattern::New(const FunctionCallbackInfo<Value>& args) {
Environment* env = Environment::GetCurrent(args);

CHECK(args.IsConstructCall());
Member

since this constructor is exposed directly to users, this should throw an exception rather than abort

@jasnell
Member

jasnell commented Jan 3, 2025

Can you also include a fairly simple benchmark?

codecov bot commented Jan 3, 2025

Codecov Report

Attention: Patch coverage is 86.45598% with 60 lines in your changes missing coverage. Please review.

Project coverage is 88.67%. Comparing base (7b472fd) to head (261359f).
Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/node_url_pattern.cc | 86.57% | 21 Missing and 37 partials ⚠️ |
| src/node_url_pattern.h | 0.00% | 2 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main   #56452      +/-   ##
==========================================
- Coverage   89.16%   88.67%   -0.50%     
==========================================
  Files         661      664       +3     
  Lines      191421   192025     +604     
  Branches    36845    36633     -212     
==========================================
- Hits       170673   170270     -403     
- Misses      13615    14539     +924     
- Partials     7133     7216      +83     
| Files with missing lines | Coverage Δ |
|---|---|
| ...internal/bootstrap/web/exposed-window-or-worker.js | 93.89% <100.00%> (+0.19%) ⬆️ |
| lib/internal/url.js | 95.79% <100.00%> (-1.89%) ⬇️ |
| lib/url.js | 98.94% <100.00%> (-1.06%) ⬇️ |
| src/node_binding.cc | 83.66% <ø> (ø) |
| src/node_external_reference.h | 100.00% <ø> (ø) |
| src/node_url_pattern.h | 0.00% <0.00%> (ø) |
| src/node_url_pattern.cc | 86.57% <86.57%> (ø) |

... and 99 files with indirect coverage changes

BufferValue input_buffer(env->isolate(), args[0]);
CHECK_NOT_NULL(*input_buffer);
input = input_buffer.ToString();
} else if (args[0]->IsObject()) {

It’s not part of the spec, but can we add an “is URL instance” fast path here, to avoid re-normalizing each URL property (as happens with the JS object route)?

Member Author

Yes, we can. I haven't started working on optimizations yet due to the inconsistencies between the spec and WPT.

Member

@mcollina mcollina left a comment

I don’t think this is a good pattern to land in Node.js. Specifically, a server using this will create one URLPattern per route and test them one by one in a loop. This will be slow, especially when the matching route is the last in the list.

(This feedback was provided when URLPattern was standardized and essentially ignored).

For this to be useful, we would need a Node.js-specific API that organizes these URLPattern instances in a radix (prefix) trie and does the matching all at once.

I can possibly be persuaded that we need this for Web platform compatibility, but it’s not that popular either (unlike fetch()).
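
The trie idea in this review can be sketched as follows. This is a simplified, hypothetical segment trie (one node per path segment, with `:name` segments matching any single segment), not a radix trie with compressed edges and not an existing Node.js or Ada API. It covers only a small subset of pathname patterns and ignores regex groups, wildcards and the other URL components; the point is that lookup cost follows the depth of the path rather than the number of registered routes.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Split "/users/42/posts" into {"users", "42", "posts"}.
std::vector<std::string> split_path(const std::string& path) {
  std::vector<std::string> out;
  std::stringstream ss(path);
  std::string segment;
  while (std::getline(ss, segment, '/')) {
    if (!segment.empty()) out.push_back(segment);
  }
  return out;
}

// Hypothetical router: static segments are tried before ":param" segments.
class RouteTrie {
 public:
  void add(const std::string& pattern, int route_id) {
    Node* node = &root_;
    for (const std::string& seg : split_path(pattern)) {
      if (seg[0] == ':') {
        if (!node->param) node->param = std::make_unique<Node>();
        node = node->param.get();
      } else {
        std::unique_ptr<Node>& child = node->children[seg];
        if (!child) child = std::make_unique<Node>();
        node = child.get();
      }
    }
    node->route_id = route_id;
  }

  // Returns the id of the matching route, if any.
  std::optional<int> match(const std::string& path) const {
    return match_from(&root_, split_path(path), 0);
  }

 private:
  struct Node {
    std::map<std::string, std::unique_ptr<Node>> children;  // static segments
    std::unique_ptr<Node> param;                             // ":name" segment
    std::optional<int> route_id;                             // set on route end
  };

  static std::optional<int> match_from(const Node* node,
                                       const std::vector<std::string>& segs,
                                       std::size_t i) {
    if (i == segs.size()) return node->route_id;
    auto it = node->children.find(segs[i]);
    if (it != node->children.end()) {
      std::optional<int> hit = match_from(it->second.get(), segs, i + 1);
      if (hit.has_value()) return hit;
    }
    // Fall back to the parameter branch when the static branch fails.
    if (node->param) return match_from(node->param.get(), segs, i + 1);
    return std::nullopt;
  }

  Node root_;
};
```

With routes such as `/users/:id` and `/users/me` registered, a lookup walks at most one branch per segment instead of testing every pattern in turn.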

@mcollina
Member

mcollina commented Jan 3, 2025

@jasnell I’ll try to build this and get a benchmark going against the ecosystem routers.

@anonrig
Member Author

anonrig commented Jan 3, 2025

> @jasnell I’ll try to build this and get a benchmark going against the ecosystem routers.

Right now, this pull request does not pass WPT and is not optimized at all. Benchmarks would not be meaningful yet.

@marco-ippolito
Member

I'm +1 on adding it to Node.js. I like having more web platform compatibility, and I feel like it's an API that could be useful for libraries.
I will not comment on the implementation details since I'm not an expert on the subject.

Member

@ronag ronag left a comment

We have landed plenty of inefficient Web APIs. I don't see why this would be different. I'm of the consistent opinion that we should implement them but strongly discourage their use in any performance-sensitive code.

@mon-jai

mon-jai commented Jan 4, 2025

Just out of curiosity, why can't we port Chromium's implementation but instead need to reimplement this ourselves? (I am not a maintainer)

@jasnell
Member

jasnell commented Jan 4, 2025

> Just out of curiosity, why can't we port Chromium's implementation but instead need to reimplement this ...

The chromium implementation depends on quite a few chromium/blink internals that we don't have here. That kind of reuse is difficult to just make happen. This ada-based implementation is standalone, has no other dependencies, etc. It's actually easier overall just to write it from scratch than to repurpose that other impl.

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 2 times, most recently from 65631d9 to 4e224f9 Compare January 4, 2025 17:08
@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 3 times, most recently from 022eefd to af23313 Compare January 5, 2025 16:39
@domenic
Contributor

domenic commented Jan 6, 2025

One worry worth highlighting here is that, IIUC, the architecture of this PR will use a separate regexp library for the regexps that show up in the URLPattern constructor, vs. the ones that show up in the RegExp constructor.

That could be confusing for developers, as I assume the feature overlap will not be the same. For example, I wonder what results you would get after this PR for:

new URLPattern("https://example.com/:id([\\p{Decimal_Number}--[1-9]])").test("https://example.com/0");

new RegExp("[\\p{Decimal_Number}--[1-9]]", "v").test("0");

In Chromium we have architected our URLPattern implementation into two parts: one, liburlpattern, which is meant to have very few dependencies, and largely works to produce regexp pattern strings. The second part, in blink/renderer/core/url_pattern, implements more parts of the spec, including bridging to V8's regexp engine.

I suspect a similar architecture would not fit that well for you, given the different boundaries between Ada / Node.js. But maybe Ada could have some sort of regexp-matcher-callback system which would allow the use of V8's regexps instead of std::regex.
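
One way to picture the regexp-matcher-callback idea is the sketch below, in which the pattern component never calls a regex engine directly and instead delegates to an injected callable. All names here (`RegexMatcher`, `std_regex_matcher`, `PatternComponent`) are hypothetical and do not describe Ada's actual API. An embedder such as Node.js could pass a matcher that calls into V8's regexp engine; that part is not shown because it depends on the V8 embedding API.

```cpp
#include <functional>
#include <regex>
#include <string>
#include <utility>

// Hypothetical callback signature: (regexp source, input) -> full match?
using RegexMatcher =
    std::function<bool(const std::string&, const std::string&)>;

// Default matcher backed by std::regex. Invalid patterns count as no match.
bool std_regex_matcher(const std::string& source, const std::string& input) {
  try {
    return std::regex_match(input, std::regex(source));
  } catch (const std::regex_error&) {
    return false;
  }
}

// Hypothetical component that defers all regexp work to the matcher it
// was constructed with.
class PatternComponent {
 public:
  explicit PatternComponent(std::string regex_source,
                            RegexMatcher matcher = std_regex_matcher)
      : regex_source_(std::move(regex_source)),
        matcher_(std::move(matcher)) {}

  bool test(const std::string& input) const {
    return matcher_(regex_source_, input);
  }

 private:
  std::string regex_source_;
  RegexMatcher matcher_;
};
```

With this shape, the feature set of the regular expressions is decided by whichever engine the embedder plugs in, which is the consistency concern raised above.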

@jasnell
Member

jasnell commented Jan 6, 2025

> I suspect a similar architecture would not fit that well for you, given the different boundaries between Ada / Node.js. But maybe Ada could have some sort of regexp-matcher-callback system which would allow the use of V8's regexps instead of std::regex.

This is a good idea. I think that if we introduce this as experimental initially we can do this in a separate step though, before it graduates to supported.

@anonrig
Member Author

anonrig commented Jan 6, 2025

> I suspect a similar architecture would not fit that well for you, given the different boundaries between Ada / Node.js. But maybe Ada could have some sort of regexp-matcher-callback system which would allow the use of V8's regexps instead of std::regex.

You're right. I think we should also wait for the web-platform tests to be fixed or discussed before releasing a URLPattern implementation that is spec-compliant but not Chromium-compliant.

@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch 13 times, most recently from 2dcb81a to ad8610c Compare January 8, 2025 16:47
@anonrig anonrig force-pushed the yagiz/implement-url-pattern branch from ad8610c to 261359f Compare January 8, 2025 16:58
@mcollina mcollina dismissed their stale review January 8, 2025 16:59

Dismissing my block

Labels
blocked PRs that are blocked by other issues or PRs. build-agenda lib / src Issues and PRs related to general changes in the lib or src directory. macos Issues and PRs related to the macOS platform / OSX. needs-ci PRs that need a full CI run. semver-major PRs that contain breaking changes and should be released in the next major version. tsc-agenda Issues and PRs to discuss during the meetings of the TSC.