No --scopeType and --scopeType set to prefix gives different results when --include is set #445

Open
benoit74 opened this issue Dec 17, 2024 · 2 comments

@benoit74
Collaborator

When an --include rule is passed to zimit and --scopeType is not set, only pages matching the include rule are fetched. When --scopeType is set to prefix, all URLs matching either the include rule or the prefix of --url are fetched.

This contradicts the fact that by default --scopeType is prefix.

To be investigated.
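
To make the reported difference concrete, here is a minimal sketch of the two observed behaviors as plain predicates. It only models what is described above and is not how Browsertrix crawler implements scoping. The function names, the example URLs, and the definition of "prefix" (the seed URL up to and including its last slash) are assumptions made for illustration.

```python
import re


def prefix_of(seed_url: str) -> str:
    # Assumption: the "prefix" is the seed URL up to and including its last "/".
    return seed_url[: seed_url.rfind("/") + 1]


def fetched_without_scope_type(url: str, include: str) -> bool:
    # Observed when --scopeType is not set: only the include rule decides.
    return re.search(include, url) is not None


def fetched_with_prefix_scope(url: str, seed_url: str, include: str) -> bool:
    # Observed with --scopeType prefix: the include rule OR the seed prefix decides.
    in_prefix = url.startswith(prefix_of(seed_url))
    return in_prefix or re.search(include, url) is not None


# Hypothetical seed, include rule, and candidate URL
seed = "https://example.com/docs/index.html"
include = r"/blog/"
page = "https://example.com/docs/page2.html"

print(fetched_without_scope_type(page, include))       # False
print(fetched_with_prefix_scope(page, seed, include))  # True
```

If prefix really were the effective default, both calls would be expected to agree for every URL, which is the contradiction reported here.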

@DivyeshVora79909

Maybe just default to prefix explicitly:

parser.add_argument(
    "--scopeType",
    help="A predefined scope of the crawl...",
    choices=["page", "page-spa", "prefix", "host", "domain", "any", "custom"],
    default="prefix",
)

@benoit74
Collaborator Author

The default value should already be prefix in Browsertrix crawler (where this argument is used). So the question is rather whether the Browsertrix crawler default has changed, whether this is intentional or a bug, and whether we should update our documentation or pass our own default (which we prefer to avoid, so that we mostly stay in line with Browsertrix behavior).
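
The trade-off between passing our own default and deferring to the crawler can be sketched as follows. This is a hypothetical wrapper, not zimit's actual argument handling: build_crawler_args and the way arguments are forwarded are assumptions. It leaves --scopeType unset by default and only forwards it when the user provides it, so the crawler's own default applies otherwise.

```python
import argparse


def build_crawler_args(argv: list[str]) -> list[str]:
    # Hypothetical helper: parse wrapper arguments and build the list
    # of arguments to forward to the crawler.
    parser = argparse.ArgumentParser()
    parser.add_argument("--url", required=True)
    parser.add_argument("--include")
    parser.add_argument(
        "--scopeType",
        choices=["page", "page-spa", "prefix", "host", "domain", "any", "custom"],
        default=None,  # no wrapper-side default: defer to the crawler
    )
    args = parser.parse_args(argv)

    crawler_args = ["--url", args.url]
    if args.include is not None:
        crawler_args += ["--include", args.include]
    if args.scopeType is not None:
        # Forward only when explicitly set by the user.
        crawler_args += ["--scopeType", args.scopeType]
    return crawler_args


print(build_crawler_args(["--url", "https://example.com/", "--include", "/blog/"]))
```

With default="prefix" instead, the wrapper would always forward --scopeType prefix, which would make both cases in this issue behave the same but would pin the behavior on the wrapper side rather than following the crawler.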
