-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Mozilla PSL helper #19
Conversation
WalkthroughThe changes introduce a new implementation for managing the Mozilla Public Suffix List (PSL) using a Trie data structure in the Changes
Poem
Warning Rate limit exceeded@jschlyter has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 6 minutes and 58 seconds before requesting another review. ⌛ How to resolve this issue?After the wait time has elapsed, a review can be triggered using the We recommend that you space out your commits to avoid hitting the rate limit. 🚦 How do rate limits work?CodeRabbit enforces hourly rate limits for each developer per organization. Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout. Please see our FAQ for further information. 📜 Recent review detailsConfiguration used: CodeRabbit UI ⛔ Files ignored due to path filters (1)
📒 Files selected for processing (2)
🚧 Files skipped from review as they are similar to previous changes (2)
Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media? 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
CodeRabbit Configuration File (
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 4
🧹 Outside diff range and nitpick comments (5)
tests/test_dns_mozpsl.py (2)
12-19
: Enhance test coverage with additional edge casesWhile the current test cases cover various domain patterns, consider adding tests for:
- International Domain Names (IDN) with punycode
- Edge cases (empty string, None, malformed domains)
- Maximum length domain names
Example additional test cases:
# IDN test assert psl.coredomain("www.münchen.de.") == ("münchen.de", "") # Edge cases with pytest.raises(ValueError): psl.coredomain("") with pytest.raises(TypeError): psl.coredomain(None) with pytest.raises(ValueError): psl.coredomain("invalid..domain.") # Max length domain (255 chars) long_domain = "a" * 63 + "." + "b" * 63 + "." + "c" * 63 + "." + "d" * 61 + "." assert psl.coredomain(long_domain) is not None
8-8
: Add documentation to improve test maintainabilityConsider adding:
- Docstring explaining the test's purpose and methodology
- Type hints for better code maintainability
- Comments explaining complex test cases
Example improvement:
def test_mozpsl() -> None: """Test Mozilla Public Suffix List functionality. Tests the following scenarios: 1. Basic domain extraction (www.ck) 2. Government domains (gov.ck) 3. Complex subdomains (microsoft.com) 4. AWS service domains 5. Error handling for invalid domains """pyproject.toml (1)
25-25
: Consider more actively maintained alternativesFor handling internationalized domain names (IDNs), consider these alternatives:
idna
package - More actively maintained and widely used- Python's built-in
encodings.idna
- Part of the standard libraryThe choice depends on your specific needs:
- If you need pure punycode encoding/decoding, the current dependency is fine
- If you need full IDNA 2008 support, consider switching to
idna
- If you only need basic IDNA support, the standard library might suffice
dnstapir/dns/mozpsl.py (2)
9-9
: Correct the docstring format by removing the extra quotation markThe docstring has an extra double quote at the beginning, which may lead to formatting issues.
Apply this diff to fix the docstring:
-class TrieNode: - """ "Storage class for Trie""" +class TrieNode: + """Storage class for Trie"""
70-79
: Include a timeout in the HTTP request to prevent hangingAdding a timeout to the
httpx.get
request ensures that the application doesn't hang indefinitely if the request takes too long to respond.Apply this diff to add a timeout parameter:
def load_psl_url(self, url: str) -> None: """Load PSL from URL""" response = httpx.get( url, headers={ "Accept-Encoding": "gzip", }, + timeout=10.0, # Timeout in seconds ) response.raise_for_status() self.load_psl(io.StringIO(response.text))
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
⛔ Files ignored due to path filters (1)
poetry.lock
is excluded by!**/*.lock
📒 Files selected for processing (3)
dnstapir/dns/mozpsl.py
(1 hunks)pyproject.toml
(1 hunks)tests/test_dns_mozpsl.py
(1 hunks)
🔇 Additional comments (3)
dnstapir/dns/mozpsl.py (3)
108-108
: Clarify the adjustment of labels
for exclusions
The line labels -= 2
decreases the label count by 2 for exclusions. This adjustment could be unintuitive to future readers.
Consider adding a comment to explain why labels
is decremented by 2 in the case of exclusions. This aids in maintainability and understanding of the code.
if line[0] == "!":
line = line[1:]
labels -= 2 # Exclusions have two fewer labels in the core domain
2-2
:
Ensure typing.Self
is compatible with the project's Python version
The use of Self
from the typing
module is only available in Python 3.11 and above. If the project aims to support earlier versions of Python, consider replacing Self
with a type variable or a forward reference.
Please confirm the minimum Python version supported by the project. If compatibility with earlier versions (pre-3.11) is required, you can adjust the code as follows:
-import io
-from typing import Self
...
-class TrieNode:
- # ...
- self.children: dict[str, Self] = {}
+import io
+from typing import TypeVar
+T = TypeVar('T', bound='TrieNode')
...
class TrieNode:
# ...
- self.children: dict[str, Self] = {}
+ self.children: dict[str, T] = {}
Also applies to: 14-14
5-5
:
Verify the availability and correct usage of the punycode
library
The punycode
module is not a standard Python library and may not be available. Instead, consider using the built-in idna
encoding for handling Internationalized Domain Names (IDNs).
Please ensure that the punycode
library is installed and correctly implements the required functionality. You might prefer to use the built-in idna
codec as shown below:
-import punycode
+import idna
...
# Convert from Unicode
-lbls = punycode.convert(line).split(".")
+lbls = idna.encode(line).decode('ascii').split(".")
Run the following script to check for the availability of the punycode
package:
✅ Verification successful
The punycode
library is correctly specified as a project dependency
The codebase correctly includes punycode
version 0.2.1 as a direct dependency in pyproject.toml
. While using the built-in idna
codec is an alternative approach, the current implementation is valid and the dependency is properly managed through Poetry.
🏁 Scripts executed
The following scripts were executed for the analysis:
Script:
#!/bin/bash
# Description: Verify if 'punycode' is available in the environment.
# Check if 'punycode' is installed
pip show punycode
# Search for 'punycode' in PyPI
pip search punycode
Length of output: 183
Script:
#!/bin/bash
# Let's check if punycode is used in the codebase and how it's implemented
rg -l "punycode" --type py
# Check the content of the file that imports punycode
rg "punycode" -A 5 -B 5 --type py
# Look for any requirements or setup files that might list dependencies
fd "requirements.txt|setup.py|pyproject.toml"
# If found, check their content
fd "requirements.txt|setup.py|pyproject.toml" -x cat {}
Length of output: 3254
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Much better than the original. Good job @jschlyter and coderabbit.
Code from @morkrost
Summary by CodeRabbit
New Features
Bug Fixes
Tests
PublicSuffixList
class, validating core functionality and error handling for various domain inputs.