Commit

Merge pull request #13 from andyfcx/main
fix: make sure to default utf-8 as encoding and strip unwanted single or double quotes
D4Vinci authored Nov 21, 2024
2 parents e9b0102 + 63112bf commit 18eb96a
Showing 1 changed file with 5 additions and 2 deletions.
scrapling/engines/camo.py: 7 changes (5 additions, 2 deletions)
```diff
@@ -109,8 +109,11 @@ def fetch(self, url: str) -> Response:
             content_type = res.headers.get('content-type', '')
             # Parse charset from content-type
             encoding = 'utf-8' # default encoding
-            if 'charset=' in content_type.lower():
-                encoding = content_type.lower().split('charset=')[-1].split(';')[0].strip()
+            content_type_lower = content_type.lower()
+            if 'charset=' in content_type_lower:
+                encoding = content_type_lower.split('charset=')[-1].split(';')[0].strip().strip('"').strip("'")
+                if 'utf-8' in encoding:
+                    encoding = 'utf-8'
 
             status_text = res.status_text
             # PlayWright API sometimes give empty status text for some reason!
```
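The sketch below isolates what the patch changes. It lifts the removed and added lines out of `fetch` into two standalone functions so you can compare their results on a few sample `Content-Type` values. The names `old_parse_charset` and `parse_charset` are hypothetical and do not exist in scrapling. Their bodies follow the diff line for line.

```python
# Hypothetical standalone versions of the charset parsing done in
# scrapling/engines/camo.py fetch(), before and after this commit.

def old_parse_charset(content_type: str) -> str:
    """Charset parsing before the patch (the removed lines)."""
    encoding = 'utf-8'  # default encoding
    if 'charset=' in content_type.lower():
        encoding = content_type.lower().split('charset=')[-1].split(';')[0].strip()
    return encoding


def parse_charset(content_type: str) -> str:
    """Charset parsing after the patch (the added lines)."""
    encoding = 'utf-8'  # default encoding when no charset parameter exists
    content_type_lower = content_type.lower()
    if 'charset=' in content_type_lower:
        # Take the text after the last 'charset=', cut it at the next ';',
        # then trim whitespace and any surrounding double or single quotes.
        encoding = (
            content_type_lower.split('charset=')[-1]
            .split(';')[0]
            .strip()
            .strip('"')
            .strip("'")
        )
        # Any value containing the substring 'utf-8' collapses to 'utf-8'.
        if 'utf-8' in encoding:
            encoding = 'utf-8'
    return encoding


if __name__ == '__main__':
    headers = [
        'text/html',
        'text/html; charset=ISO-8859-1',
        'text/html; charset="UTF-8"',
        "text/html; charset='utf-8'; boundary=x",
    ]
    for header in headers:
        print(f'{header!r}: old={old_parse_charset(header)!r} new={parse_charset(header)!r}')
```

Take a quoted header such as `text/html; charset="UTF-8"`. The old logic returns `"utf-8"` with the quotes still attached, while the new logic returns `utf-8`. Both versions lowercase the header before extracting the value, so every result is lower-case. Only values that contain the literal substring `utf-8` are normalized. Spellings such as `utf8` pass through unchanged.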
