ISSUE 393 : Decoding the content into utf-8 to avoid encoding issues #394

BrayanMnz · 2023-12-01T22:20:52Z

The issue is that we are wrongly re-encoding a bytes object that has already been converted to a string.
What we need to do instead is decode the bytes object directly.

This commit changes that by decoding the content directly from bytes as UTF-8.
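
As a rough illustration of the difference (a minimal sketch, not the actual pybaseball code; the sample name and variable names are assumptions made up for this example), calling `str()` on a bytes object yields its printable representation, so re-encoding that string keeps the escape sequences instead of the real characters:

```python
# Hypothetical response body containing a non-ASCII player name.
raw = "José Ramírez".encode("utf-8")

# str() on bytes returns the repr text (leading b' and literal \x escapes),
# so encoding it again does not recover the original characters.
as_repr = str(raw)
reencoded = as_repr.encode()

# Decoding the bytes directly restores the original text.
decoded = raw.decode("utf-8")

print(reencoded)
print(decoded)
```

Only the decoded value contains the accented characters; the re-encoded value still carries the `b'...'` wrapper and backslash escapes as plain text.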

See the results after these changes:

#393 (comment)


bdilday · 2023-12-01T23:09:23Z

I think this change was made on purpose to work around a different issue. See
#223
#218

You may want to double-check that this PR doesn't reintroduce the intermittent truncated-dataframe issue mentioned in #218.

BrayanMnz · 2023-12-02T15:24:55Z

Could you elaborate on how to replicate the old issue, @bdilday?

Just to validate that this does not introduce a regression.

BrayanMnz · 2023-12-05T00:37:57Z

Hi @bdilday,

I tried replicating the old issue, and no data is truncated with my fix.

I used the same example as in the previous issue:

pybaseball-2023-12-04_20.36.11.mp4

bdilday · 2023-12-08T14:30:28Z

> Could you elaborate on how to replicate the old issue? @bdilday
>
> Just to validate this not introduce a regression

It's tricky to replicate because it depends not only on the data but also on the (randomly generated) links in the page header. That is, if the page contained a link to the German-language or Spanish-language fbref pages AND a player in the table had a non-ASCII character in their name, then the table got truncated.

But the problem seems to be resolved now. Here's an example that should reproduce the original issue but works fine:
https://gist.github.com/bdilday/6e6bef77a5e42776a7e6da7bee12178d
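
The triggering condition described above can be turned into a small regression check. The following is only a sketch under stated assumptions: the page content, the `example.com` links, the `RowCounter` class, and the `parse_rows` helper are all hypothetical, and it uses the standard-library `html.parser` rather than whatever parser the library relies on. It does not reproduce the original bug; it only shows how one might confirm that no rows are lost when alternate-language header links and a non-ASCII name appear together.

```python
from html.parser import HTMLParser


class RowCounter(HTMLParser):
    """Counts <tr> elements and collects the text of <td> cells."""

    def __init__(self):
        super().__init__()
        self.rows = 0
        self.in_cell = False
        self.cells = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows += 1
        elif tag == "td":
            self.in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self.in_cell = False

    def handle_data(self, data):
        if self.in_cell and data.strip():
            self.cells.append(data.strip())


# Hypothetical page: alternate-language links in the header plus a
# table row containing a non-ASCII player name.
page = (
    "<html><head>"
    '<link rel="alternate" hreflang="de" href="https://example.com/de/">'
    '<link rel="alternate" hreflang="es" href="https://example.com/es/">'
    "</head><body><table>"
    "<tr><td>José Ramírez</td></tr>"
    "<tr><td>Mike Trout</td></tr>"
    "<tr><td>Aaron Judge</td></tr>"
    "</table></body></html>"
).encode("utf-8")


def parse_rows(raw: bytes) -> RowCounter:
    """Decode the bytes as UTF-8, then parse the resulting text."""
    parser = RowCounter()
    parser.feed(raw.decode("utf-8"))
    parser.close()
    return parser


result = parse_rows(page)
print(result.rows, result.cells)
```

A check like this passes only if every row survives parsing and the accented name comes through intact.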

BrayanMnz · 2023-12-08T15:23:13Z

So my PR should also solve the issue reported in ISSUE 393.
Could you review it again?

I also tested the snippet you shared with my changes, and it works as expected.

FYI @bdilday

bdilday · 2023-12-15T22:08:09Z

> So, my PR should also solve the issue reported on ISSUE 393. Could you review it, again?
>
> Also tested the snippet you've shared with my changes and is working as expected.
>
> FYI @bdilday

Yes, I think this PR looks good to go!

BrayanMnz · 2023-12-18T14:59:04Z

Hi @tjburch @schorrm,

Could someone merge this?

BrayanMnz added 2 commits December 1, 2023 18:16

Decoding the content into utf-8 to avoid encoding issues in batting_stats_bref function. The issue is because we are wrongly encoding a bytes object parsed to string what we need to do instead is, decode the bytes object directly.

ef3e52a

Decoding the content into utf-8 to avoid encoding issues in batting_stats_bref function. The issue is because we are wrongly encoding a bytes object parsed to string what we need to do instead is, decode the bytes object directly.

9eb173a

schorrm merged commit 1c05162 into jldbc:master Dec 18, 2023
4 checks passed
