Crash in read_replica due to decoding byte 0x9d #128

pyrite357 · 2020-09-28T19:01:29Z

Describe the bug
Fresh install of 2.0.16, started replica, and 10 hours later, randomly crashed decoding something in the source

To Reproduce
I'm not sure what table/row/col it had trouble decoding, is there a way for me to tell?

Expected behavior
This same datasource wasn't having this trouble in 2.0.12 of pg_chameleon, but it could be new data entered, i'm not sure

I also expected to see something in the error log, but nothing shows up when running chameleon show_errors

Screenshots
If applicable, add screenshots to help explain your problem.

(venv) [user@host ~]$ chameleon start_replica --config default --source rpmv2 --debug
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:08 MainProcess INFO global_lib.py (613): Checking if the replica for source rpmv2 is stopped 
2020-09-28 12:39:08 MainProcess INFO global_lib.py (623): Cleaning not processed batches for source rpmv2
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (3238): Cleaning table t_log_replica_rpmv2_1
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (3238): Cleaning table t_log_replica_rpmv2_2
2020-09-28 12:39:08 MainProcess INFO global_lib.py (554): Starting the replica daemons for source rpmv2 
2020-09-28 12:39:08 MainProcess DEBUG global_lib.py (563): Replica process for source rpmv2 is running
2020-09-28 12:39:08 read_replica DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:08 MainProcess DEBUG pg_lib.py (1276): Cleaning replayed batches for source rpmv2 older than 1 day
2020-09-28 12:39:08 read_replica DEBUG pg_lib.py (679): There is already a database connection active.
2020-09-28 12:39:08 read_replica DEBUG pg_lib.py (3110): Collecting schema mappings for source rpmv2
2020-09-28 12:39:08 replay_replica DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:08 read_replica DEBUG mysql_lib.py (1452): Batch data [(387, 'rpmv2-bin-logs.000001', 36208279, 't_log_replica_rpmv2_1', None)] 
2020-09-28 12:39:08 read_replica DEBUG mysql_lib.py (1000): collecting table type map
2020-09-28 12:39:08 read_replica DEBUG mysql_lib.py (1217): GTID DISABLED - log_file rpmv2-bin-logs.000001, log_position 36208279. id_batch: 387 
2020-09-28 12:39:08 read_replica DEBUG mysql_lib.py (1231): ROTATE EVENT - binlogfile rpmv2-bin-logs.000001, position 36208279. 
2020-09-28 12:39:09 MainProcess ERROR global_lib.py (567): Read process alive: False - Replay process alive: True
2020-09-28 12:39:09 MainProcess ERROR global_lib.py (568): Stack trace: Traceback (most recent call last):
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pg_chameleon/lib/global_lib.py", line 496, in read_replica
    self.mysql_source.read_replica()
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pg_chameleon/lib/mysql_lib.py", line 1453, in read_replica
    replica_data=self.__read_replica_stream(batch_data)
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pg_chameleon/lib/mysql_lib.py", line 1311, in __read_replica_stream
    for row in binlogevent.rows:
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pymysqlreplication/row_event.py", line 433, in rows
    self._fetch_rows()
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pymysqlreplication/row_event.py", line 428, in _fetch_rows
    self.__rows.append(self._fetch_one_row())
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pymysqlreplication/row_event.py", line 519, in _fetch_one_row
    row["after_values"] = self._read_column_data(self.columns_present_bitmap2)
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pymysqlreplication/row_event.py", line 136, in _read_column_data
    values[name] = self.__read_string(column.length_size, column)
  File "/opt/data/rpmfsadm/venv/lib64/python3.6/site-packages/pymysqlreplication/row_event.py", line 224, in __read_string
    string = string.decode(encoding)
  File "/usr/lib64/python3.6/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 60: character maps to <undefined>

2020-09-28 12:39:09 MainProcess ERROR global_lib.py (574): Read daemon crashed. Terminating the replay daemon.
2020-09-28 12:39:09 MainProcess DEBUG pg_lib.py (659): Changing the autocommit flag to True
2020-09-28 12:39:09 MainProcess INFO global_lib.py (599): Replica process for source rpmv2 ended

Environment(please complete the following information):

OS: CentOS 8.2
MySQL Version: MariaDB Server version: 10.1.19-MariaDB - mariadb.org binary distribution
PostgreSQL Version: 12.4
Python Version: 3.6.8
Cloud hosted database: local install

Additional context
Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

pyrite357 · 2020-09-28T19:57:02Z

More info:

(venv) [user@host ~]$ pip list
Package           Version
----------------- ----------
certifi           2019.11.28
chardet           3.0.4
daemonize         2.5.0
idna              2.9
mysql-replication 0.22
pg-chameleon      2.0.16
pip               20.2.3
psycopg2-binary   2.8.4
PyMySQL           0.10.1
PyYAML            5.3
requests          2.23.0
rollbar           0.14.7
setuptools        39.2.0
six               1.14.0
tabulate          0.8.6
urllib3           1.25.8
wheel             0.35.1

the4thdoctor · 2020-10-01T06:17:51Z

the offending character is 0x9d which should be the right double quotation mark ”
what's the database/table character set? it could be a setting ignored by the older pymysql and now enforced by the 0.10.

pyrite357 · 2020-10-05T23:59:17Z

I seem to have fixed my problem by re-initing the source, then enable, then start. But I'm waiting to see if it crashes again in 10 hours time or not. Please leave this issue open until I confirm :)

I kind of figured out how to view the binlog in phpMyAdmin, although MySQL comes with a whole utility (mysqlbinlog) to do this too, which I didn't try, but the SQL is:

SHOW BINLOG EVENTS FROM x

Where x corresponds to the position in the debug line below (x = 36208279)

2020-09-28 12:39:08 read_replica DEBUG mysql_lib.py (1452): Batch data [(387, 'rpmv2-bin-logs.000001', 36208279, 't_log_replica_rpmv2_1', None)]

It would be nice if the mysql_lib told me what table it was that had the offending data, but since it does not, this is how you can go find it. Leaving this info here for anyone who may have the need to investigate this in the future.

the4thdoctor · 2020-10-06T06:39:10Z

cool, thanks for the feedback. I'm still convinced it's an encoding issue added in the recent pymysql update.
I'll try to reproduce the problem and eventually come with a way to trap the error.

pyrite357 · 2020-11-06T15:34:01Z

I'm not sure if it's related, but my MySQL source has 2 databases it is replicating, and in one of them I found a table with a tinyint(1) column that had a value of 2, and my config has type_override on. I had seen some errors in the PostgreSQL logs regarding this, which I think was throwing errors when I did init_replica or start_replica. I have now changed that column to a normal integer. This might have just been a completely separate problem, I'm not sure. But it fixed that particular problem.

You had asked about the charset though. In the default.yml, it's set to utf8 for charset under db_conn. The source has 2 databases though (schema_mappings), and each one has its own collation. database1 has latin1_swedish_ci, but also has some tables with utf8_general_ci collations also.

database1 is latin1/latin1_swedish_ci for the database itself.
database2 has 25 tables, 24 of them are utf8_general_ci and 1 of them is latin1_swedish_ci. The charset of the database itself is utf8 though.

One possible area of trouble is only being able to define a MySQL source as one particular charset, but then having that source have multiple databases each with a different charset than the one defined in the default.yml config for the source db_conn. So if you are right about it being an encoding issue, could that be why perhaps?

the4thdoctor · 2020-11-07T10:07:14Z

Thanks for your feedback. The issue with the override to boolean is normal. boolean ca store only true/false and postgresql is very strict about that. The override was implemented to have the tinyint translated if the field is used as a boolean (if i correctly remember MySQL 5.6 doesn't have boolean type). If you use tinyint to store integers just remove the override.

For the encoding, as the error is caused by the driver PyMySQL during the replica process. I can check if there is something new implemented in PyMySQL that caused the issue with the newer library.
Thanks for now.

pyrite357 · 2021-01-06T03:12:24Z

Still having this issue.

Have you done any investigative work since Nov 7 @the4thdoctor ? Happy New Year btw!

pyrite357 · 2021-01-06T21:20:32Z

Good news, I can reproduce it now, it is a fancy quote causing the issue. The charset of the table in question is latin1_swedish_ci, however the majority of that source database are tables using utf8_general_ci. In pg_chameleon, the source charset is set to utf8. I'm going to work on seeing what I can do to fix the charsets in my source database.

I don't mind taking responsibility for my source database and the poor normalization it had, however it would be an improvement to pg_chameleon to fail gracefully when/if this happens (and not just crash), maybe even write something to the error log, and skip over a transaction, but keep running, that way way production isn't completely down/broken.

pyrite357 · 2022-05-03T23:59:33Z

@the4thdoctor Even though it was a fancy quote in plain view, it was in fact the ✔️ character in full. The source MySQL database had a mix of latini1_swedish_ci and utf8 collations, and some ascii_general_ci ones too. But the pg_chameleon default.yml file can only define one characterset for the source database. I fixed it by updating all the tables/columns in the source MySQL database to utf8_general_ci, and now that character no longer causes pg_chameleon to stop/break.

the4thdoctor self-assigned this Sep 29, 2020

the4thdoctor added the v2.0 label Sep 29, 2020

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Crash in read_replica due to decoding byte 0x9d #128

Crash in read_replica due to decoding byte 0x9d #128

pyrite357 commented Sep 28, 2020 •

edited

Loading

pyrite357 commented Sep 28, 2020

the4thdoctor commented Oct 1, 2020

pyrite357 commented Oct 5, 2020

the4thdoctor commented Oct 6, 2020

pyrite357 commented Nov 6, 2020

the4thdoctor commented Nov 7, 2020

pyrite357 commented Jan 6, 2021

pyrite357 commented Jan 6, 2021

pyrite357 commented May 3, 2022 •

edited

Loading

Crash in read_replica due to decoding byte 0x9d #128

Crash in read_replica due to decoding byte 0x9d #128

Comments

pyrite357 commented Sep 28, 2020 • edited Loading

pyrite357 commented Sep 28, 2020

the4thdoctor commented Oct 1, 2020

pyrite357 commented Oct 5, 2020

the4thdoctor commented Oct 6, 2020

pyrite357 commented Nov 6, 2020

the4thdoctor commented Nov 7, 2020

pyrite357 commented Jan 6, 2021

pyrite357 commented Jan 6, 2021

pyrite357 commented May 3, 2022 • edited Loading

pyrite357 commented Sep 28, 2020 •

edited

Loading

pyrite357 commented May 3, 2022 •

edited

Loading