Crash in read_replica due to decoding byte 0x9d #128
Comments
More info:
|
The offending byte is 0x9d, which should come from the right double quotation mark ” (its UTF-8 encoding, E2 80 9D, ends in 0x9d). |
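For anyone who wants to see the byte in isolation, here is a minimal Python check. It assumes the latin1 data ends up being decoded as Windows-1252 (cp1252) on the Python side; the traceback is not shown in this thread, so treat that as an assumption. `try_decode` is a helper made up for this illustration.

```python
# U+201D (right double quotation mark) is three bytes in UTF-8, ending in 0x9d.
utf8_bytes = "\u201d".encode("utf-8")
print(utf8_bytes)


def try_decode(data, encoding):
    """Return the decoded text, or None if the bytes are invalid in that encoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as exc:
        print(f"{encoding}: {exc}")
        return None


# cp1252 leaves 0x9d unassigned, so decoding fails (assumed crash path).
cp1252_result = try_decode(b"\x9d", "cp1252")

# ISO-8859-1 maps every byte, so the same byte decodes to a control character.
latin1_result = try_decode(b"\x9d", "latin-1")
```

So a lone 0x9d is harmless under strict ISO-8859-1 but fatal under cp1252, which would fit a crash that only shows up once a "fancy" character reaches a latin1 table.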
I seem to have fixed my problem by re-initializing the source, then enabling it, then starting the replica. But I'm waiting to see whether it crashes again in 10 hours' time. Please leave this issue open until I confirm :) I more or less figured out how to view the binlog in phpMyAdmin. MySQL also ships with a dedicated utility (mysqlbinlog) for this, which I didn't try, but the SQL is:
Where x corresponds to the position in the debug line below (x = 36208279)
It would be nice if the mysql_lib told me which table had the offending data, but since it does not, this is how you can go find it. Leaving this info here for anyone who needs to investigate this in the future. |
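The statement itself did not survive in the comment above. One statement MySQL provides for inspecting a binlog from SQL is `SHOW BINLOG EVENTS`, so the sketch below builds one. This is an assumption about the kind of query meant, not a reconstruction of what was actually run; the helper name and the binlog file name `mysql-bin.000001` are hypothetical.

```python
def binlog_events_query(log_name, position, limit=10):
    """Build a SHOW BINLOG EVENTS statement starting at a given position.

    log_name is the binlog file name (hypothetical here); position should be
    the start of an event, otherwise MySQL rejects the statement.
    """
    if position < 0 or limit <= 0:
        raise ValueError("position must be >= 0 and limit must be > 0")
    escaped = log_name.replace("'", "''")  # keep the quoted literal well-formed
    return f"SHOW BINLOG EVENTS IN '{escaped}' FROM {position} LIMIT {limit}"


# Position taken from the comment above; the file name is a placeholder.
query = binlog_events_query("mysql-bin.000001", 36208279, limit=5)
print(query)
```

In the output of such a statement, the Table_map events name the schema and table that the following row events belong to, which is one way to find the table involved.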
Cool, thanks for the feedback. I'm still convinced it's an encoding issue introduced by the recent PyMySQL update. |
I'm not sure if it's related, but my MySQL source has 2 databases that are being replicated, and in one of them I found a table with a tinyint(1) column that held a value of 2, while my config has type_override on. I had seen errors about this in the PostgreSQL logs, which I think were raised when I ran init_replica or start_replica. I have now changed that column to a normal integer. This may have been a completely separate problem, I'm not sure, but the change fixed that particular error. You had asked about the charset, though. In default.yml, charset under db_conn is set to utf8. The source has 2 databases (schema_mappings), and each one has its own collation. database1 is latin1/latin1_swedish_ci at the database level, but some of its tables use utf8_general_ci. One possible area of trouble is that a MySQL source can only be defined with one charset, while that source may contain multiple databases, each with a charset different from the one defined in db_conn in default.yml. So if you are right about it being an encoding issue, could that be the reason? |
Thanks for your feedback. The issue with the override to boolean is expected: a boolean can store only true/false, and PostgreSQL is very strict about that. The override was implemented so that a tinyint is translated when the field is used as a boolean (if I remember correctly, MySQL 5.6 doesn't have a boolean type). If you use tinyint to store integers, just remove the override. As for the encoding, the error is raised by the PyMySQL driver during the replica process, so I can check whether something new in PyMySQL causes the issue with the newer library. |
Still having this issue. Have you done any investigative work since Nov 7 @the4thdoctor ? Happy New Year btw! |
Good news: I can reproduce it now, and it is a fancy quote causing the issue. The collation of the table in question is latin1_swedish_ci, whereas the majority of tables in that source database use utf8_general_ci. In pg_chameleon, the source charset is set to utf8. I'm going to see what I can do to fix the charsets in my source database. I don't mind taking responsibility for my source database and its poor normalization; however, it would be an improvement for pg_chameleon to fail gracefully when this happens rather than crash: write something to the error log, skip the transaction, and keep running, so that production isn't completely down. |
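The "log it, skip it, keep running" behaviour requested above could look roughly like the sketch below. This is not pg_chameleon's code; `decode_rows` and the logger name are invented for illustration, and it assumes the failure surfaces as a `UnicodeDecodeError` on raw bytes.

```python
import logging

logger = logging.getLogger("replica_sketch")  # hypothetical logger name


def decode_rows(raw_rows, encoding="utf-8"):
    """Decode each raw value; log and skip the ones that cannot be decoded.

    Returns (decoded_values, skipped_indexes) so the caller can report
    exactly what was left out instead of stopping the whole process.
    """
    decoded, skipped = [], []
    for index, raw in enumerate(raw_rows):
        try:
            decoded.append(raw.decode(encoding))
        except UnicodeDecodeError as exc:
            logger.error("skipping row %d: %s", index, exc)
            skipped.append(index)
    return decoded, skipped


rows, skipped = decode_rows([b"ok", b"\x9d", b"fine"])
```

The trade-off is that a skipped row leaves the replica out of sync with the source, so the skipped indexes would need to be recorded somewhere an operator can act on them.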
@the4thdoctor Even though it looked like a fancy quote in plain view, it was in fact the full ✔️ character. The source MySQL database had a mix of latin1_swedish_ci and utf8 collations, plus some ascii_general_ci ones. But the pg_chameleon default.yml file can define only one character set for the source database. I fixed it by converting all the tables/columns in the source MySQL database to utf8_general_ci, and now that character no longer causes pg_chameleon to stop. |
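For anyone repeating this fix, a rough sketch of how the conversion statements could be generated is below. The exact statements used are not given in this thread, so this is an assumption: it relies on MySQL's `information_schema.tables` view and the `ALTER TABLE ... CONVERT TO CHARACTER SET` syntax, and the helper name and sample schema/table names are hypothetical.

```python
# Query to list each table's collation in the replicated schemas (run it in MySQL).
LIST_COLLATIONS_SQL = (
    "SELECT table_schema, table_name, table_collation "
    "FROM information_schema.tables "
    "WHERE table_schema IN ('database1', 'database2')"  # placeholder schema names
)


def convert_statements(tables, charset="utf8", collation="utf8_general_ci"):
    """Yield an ALTER TABLE statement for every table not already on the target collation.

    tables is an iterable of (schema, table, current_collation) tuples,
    for example the rows returned by LIST_COLLATIONS_SQL.
    """
    for schema, table, current in tables:
        if current != collation:
            yield (
                f"ALTER TABLE `{schema}`.`{table}` "
                f"CONVERT TO CHARACTER SET {charset} COLLATE {collation};"
            )


sample = [
    ("database1", "notes", "latin1_swedish_ci"),
    ("database1", "users", "utf8_general_ci"),
]
statements = list(convert_statements(sample))
for statement in statements:
    print(statement)
```

`CONVERT TO CHARACTER SET` rewrites the stored column data, so it is worth trying on a copy of the database first.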
Describe the bug
Fresh install of 2.0.16. I started the replica, and 10 hours later it crashed while decoding something in the source.
To Reproduce
I'm not sure what table/row/col it had trouble decoding, is there a way for me to tell?
Expected behavior
This same data source wasn't having this trouble in pg_chameleon 2.0.12, but the cause could be newly entered data; I'm not sure.
I also expected to see something in the error log, but nothing shows up when running chameleon show_errors.
Environment(please complete the following information):