Skip to content

Fix value fetch and convert issue in binlog insert path#236

Merged
shlomi-noach merged 10 commits intogithub:masterfrom
Wattpad:binlog_insert_column_lookup_issue
Sep 29, 2016
Merged

Fix value fetch and convert issue in binlog insert path#236
shlomi-noach merged 10 commits intogithub:masterfrom
Wattpad:binlog_insert_column_lookup_issue

Conversation

@pbitty
Copy link
Copy Markdown
Contributor

@pbitty pbitty commented Sep 14, 2016

This fixes #235. In the process I found a bug triggered when inserting to a renamed column via the binlog.

The source of the issue is that we were using mappedSharedColumns to drive the fetching and conversion of the insert values. For this we want to use sharedColumns since these columns are based on the original table, which is where the insert values are coming from.

We continue to use mappedSharedColumns to generate the column names in the replace statement.

I realize we didn't discuss the issue much in #235, but I was using lots of code examples, I figured it would be easier to continue in the PR if needed.

  • contributed code is using same conventions as original code
  • code is formatted via gofmt (please avoid goimports)
  • code is built via ./build.sh
  • code is tested via ./test.sh

This catches a bug in `sql.BuildDMLInsertQuery` where we are using the target
column's Charset to drive the value conversion.  In the case where we are
altering the charset, the Charset used for conversion will be different than
the original column's charset, resulting in an erroneous conversion.
This catches a bug in `sql.BuildDMLInsertQuery` where we we are fetching
the insert values using the renamed column's name, and end up fetching
the value of the wrong column.

The test in `localtests/rename` did not catch this because binlog
update events were "correcting" the error, as they follow a different
code path that does not contain the bug.
When processing binlog insert statements, we want to use `sharedColumns`
to decide which values to fetch and convert from the insert DML event.

We only want to  `mappedShareColumns` to define the column names in the
`replace into ...` statement.
@pbitty pbitty force-pushed the binlog_insert_column_lookup_issue branch from cd91b84 to 2f80c9d Compare September 14, 2016 15:28
@shlomi-noach
Copy link
Copy Markdown
Contributor

Thank you for working on this!
Before fully reviewing this, I think neither the original solution (decoding via sharedColumns) nor yours (decoding via mappedSharedColumns) are fully correct, because both are only tempering with a single value: the target (in my case) or the source (in your case).

I believe your solution will not solve migrating a utf8 into latin1, and neither of our solutions will solve migrating latin1 to latin2 (however it pains me to thing someone might actually want to do that).

@pbitty
Copy link
Copy Markdown
Contributor Author

pbitty commented Sep 19, 2016

I am still trying to learn the details behind mysql's charset conversion semantics, but it seems to me that mysql handles all the conversion of charsets when sending to/receiving from the client. So since our connection charset is utf8mb4 and we send all of our Golang strings are encoded as utf8, the server can properly decode the string and convert to the target encoding properly.

So the only place where we have to do explicit decoding is when parsing binlog events, since mysql won't be helping us there.

I still have to better confirm the above, however, I did the following test:

in localtests/alter-charset, I changed the alter statement to be:

--alter='MODIFY `t` varchar(128) CHARACTER SET latin2 NOT NULL, MODIFY `tutf8` varchar(128) CHARACTER SET latin1 NOT NULL'

therefore converting the columns t: latin1 --> latin2 and tutf8: utf8 --> latin1, and the tests passed. All strings remained intact in the end.

I am still digging deeper to get a better explanation for this behavior, but if my understanding is correct we only need to worry about decoding the source value.

@pbitty
Copy link
Copy Markdown
Contributor Author

pbitty commented Sep 19, 2016

I've added more target charsets to the test, so we have:

  • latin1 --> utf8mb4
  • latin1 --> latin2
  • utf8 --> latin1

@shlomi-noach
Copy link
Copy Markdown
Contributor

That's interesting! I'll check it out tomorrow.

BTW, let's try a recent GH feature, if you don't mind. Since you are PRing onto master (as is the convention here), can you please add me as collaborator on your branch? https://help.github.com/articles/allowing-changes-to-a-pull-request-branch-created-from-a-fork/

@pbitty
Copy link
Copy Markdown
Contributor Author

pbitty commented Sep 19, 2016

I don't mind at all. In fact, it was already enabled. :) Feel free to push to this branch.

@pbitty
Copy link
Copy Markdown
Contributor Author

pbitty commented Sep 19, 2016

No, there's no badge that I can see.

@pbitty
Copy link
Copy Markdown
Contributor Author

pbitty commented Sep 19, 2016

Cool. :) Happy to contribute. This is a great tool.

@shlomi-noach
Copy link
Copy Markdown
Contributor

There's a mystery plot involved here.

@shlomi-noach
Copy link
Copy Markdown
Contributor

Added tests for unsigned, a relevant value conversion problem:

  • unsigned-reorder
  • unsigned-rename

begin
insert into gh_ost_test values (null, 11, 23);
insert into gh_ost_test values (null, 13, 23);
insert into gh_ost_test values (null, rand(), rand());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

rand() is a value [0..1), and you assign it onto an integer. It will always be 0 and the test is not reliable. Fixing.

@shlomi-noach
Copy link
Copy Markdown
Contributor

As much as I wrap my head around it this just seems fantastic and correct. I've added a bunch of tests to validate this with, and all goes well! It also applies to unsigned values.

@shlomi-noach
Copy link
Copy Markdown
Contributor

Next thing is the added alter-charset-all-dml tests, which test update and delete by non-UTF column value. These fail at the moment (as predicted?)

@shlomi-noach
Copy link
Copy Markdown
Contributor

Added fix to BuildDMLUpdateQuery, all tests now pass.

I'm sending this for production testing.

@pbitty
Copy link
Copy Markdown
Contributor Author

pbitty commented Sep 27, 2016

Good catch. I had missed the issue in BuildDMLUpdateQuery actually.

@shlomi-noach
Copy link
Copy Markdown
Contributor

Production testing is good. Merging.

@pbitty pbitty deleted the binlog_insert_column_lookup_issue branch October 26, 2016 23:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Altering column charset fails on binlog apply

2 participants