Database Administrators Stack Exchange is a question and answer site for database professionals who wish to improve their database skills and learn from others in the community. It's 100% free, no registration required.

Trying to run pt-table-checksum to detect slave drift in MySQL replication. My slaves are not using the standard TCP listening port however, and so when I run the tool I get an error:

Cannot connect to P=3306,h=,p=...

It still checksums the master's tables, but without being able to connect to the slaves it's not very useful. I can't figure out how to specify alternate ports for the slaves when running the command. My current command looks like

2 Answers
2

The issue is that the replicas don't report their ports to the master. By default pt-table-checksum tries to connect by filling unknowns with defaults; if the unknowns aren't using defaults, this will fail. In this case, there are two options:

1) Use the --recursion-method option to change from inspecting SHOW PROCESSLIST to inspecting SHOW SLAVE HOSTS, and configure slaves to report their host. I don't recommend this.

2) I recommend, instead, that you use --recursion-method=dsn and create a table full of DSN (data source name) instructions that tell the tool how to connect to each replica. If this table is foo.bar, then you will use --recursion-method=dsn=D=foo,t=bar.

This is only available in the 2.0 series of the toolkit, but you should be using that anyway, because pt-table-checksum is hugely improved in 2.0.

Thanks, I am using 2.02 of the toolkit. So for the DSN table, it should be in a separate database on the master mysql server? Also, should all the slaves DSN information be in one column in that table (h=0.0.0.0,P=3388,p=password)?
–
wjimenez5271Feb 1 '12 at 15:55

1

See Barons #2 for the syntax on how to specify the location of the table. Where it "should" be is up to you. Where it needs to be for purposes of the tool is irrelevant assuming the credentials running the tool have privs to write there. And yes, you'll put in one row with a DSN as you described.
–
atxdbaFeb 1 '12 at 16:20

I don't remember the structure of the table exactly (need to check docs...) but I believe you need an id and a dsn column; id should be an int, dsn should be the h=,P=,p= syntax.
–
Baron SchwartzFeb 1 '12 at 16:48

Personally I don't like putting credentials in something that will show up in the processlist so you can do

pt-table-checksum --defaults-file=/path/to/my.cnf h=myhost,P=port

where the my.cnf looks like

[client]
user=usernameXYZ
password=asdf234JKL

If you want this to replicate through to your slaves you need to specify a --replicate option that tells it the table to place the checksums in (and consequently put the slave values into on each respective slave). If this is your first time you might want to have it create that table for you with --create-replicate-table.

It won't need to directly connect to the slaves unless you want it to monitor the slaves for falling behind. It basically issues queries like

where the literal master values are written to the binlog along w/ the select portion of the replace. When that executes on the slaves the select does it's think to get the slaves checksums.

I've spent a lot of time with this tool recently so start out with that and reply with more specific problems you need help with so I don't just rewrite the entire manual and every possible thing that can go wrong.

Note: if you run

export PTDEBUG=1

Before any percona script it will give you TONS of debug info that can help point you in the direction of what's going wrong. Also remember these are all perl script so you can inspect the source yourself to (maybe) see what's going wrong.

Thanks that's super helpful. Maybe I am not understanding something fundamental here, but if you don't use the --replicate option to have it replicate through your slaves, how would you be able to compare the check-sums between master and slave hosts? Wouldn't that defeat the purpose? Or can it still perform checksums on the slaves without using the --replicate option?
–
wjimenez5271Feb 1 '12 at 7:14

If you don't use replicate then it just prints to stdout, and yes I agree that's pretty pointless. Even with --replicate you risk certain chunks being mutated between checks so you have to be aware if your dataset will be subject to mutating on the fringes. Doing it w/o the replicate option is simple. Stop all writes to every DB involved. Then you'll be sure. But then why not just run checksum table x,y,z. The whole point of this tool is to run live checkseums
–
atxdbaFeb 1 '12 at 7:31

I think atxdba's comments here refer in part to the v1.0 version of pt-table-checksum. The tool is completely rewritten in the 2.0 series and it doesn't even have an option to operate in "non --replicate mode" anymore. Also, the standard MySQL command-line arguments for username, password, etc are all accepted, but as he points out, it's best to use a defaults file for passwords. I'll post another answer with what I think is the real solution.
–
Baron SchwartzFeb 1 '12 at 13:35

Thanks for the further detail on that. So what I am seeing happening is the tool is connecting to the master mysql, but it is not connecting to the slave. When I run it I see the following error: 'Cannot connect to P=3306,h=,p=...' Which makes me think that it is trying to connect to the slaves on the standard port and is not able. It can in fact connect to the master on the special port because it is listing the tables as it runs through the check-sums. I did notice the SHOW SLAVE HOSTS command on the master has the same faulty information about the port of the connected slave.
–
wjimenez5271Feb 1 '12 at 15:39

1

See Baron's answer for specifying DSN's that don't match the port of your master. (If they're non standard but match the port of your master it should 'inherit' that config option and apply it appropriately). Remember you'll have to make the same grants with appropriate privs on the slaves for it to be able to connect. But again, this connection is to monitor replication lag and hold off on issuing more checksums if one is falling behind; NOT to perform the actual checksum which is propagated through the binlogs.
–
atxdbaFeb 1 '12 at 15:54