
MariaDB 10.1, MSR and MTS

March 24th, 2016 Bianchi Posted in MySQL Maintenance, MySQL Replication

In preparation for my presentation with Max Bubenick at Percona Live 2016, happening in Santa Clara, CA, US, I'm running as many tests as I can to check the maturity of the technology and features we are about to talk about. It's common sense that you need to go over the feature you plan to present in order to address some of its implicit subjects. This way, we started discussing a crash on a MariaDB 10.1 setup for a multi-source replication slave, this slave being a multi-threaded slave as well, running with 12 threads dedicated to executing raw updates from the relay log, at least 3 of those 12 threads dedicated to each of the existing domain_ids. Each replication domain's current position (the global transaction ID of the last transaction applied) can be tracked by interpreting the contents of the mysql.gtid_slave_pos table; using the table allows the slave to maintain a consistent value for the gtid_slave_pos system variable across server restarts. That is, I have set up 3 masters and one multi-source slave; in this scenario I've got domains #2, #3 and #4, with the multi-source slave itself being domain #1. That justifies the 12 threads and at least 3 for each domain.
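A quick way to inspect those positions on the slave is to query the table directly (a sketch; these are the table's standard columns in MariaDB 10.x, and the positions themselves will vary):

MariaDB [(none)]> SELECT domain_id, sub_id, server_id, seq_no FROM mysql.gtid_slave_pos;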

Below, the designed architecture:

[Screenshot: the designed architecture, with one multi-source slave (box01) replicating from three masters (box02, box03, box04)]

box01 - @@server_id=1, @@gtid_domain_id=1
box02 - @@server_id=2, @@gtid_domain_id=2
box03 - @@server_id=3, @@gtid_domain_id=3
box04 - @@server_id=4, @@gtid_domain_id=4
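For reference, a minimal sketch of how each master could be configured; only server_id and gtid_domain_id change per box (the exact layout is an assumption, not a copy of the files used in the test):

#: box02 - master (box03 and box04 are analogous)
[mysqld]
server_id=2
gtid_domain_id=2
log_bin=mysql-bin
binlog_format=ROW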

After configuring multi-source replication and getting the configuration files properly set, I started some tests.

#: Connection name with box02
MariaDB [(none)]> change master 'box02' to master_host='192.168.0.102', master_user='repl', master_password='Bi@nchI', master_use_gtid=current_pos;
#: Connection name with box03
MariaDB [(none)]> change master 'box03' to master_host='192.168.0.103', master_user='repl', master_password='Bi@nchI', master_use_gtid=current_pos;
#: Connection name with box04
MariaDB [(none)]> change master 'box04' to master_host='192.168.0.104', master_user='repl', master_password='Bi@nchI', master_use_gtid=current_pos;
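After the three CHANGE MASTER statements, the connections can be started and verified in one shot; this step isn't shown above, but it is plain MariaDB multi-source syntax:

MariaDB [(none)]> start all slaves;
MariaDB [(none)]> show all slaves status\G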

Just to make sure we're on the same page: on the masters' side I created individual databases so that each master writes only to its own schema, avoiding conflicts from writing to the same table (handling writes to the same table is an existing successful case that deserves a blog post of its own). After that, I used sysbench to prepare the test case, creating 10 tables with 10,000 rows each in every database schema. A sketch of that prepare step, run on each master (box0X stands for each schema), looks like this:
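[vagrant@maria0X ~]$ sudo sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-table-size=10000 --oltp-tables-count=10 --mysql-db=box0X --mysql-user=root --db-driver=mysql prepare

With the tables in place, I ran sysbench with the following structure to execute a simple 60-second test using the OLTP script: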

[vagrant@maria0X ~]$ sudo sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-table-size=10000 --mysql-db=box0X --oltp-tables-count=10 --mysql-user=root --db-driver=mysql --mysql-table-engine=innodb --max-time=60 --max-requests=0 --report-interval=60 --num-threads=50 --mysql-engine-trx=yes run

I started the above sysbench run on all three masters, and then the multi-source slave crashed with the error below:

2016-03-23 19:54:57 140604957547264 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-03-23 19:54:57 140604957547264 [Warning] Slave: Running in read-only mode Error_code: 1836
2016-03-23 19:54:57 140604957547264 [Warning] Slave: Table 'sbtest2' is read only Error_code: 1036
2016-03-23 19:54:57 140604957547264 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-03-23 19:54:57 140604957547264 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-03-23 19:54:57 140604957547264 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
[...snip...]
2016-03-23 19:54:57 140604957244160 [ERROR] Slave (additional info): Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-03-23 19:54:57 140604957244160 [Warning] Slave: Running in read-only mode Error_code: 1836
2016-03-23 19:54:57 140604957244160 [Warning] Slave: Table 'sbtest1' is read only Error_code: 1036
2016-03-23 19:54:57 140604957244160 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-03-23 19:54:57 140604957244160 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
2016-03-23 19:54:57 140604957244160 [Warning] Slave: Commit failed due to failure of an earlier commit on which this one depends Error_code: 1964
[...snip...]
2016-03-23 19:59:14 140604959972096 [Note] /usr/sbin/mysqld: Normal shutdown

The problem here is clear: "Commit failed due to failure of an earlier commit on which this one depends".

Furthermore, when I tried to bring the multi-source slave back up, I found the following events in the error log:

2016-03-23 19:59:17 139987887904800 [Note] /usr/sbin/mysqld (mysqld 10.1.11-MariaDB-log) starting as process 3996 ...
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: The InnoDB memory heap is disabled
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Memory barrier is not used
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Compressed tables use zlib 1.2.3
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Using Linux native AIO
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Using generic crc32 instructions
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Completed initialization of buffer pool
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Highest supported file format is Barracuda.
InnoDB: Transaction 46834 was in the XA prepared state.
InnoDB: Transaction 46834 was in the XA prepared state.
InnoDB: Transaction 46835 was in the XA prepared state.
InnoDB: Transaction 46835 was in the XA prepared state.
InnoDB: Transaction 46836 was in the XA prepared state.
InnoDB: Transaction 46836 was in the XA prepared state.
InnoDB: Transaction 46838 was in the XA prepared state.
InnoDB: Transaction 46838 was in the XA prepared state.
InnoDB: Transaction 46839 was in the XA prepared state.
InnoDB: Transaction 46839 was in the XA prepared state.
InnoDB: 6 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 4 row operations to undo
InnoDB: Trx id counter is 47616
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: 128 rollback segment(s) are active.
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Waiting for purge to start
InnoDB: Starting in background the rollback of uncommitted transactions
2016-03-23 19:59:17 7f51503fe700 InnoDB: Rolling back trx with id 46837, 4 rows to undo
2016-03-23 19:59:17 139987215443712 [Note] InnoDB: Rollback of trx with id 46837 completed
2016-03-23 19:59:17 7f51503fe700 InnoDB: Rollback of non-prepared transactions completed
2016-03-23 19:59:17 139987887904800 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 124266988
2016-03-23 19:59:17 139987887904800 [Note] Plugin 'FEEDBACK' is disabled.
2016-03-23 19:59:17 7f517854d820 InnoDB: Starting recovery for XA transactions...
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction 46839 in prepared state after recovery
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction contains changes to 7 rows
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction 46838 in prepared state after recovery
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction contains changes to 5 rows
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction 46836 in prepared state after recovery
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction contains changes to 7 rows
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction 46835 in prepared state after recovery
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction contains changes to 5 rows
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction 46834 in prepared state after recovery
2016-03-23 19:59:17 7f517854d820 InnoDB: Transaction contains changes to 7 rows
2016-03-23 19:59:17 7f517854d820 InnoDB: 5 transactions in prepared state after recovery
2016-03-23 19:59:17 139987887904800 [Note] Found 5 prepared transaction(s) in InnoDB
2016-03-23 19:59:17 139987887904800 [ERROR] Found 5 prepared transactions! It means that mysqld was not shut down properly last time and critical recovery information (last binlog or tc.log file) was manually deleted after a crash. You have to start mysqld with --tc-heuristic-recover switch to commit or rollback pending transactions.
2016-03-23 19:59:17 139987887904800 [ERROR] Aborting

So, to get the MSR Slave back:

[vagrant@maria01 ~]$ sudo mysqld --defaults-file=/etc/my.cnf --tc-heuristic-recover=ROLLBACK
2016-03-23 20:18:20 140348206848032 [Note] mysqld (mysqld 10.1.11-MariaDB-log) starting as process 4047 ...
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Using mutexes to ref count buffer pool pages
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: The InnoDB memory heap is disabled
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Memory barrier is not used
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Compressed tables use zlib 1.2.3
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Using Linux native AIO
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Using generic crc32 instructions
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Initializing buffer pool, size = 128.0M
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Completed initialization of buffer pool
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Highest supported file format is Barracuda.
InnoDB: Transaction 46834 was in the XA prepared state.
InnoDB: Transaction 46834 was in the XA prepared state.
InnoDB: Transaction 46835 was in the XA prepared state.
InnoDB: Transaction 46835 was in the XA prepared state.
InnoDB: Transaction 46836 was in the XA prepared state.
InnoDB: Transaction 46836 was in the XA prepared state.
InnoDB: Transaction 46838 was in the XA prepared state.
InnoDB: Transaction 46838 was in the XA prepared state.
InnoDB: Transaction 46839 was in the XA prepared state.
InnoDB: Transaction 46839 was in the XA prepared state.
InnoDB: 5 transaction(s) which must be rolled back or cleaned up
InnoDB: in total 0 row operations to undo
InnoDB: Trx id counter is 48128
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: 128 rollback segment(s) are active.
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Waiting for purge to start
InnoDB: Starting in background the rollback of uncommitted transactions
2016-03-23 20:18:21 7fa534bff700 InnoDB: Rollback of non-prepared transactions completed
2016-03-23 20:18:21 140348206848032 [Note] InnoDB: Percona XtraDB (http://www.percona.com) 5.6.26-76.0 started; log sequence number 124267433
2016-03-23 20:18:21 140348206848032 [Note] Plugin 'FEEDBACK' is disabled.
2016-03-23 20:18:21 140348206848032 [Note] Heuristic crash recovery mode
2016-03-23 20:18:21 7fa55d039820 InnoDB: Starting recovery for XA transactions...
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction 46839 in prepared state after recovery
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction contains changes to 7 rows
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction 46838 in prepared state after recovery
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction contains changes to 5 rows
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction 46836 in prepared state after recovery
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction contains changes to 7 rows
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction 46835 in prepared state after recovery
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction contains changes to 5 rows
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction 46834 in prepared state after recovery
2016-03-23 20:18:21 7fa55d039820 InnoDB: Transaction contains changes to 7 rows
2016-03-23 20:18:21 7fa55d039820 InnoDB: 5 transactions in prepared state after recovery
2016-03-23 20:18:21 140348206848032 [Note] Found 5 prepared transaction(s) in InnoDB
2016-03-23 20:18:21 140347457898240 [Note] InnoDB: Dumping buffer pool(s) not yet started
2016-03-23 20:18:21 140348206848032 [Note] Please restart mysqld without --tc-heuristic-recover
2016-03-23 20:18:21 140348206848032 [ERROR] Can't init tc log
2016-03-23 20:18:21 140348206848032 [ERROR] Aborting

And finally:

[vagrant@maria01 ~]$ sudo service mysql start
Starting MySQL... SUCCESS!

By the way, as per the discussion on Twitter, I'm not really sure yet whether this is a problem related to in-order commit when using parallel replication, which would imply that a transaction commit conflict is happening at that point. Below is the configuration file used for the MSR slave. The intention was to run with @@slave_parallel_mode=optimistic, which as per the online manual "tries to apply most transactional DML in parallel, and handles any conflicts with rollback and retry" (more info here); note, though, what the per-connection lines below actually say.

#: box01 - multi-source slave
[client]
port=3306
socket=/var/lib/mysql/mysql.sock
[mysqld]
user=mysql
port=3306
socket=/var/lib/mysql/mysql.sock
basedir=/usr
datadir=/var/lib/mysql
read_only=1
#: repl vars
server_id=1
report_host=box01
report_port=3306
report_user=repl
log_bin=mysql-bin
log_bin_index=mysql.index
log_slave_updates=true
binlog_format=ROW
#: verify checksum on master
master_verify_checksum=1
#: gtid vars
gtid_domain_id=1
gtid_ignore_duplicates=ON
gtid_strict_mode=1
 
#: msr slave parallel mode *
box02.slave_parallel_mode=conservative
box03.slave_parallel_mode=conservative
box04.slave_parallel_mode=conservative
 
slave_parallel_threads=10
slave_domain_parallel_threads=2
slave_parallel_max_queued=512M
slave_net_timeout=15
slave_sql_verify_checksum=1
slave_compressed_protocol=1
#: binary log group commit behavior
#binlog_commit_wait_usec=100000
#binlog_commit_wait_count=20

Maybe a test using @@slave_domain_parallel_threads should be done as the next step; if you have any additional thoughts on this, they are really appreciated.
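The change itself would be straightforward; a sketch (the value 4 is an arbitrary assumption, and all connections must be stopped first, since the variable cannot be changed while any slave is running):

MariaDB [(none)]> stop all slaves;
MariaDB [(none)]> set global slave_domain_parallel_threads=4;
MariaDB [(none)]> start all slaves;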

Continuing with this, I found that the connection names were not running in optimistic mode (they were in conservative mode, which limits parallelism in an effort to avoid any conflicts), and after changing that I ran the test again:

#: current values
MariaDB [(none)]> show all slaves status\G
              Connection_name: box02
                Parallel_Mode: conservative
              Connection_name: box03
                Parallel_Mode: conservative
              Connection_name: box04
                Parallel_Mode: conservative
3 rows in set (0.00 sec)
 
#: changing Parallel Mode to Optimistic
MariaDB [(none)]> stop all slaves;
Query OK, 0 rows affected, 3 warnings (0.00 sec)
 
MariaDB [(none)]> set global box02.slave_parallel_mode='optimistic';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> set global box03.slave_parallel_mode='optimistic';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> set global box04.slave_parallel_mode='optimistic';
Query OK, 0 rows affected (0.00 sec)
 
MariaDB [(none)]> start all slaves;
Query OK, 0 rows affected, 3 warnings (0.02 sec)
 
MariaDB [(none)]> show all slaves status\G
              Connection_name: box02
                Parallel_Mode: optimistic
              Connection_name: box03
                Parallel_Mode: optimistic
              Connection_name: box04
                Parallel_Mode: optimistic
3 rows in set (0.00 sec)

The parallel threads looked like this:

MariaDB [(none)]> SELECT ID,TIME,STATE,USER FROM INFORMATION_SCHEMA.PROCESSLIST WHERE USER='system user';
+----+------+------------------------------------------------------------------+-------------+
| ID | TIME | STATE                                                            | USER        |
+----+------+------------------------------------------------------------------+-------------+
| 46 |   81 | Slave has read all relay log; waiting for the slave I/O thread t | system user |
| 45 |   81 | Waiting for master to send event                                 | system user |
| 44 |   86 | Slave has read all relay log; waiting for the slave I/O thread t | system user |
| 43 |   86 | Waiting for master to send event                                 | system user |
| 42 |  102 | Slave has read all relay log; waiting for the slave I/O thread t | system user |
| 41 |  102 | Waiting for master to send event                                 | system user |
| 35 |    0 | Waiting for prior transaction to commit                          | system user |
| 34 |    0 | Waiting for prior transaction to commit                          | system user |
| 33 |    0 | Waiting for prior transaction to commit                          | system user |
| 32 |  175 | Waiting for work from SQL thread                                 | system user |
| 31 |  175 | Waiting for work from SQL thread                                 | system user |
| 30 |    0 | Unlocking tables                                                 | system user |
| 29 |    0 | Unlocking tables                                                 | system user |
| 28 |    0 | Unlocking tables                                                 | system user |
| 27 |  175 | Waiting for work from SQL thread                                 | system user |
| 26 |  175 | Waiting for work from SQL thread                                 | system user |
+----+------+------------------------------------------------------------------+-------------+
16 rows in set (0.00 sec)

Additionally, I'm curious to check the Retried_transactions counter per connection name, to verify that the retry part of the optimistic parallel replication mode is really working:

MariaDB [(none)]> pager egrep "Connection|Parallel|Gtid_IO|Retried"
PAGER set to 'egrep "Connection|Parallel|Gtid_IO|Retried"'
MariaDB [(none)]> show all slaves status\G
              Connection_name: box02
                  Gtid_IO_Pos: 1-1-68,4-4-87933,3-3-77410,2-2-149378
                Parallel_Mode: optimistic
         Retried_transactions: 12
              Connection_name: box03
                  Gtid_IO_Pos: 1-1-68,4-4-87933,3-3-88622,2-2-131340
                Parallel_Mode: optimistic
         Retried_transactions: 3
              Connection_name: box04
                  Gtid_IO_Pos: 1-1-68,4-4-98365,3-3-77410,2-2-131340
                Parallel_Mode: optimistic
         Retried_transactions: 3
3 rows in set (0.02 sec)

Additionally, we can check that the global status variable Slave_retried_transactions finally reflects the total number of transactions retried across all connection names on the MSR slave:

MariaDB [(none)]> show global status like 'Slave_retried%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| Slave_retried_transactions | 18    |
+----------------------------+-------+
1 row in set (0.00 sec)

So, the slave hasn't crashed anymore, but the question of why the MSR slave crashed in the first place remains open. What was learned here is that, besides conservative, we can also use minimal for slave_parallel_mode, which should play very well in this case since it only parallelizes the commit steps of transactions; that is the next test I would like to run as this post keeps growing. I'm also planning another post to check the relation between the transaction conflict rate and the performance impact of each existing slave parallel mode.
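Switching to minimal would mirror the change made above for optimistic (a sketch, not yet tested here):

MariaDB [(none)]> stop all slaves;
MariaDB [(none)]> set global box02.slave_parallel_mode='minimal';
MariaDB [(none)]> set global box03.slave_parallel_mode='minimal';
MariaDB [(none)]> set global box04.slave_parallel_mode='minimal';
MariaDB [(none)]> start all slaves;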


How to MySQL 5.6 MTS? Some tips and tests…

October 6th, 2014 Bianchi Posted in MySQL HA, MySQL Replication

In terms of new features, MTS conceptually exceeds all expectations, seeing that it is a feature that will alleviate (or at least try to) all the stress we've been seeing for years of having a slave server lagging behind the master in some way, whether measured in files or in seconds. The first thing that blinks in my mind is the red mark on my personal server's dashboard that has always called my attention whenever a slave server hangs while the master keeps working well. Many times I've seen I/O problems that make the slave hang around relay log rotation time or while executing a long report; at least four times a day I must go and check what's going on. Because of all that, I believe exchanging the single-thread model for one with multiple threads will alleviate the problem; I hope to not have slave servers hanging so much anymore.

Having said that, I've seen some cases in which, after implementing MySQL 5.6 in production, properly adjusting the worker threads variable (slave_parallel_workers) and starting the slave, it does not always work as promised. The first action when it's not working properly is to check all the configuration necessary to get it working well; it's not just a matter of raising the previously mentioned variable and putting it to run. Make sure about some small details (consolidated in the sketch right after this list):

  • make sure the master and slaves are MySQL 5.6+; there is information shipped with binary log events that is read by the slaves and executed by more than one thread (slave_parallel_workers > 1);
  • make sure slave_parallel_workers is configured with a value > 1;
  • make sure to enable crash-safe replication to make it more reliable, adding master_info_repository = TABLE, relay_log_info_repository = TABLE and relay_log_recovery = ON to the my.cnf file;
  • it's going to work with binlog_format as either statement or row.
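Putting those items together, a minimal my.cnf sketch for the 5.6 slave side might look like this (illustrative values, not a copy of any particular server's file):

[mysqld]
server_id=200
#: worker threads applying events from the relay log
slave_parallel_workers=2
#: crash-safe replication
master_info_repository=TABLE
relay_log_info_repository=TABLE
relay_log_recovery=ON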

There is no problem in using the binlog format as statement or row; both formats work well, since with both you can observe in the binary log file all the information necessary for the workers. Another piece of advice: once you've started MySQL replicating in crash-safe mode, it's not possible to dynamically change the repositories for relay and master info at runtime on a busy server, because the workers' info is stored in tables, and if you change that information's location the workers might lose their reference.

First of all, I ran a test based on a customer environment running MySQL 5.5.37 on all master and slave servers. The strategy is to replace the slave servers one by one until we hit the master, finally doing a switchover to another server in order to upgrade the master to MySQL 5.6: a normal and reasonable strategy to avoid risks in production. If you get an error on any project step, it is welcome time to study a little more what's happening before taking another step towards the next task. Another point to take into account is old_passwords (removed in 5.7), which is still available in 5.6 and must be used while users in the mysql.user table still have 16-byte password hashes. It's a good maneuver, while updating all the accounts' passwords, to keep the old_passwords=1 configuration, giving yourself time to map all the accounts used by the applications and avoid access-denied problems (while dealing with account password updates, configure log_warnings=2 to log all failed login attempts and correct them).
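A sketch of those two transition settings in my.cnf (values as discussed above):

[mysqld]
#: keep pre-4.1 16-byte password hashes working while accounts are migrated
old_passwords=1
#: log failed connection attempts (access denied) to the error log
log_warnings=2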

Test scenario: confirming that MySQL 5.5 (master) and MySQL 5.6 (slave) do not replicate with MTS

As this is the scenario I've seen the most at many of the customers I've visited, I decided to spend some time stressing the possibilities around MTS replication with a 5.5 master and a newcomer, MySQL 5.6, as the slave. Some discussions on the internet made me believe that at some level of configuration this scenario would become possible, but it's not supported. Speaking with folks at MySQL Central, we discussed many scenarios, and some high-level people known as developers explained that 5.6 ships additional information in the binary logs; that is how MTS on the slave knows how to split queries (binlog_format=statement) or updates (binlog_format=row) among the workers (threads). This job is actually done by a coordinator, which is also a thread, executing what is read from the relay logs on the slave side. By the way, all of this I got to know after testing the environment, which I brought up using Vagrant as below.

Files you'll need to create/copy/paste; make sure you have a centos65-x86_64 box added to your Vagrant boxes, or change the values of mysql55.vm.box and mysql56.vm.box in the Vagrantfile.

wagnerbianchi02:mysql55_and_mysql56 root# ls -lh
total 24
drwxr-xr-x 3 root wheel 102B Oct 6 12:53 .vagrant
-rw-r--r-- 1 root wheel 760B Oct 6 12:53 Vagrantfile
-rw-r--r-- 1 root wheel 681B Oct 6 12:52 setup_env55.sh
-rw-r--r-- 1 root wheel 343B Oct 6 12:42 setup_env56.sh

Vagrantfile, which you can just copy/paste:

# -*- mode: ruby -*-
# vi: set ft=ruby :

# Vagrantfile API/syntax version. Don't touch unless you know what you're doing!
VAGRANTFILE_API_VERSION = "2"

Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|

  config.vm.define "mysql55" do |mysql55|
    mysql55.vm.box = "centos65-x86_64"
    mysql55.vm.network :private_network, virtualbox__intnet: "mysql55_and_mysql56", ip: "192.168.50.100"
    mysql55.vm.provision "shell", path: "setup_env55.sh"
  end

  config.vm.define "mysql56" do |mysql56|
    mysql56.vm.box = "centos65-x86_64"
    mysql56.vm.network :private_network, virtualbox__intnet: "mysql55_and_mysql56", ip: "192.168.50.101"
    mysql56.vm.provision "shell", path: "setup_env56.sh"
  end

end

If you get an error like this below, review the boxes you’ve got added to your Vagrant boxes:

wagnerbianchi01:mysql55_and_mysql56 root# vagrant up
Bringing machine 'mysql55' up with 'virtualbox' provider...
Bringing machine 'mysql56' up with 'virtualbox' provider...
==> mysql55: Box 'centos65-x86_64' could not be found. Attempting to find and install...
mysql55: Box Provider: virtualbox
mysql55: Box Version: >= 0
==> mysql55: Adding box 'centos65-x86_64' (v0) for provider: virtualbox
mysql55: Downloading: centos65-x86_64
An error occurred while downloading the remote file. The error
message, if any, is reproduced below. Please fix this error and try
again.

Couldn't open file /opt/vagrant_projects/mysql55_and_mysql56/centos65-x86_64

Setup scripts that you can use to create the machines:

#!/usr/bin/env bash
#: script name: setup_env55.sh
#
sudo echo "nameserver 8.8.8.8" > /etc/resolv.conf
sudo echo "nameserver 8.8.4.4" >> /etc/resolv.conf
sudo yum -y install wget vim
sudo yum -y remove mysql-libs-5.1.71-1.el6.x86_64
sudo rpm -Uvi https://dev.mysql.com/get/mysql-community-release-el6-5.noarch.rpm
sudo wget http://dev.mysql.com/get/Downloads/MySQL-5.5/MySQL-5.5.39-2.el6.x86_64.rpm-bundle.tar
sudo tar xvf MySQL-5.5.39-2.el6.x86_64.rpm-bundle.tar
sudo rpm -ivh MySQL-{server,shared,client}*
sudo /etc/init.d/mysql start

#!/usr/bin/env bash
#: script name: setup_env56.sh
#
sudo echo "nameserver 8.8.8.8" > /etc/resolv.conf
sudo echo "nameserver 8.8.4.4" >> /etc/resolv.conf
sudo yum -y install wget vim
sudo rpm -Uvi https://dev.mysql.com/get/mysql-community-release-el6-5.noarch.rpm
sudo yum-config-manager --enable mysql56-community
sudo yum -y install mysql-server
sudo service mysqld start

Following this, it's just a matter of running vagrant up to get the machines up & running.

wagnerbianchi01:mysql55_and_mysql56 root# vagrant status
Current machine states:

mysql55 running (virtualbox)
mysql56 running (virtualbox)

This environment represents multiple VMs. The VMs are all listed
above with their current state. For more information about a specific
VM, run `vagrant status NAME`.

After setting up additional things like classic or GTID replication (which one can do using MySQL Utilities), it's possible to execute the tests. In addition to the regular initial variables that come mainly with the 5.5 configuration file, on the 5.6 side I added server_id=200, slave_parallel_workers = 2, and the crash-safe replication settings master_info_repository = TABLE, relay_log_info_repository = TABLE and relay_log_recovery = ON, as 5.6 will be the slave.

mysql> select @@server_id, @@slave_parallel_workers, @@master_info_repository,
    -> @@relay_log_info_repository, @@master_info_repository, @@relay_log_recovery\G
*************************** 1. row ***************************
                @@server_id: 200
   @@slave_parallel_workers: 2
   @@master_info_repository: TABLE
@@relay_log_info_repository: TABLE
   @@master_info_repository: TABLE
       @@relay_log_recovery: 1
1 row in set (0.00 sec)

Checking the replication status:

             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

Now, the test proposed on Luis' blog some time ago to explain MTS; on the master server:

mysql> create database db1; create database db2;
Query OK, 1 row affected (0.00 sec)

Query OK, 1 row affected (0.00 sec)

On the slave side, check the content of the mysql.slave_worker_info table; this table registers all the movement around MTS replication. Below you can see that, as we haven't executed anything against the databases yet, the threads haven't worked yet.

mysql> select * from mysql.slave_worker_info\G
*************************** 1. row ***************************
                        Id: 1
            Relay_log_name:
             Relay_log_pos: 0
           Master_log_name:
            Master_log_pos: 0
 Checkpoint_relay_log_name:
  Checkpoint_relay_log_pos: 0
Checkpoint_master_log_name:
 Checkpoint_master_log_pos: 0
          Checkpoint_seqno: 0
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap:
*************************** 2. row ***************************
                        Id: 2
            Relay_log_name:
             Relay_log_pos: 0
           Master_log_name:
            Master_log_pos: 0
 Checkpoint_relay_log_name:
  Checkpoint_relay_log_pos: 0
Checkpoint_master_log_name:
 Checkpoint_master_log_pos: 0
          Checkpoint_seqno: 0
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap:
2 rows in set (0.00 sec)

Coming back to the master, enter some inserts:

mysql> create table db1.t1(a int); create table db2.t1(a int);
Query OK, 0 rows affected (0.01 sec)

Query OK, 0 rows affected (0.00 sec)

mysql> insert into db1.t1 values (1),(2),(3); insert into db2.t1 values (1),(2),(3);
Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

And then we go over mysql.slave_worker_info again to check whether those two listed threads have worked or not:

mysql> select * from mysql.slave_worker_info\G
*************************** 1. row ***************************
                        Id: 1
            Relay_log_name: ./mysqld-relay-bin.000002
             Relay_log_pos: 1171
           Master_log_name: master-bin.000001
            Master_log_pos: 1007
 Checkpoint_relay_log_name: ./mysqld-relay-bin.000002
  Checkpoint_relay_log_pos: 797
Checkpoint_master_log_name: master-bin.000001
 Checkpoint_master_log_pos: 633
          Checkpoint_seqno: 1
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap:
*************************** 2. row ***************************
                        Id: 2
            Relay_log_name:
             Relay_log_pos: 0
           Master_log_name:
            Master_log_pos: 0
 Checkpoint_relay_log_name:
  Checkpoint_relay_log_pos: 0
Checkpoint_master_log_name:
 Checkpoint_master_log_pos: 0
          Checkpoint_seqno: 0
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap:
2 rows in set (0.00 sec)

Just one thread working!! Yes, we confirmed that this does not work when you have a master on 5.5 and slaves on 5.6, regardless of the binlog_format and so on. A good point for discussion here is that MySQL 5.6 has received lots of improvements to its engine, many related to InnoDB and many more elsewhere. Maybe it's a good time to start upgrading the slaves from 5.5 to 5.6 until we hit the master, and then upgrade all the database machines, even with MySQL 5.6 MTS disabled for the moment.

Test scenario: confirming that MySQL 5.6 (master) and MySQL 5.6 (slave) replicate with MTS, even with binlog_format=statement

To run this new test, we just need to remove the 5.5 RPM packages, install 5.6 from the repository and start the slave. The final step is to execute the tests again and check the mysql.slave_worker_info table's content on the slave server.

[root@mysql55 ~]# service mysql stop
Shutting down MySQL. SUCCESS!
[root@mysql55 ~]# rpm -e MySQL-shared-5.5.39-2.el6.x86_64 MySQL-client-5.5.39-2.el6.x86_64 \
> MySQL-shared-compat-5.5.39-2.el6.x86_64 MySQL-server-5.5.39-2.el6.x86_64 \
> MySQL-shared-compat-5.5.39-2.el6.x86_64
[root@mysql55 ~]# yum -y install mysql-server
[...]
Setting up Install Process
[...]
Installed size: 329 M
Downloading Packages:
[...]
Complete!

With 5.6 on the master side, the next step is to add thread_stack=256K to my.cnf to avoid the reported misconfiguration warning. After that, it's time to put those two configured worker threads to work…

On master:

[root@mysql55 ~]# service mysqld start
Starting mysqld:                                           [  OK  ]
[root@mysql55 ~]# mysql -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 2
Server version: 5.6.21-log MySQL Community Server (GPL)

Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql> insert into db1.t1 values (1),(2),(3); insert into db2.t1 values (1),(2),(3);
Query OK, 3 rows affected (0.01 sec)
Records: 3  Duplicates: 0  Warnings: 0

Query OK, 3 rows affected (0.00 sec)
Records: 3  Duplicates: 0  Warnings: 0

Checking the worker threads on slave:

mysql> select * from mysql.slave_worker_info\G
*************************** 1. row ***************************
                        Id: 1
            Relay_log_name: ./mysqld-relay-bin.000004
             Relay_log_pos: 976
           Master_log_name: master-bin.000002
            Master_log_pos: 816
 Checkpoint_relay_log_name: ./mysqld-relay-bin.000004
  Checkpoint_relay_log_pos: 554
Checkpoint_master_log_name: master-bin.000002
 Checkpoint_master_log_pos: 394
          Checkpoint_seqno: 1
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap:
*************************** 2. row ***************************
                        Id: 2
            Relay_log_name: ./mysqld-relay-bin.000004
             Relay_log_pos: 765
           Master_log_name: master-bin.000002
            Master_log_pos: 605
 Checkpoint_relay_log_name: ./mysqld-relay-bin.000004
  Checkpoint_relay_log_pos: 554
Checkpoint_master_log_name: master-bin.000002
 Checkpoint_master_log_pos: 394
          Checkpoint_seqno: 0
     Checkpoint_group_size: 64
   Checkpoint_group_bitmap:
2 rows in set (0.00 sec)

Yes, it's working, confirming that MTS is a feature present just on 5.6, using ROW or STATEMENT as binlog_format. BTW, I like to blog about things considering all the small details because, as our MySQL friend said at MySQL Central @ OOW14, "do not underestimate the importance of the small things".

All the best,


Replication and worker threads

January 30th, 2014 Bianchi Posted in MySQL Replication

Recently I got a little worried about how to monitor the threads executing data coming from the relay logs in a replication environment. I decided to spin up some virtual machines, set up MySQL on them, and start investigating how to do that. Everything starts with setting up replication between some MySQL servers using the binary log. I haven't tested slave parallel workers with GTID replication, and I've heard folks around saying that parallel slave workers is a feature that is not working with GTID at this time (check the blog's date, ok?).

With that stuff running, it's time to configure slave_parallel_workers, which is the system variable in charge of controlling the number of threads dedicated to executing the data piling up in the relay logs. By the way, it's very good that this variable, like many others, can be reconfigured with a new value at runtime, and having said that…

mysql> show variables like 'slave_parallel%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 0     |
+------------------------+-------+
1 row in set (0.00 sec)

mysql> set global slave_parallel_workers=10;
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like 'slave_parallel%';
+------------------------+-------+
| Variable_name          | Value |
+------------------------+-------+
| slave_parallel_workers | 10    |
+------------------------+-------+
1 row in set (0.00 sec)

As you can see, it's easy to configure the number of threads that will execute the data coming from the relay log/master server. It's good to keep in mind that it's advisable for the number of threads to match the number of available processor cores on the slave machine. Now it's time to configure a way to monitor the slave worker threads. Another variable with GLOBAL scope must be reconfigured:

# assure that replication is stopped at this point, as it's only possible to run the command below while replication is stopped

mysql> set global relay_log_info_repository='table';
ERROR 1198 (HY000): This operation cannot be performed with a running slave; run STOP SLAVE first
mysql> stop slave;
Query OK, 0 rows affected (0.01 sec)

mysql> set global relay_log_info_repository='table';
Query OK, 0 rows affected (0.02 sec)

mysql> start slave;
Query OK, 0 rows affected, 1 warning (0.03 sec)

Now it's a matter of starting the slave and querying the mysql.slave_worker_info table…

mysql> select count(*) from mysql.slave_worker_info\G
*************************** 1. row ***************************
count(*): 10
1 row in set (0.00 sec)
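Beyond counting rows, the same table shows what each worker has applied; a sketch of a more informative check, using columns that exist in 5.6's mysql.slave_worker_info:

mysql> select Id, Relay_log_name, Relay_log_pos, Master_log_name, Master_log_pos
    -> from mysql.slave_worker_info;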

Cheers, WB

 


MySQL Replication Topologies

November 8th, 2011 Bianchi Posted in MySQL Replication

You should know that the MySQL team has been doing a good job, and the product is naturally becoming a great option when the topic is horizontal scaling, or Scale-Out, too. I mention the last "too" because MySQL has also been doing a confident job on vertical scaling, with hands on the InnoDB Storage Engine. But, speaking of MySQL Replication and Scale-Out, MySQL has good features in the form of three supported kinds of data replication:

  • Synchronous data replication, used only when you're running MySQL Cluster (data replication between Data Nodes);
  • Asynchronous and semi-synchronous replication, to replicate data between servers set up as MASTER and SLAVE, where a MASTER can have many SLAVEs and a SLAVE has a single MASTER.
This post was written to highlight the kinds of topology mentioned by Oracle, plus some others we can create to solve specific problems inside a company. To put the existing kinds in context, it helps to explain the "map of availability" created by MySQL AB.
As a friend from the USA says, it is "easy peasy" to understand this graph and work with it in your organization's strategy. Starting from small business, where normally only a small amount of availability is required to maintain business continuity, you can set up just a single instance of MySQL and have it working well, with little management applied to the environment. Looking at the graph, we can see that such a small business could tolerate up to 35 days of downtime in the worst case. As we climb the graph, we will see new situations, and the number of nines grows (high-availability nines).
Topology 1: Single
The first one I'll comment on is the "Single" topology, normally used when an organization is looking for data redundancy and backup process improvements. It simply operates with two servers acting as MASTER and SLAVE. The good touch here is to adapt the application to write data to the MASTER and only read data from the SLAVE. This provides good improvements and alleviates the workload if you were operating with a single server responding to all app requests.

The main server (circled in red) acts as a MASTER and the other as a SLAVE; the latter must be configured with read_only=1

In this case you will normally configure MySQL on the SLAVE server with read_only=1, as shown below:

mysql> SET GLOBAL read_only=1;

Topology 2: Multiple
As the name says, in this topology we can have many servers replicating from a single MASTER, building what we know as a multiple topology. It is quite suitable when the environment needs to move to a divided workload, where your app writes data to the MASTER, reads from one of the SLAVE servers (you can apply some kind of load balancing with mysql-proxy or an F5 LTM), and leaves another one to extract backup copies, avoiding interference with the production servers. It is a common situation to have a high workload and need to back up the databases at least twice a day; in this case it is good to use the snapshot-style backups supported by MySQL Enterprise Backup, XtraBackup or Zmanda.

You could set up many more servers than the two depicted above!

Topology 3: Chain
This topology simply provides data replication in a Master(A) -> Master/Slave(B) -> Slave(C) architecture. This is good when you have departmental servers available separately inside your organization to serve many areas with as little lag as possible. With this topology, you can adjust applications to write data to servers A and B (INSERT and DELETE), scaling writes across both, while the third one serves reports and backups as a read-only server (just SELECT). What we cannot forget is to set log-slave-updates in server B's my.cnf, since this server is MASTER and SLAVE at the same time (MySQL Manual Page: http://bit.ly/nGTQO1).

MASTER(A) -> MASTER/SLAVE(B) -> SLAVE(C) - pay attention to configuring log-slave-updates on server (B)

Normally, a slave does not log to its own binary log any updates that are received from a master server. This option tells the slave to log the updates performed by its SQL thread to its own binary log. For this option to have any effect, the slave must also be started with the --log-bin option to enable binary logging. --log-slave-updates is used when you want to chain replication servers. For example, you might want to set up replication servers using this arrangement:

A -> B -> C

Here, A serves as the master for the slave B, and B serves as the master for the slave C. For this to work, B must be both a master and a slave. You must start both A and B with --log-bin to enable binary logging, and B with the --log-slave-updates option so that updates received from A are logged by B to its binary log.
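A minimal sketch of server B's my.cnf for such a chain (illustrative values only):

#: server B - slave of A, master of C
[mysqld]
server_id=2
log-bin=mysql-bin
log-slave-updates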

Topology 4: Circular

This kind of replication topology has generated much discussion in MySQL environments, because a setup on MySQL 5.0 lacks the terminator mechanism applied in MySQL 5.1. In broad terms, the MySQL servers are set up in a circular fashion where every server is MASTER and SLAVE at the same time. The log-slave-updates system variable must be configured on all servers, and, as described below, each server can be told to ignore events originating from a given server_id so that updates do not keep circulating forever.


You can set up MySQL Server 5.1+ in circular replication as A <-> B <-> C

In circular replication, it was sometimes possible for an event to propagate such that it would be replicated on all servers. This could occur when the originating server was removed from the replication circle and so could no longer act as the terminator of its own events, as normally happens in circular replication.

To prevent this from occurring, a new IGNORE_SERVER_IDS option is introduced for the CHANGE MASTER TO statement. This option takes a list of replication server IDs; events having a server ID which appears in this list are ignored and not applied.

In conjunction with the introduction of IGNORE_SERVER_IDS, SHOW SLAVE STATUS has two new fields. Replicate_Ignore_Server_Ids displays information about ignored servers. Master_Server_Id displays the server_id value from the master. (Bug #47037)

See also Bug #25998, Bug #27808.
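As a sketch of how that option is used: if server C (server_id=3, an illustrative value) were removed from the circle, each remaining server could be told to discard C's leftover events:

mysql> STOP SLAVE;
mysql> CHANGE MASTER TO IGNORE_SERVER_IDS = (3);
mysql> START SLAVE;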

