Andy Kerber (@dbakerber), Senior Consultant
In this post, we are going to build a MySQL active-passive cluster using Pacemaker, Corosync, and DRBD.
This started as a thought exercise: could I put together a fairly resilient, fairly highly available MySQL configuration over shared storage without using something like Oracle RAC? The answer is that true shared storage is not achievable without something like Oracle RAC. However, using DRBD, we can get something close.
DRBD stands for Distributed Replicated Block Device. It works, as the name implies, by replicating blocks. It is not a shared device. One DRBD device is designated as the primary device, with additional devices designated as secondary devices, and blocks are replicated from the primary to the secondaries.
Pacemaker and Corosync are the Linux clustering components that handle communication between the cluster nodes, keep cluster resources in sync, and monitor resources for availability. When a resource becomes unavailable, they also manage the failover.
The servers
The VMware virtual machines are running OEL 7.6 with 8G of RAM. In addition to a 20G root device, each server has a 20G VMDK for the DRBD device that will hold the MySQL database files.
The DRBD devices are configured as logical volumes (LVM) in order to make adding space easier.
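If you later need to add space, the flow would look roughly like the sketch below. This is not part of the original build; /dev/sdd1 is a hypothetical new partition, and the DRBD resource and mount point names match the ones created later in this post.

# Run on both nodes: add the new partition to the volume group and grow the logical volume
pvcreate /dev/sdd1                           # /dev/sdd1 is hypothetical
vgextend shared1 /dev/sdd1
lvextend -l +100%FREE /dev/shared1/shared1

# Run on the primary node only, after the backing device has grown on both nodes
drbdadm resize drbd00                        # tell DRBD to use the larger backing device
xfs_growfs /u01                              # grow the mounted xfs file system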
The server names are linclust1 and linclust2. Below is the hosts file; note that the NICs for cluster management (heartbeat) are named linclust1-hb and linclust2-hb, and the storage (DRBD replication) NICs are named linclust1-priv and linclust2-priv. It is definitely recommended that separate NICs be used for storage and inter-node communication.
127.0.0.1       localhost localhost.localdomain localhost4 localhost4.localdomain4
::1             localhost localhost.localdomain localhost6 localhost6.localdomain6
10.12.1.58      linclust1 linclust1.localdomain
10.12.1.59      linclust2 linclust2.localdomain
192.168.31.101  linclust1-priv linclust1-priv.localdomain
192.168.31.102  linclust2-priv linclust2-priv.localdomain
192.168.57.101  linclust1-hb linclust1-hb.localdomain
192.168.57.102  linclust2-hb linclust2-hb.localdomain
10.12.1.61      linclust-vip linclust-vip.localdomain
Before completing any other cluster configuration, we configure LVM and DRBD. The DRBD devices show up on each node as /dev/sdb and /dev/sdc.
Create partitions using fdisk. Find instructions on creating a new partition in fdisk in this blog.
The steps used are as follows (all commands are run as root unless otherwise specified). Since the servers are running RHEL/OEL 7, the partition will automatically be aligned on a sector boundary when we create it with fdisk. In earlier versions, we would have needed to align it manually.
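If you prefer to script the partitioning rather than walk through fdisk interactively, something like the following should work. This is a sketch using parted instead of fdisk, and it assumes /dev/sdb is empty and safe to relabel.

# Run on each node: create a label and a single partition spanning the whole disk
parted -s /dev/sdb mklabel msdos
parted -s /dev/sdb mkpart primary 0% 100%    # 0%/100% lets parted pick aligned boundaries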
The general tasks in this process are to: create the logical volumes, create the DRBD device, create the cluster and define the cluster services. The individual steps in each task are below:
- Create logical volumes for the DRBD device. This step is done on each node.
pvcreate /dev/sdb1
vgcreate shared1 /dev/sdb1
lvcreate --name shared1 -l 100%FREE shared1
- Install the required software on both cluster servers:
yum -y install drbd pacemaker corosync pcs pcsd
The above command will install all the required software with the exception of MySQL.
- Install the required MySQL packages on both servers. Since this is open source software, we don’t have to worry about licensing rules, and can install it on both servers:
yum localinstall mysql80-community-release-el7-2.noarch.rpm
yum -y install mysql-community-server
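Because Pacemaker will be responsible for starting and stopping MySQL, it is usually a good idea (not shown in the original steps) to keep the stock systemd unit from starting the database on boot:

# Let the cluster, not systemd, decide where mysqld runs
systemctl disable mysqld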
- Configure DRBD. DRBD continuously replicates data from the primary device to the secondary device.
a. Edit /etc/drbd.d/global_common.conf and copy it to each node. It should read as follows:
global {
   usage-count no;
}
common {
   net {
      protocol C;
   }
}
b. Create the resource definition file for our configuration and copy it to each node; the file must be identical on both nodes. In our case, the file is named drbd00.res (placed in /etc/drbd.d/) and contains the following lines:
resource drbd00 {
   device /dev/drbd0;
   disk /dev/shared1/shared1;
   meta-disk internal;
   net {
      allow-two-primaries;
   }
   syncer {
      verify-alg sha1;
   }
   on linclust1.localdomain {
      address 192.168.31.101:7789;
   }
   on linclust2.localdomain {
      address 192.168.31.102:7789;
   }
}
At this point, we are ready to start DRBD. Note that we are using the option 'allow-two-primaries'. This is because PCS will manage the mounting of the file system for the software and data.
c. Run these commands to initialize DRBD on each node:
drbdadm create-md drbd00
The above command initializes the DRBD metadata for the resource.
d. Now start DRBD on each node:
systemctl start drbd.service
systemctl enable drbd.service
e. Run the command below, on the node that will start out as primary, to assign the primary and secondary roles:
drbdadm primary drbd00 --force
The primary command designates the current node as the primary node, so we can make changes to the DRBD device attached to this node. DRBD is now running, and the DRBD device is visible at /dev/drbd0.
f. Create a file system on the DRBD device. Since we are going to use MySQL in an active-passive cluster, we will create an xfs file system:
mkfs -t xfs /dev/drbd0
g. Now the DRBD replication should be working. Run the command below to confirm:
[root@linclust1 ~]# cat /proc/drbd
version: 8.4.5 (api:1/proto:86-101)
srcversion: 1AEFF755B8BD61B81A0AF27
 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
    ns:406265 nr:383676 dw:790464 dr:488877 al:35 bm:0 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0
If the status does not show UpToDate, wait for a bit, check it again, and verify that everything is up to date.
h. Create the mount points. I am going to use /u01:
mkdir -p /u01
chown oracle:oinstall /u01
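Before handing the file system to the cluster, a quick manual sanity check on the primary node can save some debugging later. This is optional and assumes /dev/drbd0 is still Primary on the current node:

# Mount, verify, and unmount again -- PCS will own the mount from here on
mount /dev/drbd0 /u01
df -h /u01
umount /u01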
- Configure the cluster.
a. When we installed pcs, the user hacluster should have been created. Modify the /etc/passwd file for the hacluster user to read like the following:
hacluster:x:189:189:cluster user:/home/hacluster:/bin/bash
b. Create the directory for hacluster, and set hacluster:hacluster as the owner:
mkdir -p /home/hacluster
chown hacluster:hacluster /home/hacluster
c. Set the password for the hacluster user using the passwd command:
passwd hacluster
d. Start the cluster services:
systemctl start pcsd.service
systemctl enable pcsd.service
e. Authenticate the cluster nodes (the names used here must match the ones used when creating the cluster in the next step):
pcs cluster auth linclust1-hb linclust2-hb
f. Create the cluster:
pcs cluster setup --name DRBD_CLUSTER linclust1-hb linclust2-hb
g. Disable STONITH. We do not want to fence a node that is not working; DRBD and PCS should be able to handle it properly:
pcs property set stonith-enabled=FALSE
pcs cluster start --all
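At this point a quick status check is worthwhile; both nodes should show as online before any resources are defined. Something like:

pcs cluster status        # summary of the nodes and cluster daemons
pcs status corosync       # corosync membership as seen by pcs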
The basic cluster services are now configured and the cluster is running. Now it is time to configure the resources the cluster will manage; these resources are what handle the failover.
- Create the virtual IP address (VIP) that users will use to connect to the database.
pcs resource create ClusterVIP ocf:heartbeat:IPaddr2 ip=10.12.1.61 cidr_netmask=32 op monitor interval=30s
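To confirm the VIP is actually up, check the resource and the interface on whichever node is hosting it; a simple check might be:

pcs status resources                 # ClusterVIP should show as Started
ip addr show | grep 10.12.1.61       # the VIP should appear on one node's interface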
- Define the services that manage DRBD. Since we have one DRBD resource, we need two services to manage it. The 'raw' service (the name is my own choice) simply tells PCS to keep track of the DRBD resource. The 'master' service tells PCS that it has to manage (promote and demote) the DRBD resource it created:
pcs resource create dataraw ocf:linbit:drbd drbd_resource="drbd00" op monitor interval=10s
pcs resource master datamaster dataraw master-max=1 master-node-max=1 clone-max=2 clone-node-max=2 notify=true
Note this entry: ocf:linbit:drbd. That is the resource agent PCS uses to manage the service. For a complete list of available resource agents, run the command 'pcs resource list'.
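The same pcs commands can be used to drill into a specific agent and see which parameters it accepts, for example:

pcs resource list ocf:linbit             # agents shipped by LINBIT (DRBD)
pcs resource describe ocf:linbit:drbd    # parameters such as drbd_resource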
- Mount the MySQL data file system and tell PCS to manage it:
pcs resource create DATAFILES ocf:heartbeat:Filesystem device="/dev/drbd0" directory="/u01" fstype="xfs"
- Create the MySQL data directory:
mkdir -p /u01/app/mysql/mysqldata
chown -R mysql:mysql /u01/app/mysql
- Initialize the data directory for MySQL on the DRBD device:
mysqld --initialize --user=mysql --datadir='/u01/app/mysql/mysqldata'
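It is worth making sure /etc/my.cnf agrees with the datadir used above and with the parameters passed to the mysql resource agent further down. A minimal sketch (the exact settings are an assumption, not taken from the original build) might look like:

# Copy the same /etc/my.cnf to both nodes
cat > /etc/my.cnf <<'EOF'
[mysqld]
user=mysql
datadir=/u01/app/mysql/mysqldata
socket=/var/lib/mysql/mysql.sock
pid-file=/var/lib/mysql/mysql.pid
EOF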
- Configure some colocation and startup rules:
pcs constraint colocation add datamaster with ClusterVIP INFINITY
pcs constraint order ClusterVIP then datamaster
pcs constraint colocation add DATAFILES with datamaster INFINITY
pcs constraint order promote datamaster then start DATAFILES
pcs resource defaults migration-threshold=1
pcs resource group add mysql ClusterVIP DATAFILES
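To review what was just defined, the constraints and resource configuration can be dumped with:

pcs constraint show --full     # colocation and ordering rules, with their IDs
pcs resource show --full       # full resource definitions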
- Create the resources to monitor MySQL:
pcs resource create mysql_service ocf:heartbeat:mysql binary="/usr/bin/mysqld_safe" \
    config="/etc/my.cnf" datadir="/u01/app/mysql/mysqldata" pid="/var/lib/mysql/mysql.pid" \
    socket="/var/lib/mysql/mysql.sock" user="mysql" additional_parameters="--bind-address=0.0.0.0" \
    op start timeout=60s op stop timeout=60s op monitor interval=20s timeout=30s
pcs constraint colocation add mysql_service with DATAFILES INFINITY
pcs constraint order start DATAFILES then start mysql_service
At this point, the cluster is created and services are running.
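Before trusting the setup, it is worth forcing a controlled failover and then connecting through the VIP. A test along these lines should show everything moving to the other node (linclust1-hb is used as an example; substitute whichever node is currently active, and note that 'appuser' is a hypothetical account you would have created):

pcs node standby linclust1-hb      # push resources off the active node
pcs status                         # resources should restart on the other node
cat /proc/drbd                     # the DRBD roles should flip
pcs node unstandby linclust1-hb    # allow the node to host resources again

# From a client, the database should be reachable through the VIP
mysql -h linclust-vip -u appuser -p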
Whenever you fail over, be sure to check the status of both DRBD (cat /proc/drbd) and PCS (pcs status). Because DRBD can be slow to start at boot, I added the following script to run after bootup to make sure DRBD is up and running:
[root@linclust2 startup]# cat postboot.sh
#!/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/root/bin
export PATH
sleep 3m
drbdadm up drbd00
cat /proc/drbd > /tmp/drbd.out
To ensure that the script above runs each time the system starts, I created it as a service. To do this, I created a file called postboot.service in /etc/systemd/system/. The file has the following contents:
[root@linclust2 system]# cat postboot.service
[Unit]
Description=Script to run things after everything else starts
After=network.target

[Service]
Type=simple
ExecStart=/root/startup/postboot.sh
TimeoutStartSec=0

[Install]
WantedBy=default.target
Note the file name after ExecStart=; that is the file that gets executed. To enable and start the service, run these commands:
systemctl enable postboot.service
systemctl start postboot.service
You can also modify the script at this point to include any other commands you need.
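After the next reboot, you can confirm the unit actually ran and that DRBD came up by checking:

systemctl status postboot.service    # should show the script ran after boot
cat /tmp/drbd.out                    # the DRBD state captured by the script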
This is what you should see when you check the status:
[root@linclust1 ~]# pcs status
Cluster name: DRBD_CLUSTER
Stack: corosync
Current DC: linclust2-hb (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Wed Mar 13 09:33:26 2019
Last change: Wed Mar 13 09:13:42 2019 by hacluster via crmd on linclust1-hb

2 nodes configured
5 resources configured

Online: [ linclust1-hb linclust2-hb ]

Full list of resources:

 Master/Slave Set: datamaster [dataraw]
     Masters: [ linclust2-hb ]
     Slaves: [ linclust1-hb ]
 Resource Group: mysql
     ClusterVIP         (ocf::heartbeat:IPaddr2):       Started linclust2-hb
     DATAFILES          (ocf::heartbeat:Filesystem):    Started linclust2-hb
     mysql_service      (ocf::heartbeat:mysql):         Started linclust2-hb

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Reboot the active node and check the status to verify a successful failover:
[root@linclust1 ~]# pcs status
Cluster name: DRBD_CLUSTER
Stack: corosync
Current DC: linclust1-hb (version 1.1.19-8.el7_6.4-c3c624ea3d) - partition with quorum
Last updated: Wed Mar 13 09:55:41 2019
Last change: Wed Mar 13 09:13:42 2019 by hacluster via crmd on linclust1-hb

2 nodes configured
5 resources configured

Online: [ linclust1-hb ]
OFFLINE: [ linclust2-hb ]

Full list of resources:

 Master/Slave Set: datamaster [dataraw]
     Masters: [ linclust1-hb ]
     Stopped: [ linclust2-hb ]
 Resource Group: mysql
     ClusterVIP         (ocf::heartbeat:IPaddr2):       Started linclust1-hb
     DATAFILES          (ocf::heartbeat:Filesystem):    Started linclust1-hb
     mysql_service      (ocf::heartbeat:mysql):         Started linclust1-hb

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
Note that this is not a true High Availability (HA) solution. It takes a few minutes for failover to stabilize on the new node, and start all required processes. If you need true HA, you will need to spend the money required for a commercial solution, such as Oracle RAC.
Please note: this blog contains code examples provided for your reference. All sample code is provided for illustrative purposes only. Use of information appearing in this blog is solely at your own risk. Please read our full disclaimer for details.