Quick start instructions for transition from a 2-node heartbeat cluster to DRCM
RHEL6/CENTOS6 bare metal NFS cluster example

Last modified: 08/14/2016

NOTE: The 2-way file replication at the end of this document is not part of heartbeat.
NOTE: The 2-way file replication transition only applies to clusters using the
NOTE: file_system_sync addon for HA clusters.

On both cluster nodes:

# cd /root
# crontab -l > crontab
# crontab -r
# cd /var/spool/cron
# ls -rtal

Back up and remove all user crons as required by your server setup.

On both cluster nodes:

# service heartbeat stop
# chkconfig heartbeat off

Check for the deprecated ATRPMS repo and disable it if present:

# ls -rtal /etc/yum.repos.d/atrpms.repo
# emacs /etc/yum.repos.d/atrpms.repo

enabled=0

Remove heartbeat:

# yum remove heartbeat-devel heartbeat-libs heartbeat pacemaker
# mv /etc/ha.d /etc/ha.d_old_archive_only

From a user account on both nodes:

$ cd $HOME
$ mkdir -p git
$ cd git
$ git clone https://github.com/datareel/datareel
$ cd $HOME/git/datareel/cluster_manager/drcm_server
$ make
$ sudo su root -c 'make install_root'

As root, on the primary node:

# /usr/sbin/drcm_server --check-config
# emacs /etc/drcm/cm.cfg /etc/ha.d_old_archive_only/ha.cf

[CM_NODE]
nodename = nfs1
hostname = nfs1.example.com
keep_alive_ip = 192.168.2.111

from ha.cf:

node nfs1.example.com
node nfs2.example.com
ucast p2p1 192.168.2.112

from ha.cf on the backup node:

ucast p2p1 192.168.2.111

NOTE: You must get the ucast value from the ha.cf on both nodes. In heartbeat the order is
NOTE: reversed: each node's ha.cf lists the peer's heartbeat address, while each cm.cfg
NOTE: [CM_NODE] section uses the node's own keep alive IP.

# emacs /etc/drcm/cm.cfg /etc/ha.d_old_archive_only/haresources

[CM_IPADDRS]
web, 192.168.2.15, 26, em1, /etc/drcm/resources/ipv4addr.sh
db, 192.168.2.17, 24, em2, /etc/drcm/resources/ipv4addr.sh

from haresources:

nfs1.example.com IPaddr2::192.168.2.15/26/em1 IPaddr2::192.168.2.17/24/em2 Crontab::system_crons Crontab::user_crons mysqld::start

# cp /etc/ha.d_old_archive_only/cron.d/system_crons /etc/drcm/crontabs/nfs_system_crons
# cp /etc/ha.d_old_archive_only/cron.d/user_crons /etc/drcm/crontabs/nfs_user_crons

Review all crontabs:

# emacs /etc/drcm/crontabs/nfs_system_crons
# emacs /etc/drcm/crontabs/nfs_user_crons
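NOTE: These crontab files are installed under /etc/cron.d (see the [CM_CRONTABS] section
NOTE: below), so each entry needs the cron.d format with a user field between the time
NOTE: fields and the command. The entry below is only an illustration; the script path is
NOTE: hypothetical and your files should contain the jobs copied from the old cluster.

*/15 * * * * root /usr/local/scripts/check_nfs_exports.sh > /dev/null 2>&1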
# emacs /etc/drcm/cm.cfg

[CM_CRONTABS]
system, /etc/drcm/crontabs/nfs_system_crons, /etc/cron.d, /etc/drcm/resources/crontab.sh
user, /etc/drcm/crontabs/nfs_user_crons, /etc/cron.d, /etc/drcm/resources/crontab.sh

[CM_SERVICES]
mysql, mysqld, /etc/drcm/resources/service.sh

[CM_NODE]
nodename = nfs1
...
node_crontabs = user,system
node_backup_crontabs =
node_floating_ip_addrs = web, db
node_services = mysql
node_backup_services =
node_applications =
node_filesystems =
node_backup_filesystems =

[CM_NODE]
nodename = nfs2
...
node_crontabs =
node_backup_crontabs = nfs1:user, nfs1:system
node_floating_ip_addrs =
node_backup_floating_ip_addrs = nfs1:web, nfs1:db
node_services =
node_backup_services = nfs1:mysql
node_applications =
node_backup_applications =
node_filesystems =
node_backup_filesystems =

On the primary node:

# dd if=/dev/urandom bs=1024 count=2 > /etc/drcm/.auth/authkey
# sha1sum /etc/drcm/.auth/authkey | cut -b1-40 > /etc/drcm/.auth/authkey.sha1
# chmod 600 /etc/drcm/.auth/authkey /etc/drcm/.auth/authkey.sha1

# /usr/sbin/drcm_server --check-config
# ifconfig

Check for floating IP addresses.

# service drcm_server start; service drcm_server status
# /usr/sbin/drcm_server --client --command=cm_stat

Depending on your cm.cfg settings the cluster resources will activate within 1 to 5 minutes:

# /usr/sbin/drcm_server --client --command=cm_stat

Node status: Cluster has 1 node DOWN
Resource status: All cluster resources in NORMAL state
Cluster status: DOWN nodes

# ifconfig
# chkconfig drcm_server on

Mirror the /etc/drcm directory to the backup node:

# rsync -avc -n /etc/drcm nfs2:/etc/.

Only after testing with the command above:

# rsync -avc /etc/drcm nfs2:/etc/.

On the backup node:

# /usr/sbin/drcm_server --check-config
# chkconfig drcm_server on
# service drcm_server start; service drcm_server status
# /usr/sbin/drcm_server --client --command=cm_stat

Node status: All cluster nodes are UP
Resource status: All cluster resources in NORMAL state
Cluster status: HEALTHY

SETUP FOR 2-WAY FILE REPLICATION:
---------------------------------

On the primary node, set up file replication if needed:

# emacs /etc/drcm/my_cluster_conf/my_cluster_info.sh /etc/ha.d_old_archive_only/my_cluster_info.cfg

export CLUSTER_NAME="nfs"
export BACKUP_STORAGE_SERVER="192.168.2.1"
export BACKUP_STORAGE_LOCATION="/backups"
export BACKUP_USER="root"
export PRIMARY_FILESYSTEMS_IP="192.168.2.111"
export PRIMARY_FILESYSTEMS_ETH="p2p1"
export BACKUP_FILESYSTEMS_IP="192.168.2.112"
export BACKUP_FILESYSTEMS_ETH="p2p1"
export PRIMARY_FILESYSTEMS_RSYNC_IP="192.168.3.111"
export BACKUP_FILESYSTEMS_RSYNC_IP="192.168.3.112"

from /etc/ha.d_old_archive_only/my_cluster_info.cfg:

CLUSTERNAME="nfs"
NODE_INTERFACE="em1:1"
HB1_INTERFACE="p2p1"
HB2_INTERFACE="p2p2"

NOTE: PRIMARY_FILESYSTEMS_IP = keep alive IP of the primary node
NOTE: BACKUP_FILESYSTEMS_IP = keep alive IP of the backup node
NOTE: PRIMARY_FILESYSTEMS_RSYNC_IP = ifcfg p2p2
NOTE: BACKUP_FILESYSTEMS_RSYNC_IP = ssh nfs2 ifcfg p2p2
NOTE: BACKUP_STORAGE_SERVER = server used to back up all nodes
NOTE: BACKUP_STORAGE_LOCATION = the directory on the storage server where you keep your backups

Copy over the FS sync configurations:

# cp /etc/ha.d_old_archive_only/file_system_sync/*.cfg /etc/drcm/my_cluster_conf/.
# cd /etc/drcm/my_cluster_conf
# chmod 644 *.cfg
# files=$(find -name "*.cfg" -print)
# for f in ${files}; do newname=$(echo $f | sed s/.cfg/.sh/g); mv ${f} ${newname}; done
# mv samba_sync_list.sh cifs_sync_list.sh

# /etc/drcm/file_system_sync/backup_to_shared_storage.sh
# /etc/drcm/file_system_sync/sync_testfs.sh

Mirror the /etc/drcm directory to the backup node:

# rsync -avc -n /etc/drcm nfs2:/etc/.

Only after testing with the command above:

# rsync -avc /etc/drcm nfs2:/etc/.
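NOTE: The mirror copies the renamed sync lists and the shared key in /etc/drcm/.auth along
NOTE: with the rest of the configuration. A quick sanity check that both nodes now hold
NOTE: the same files, assuming root ssh access from the primary to nfs2 (the same access
NOTE: the rsync commands above rely on):

# ssh nfs2 ls -l /etc/drcm/my_cluster_conf
# sha1sum /etc/drcm/.auth/authkey; ssh nfs2 sha1sum /etc/drcm/.auth/authkey

The two sha1sum values should match.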
On the primary node, add the following to root's crontab:

# crontab -e -u root

*/5 * * * * /etc/drcm/file_system_sync/sync_mysql.sh > /dev/null 2>&1
*/5 * * * * /etc/drcm/file_system_sync/sync_pgsql.sh > /dev/null 2>&1
*/5 * * * * /etc/drcm/file_system_sync/sync_testfs.sh > /dev/null 2>&1
*/6 * * * * /etc/drcm/file_system_sync/sync_cifs.sh > /dev/null 2>&1
*/7 * * * * /etc/drcm/file_system_sync/sync_nfs.sh > /dev/null 2>&1
*/8 * * * * /etc/drcm/file_system_sync/sync_www.sh > /dev/null 2>&1
*/11 * * * * /etc/drcm/file_system_sync/sync_other.sh > /dev/null 2>&1

On the backup node, add the following to root's crontab:

# crontab -e -u root

* * * * * /etc/drcm/file_system_sync/sync_mysql.sh > /dev/null 2>&1
* * * * * /etc/drcm/file_system_sync/sync_pgsql.sh > /dev/null 2>&1
* * * * * /etc/drcm/file_system_sync/sync_testfs.sh > /dev/null 2>&1
* * * * * /etc/drcm/file_system_sync/sync_cifs.sh > /dev/null 2>&1
* * * * * /etc/drcm/file_system_sync/sync_nfs.sh > /dev/null 2>&1
* * * * * /etc/drcm/file_system_sync/sync_www.sh > /dev/null 2>&1
* * * * * /etc/drcm/file_system_sync/sync_other.sh > /dev/null 2>&1

To test the FS failover and failback:

On the primary node:

# /etc/drcm/utils/manual_fs_failover.sh
# /usr/sbin/drcm_server --client --command=cm_stat

To fail back:

# service drcm_server restart
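NOTE: After the restart the resources should migrate back within the polling interval set
NOTE: in cm.cfg. One simple way to watch the transition from either node, assuming the
NOTE: standard watch utility is installed, is to poll the same status client used above:

# watch -n 10 '/usr/sbin/drcm_server --client --command=cm_stat'
# ifconfig

Once the failback completes, the ifconfig output on the primary node should again list the floating IP addresses.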