Friday, 13 December 2013

Migrate sun4u to sun4v using FLAR

FLAR stands for FLash ARchive. If we need to install several Solaris servers and customize them in a similar way, we do not have to install and configure them one by one.
Instead, we can install and configure one server first, create a FLAR on this server, and install the other servers from this FLAR.
FLAR can also be used for server migration.

The steps for creating a FLAR and installing from it are very straightforward. However, if we are migrating from servers with an older CPU to servers with a different CPU architecture, we have to do some extra steps.
Recently I migrated a few servers from sun4u to sun4v using FLAR. Here are my migration steps:
Create the FLAR image on the sun4u server
  1. By default, a FLAR created on sun4u cannot be used on sun4v servers; we have to add sun4v as a supported architecture for the FLAR.
    # echo "PLATFORM_GROUP=sun4v" >> /var/sadm/system/admin/.platform

    # flarcreate -n "migration flar" -c -S -x /flar -x /export/home /flar/migration.flar

    Alternatively we can add -U flag to create FLAR with sun4v support
    # flarcreate -n "migration flar" -U "content_architectures=sun4u,sun4v" -c -S -x /flar -x /export/home /flar/migration.flar

  2. Verify our FLAR can be used on sun4v machines:
    # flar -i /flar/migration.flar | grep content_architectures
  3. Now move the FLAR to storage accessible from the sun4v machines; this can be NFS, HTTP, or a local hard drive.
  4. Boot the sun4v machine, choose Flash Install, and select the migration.flar we created in step 1.
  5. After the installation completes, reboot the server. We will get this error:
    cannot open boot_archive
  6. To boot the server properly, we need to upgrade the sun4v machine:
    boot the sun4v machine from DVD or Jumpstart, select Upgrade, and reboot the server after the upgrade finishes.
We have successfully migrated sun4u to sun4v machine!
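
The check in step 2 can also be scripted. The following is only a sketch: the helper name flar_supports is made up for illustration, and the sample text stands in for real "flar -i" output, assuming the archive lists its architectures on a single content_architectures= line as the grep in step 2 implies.

```shell
#!/bin/bash
# Hypothetical helper: succeeds if the "flar -i" output read from stdin
# lists the given architecture in its content_architectures line.
flar_supports() {
    local arch=$1
    grep '^content_architectures=' | cut -d= -f2 | tr ',' '\n' | grep -qx "$arch"
}

# Made-up sample standing in for the output of: flar -i /flar/migration.flar
sample_info='content_name=migration flar
content_architectures=sun4u,sun4v'

if printf '%s\n' "$sample_info" | flar_supports sun4v; then
    echo "archive can be installed on sun4v"
else
    echo "archive does NOT list sun4v"
fi
```

On a real system the sample would be replaced by piping flar -i /flar/migration.flar into the function.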

Reference:
http://docs.oracle.com/cd/E19253-01/821-0436/samekernel/index.html

Tuesday, 26 November 2013

Using tmpfs to improve Nagios performance

Nagios is an excellent monitoring tool. We can monitor servers and network devices using Nagios.
Besides the many useful plugins at Nagios Exchange (http://exchange.nagios.org), we can also write our own plugins using shell scripts.

We can set up a Nagios monitoring server by following Setting up Nagios monitoring server. The default settings are sufficient if we are only monitoring a few servers. However, as the number of monitored hosts and services increases, we will notice growing check latencies.
This is because Nagios continuously updates some files on disk. The more items there are to monitor, the more disk I/O is required; eventually I/O becomes the bottleneck and slows down the Nagios checks.

To solve this problem, we need to improve I/O performance or reduce I/O requests. We could install Nagios on an SSD, but that is not cost effective.

In an earlier post, Using tmpfs to improve PostgreSQL performance, we pointed stats_temp_directory to tmpfs to boost the performance of PostgreSQL.
Similarly, if some files are only needed while Nagios is running, we can move them to tmpfs and thus reduce I/O requests.
In Nagios there are a few key files that affect disk I/O:
1. /usr/local/nagios/var/status.dat: this status file stores the current status of all monitored services and hosts. It is constantly updated at the interval defined by status_update_interval; in my default Nagios installation, status_file is updated every 10 seconds.
The contents of the status file are deleted every time Nagios restarts, so it is only useful while Nagios is running.
[root@centos /usr/local/nagios/etc]# grep '^status' nagios.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10

2. /usr/local/nagios/var/objects.cache: this file is a cached copy of the object definitions, and the CGIs read this file to get the object definitions.
The file is recreated every time Nagios starts, so objects.cache doesn't need to be on non-volatile storage.
[root@centos /usr/local/nagios/etc]# grep objects.cache nagios.cfg
object_cache_file=/usr/local/nagios/var/objects.cache

3. /usr/local/nagios/var/spool/checkresults: all the incoming check results are stored here. While Nagios is running, we will notice that files are constantly being created and deleted, so checkresults can also be moved to tmpfs.
[root@centos /usr/local/nagios/etc]# grep checkresults nagios.cfg
check_result_path=/usr/local/nagios/var/spool/checkresults
[root@centos /usr/local/nagios/etc]#

[root@centos /usr/local/nagios/var/spool/checkresults]# ls
checkP2D5bM  cn6i6Ld  cn6i6Ld.ok
[root@centos /usr/local/nagios/var/spool/checkresults]# head -4 cn6i6Ld
### Active Check Result File ###
file_time=1385437541

### Nagios Service Check Result ###
[root@centos /usr/local/nagios/var/spool/checkresults]#

So we can move status.dat, objects.cache and checkresults to tmpfs, but before that we need to mount the file system:
[root@centos ~]# mkdir -p /mnt/nagvar
[root@centos ~]# mount -t tmpfs tmpfs /mnt/nagvar -o size=50m
[root@centos ~]# df -h /mnt/nagvar
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                  50M     0   50M   0% /mnt/nagvar
[root@centos ~]# mount | grep nagvar
tmpfs on /mnt/nagvar type tmpfs (rw,size=50m)

create directory for checkresults
[root@centos ~]# mkdir -p /mnt/nagvar/spool/checkresults
[root@centos ~]# chown -R nagios:nagios /mnt/nagvar

modify nagios.cfg
status_file=/mnt/nagvar/status.dat
object_cache_file=/mnt/nagvar/objects.cache
check_result_path=/mnt/nagvar/spool/checkresults
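
The three settings can also be changed with sed instead of by hand. This is a minimal sketch that works on a scratch copy so it is safe to try; on a real server the file would be /usr/local/nagios/etc/nagios.cfg, and the variable names cfg and ramdir are only illustrative.

```shell
#!/bin/bash
# Scratch file standing in for /usr/local/nagios/etc/nagios.cfg
cfg=$(mktemp)
cat > "$cfg" <<'EOF'
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10
object_cache_file=/usr/local/nagios/var/objects.cache
check_result_path=/usr/local/nagios/var/spool/checkresults
EOF

ramdir=/mnt/nagvar   # tmpfs mount point used in this post

# Rewrite only the three path settings; every other line is left untouched.
sed -i \
    -e "s|^status_file=.*|status_file=$ramdir/status.dat|" \
    -e "s|^object_cache_file=.*|object_cache_file=$ramdir/objects.cache|" \
    -e "s|^check_result_path=.*|check_result_path=$ramdir/spool/checkresults|" \
    "$cfg"

# Show the result
grep -E '^(status_file|object_cache_file|check_result_path)=' "$cfg"
```

It is worth keeping a backup of the real nagios.cfg before running sed -i against it.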

restart nagios so our changes will take effect
[root@centos ~]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.

we can see, nagios is using /mnt/nagvar
[root@centos ~]# tree /mnt/nagvar/
/mnt/nagvar/
├── objects.cache
├── spool
│   └── checkresults
│       ├── ca8JfZI
│       └── ca8JfZI.ok
└── status.dat

2 directories, 4 files

We can configure /etc/fstab to mount /mnt/nagvar every time the system reboots.
[root@centos ~]# cat <<EOF >> /etc/fstab
tmpfs      /mnt/nagvar    tmpfs   defaults,size=50m    0 0
EOF
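
Because >> appends blindly, running the command twice leaves a duplicate entry in /etc/fstab. A small guard avoids that. The sketch below uses a scratch file in place of /etc/fstab, and the function name add_entry is made up for illustration.

```shell
#!/bin/bash
# Scratch file standing in for /etc/fstab
fstab=$(mktemp)
entry='tmpfs      /mnt/nagvar    tmpfs   defaults,size=50m    0 0'

# Append the entry only if the exact line is not already present.
add_entry() {
    grep -qxF "$entry" "$fstab" || echo "$entry" >> "$fstab"
}

add_entry
add_entry   # second call is a no-op

count=$(grep -c '/mnt/nagvar' "$fstab")
echo "entries for /mnt/nagvar: $count"
```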

But the directory /mnt/nagvar/spool/checkresults will be gone after /mnt/nagvar is re-mounted, so we need to create this directory before starting up Nagios.
We can update /etc/init.d/nagios and add these lines after the first line:
mkdir -p /mnt/nagvar/spool/checkresults
chown -R nagios:nagios /mnt/nagvar

[root@centos ~]# sed -i '1a\
mkdir -p /mnt/nagvar/spool/checkresults\
chown -R nagios:nagios /mnt/nagvar' /etc/init.d/nagios

Since these files now live on tmpfs, updating them no longer causes disk I/O, and Nagios check latency should improve noticeably.

Reference:
http://assets.nagios.com/downloads/nagiosxi/docs/Utilizing_A_RAM_Disk_In_NagiosXI.pdf

Thursday, 21 November 2013

setup nginx web server with PHP

Nginx (engine x) is a high performance, lightweight HTTP server. More and more sites are using nginx: according to the Netcraft survey (http://news.netcraft.com/archives/2013/11/01/november-2013-web-server-survey.html), nginx powers 15% of the busiest sites in November 2013.

Nginx installation is very straightforward. We can download the latest source code from http://nginx.org/en/download.html, or point our yum source to http://nginx.org/packages/OS/OSRELEASE/$basearch/ and install using yum.
Replace “OS” with “rhel” or “centos”, depending on the distribution used, and “OSRELEASE” with “5” or “6”, for 5.x or 6.x versions, respectively.
So for CentOS 6.3, we can point our YUM source to: http://nginx.org/packages/centos/6/$basearch/
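
For illustration, a repository file for that URL could look like the fragment below. The file name /etc/yum.repos.d/nginx.repo and the gpgcheck=0 setting are assumptions on my part, not something this post verified.

```ini
# /etc/yum.repos.d/nginx.repo (assumed file name)
[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/6/$basearch/
gpgcheck=0
enabled=1
```

With the file in place, yum install nginx should pick up the package from this repository.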

Tuesday, 5 November 2013

How to recover deleted open files

In Linux, a file is completely deleted when:
  1. No more hard links reference the file
  2. All processes that have the file open have closed it or terminated
From why du and df show different filesystem usage (http://linuxscripter.blogspot.com/2013/11/why-du-and-df-show-different-filesystem.html), we know that if we delete an open file, Linux won't release its space until the opening processes are stopped.

So if we delete an open file by mistake, is there a way to recover it?
Yes, we can check which process has the file open, and recover the file content through the process's file descriptor.

Again, let's assume our Apache error_log is deleted. We can check which process has this file open:
[root@centos ~]# lsof | sed -n '1p;/error_log.*deleted/p'
COMMAND    PID      USER   FD      TYPE     DEVICE SIZE/OFF       NODE NAME
httpd     3155      root    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3157    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3158    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3159    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3160    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3161    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3162    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3163    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)
httpd     3164    apache    2w      REG      253,0      370      15396 /var/log/httpd/error_log (deleted)

From the output we know that process 3155 still has this file open. 2w means error_log is open for writing on file descriptor 2, so we check the symlink "2" in /proc/3155/fd to confirm this.
[root@centos ~]# cd /proc/3155/fd/
[root@centos fd]# ls -l 2
l-wx------ 1 root root 64 Nov  5 09:46 2 -> /var/log/httpd/error_log (deleted)
[root@centos fd]# tail 2
[Tue Nov 05 09:46:35 2013] [notice] suEXEC mechanism enabled (wrapper: /usr/sbin/suexec)
[Tue Nov 05 09:46:35 2013] [notice] Digest: generating secret for digest authentication ...
[Tue Nov 05 09:46:35 2013] [notice] Digest: done
[Tue Nov 05 09:46:35 2013] [notice] Apache/2.2.15 (Unix) DAV/2 PHP/5.3.3 mod_wsgi/3.2 Python/2.6.6 configured -- resuming normal operations

To recover the content of error_log, we can copy 2 to a temporary location, stop Apache, and move the copy back to /var/log/httpd/error_log.

[root@centos fd]# cp 2 /tmp/error_log
[root@centos ~]# service httpd stop
Stopping httpd:                                            [  OK  ]
[root@centos ~]# mv /tmp/error_log /var/log/httpd/error_log

[root@centos ~]# service httpd start
Starting httpd:                                            [  OK  ]
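
The same technique can be tried safely without touching Apache. The sketch below is self-contained: the temporary paths and the choice of file descriptor 3 are illustrative, and the shell itself plays the role of the process that keeps the file open.

```shell
#!/bin/bash
# Create a throwaway file to "lose"
workdir=$(mktemp -d)
echo "important log line" > "$workdir/demo.log"

# Hold the file open on descriptor 3 of this shell, then delete it.
exec 3< "$workdir/demo.log"
rm "$workdir/demo.log"

# PID of the shell holding the file (BASHPID is bash-specific; fall back to $$).
holder=${BASHPID:-$$}

# The fd entry still points at the unlinked inode and is marked "(deleted)".
ls -l "/proc/$holder/fd/3"

# Copy the content back out through /proc, like "cp 2 /tmp/error_log" above.
cp "/proc/$holder/fd/3" "$workdir/recovered.log"

# Close the descriptor; only now is the original inode released.
exec 3<&-

recovered=$(cat "$workdir/recovered.log")
echo "recovered: $recovered"
```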

Monday, 4 November 2013

why du and df show different filesystem usage

Today I saw a forum post asking why, on the poster's system, df shows over 200G of space used while du only shows 50G used.
Most probably this is caused by open files being deleted.
When a file is held open by a process, deleting the file won't release the space it occupies. We also need to terminate the process (or have it close the file); otherwise df and du will show different filesystem usage.

Let's assume du and df show huge difference for /var, we can check which open files are deleted and the processes opening them.

[root@linux ~]# lsof | sed -n '1p;/var.*deleted/p'
COMMAND    PID     USER   FD      TYPE     DEVICE SIZE/OFF       NODE NAME
httpd     1779     root   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1810   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1811   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1812   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1813   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1814   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1815   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1816   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)
httpd     1817   apache   10w      REG      253,0    22034       8117 /var/log/httpd/access_log (deleted)

From the output we can see that access_log was deleted but Apache was not restarted, so the httpd processes still have this file open.
To release the space, we can restart httpd:
[root@linux ~]# service httpd restart
Stopping httpd:                                            [  OK  ]
Starting httpd:                                            [  OK  ]

Now check again
[root@linux ~]# lsof | sed -n '1p;/var.*deleted/p'
COMMAND    PID     USER   FD      TYPE     DEVICE SIZE/OFF       NODE NAME

Now du and df report the same file system usage.
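
If lsof is not installed, the same information is available under /proc. The sketch below is a self-contained demo in which the shell itself holds a deleted 1 MiB file open; the function name deleted_fds and descriptor 4 are illustrative. The last part shows an alternative to restarting the process that this post did not use: truncating the file through /proc, which frees the space but discards the content.

```shell
#!/bin/bash
workdir=$(mktemp -d)
head -c 1048576 /dev/zero > "$workdir/big.log"   # 1 MiB throwaway file

# Hold the file open on descriptor 4 of this shell, then delete it.
exec 4< "$workdir/big.log"
rm "$workdir/big.log"
holder=${BASHPID:-$$}

# Print the descriptors of one process whose target has been unlinked.
deleted_fds() {
    local pid=$1 fd target
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd") || continue
        case $target in
            *' (deleted)') echo "$fd -> $target" ;;
        esac
    done
}

found=$(deleted_fds "$holder")
echo "$found"

# The space is still allocated while the descriptor stays open.
size_before=$(stat -L -c %s "/proc/$holder/fd/4")

# Truncate through /proc to free the space without stopping the process.
: > "/proc/$holder/fd/4"
size_after=$(stat -L -c %s "/proc/$holder/fd/4")
echo "size before: $size_before, after truncation: $size_after"

exec 4<&-
```

Scanning every process this way (/proc/[0-9]*) needs root, just like a full lsof listing.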

Until we restart httpd, Linux won't release the space used by access_log. If access_log was deleted by mistake, is there a way to recover it?
Yes, I will demo how to recover deleted open files in http://linuxscripter.blogspot.com/2013/11/how-to-recover-deleted-open-files.html

Saturday, 2 November 2013

Use puppet to manage linux servers

Puppet is a configuration management system; using puppet we can easily manage thousands of Linux servers. If we have configured our system to use the EPEL repository, we can install puppet directly using yum. Alternatively, we can download the software from puppetlabs.org and follow the documentation to install it.

To install manually, our system must have ruby installed. The ruby rpm files can be found on the linux installation media; if we have a local yum repository, we can install ruby using yum.

After ruby is installed, we can download and install puppet. Facter is also required by puppet. The stable versions I downloaded are facter-1.7.2.tar.gz and puppet-3.2.2.tar.gz.


1. Install puppet on both puppet master and agent
# tar -zxpf facter-1.7.2.tar.gz
# cd facter-1.7.2
# ruby install.rb
# cd ..
# tar -zxpf puppet-3.2.2.tar.gz
# cd puppet-3.2.2
# ruby install.rb

2. start puppet master
# puppet master

3. on agent, edit /etc/puppet/puppet.conf
[main]
server = centos.local.vb
certificate_revocation = false
ssldir=/var/lib/puppet/ssl

4. connect puppet master for the first time, this will generate an ssl signing request
# puppet agent --no-daemonize --onetime --verbose
Info: Creating a new SSL certificate request for centos-1.local.vb
Info: Certificate Request fingerprint (SHA256): B8:67:94:4C:2A:23:2F:90:D8:4E:34:CC:AF:48:B0:04:BA:82:7F:D2:E3:7F:B7:9A:78:35:18:87:EB:05:D5:61
Exiting; no certificate found and waitforcert is disabled


5. On puppet master, sign the ssl request from puppet agent
[root@centos ~]# puppet cert list
"centos-1.local.vb" (SHA256) B8:67:94:4C:2A:23:2F:90:D8:4E:34:CC:AF:48:B0:04:BA:82:7F:D2:E3:7F:B7:9A:78:35:18:87:EB:05:D5:61
[root@centos ~]# puppet cert sign "centos-1.local.vb"
Notice: Signed certificate request for centos-1.local.vb
Notice: Removing file Puppet::SSL::CertificateRequest centos-1.local.vb at '/var/lib/puppet/ssl/ca/requests/centos-1.local.vb.pem'


6. Now we can manage our linux servers from puppet master. If we want to manage httpd service, we can create an httpd module

# mkdir -p /etc/puppet/modules/httpd/{manifests,files}

Every module stores its configuration in manifests/init.pp file, so we need to create /etc/puppet/modules/httpd/manifests/init.pp


class httpd {
package { "httpd":
ensure => installed,
}

service { "httpd":
ensure => running,
enable => true,
}

file { "/var/www/html/index.html":
ensure => present,
group => "root",
owner => "root",
mode => "0644",
source => "puppet:///modules/httpd/puppet.index.html"
}
}

source => "puppet:///modules/httpd/puppet.index.html" is telling puppet agent that it needs to get index.html from puppet master, the file location on master is: /etc/puppet/modules/httpd/files/puppet.index.html

# echo i am from puppet index.html ! > /etc/puppet/modules/httpd/files/puppet.index.html
Now we have an httpd module. To manage the agent, we also need to define our nodes; we can do this in /etc/puppet/manifests/site.pp
node centos-1 {
include httpd
}

7. test our configuration on centos-1:
[root@centos-1 html]# puppet agent --no-daemonize --onetime --verbose
Info: Retrieving plugin
Info: Caching catalog for centos-1.local.vb
Info: Applying configuration version '1383379216'
Notice: /Stage[main]/Httpd/Service[httpd]/ensure: ensure changed 'stopped' to 'running'
Notice: /Stage[main]/Httpd/File[/var/www/html/index.html]/ensure: defined content as '{md5}33f97919a4e508801272b7889f34e332'
Notice: Finished catalog run in 0.70 seconds
Puppet supports regular expressions in its configuration files. If all the servers centos-1, centos-2 ... centos-999 have the same configuration, instead of repeating the node definition 999 times, we can represent them using one regular expression.
node /^centos-\d+$/ { include httpd }
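
Note that \d matches exactly one digit, so a pattern meant to cover centos-1 through centos-999 needs \d+. A quick way to see the difference is to try both forms with grep. This is only a sketch: grep -E has no \d, so [0-9] stands in for it, and the host names are made-up samples.

```shell
#!/bin/bash
# Made-up host names to test the patterns against
hosts='centos-1
centos-42
centos-999
centos-1a
xcentos-7'

# One digit only: equivalent to /^centos-\d$/
single=$(printf '%s\n' "$hosts" | grep -E -c '^centos-[0-9]$')

# One or more digits: equivalent to /^centos-\d+$/
multi=$(printf '%s\n' "$hosts" | grep -E -c '^centos-[0-9]+$')

echo "single-digit pattern matches: $single"
echo "one-or-more-digits pattern matches: $multi"
```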
Puppet also supports import in its configuration files. If all our agents have different configurations, instead of constructing one big site.pp, we can have one configuration file for each agent (centos-1.pp, centos-2.pp ... centos-999.pp) and then import them from site.pp
import "nodes/*"
As the environment grows, we will have more and more configuration files in the nodes directory, and managing many files is not very efficient. Puppet has a feature called External Node Classifier (ENC); using an ENC we can replace text-based node definition files with LDAP, a database, or whatever data source suits our environment.

Friday, 11 October 2013

use HAProxy as HTTP load balancer

HAProxy can provide high availability, load balancing, and proxying for TCP and HTTP based applications.

In my lab setup of HAProxy, I have 3 servers

centos-1, running apache
centos-2, running apache
centos, running HAProxy, HTTP requests to it will be forwarded to the other 2 servers

Here are my steps for setting up HAProxy to load balance between two HTTP servers.

1. on centos-1 and centos-2: install apache
# yum -y install httpd
create index.html for apache
# hostname -s > /var/www/html/index.html
disable iptables, start apache
# service iptables stop
# service httpd start

2. on centos, download HAProxy, the latest stable version is 1.4.24
# ./haproxy -v
HA-Proxy version 1.4.24 2013/06/17
Copyright 2000-2013 Willy Tarreau <w@1wt.eu>

3. edit the haproxy configuration file. Below is my sample haproxy.cfg; note this is for testing only. For the complete configuration options, refer to http://cbonte.github.io/haproxy-dconv/configuration-1.4.html
global
        daemon
        maxconn 256

defaults
        mode http
        timeout connect 5000ms
        timeout client 50000ms
        timeout server 50000ms

frontend http-in
        bind *:80
        default_backend servers

backend servers
        balance roundrobin
        server server1 centos-1:80 maxconn 32
        server server2 centos-2:80 maxconn 32

4. start HAProxy
# ./haproxy -f haproxy.cfg

5. launch a web browser and visit http://192.168.100.100. You will see the page displaying "centos-1", which shows our request is being served by centos-1.
Refresh the page and it will display "centos-2": our request is served by centos-2 this time.

If we change balance roundrobin to balance source, requests from the same client address will always be served by the same host, but using sticky sessions is still the preferred way to make sure clients always connect to the same backend server.

The setup is done, but it needs more tweaks before it can be used in a production environment. Things to check:
1. check backend server aliveness.
2. sticky sessions.
3. we need to change the apache LogFormat on centos-1 and centos-2 to reflect the real HTTP client address.
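
As a starting point for these three items, the backend section could be extended as in the fragment below. It is an untested sketch, not the configuration used in this lab: the health check URI, the cookie name SERVERID and the cookie values s1 and s2 are assumptions chosen for illustration.

```
backend servers
        balance roundrobin
        # 1. mark a server down when GET /index.html stops answering
        option httpchk GET /index.html
        # 3. pass the real client address to apache in X-Forwarded-For
        option forwardfor
        # 2. pin each client to one server with an inserted cookie
        cookie SERVERID insert indirect nocache
        server server1 centos-1:80 maxconn 32 check cookie s1
        server server2 centos-2:80 maxconn 32 check cookie s2
```

On the apache side, the LogFormat would then log %{X-Forwarded-For}i in place of %h to record the client address.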

In our setup centos is the only server running HAProxy, so it is a single point of failure. We can set up a cluster for centos and make HAProxy listen on the VIP.


reference:
http://haproxy.1wt.eu/
http://cbonte.github.io/haproxy-dconv/configuration-1.4.html

Tuesday, 24 September 2013

Using tmpfs to improve PostgreSQL performance

In PostgreSQL, the statistics collector collects the following information:
  1. statistics about table and index accesses
  2. usage of user-defined functions
  3. current command executed by any server process.

The statistics collector then passes the information to the backends using temporary files. The location of the temporary files is defined by stats_temp_directory in postgresql.conf; by default it points to $PGDATA/pg_stat_tmp
While PostgreSQL is running, there is continuous I/O in stats_temp_directory, and this disk I/O may affect database performance. The PostgreSQL documentation suggests pointing stats_temp_directory at a RAM-based file system to decrease physical I/O and thus improve database performance.

We can use either ramfs or tmpfs, for the differences of the two, see http://www.thegeekstuff.com/2008/11/overview-of-ramfs-and-tmpfs-on-linux/

I will use tmpfs for stats_temp_directory, here are my steps
1. create directory for our new mount point:
# mkdir /mnt/tmp

2. mount tmpfs
# mount -t tmpfs tmpfs /mnt/tmp/ -o size=200m
now we have a file system of 200M.
# df -h /mnt/tmp/
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                 200M  8.0K  200M   1% /mnt/tmp
Every time the server reboots, the tmpfs mount on /mnt/tmp will be gone. To make the configuration persistent, add this line to /etc/fstab
tmpfs    /mnt/tmp    tmpfs   defaults,size=200m    0 0


3. edit postgresql.conf, add this line:
stats_temp_directory = '/mnt/tmp'

4. restart database
$ pg_ctl -D /var/lib/pgsql/data restart

5. confirm we are using the tmpfs
[postgres@linux ~]$ psql
psql (8.4.11)
Type "help" for help.

postgres=# show stats_temp_directory;
 stats_temp_directory
----------------------
 /mnt/tmp
(1 row)

postgres=# \q
[postgres@linux ~]$ ls -lh /mnt/tmp/
total 8.0K
-rw------- 1 postgres postgres 6.0K Sep 24 13:41 pgstat.stat
[postgres@linux ~]$
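
Before pointing stats_temp_directory at a directory, it can be useful to confirm that the directory really is on tmpfs, for example after a reboot. A minimal sketch, assuming GNU coreutils stat; the function name is_tmpfs is made up for illustration.

```shell
#!/bin/bash
# Return success if the given path lives on a tmpfs file system.
# "stat -f" queries the file system; %T prints its type in readable form.
is_tmpfs() {
    [ "$(stat -f -c %T "$1" 2>/dev/null)" = "tmpfs" ]
}

dir=${1:-/mnt/tmp}   # /mnt/tmp is the mount point used in this post
if is_tmpfs "$dir"; then
    echo "$dir is on tmpfs"
else
    echo "$dir is NOT on tmpfs (or does not exist)"
fi
```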
Similarly, we can use tmpfs to improve the performance of Nagios server.
reference:
http://www.postgresql.org/docs/9.1/static/monitoring-stats.html

Tuesday, 10 September 2013

How to install Zabbix agent and monitor remote linux servers

I installed Zabbix server in my previous post: http://linuxscripter.blogspot.com/2013/09/how-to-install-zabbix-server-on-linux.html
To monitor remote servers, we can install Zabbix agent.
Installing Zabbix agent is quite straight forward

Here are my installation steps:
1. create user account for zabbix
# useradd zabbix
2. download current stable version zabbix-2.0.8.tar.gz from www.zabbix.com
3. configure and install zabbix agent
# tar -zxpf zabbix-2.0.8.tar.gz
# cd zabbix-2.0.8
#  ./configure --enable-agent --prefix=/usr/local/zabbix
# make install
4. modify /usr/local/zabbix/etc/zabbix_agentd.conf
In the previous post, we installed zabbix server on 192.168.100.100, so modify the Server parameter
Server=192.168.100.100
5. restart zabbix_agentd to reload the new configuration file
now we can monitor this host from zabbix server.

To monitor a remote host, all the configuration is done in the Zabbix web interface.
1. open web browser, go to http://zabbix-server/zabbix
2. Configuration -> Hosts -> Create host
Enter the host name, IP address, choose "Linux servers" as the host group.


3. After saving, you will see the host linux-1 is monitored by zabbix now.


 4. At the monitoring dashboard, you will see issues are shown in different colors


Reference: https://www.zabbix.com/documentation/2.0/manual/installation/install#from_the_sources

How to install Zabbix server on Linux server

Zabbix is an open source monitoring server. Besides monitoring service status, it also stores the historical status of services, so we can easily generate graphs from Zabbix. I have been using Nagios for many years; recently I decided to give Zabbix a try.

Here are the steps I followed to install the server.

First we need to download zabbix software, current stable version is 2.0.8, so I downloaded zabbix-2.0.8.tar.gz

1. create OS account for running zabbix server
# useradd zabbix
2. extract the source
# tar -zxpf zabbix-2.0.8.tar.gz
# cd zabbix-2.0.8

3. Zabbix uses databases as its data storage, it supports MySQL, PostgreSQL, Oracle, DB2 and SQLite. In my setup I use MySQL.
Before Installing zabbix, we need to configure the database properly.
Follow instructions on this link to setup the database https://www.zabbix.com/documentation/2.0/manual/appendix/install/db_scripts
For MySQL:
mysql> create database zabbix character set utf8 collate utf8_bin;
mysql> grant all privileges on zabbix.* to 'zabbix'@'localhost' identified by 'zabbix';
mysql> exit;
# mysql -uzabbix -pzabbix zabbix < database/mysql/schema.sql
# mysql -uzabbix -pzabbix zabbix < database/mysql/images.sql
# mysql -uzabbix -pzabbix zabbix < database/mysql/data.sql


4. configure and install zabbix server
# ./configure --enable-server --with-mysql --with-net-snmp \
  --with-libcurl --prefix=/usr/local/zabbix

# make install


5. start zabbix server
# /usr/local/zabbix/sbin/zabbix_server

6. Install zabbix web interface:
# mkdir /var/www/html/zabbix/
# cp -pr frontends/php/* /var/www/html/zabbix/

7. Installing zabbix frontend
Open browser, go to http://localhost/zabbix, follow the instructions for frontend installation wizard to complete the installation.

8. After installing, you will be able to log in to the zabbix system. The default login ID is Admin, and the password is zabbix.

Next we need to install zabbix agent on the remote hosts and configure zabbix server to monitor them.

Reference: https://www.zabbix.com/documentation/2.0/manual/installation/install#from_the_sources

Tuesday, 3 September 2013

Using Subversion to maintain programs

I am using subversion to keep all my shell scripts, it helps to maintain the scripts change history:

To use subversion, we first need to create a repository on the svn server:
# svnadmin create /usr/local/scripts
# cat <<EOF > /usr/local/scripts/conf/svnserve.conf
[general]
anon-access = none
auth-access = write
password-db = passwd
EOF

The remaining steps are all done on the clients.
To add files to the subversion repository, we can run
svn import /usr/local/scripts/check_mem.sh \
svn+ssh://lnxscrpt@linux/usr/local/scripts/check_mem.sh

Adding         /usr/local/scripts/check_mem.sh
Alternatively, we can check out a working copy, make changes in our local copy, then check in whatever changes we made to the SVN repository
1. To check out
$ mkdir ~/work; cd ~/work
$ svn co svn+ssh://lnxscrpt@linux/usr/local/scripts
Checked out revision 0.

We will have ~/work/scripts directory now

After we check out, other people may have made some changes to the repository, it's always a good practice to update our local copy before changing anything.
$ svn up scripts
At revision 0.

2. To add check_load.sh, copy check_load.sh to ~/work/scripts, and run
$  svn add scripts/check_load.sh
A         scripts/check_load.sh

3. Commit our changes:
$ svn ci scripts/
Adding         scripts/check_load.sh
Transmitting file data .
Committed revision 1.

4. Make sure check_load.sh is in SVN repository:
$  svn ls svn+ssh://lnxscrpt@linux/usr/local/scripts
check_load.sh
check_mem.sh

5. check svn log
$ svn log svn+ssh://lnxscrpt@linux/usr/local/scripts/check_load.sh
------------------------------------------------------------------------
r1 | lnxscrpt | 2013-09-03 14:25:11 +0800 (Tue, 03 Sep 2013) | 2 lines

added check_load.sh at 2:25pm

------------------------------------------------------------------------

$ svn log svn+ssh://lnxscrpt@linux/usr/local/scripts/check_mem.sh
------------------------------------------------------------------------
r5 | lnxscrpt | 2013-09-03 14:44:38 +0800 (Tue, 03 Sep 2013) | 2 lines

import check_mem.sh

------------------------------------------------------------------------

Note: whenever we run svn commands, we will be prompted for the password of lnxscrpt@linux. We can set up password-less ssh login by following: http://linuxscripter.blogspot.com/2012/05/set-up-password-less-ssh-login-using.html

Monday, 2 September 2013

Using Nagios and NRPE to monitor remote hosts

We have set up Nagios to monitor our localhost centos, but how do we monitor remote hosts? We can monitor remote hosts using SNMP or NRPE.
In our testing environment, we will setup Nagios to monitor host centos-1 and centos-2 using NRPE.
To use NRPE, we need to install NRPE on both monitoring host and remote hosts:

Steps to setup remote hosts for NRPE monitoring:
1. Download nagios-plugins-1.4.16.tar.gz, nrpe-2.14.tar.gz
2. Create account for NRPE service
# useradd nagios
# passwd nagios

3. Install nagios-plugins
# tar zxpf nagios-plugins-1.4.16.tar.gz
# cd nagios-plugins-1.4.16
# ./configure --with-nagios-user=nagios \
  --with-nagios-group=nagios

# make
# make install
# chown nagios:nagios /usr/local/nagios

4. Install NRPE
# tar zxpf nrpe-2.14.tar.gz
# cd nrpe-2.14
# ./configure
# make all
# make install-plugin
# make install-daemon
# make install-daemon-config

5. Modify /usr/local/nagios/etc/nrpe.cfg to allow host centos(192.168.100.100) to connect to NRPE
allowed_hosts=127.0.0.1,192.168.100.100

6. Start NRPE and enable autostart of NRPE
# /usr/local/nagios/bin/nrpe -c \
  /usr/local/nagios/etc/nrpe.cfg -d

# echo '/usr/local/nagios/bin/nrpe -c \
  /usr/local/nagios/etc/nrpe.cfg -d' >> /etc/rc.local

7. Verify NRPE is working properly
# /usr/local/nagios/libexec/check_nrpe -H localhost
NRPE v2.14

Steps to setup monitoring host for NRPE monitoring
1. Download nrpe-2.14.tar.gz
2. Install NRPE
# tar -zxpf nrpe-2.14.tar.gz
# cd nrpe-2.14
# ./configure
# make all
# make install-plugin

3. Enable monitoring of centos-1
Update /usr/local/nagios/etc/objects/commands.cfg to enable command check_nrpe
# 'check_nrpe' command definition
define command{
        command_name check_nrpe
        command_line $USER1$/check_nrpe -H $HOSTADDRESS$ -t 30 -c $ARG1$
}

update /usr/local/nagios/etc/nagios.cfg to include this line
cfg_file=/usr/local/nagios/etc/objects/centos-1.cfg

4. Edit /usr/local/nagios/etc/objects/centos-1.cfg
define host{
        use                 linux-server
        host_name           centos-1
        alias               centos-1
        address             192.168.100.101
        }


define hostgroup{
        hostgroup_name      centos
        alias               centos
        members             centos-1
        }


define service{
        use                 generic-service
        host_name           centos-1
        service_description PING
        check_command       check_ping!100.0,20%!500.0,60%
        }

define service{
        use                 generic-service
        host_name           centos-1
        service_description Server Load
        check_command       check_nrpe!check_load
        }

define service{
        use                 generic-service
        host_name           centos-1
        service_description /rhiso Usage
        check_command       check_nrpe!check_rhiso
        }

5. Restart Nagios service
# service nagios restart

Steps to enable nrpe commands check_load and check_rhiso on remote host centos-1
Edit /usr/local/nagios/etc/nrpe.cfg, add these 2 lines:
command[check_load]=/usr/local/nagios/libexec/check_load -w 15,10,5 -c 30,25,20
command[check_rhiso]=/usr/local/nagios/libexec/check_disk -w 20% -c 10% /rhiso

restart nrpe daemon
# pkill nrpe
# /usr/local/nagios/bin/nrpe -c \
  /usr/local/nagios/etc/nrpe.cfg -d

Log in to http://centos/nagios again, and you will see that host centos-1 is now monitored. Similarly, we can set up the monitoring of centos-2.
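
To avoid copying and editing the definitions by hand for each additional host, they can be generated from a small template. This is a sketch under assumptions: the function name make_host_cfg is made up, and 192.168.100.102 is only a guess at the address of centos-2. The hostgroup is left out on purpose because it is already defined in centos-1.cfg; a new host would be added to its members line instead.

```shell
#!/bin/bash
# Emit host and service definitions for one NRPE-monitored host.
make_host_cfg() {
    local name=$1 address=$2
    cat <<EOF
define host{
        use                 linux-server
        host_name           $name
        alias               $name
        address             $address
        }

define service{
        use                 generic-service
        host_name           $name
        service_description PING
        check_command       check_ping!100.0,20%!500.0,60%
        }

define service{
        use                 generic-service
        host_name           $name
        service_description Server Load
        check_command       check_nrpe!check_load
        }
EOF
}

# Write to a scratch directory; on the Nagios server the target would be
# /usr/local/nagios/etc/objects/centos-2.cfg
outdir=$(mktemp -d)
make_host_cfg centos-2 192.168.100.102 > "$outdir/centos-2.cfg"
cat "$outdir/centos-2.cfg"
```

The generated file still needs its own cfg_file= line in nagios.cfg, followed by a Nagios restart.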

Setting up Nagios monitoring server

Nagios is open-source monitoring software. It can monitor different host resources such as host up/down status, disk usage, system load, and log files.
Setting up Nagios on Linux is very straightforward.

In our test environment, we will setup host centos as our monitoring host.

Steps to install Nagios on centos:
1. Download nagios core, nagios plugins
http://www.nagios.org/download/core, choose DIY option.
http://www.nagios.org/download/plugins, download latest stable version.

2. Create nagios account:
# useradd nagios
# passwd nagios
# groupadd nagcmd
# usermod -a -G nagcmd nagios
# usermod -a -G nagcmd apache

3. Install Nagios
# tar -zxpf nagios-3.5.0.tar.gz
# cd nagios
# ./configure --with-command-group=nagcmd
# make all
# make install
# make install-init
# make install-config
# make install-commandmode
# make install-webconf

4. Create the nagiosadmin account. This account will be used to log in to the Nagios web interface.
# htpasswd -c /usr/local/nagios/etc/htpasswd.users \
  nagiosadmin


5. Edit /etc/httpd/conf.d/nagios.conf, pointing AuthUserFile to /usr/local/nagios/etc/htpasswd.users

6. Start apache
# service httpd restart

7. Install Nagios Plugins
# tar -zxpf nagios-plugins-1.4.16.tar.gz
# cd nagios-plugins-1.4.16
# ./configure --with-nagios-user=nagios \
  --with-nagios-group=nagios

# make; make install

8. Enable autostart of Nagios and start Nagios service
# chkconfig --add nagios
# chkconfig nagios on

# service nagios start

9. Log in to http://centos/nagios; you will see that centos is already monitored by Nagios.


In the next post http://linuxscripter.blogspot.com/2013/09/using-nagios-and-nrpe-to-monitor-remote.html, I will discuss how to monitor remote hosts using Nagios and NRPE.

Reference: http://nagios.sourceforge.net/docs/3_0/quickstart-fedora.html

Wednesday, 26 June 2013

How to setup MySQL replication

MySQL replication is a technique for replicating one MySQL database (the master) to one or more other databases (the slaves). Using replication, we can:

1) Separate read and write
Writes are only done on the master server, and all read operations are done on the slave servers. We can also run expensive reports on one slave without affecting the performance of the master and the other slaves.

2) Backup
MySQL stores all the files used for databases and tables in the data/ directory, so to back up MySQL we can back up the data/ directory. However, backing up data/ while MySQL is running will give us an inconsistent copy. To make a consistent backup, we need to stop MySQL first, and this causes downtime.
In a MySQL replication environment, we can stop one slave and take the backup from it. The master and the other slaves stay online during the backup.


To set up MySQL replication, each server in the replication setup needs a unique server-id. The valid range of server-id is 0 to 2^32-1, but replication only works with a positive server-id.
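
The range rule above is easy to check before writing a value into /etc/my.cnf. A minimal sketch, assuming a hypothetical helper named valid_server_id (it is not a MySQL tool) and a shell with 64-bit integer comparison:

```shell
# Return 0 if $1 is usable as a replication server-id (1 .. 2^32-1), else 1.
valid_server_id() {
    case "$1" in
        ''|*[!0-9]*) return 1 ;;      # empty, or contains a non-digit
    esac
    [ "${#1}" -le 10 ] || return 1    # more than 10 digits is out of range
    [ "$1" -ge 1 ] && [ "$1" -le 4294967295 ]
}
```

For example, valid_server_id 101 succeeds, while valid_server_id 0 fails. The check does not cover uniqueness; that still has to be confirmed across all servers by hand.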

I have 3 servers called: centos, centos-1, centos-2. All servers have MySQL freshly installed.

[root@centos ~]# getent hosts | grep centos
192.168.100.100 centos
192.168.100.101 centos-1
192.168.100.102 centos-2

Below are the steps I followed to set up replication from centos to centos-1 and centos-2.
1. Update /etc/my.cnf and restart MySQL
on centos:
[root@centos ~]# cat <<EOF > /etc/my.cnf
[mysqld]
server-id = 100
log-bin = master-bin.log
EOF
[root@centos ~]# service mysql restart

on centos-1:
[root@centos-1 ~]# cat <<EOF > /etc/my.cnf
[mysqld]
server-id = 101
relay-log-index = slave-relay-bin.index
relay-log = slave-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=information_schema.%
replicate-wild-ignore-table=performance_schema.%
EOF
[root@centos-1 ~]# service mysql restart

on centos-2:
[root@centos-2 ~]# cat <<EOF > /etc/my.cnf
[mysqld]
server-id = 102
relay-log-index = slave-relay-bin.index
relay-log = slave-relay-bin
replicate-wild-ignore-table=mysql.%
replicate-wild-ignore-table=information_schema.%
replicate-wild-ignore-table=performance_schema.%
EOF
[root@centos-2 ~]# service mysql restart

2. Get the master binary log file name and position (run on centos, the master).
mysql> flush table with read lock;
Query OK, 0 rows affected (0.00 sec)

mysql> show master status;

+-------------------+----------+--------------+------------------+
| File              | Position | Binlog_Do_DB | Binlog_Ignore_DB |
+-------------------+----------+--------------+------------------+
| master-bin.000007 |      272 |              |                  |
+-------------------+----------+--------------+------------------+
1 row in set (0.00 sec)

mysql> unlock tables;
Query OK, 0 rows affected (0.00 sec)
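
The file name and position are needed again in step 4, so copying them by hand can be replaced by a small parser. This is a sketch under assumptions: parse_master_status is a hypothetical helper, and it expects the boxed table layout shown above on standard input.

```shell
# Read the table printed by "show master status" on stdin and print
# "<file> <position>" taken from the first data row.
parse_master_status() {
    awk -F'|' '
        # Rows containing "|" are the header (1st) and the data row (2nd).
        NF >= 3 && ++row == 2 {
            gsub(/ /, "", $2)   # strip padding around the file name
            gsub(/ /, "", $3)   # strip padding around the position
            print $2, $3
            exit
        }'
}
```

One way to feed it is mysql -t -e 'show master status' | parse_master_status, where -t asks the client for table output even in batch mode; the connection options depend on the local setup.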


3. On the master, create the account used for replication. The slaves use this account to connect to the master. Multiple slaves can share the same account, or we can create one account per slave. The account needs the REPLICATION SLAVE privilege.
mysql> CREATE USER 'rep1'@'%' IDENTIFIED BY 'slavepass';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'rep1'@'%';
mysql> FLUSH PRIVILEGES;

4. Set up the slaves using the information from steps 2 and 3.
On both centos-1 and centos-2:
mysql> change master to
    -> master_host='192.168.100.100',
    -> master_port=3306,
    -> master_user='rep1',
    -> master_password='slavepass',
    -> master_log_file='master-bin.000007',
    -> master_log_pos=272;
mysql> start slave;

5. Confirm replication is working:
[root@centos-1 data]# tail -1 centos-1.err
130626  9:47:16 [Note] Slave I/O thread: connected to master 'rep1@192.168.100.100:3306',replication started in log 'master-bin.000007' at position 272
[root@centos-2 data]# tail -1 centos-2.err
130626  9:48:04 [Note] Slave I/O thread: connected to master 'rep1@192.168.100.100:3306',replication started in log 'master-bin.000007' at position 272

We can also insert some data on the master; querying any slave should then return the same data.
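
Besides the error log, the slave itself reports the state of its two replication threads in the Slave_IO_Running and Slave_SQL_Running fields of SHOW SLAVE STATUS. The sketch below checks both; slave_is_running is a hypothetical helper name, and it assumes the vertical "Name: value" layout produced by the \G terminator.

```shell
# Read "show slave status\G" output on stdin.
# Exit 0 only if both replication threads report "Yes".
slave_is_running() {
    awk -F': ' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}
```

On a slave this could be used as mysql -e 'show slave status\G' | slave_is_running && echo "replication OK", with whatever login options the local MySQL installation needs.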

We are done! MySQL replication is up and running.

Reference:
http://dev.mysql.com/doc/refman/5.6/en/replication-howto.html
http://top-performance.blogspot.com/2012/03/how-to-setup-mysql-replication-in-11.html

Sunday, 26 May 2013

bash string manipulation

Bash supports many kinds of string manipulation, including substring extraction, substring replacement, substring removal, and string length.

In "how to rename multiple files in one command", one of the ways I shared to rename *.txt to *.sh is

for f in *.txt
do
        mv "$f" "${f/.txt/.sh}"
done

How does ${f/.txt/.sh} do the magic? This is string replacement in bash.

${f/.txt/.sh} means: replace the first ".txt" in string $f with ".sh". So for f=abc.txt, ${f/.txt/.sh} produces abc.sh

[linuxscripter@localhost ~]$ f=abc.txt
[linuxscripter@localhost ~]$ echo ${f/.txt/.sh}
abc.sh

But the example I gave in "rename multiple files in one command" was not quite correct.
For f=abc.txt.txt, ${f/.txt/.sh} produces abc.sh.txt instead of abc.txt.sh

[linuxscripter@localhost ~]$ f=abc.txt.txt
[linuxscripter@localhost ~]$ echo ${f/.txt/.sh}
abc.sh.txt

To correct this, we need to use ${f/%.txt/.sh}

[linuxscripter@localhost ~]$ f=abc.txt.txt
[linuxscripter@localhost ~]$ echo ${f/%.txt/.sh}
abc.txt.sh
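
Putting the corrected expansion back into the rename loop gives something like the sketch below. It is bash-only, rename_txt_to_sh is a hypothetical function name, and the quoting is added so that file names containing spaces survive.

```shell
# Rename every *.txt file in directory $1 to *.sh (requires bash).
rename_txt_to_sh() {
    local f
    for f in "$1"/*.txt; do
        [ -e "$f" ] || continue        # no match: the glob stays literal
        mv -- "$f" "${f/%.txt/.sh}"    # replace only a trailing ".txt"
    done
}
```

Calling rename_txt_to_sh . in a directory holding abc.txt.txt leaves abc.txt.sh behind, not abc.sh.txt.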

The syntax of string replacement includes:
1. ${var/Pattern/Replacement}   : replacing the first match of "Pattern" with "Replacement"
2. ${var//Pattern/Replacement}  : replacing every match of "Pattern" with "Replacement"
3. ${var/#Pattern/Replacement}  : if $var starts with "Pattern", replace "Pattern" with "Replacement"
4. ${var/%Pattern/Replacement}  : if $var ends with "Pattern", replace "Pattern" with "Replacement"

Some examples will make the syntax clearer.

[linuxscripter@localhost ~]$ f=abc.txt.abc.txt

[linuxscripter@localhost ~]$ ### testing syntax 1
[linuxscripter@localhost ~]$ echo ${f/txt/sh}
abc.sh.abc.txt

[linuxscripter@localhost ~]$ ### testing syntax 2
[linuxscripter@localhost ~]$ echo ${f//txt/sh}
abc.sh.abc.sh

[linuxscripter@localhost ~]$ ### testing syntax 3,
[linuxscripter@localhost ~]$ echo ${f/#txt/sh}
abc.txt.abc.txt

[linuxscripter@localhost ~]$ ### testing syntax 3,
[linuxscripter@localhost ~]$ echo ${f/#abc/sh}
sh.txt.abc.txt

[linuxscripter@localhost ~]$ ### testing syntax 4,
[linuxscripter@localhost ~]$ echo ${f/%abc/sh}
abc.txt.abc.txt

[linuxscripter@localhost ~]$ ### testing syntax 4
[linuxscripter@localhost ~]$ echo ${f/%txt/sh}
abc.txt.abc.sh

At first this syntax may not appear very intuitive. I found the following helps to understand and memorize it.
A variable is referenced by putting '$' in front of it.
1. '/' is used for replacement, the same as in awk and sed.
2. Once we know '/' is for replacement, '//' should not be too difficult for us.
3. On the keyboard, '#' is to the left of '$', so '#' is used to replace at the left end of the variable.
4. On the keyboard, '%' is to the right of '$', so '%' is used to replace at the right end of the variable.

Besides string replacement, there is a lot more bash can do.
${#var} returns the length of $var
[linuxscripter@localhost ~]$ echo ${#f}
15

${var:pos:len} returns the substring of $var starting at position "pos" (counting from 0) with a length of up to "len". If len is omitted or too big, it returns the substring from position "pos" to the end of the string.
[linuxscripter@localhost ~]$ echo ${f:2:5} ${f:2:100} ${f:2}
c.txt c.txt.abc.txt c.txt.abc.txt

${var#substring} delete shortest match of "substring" from front of $var
[linuxscripter@localhost ~]$ echo ${f#abc}
.txt.abc.txt
[linuxscripter@localhost ~]$ echo ${f#*txt}
.abc.txt

${var##substring} delete longest match of "substring" from front of $var
[linuxscripter@localhost ~]$ echo ${f##*abc}
.txt

${var%substring} delete shortest match of "substring" from back of $var
[linuxscripter@localhost ~]$ echo ${f%txt}
abc.txt.abc.
[linuxscripter@localhost ~]$ echo ${f%abc*}
abc.txt.

${var%%substring} delete longest match of "substring" from back of $var
[linuxscripter@localhost ~]$ echo ${f%%txt*}
abc.
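
A common practical use of the four removal operators is taking a path apart. The sketch below uses a hypothetical function name, split_path, and relies only on #, ##, % and %%-style removal, which also work in a plain POSIX sh.

```shell
# Print the directory, file name, stem and extension of the path in $1.
split_path() {
    _file=${1##*/}                 # longest "*/" from the front -> file name
    printf 'dir=%s\nfile=%s\nstem=%s\next=%s\n' \
        "${1%/*}"      `# shortest "/*" from the back  -> directory` \
        "$_file" \
        "${_file%.*}"  `# shortest ".*" from the back  -> drop last extension` \
        "${_file##*.}" `# longest "*." from the front  -> last extension`
}
```

For example, split_path /tmp/reports/abc.txt.abc.txt (a made-up path) prints dir=/tmp/reports, file=abc.txt.abc.txt, stem=abc.txt.abc and ext=txt on separate lines. The same idea gives another way to write the rename from the start of this post: mv "$f" "${f%.txt}.sh".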

Reference:
http://tldp.org/LDP/abs/html/string-manipulation.html
http://bbs.chinaunix.net/forum.php?mod=viewthread&tid=218853&page=7

Friday, 17 May 2013

Use HTML pre tag to keep email display format

On my servers, I have some shell scripts that send system status reports to my email address.
It's very easy to implement: just run a script to generate the report and send it using sendmail, mail, or mailx.

For example, to email the output of w, I can simply run:
w | mail linuxscripter@myemail.com

But the email does not have proper Subject and To fields.

Personally I prefer sendmail, so I added the email headers this way:
(cat <<EOF
From: linuxscripter@myserver.com
To: linuxscripter@myemail.com
Subject: output of w

EOF
w)| sendmail -t

This looked pretty good, and the email displayed nicely in my Outlook Express 5.

But in 2006 my company upgraded the email client to Outlook Express 6, and the report formatting was lost: things that should have appeared in the same column no longer lined up.

How could I preserve the report format when displayed in Outlook Express 6?
I had worked as a PHP web developer for 3 years, so the first thing that came to my mind was the HTML <pre> tag.
To use HTML tags in an email, I need to set the email MIME type to text/html.

This is how I did it:
(cat <<EOF
From: linuxscripter@myserver.com
To: linuxscripter@myemail.com
Subject: output of w
Content-Type: text/html

EOF
echo "<html><pre>"; w; echo "</pre></html>")| sendmail -t

My email reports became nicely formatted again!
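
One thing to watch with this approach is that report text containing &, < or > will be interpreted as HTML. The sketch below wraps the same idea in a reusable function and escapes those three characters first. html_pre_mail is a hypothetical name, the MIME-Version header is an addition of mine, and the addresses are passed as arguments.

```shell
# Print a complete HTML mail message on stdout: headers, a blank line,
# then stdin wrapped in <pre> with &, < and > escaped.
# Usage: some_command | html_pre_mail FROM TO SUBJECT | sendmail -t
html_pre_mail() {
    printf 'From: %s\nTo: %s\nSubject: %s\n' "$1" "$2" "$3"
    printf 'MIME-Version: 1.0\nContent-Type: text/html\n\n'
    printf '<html><pre>\n'
    # Escape "&" first so the entities added for "<" and ">" stay intact.
    sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
    printf '</pre></html>\n'
}
```

With it, the report above becomes w | html_pre_mail linuxscripter@myserver.com linuxscripter@myemail.com "output of w" | sendmail -t.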