Friday 1 April 2016

Zabbix: Received value is not suitable for value type [Numeric (unsigned)] and data type [Decimal]

Recently I was told one of our monitoring items was broken.
At first I thought maybe the zabbix server was busy, but only one host was having the issue.
My second thought: maybe that host was busy and the zabbix agent couldn't return values. But all the other items were working fine; only the item "product_diff" was broken.
I tried zabbix_get manually and could get values without a problem:
$ zabbix_get -s centos-1.local.vb -k product_diff
-398
$ zabbix_get -s centos-1.local.vb -k product_diff
-205
And in zabbix_server.log, I saw entries like this:
 31168:20160322:095648.251 item "centos-1.local.vb:product_diff" became not supported: Received value [-299] is not suitable for value type [Numeric (unsigned)] and data type [Decimal]
So centos-1.local.vb returned negative values while the zabbix server was expecting Numeric (unsigned).

It turns out that when we first configured zabbix to monitor product_diff on centos-1.local.vb, we were only interested in the absolute difference. But recently the production team felt the real difference made more sense to them, so they replaced the script on centos-1.local.vb.
Inside the zabbix server, however, the item configuration was not changed.

In the item configuration, "Type of Information" for product_diff was still set to "Numeric (unsigned)".

I updated "Type of Information" from "Numeric (unassigned)" to "Numeric (float)"








Monitoring should have started working again, but it didn't, and I still saw errors in zabbix_server.log.
In zabbix all configurations are kept in the database, so maybe the change made in the web UI didn't update the database.
Log in to the zabbix database and check the items table:
mysql> select key_, value_type from items where key_ = 'product_diff';
+--------------+------------+
| key_         | value_type |
+--------------+------------+
| product_diff |          3 |
+--------------+------------+

In zabbix, value_type 3 represents Numeric (unsigned) and 0 represents Numeric (float). So we need to update this table, setting value_type to 0:
mysql> update items set value_type = 0 where key_ = 'product_diff';
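
To confirm the change, we can re-run the earlier select (the zabbix server refreshes its configuration cache periodically, so the item may take a minute to recover):
mysql> select key_, value_type from items where key_ = 'product_diff';
+--------------+------------+
| key_         | value_type |
+--------------+------------+
| product_diff |          0 |
+--------------+------------+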

After that, monitoring started working again!

Thursday 25 September 2014

Configure ILOM for Remote Server Management - SUN servers

ILOM stands for Integrated Lights Out Manager; all T-series servers come with ILOM.
Using ILOM we can remotely manage the server, including powering it off and on, just as if we were standing in front of it.

But to access ILOM remotely, we need to configure it first. Items needed for remote access include:
  1. IP address
  2. Netmask
  3. Gateway
Additionally we can also give a friendly name for ILOM.

Let's assume our out-of-band network is 192.168.200.0/255.255.255.0, and the gateway for this network is 192.168.200.254.
We are going to set up ILOM for a new server, websvr01, with remote access IP 192.168.200.1.

Before ILOM is configured, we have to access it in the data center: connect your laptop to the serial console and log in with the default userid/password root/changeme.
Once you are in, you can set up ILOM.

To set up a friendly name:
cd /SP
set hostname=websvr01-ilom
To set up the network:
cd /SP/network
set pendingipaddress=192.168.200.1
set pendingipnetmask=255.255.255.0
set pendingipgateway=192.168.200.254
set pendingipdiscovery=static
set commitpending=true
Instead of setting the IP, netmask, and gateway separately, we can also configure them in one command (we still need to commit with set commitpending=true afterwards):
set pendingipaddress=192.168.200.1 pendingipnetmask=255.255.255.0 pendingipgateway=192.168.200.254
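
We can double-check the current and pending settings by listing the properties of the network target with ILOM's show command:
cd /SP/network
show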

After connecting the Net Management port to the proper port on the switch, you will be able to log in remotely.

To access remotely:
ssh 192.168.200.1

After keying in the user id and password, you will see the same interface as when logging in through the serial console.

Friday 18 April 2014

Sync data using Veritas Volume Manager

I have been using rsync to do data migrations; recently I needed to do a data migration again.

Since this time the data resides in Veritas volumes, I want to try the Veritas way.

My setup on the old server is like this:

# vxprint -g ftpdg
TY NAME         ASSOC        KSTATE   LENGTH   PLOFFS   STATE    TUTIL0  PUTIL0
dg ftpdg        ftpdg        -        -        -        -        -       -

dm ftpdg01      tagmastore-usp0_0 -   461350656 -       -        -       -
dm ftpdg02      tagmastore-usp0_1 -   31439616 -        -        -       -

v  volb         gen          ENABLED  461314816 -       ACTIVE   -       -
pl volb-01      volb         ENABLED  461314816 -       ACTIVE   -       -
sd ftpdg01-01   volb-01      ENABLED  461314816 0       -        -       -

v  vols         gen          ENABLED  31438848 -        ACTIVE   -       -
pl vols-01      vols         ENABLED  31438848 -        ACTIVE   -       -
sd ftpdg02-01   vols-01      ENABLED  31438848 0        -        -       -



I allocated two LUNs from storage; they are identified as tagmastore-usp0_2 and tagmastore-usp0_3.



To enable the new disks:
# vxdctl enable

The privlen for my old disks was 2048, so set the same value for the new disks:
# vxdisksetup -i tagmastore-usp0_2 privlen=2048
# vxdisksetup -i tagmastore-usp0_3 privlen=2048


Add the new disks to the disk group:
# vxdg -g ftpdg adddisk ftpdg03=tagmastore-usp0_2
# vxdg -g ftpdg adddisk ftpdg04=tagmastore-usp0_3


Create the subdisks and plexes (each plex needs its subdisk associated, otherwise attaching it would not copy any data):
# vxmake -g ftpdg sd ftpdg03-01 ftpdg03,0,461314816
# vxmake -g ftpdg plex volb-02 sd=ftpdg03-01

# vxmake -g ftpdg sd ftpdg04-01 ftpdg04,0,31438848
# vxmake -g ftpdg plex vols-02 sd=ftpdg04-01


Attach the plexes to the volumes:
# vxplex -g ftpdg att volb volb-02
# vxplex -g ftpdg att vols vols-02
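
Attaching kicks off a full resynchronization from the existing plex to the new one; we should wait for it to finish before detaching. Progress can be checked with the standard VxVM task utility:
# vxtask list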


Once the resynchronization completes, the old and new plexes hold identical data, and we can detach the new plexes and move their disks to the new server.

Detach the new plexes:
# vxplex -g ftpdg det volb-02
# vxplex -g ftpdg det vols-02

Disassociate the new plexes:
# vxplex -g ftpdg dis volb-02
# vxplex -g ftpdg dis vols-02


Create new volumes using the detached plexes:
# vxmake -g ftpdg vol newb plex=volb-02
# vxmake -g ftpdg vol news plex=vols-02


Split the diskgroup:
# vxdg split ftpdg ftpdg2 newb news

At this step we will have a diskgroup ftpdg2, and inside it there are two volumes: newb and news.
We plan to use the same volume names on the new server, so rename the volumes in ftpdg2:
# vxedit -g ftpdg2 rename newb volb
# vxedit -g ftpdg2 rename news vols
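
We can confirm the layout of the new diskgroup with the same vxprint command used earlier:
# vxprint -g ftpdg2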

We can deport ftpdg2 from our old server:
# vxdg deport ftpdg2

On the new server we can import ftpdg2 under the new name ftpdg:
# vxdg -n ftpdg import ftpdg2

We also need to start the volumes:
# vxvol -g ftpdg start volb
# vxvol -g ftpdg start vols


The device paths for the volumes are:
/dev/vx/dsk/ftpdg/volb
/dev/vx/dsk/ftpdg/vols


We can mount the volumes and start using them!
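
For example, assuming the volumes carry vxfs file systems and using hypothetical mount points /ftpb and /ftps:
# mkdir -p /ftpb /ftps
# mount -F vxfs /dev/vx/dsk/ftpdg/volb /ftpb
# mount -F vxfs /dev/vx/dsk/ftpdg/vols /ftps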

Tuesday 21 January 2014

407 Proxy Authentication Required

Recently I installed an Ubuntu box. Following my Windows configuration, I set my proxy to http://proxyhost:8080.
But when I tried to wget something, I got this error:
"407 Proxy Authentication Required"

It turns out the proxy server requires my Windows AD username/password. After googling around I found the software Cntlm (http://cntlm.sourceforge.net).
The installation and setup steps are quite straightforward:
  1. Download cntlm_0.92.3_amd64.deb from http://cntlm.sourceforge.net
  2. Install cntlm:
    # dpkg -i cntlm_0.92.3_amd64.deb
  3. Configure /etc/cntlm.conf
    Username        linuxscripter
    Domain          windows-domain
    Proxy           proxyhost:8080
    NoProxy         localhost, 127.0.0.*, 10.*, 192.168.*
    Listen          3128

  4. Generate encrypted password:
    # cntlm -H -M http://proxyhost:8080 -c /etc/cntlm.conf
    You will be prompted for a password; key in your Windows AD password, then copy the command output and paste it into /etc/cntlm.conf.
  5. Restart cntlm service:
    # /etc/init.d/cntlm restart
  6. Set proxy to http://127.0.0.1:3128, as shown below
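
For example, to point wget and other command-line tools at cntlm for the current shell session (a minimal sketch):
    export http_proxy=http://127.0.0.1:3128
    export https_proxy=http://127.0.0.1:3128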
My Ubuntu is ready to connect to the outside world!

Friday 13 December 2013

Migrate sun4u to sun4v using FLAR

FLAR stands for FLash ARchive. If we need to install a few Solaris servers and customize them in a similar way,
instead of installing and configuring them one by one, we can install and configure one server first, create a FLAR on it, and install the other servers from this FLAR.
FLAR can also be used for server migration.

The steps for creating a FLAR and installing from it are very straightforward. However, if we are migrating from servers of one CPU architecture to servers of a different architecture, we have to do some extra steps.
Recently I migrated a few servers from sun4u to sun4v using FLAR. Here are my migration steps:
Create the FLAR image on the sun4u server:
  1. By default a FLAR created on sun4u cannot be used on sun4v servers; we have to add sun4v as a supported architecture for the FLAR:
    # echo "PLATFORM_GROUP=sun4v" >> /var/sadm/system/admin/.platform

    # flarcreate -n "migration flar" -c -S -x /flar -x /export/home /flar/migration.flar

    Alternatively we can add the -U flag to create the FLAR with sun4v support:
    # flarcreate -n "migration flar" -U "content_architectures=sun4u,sun4v" -c -S -x /flar -x /export/home /flar/migration.flar

  2. Verify our FLAR can be used on sun4v machines:
    # flar -i /flar/migration.flar | grep content_architectures
  3. Now move the FLAR to storage accessible from the sun4v machines; this can be NFS, HTTP, or a local hard drive.
  4. Boot sun4v machine, choose Flash Install, and select the migration.flar we created in step 1.
  5. After the installation completes, reboot the server; we will get this error:
    cannot open boot_archive
  6. To boot the server properly, we need to upgrade the sun4v machine:
    Boot the sun4v machine from DVD or Jumpstart and select Upgrade; after the upgrade finishes, reboot the server.
We have successfully migrated from sun4u to sun4v!
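
To confirm a server's kernel architecture before and after the migration, we can check with uname (-m prints the machine hardware class):
# uname -m
sun4v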

Reference:
http://docs.oracle.com/cd/E19253-01/821-0436/samekernel/index.html

Tuesday 26 November 2013

Using tmpfs to improve Nagios performance

Nagios is an excellent monitoring tool. We can monitor servers and network devices using Nagios.
Besides the many useful plugins at Nagios Exchange (http://exchange.nagios.org), we can also write our own plugins using shell scripts.

We can set up a Nagios monitoring server by following Setting up Nagios monitoring server; the default settings and configuration are sufficient if we are only monitoring a few servers. However, as the number of monitored hosts and services increases, we will notice check latencies.
This is because Nagios continuously updates some files on disk; with more items to monitor, more disk I/O is required, and eventually I/O becomes the bottleneck slowing down the Nagios checks.

To solve this problem, we need to improve I/O performance or reduce I/O requests. We could install Nagios on SSDs, but that is not cost-effective.

In an earlier post, using tmpfs to improve PostgreSQL performance, we pointed stats_temp_directory to tmpfs to boost the performance of PostgreSQL.
Similarly, if some files are only needed while Nagios is running, we can move them to tmpfs and thus reduce I/O requests.
In Nagios there are a few key files that affect disk I/O:
1. /usr/local/nagios/var/status.dat: this status file stores the current status of all monitored services and hosts. It is constantly updated, at the interval defined by status_update_interval; in my default Nagios installation, the status file is updated every 10 seconds.
The contents of the status file are deleted every time Nagios restarts, so it is only useful while Nagios is running.
[root@centos /usr/local/nagios/etc]# grep '^status' nagios.cfg
status_file=/usr/local/nagios/var/status.dat
status_update_interval=10

2. /usr/local/nagios/var/objects.cache: this file is a cached copy of the object definitions, and the CGIs read this file to get the object definitions.
The file is recreated every time Nagios starts, so objects.cache doesn't need to be on non-volatile storage.
[root@centos /usr/local/nagios/etc]# grep objects.cache nagios.cfg
object_cache_file=/usr/local/nagios/var/objects.cache

3. /usr/local/nagios/var/spool/checkresults: all the incoming check results are stored here. While Nagios is running, we will notice that files are constantly being created and deleted, so checkresults can also be moved to tmpfs.
[root@centos /usr/local/nagios/etc]# grep checkresults nagios.cfg
check_result_path=/usr/local/nagios/var/spool/checkresults
[root@centos /usr/local/nagios/etc]#

[root@centos /usr/local/nagios/var/spool/checkresults]# ls
checkP2D5bM  cn6i6Ld  cn6i6Ld.ok
[root@centos /usr/local/nagios/var/spool/checkresults]# head -4 cn6i6Ld
### Active Check Result File ###
file_time=1385437541

### Nagios Service Check Result ###
[root@centos /usr/local/nagios/var/spool/checkresults]#

So we can move status.dat, objects.cache, and checkresults to tmpfs, but first we need to mount the tmpfs file system:
[root@centos ~]# mkdir -p /mnt/nagvar
[root@centos ~]# mount -t tmpfs tmpfs /mnt/nagvar -o size=50m
[root@centos ~]# df -h /mnt/nagvar
Filesystem            Size  Used Avail Use% Mounted on
tmpfs                  50M     0   50M   0% /mnt/nagvar
[root@centos ~]# mount | grep nagvar
tmpfs on /mnt/nagvar type tmpfs (rw,size=50m)

Create a directory for checkresults:
[root@centos ~]# mkdir -p /mnt/nagvar/spool/checkresults
[root@centos ~]# chown -R nagios:nagios /mnt/nagvar

Modify nagios.cfg:
status_file=/mnt/nagvar/status.dat
object_cache_file=/mnt/nagvar/objects.cache
check_result_path=/mnt/nagvar/spool/checkresults

Restart Nagios so our changes take effect:
[root@centos ~]# service nagios restart
Running configuration check...done.
Stopping nagios: done.
Starting nagios: done.

We can see Nagios is now using /mnt/nagvar:
[root@centos ~]# tree /mnt/nagvar/
/mnt/nagvar/
├── objects.cache
├── spool
│   └── checkresults
│       ├── ca8JfZI
│       └── ca8JfZI.ok
└── status.dat

2 directories, 4 files

We can configure /etc/fstab to mount /mnt/nagvar every time the system reboots:
[root@centos ~]# cat <<EOF >> /etc/fstab
tmpfs      /mnt/nagvar    tmpfs   defaults,size=50m    0 0
EOF
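
To verify the fstab entry without rebooting, we can stop Nagios, unmount /mnt/nagvar, and let mount -a, which mounts everything in /etc/fstab that is not yet mounted, pick it up:
[root@centos ~]# service nagios stop
[root@centos ~]# umount /mnt/nagvar
[root@centos ~]# mount -a
[root@centos ~]# mount | grep nagvar
tmpfs on /mnt/nagvar type tmpfs (rw,size=50m)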

But the directory /mnt/nagvar/spool/checkresults will be gone after /mnt/nagvar is re-mounted, so we need to create this directory before starting up Nagios.
We can update /etc/init.d/nagios, adding these lines after the first line:
mkdir -p /mnt/nagvar/spool/checkresults
chown -R nagios:nagios /mnt/nagvar

[root@centos ~]# sed -i '1a\
mkdir -p /mnt/nagvar/spool/checkresults\
chown -R nagios:nagios /mnt/nagvar' /etc/init.d/nagios

Since we have moved the files to tmpfs, there is no longer any disk I/O on these files, and we can see a great performance improvement in Nagios.

Reference:
http://assets.nagios.com/downloads/nagiosxi/docs/Utilizing_A_RAM_Disk_In_NagiosXI.pdf

Thursday 21 November 2013

Set up nginx web server with PHP

Nginx (engine x) is a high performance, lightweight HTTP server. More and more sites are using nginx; according to the Netcraft survey (http://news.netcraft.com/archives/2013/11/01/november-2013-web-server-survey.html), nginx powered 15% of the busiest sites in November 2013.

Nginx installation is very straightforward: we can download the latest source code from http://nginx.org/en/download.html, or point our yum source to http://nginx.org/packages/OS/OSRELEASE/$basearch/ and install using yum.
Replace "OS" with "rhel" or "centos", depending on the distribution used, and "OSRELEASE" with "5" or "6", for 5.x or 6.x versions, respectively.
So for CentOS 6.3, we can point our YUM source to: http://nginx.org/packages/centos/6/$basearch/
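
For example, on CentOS 6 we can create /etc/yum.repos.d/nginx.repo with the following contents (a minimal sketch following the instructions above) and then install nginx with yum:

[nginx]
name=nginx repo
baseurl=http://nginx.org/packages/centos/6/$basearch/
gpgcheck=0
enabled=1

# yum install nginx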