Early Monday morning, I was told our data warehouse loading process took much longer than before.
The application has been running fine for months, and there was no recent changes made on application, system, or database. The data volume is about the same as before.
AWR reports showed that the bottleneck was on system IO, looks like storage IO performance is much worse than before. So I asked the storage engineer the check storage battery status of StorageTek.
Yes, our batteres expired!
Vendor is delivering new batteries for us, replacing batteries should solve our issue.
A few years ago, when I first started as a DBA, I encountered similar issue on an OLTP database.
All of sudden, one database became very slow. I spent much more time, but still couldn't figure out what went wrong. Only after a few days, our storage engineer noticed the battery expiration in HDS.
After replacing new batteries, database performance became normal again.
Why expired batteries affect the system I/O?
Joshua Townsend has a very good article on storage basics.
In storage, there is some amount of cache RAM acting as a buffer to physical disks, I/O operations on cache is much faster than the disks.
In case of battery expiration, storage will disable the cache, all the I/O operations directly goes to the disks, which are much slower than cache RAM.
For details, please visit Joshua's blog:
http://vmtoday.com/2010/03/storage-basics-part-v-controllers-cache-and-coalescing/
Wednesday, 24 April 2013
Friday, 12 April 2013
How to check NIC speed in Unix
Fast Ethernet connection is essential for some operations to work properly.
backup, oracle RAC all require fast Ethernet connection.
Different OS provides different commands to check the network speed currently running.
In Linux, we can run mii-tool or ethtool to check the network speed. Both commands require root access
In Solaris, we can run kstat or dladm to check the network speed
if we have root access, we can run dladm or ndd to check the network speed.
In AIX, we can run entstat to check network speed.
if we want to check the speed of en0
We can see that en0 supports upto 1000 Mbps, but it's running 100 Mbps Full Duplex.
In HP-UX, we can run lanadmin or nwmgr to check the network speed.
lanadmin output is shorter
we can also use nwmgr to get detailed information
backup, oracle RAC all require fast Ethernet connection.
Different OS provides different commands to check the network speed currently running.
In Linux, we can run mii-tool or ethtool to check the network speed. Both commands require root access
[root@localhost eth0]# mii-tool eth0
eth0: no autonegotiation, 100baseTx-FD, link ok
[root@localhost eth0]# ethtool eth0
Settings for eth0:
Supported ports: [ TP ]
Supported link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Supports auto-negotiation: Yes
Advertised link modes: 10baseT/Half 10baseT/Full
100baseT/Half 100baseT/Full
1000baseT/Full
Advertised pause frame use: No
Advertised auto-negotiation: Yes
Speed: 1000Mb/s
Duplex: Full
Port: Twisted Pair
PHYAD: 0
Transceiver: internal
Auto-negotiation: on
MDI-X: Unknown
Supports Wake-on: umbg
Wake-on: d
Current message level: 0x00000007 (7)
Link detected: yes
In Solaris, we can run kstat or dladm to check the network speed
$ kstat -m igb -i 0 | egrep 'link_autoneg|link_speed|link_duplex'The output shows that, igb0 is running autonegotiation, 100 Mbps full-duplex
link_autoneg 1
link_duplex 2
link_speed 100
if we have root access, we can run dladm or ndd to check the network speed.
In AIX, we can run entstat to check network speed.
if we want to check the speed of en0
$ entstat -d en0 | egrep '^Device|^Media'
Device Type: 2-Port 10/100/1000 Base-TX PCI-X Adapter (14108902)
Media Speed Selected: Auto negotiation
Media Speed Running: 100 Mbps Full Duplex
We can see that en0 supports upto 1000 Mbps, but it's running 100 Mbps Full Duplex.
In HP-UX, we can run lanadmin or nwmgr to check the network speed.
lanadmin output is shorter
$ /usr/sbin/lanadmin -x 0
Speed = 1000 Full-Duplex.
Autonegotiation = On.
we can also use nwmgr to get detailed information
$ /usr/sbin/nwmgr --get --stats -C lan -I 0
*** lan0 64 bit MIB statistics:
Interface Name = lan0
PPA Number = 0
Description = lan0 HP PCI 1000Base-T Release B.11.31.0809.01
Interface Type = 1000Base-T
MTU Size = 1500
Speed = 1 Gbps
Station Address = 0x0A527E583C90
Administration Status = UP
Operation Status = UP
Last Change = Wed Mar 20 05:58:50 2013
Inbound Octets = 166627219743
Inbound Unicast Packets = 975967141
Inbound Multicast Packets = 1124454
Inbound Broadcast Packets = 6566058
Inbound Discards = 1821280
Inbound Errors = 0
Inbound Unknown Protocols = 1124769
Outbound Octets = 185484791114
Outbound Unicast Packets = 968101273
Outbound Multicast Packets = 0
Outbound Broadcast Packets = 1618
Outbound Discards = 0
Outbound Errors = 0
Counter Discontinuity Time = Wed Mar 20 05:58:50 2013
Physical Promiscuous Mode = FALSE
Physical Connector Present = TRUE
Interface Alias =
Link Up/Down Trap Enable = Enabled
Monday, 18 March 2013
Redhat Linux: how to add/remove hard disk without rebooting server
A few years ago, I managed a few hundred of Redhat Linux servers, most of them are DELL PowerEdge 1750, 1850, and 1950.
Sometimes I needed to move the harddisk between servers. Removing or adding hardisks normally requires system reboot.
But I used this way to remove or add disk without rebooting:
Recently I was asked to add a new harddisk without reboot system. It's a Redhat in VMware, as I was not managing VMware, so not clear about the hostadaptor, SCSI channel, ID, LUN, manipulating /proc/scsi/scsi may not work.
After searching around I found that I can let system to rescan the SCSI bus.
After this, I could see the new hard disk was spinned up in /var/log/message.
Besides 'scsi remove-single-device 0 0 1 0', we can also do it in this way to remove a hard disk.
Reference:
http://www.tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-4.html
http://www.cyberciti.biz/tips/vmware-add-a-new-hard-disk-without-rebooting-guest.html
Sometimes I needed to move the harddisk between servers. Removing or adding hardisks normally requires system reboot.
But I used this way to remove or add disk without rebooting:
echo 'scsi remove-single-device 0 0 1 0' > /proc/scsi/scsi
echo 'scsi add-single-device 0 0 1 0' > /proc/scsi/scsi
Recently I was asked to add a new harddisk without reboot system. It's a Redhat in VMware, as I was not managing VMware, so not clear about the hostadaptor, SCSI channel, ID, LUN, manipulating /proc/scsi/scsi may not work.
After searching around I found that I can let system to rescan the SCSI bus.
for scsi_host in /sys/class/scsi_host/host*
do
echo '- - -' > $scsi_host/scan
done
After this, I could see the new hard disk was spinned up in /var/log/message.
Besides 'scsi remove-single-device 0 0 1 0', we can also do it in this way to remove a hard disk.
echo 1 > /sys/block/sdb/device/delete
Reference:
http://www.tldp.org/HOWTO/archived/SCSI-Programming-HOWTO/SCSI-Programming-HOWTO-4.html
http://www.cyberciti.biz/tips/vmware-add-a-new-hard-disk-without-rebooting-guest.html
Friday, 1 March 2013
BIND DNS: reverse delegation of IP range
Besides mapping domain name to IP address, DNS systems can also map IP address to domain name. Many applications rely on DNS reverse mapping to function properly.
NetBackup will throw error if reverse mapping is not setup or is setup incorrectly, email system also needs correct reverse mapping.
Suppose we have the whole range of IP addresses in 222.222.222.0/24, we can configure reverse mapping in named.conf:
If we only have part of the IP addresses in 222.222.222.0/24, suppose 222.222.222.64 to 222.222.222.91, the reverse mapping is called classless reverse delegation.
From 222.222.222.64 to 222.222.222.95 there are 30 usable addresses, plus the network and broadcast address, there are 32 addresses, 32 = 2^5, 8 x 4 - 5 = 27. so our IP range can be represented as 222.222.222.64/27
ISP should have defined reverse delegation in their reverse zone file:
Now we can define the reverse mapping for our IP range in our own named.conf:
NetBackup will throw error if reverse mapping is not setup or is setup incorrectly, email system also needs correct reverse mapping.
Suppose we have the whole range of IP addresses in 222.222.222.0/24, we can configure reverse mapping in named.conf:
zone "222.222.222.in-addr.arpa" {type master;file "222.222.222.rev";};
If we only have part of the IP addresses in 222.222.222.0/24, suppose 222.222.222.64 to 222.222.222.91, the reverse mapping is called classless reverse delegation.
From 222.222.222.64 to 222.222.222.95 there are 30 usable addresses, plus the network and broadcast address, there are 32 addresses, 32 = 2^5, 8 x 4 - 5 = 27. so our IP range can be represented as 222.222.222.64/27
ISP should have defined reverse delegation in their reverse zone file:
64/27 IN NS ns.sg.linuxscripter.blogspot.com
Now we can define the reverse mapping for our IP range in our own named.conf:
Note: The domain name and IP addresses in this post are dummy ones, I use them for easier writing.zone "64/27.222.222.222.in-addr.arpa" {type master;file "64-95.222.222.222.rev";};
Wednesday, 27 February 2013
BIND DNS: subdomain delegation
Suppose we have the domain name linuxscripter.blogspot.com, and we have Singapore, Hong Kong, and Shanghai offices. If we want the subsidiaries to manage their own domains, we can do this using domain delegation. On headquarter DNS server, in linuxscripter.blogspot.com zone file, we can define the subdomain delegation for different subsidiaries.
In ns.sg.linuxscripter.blogspot.com, we can define zone file for subdomain sg.linuxscripter.blogspot.com as normal domains
We can setup the web server to use www now, but to use the email server, we need to define the reverse resolution for mail.sg.linuxscripter.blogspot.com properly.
Depends on the IP address range we get from ISP, the syntax for reverse resolution may involve classless delegation, you can find details on how to do reverse delegation in http://linuxscripter.blogspot.sg/2013/03/bind-dns-reverse-delegation-of-ip-range.html
Note: The domain name and IP addresses in this post are dummy ones, I use them for easier writing.
$ORIGIN sg.linuxscripter.blogspot.com.
@ IN NS ns.sg.linuxscripter.blogspot.com.
ns IN A 221.221.221.221
$ORIGIN hk.linuxscripter.blogspot.com.
@ IN NS ns.hk.linuxscripter.blogspot.com.
ns IN A 222.222.222.222
$ORIGIN sh.linuxscripter.blogspot.com.
@ IN NS ns.sh.linuxscripter.blogspot.com.
ns IN A 223.223.223.223
In ns.sg.linuxscripter.blogspot.com, we can define zone file for subdomain sg.linuxscripter.blogspot.com as normal domains
$TTL 7d$ORIGIN sg.linuxscripter.blogspot.com.@ IN SOA ns.sg.linuxscripter.blogspot.com. hostmaster.abc.domain. (
2013022701 ; serial number
2h ; refresh = 2 hours
15M ; update retry = 15 mins
3W12h ; expiry = 3 weeks + 12 hours
2h20M ; minimum = 2 hours + 20 mins
)IN NS ns.sg.linuxscripter.blogspot.com.
IN MX 10 mail.sg.linuxscripter.blogspot.com.www IN A 221.221.221.65
mail IN A 221.221.221.66
We can setup the web server to use www now, but to use the email server, we need to define the reverse resolution for mail.sg.linuxscripter.blogspot.com properly.
Depends on the IP address range we get from ISP, the syntax for reverse resolution may involve classless delegation, you can find details on how to do reverse delegation in http://linuxscripter.blogspot.sg/2013/03/bind-dns-reverse-delegation-of-ip-range.html
Note: The domain name and IP addresses in this post are dummy ones, I use them for easier writing.
Thursday, 6 December 2012
ORA-27300 and swap error
Recently I took over a HPUX running oracle 10g, when checking the alert log, I saw a few ORA-27300, ORA-27301, ORA-27301 errors, /var/log/messages also shows this:
Deferred swap reservation failure pid: 12345.
Apparently system doesn't have enough swap space for oracle.
$ swapinfo -am
Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 6656 195 6461 3% 0 - 1 /dev/vg00/lvol2
reserve - 3134 -3134
memory 4536 1109 3427 24%
swapinfo shows that system has 4.5G physical memory, swap space is 6.6G, a bit smaller but quite close to oracle recommanded 4.5X1.5G
further check found that system hit ORA-27300 daily between 18:00 and 20:00, there must be some job running during this period causing the excessive swap usage.
AWR report shows this sql suspicous:
SELECT table_a.column_1, table_a.column_2, table_a.column_3
BULK COLLECT INTO
col_1, col_2, col_3
FROM table_a, table_b, table_c
WHERE table_a.created_date >= to_date(:1,'dd/mm/yyyy:hh24:mi')
AND table_a.created_date <= to_date(:2,'dd/mm/yyyy:hh24:mi');
two issues with this sql:
1. cartition join table_b table_c, yet no columns from table_b and table_c are selected columns
the cartition join of table_b and table_c produces about 2 billion rows,
so if there is 1 row in table_a match the WHERE condition, sql will try to load the same row 2 billion times in memory.
2. use BULK COLLECT INTO directly
according to oracle, BULK COLLECT should always be used with LIMIT
It turns out that the sql is part of a procedure, which runs daily at 18:00.
Since the job has been failing for months, I just disabled it, informed development team to rectify it.
So far two weeks passed, both server and database are working fine, no more ORA-27300 or "Deferred swap reservation failure" appears.
reference:
http://www.oracle-base.com/articles/9i/bulk-binds-and-record-processing-9i.php
http://www.oracle.com/technetwork/issue-archive/2008/08-mar/o28plsql-095155.html
Deferred swap reservation failure pid: 12345.
Apparently system doesn't have enough swap space for oracle.
$ swapinfo -am
Mb Mb Mb PCT START/ Mb
TYPE AVAIL USED FREE USED LIMIT RESERVE PRI NAME
dev 6656 195 6461 3% 0 - 1 /dev/vg00/lvol2
reserve - 3134 -3134
memory 4536 1109 3427 24%
swapinfo shows that system has 4.5G physical memory, swap space is 6.6G, a bit smaller but quite close to oracle recommanded 4.5X1.5G
further check found that system hit ORA-27300 daily between 18:00 and 20:00, there must be some job running during this period causing the excessive swap usage.
AWR report shows this sql suspicous:
SELECT table_a.column_1, table_a.column_2, table_a.column_3
BULK COLLECT INTO
col_1, col_2, col_3
FROM table_a, table_b, table_c
WHERE table_a.created_date >= to_date(:1,'dd/mm/yyyy:hh24:mi')
AND table_a.created_date <= to_date(:2,'dd/mm/yyyy:hh24:mi');
two issues with this sql:
1. cartition join table_b table_c, yet no columns from table_b and table_c are selected columns
the cartition join of table_b and table_c produces about 2 billion rows,
so if there is 1 row in table_a match the WHERE condition, sql will try to load the same row 2 billion times in memory.
2. use BULK COLLECT INTO directly
according to oracle, BULK COLLECT should always be used with LIMIT
It turns out that the sql is part of a procedure, which runs daily at 18:00.
Since the job has been failing for months, I just disabled it, informed development team to rectify it.
So far two weeks passed, both server and database are working fine, no more ORA-27300 or "Deferred swap reservation failure" appears.
reference:
http://www.oracle-base.com/articles/9i/bulk-binds-and-record-processing-9i.php
http://www.oracle.com/technetwork/issue-archive/2008/08-mar/o28plsql-095155.html
Wednesday, 7 November 2012
How to get the process listening on certain port
At times we are asked: "what program is listening on port XX?"
In Redhat, we can easily get this using lsof. let's say if we want to know which program is listening on port 80.
[root@localhost ~]# lsof -i :80
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
httpd 2047 root 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2049 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2050 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2051 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2052 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2053 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2054 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2055 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2056 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
From the output, we know that httpd is listening on port 80.
Another way to get the process listening on certain port is using netstat.
[root@localhost ~]# netstat -anp | grep 80
tcp 0 0 :::80 :::* LISTEN 2009/httpd
[root@localhost ~]# ps -ef | awk '$2 == 2009'
root 2009 1 0 23:21 ? 00:00:00 /usr/sbin/httpd
[root@localhost ~]#
In AIX, most of the time it has no lsof installed, and the netstat is also different from Redhat, but we still can get our question answered.
$ netstat -Aan | grep '.22 ' | grep LISTEN
f1000e0001382bb8 tcp4 0 0 *.22 *.* LISTEN
$ rmsock f1000e0001382bb8 tcpcb
The socket 0xf1000e0001382808 is being held by proccess 3670142 (sshd).
$ ps -ef | grep 3670142
root 3670142 3014676 0 Sep 06 - 0:00 /usr/sbin/sshd
Please note rmsock will not remove the socket, you can confirm this by checking the content of /var /adm /ras /rms ock. log.
reference: http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/determine-which-process-is-listening-on-a-port-without-using-lsof-on-aix-1468555
In Redhat, we can easily get this using lsof. let's say if we want to know which program is listening on port 80.
[root@localhost ~]# lsof -i :80
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
httpd 2047 root 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2049 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2050 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2051 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2052 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2053 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2054 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2055 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
httpd 2056 apache 4u IPv6 12636 0t0 TCP *:http (LISTEN)
From the output, we know that httpd is listening on port 80.
Another way to get the process listening on certain port is using netstat.
[root@localhost ~]# netstat -anp | grep 80
tcp 0 0 :::80 :::* LISTEN 2009/httpd
[root@localhost ~]# ps -ef | awk '$2 == 2009'
root 2009 1 0 23:21 ? 00:00:00 /usr/sbin/httpd
[root@localhost ~]#
In AIX, most of the time it has no lsof installed, and the netstat is also different from Redhat, but we still can get our question answered.
$ netstat -Aan | grep '.22 ' | grep LISTEN
f1000e0001382bb8 tcp4 0 0 *.22 *.* LISTEN
$ rmsock f1000e0001382bb8 tcpcb
The socket 0xf1000e0001382808 is being held by proccess 3670142 (sshd).
$ ps -ef | grep 3670142
root 3670142 3014676 0 Sep 06 - 0:00 /usr/sbin/sshd
Please note rmsock will not remove the socket, you can confirm this by checking the content of /var
reference: http://unix.ittoolbox.com/groups/technical-functional/ibm-aix-l/determine-which-process-is-listening-on-a-port-without-using-lsof-on-aix-1468555
Subscribe to:
Posts (Atom)
loading..