Linux, Unix, Shell: AWR

Wednesday, 24 April 2013

Expired storage battery causes database performance slow

Early Monday morning, I was told our data warehouse loading process took much longer than before.
The application has been running fine for months, and there was no recent changes made on application, system, or database. The data volume is about the same as before.
AWR reports showed that the bottleneck was on system IO, looks like storage IO performance is much worse than before. So I asked the storage engineer the check storage battery status of StorageTek.

Yes, our batteres expired!

Vendor is delivering new batteries for us, replacing batteries should solve our issue.

A few years ago, when I first started as a DBA, I encountered similar issue on an OLTP database.
All of sudden, one database became very slow. I spent much more time, but still couldn't figure out what went wrong. Only after a few days, our storage engineer noticed the battery expiration in HDS.
After replacing new batteries, database performance became normal again.

Why expired batteries affect the system I/O?
Joshua Townsend has a very good article on storage basics.
In storage, there is some amount of cache RAM acting as a buffer to physical disks, I/O operations on cache is much faster than the disks.
In case of battery expiration, storage will disable the cache, all the I/O operations directly goes to the disks, which are much slower than cache RAM.

For details, please visit Joshua's blog:
http://vmtoday.com/2010/03/storage-basics-part-v-controllers-cache-and-coalescing/

Thursday, 6 December 2012

ORA-27300 and swap error

Recently I took over a HPUX running oracle 10g, when checking the alert log, I saw a few ORA-27300, ORA-27301, ORA-27301 errors, /var/log/messages also shows this:
Deferred swap reservation failure pid: 12345.
Apparently system doesn't have enough swap space for oracle.

$ swapinfo -am
           Mb    Mb      Mb   PCT START/      Mb
TYPE    AVAIL USED    FREE USED   LIMIT RESERVE PRI NAME
dev      6656   195    6461    3%       0       -    1 /dev/vg00/lvol2
reserve     - 3134   -3134
memory   4536 1109    3427   24%

swapinfo shows that system has 4.5G physical memory, swap space is 6.6G, a bit smaller but quite close to oracle recommanded 4.5X1.5G

further check found that system hit ORA-27300 daily between 18:00 and 20:00, there must be some job running during this period causing the excessive swap usage.

AWR report shows this sql suspicous:

SELECT table_a.column_1, table_a.column_2, table_a.column_3
    BULK COLLECT INTO
    col_1, col_2, col_3
FROM table_a, table_b, table_c
    WHERE table_a.created_date >= to_date(:1,'dd/mm/yyyy:hh24:mi')
    AND    table_a.created_date <= to_date(:2,'dd/mm/yyyy:hh24:mi');

two issues with this sql:
1. cartition join table_b table_c, yet no columns from table_b and table_c are selected columns
    the cartition join of table_b and table_c produces about 2 billion rows,
    so if there is 1 row in table_a match the WHERE condition, sql will try to load the same row 2 billion times in memory.

2. use BULK COLLECT INTO directly
    according to oracle, BULK COLLECT should always be used with LIMIT

It turns out that the sql is part of a procedure, which runs daily at 18:00.
Since the job has been failing for months, I just disabled it, informed development team to rectify it.

So far two weeks passed, both server and database are working fine, no more ORA-27300 or "Deferred swap reservation failure" appears.

reference:
http://www.oracle-base.com/articles/9i/bulk-binds-and-record-processing-9i.php
http://www.oracle.com/technetwork/issue-archive/2008/08-mar/o28plsql-095155.html