Thursday, 26 April 2012

Hello Python

In 2006 and 2007, I wrote a lot of Perl scripts. I haven't used Perl since 2008, and recently I found it very hard to understand the scripts I wrote five years ago.
So I decided to pick up a more intuitive language. After googling around, I decided to give Python a try. I mainly write scripts to manipulate files and strings, so file I/O and regular expressions were the first two things I checked out.

First let's create a file called abc:
$ echo "Hello perl" > abc

Using sed, we can easily replace 'perl' with 'python':
$ sed 's/perl/python/'  abc

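To do the same in Python, we can use this script:

import re

# Read the whole file abc into a string
with open('abc', 'r') as fr:
    mystr = fr.read()

# Replace 'perl' with 'python' in two ways:
# with a regular expression, and with a plain string replacement
str_by_sub = re.sub('perl', 'python', mystr)
str_by_replace = mystr.replace('perl', 'python')

# Write the original string and both results to a new file called def
with open('def', 'w') as fw:
    fw.write(mystr)
    fw.write(str_by_sub)
    fw.write(str_by_replace)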

After running the script, we will have a file called def with the following content:
Hello perl
Hello python
Hello python
Both str.replace and re.sub accept further parameters that control the replacement (for example, a maximum number of replacements). I still need to read more to understand them better.
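For example, both calls take an optional count, and re.sub also takes flags. Here is a small sketch showing how they change the result; the sample string is made up for illustration:

```python
import re

text = "perl Perl PERL perl"

# str.replace(old, new, count): replace at most `count` occurrences, left to right
first_only = text.replace('perl', 'python', 1)

# re.sub(pattern, repl, string, count=0, flags=0): count=1 stops after one match
first_only_re = re.sub('perl', 'python', text, count=1)

# flags=re.IGNORECASE matches 'perl' in any letter case; count=0 (the default) replaces all
all_cases = re.sub('perl', 'python', text, flags=re.IGNORECASE)

print(first_only)     # python Perl PERL perl
print(first_only_re)  # python Perl PERL perl
print(all_cases)      # python python python python
```

Without the flag, the plain pattern only matches the lowercase 'perl'.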

Writing my first Python script was very intuitive. I think I will like Python :)

Thursday, 12 April 2012

Understanding Linux Hugepages

Hugepages can be very useful on systems with a large amount of RAM. To check the current settings:
$ grep Huge /proc/meminfo
HugePages_Total:         0
HugePages_Free:          0
HugePages_Rsvd:          0
HugePages_Surp:          0
Hugepagesize:         2048 kB

Hugepagesize is the size of one Hugepage; on most x86 and x86_64 machines it is 2048 kB.
HugePages_Total is the number of Hugepages configured on the machine. In Red Hat Linux, we can change this value by adding vm.nr_hugepages to /etc/sysctl.conf:
echo "vm.nr_hugepages = 200" >> /etc/sysctl.conf
After updating the file, running /sbin/sysctl -p may apply the setting immediately, but on a running system the kernel may not find enough free contiguous memory for all the pages, so it's safer to reboot the machine.

To confirm that Hugepages are configured after rebooting:
$ grep Huge /proc/meminfo
HugePages_Total:       200
HugePages_Free:        200
HugePages_Rsvd:          0
HugePages_Surp:          0
Hugepagesize:         2048 kB

By setting vm.nr_hugepages to 200, we are telling Linux to reserve 200 Hugepages, so the memory reserved for Hugepages is HugePages_Total x Hugepagesize = 200 x 2048 kB = 409600 kB = 400 MB.
This memory is reserved up front whether or not any process uses it, and Hugepages cannot be swapped out.
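The arithmetic above can be checked straight from /proc/meminfo. Here is a minimal Python sketch; parse_meminfo and hugepage_reserved_kb are hypothetical helper names, and the sample text is copied from the output above so the example runs anywhere:

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: int}.

    Sizes keep the kB unit the kernel reports; HugePages_* fields are page counts.
    """
    info = {}
    for line in text.splitlines():
        if ':' not in line:
            continue
        key, value = line.split(':', 1)
        fields = value.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def hugepage_reserved_kb(info):
    """Memory reserved for Hugepages in kB: HugePages_Total x Hugepagesize."""
    return info['HugePages_Total'] * info['Hugepagesize']


# Sample copied from the output above; on a live system pass open('/proc/meminfo').read()
sample = """HugePages_Total:       200
HugePages_Free:        200
HugePages_Rsvd:          0
HugePages_Surp:          0
Hugepagesize:         2048 kB"""

reserved = hugepage_reserved_kb(parse_meminfo(sample))
print('%d kB = %d MB' % (reserved, reserved // 1024))  # 409600 kB = 400 MB
```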

Before setting Hugepages:
$ grep Mem /proc/meminfo
MemTotal:                1030940 kB
MemFree:                  861852 kB
After setting Hugepages, check memory usage again:
$ grep Mem /proc/meminfo
MemTotal:                1030940 kB
MemFree:                  452252 kB
We can see that free memory dropped from about 841 MB to 441 MB after setting HugePages_Total to 200. The difference, 861852 kB - 452252 kB = 409600 kB (400 MB), is exactly the memory reserved for Hugepages, and it stays allocated.

We can use Hugepages to keep the Oracle SGA locked in memory, so the SGA is never paged out, which improves database performance.
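To size Hugepages for an SGA, we need enough pages to hold all of it, rounding up. A rough Python sketch, assuming the whole SGA should fit in 2048 kB Hugepages; hugepages_needed is a hypothetical helper and the 400 MB SGA is only an example. In practice you may want some headroom, so check the Oracle documentation for your version:

```python
def hugepages_needed(sga_bytes, hugepagesize_kb=2048):
    """Smallest vm.nr_hugepages value whose pages can hold sga_bytes.

    Rounds up with integer arithmetic, because a partly used Hugepage
    still has to be reserved whole.
    """
    page_bytes = hugepagesize_kb * 1024
    return (sga_bytes + page_bytes - 1) // page_bytes


# Hypothetical example: a 400 MB SGA with 2048 kB Hugepages
print(hugepages_needed(400 * 1024 * 1024))  # 200
```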

Tuesday, 10 April 2012

Use md5sum to compare files in Linux

To compare two files in Linux, the first utility we can think of is diff.

Suppose we have two files, /root/abc.txt and /root/cba.txt.

To compare them using diff:
diff /root/abc.txt /root/cba.txt
Besides diff, we can use md5sum to compare the checksums of the two files:
md5sum /root/abc.txt
md5sum /root/cba.txt
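Then compare the two outputs. A smarter way is to let md5sum do the comparison: write the checksum of abc.txt into a checksum file under the name /root/cba.txt, then check it:
md5sum /root/abc.txt | awk '{print $1 "  /root/cba.txt"}' > /tmp/cksum.txt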
md5sum -c /tmp/cksum.txt
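If the files are identical, md5sum -c prints "/root/cba.txt: OK"; otherwise it reports FAILED. md5sum is especially useful when abc.txt and cba.txt are too large for diff to handle comfortably, although it only tells us whether the files differ, not where.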
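The same comparison can be done from a script. Here is a minimal Python sketch using the standard hashlib module; md5_of_file and same_content are hypothetical names. Reading in chunks keeps memory use flat, which matters for the huge files mentioned above:

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        # iter() with a sentinel keeps calling f.read() until it returns b''
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def same_content(path_a, path_b):
    """True if both files have the same MD5 digest (almost certainly identical)."""
    return md5_of_file(path_a) == md5_of_file(path_b)


# Usage with the paths from the example above:
# print(same_content('/root/abc.txt', '/root/cba.txt'))
```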

md5sum can also be used to validate files transferred to a remote site.

In one of my jobs, I needed to transfer big files from server to server. During the copy, a server might crash or we might lose connectivity between servers. Many things can leave a copy incomplete, and we end up with corrupted files on some of the servers. To avoid this situation, we can generate the checksums of all the files to be copied on the source server:
md5sum file1 file2 file3 > cksum.txt
Then copy cksum.txt together with the files to the destination server.

On the destination server, verify that all the files were copied over and have the same checksums as on the source server:
md5sum -c cksum.txt
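For servers without md5sum, or to post-process the results, the generate and verify steps can be sketched in Python with hashlib. write_md5_list and check_md5_list are hypothetical helpers that aim to mimic the md5sum file format ('<digest>  <name>', or '<digest> *<name>' for binary mode); this is not a full reimplementation of md5sum -c:

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1024 * 1024):
    """Hex MD5 digest of a file, read in chunks so large files are fine."""
    h = hashlib.md5()
    with open(path, 'rb') as f:
        for chunk in iter(lambda: f.read(chunk_size), b''):
            h.update(chunk)
    return h.hexdigest()


def write_md5_list(paths, list_path):
    """Like `md5sum file1 file2 file3 > cksum.txt`: one '<digest>  <name>' line per file."""
    with open(list_path, 'w') as out:
        for p in paths:
            out.write('%s  %s\n' % (md5_of_file(p), p))


def check_md5_list(list_path):
    """Rough equivalent of `md5sum -c cksum.txt`.

    Accepts '<digest>  <name>' and '<digest> *<name>' lines. Relative names
    are resolved against the current directory.
    Returns {name: 'OK' | 'FAILED' | 'MISSING'}.
    """
    results = {}
    with open(list_path) as f:
        for line in f:
            line = line.rstrip('\n')
            if len(line) < 35:
                continue  # too short for a 32-char digest, separator and name
            digest, name = line[:32].lower(), line[34:]
            if not os.path.isfile(name):
                results[name] = 'MISSING'
            elif md5_of_file(name) == digest:
                results[name] = 'OK'
            else:
                results[name] = 'FAILED'
    return results


# On the destination server:
# for name, status in sorted(check_md5_list('cksum.txt').items()):
#     print('%s: %s' % (name, status))
```

Checksums published on download sites often use the binary-mode '*' form, which check_md5_list also accepts.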

Many websites also publish checksums alongside their software, and we can use them to verify that we downloaded the correct file completely. For example, the published md5 checksum for httpd-2.4.1.tar.gz is:
4366afbea8149ca125af01fd59a2f8a2 *httpd-2.4.1.tar.gz

Sunday, 8 April 2012

Oracle: How to roll forward standby database using archivelogs

At times an Oracle standby database may get out of sync with the primary database. After fixing the cause, we need to roll the standby database forward so that it can catch up with the primary again.
One way to roll the standby forward is to apply archivelogs.

1. If the archivelogs have already been deleted from the primary
We can restore them from backup. Suppose the RMAN catalog is stored in the database catdb; we can use this RMAN script to restore the archivelogs (ALLOCATE CHANNEL must be issued inside a RUN block):
connect target /
connect catalog catuser/catpass@catdb

run {
  allocate channel c1 type 'SBT_TAPE'
    parms 'ENV=(NB_ORA_CLIENT=prod1)' maxopenfiles 10;
  allocate channel c2 type 'SBT_TAPE'
    parms 'ENV=(NB_ORA_CLIENT=prod2)' maxopenfiles 10;
  # restore the archivelogs generated during the last day
  restore archivelog from time 'sysdate - 1' until time 'sysdate';
  release channel c1;
  release channel c2;
}
Adjust the 'from time' depending on how long the standby has been out of sync.
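If the restore step is generated by a script, the window can be parameterized. A minimal Python sketch with hypothetical helper names; restore_by_sequence is an alternative that restores an explicit range of log sequence numbers (on the standby, V$ARCHIVE_GAP reports such a range as LOW_SEQUENCE# and HIGH_SEQUENCE#). Please verify the generated syntax against the RMAN reference for your Oracle version:

```python
def restore_by_time(days_out_of_sync):
    """RMAN restore command covering the last `days_out_of_sync` days (may be fractional)."""
    if days_out_of_sync <= 0:
        raise ValueError('days_out_of_sync must be positive')
    return ("restore archivelog from time 'sysdate - %s' until time 'sysdate';"
            % days_out_of_sync)


def restore_by_sequence(low_seq, high_seq, thread=1):
    """RMAN restore command for an explicit range of log sequence numbers."""
    if low_seq > high_seq:
        raise ValueError('low_seq must not exceed high_seq')
    return ('restore archivelog from sequence %d until sequence %d thread %d;'
            % (low_seq, high_seq, thread))


print(restore_by_time(3))
# restore archivelog from time 'sysdate - 3' until time 'sysdate';
print(restore_by_sequence(1200, 1250))
# restore archivelog from sequence 1200 until sequence 1250 thread 1;
```

The output is meant to be pasted inside the RUN block, in place of the fixed one-day restore line.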

2. If the archivelogs are still on the primary
We can simply copy them to the standby server. After copying, we need to catalog the copied files:
RMAN> catalog start with '/tmp/logs_from_primary';
Otherwise Oracle cannot recognize these files.
Once we have the archivelogs, through either method 1 or method 2, we can roll the standby database forward by starting managed recovery:
SQL> recover managed standby database disconnect;

Saturday, 7 April 2012

Oracle: How to setup physical standby database

In a production environment, a physical standby database is often required. If the primary environment is lost, we can continue to serve customer requests from the standby database.

There are a few ways to set up an Oracle standby database. Whichever way you choose, make sure the following are ready:
  • Remote login is enabled, and the SYS password is the same on primary and standby; the password file can be created using orapwd.
  • Log shipping is enabled on the primary: define the proper TNS entry, set log_archive_dest_x, and set log_archive_dest_state_x to ENABLE.
  • The pfile or spfile is ready on the standby.

1. Using storage level replication
On the primary, put the database in backup mode. This makes sure the datafile copies replicated to the standby environment can be recovered to a consistent state:
SQL> alter database begin backup;
Split the replication at the storage level, then take the primary out of backup mode:
SQL> alter database end backup;
Create a standby controlfile:
RMAN> backup current controlfile for standby format '/tmp/stdby_ctrol_%U';
Copy the controlfile backup to the standby server and restore the controlfile there:
SQL> startup nomount;
RMAN> restore controlfile from '/tmp/stdby_ctrol_blahblah';

Mount the standby database and start redo apply:
SQL> alter database mount standby database;
SQL> recover managed standby database disconnect;
Note: make sure NOT to resync the standby environment at the storage level again; otherwise it will overwrite and destroy the standby database.

2. Using RMAN duplicate.
Connect to the primary database (target) and the standby database (auxiliary):
$ rman target / AUXILIARY SYS/sys_pwd@sbdb

Sendmail: Prevent your SMTP servers from being blacklisted or graylisted

Sometimes we need to send email updates to millions of customers, who may have registered with their office email or with a public email service such as Yahoo.
If you send a huge amount of email in a short period of time, you risk having your SMTP servers blacklisted or graylisted.

There are a few ways to keep your email servers working properly.

1. Use sendmail's greet_pause
In the sendmail m4 configuration file, define greet_pause as 500 milliseconds, then regenerate the configuration with m4:
m4 >
Then restart the sendmail service:
service sendmail restart
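By defining greet_pause as 500, you tell sendmail to wait 500 milliseconds after a client connects before sending its SMTP greeting, and clients that start talking (for example, sending EHLO) before the greeting are rejected. Every new SMTP session with this server is therefore slowed down, so when your applications submit mail to it over SMTP, this limits the rate at which you send email to the outside world and keeps you from flooding other email servers.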
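greet_pause throttles on the server side. If you also control the application that submits the mail, a similar effect can be achieved by pausing between messages. This is a different technique from greet_pause, sketched here in Python; send_throttled is a hypothetical helper, and send can be, for example, the sendmail method of an smtplib.SMTP connection:

```python
import time


def send_throttled(messages, send, pause=0.5, sleep=time.sleep):
    """Send (sender, recipients, body) tuples one by one, pausing between them.

    `send` is any callable taking (sender, recipients, body). pause=0.5 mirrors
    the 500 ms greet_pause above. Returns the number of messages handed to `send`.
    """
    sent = 0
    for sender, recipients, body in messages:
        if sent:
            sleep(pause)  # wait before every message except the first
        send(sender, recipients, body)
        sent += 1
    return sent


# Usage sketch (hypothetical host and addresses):
# import smtplib
# server = smtplib.SMTP('localhost')
# send_throttled([('news@example.com', ['user@example.org'], 'Subject: hi\r\n\r\nhello')],
#                server.sendmail)
# server.quit()
```

Injecting sleep makes the throttle easy to test without actually waiting.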

2. Rotate the IP addresses of your SMTP server periodically.
Suppose eth0 carries the management IP and eth1 is the default outgoing interface for mail, with three IP addresses reserved for it. You can write a script and run it from cron to rotate eth1 among these addresses every day.

But you have to make sure these two things:
  • These 3 IP addresses are translated to 3 different public IPs on your router or firewall.
  • The 3 IPs are legitimate addresses for sending email for the domain in your From header: define your DNS records and reverse resolution properly. Otherwise your mail may be rejected by other servers.
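The part of the cron script that chooses today's address can be as simple as cycling through the list by date. A Python sketch; the addresses below are hypothetical (from the 192.0.2.0/24 documentation range), and actually moving eth1 to the chosen address is left out because it depends on your distribution and network setup:

```python
import datetime

# Hypothetical addresses reserved for eth1; replace with your own three.
OUTGOING_IPS = ['192.0.2.11', '192.0.2.12', '192.0.2.13']


def ip_for_day(day, ips=OUTGOING_IPS):
    """Pick the outgoing IP for a date, moving to the next address each day."""
    return ips[day.toordinal() % len(ips)]


if __name__ == '__main__':
    # A daily cron job would call this, then reconfigure eth1 (not shown).
    print(ip_for_day(datetime.date.today()))
```

Using the date rather than a stored counter means the script keeps no state, so a missed cron run does not shift the rotation.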

Note: A few years ago, I managed 20+ email servers as part of my work, and every day we needed to send out tons of emails. Customers registered to receive email updates from us; I am not a spammer :)

Friday, 6 April 2012

Use Apache RewriteRule for SEO

Often some web pages need to be moved to a different location.
Let's say we have a page /food_recipes.html.

After maintaining the page for years, I decide to reorganize the website and put everything related to food in one directory. But over the years, the original page may have been bookmarked by hundreds of visitors and cached by many search engines. When someone accesses the original page, through either a bookmark or a search engine, we need to inform them that the page has moved to a new location.

To do this, there are a few ways:

1. Create a hyperlink in the original page, connecting to the new page.

<a href=/food/recipes.html>recipes at new location</a>

This requires visitors to click the link to view the new page, which is clearly not ideal.

2. Auto redirect the page to new location using javascript. 

<script type="text/javascript">
window.location.replace("/food/recipes.html");
</script>

3. Auto redirect the page to new location using  HTML meta tag. 

<meta HTTP-EQUIV="REFRESH" content="0; url=/food/recipes.html" />

Both 2 and 3 redirect the visitor to the new location, but they are client-side redirects: the server still answers the old URL with an ordinary page (HTTP 200), so search engines are not told that the page has moved permanently. To preserve search engine visibility, it's better to use an HTTP 301 (permanent) redirect. For static pages served by Apache, we can use RewriteRule to achieve this.

4. Auto redirect using Apache RewriteRule 

RewriteEngine On
RewriteRule ^/food_recipes\.html$ /food/recipes.html [R=301,L]

This tells search engines that /food_recipes.html has moved permanently to /food/recipes.html, so they will not treat /food/recipes.html as a new page.
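If a page is served by an application rather than by Apache, the same permanent redirect can be sent from code. A minimal Python WSGI sketch using only the standard library; app and the REDIRECTS table are hypothetical, with the pattern copied from the RewriteRule above:

```python
import re

# Same mapping as the RewriteRule: old path pattern -> new location
REDIRECTS = [(re.compile(r'^/food_recipes\.html$'), '/food/recipes.html')]


def app(environ, start_response):
    """Minimal WSGI app that answers old URLs with a 301 (permanent) redirect."""
    path = environ.get('PATH_INFO', '/')
    for pattern, target in REDIRECTS:
        if pattern.match(path):
            start_response('301 Moved Permanently', [('Location', target)])
            return [b'']
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']


# To try it locally:
# from wsgiref.simple_server import make_server
# make_server('', 8000, app).serve_forever()
```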

Sunday, 1 April 2012

Use PAM to enforce Linux password complexity

Audits usually require that system passwords are not too short and not easy to guess.
To enforce a minimum password length, we can set PASS_MIN_LEN in /etc/login.defs, for example PASS_MIN_LEN 8.
The next time a user changes their password, anything shorter than 8 characters will be rejected.

To enforce password complexity, we need to make sure each password contains uppercase letters, lowercase letters, special characters, and digits. This is easy to do with the PAM module pam_cracklib. Its man page describes the lcredit option like this:
$ man pam_cracklib
(N >= 0) This is the maximum credit for having lower case letters in the new password.
(N < 0) This is the minimum number of lower case letters that must be met for a new password.
So to require at least 1 lowercase letter in the password, we use a negative number: lcredit=-1.
To require at least 4 lowercase letters, 2 uppercase letters, 1 special character, and 1 digit, we can add the pam_cracklib module with these options to /etc/pam.d/system-auth:
password requisite pam_cracklib.so dcredit=-1 ucredit=-2 lcredit=-4 ocredit=-1
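To try candidate passwords against this policy before rolling it out, a simplified Python model of the rules can help. meets_policy is a hypothetical helper: it only models negative credits ("at least N characters of this class") plus the 8-character minimum, and ignores positive credits, pam_cracklib's own minlen arithmetic and its dictionary checks, so PAM itself remains the authority:

```python
import string


def meets_policy(password, min_len=8, dcredit=-1, ucredit=-2, lcredit=-4, ocredit=-1):
    """Simplified check: minimum length plus negative pam_cracklib-style credits.

    A credit of -N means "at least N characters of that class"; zero or
    positive credits impose no minimum in this model.
    """
    if len(password) < min_len:
        return False
    digits = sum(c in string.digits for c in password)
    upper = sum(c in string.ascii_uppercase for c in password)
    lower = sum(c in string.ascii_lowercase for c in password)
    other = len(password) - digits - upper - lower  # everything else counts as "other"
    for count, credit in ((digits, dcredit), (upper, ucredit),
                          (lower, lcredit), (other, ocredit)):
        if credit < 0 and count < -credit:
            return False
    return True


print(meets_policy('abcdEF1!'))  # True
print(meets_policy('abcdefg1'))  # False: no uppercase or special character
```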