Oracle Calibrate I/O – Part 2

posted September 4, 2015, 11:23 AM by

Jim Hannan (@HoBHannan), Principal Architect

In Part 1 of this blog, published in June of last year, we discussed the value of having I/O baselines and some of the tools available to administrators and DBAs for testing I/O. Additionally, we detailed how to test IOPS using calibrate_io. In this blog, we will look at scheduling calibrate_io to run as a cron job. We will also record the calibrate_io results to a text file for future baseline analysis and I/O troubleshooting.

The script in Table 2 is our calibrate_io script for automating and scheduling runs. Why would you want to schedule a regular test? We find that keeping baselines from regular storage tests is very valuable for troubleshooting purposes. For example, I schedule a job to run once a week on Saturday night (the test runs for 5 minutes). On the first Saturday, my MAX_IOPS = 21380 and MAX MB/s = 337, but on the following Saturday the results drop to MAX_IOPS = 11179 and MAX MB/s = 166. Clearly the I/O throughput has dropped, and this would warrant an investigation into what is causing the slower I/O. Additionally, over time you may see a slow decline in performance, which could be attributed to the SAN or array becoming over-provisioned with workloads – again, another good use of baselines.

Taking a look at Table 1, we see the output of the script hobcalio.sh after 4 runs. Unfortunately, while Oracle does write calibrate_io output to the DBA_RSRC_IO_CALIBRATE view, it overwrites the single row after each test, so no history is kept in the database itself.
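
If you just want to see what the view currently holds (only the most recent run survives), a quick check from SQL*Plus looks like the sketch below; it selects the same DBA_RSRC_IO_CALIBRATE columns that the script in Table 2 reads:

sqlplus -s "/ as sysdba" <<EOF
-- only one row is kept: the results of the most recent calibrate_io run
select max_iops, max_mbps, max_pmbps, latency, num_physical_disks
from dba_rsrc_io_calibrate;
EOF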

When writing up this blog, I worked with my coworker Jeff Klemme. Originally, I suggested we capture the results in a table in a new schema. Jeff quickly talked me out of this because of the overhead, and he also noted that most DBAs will be reluctant to create a new schema in their production database. Keep things simple, right? So, we decided instead to write the data to a flat file. The script allows you to pick the flat file name. In my example, shown in Table 1, the database name is PIG, so I chose the file name PIG.results.

Table 1 – The Output File (in CSV format) From hobcalio.sh

[oracle@oel6-11g-pig ~]$ cat PIG.results
INSTANCE, START TIME, END TIME, MAX_IOPS, MAX_MBPS, MAX_PMBPS, LATENCY, DISKS
PIG,"20150714 09:35:31","20150714 09:43:01",21380,337,340,6,20
PIG,"20150714 09:43:56","20150714 09:51:26",21288,342,339,3,20
PIG,"20150714 12:55:21","20150714 13:03:22",20949,333,340,4,20
PIG,"20150903 13:18:05","20150903 13:25:35",21092,334,310,4,20

 

Table 2 shows the script for running calibrate_io. Jeff added a few parameters that you can specify; the usage is shown below, followed by an example invocation:

Usage: hobcalio.sh [options]

This program executes the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure within Oracle to benchmark the I/O performance and save the information for analysis over time.

-d   Specify the number of disks in a LUN, DEFAULT 20
-f   Specify the output file, DEFAULT "data/calibrate_io_stats.csv"
-h   This help message
-i   Instance name, REQUIRED
-l   Specify the max latency (in ms), DEFAULT 20
-o   Set the ORACLE_HOME directory
-v   Turn on verbose output
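
For example, a run against the PIG instance like those recorded in Table 1 might be launched as shown here (the option values simply restate the defaults, and PIG.results is the file name chosen earlier; adjust the path for your environment):

./hobcalio.sh -i PIG -d 20 -l 20 -f PIG.results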

Lastly, you can schedule a job through crontab to record results; a sample entry is shown below. We recommend once a week for a baseline. As with any I/O testing tool, be careful about when you run your tests, especially the first few times. As you get more familiar with the test and the performance of the array, you can look for times to run it during normal business processing. Let us know if you have any questions in the comments below or contact me at @HoBHannan.
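
As a sketch only (the script and log paths below are placeholders, and cron supplies a minimal environment, so pass -o if ORACLE_HOME differs from the script's built-in default), a weekly Saturday-night entry might look like this:

# Run calibrate_io every Saturday at 22:00 and append the results to the CSV file
0 22 * * 6 /home/oracle/scripts/hobcalio.sh -i PIG -f /home/oracle/scripts/data/PIG.results >> /home/oracle/scripts/hobcalio.log 2>&1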

Table 2 – hobcalio.sh

#!/bin/bash
#

#######################################################################
# Initialize/Default Variables
#######################################################################
DEBUG='N'
DISKS=20
MAXLATENCY=20
DATADIR=data
DATAFILE=""

#######################################################################
# Function to display usage
#######################################################################
usage()
{

echo \
"
Usage: ${0##*/} [options]

This program executes the DBMS_RESOURCE_MANAGER.CALIBRATE_IO procedure within
Oracle to benchmark the I/O performance and save the information for analysis
over time.

-d   Specify the number of disks in a LUN, DEFAULT 20
-f   Specify the output file, DEFAULT \"data/calibrate_io_stats.csv\"
-h   This help message
-i   Instance name, REQUIRED
-l   Specify the max latency (in ms), DEFAULT 20
-o   Set the ORACLE_HOME directory
-v   Turn on verbose output
"
exit 1
}

#######################################################################
# Function to run the process and grab the results
#######################################################################

RUN_CALIO ()
{
DATA=`${ORACLE_HOME}/bin/sqlplus -s "/ as sysdba" <<EOF
set serverout on size 1000000 linesize 132 pagesize 0 heading off feedback off
set termout off echo off trimspool on
WHENEVER SQLERROR EXIT FAILURE
declare
v_disks pls_integer := ${DISKS};
v_maxlatency pls_integer := ${MAXLATENCY};
v_iops pls_integer;
v_mbps pls_integer;
v_lat pls_integer;

cursor c1 is
select to_char(start_time,'YYYYMMDD HH24:MI:SS') start_time,
to_char(end_time,'YYYYMMDD HH24:MI:SS') end_time,
max_iops,
max_mbps,
max_pmbps,
latency,
num_physical_disks
from dba_rsrc_io_calibrate;

BEGIN
-- first run the procedure
-- dbms_resource_manager.calibrate_io(<DISKS>, <MAX LATENCY>, iops, mbps, lat);
dbms_resource_manager.calibrate_io(v_disks, v_maxlatency, v_iops, v_mbps, v_lat);

if nvl(v_iops,0) > 0 then
-- Looks like we have some data
-- get the results in a comma-delimited string

for c1_rec in c1 loop
dbms_output.put_line('${ORACLE_SID}' || ',' || '"' || c1_rec.start_time ||
'","' || c1_rec.end_time || '",' || c1_rec.max_iops || ',' ||
c1_rec.max_mbps || ',' || c1_rec.max_pmbps || ',' ||
c1_rec.latency || ',' || c1_rec.num_physical_disks);
end loop;
end if;
END;
/
EOF`

RC=$?
} # END RUN_CALIO

#######################################################################
#
# MAIN
#
#######################################################################

#######################################################################
# Process the cmd line options
#######################################################################

while getopts d:f:hi:l:o:v opt
do
case $opt in
d)
DISKS=${OPTARG}
;;
f)
DATAFILE=${OPTARG}
;;
h)
usage
;;
i)
INSTANCE=${OPTARG}
;;
l)
MAXLATENCY=${OPTARG}
;;
o)
ORACLE_HOME=${OPTARG}
;;
v)
DEBUG='Y'
;;
*)
usage
;;
esac
done
shift $(( OPTIND - 1 ))

# set the oracle home dir and SID
export ORACLE_HOME=${ORACLE_HOME:-/usr/lib/oracle/xe/app/oracle/product/11.2.0/server}
export ORACLE_SID=${INSTANCE}

if [ -z "${INSTANCE}" ]; then
echo "ERROR: Must pass in instance name"
usage
fi

if [ ! -x ${ORACLE_HOME}/bin/sqlplus ]; then
echo "ERROR: Invalid ORACLE_HOME"
usage
fi

# Default the filename if not passed in
if [ -z "${DATAFILE}" ]; then
# Create the "data" directory if it doesn't exist
if [ ! -d ${DATADIR} ]; then
mkdir ${DATADIR}
fi
DATAFILE=${DATADIR}/calibrate_io_stats.csv
fi

# Run the CalibrateIO procedure
RUN_CALIO
if [ ${RC} -ne 0 ]; then
echo "There was a problem running the Calibrate IO procedure."
echo "${DATA}"
exit ${RC}
elif [ -n "${DATA}" ]; then
# If this is a new datafile, put a header record in first
if [ ! -f "${DATAFILE}" ]; then
echo "INSTANCE, START TIME, END TIME, MAX_IOPS, MAX_MBPS, MAX_PMBPS, LATENCY, DISKS" > "${DATAFILE}"
fi

echo "${DATA}" >> ${DATAFILE}

if ["${DEBUG}" == "Y" ]; then
echo "instance, start time, end time, max_iops, max_mbps, max_pmbps, latency, disks"
echo "${DATA}"
fi

exit 0
else
# We had a problem somewhere so exit with non-zero return code
exit 1
fi

3 Comments

  • Jim Hannan says:

    Matt, good question. A few years ago I helped a customer troubleshoot why their Oracle performance had slowed down over the last several weeks. We ran calibrate_io across many of their volumes (and databases) at various times during the day with a latency of 20 ms to address Mike Yakus’ concern (if a latency of 20 ms is reached, the test will ramp down and back up). Over the 3 days of testing we ranged between 20 and 100 for num_physical_disks. The results did not vary for us, so we standardized on 20. We did notice that setting it below 20 did degrade our results.

  • Matt Kaberlein says:

    Jim,

    Since a single LUN can distribute its DB data across a pool of (let’s say) 200 disks, what would you set the parm num_physical_disks to?

    Since we live in a world of pool-based storage vs. RAID-group-based storage, I suspect the setting of this parm needs some special attention.

    Also, you guys ever look at SLOB as an IO generation tool?

  • Mike Yakus says:

    We used I/O Calibrate to debug our new EMC SAN installation as we migrated from an HP SAN. Saving information over time was invaluable in demonstrating adverse changes. Also, a warning for production systems: I/O Calibrate is a benchmarking tool. As such, it performs as much I/O as it possibly can, making it inadvisable to run on a production system while other people are using it. We ran it during quiet periods (weekends, evenings after batch, etc.), which gave much more consistent results than if tested during times when the database and server were active. Good article!
