Jim Hannan (@HoBHannan), Principal Architect
At House of Brick we specialize in virtualizing Tier-1 workloads like Oracle and SQL Server. If you look back at our history our founders built the company around performance tuning services. As part of these services we not only have proficiencies with the Oracle database, but also with platform tuning. Fast-forward to today and those fundamentals still apply. We have added Linux, vSphere, Windows and SQL Server as part of our core competencies as well. Included in that core is a strong understanding of storage and storage performance. As any DBA will tell you, storage is crucial part of any database performance profile.
As an administrator of the database, storage or platform (Linux, vSphere, RISC-Unix) what things should you be thinking about? It is critical to understand your environment’s demand for Input/Output (I/O). In our experience the demand is never a static demand, but always changing and probably steadily increasing as applications and data grow. Understanding that demand and peak throughput is important for maintaining service level agreements (SLAs) and the performance of your databases.
What tools can you use to help calculate demand? A few of the tools we would like to discuss in this three part blog series are Oracle Calibrate I/O, NMON, vSphere I/O graphs and AWR. Specifically, we will be looking at Input/Output operations per second (IOPS), which is almost a universal metric, meaning that most of the infrastructure components offer it as a metric to monitor your databases.
Why IOPS? First, as mentioned previously, it is a universal metric available at most layers. Second, databases always have some degree of random I/O. Random I/O can be difficult to measure because it is so conditional. Tools that measure straight throughput or sequential reads and writes in MB/s typically only tell part of the story. IOPS can give you a very accurate measurement of I/O demand. Lastly, it is the language of the storage vendors and storage administrators – they will appreciate your efforts to give them data and metrics that are workable for them to do sizing and performance forecasting.
Our approach for measuring our IOPS will be simple but effective. This is something we do as part of many of our services including right-sizing workloads, storage sizing and performance troubleshooting. The high-level steps are:
- Benchmark IOPS
- Determine Peak Periods
- Collect Results
- Record Results (for Future Analysis)
- Schedule Ongoing Benchmarks
We talked about IOMeter before in a previous blog. IOMeter is a fantastic tool, but we want to look at something that could actually be driven by the Oracle kernel. That tool is Calibrate I/O. Calibrate I/O is a tool provided by Oracle as part of the DBMS_RESOURCE_MANAGER.CALIBRATE_IO package. The procedure was supplied as of 11g. Calibrate I/O is able to do read-only I/O testing for both small random I/O and large sequential I/O. Oracle does a excellent job summarizing the tool in the paragraph below, from Oracle Doc ID 727062.1:
The Calibrate I/O feature is based on a PL/SQL function called DBMS_RESOURCE_MANAGER.CALIBRATE_IO(). When Calibrate I/O is invoked it will generate I/O intensive read-only random I/O (db_block_size) and large-block (1MByte) sequential I/O workloads. Unlike various external I/O calibration tools, this tool uses the Oracle code stack and runs in the database, issuing I/O against blocks stored in the database. The results, therefore, much more closely match the actual database performance. Once the workload execution is completed, a summary of the results is provided.
Here are prerequisites for running calibrate_io:
- SYSDBA privilege is required
- timed_statistics must be set to TRUE
- Asynchronous I/O must be enabled (this can be accomplished by either setting FILESYSTEMIO_OPTIONS to SETALL or running ASM)
We recommend that you first run calibrate_io in a non-production database. This will allow you to get comfortable with the tool and the potential impact it may have on I/O performance.
What does the test do? Oracle describes the tool as running two steps:
- Random block-sized reads, with the default being 8 KB to ALL datafiles on the database instance. This test is for testing the maximum IOPS and is designed for predicting maximum OLTP database throughput.
- The second step issues large 1 MB reads to ALL datafiles. This step gets recorded in MB/s. This step is designed to test for maximum data warehouse throughput. In a recent Oracle white paper, it is suggested that this step, in addition to random 1 MB test, does large sequential I/O testing. We believe this to be accurate when comparing the MB/s results with sequential I/O testing from tools like dd and IOMeter.
A few observations we have noticed while running the tool:
- If you use storage tiering, the results will look much different after tiering takes place. Typically, tiering moves frequently accessed blocks to faster storage in the late evening hours. This can greatly impact the results; it is good to be aware of this type of event. This does not mean this is not a fair test – it is actually what the storage will do with frequently accessed data. In fact, it can be used as a discovery of what to expect at each storage tier. Notice that we capitalized ALL datafiles in the previous section. This is to stress the fact that we will potentially be reading blocks that may not be frequently accessed.
- Our second observation is that the tool does some ramping, the typical test lasts about 5 minutes with heavy I/O activity in a short duration. You can observe the tool ramping by using IOPS tools like NMON, vSphere, or your storage array tools. There is also a setting to cap the test results and I/O demand. This setting is maximum latency – when the tool hits the maximum latency it ramps down, not ending the test, but recalibrating it to determine the maximum throughput given the defined parameters.
Now lets run our first test with calibrate_io. You will notice two optional settings:
We recommend setting both parameters. DISKS define the number of disks that we have in our LUN or LUNs. With new storage array technologies and huge LUNs this can be challenging to count. In this case a good starting point is 20.
The MAX_LATENCY is a cool feature, as we discussed in our observations, it allows for setting a latency cap.
Here is our sample run of MAX_LATENCY set to 20 and DISKS set to 20. This is a good test to start with before tweaking the settings.
SET SERVEROUTPUT ON DECLARE lat INTEGER; iops INTEGER; mbps INTEGER; BEGIN -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (, <MAX_LATENCY>, iops, mbps, lat); DBMS_RESOURCE_MANAGER.CALIBRATE_IO (20, 20, iops, mbps, lat); DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops); DBMS_OUTPUT.PUT_LINE ('latency = ' || lat); dbms_output.put_line('max_mbps = ' || mbps); end; /
The results are recorded in the table: dba_rsrc_io_calibrate.
Here is a video of us running the first calibrate_io test and retrieving the results. For the sake of saving time we are going to show the results from the previous calibrate_io run after kicking off calibrate_io. Also note we used NMON during the middle of the video to note the I/O activity.
This ends part one. In our next blog post we will cover recording the results, scripting the testing and using NMON to measure IOPS. If you have questions I can be reached via Twitter at @HoBHannan.