HammerDB Capture and Replay for Oracle
Jonas Mason, Senior Consultant
This blog covers using the capture replay feature of HammerDB, and helps the Oracle DBA and/or Tech Manager determine if this tool could provide value to their organization.
During the work that prompted my writing this blog, I generated trace files on a production Oracle 10g instance and replayed the converted trace files on an Oracle 12c development environment. The client’s original goal was to determine if this application’s SQL could be replayed on 12c without a significant rewrite that requires both QA and development time. The objective wasn’t as focused on changes in performance and explain plans between 10g and 12c, although this was considered an added benefit. Oracle’s Real Application Testing option would have been an excellent solution in this scenario, but was dismissed due to it’s high cost.
The free capture and replay feature of HammerDB was valuable for functional testing and yielded results that were useful. Statspack snapshots taken during SQL replay periods did provide some insight into instance and SQL wait events encountered. While SQL execution iteration and virtual user counts on replay scripts can be increased for performance and load testing purposes, this was not attempted in this scenario. While scaling up the execution of select statements in this way is feasible, scaling DML the same way is not possible.
HammerDB’s capture replay feature assists with functional testing, and when coupled with other synthetic tests such as Swingbench, SLOB or Calibrate I/O, can provide useful datapoints when migrating or upgrading an Oracle database.
HammerDB requires that trace files be generated at the source with ‘10046 trace name context forever, level 4’. These trace files are then converted by HammerDB to tcl scripts. In this case, I captured an hour of production activity for our client by enabling the trigger below provided by HammerDB in their documentation. Update the trigger code with the user you wish to capture session activity for:
after logon on database begin if (user = '<USER_NAME_TO_BE_CAPTURED>') then execute immediate ‘alter session set events ''10046 trace name context forever, level 4'''; end if ; end; /
I had to increase the size of the max_dump_file_size on the source instance, as this was originally set small enough that generated trace files were truncated with no useful information. So, I reset the max_dump_dest size parameter to its default (UNLIMITED) prior to starting my trace capture the second time. I also confirmed that I had significant storage for the dump files created, and monitored this as trace files were generated.
In this case, I generated 828 trace files of varying sizes that consumed a total of 4.2 GB. Converting this number of trace files to tcl scripts was not practical, so I eliminated both small and very large trace files from the manual conversion task. I eliminated small trace files, as they didn’t contain much SQL anyway. I also eliminated very large trace files, as they would take a long time to convert and often failed.
When converting the trace files, I found it was very important to be patient as 1) the trace file loaded, and 2) the trace file was converted to the tcl script. The GUI was fairly responsive considering the xwindows session to my desktop over VPN.
I did try concatenating the trace files in batches of 5 and 10, in hopes of converting more trace files easily. However, the concatenated files failed to parse and convert to a single tcl script.
I ended up with 34 tcl script files for replay on the 12c environment as an initial batch. Converting all 828 trace files to tcl scripts would have required significantly more time and manual effort, and would not have yielded much additional value. Given that application sessions often contain the same SQL statements, a random sample that could direct our focus was sufficient.
TCL SCRIPT REPLAY
Prior to starting the replay of the tcl scripts, I took a Statspack snapshot on the 12c dev instance so I could get a sense of the performance of the SQL being executed during replay.
I created one virtual user for each tcl script that I loaded for replay, with 10 iterations of the script, per test. I chose 10 so that multiple executions of a statement would be more likely to register in Statspack. I wasn’t conducting a performance test, so I didn’t want to increase either my virtual user count or iteration count significantly. Increasing virtual user count and iteration count increases testing time, and my primary objective was to execute the 34 tcl files as efficiently as possible.
Replaying the tcl scripts against the 12c environment yielded some valuable information, including:
- Error in Virtual User 1: ORA-00918: column ambiguously defined
- Error in Virtual User 1: invalid command name “SET”
- ORA-01790: expression must have same datatype as corresponding expression
I had not restored the production 10g instance to the 12c dev instance at of the point of time at which I started the SQL trace capture. Even if I had, this wouldn’t have been useful, as HammerDB replay does not allow for all transactions in multiple sessions to be replayed in the correct order.
As a result of these limitations of HammerDB (and eliminating the point in time recovery), unique constraint errors were common during SQL replay, which we were prepared for, as shown below.
Error in Virtual User 1: ORA-00001: unique constraint (<PK_NAME>) violated
We had these for four different tables/primary keys, and were at least satisfied that the insert statement parsed prior to hitting this unique constraint error.
However, select statements ran without issue for the most part.
When done with the execution of all playback scripts, I issued another Statspack snapshot, and generated a report to view the output.
Next Step Recommendations
The next step in this process was to identify the SQL statements in the tcl files that caused the ORA-00981, ORA-01790, and SET errors. These SQL statements require development attention prior to moving to 12c. If this SQL can be modified to work in 10g and 12c, the code can be released in production for capture, and then replayed in 12c to confirm success.
Pros and Cons
The capture replay feature can be used to capture SQL executed in production by the application, or executed against a dev or test environment by QA. A DBA, developer, or QA resource can then replay portions of the captured workload with very little effort.
HammerDB is better suited for the capture and replay of SQL executed by a smaller subset of sessions. While all production activity can be captured, there is no way to replay DML and selects in the same sequential order, even if the replay instance was restored to the exact point of time that the capture commenced. This makes replaying DML without error problematic.
There was also no way with HammerDB to replay the 823 converted trace files in the correct order, on another instance. As an alternative, Oracle’s Real Application Testing tool excels at capturing and replaying entire database workloads.
HammerDB’s capture and replay features are useful for more limited functional testing, and can help answer questions about the scope of coding effort required. However, keep in mind that HammerDB’s capture replay features are not to be used for performance load testing reasons for most shops (unless testing the scalability of select statements is the goal).
In the end, it was decided that the use of HammerDB’s capture replay feature for limited functional testing should be coupled with other synthetic performance tests, in order to determine if the new instance will perform as well or better than legacy.