Gathering Statistics in Parallel – Oracle Standard Edition

Andy Kerber (@dbakerber), Senior Consultant

Gathering statistics for the Oracle database is a common task for DBAs.  Quite often, an experienced DBA will discover that the standard Oracle method of gathering statistics skips tables that need to be analyzed frequently (a common problem on large tables), or gathers stats on tables that you don’t want to gather stats on.  In addition, gathering statistics can take a significant amount of system resources, and can run for long periods of time. This is especially the case with Oracle Standard Edition, where statistics cannot be gathered in parallel.

Fortunately, Oracle offers a solution to all of these problems in a single Oracle feature, the Oracle scheduler.  While using the scheduler will not allow us to gather statistics on an individual table in parallel, it will allow us to gather statistics on multiple tables at the same time, which can be a very reasonable facsimile of gathering statistics in parallel.

There are two methods of managing what we will call customized stats gathering.  It can be done entirely at the schema level, or on a table-by-table basis.  Gathering on a table-by-table basis generally requires that the stats gathering process be table driven, because of the typically large number of tables involved.  We will use the table driven method in our example.

Below is the layout of our driver table, called STATS_DRIVER:

OWNER       VARCHAR2(60)
TABLE_NAME     VARCHAR2(60)
GROUP_NUMBER       INTEGER
ESTIMATE_PERCENT       NUMBER
LAST_RUN_DATE       DATE
RUNNING_NOW     CHAR(1)

 

In the case of a limited window size, the group_number is designed to allow the DBA to prioritize the stats gathering job. The script runs through the groups, gathering stats on all tables in each group simultaneously, starting with group 1.

Next, we write a stored procedure that will use the driver table to gather statistics. Using the group number allows us to prioritize individual groups of tables. If you do not want to prioritize, just use one group number. Below is the stored procedure:

procedure hob.analyze_by_group(max_parallel in integer) authid current_user is
/* This Stored procedure is used to analyze groups of tables in parallel */
job_exists number;
max_group_number integer;
loopvar number;
num_running integer;
jobcount integer:=0;
begin
select max(group_number) into max_group_number from stats_driver; -- Lets us know how far we need to loop
for loopvar in 1..max_group_number
loop
for analyze_cursor in -- Cursor will be used to generate the analyze jobs and command.
(select owner, table_name, estimate_percent, group_number from stats_driver where group_number=loopvar)
loop
jobcount:=jobcount+1; -- Will be used to make sure we have a unique job name
select count(1) into job_exists from all_scheduler_jobs where job_name='ASYNC_ANALYZE_'||analyze_cursor.owner||'_'||to_char(jobcount);
if job_exists >0       -- Drops job if it currently exists, Put this in for debugging, but it doesnt hurt to leave it in.
then
dbms_scheduler.drop_job(job_name=>'ASYNC_ANALYZE_'||analyze_cursor.owner||'_'||to_char(jobcount),force=>true);
end if;
dbms_scheduler.create_job( -- Generates analyze command
job_name=>'ASYNC_ANALYZE_'||analyze_cursor.owner||'_'||to_char(jobcount),
job_type=>'PLSQL_BLOCK',
job_action=>'BEGIN dbms_stats.gather_table_stats(ownname=>'||''''||analyze_cursor.owner||''''||',tabname=>'||''''||analyze_cursor.table_Name||''''||',estimate_percent=>'||
analyze_cursor.estimate_percent||',cascade=>TRUE'||'); END;',
comments=>'Asynch analyze job for tables',
auto_drop=>true, --set this to false for debugging purposes, job will remain on dba_scheduler_jobs view until manually removed
enabled=>true); --can also be set to false if you wish to start job manually for debuggin
-- Wait here until we have room to start the next job
num_running:=max_parallel; -- This section is used to ensure that we dont have more jobs running at a time than we want.
select count(1) into num_running from all_scheduler_jobs where job_name like 'ASYNC_ANALYZE%';
while num_running >= max_parallel
loop
dbms_lock.sleep(1);
select count(1) into num_running from all_scheduler_jobs where
 job_name like 'ASYNC_ANALYZE%';
end loop;
end loop;
end loop;
end;

 

Alternatively, you could choose to analyze tables by schema, which is probably the more common method. I have rewritten the stored procedure above to use all_tables and the schema name. The revised procedure is shown below:

procedure hob.analyze_by_schema(in_schema in varchar2, max_parallel in integer) authid current_user is
/* This Stored procedure is used to analyze groups of tables in parallel. Each table in the schema entered will be analyzed. All_tables will be the driver table */
job_exists number;
loopvar number;
num_running integer;
jobcount integer:=0;
begin
for analyze_cursor in -- Cursor will be used to generate the analyze jobs and command.
(select owner, table_name from all_tables where owner=in_schema)
loop
jobcount:=jobcount+1; -- Will be used to make sure we have a unique job name
select count(1) into job_exists from all_scheduler_jobs where job_name='ASYNC_ANALYZE_'||analyze_cursor.owner||'_'||to_char(jobcount);
if job_exists >0       -- Drops job if it currently exists, Put this in for debugging, but it doesnt hurt to leave it in.
then
dbms_scheduler.drop_job(job_name=>'ASYNC_ANALYZE_'||analyze_cursor.owner||'_'||to_char(jobcount),force=>true);
end if;
dbms_scheduler.create_job( -- Generates analyze command
job_name=>'ASYNC_ANALYZE_'||analyze_cursor.owner||'_'||to_char(jobcount),
job_type=>'PLSQL_BLOCK',
job_action=>'BEGIN dbms_stats.gather_table_stats(ownname=>'||''''||analyze_cursor.owner||''''||',tabname=>'||''''||analyze_cursor.table_Name||''''||',estimate_percent=>100,cascade=>TRUE'||'); END;',
comments=>'Asynch analyze job for tables',
auto_drop=>true, --set this to false for debugging purposes, job will remain on dba_scheduler_jobs view until manually removed
enabled=>true); --can also be set to false if you wish to start job manually for debuggin
-- Wait here until we have room to start the next job
num_running:=max_parallel; -- This section is used to ensure that we dont have more jobs running at a time than we want.
select count(1) into num_running from all_scheduler_jobs where job_name like 'ASYNC_ANALYZE%';
while num_running >= max_parallel
loop
dbms_lock.sleep(1);
select count(1) into num_running from all_scheduler_jobs where job_name like 'ASYNC_ANALYZE%';
end loop;
end loop;
end;

 

In summary, Oracle statistics can be gathered much more quickly in parallel. While Oracle does not provide this capability in Oracle Standard Edition, with a little work, statistics can be gathered in parallel to allow for rapid statistics gathering.

Please note: this blog contains code examples provided for your reference. All sample code is provided for illustrative purposes only. Use of information appearing in this blog is solely at your own risk. Please read our full disclaimer for details.

 

Table of Contents

Related Posts