Convert Your CLOBs to VARCHAR2 in Oracle 12c

Michael Stone (@HoBMStone), Lead Architect and CIO

Size of [N]VARCHAR2, RAW Increased 8x!

One easily overlooked new feature of 12c is an increase in the size of the VARCHAR2, NVARCHAR2, and RAW data types. These data types previously allowed up to 4K of data to be stored in them, but can now accommodate 32K (8 times more).

VARCHAR2 vs. NVARCHAR2
You wouldn’t be alone if you were wondering; what exactly is the difference between the two VARCHAR2 data types?

The VARCHAR2 data type was introduced around 1989 in Oracle 6, and can store character-based data in the database character set (NLS_CHARACTERSET).

The NVARCHAR2 data type allows us to use Unicode for some columns, regardless of the database character set. The NVARCHAR2 is a Unicode-only data type and typically contains data in the NLS_NCHAR_CHARACTERSET format.

One reason you may want to use NVARCHAR2 is that your DB uses a non-Unicode character set, and you still want to be able to store Unicode data for some columns without changing the primary character set. Another reason may be that you want to use two Unicode character sets (AL32UTF8 for data that comes mostly from western Europe, and AL16UTF16 for data that comes mostly from Asia for example) with each one being stored in the most efficient manner in columns of the appropriate data type.

Significance of 32K
Yeah – so what? 32,767 bytes (yawn). SQL Server can store 2GB in VARCHAR (Max)!

Ignoring the fact that many historical events were accomplished with computers having a total of 32K of memory or less, what does that number mean to us today in 2016?

According to analysis performed on numerous Internet sites, the average word length in English text is about five characters. Although many words are longer, shorter words are used much more frequently. Allowing an additional 1.5 characters for the space and punctuation between words, means that in 32K (with an 8-byte character set), we can store about 5,000 words of written English text. At 130 words spoken per minute, that is approximately a 38-minute speech.

In addition to natural languages, many computer messaging formats are text-based, and 32K represents an 8x increase compared to what we could store in previous versions of Oracle VARCHAR2 columns. That means that many computer applications can now be simplified, and potentially benefit from dramatically improved performance by storing character data in-line as opposed to in Large Objects as previously required for anything larger than 4,000 bytes.

CLOB vs. VARCHAR2
Prior to 12c, any data over 4K had to be stored in Large Objects (LOBs) with the character variant being a Character Large Object (CLOB). These are inherently less efficient than native data types, which are stored in-line with the data row. LOBs have special storage requirements, and may even reside outside of normal table spaces. Working with LOBs is also … clunky.

In addition to the performance benefits and simplification of your application code, there are more built-in functions, as well as operators, for working with VARCHAR2 than with CLOB data. So, if your existing CLOB data values are less than 32K, it probably makes sense to consider a data conversion as part of your Oracle 12c upgrade. This will be especially true for CLOB data that is inserted, updated, or manipulated frequently.

Configuration
In order to enable the increased size limit in 12c, we have to set the initialization parameter MAX_STRING_SIZE = “EXTENDED” (as opposed to “NORMAL”).

Once that change is made, you will be able to create columns and declare PL/SQL data types of VARCHAR2(32767).

Unfortunately, to upgrade an existing database is not quite as simple as it sounds. The change can only be made when the database is in upgrade mode, and a support script must be run to upgrade the data dictionary in conjunction with the parameter change. Be warned, there is no way back once you’ve made the change!

Traditional DB Example:

SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP UPGRADE
SQL> ALTER SYSTEM SET max_string_size=extended;
SQL> @?/rdbms/admin/utl32k
SQL> SHUTDOWN IMMEDIATE
SQL> STARTUP

 

Pluggable DB Example:

SQL> ALTER SESSION SET CONTAINER = {Normal DB};
SQL> ALTER PLUGGABLE DATABASE CLOSE IMMEDIATE;
SQL> ALTER PLUGGABLE DATABASE OPEN UPGRADE;
SQL> ALTER SYSTEM SET max_string_size=extended;
SQL> @?/rdbms/admin/utl32k
SQL> ALTER PLUGGABLE DATABASE CLOSE;
SQL> ALTER PLUGGABLE DATABASE OPEN;

 

Converting Columns
All right, so I’m convinced. How painful is this going to be?

First, you’ll want to figure out if you have any CLOB values over 32K by running a simple query:

SQL> select count(1)
from {your_table}
where dbms_lob.getlength( {CLOB_Column} ) > 32767

 

If you get back a count of ”0”, then you’re in good shape. Otherwise, you’ll have to figure out how best to continue. Perhaps you can transform or truncate the longer ones, or maybe you have to scrap the idea altogether.

Assuming you continue (and you’ve upgraded your database as described above), then it’s a matter of the following steps:

 

SQL> alter table {your table} add {VARCHAR2_Column} varchar2(32767);
Table altered.


SQL> update {your table} set {VARCHAR2_Column} = {CLOB_Column};
17000 rows updated.


SQL> commit;
Commit complete.


SQL> alter table {your table} drop column {CLOB_Column};
Table altered.


-- Optionally rename the new column

SQL> alter table {your table}
rename column {VARCHAR2_Column} to {CLOB_Column};

Table altered.

 

To truncate text in the CLOB to the maximum length of VARCHAR2, change the update query to:

SQL> update {your table}
set {VARCHAR2_Column} = dbms_lob.substr( {CLOB_Column},
amount=>32767, offset=>1);

 

Additional Considerations

  1. Once you upgrade a database, there’s no way back other than to restore from backup.
  2. If you want to index one of your new larger VARCHAR2 columns, you may run into restrictions on the maximum size of an indexed column, which is about 75% of the index tablespace block size. In such cases, you will have to move or create the index in a tablespace with a larger block size.
  3. You may need to review any application code that works with LOB columns
    1. Insert and updates may need to ensure values are not larger than 32K, but otherwise remain the same.
    2. Although many functions are agnostic, or overloaded to work with either CLOB or VARCHAR2, some code may need to be changed if using LOB-specific functionality.
  4. You should always perform a full regression test of any applications when making structural changes to your database.

Conclusion

In summary, the benefits of converting CLOB columns sized 32K or less to VARCHAR2, and the benefits include:

  1. Improved performance for DML involving character objects
  2. Ability to index the column values
  3. Simplification of application code
  4. Reduced reliance on PL/SQL procedures for manipulation of column values
  5. Increased functionality for VARCHAR2 vs. CLOB

 

Table of Contents

Related Posts