Jeff Klemme, Principal Architect
Many times we come across scripts at client sites that load data from external files into the database. Too often we find that these data files get loaded into “temporary” staging tables in the database, where another process will then evaluate, massage, and store the relevant information in the “real” tables. One of the features of the Oracle database that makes this process more efficient is external tables.
External tables were introduced in Oracle 9i as a way to query information from a flat file without having to load the data into a staging table. David Woodard, one of the co-founders of House of Brick, wrote an article about this feature in September 2005 and the basics of external tables remains the same. For this blog, I would like to expand on a few of the topics in the original article and remind readers that the feature is available. In a future post, I will discuss some of the new abilities of external tables.
Tab-Delimited Files
During external table creation, you specify the format of the flat file. Many times the data is in a fixed-position or comma-delimited format as seen in David’s original article. However, one format that almost always trips me up and causes me to look up the syntax for is tab-delimited files. If the fields in your file are tab-delimited, use the following syntax for specifying the delimiter:
… ACCESS PARAMETERS (FIELDS TERMINATED BY X'09') …
Dynamic Filenames
When an external table is created, we specify the name of the external file to process, as shown below.
… LOCATION ('my_data_file.dat') …
Oftentimes, we don’t know the name of the flat file that we need to process. Rather than having to rename the input file to the name specified in the “LOCATION” parameter, allow it to be passed in to the procedure processing the file. Here’s an example:
procedure process_data (p_file_name in varchar2) is cursor c1 is select * from my_external_table; begin if p_file_name is not null then -- dynamically set the filename of the external table execute immediate ('alter table my_external_table location (''' || p_file_name || ''')'); for v_rec in c1 loop <process records> end loop; end if; end;
By using the “execute immediate” command, you can dynamically change the filename that the external table will reference. As a reminder, because we didn’t specify the directory along with the filename (dir:filename), it will use the directory object specified by the “DEFAULT DIRECTORY” when the external table was created. If you did specify a different directory object, you would also need to consider the privileges on that directory.
Data Cleansing
Rather than loading data from a flat file into a staging table and then running statements to update the data to cleanse it, manipulate the data as you query it from the external table. All of the standard functions are available, just as they are when you query a “normal” table. Here are some examples to consider:
- Use LTRIM and RTRIM functions to remove unwanted leading/trailing characters.
- Use UPPER, LOWER, and INITCAP to standardize name fields.
- Use REPLACE, SUBSTR, REGEXP_REPLACE, and REGECP_SUBSTR functions to help format data into a specific format or remove unwanted characters.
By cleansing the data as you read it in from the flat file, your input is standardized, saving you an extra step. It also doesn’t require a staging table in order for you to manipulate your data.
This blog post serves a reminder of the external table feature of the Oracle database. If you are not currently using this feature for loading flat-file data into your database, I hope that this post has offered some reasons for you to give it a try.