Unix Operating Systems Utilities for the DBA – Part 2

David Woodard | Originally appeared in March 2006 Bricks newsletter. Content has been updated where appropriate.

In part 1 I covered the ps & pgrep, du, and isof Unix utilities. In part 2, I will focus on the system call tracing and network packet capture UNIX utilities for the DBA.

System call tracings

We’ve all had the situation where a process seems to be hung, and we can’t tell what it’s doing. In this situation, a very useful utility allows you
to trace system calls made by a running process. These utilities usually require that you be the root user, or the user that owns the process. This works well in an Oracle environment because you usually have access to the userid that owns the process.

On Linux the utility is called “strace”, on Solaris and AIX 5.1 and newer, it’s called “truss”. And on HP-UX, it’s called “tusc”. They all work in a similar fashion, so we’ll take a look at “strace” on Linux for some examples.

Before we do that, let’s discuss what a system call is. When a program runs, some of the code executes in “user mode”, which is the part of the code that does not require any outside help from the operating system. Calculations, user defined functions, loops, conditional statements, etc. are all examples of user mode code. When a program needs to interact with CPU, disk, memory, or the network, it does not have permission on those devices, so the operating system executes that code on the user’s behalf. For example, when you need to read a file, you call the open() and read() functions. You do not need to write a device driver and have access permission to the raw disk device to write a file. Open(), close(), read(), and write() are some of the most common system calls.

So, let’s run “strace” against our listener process. We’re also going to use the “-r” flag to print a timestamp relative to the start of our system call tracing:

          $ strace -r -p 6058

          Process 6058 attached - interrupt to quit

          0.000000 poll(

And it pretty much sits there. Poll() is system call that allows a process to listen on multiple TCP/IP sockets (ports) at the same time. If a connection comes in, we’ll see something like this:

          0.000460 getsockname(8,
{sa _ family=AF _ INET, sin _ port=htons(1529), sin _ addr=inet _ addr(“0.0.0.0”)}, [16]) = 0

          0.045882 getpeername(8, 0xbfff9928, [16]) = -1 ENOTCONN (Transport endpoint is not connected)

          0.000332 accept(8, {sa _ family=AF _ INET, sin _ port=htons(32948), sin _ addr=inet _ addr(“127.0.0.1”)}, [16]) = 13

          0.000397 getsockname(13,
{sa _ family=AF _ INET, sin _ port=htons(1529), sin _ addr=inet _ addr(“127.0.0.1”)}, [16]) = 0

 

And a whole lot more output. Note that you have to press <control-C> to disconnect strace from the process. Our goal here is not to understand all the system calls, but to see if a process is really hanging or if it is legitimately waiting, as in the case of our listener.

Other useful flags are “-o” to write the output
of strace to a file, “-f” to follow processes forked by the process you are tracing, “-e” to limit the system call names to a particular pattern (like “open”), and “-T” to show the time spent in each system call. For example, “strace -f -r -p 6058” will allow us to trace the listener and the shadow processes that get started when someone logs
in. All the trace data might be more than we need, but we can confirm that the listener is functioning properly.

Network packet capture

Most versions of Unix have utilities to capture network packets. On Solaris it is called “snoop”, and on most other Unix systems it is “tcpdump”. Perhaps the best of them all is an Open Source packet capture tool called Wireshark (formerly Ethereal). It is a graphical program that runs on Microsoft Windows or on X-Windows under Unix. The best feature, in my opinion, is that it can read tcpdump or snoop capture files from almost every system.

Running tcpdump or snoop requires root access, and rightly so, since we could grab passwords off the network with those utilities. So, to
trace network data, I usually ask the system administrator to run tcpdump or snoop for a particular port, then send me the output file.

I’ll open the file in Wireshark to analyze it. For example, to trace our Oracle listener, as root, do:

          # tcpdump -i eth0 -w tcp.out port 1521

 

This will capture packets on network interface “eth0”, port 1521, and generate a file named tcp.out, which I can open with Wireshark to analyze, even if the file came from Unix and I run Wireshark on MS Windows. Analyzing network traffic with Wireshark is beyond the scope of this article, but I wanted you to be aware that you can capture network traffic on one platform and analyze it on another.

Putting it all together

Recently, I was taking a look at the Beta download of Oracle XE (10g), the free limited version of Oracle available on Oracle Technet. It comes with a script for Unix that will start the Oracle listener and the Oracle XE (10g) database on system boot. 
But unfortunately, it did not start automatically (remember this was a Beta version), and actually hung the system boot process on Redhat Linux. So, I decided to dig in and see what was going
on. After the system started to boot and hung, I logged in via SSH (the GUI and local terminals had not yet started), and looked at all processes on the system with”ps -ef” (shown below is a portion of the output, edited for space):

          $ ps -ef

          root 2714 1732 0 23:16 ? 00:00:00 /bin/bash /etc/rc5.d/S80oracle-xe start

          root 2739 2714 0 23:16 ? 00:00:00 /bin/su oraclexe -c /usr/lib/oracle/ xe/app/oracle/product/10.2.0/server/ bin/lsnrctl start

 

We see that process 2739 is trying to start the Oracle listener. After running “ps -ef” several times, we conclude that it must be hung. So, let’s use “strace” (we’re running on Linux here) to see what it is doing:

          $ strace -p 2739

          Process 2739 attached - interrupt to quit read(0, <unfinished ...> Process 2739 detached

 

The read (0, <unfinished …) indicates that the process is waiting for input on file descriptor zero. As a program opens and closes files, Unix assigns numeric handles, or file descriptors (FD’s) to the files. FD zero is usually standard input from a terminal, but the program could have closed it and reopened it pointing to some other file. So, let’s take a look with lsof (again edited for space):

$ lsof -p 273

COMMAND  PID  USER   FD   TYPE   DEVICE   SIZE   NODE    NAME
-------------------------------------------------------------------------------------------------------------------------------------------
su      2739  root   cwd  DIR
1  3,1    4096    2 /
su      2739  root   rtd  DIR
   3,1    4096     2 /
su      2739  root   txt  REG
   3,1    57632    294388 /bin/su
su      2739  root   0u   CHR    136,0           2 /dev/pts/0
su      2739  root   1w   CHR    1,3             1513 /dev/null
su      2739  root   2w   CHR    1,3             1513 /dev/null

 

Under the FD column, we see that file descriptor zero corresponds to file /dev/pts/0, so the command is indeed waiting for input from the terminal. The “su” command may sometimes hang waiting on input, depending which shell is invoked to execute the command. To fix it, I edited the /etc/init.d/oracle-xe script and changed:

          $SU $ORACLE _ OWNER -c “$LSNR start” > /dev/null 2>&1

          to

          $SU $ORACLE _ OWNER -c “$LSNR start” > /dev/null 2>&1 < /dev/null

 

This causes the command to read from /dev/null and not the terminal, and it continues on with no problem. There was also another line in that same script that hung after I fixed this one. I did check the Oracle XE (10g) forum, and someone had already reported the same bug and the same fix.

As you can see, we have several tools that complement each other nicely. In another case, someone came to me with an Oracle program they had written that was running slowly. I looked in the Oracle database, and it was using bind variables, and the query had a good explain plan, so it seemed fine. To investigate further, I used a system call tracing utility to watch it.

I saw it writing lots of bytes to a file descriptor, which was unusual since this was a query. Some writes are expected as it talks to Oracle over the network, but these were excessive. A quick check with “lsof” told me the name of the file and led
me to the problem; the program was writing every returned row to a file in /tmp. Debugging code had been inadvertently left in place, and after it was turned off, the program performed very well.

Conclusion

Hopefully I’ve given you some tools to step outside that box we call Oracle and trace problems down to the OS layer.

Table of Contents

Related Posts