Troubleshooting Database Mail for SQL Server
Phil Ekins (@SQLPhilUK), Principal Architect
This is part two of our database mail blog series. You can read part one, Database Mail 101 Troubleshooting, here.
At House of Brick, one of our regular interactions with our clients is to assist them with their proactive needs. Items such as ensuring recommended best practices are in place, backup strategies are optimized etc.
Today I want to talk about a commonly disregarded aspect, Database Mail. SQL Server is great at knowing when it is hurting, but doesn’t have a default way to reach out. So using things like SQL alerts, job notifications on failure etc. bridge that gap, but they rely on Database Mail.
When Database Mail works, it works great. So, you can set up a distribution group for your DBAs and you’re good to go. When it doesn’t work how do you tell and when you know its broken how to troubleshoot it.
What Info Does SQL Provide?
So in our first step, let’s see what information SQL Server provides.
Within SSMS there is access to the Database Mail specific log, which gives us information on mail events.
This info is also available within a view in the msdb database.
select * from msdb.dbo.sysmail_event_log
I have cleared the log, so all we see here is the stopping and starting of the Database Mail queue.
Let’s Generate Some “Normal” Activity
Let’s generate some activity, and review the corresponding log entries.
The following T-SQL script is run to send to a valid email.
DECLARE @Subject VarChar(128) SET @Subject = @@ServerName EXEC msdb.dbo.sp_send_dbmail @profile_name = 'Default', @recipients = 'DBAGroup@company.com', @subject = @Subject, @body = 'Errors Logged', @body_format = 'text'
The output of which is below:
Every email sent will increment this counter.
The email arrived, and if we refresh the logs, we now see an additional entry.
Note that we are not seeing a specific entry for that send, just a “process is started” notification.
Let’s send five more emails to see what changes:
Below is the updated log.
So we see no change, and sending additional successful emails does not add extra logs. After roughly 11 minutes of inactivity in the mail queue, a new entry is shown saying the mail process is shutting down.
If we are inclined to see the individual level of detail, where can we look? There are a number of additional views that display the individual details.
For the purposes of troubleshooting, I’m going to use the ‘sysmail_allitems’ view and filter the data as needed.
Also a quick note on permissions, the views are granted select to sysadmin and databasemailuserrole roles. If running under the sysadmin role, you can see all mail activity. If running under the lesser databasemailuserrole access, you will only see activity created by your account.
For these scenarios we will run as ‘sa’, but we will discuss automation with lesser privileges later.
select * from msdb.dbo.sysmail_allitems Order by 1 desc
We can also see that the mailitem_id corresponds to the ‘Mail (Id: xx) queued’ number we see returned after every send.
Let’s Generate Some “Bad” Activity
So let’s create some less desired activity and review the logs:
DECLARE @Subject VarChar(128) SET @Subject = @@ServerName EXEC msdb.dbo.sp_send_dbmail @profile_name = 'Default', @recipients = 'DBAGroup@company.com.bad', @subject = @Subject, @body = 'Errors Logged', @body_format = 'text'
So we have an email account that will fail and is tied to Mail ID 14.
The log just tells us that the mail process has started:
select mailitem_id, recipients, send_request_date, sent_status, sent_date from msdb.dbo.sysmail_allitems where mailitem_id = 14
Querying the view directly tells us the email was sent. Eventually we may get a bounce back saying the email was undeliverable, but the bottom line is SQL Server is not omnipotent, and will not catch a typo.
Mail Server is Not Available
So let’s create a scenario where the mail server is unavailable because the server has lost access to the Internet.
Our mail queue number for this attempt is 15, so let’s see how the logs look. Remember this is a valid email account, and a valid configuration, but it has no access to the mail server.
The full error is:
The mail could not be sent to the recipients because of the mail server failure. (Sending Mail using Account 1 (2018-10-22T12:11:25). Exception Message: Could not connect to mail server. (No such host is known).
select mailitem_id, recipients, send_request_date, sent_status, sent_date from msdb.dbo.sysmail_allitems where mailitem_id = 15
So we have a ‘Could not connect to mail server. (No such host is known)’ error in the log, and the mail queue is showing the send_status as Failed.
Next, we allow the Internet issues to resolve and see if this will enable the email to be sent, and then initiate a valid send to confirm the queue is functioning.
select mailitem_id, recipients, send_request_date, sent_status, sent_date
from msdb.dbo.sysmail_allitems where mailitem_id = 15 or mailitem_id = 16
We can see that the new email did go through, but the failed email still failed, and clearly no retry is occurring after it has failed.
For the sake of completeness, we will force the mail process to stop and start.
EXEC msdb.dbo.sysmail_stop_sp EXEC msdb.dbo.sysmail_start_sp
As expected, the state of that failed email still has not changed. If the email was a critical alert, it’s never going to be sent.
Next we create some more scenarios and see how they present.
Bad Port Number
For example, I will change the port number within the Database Mail configuration, and send a valid email, mailitem_id 19.
The logs are showing three attempts to send with the same error notification as detailed below:
The mail could not be sent to the recipients because of the mail server failure. (Sending Mail using Account 1 (2018-10-22T12:35:47). Exception Message: Could not connect to mail server. (No connection could be made because the target machine actively refused it 18.104.22.168:5) and the mail queue was showing unsent, retrying and failed with corresponding time stamps to the logs.
The invalid configuration caused the ‘actively refused error’ versus the ‘no such host’ error we saw earlier.
Issues with the Mail Queue
So let’s create a stalled mail queue, which is an issue I’ve run into quite a bit of late.
Review the “normal” state of the queues:
exec msdb..sysmail_help_queue_sp exec msdb..sysmail_help_status_sp
Now we can see how those queues display with activity, and send a valid email we can monitor.
In this case, I was not fast enough but the length, basically the number of items in queue, would have incremented to one briefly. After the 11 minutes of inactivity we have previously discussed, the state will return to INACTIVE. Also of note is that the timestamps returned here are UTC times, so bear that in mind for troubleshooting.
Next we will stop the mail queue.
The mail queue is no longer active, and sending an email in this state generates the following error.
Let’s start the mail queue:
“Broken” Mail Process
Another scenario where the mail queue gets stalled, is a failure of the underlying database mail process. To demonstrate, I’m going to switch to an alternate instance where I have “broken” Database Mail.
We need to review the logs and queues, so I’m going to stop and start the mail queue and send a few valid emails.
Here we see both queues are running, Note now that the length is “3”, so we have three mail items in the queue. The log is also not showing the ‘DatabaseMail process is started’ as we have saw previously when an email was sent.
In this scenario, we also need to check the underlying Database Mail process.
If I browse the the Binn directory for this instance, I will find an app named DatabaseMail.exe.
Launching this executable should launch a black console window as shown below.
However in this case, I get the following error.
Behind the scenes SQL Server is calling this executable and supressing the error. There are also no errors generated in the SQL Server log, Agent log, OS Application, or Security and System logs.
Microsoft Support article KB3186435 describes this known issue and discusses three approaches to solving it. The first is probably the easiest, and involves recreating a missing DatabaseMail.exe.config XML file. The second option is a SQL Server repair, but seems like an overly intrusive approach which could cause an outage. The third option is to install .NET 3.5. As of SQL Server 2016, the requirement for .NET 3.5 has been removed, and option 1 demonstrated how we can fix this without installing 3.5. However, there is still some tooling that has this requirement, so you may decide installing 3.5 is the best long term solution. While .NET 3.5 can be installed without a reboot, or outage, your Change Management team may frown on that method for Production servers.
I have used both the first option and the third in the past to fix this issue and both wor. But for the purposes of this discussion, I have followed instructions for the first option in the article, and now the executable returns the blank console.
So let’s review the queue and logs.
No changes are shown yet, and we can see those three queued emails are still waiting as unsent.
Go ahead and restart the mail queue.
EXEC msdb.dbo.sysmail_stop_sp EXEC msdb.dbo.sysmail_start_sp
Now we can revisit the queue and logs.
So by restarting the mail queue, we were able to save those three hung-up emails without an outage.
External Emails are Blocked
One scenario that I’m not able to create, but is something I do sometimes see, is where the SQL Server is only allowed to send emails internally, so email@example.com will go through, but firstname.lastname@example.org will fail.
The error for such a scenario presents in the logs with the following wording.
The mail could not be sent to the recipients because of the mail server failure. (Sending Mail using Account 1 (2018-10-22T08:46:21). Exception Message: Cannot send mails to mail server. (Mailbox unavailable. The server response was: 5.7.54 SMTP; Unable to relay recipient in non-accepted domain).
The solution to this issue would be a conversation with your enterprise mail administrator about having the server in question added to a whitelist, which is an allowed list of servers that can send outside of the domain.
Next steps include how we monitor for and automate this information. Watch for my blog next week on Monitoring Database Mail, part three of the series.
Please note: this blog contains code examples provided for your reference. All sample code is provided for illustrative purposes only. Use of information appearing in this blog is solely at your own risk. Please read our full disclaimer for details.