Phil Ekins (@SQLPhilUK), Principal Architect
The second blog in this series on Database Mail discussed how to troubleshoot and identify some common issues seen with Database Mail and some options for resolution.
In this blog, we shift our focus to being proactive, and how we can get SQL Server to identify, self-heal, and finally, if needed, notify us when email is down.
So let’s revisit the elements from the second blog in this series with a proactive approach in mind. First and foremost, the three system views we have been leveraging are:
- dbo.sysmail_allitems
- sysmail_help_queue_sp
- .sysmail_help_status_sp
They are all limited to being helpful only while being run as sysadmin.
Workarounds to the SysAdmin Requirement
Hopefully your SQL Agent service account does not have that level of access. Instead, we can create procedures with the EXECUTE AS logic. These three procedures should be secured (permissions wise), as sensitive information could potentially be seen via access to the subject line.
Note: we are using procedures and temp tables to process results due to the inability to create a view with the execute as approach.
USE [msdb] GO CREATE PROC [dbo].[HoB_sysmail_help_queue_sp] WITH EXECUTE AS 'dbo' AS BEGIN Â Â Â Â Â Â EXEC msdb..sysmail_help_queue_sp END REVERT GO CREATE PROC [dbo].[HoB_sysmail_help_status_sp] WITH EXECUTE AS 'dbo' AS BEGIN EXEC msdb..sysmail_help_status_sp END REVERT GO CREATE PROC [dbo].[HoB_sysmail_allitems] @Filter INT WITH EXECUTE AS 'dbo' AS IF (@Filter = 0) BEGIN SELECT mailitem_id, recipients, send_request_date, sent_status, sent_date FROM msdb.dbo.sysmail_allitems END IF (@Filter = 1) BEGIN SELECT mailitem_id, recipients, send_request_date, sent_status, sent_date FROM msdb.dbo.sysmail_allitems WHERE sent_status = 'sent' END IF (@Filter = 2) BEGIN SELECT mailitem_id, recipients, send_request_date, sent_status, sent_date FROM msdb.dbo.sysmail_allitems WHERE (sent_status = 'unsent' OR sent_status = 'retrying') END IF (@Filter = 3) BEGIN SELECT mailitem_id, recipients, send_request_date, sent_status, sent_date FROM msdb.dbo.sysmail_allitems WHERE sent_status = 'failed' END REVERT GO
Now we can use these procedures, via a scheduled process, to review if the queue has started, and if not attempt to start it.
Check and Restart the Mail Queue
In examples #1 and #2 below we check, attempt to start, recheck and finally raise an error if the start fails.
Example #1
DECLARE @ResultOutput VarChar(64) CREATE TABLE #Result ([Status] VarChar(64)) INSERT INTO #Result ([Status]) EXEC msdb..[HoB_sysmail_help_status_sp] SELECT @ResultOutput=[Status] FROM #Result IF @ResultOutput = 'STOPPED' BEGIN Â Â Â Â Â Â EXEC msdb.dbo.sysmail_start_sp END WAITFOR DELAY '00:00:10' TRUNCATE TABLE #Result INSERT INTO #Result ([Status]) EXEC msdb..[HoB_sysmail_help_status_sp] SELECT @ResultOutput=[Status] FROM #Result IF @ResultOutput = 'STOPPED' BEGIN Â Â Â Â Â Â RaisError ('Unable to Start Mail Queue', 15, 10) END DROP TABLE #Result
Example #2
DECLARE @ResultOutput VarChar(64) CREATE TABLE #Result ([queue_type] nVarChar(6), [length] INT, [state] nVarChar(64), [last_empty_rowset_time] DateTime, [last_activated_time] DateTime) INSERT INTO #Result ([queue_type], [length], [state], [last_empty_rowset_time], [last_activated_time]) EXEC msdb..[HoB_sysmail_help_queue_sp] SELECT @ResultOutput=[queue_type] FROM #Result WHERE [queue_type] = 'mail' IF @ResultOutput IS NULL BEGIN EXEC msdb.dbo.sysmail_start_sp END WAITFOR DELAY '00:00:10' TRUNCATE TABLE #Result INSERT INTO #Result ([queue_type], [length], [state], [last_empty_rowset_time], [last_activated_time]) EXEC msdb..[HoB_sysmail_help_queue_sp] SELECT @ResultOutput=[queue_type] FROM #Result WHERE [queue_type] = 'mail' IF @ResultOutput IS NULL BEGIN RaisError ('Unable to Start Mail Queue', 15, 10) END DROP TABLE #Result
Check for Failed Emails
We also need to check if we have any failed entries. In a healthy system, the failed count should be zero. In example #3 we are using the prior six hours as our sample, but this can obviously be adjusted as needed.
Example #3
DECLARE @ResultOutput BIGINT CREATE TABLE #Result ([mailitem_id] INT, [recipients] VarChar(MAX), [send_request_date] DateTime, [sent_status] VarChar(8), [sent_date] DateTime) -- passing parameter 3 to return failed emails only INSERT INTO #Result EXEC msdb.dbo.HoB_sysmail_allitems 3 SELECT @ResultOutput=COUNT(send_request_date) FROM #Result WHERE send_request_date > DateAdd(HOUR,-6,GetDate()) IF @ResultOutput > 0 BEGIN RaisError ('Failed eMails Have Been Logged in the Prior 6 Hours', 15, 10) END DROP TABLE #Result
Check for a Stalled Mail Queue
Finally, in example #4, we check for a stalled queue.
Example #4
DECLARE @ResultOutput BIGINT CREATE TABLE #Result ([mailitem_id] INT, [recipients] VarChar(MAX), [send_request_date] DateTime, [sent_status] VarChar(8), [sent_date] DateTime) -- passing parameter 2 to return unsent (and retrying) emails only INSERT INTO #Result EXEC msdb.dbo.HoB_sysmail_allitems 2 SELECT TOP 1 @ResultOutput=DateDiff(MINUTE,MAX(send_request_date),sent_date) FROM #Result GROUP BY send_request_date, sent_date ORDER BY MAX(send_request_date) DESC IF @ResultOutput > 15 BEGIN RaisError ('Stalled eMails in Queue for Greater than 15 Minutes', 15, 10) END DROP TABLE #Result
How We Can Alert While Mail is Down?
Obviously, we can’t use SQL Server directly to tell us that SQL Server can’t talk to us… so we have to be more creative.
Thankfully, PowerShell also has a simple mail interface that can be called from within SQL, as shown below.
Example #5
$computer = hostname Send-MailMessage -From "noreply@company.com" `
-To "DBAGroup@company.com" ` -Subject "$computer - Database Mail Issues Have Been Detected" ` -Body "Database Mail Errors Have occurred - Execute these Commands to review SELECT TOP 100 * FROM [msdb].[dbo].[sysmail_event_log] ORDER BY 1 DESC SELECT TOP 100 * FROM [msdb].[dbo].[sysmail_allitems] ORDER BY 1 DESC" ` -SmtpServer "smtp.company.com"
Pull It All Together as a Complete Solution
So the final step is to review these examples, create a SQL Agent Job, and configure the on-failure step to run the PowerShell email step.
Here is how that should look to handle the RaisError logic:
The T-SQL used to create this job is listed below.
USE [msdb] GO BEGIN TRANSACTION DECLARE @ReturnCode INT SELECT @ReturnCode = 0 IF NOT EXISTS (SELECT name FROM msdb.dbo.syscategories WHERE name=N'[Uncategorized (Local)]' AND category_class=1) BEGIN EXEC @ReturnCode = msdb.dbo.sp_add_category @class=N'JOB', @type=N'LOCAL', @name=N'[Uncategorized (Local)]' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback END DECLARE @jobId BINARY(16) EXEC @ReturnCode = msdb.dbo.sp_add_job @job_name=N'Monitor Database Mail', @enabled=0, @notify_level_eventlog=0, @notify_level_email=0, @notify_level_netsend=0, @notify_level_page=0, @delete_level=0, @description=N'No description available.', @category_name=N'[Uncategorized (Local)]', @owner_login_name=N'sa', @job_id = @jobId OUTPUT IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Check Queue', @step_id=1, @cmdexec_success_code=0, @on_success_action=3, @on_success_step_id=0, @on_fail_action=4, @on_fail_step_id=4, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'DECLARE @ResultOutput VarChar(64) CREATE TABLE #Result ([Status] VarChar(64)) INSERT INTO #Result ([Status]) EXEC msdb..[HoB_sysmail_help_status_sp] SELECT @ResultOutput=[Status] FROM #Result IF @ResultOutput = ''STOPPED'' BEGIN EXEC msdb.dbo.sysmail_start_sp END WAITFOR DELAY ''00:00:10'' TRUNCATE TABLE #Result INSERT INTO #Result ([Status]) EXEC msdb..[HoB_sysmail_help_status_sp] SELECT @ResultOutput=[Status] FROM #Result IF @ResultOutput = ''STOPPED'' BEGIN RaisError (''Unable to Start Mail Queue'', 15, 10) END DROP TABLE #Result ', @database_name=N'msdb', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Check for Failed emails', @step_id=2, @cmdexec_success_code=0, @on_success_action=3, @on_success_step_id=0, @on_fail_action=4, @on_fail_step_id=4, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'DECLARE @ResultOutput BIGINT CREATE TABLE #Result ([mailitem_id] INT, [recipients] VarChar(MAX), [send_request_date] DateTime, [sent_status] VarChar(8), [sent_date] DateTime) -- passing parameter 3 to return failed emails only INSERT INTO #Result EXEC msdb.dbo.HoB_sysmail_allitems 3 SELECT @ResultOutput=COUNT(send_request_date) FROM #Result WHERE send_request_date > DateAdd(HOUR,-6,GetDate()) IF @ResultOutput > 0 BEGIN RaisError (''Failed eMails Have Been Logged in the Prior 6 Hours'', 15, 10) END DROP TABLE #Result ', @database_name=N'msdb', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Check for Stalled Queue', @step_id=3, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=4, @on_fail_step_id=4, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'TSQL', @command=N'DECLARE @ResultOutput BIGINT CREATE TABLE #Result ([mailitem_id] INT, [recipients] VarChar(MAX), [send_request_date] DateTime, [sent_status] VarChar(8), [sent_date] DateTime) -- passing parameter 2 to return unsent (and retrying) emails only INSERT INTO #Result EXEC msdb.dbo.HoB_sysmail_allitems 2 SELECT TOP 1 @ResultOutput=DateDiff(MINUTE,MAX(send_request_date),sent_date) FROM #Result GROUP BY send_request_date, sent_date ORDER BY MAX(send_request_date) DESC IF @ResultOutput > 15 BEGIN RaisError (''Stalled eMails in Queue for Greater than 15 Minutes'', 15, 10) END DROP TABLE #Result ', @database_name=N'msdb', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobstep @job_id=@jobId, @step_name=N'Send eMail Alert on Failure', @step_id=4, @cmdexec_success_code=0, @on_success_action=1, @on_success_step_id=0, @on_fail_action=1, @on_fail_step_id=0, @retry_attempts=0, @retry_interval=0, @os_run_priority=0, @subsystem=N'PowerShell', @command=N'$computer = hostname Send-MailMessage -From "noreply@company.com" ` -To "DBAGroup@company.com" ` -Subject "$computer - Database Mail Issues Have Been Detected" ` -Body "Database Mail Errors Have occurred - Execute these Commands to review SELECT TOP 100 * FROM [msdb].[dbo].[sysmail_event_log] ORDER BY 1 DESC SELECT TOP 100 * FROM [msdb].[dbo].[sysmail_allitems] ORDER BY 1 DESC" ` -SmtpServer "smtp.company.com" ', @database_name=N'master', @flags=0 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_update_job @job_id = @jobId, @start_step_id = 1 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobschedule @job_id=@jobId, @name=N'Every Hour', @enabled=1, @freq_type=4, @freq_interval=1, @freq_subday_type=8, @freq_subday_interval=1, @freq_relative_interval=0, @freq_recurrence_factor=0, @active_start_date=20181024, @active_end_date=99991231, @active_start_time=0, @active_end_time=235959 IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback EXEC @ReturnCode = msdb.dbo.sp_add_jobserver @job_id = @jobId, @server_name = N'(local)' IF (@@ERROR <> 0 OR @ReturnCode <> 0) GOTO QuitWithRollback COMMIT TRANSACTION GOTO EndSave QuitWithRollback: IF (@@TRANCOUNT > 0) ROLLBACK TRANSACTION EndSave: GO
Conclusion
As we have show in parts one, two and three of this blog series, while Database Mail is typically a “set it and forget it” process, there are still considerations and proactive steps needed to ensure the health of your Database Mail and, big picture, the functionality of your SQL Server.