The company where I currently work produces a software solution that relies on SQL databases to store its information. A lot of the issues I encounter on a daily basis have to do with performance: how come a certain module is taking so long to open or why is a certain procedure running long?
Issues with performance are exciting. In a way, the process of finding optimal performance is a never-ending hunt. Performance can be improved in a multitude of ways: by enhancing the manner in which the software's graphical user interface interacts with the underlying tables, changing the indexes used on certain tables, rewriting the queries used to access information, etc.
The possibilities are endless!
But I’m getting ahead of myself. With this post, I hope to focus on the method I use to determine the source of a performance issue. Thus, I’ll be walking you through some of the steps I take when troubleshooting such issues. I hope that these examples can be of use to you.
I was trying to open a certain module in the application today and noticed that the process was running a bit longer than usual. Do you know what could be the cause?
So, as copied above, I received a fictional e-mail from my manager this morning. It seems that he's having trouble opening a certain module in the application. The module is able to open, but it takes a long time to do so, longer than it used to take.
Given that I am familiar with the make-up of this application, I know several things and can begin my troubleshooting there. These are my assumptions:
- The software is built on SQL Server databases.
- Information within the software is accessed using queries that collect information from the SQL Server databases.
- This is not an issue with the configuration of the software given that opening this module is a routine operation.
With this in mind, I formulate my plan of attack. I decide on the following:
- Recreate the issue on my local machine.
- While the issue is occurring, conduct a review of the SQL processes.
- Dig deeper, as necessary.
Step 1: Recreating the Issue
Given that my manager and I are able to connect to the same servers and access the same user account, I am able to easily recreate the issue on my machine.
In other instances, this might be harder to do. You may need to screen share with the user and watch as they recreate the issue or, if the issue is difficult to reproduce, have them provide you with a detailed series of steps describing how they ran into it.
For those short on time, I recommend using Screencast-o-Matic for quick and easy recording of your desktop. No installation necessary!
Step 2: Initial Backend Review
Once the issue has been recreated, the first thing I like to do is run the following command in SQL Server Management Studio:
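That command is the system procedure sp_who2 (referenced again further down):

```sql
-- List all active sessions (SPIDs) on the server, including
-- status, blocking info (BlkBy), host, database, and last batch time.
EXEC sp_who2;
```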
You can run this command on any database. The output provides you with a record of all active SPIDs (server process IDs) on your server. When running this, I'm interested in finding which SPID and process is associated with the issue I'm witnessing and whether there are any blocks in the system (represented by SPID values in the "BlkBy" column). This is my output from running the command:
The first thing I notice is that there are no blocks occurring in my system.
Next, I try to discern which activities are of interest to me. I identify them by the information reported in the HostName, DBName, and LastBatch columns. I am looking for SPIDs initiated from my machine (MNACHUM), in the “AdventureWorks” database, and that were launched about five minutes ago. SPID 56 is the one I identify as corresponding to my issue. SPID 55 comes close, but is actually connected to the sp_who2 command I just launched in SSMS; I am able to discern this fact by noting that the last batch for that SPID was executed just seconds ago.
As reported in my query output, SPID 56 is SUSPENDED. This might be the reason why my manager’s application is taking a long time to open the module. I do some more digging by running the following diagnostic command:
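A command that fits the description below is DBCC INPUTBUFFER, pointed at the suspended session:

```sql
-- Show the last statement sent by session (SPID) 56.
DBCC INPUTBUFFER(56);
```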
The value in the parentheses references the SPID for the activity of interest. The command, when run, returns information about the last command sent out by that particular activity. This is my output:
From this, I can discern that the issue is occurring with a particular stored procedure, dbo.sp_run_a_while. However, this might be a long procedure with many moving parts. To troubleshoot the exact cause of slow performance within the stored procedure, I’ll need to dig deeper.
Step 3: Digging Deeper
So, I’ve determined that the slowness witnessed by my manager is due to a particular stored procedure. But what part of it?
If I am able to open the stored procedure, I could review it and look for a command similar to the one associated with the activity SPID reported by my sp_who2 command (if you go back and review the screenshot from earlier, you’ll notice that the associated command for SPID 56 is WAITFOR).
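One quick way to pull the procedure's definition into a query window for review (using the procedure name identified above) is:

```sql
-- Print the definition of the stored procedure, line by line.
EXEC sp_helptext 'dbo.sp_run_a_while';
```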
However, I don’t consider this method particularly efficient, since it requires scouring through the code of the stored procedure. At times, these procedures can be thousands of lines and have many similar commands.
A better procedure to follow is to do one of two things:
- Query the sys.dm_exec_requests DMV with a CROSS APPLY to the sys.dm_exec_sql_text function to determine the last command that was run by the stored procedure.
- Set up and run a trace on the stored procedure to determine where it is getting hung up.
For the first option, you can use the following code I developed:
SELECT
    sql_handle,
    statement_start_offset,
    statement_end_offset,
    plan_handle,
    total_elapsed_time,
    query_hash,
    text
FROM sys.dm_exec_requests
CROSS APPLY sys.dm_exec_sql_text(sql_handle);
The last column (text) of the output specifies the last command initiated by any of the active SPIDs on your server. Once again, be discerning in recognizing which of the rows actually corresponds to your particular SPID. In my case, it is the second row:
I can expand the column or copy and paste the text into the editor to review it further.
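If the output is noisy, the same query can be narrowed to a single session. A minimal sketch, assuming SPID 56 from earlier:

```sql
-- Return only the statement text for the session of interest.
SELECT t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.session_id = 56;
```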
Alternatively, I can set up a trace to see where the procedure gets hung up. To do so,
- From the top menu options, navigate to Tools –> SQL Server Profiler.
- In the new window, connect to your server.
- Open a new trace and configure it with the following Events under the Event Selection tab (make sure to check the “Show All Events” and “Show All Columns” options in the bottom right-hand corner to access these events):
- Maintain the default settings
- Under the Stored Procedures heading, select "SP:Completed", "SP:Starting", "SP:StmtCompleted", "SP:StmtStarting"
- Under the TSQL heading, select "SQL:StmtCompleted" and "SQL:StmtStarting"
- Open the Column Filters window by selecting the option in the bottom right-hand corner and filter on the following items:
- HostName: should match the one for your SPID (e.g. MNACHUM)
- DatabaseName: should match the one for your SPID (e.g. AdventureWorks)
- Duration: helpful for slow-performance issues; set to "Greater Than 4000"
- ObjectName: should match your stored procedure (e.g. sp_run_a_while)
- Start the trace.
In order to capture the long-running event, you will need to recreate the issue while the trace is running.
When I did so in my environment, I found that the command within the stored procedure causing my manager slow performance was
WAITFOR DELAY '15:00';
Now that I know what's causing the slowness, I can communicate with the writers of the stored procedure and begin discussions as to whether the command is necessary and what can be done to improve its efficiency.
(Obviously, a WAITFOR command by nature is inefficient in this kind of setting. Maybe we’ll explore the troubleshooting of more complex queries in a different post.)
A big thank you to my colleague, Maxwell, who trained me and guided me on a lot of these troubleshooting techniques.