A client had developed an in-house application that drew on data sources from multiple servers. The application used GIS data to plan trips, and it was experiencing uncharacteristically slow response times. It had been running for several years, in various incarnations, and had recently been ‘tweaked’ by a software engineering contractor who had been hired after the lead developer left the organization five months earlier.
For the previous several months, the application had been performing with an average response time of 4.5 seconds. I had helped the client set up a testing environment that used synthetic transactions to measure response time, so we had plenty of supporting data for the prior periods. The testing system was now reporting times in excess of 30 seconds to perform the same function.
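The idea behind the synthetic-transaction testing can be sketched in a few lines of Python. This is a minimal illustration, not the client's actual test harness: the function names, the injectable clock, and the alert factor of 2 are all assumptions made for the example. Only the 4.5-second baseline comes from the measurements described above.

```python
import time
from statistics import mean

BASELINE_SECONDS = 4.5  # average response time reported for prior months
ALERT_FACTOR = 2.0      # hypothetical threshold: flag anything over 2x baseline


def time_transaction(transaction, clock=time.perf_counter):
    """Run one synthetic transaction and return its elapsed time in seconds.

    `transaction` is any zero-argument callable that performs the scripted
    request; `clock` is injectable so the timing logic can be tested.
    """
    start = clock()
    transaction()
    return clock() - start


def summarize(samples, baseline=BASELINE_SECONDS, factor=ALERT_FACTOR):
    """Average the samples and report whether they exceed the alert threshold."""
    average = mean(samples)
    return {"average": average, "degraded": average > baseline * factor}
```

In practice, `transaction` would be whatever scripted request the test system replays against the application, run on a schedule so that a history of averages is available for comparison.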
Identifying the problem
After discussing the code changes with the new software engineer, I began collecting data with the analyzer to determine the source of the latency. The response time between the client and the front-end HTTP server appeared to be as expected; the application, session, and transport layers all looked fine. Next, I moved the analyzer to the data path between the HTTP server and the Oracle™ database server. Here I saw several samples of a large delay between the two servers: a request would go out from the HTTP server, and a long period of time would elapse before the database server responded. Continuing the troubleshooting required an examination of the rest of the system's architecture, so I also captured data between the Oracle™ server and the servers holding the GIS data. Now that I had data from every data path in this four-tier client-server application, I could begin identifying the problem.
Any analysis of this kind is iterative: you fit small pieces of a larger puzzle together. Once you have a few pieces, you connect them to determine whether the entire puzzle is in front of you or whether pieces are missing, in which case more data collection and analysis are required. In this case, I pieced together the chain of data flowing from the client to the front-end HTTP server, then from the HTTP server to the Oracle™ server, and finally from the Oracle™ server to the GIS server. Delivering a valid response to the user involves multiple sub-tasks going on behind the scenes (reflected by the modules within the application), and these can be seen using the analyzer. It was clear that the vast majority of the latency was between the HTTP server and the Oracle™ server, since the HTTP and GIS tiers were responding well.
Looking closely at the data traveling between those two servers, I saw that the HTTP server was making two separate SQL calls to the Oracle™ server. The majority of the latency came from the second call, which took, on average, eight times longer to complete.
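The per-path comparison described above can be sketched as follows. This is a minimal illustration that assumes each captured exchange has already been reduced to a hop label, a request timestamp, and a response timestamp. The hop labels and the function names are hypothetical, and no analyzer export format is implied.

```python
from collections import defaultdict
from statistics import mean


def mean_latency_per_hop(exchanges):
    """Average the request-to-response delay for each data path.

    `exchanges` is an iterable of (hop, request_time, response_time)
    tuples, with times in seconds.
    """
    delays = defaultdict(list)
    for hop, sent, answered in exchanges:
        delays[hop].append(answered - sent)
    return {hop: mean(values) for hop, values in delays.items()}


def slowest_hop(latency):
    """Return the label of the data path with the largest mean delay."""
    return max(latency, key=latency.get)
```

Ranking the hops this way only points to where to look next; the individual requests on the slowest path still have to be examined to find out which one is responsible.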
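A breakdown of this kind can be sketched by grouping the captured SQL calls by statement text. The code below is an illustration under stated assumptions: the calls are taken to be already extracted from the capture as (SQL text, request time, response time) tuples, and all function names are hypothetical. The helper `unexpected_statements` shows how a query the developer did not expect would stand out.

```python
from collections import defaultdict
from statistics import mean


def normalize(sql):
    """Collapse whitespace and case so identical statements group together."""
    return " ".join(sql.split()).lower()


def summarize_statements(calls):
    """Count each distinct statement and average its completion time.

    `calls` is an iterable of (sql_text, request_time, response_time)
    tuples, with times in seconds.
    """
    durations = defaultdict(list)
    for sql, sent, answered in calls:
        durations[normalize(sql)].append(answered - sent)
    return {
        sql: {"count": len(values), "mean_seconds": mean(values)}
        for sql, values in durations.items()
    }


def unexpected_statements(summary, expected):
    """Return statements seen on the wire that are not in the expected set."""
    allowed = {normalize(sql) for sql in expected}
    return sorted(sql for sql in summary if sql not in allowed)
```

Having the exact statement text from the capture is what makes the result actionable, because the text can then be searched for in the application's source code.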
Problem resolved
I took my findings to the new software engineer, who said there was a mistake in the analysis. He was adamant that the HTTP server initiated only a single SQL query to the Oracle™ database. By walking him through multiple sessions captured during the analysis, I convinced him that there were indeed two SQL calls going to the database instead of one, and that the second call was causing the problem. Because we had the actual SQL syntax to search for in the code, he was able to review the code and locate the module making the call. It soon became evident that the program was not supposed to be running the code in that module, and he had a logic problem to resolve.
Later that day, the software engineer made the appropriate changes to the code and asked for verification that the second SQL call was no longer being issued. Even though response times had returned to normal (averaging 4.5 seconds) after the change, we needed to confirm that the second call was not being made. I reconnected the analyzer to the systems in question, ran several tests, and concluded that the code was running properly and was not placing the second SQL call.









