At a client site I visit on a regular basis, I took advantage of the slower-than-normal workload during the holidays to look at some of the applications and systems whose traffic traverses the infrastructure. I wanted to find systems that might be exhibiting symptoms or problems even though users were not placing help desk calls about them.
Identifying the problem
By examining traffic at the main through-points on the network, I uncovered several misbehaving applications and systems. Some were merely misconfigured, while others contained poorly written modules.
I compiled a list of the five worst offenders and worked with the appropriate IT group in each case to make the changes required to alleviate the problems.
Brief synopsis of the issues
1) One of the systems on the network had 30 of its more than 72 wireless access points attempting to connect to a management server at a rate of over 11,000 connection attempts an hour (about 3 a second). This might not sound particularly egregious at first blush, but the catch is that the server refused these connection attempts, indicating that it had no ‘listener’ for the access points to connect to.
2) The client's disk backup system had a similar problem: the backup server refused connections from approximately 52 clients, resulting in 5,500 failed connection attempts every hour of the day. As in the case above, the server indicated that it did not have an appropriate ‘listener’ running.
3) A server used for discovery and inventory needed changes to its route table after receiving more than 1,000 route redirects (ICMP Redirect messages) during the course of the analysis.
4) The server used to manage the printing infrastructure required updating because it was sending more than 1,300 queries to services and printers that did not exist on the network. It was found to be making more than 15,000 calls each day (using SNMP and NetBIOS).
5) A specialized vehicle tracking system was found to be sending data requests to multiple servers that could not respond. This happened hundreds of times a day and tied up resources on the other servers.
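Several of these symptoms can be spotted in a capture by simple pattern counting. The sketch below is a minimal illustration under stated assumptions, not necessarily how the analysis above was performed: it uses a hypothetical, simplified `Packet` record (in practice the fields would come from a protocol analyzer export), and the addresses and port 8443 are made up. It treats a TCP SYN answered by a reset from the server as a refused attempt, which is the usual sign that nothing is listening on that port (a firewall can also send resets, so treat it as a lead rather than proof). It also counts ICMPv4 type 5 Redirect messages per receiving host, the pattern behind item 3.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass

ICMP_REDIRECT = 5  # ICMPv4 type 5 = Redirect (RFC 792)


@dataclass(frozen=True)
class Packet:
    """Hypothetical, simplified packet record; real data would come from a capture export."""
    ts: float            # capture timestamp in seconds
    src: str
    dst: str
    proto: str           # "tcp" or "icmp"
    sport: int = 0
    dport: int = 0
    flags: str = ""      # TCP flags, e.g. "S" (SYN), "SA" (SYN-ACK), "RA" (RST-ACK)
    icmp_type: int = -1


def refused_attempts(packets):
    """Map (server, port) -> list of clients whose SYN was answered by a RST."""
    pending = set()              # outstanding SYNs: (client, client_port, server, server_port)
    refused = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.ts):
        if p.proto != "tcp":
            continue
        if p.flags == "S":       # bare SYN from the client
            pending.add((p.src, p.sport, p.dst, p.dport))
        elif "R" in p.flags:     # reset travels server -> client, so swap the endpoints
            key = (p.dst, p.dport, p.src, p.sport)
            if key in pending:
                pending.discard(key)
                refused[(p.src, p.sport)].append(p.dst)
    return refused


def redirects_received(packets):
    """Count ICMP Redirect messages per host that received them."""
    return Counter(p.dst for p in packets
                   if p.proto == "icmp" and p.icmp_type == ICMP_REDIRECT)


def hourly_rate(count, duration_s):
    """Convert a count observed over duration_s seconds into a per-hour rate."""
    return count * 3600 / duration_s


# Made-up sample: two access points refused on port 8443, one redirect.
caps = [
    Packet(0.0, "10.0.0.21", "10.0.1.5", "tcp", 50001, 8443, "S"),
    Packet(0.001, "10.0.1.5", "10.0.0.21", "tcp", 8443, 50001, "RA"),
    Packet(1.0, "10.0.0.22", "10.0.1.5", "tcp", 50002, 8443, "S"),
    Packet(1.001, "10.0.1.5", "10.0.0.22", "tcp", 8443, 50002, "RA"),
    Packet(2.0, "10.0.2.1", "10.0.3.9", "icmp", icmp_type=ICMP_REDIRECT),
]

for (server, port), clients in refused_attempts(caps).items():
    print(f"{server}:{port} refused {len(clients)} attempts from {len(set(clients))} clients")
print(dict(redirects_received(caps)))
print(round(11_000 / 3600, 2))  # 11,000 attempts an hour, expressed per second
```

On the sample data this prints `10.0.1.5:8443 refused 2 attempts from 2 clients`, then `{'10.0.3.9': 1}`, then `3.06`. The last line checks the arithmetic in item 1: 11,000 attempts an hour works out to just over 3 per second.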
Problem resolved
In each case, the configuration and/or architecture of the system and its applications was investigated further, and configuration changes were made where possible. Some of the systems required contacting the vendor’s software development team to request a code modification to a specific module. Among other benefits, the changes 1) reduced the overhead the affected servers and computers incurred in processing these packets, and 2) removed two gigabytes per month of unneeded LAN/WAN traffic produced by these five systems.
