Thursday, August 21, 2014

It's The Network! (why network engineers get so much experience troubleshooting)

After I'd been on the job for a while I began to notice a disturbing trend - the network gets blamed for a lot of problems.  At first I thought it was something unique to my company and our IT staff, but I have learned that this is a common occurrence.  Every few months a vendor will come in trying to sell us some new-fangled network monitoring tool, and the opening pitch is always something like this:

"Are you tired of having to defend the network all the time?  With (insert product name here) you can instantly PROVE that it's not the network causing problems, and refocus your troubleshooting on finding the REAL cause!"

The fact that a market exists for such tools, and that pretty much every vendor chooses the same pitch to get network engineers to buy them, tells me that this problem is widespread.  It's very common for server and application administrators to blame the network when their systems aren't working the way they expect, and the same attitude shows up in management.

There are a number of reasons why this is so - I'll list and comment on some of them here, and later blog posts will explore them in further detail.

  1. The word "network" means something different to network engineers and to pretty much everyone else.  To a network engineer, the "network" is a collection of routers, switches, firewalls, and VPN devices.  When we're feeling generous it may also include other devices that can alter or affect traffic flow - security devices like IDS/IPS, load-balancers, etc.  But to many people the "network" is defined as "everything other than the system I'm responsible for."  This means that if a server or application administrator is having a problem and they don't see something wrong with their own systems (and frankly, they may not know how to look), they are going to toss it over the wall to the network team.
  2. Some parts of the network are designed to block traffic.  It's true - the very definition of a "firewall" is a system that blocks everything by default, and only allows traffic by explicit exceptions.  And intrusion prevention systems can interfere with traffic that fits (or fails to fit) a particular profile.  Which leads us to this little gem:
  3. Sometimes, it really IS the network.  No part of a large IT system is immune to problems.  Firewalls may be blocking traffic if a network engineer has failed to correctly configure a necessary exception (or if the application owner has failed to request it).  An IPS can misidentify traffic as malicious.  Switch ports, line cards, and routers can have hardware problems, and as with any software, the operating code on these systems can be buggy.  And that leads to yet another item:
  4. It was the network last time, so it's the network this time.  I call this The Problem of Experience.  If a server or application admin has ever been the victim of a missing or misconfigured firewall rule or a bad IPS signature or a flaky switch port, the next time they have any sort of problem they're more likely to conclude that the network is causing it THIS time, too.
  5. The network team has unique powers of observation.  In addition to our ability to look at our own systems - our switches, routers, firewalls, VPN devices - we network engineers can also look at traffic.  We are usually the folks who own and operate the packet capture and analysis devices - which makes a certain kind of sense given that we have to configure the network to copy traffic to them.  Even when someone is kind enough not to actually blame the network, they often come straight to the network team for a "sniff" (and some expert assistance with the analysis) as a shortcut to resolving their issues.
  6. We're good at troubleshooting.  I addressed this briefly in my introductory post on this blog, but it comes down to one of those self-reinforcing cycles.  We get lots of problems, so we develop skill at solving them: we get good at checking our own systems first and then tackling other people's issues.  Then, because we're good at it, we get asked to do it some more, so we get even better at it...you see where this goes, right?
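
To make items 2 and 3 a little more concrete, here is a minimal Python sketch of a first-pass check that a server or application admin could run before tossing a problem over the wall.  It is only an illustration: the function name `check_tcp` and the hint wording are my own assumptions, not part of any monitoring product.  The results are hints, not proof.  A timeout often means something on the path is silently dropping packets (a firewall without the needed exception, for example), but it can also mean the host is down.  A refusal usually means the host was reached and nothing is listening on that port, though some firewalls are configured to send a reset as well.

```python
import socket

# Hypothetical hint text for each outcome; the wording is illustrative only.
HINTS = {
    "connected": "TCP handshake completed; look above the transport layer.",
    "dns": "Name lookup failed; check the hostname and DNS first.",
    "refused": "Host answered with a reset; the service is probably not listening.",
    "timeout": "No answer at all; a firewall drop or a down host is plausible.",
    "other": "Some other socket error; read the exception for details.",
}

def check_tcp(host: str, port: int, timeout: float = 3.0) -> str:
    """Attempt one TCP connection and return a short label for the outcome."""
    try:
        # create_connection resolves the name and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return "connected"
    except socket.gaierror:
        return "dns"
    except ConnectionRefusedError:
        return "refused"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "other"
```

A call might look like `print(HINTS[check_tcp("app01.example.com", 443)])`, where the hostname and port are placeholders.  If the answer is "timeout" and the same check works from a machine on the same subnet as the server, that is a reasonable moment to bring the details to the network team.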

So if you're a network engineer wondering if it's just you, or just your company or your admins or your users...the answer is "No."  We get the same thing everywhere - if there's a problem, someone is bound to blame the network.  If you're lucky you will survive long enough to develop some skill at solving problems, and if you're really lucky you will eventually convince the people around you that it's not always the network.  But don't hold your breath waiting for them to stop asking for help.

1 comment:

  1. Great write-ups. Might need to do one for "It's the Circuit"!! LOL.