Thursday, June 4, 2015

Some Things Shouldn't Have To Be Hard

Part of my job involves supporting the network for a business unit with government contracts.  We have a connection to a private government extranet, over which our users connect to several websites required to fulfill the contract work.

Monday morning the senior director of IT operations for this business unit called me to say that his users couldn't log into one of these sites.  He had already been in touch with tech support for the site, and they had confirmed that it was up and running, and suggested we had a problem on our end.

I started my troubleshooting by logging into the perimeter router connecting to the private extranet, and saw that the connection was up.  Next I logged into a perimeter firewall and checked that there was live traffic passing in both directions - everything looked healthy there as well.

Finally I logged into a PC on the affected network and tried connecting to the external site myself using a web browser.  I was unable to connect.  Browsers these days do a pretty poor job of indicating what the problem is if a site can't be reached.  I was using IE 11, and its error page listed so many possible causes that it didn't narrow anything down.

I decided to look up the IP address of the remote site so that I could trace the path through the network and double-check the firewall rules.  Using nslookup at the command prompt, I got a good indication of the problem right away - I was unable to resolve the IP address of the site.  My computer was configured to point to our internal DNS servers, which in turn forward certain domains to DNS servers located across the private extranet.
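For what it's worth, the check I did by hand (does the name resolve, and if so, does the site answer?) can be sketched in a few lines of Python. This is only an illustration, not what I ran that morning: the function name `diagnose`, the default port 443, and the hostname in the usage note are all assumptions of mine.

```python
import socket


def diagnose(host, port=443, timeout=5.0,
             resolve=socket.getaddrinfo, connect=socket.create_connection):
    """Report the first stage that fails: name resolution or the TCP connection.

    `resolve` and `connect` default to the standard library calls; they are
    parameters only so the function can be exercised without a live network.
    """
    try:
        # Stage 1: ask the configured resolver for the site's address.
        infos = resolve(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror as exc:
        return f"dns-failure: {exc}"

    # Each getaddrinfo entry is (family, type, proto, canonname, sockaddr).
    address = infos[0][4][0]
    try:
        # Stage 2: the name resolved, so try an actual TCP connection.
        conn = connect((address, port), timeout=timeout)
    except OSError as exc:
        return f"connect-failure to {address}: {exc}"
    conn.close()
    return f"ok: {host} resolved to {address} and accepted a TCP connection"
```

Calling `diagnose("portal.example.gov")` (a made-up hostname) would report a DNS failure in a situation like mine, which at least tells you which team to call, something the browser's error page never did.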

Since I was unable to resolve the IP address, I suggested that we needed to get the on-call DNS administrator to check things out.  In the meantime we also started a conference call with the tech support people for the remote network.  While waiting for our own DNS administrator to join, I described the issue I was seeing.

The remote technician asked me, "Well, what did you change?"  I told him we hadn't made any changes.  He asked, "Did you do anything to your network connection?"  No, we hadn't.  "Did you make any firewall changes over the weekend?" was the next question.  No, we didn't.  I reiterated to the remote tech that our connection was up, everything seemed to be working, but we just couldn't get DNS resolution.

After a short while our local DNS admin joined the call.  In short order he confirmed that the DNS servers were working properly, no changes were made on our end, and we seemed to be getting "denied" messages back from the remote DNS server.  The remote tech repeated just about every possible variation of the question about what WE had done to break things.
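If you want to see this kind of answer for yourself, the response code lives in the low four bits of the flags field in the DNS header, and code 5 means REFUSED. I'm assuming the "denied" messages our admin saw were that code; I don't have the packets to prove it. Here is a rough Python sketch using only the standard library. The helper names and the server address in the usage note are hypothetical.

```python
import socket
import struct

# Standard DNS response codes (the low four bits of the header flags).
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def build_query(name, query_id=0x1234):
    """Build a minimal DNS query for an A record."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # The name is encoded as length-prefixed labels ending in a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    return header + qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN


def response_code(packet):
    """Return the response code name from a raw DNS reply."""
    if len(packet) < 12:
        raise ValueError("too short to be a DNS message")
    _query_id, flags = struct.unpack("!HH", packet[:4])
    rcode = flags & 0x000F
    return RCODES.get(rcode, f"RCODE{rcode}")


def query_rcode(server, name, timeout=3.0):
    """Send one UDP query straight to `server` and report its response code."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        packet, _ = sock.recvfrom(512)
    return response_code(packet)
```

Something like `query_rcode("192.0.2.53", "portal.example.gov")` (both values invented for illustration) pointed at the far-end DNS server is the kind of test that separates "the server is unreachable" from "the server heard us and said no." A timeout suggests the former; a REFUSED reply points at policy on the remote side.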

Only after more than an hour of this line of questioning did the remote technician finally reveal that the remote DNS servers had been changed over the weekend - completely replaced with entirely new devices.  It took a little longer, but it was eventually discovered that the new devices had a built-in ACL which was blocking our requests.  The old servers hadn't had this capability, and the ACL which the remote DNS admins had put in place didn't allow our servers to talk to theirs.

So riddle me this, Batman - you know you changed out your DNS servers, but when I call and tell you my DNS queries are being refused, you spend an hour making me repeatedly assert that I didn't change anything?  I lost two hours of my time, and more importantly my business lost two hours of productive work for dozens of users trying to fulfill their quota of work on a government contract, all because some bozo didn't want to admit that his change broke the system.  Priceless.
