James V. Fields: 2014

Saturday, November 15, 2014

Order-taking vs. Selling (or maybe just a bad metaphor)

In the world of sales, there is a big distinction between being (just) an order-taker and being a salesperson. The distinction comes down to just going through the motions (order-taker) versus going out prospecting and nailing down real sales (salesperson). I am not entirely sure where I first heard it used in this way, but the term "order-taker" can also be applied to people in other professions, and generally refers to doing a very passive (sometimes passive-aggressive) and reactive job, rather than figuring out what your customers need and giving it to them.

My team does networking - architecture, engineering, installation, maintenance, etc. We also manage all the network firewalls in our company. Most firewall requests are generated by other people, and come to us in the form of work tickets. The on-call person from our team has to do these and it's something no one looks forward to doing - which is why it makes a perfect example for demonstrating the order-taking mentality.

In general we tend to think of a firewall rule as a pretty simple set of four things - a source IP address, a destination IP address, a protocol, and a port number. With some obvious exceptions, that set of details makes up the vast majority of the rules we create. It would seem like simplicity itself for the requester to gather these items for us and properly document the request, and that's what we want - we basically want to get a "form" with everything filled out, so we can just go execute the request.
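To make the "form" idea concrete, here is a minimal sketch of what checking one of these four-item requests might look like before anyone tries to execute it. The `RuleRequest` class, its field names, and the `problems` function are all hypothetical, made up for illustration; they are not taken from our order tool or any real ticketing system.

```python
import ipaddress
from dataclasses import dataclass
from typing import List

# Hypothetical request "form" - field names are illustrative only.
@dataclass
class RuleRequest:
    source: str
    destination: str
    protocol: str
    port: int

# Assumption: only TCP and UDP rules are covered by this simple form.
VALID_PROTOCOLS = {"tcp", "udp"}

def problems(req: RuleRequest) -> List[str]:
    """Return plain-language reasons the request can't be worked as-is."""
    issues = []
    for label, value in (("source", req.source), ("destination", req.destination)):
        try:
            # Accepts a single address or a subnet in CIDR notation.
            ipaddress.ip_network(value, strict=False)
        except ValueError:
            issues.append(f"{label} {value!r} is not an IP address or subnet")
    if req.protocol.lower() not in VALID_PROTOCOLS:
        issues.append(f"protocol {req.protocol!r} is not one of {sorted(VALID_PROTOCOLS)}")
    if not 1 <= req.port <= 65535:
        issues.append(f"port {req.port} is outside 1-65535")
    return issues

# A request like the ones we actually get: a server name instead of an
# address, and an application name instead of a protocol.
for issue in problems(RuleRequest("the web server", "192.168.10.5", "http", 70000)):
    print(issue)
```

A check like this only catches the mechanical mistakes. It can't tell you whether the traffic ever reaches the firewall or whether the rule is appropriate - that part still takes a person.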

The sad fact is that it's just not that simple in the real world, and a lot of the requests we get can't be executed "as is". Here are just a few of the issues:

The requester doesn't understand ANYTHING about networking and doesn't know what an address, protocol, or port is
The requester doesn't know how to use the order tool (in our environment it's a sort of web order form for technical services)
The requester asks for something impossible (some parts of the network can't talk to one another for technical reasons)
The requester asks for something inappropriate (we have rules about what we let in and out of the network, you know?)
The request is rejected by the security department (they have to sign off on these) or the change management department

The question of whether a highly paid professional network engineer is reduced to being an order-taker comes down to the manner in which he/she handles these kinds of challenges. Here are some examples of order-taker behavior I have seen more than once in my career:

Engineer knows the request can't be worked due to incorrect or impossible combination of items - but instead of contacting the requester proactively so they can correct the issue, just doesn't do the work and allows the change to expire
Engineer refuses to explain anything to requester who clearly needs a little education (sometimes with a snide comment about how dumb "those people" are)
Engineer goes ahead and implements an impossible rule (such as a firewall rule that doesn't work because the network routing doesn't ever bring the traffic to the firewall) because hey, they were dumb enough to request it and the security guys were dumb enough to approve it, right?

I absolutely understand on the deepest personal level this kind of behavior. We're all overworked, the firewall changes are already a pain in the ass, and we'd all rather be doing something fun like building out a new network. But those who rise to the level of networking professional (I like that better than salesperson) will do the following, or something like it:

If the requester clearly doesn't know what something is or how something works, the professional will take the time to explain it
If the requester asks for something that will never work, the professional will be proactive in pointing this out, will find out what the requester was trying to accomplish, and help them refine the request so that it represents what they actually needed
The professional will contact the security team if something has been approved which should not have been, because even the security guys can make mistakes, and the company's security is too important to play games

I think the above examples illustrate what I'm talking about. In our work as network engineers, we can be order-takers - waiting for someone to tell us what to do, passive-aggressively carrying out stupid or impossible requests, and refusing to take an interest in our customers....OR, we can try to be more than technicians - we can rise to the level of networking professionals, an integral and indispensable part of making our companies successful.

I know which I'd rather be.

Sunday, October 19, 2014

Life As A Cordcutter

A few years ago, tired of paying a ton of money for satellite TV channels that I never watched, I "cut the cord" - canceled my pay TV subscription and began getting my TV fix through alternate means. I had high-speed internet and no intention of giving that up, so it made sense to stream as much as possible over that connection. Eventually I also installed a good-old outdoor aerial antenna to pick up HD broadcasts from the closest major city.

Over the years I have used several "set top" solutions for my streaming, subscribed to several of the major streaming video providers, and played with a lot of additional software solutions for finding content. I'm not going to describe everything there is to know about cutting the cord (not like I know it all anyway) as there are several good web sites that do that. I'm going to tell you what works for me, what I like and don't like, and offer some tips should you choose to try this for yourself.

There are a couple of caveats I need to list right up front. First, if you are a real sports fanatic, you probably won't want to get rid of your cable or dish subscription just yet. Legal avenues for streaming sports - especially live sports - are mostly nonexistent, and the ones that do exist are often tied to having a cable or dish subscription. There are some exceptions, MLB.tv being one of the standouts. There are also ways to watch live sports streams which are of questionable legality, and which frequently suffer from poor quality.

Second, your internet connection needs to be stable and relatively fast. 5 megabit DSL is about the lowest you want to go for this, 10 - 15 is better, and to take advantage of Netflix's ultra HD streaming (4K HD) you will need 20 - 25 megabits per second of bandwidth.

Equipment - I use the Roku 3. This little box retails for $99, although you can find deals on it pretty much everywhere. You might want to look for it on Woot where refurbished models are frequently offered for $65. The Roku 3 features up to 1080P video streaming, both wired and wireless networking, and a nifty remote control with a built-in headphone jack (so you or your partner can watch TV while the other sleeps in peace).

The Roku has channels for just about every major video service available - Netflix, Amazon Prime, Hulu Plus, Vudu, and many others. It's about as simple to use as any such device can be.

The biggest downfall to the Roku is that it isn't a full-fledged computing device, so it can't readily browse content over the local network, and one majorly-helpful piece of software - XBMC - won't run on it. All isn't lost, though - Plex comes to the rescue. Plex is a media server you run on your PC, cataloging all your local content (movies, TV shows, music, etc.) and downloading metadata. You install a Plex client from the Roku channel store and use it to pull content from the PC. Plex also has its own "channels" you can install, and these fill in gaps that exist both on the Roku and the Hulu Plus service. For example, the Roku has no YouTube channel of its own, but Plex provides one. And while CBS television is the only major broadcaster not present through Hulu Plus, there is a CBS channel that allows you to stream most of the current shows.

While there are many other equipment choices available, if I were looking to buy something new today, I would consider the Amazon Fire TV - it's about the same price as the Roku, can stream all the same major sources, and there is a build of XBMC that will run on it.

The only other piece of equipment I'd mention is that I have an antenna outside that picks up live HD broadcasts of the major networks. The picture is crystal clear (better than I ever had with my small dish).

Content sources - I have subscriptions to Netflix, Amazon Prime, and Hulu Plus. Each has its strengths and weaknesses. Netflix has for some time been the reigning king of the streaming world, with a vast library of movies and TV shows. The content is basically what would be available on DVD or Blu-Ray - in other words, nearly always slightly older movies and previous seasons of television shows. They are offering some exclusive content these days as well. I have had Netflix on and off a few times over the years. I'm giving it another try right now, but mostly I am finding that nearly everything they have that I care to watch is available on Amazon Prime.

Amazon Prime is a movie and TV service similar to Netflix. For a long time it was a pretty distant second in terms of the size of the library, but Amazon is catching up - fast. More importantly, a Prime subscription gets you free 2nd-day shipping on much of what Amazon has for sale, access to their Kindle Lending Library with thousands of books, and now includes their music streaming service as well. It's an amazing deal, and if you had to pick between Amazon and Netflix, Amazon would be a worthy choice (unless you just HAVE to watch the stuff that's exclusive to Netflix).

Hulu Plus has movies, but not many, and they're usually older. What makes Hulu special is that it is the only service with current-season episodes of many of your favorite shows. Hulu is a partnership between NBC, FOX, ABC, the CW, and UPN. Shows are usually available one day after the broadcasts air. The big hole in Hulu's lineup is CBS, and it doesn't look like this is going to change, since CBS has announced its intention to offer a separate pay service. As of now, you can watch CBS shows using the Plex channel.

XBMC - XBMC is a media center application that you can run on your PC or Mac, and on some set-top boxes as well. XBMC catalogs local content (similar to Plex, which is actually built on some of XBMC's code), and offers many other sources of content. XBMC is open-source software, and due to the number and type of add-on channels and plugins available, can be a bit daunting for people who aren't tech-savvy. But it's an amazing resource if you take the time to get it set up.

Monday, October 13, 2014

What A Week

Last week I was "on-call" for work. That meant I was responsible for watching our monitoring systems and problem queue, working problems as they arose if possible, and coordinating efforts if it was something I needed help resolving. The first couple of days were pretty slow, a couple of failed power supplies in systems with redundant power, no biggie.

Thursday I got a call that users in our Mechanicsburg office were experiencing a lot of performance degradation. A quick check of their primary MPLS circuit (from Level3) showed a lot of packet loss. We have BGP configured to switch them over to Centurylink if Level3 fails, but the circuit hadn't actually dropped, so we forced it - shut BGP to Level3, and opened a problem ticket with them.

A short time later, our monitoring tools reported trouble reaching a router in Williamsport - another Level3 circuit, this time the backup circuit, normally only used when connecting to that one router. We began thinking Level3 was having a bigger issue. But before we could contact them to add the info to our ticket, we heard users in Harrisburg were having performance issues. Level3 again, and the primary circuit - so we shut BGP there, forcing them over to a backup circuit from Verizon. Finally we got the Level3 ticket updated with all the circuit information and waited for their response.

About 3:00PM a bunch of us were supposed to go out to celebrate a teammate's birthday. Right when I got to the bar, the phone rang - network admin requested to look at an application issue. So I went back in and launched into one of those 3-hour marathon sniffer sessions. Fun! I finally got out about 6:00PM and headed home.

On the way home I got a text message from Bank of America - fraudulent charge suspected on my debit card, please call or login to online banking to check. Peachy. As I walked into my house around 7:00PM, my cellphone rang - a guy at work who was going to swap some potentially bad GBICs on a fiber, wanted me to make sure we had traffic off the link.

I decided to call back from my landline because cell coverage at home is spotty. I picked up the phone, and...no dial-tone. Luckily I still had DSL service. I got logged in, called him from my cell, and got that one worked.

In the meantime I opened a chat session with the phone company's tech support. They wanted me to swap phones or try the test jack outside the house. No good - I didn't have a spare phone, and the one I did have was a cordless that requires power for the base station. I would have to wait until I could get another phone on Friday to find out if it was my problem or the phone company's.

Finally I logged into BoA's web site. Yep, somebody tried to access my account from a Publix supermarket down in Florida. Of course as soon as I marked the charge fraudulent, BoA promptly canceled my debit card and notified me it would be 5 - 7 days to get a new one. You just have to love the modern world, right? I checked my wallet - $5 cash, maybe with that and the change I keep in the jar at work I would be able to eat on Friday.

Friday morning, we had an email from Level3 waiting for us. They had found a problem with a core router serving a bunch of their customers in the northeast, and routed around it. After talking it over with my director and teammates, we decided to keep Mechanicsburg and Harrisburg on their backup circuits for the day and watch the Level3 circuits. If everything held up we would re-enable BGP over Level3 sometime Friday night.

Two hours later the Verizon circuit to Harrisburg died. Just plain died. And with BGP shut over the Level3 circuit, they were cut off completely. We dialed into a modem on an emergency backup router and got BGP going again on Level3 to get them back online. Total time of that outage was maybe 5 minutes.

Friday afternoon rolled around and I got talked into trying another social outing. But just when it was time to leave, I got asked to look at another issue - a file transfer running over a point-to-point circuit between Florida and Pennsylvania was running slow. In fact, it had been running slow all week, but no one had asked for help until Friday afternoon. AAAUUUGGGHHH! So another night not getting off until 6:00PM, not getting home until 7:00PM. And to make it more interesting, it looked like there was packet loss going from us to the remote site - on a Level3 circuit. Not MPLS, true, but another Level3 circuit in Pennsylvania? They claimed to have routed around their other issue, but at this point we were getting gun-shy about putting anything else on their network if we didn't have to (Harrisburg notwithstanding).

On the way home I stopped at Target and bought a plain-old telephone that doesn't need external power. When I got home I plugged it in inside the house - no dial-tone. I took it out to the box outside - no dial-tone. Ok, it's the phone company's problem. I went in to do another online chat session with tech support, but now I had no DSL.

I got on the cell phone to call the phone company and halfway through one of the half-dozen prerecorded messages, the call dropped. I dialed back, worked my way through the menus - and got dropped listening to the same message. Now, they say that doing the same thing over and over and expecting a different result is one definition of insanity. I must be insane, because I tried a third time. And got dropped during the same message. Finally I called in and just kept hitting "0" on every menu and eventually got a live person. Of course, all they could tell me was they didn't see any trouble in my area, couldn't call my house phone (duh) and couldn't see any signal from my computer. That, and they couldn't send anyone to the house to fix it during the weekend unless I paid, otherwise I would have to wait until Monday for a visit from a tech (I was still on-call for the weekend), and I would have to stay home from work to meet the tech or they wouldn't come (despite the fact that the issue was clearly NOT inside my house).

So today is Monday. The tech came. They had moved my circuits last week to a new switch and somehow failed to configure my service.

The good news is, I'm not on-call again for about 7 weeks.

Yeesh!

Bloody Turnips

“You can’t squeeze blood from a turnip.” This old saying is a way of expressing that some things are so obviously impossible that they aren’t worth trying, that they are a waste of time. But sometimes the problem isn’t that we’re trying to squeeze blood from a turnip - the problem is assuming that we’re looking at a turnip in the first place.

The other day I got “the call.” “The call” usually comes late in the day, and frequently on a Friday. It’s when someone has been working at a problem all day, or all week, realizes they are running out of time, and in a last ditch effort at a resolution they ask for a network admin to take a packet trace. And I’m the person that frequently gets “the call.”

This time it was an application which picks up files from a server. The application was locking up, and the people troubleshooting it explained that this is frequently a sign that there was a delay in picking up the files (this application was said to be super time-sensitive). Server admins had found nothing wrong on the file server. I was asked to see if there was anything causing network-based latency, or if I could at least see something in the trace that might account for the issue.

I have to admit that I did not approach this problem with any enthusiasm. I have a life. I do not like getting called at 3:00PM to start a multi-hour troubleshooting session on something this vague. But it’s part of the job, these were my customers, and apparently nobody else was making any headway (including the vendor of the application, who had been called in to work on it).

Now despite being pretty good with the sniffer - and sometimes enjoying the challenge - I know that it can be a hard way to get to the root of a problem, so I made an effort to do things the easier way. I asked the usual questions - when did the problem start, did something change, could I get a more technically accurate description of the problem, etc. I looked at the basics - located and checked for errors on the switch ports of the file server and application system and so forth. And then, reluctantly, I fired up the sniffer and got started.

About an hour into the session, one of my teammates came up to watch, and he asked the obvious question - “Do you really think you’re going to find the problem by looking at the packet contents?” He was, in essence, asking me if I was trying to squeeze blood from a turnip. And honestly I did not know how to answer him.

It’s something I’ve thought about often over the years. I am very interested in troubleshooting - the thought processes that go into it, the practice of it, the techniques that are used. I think that the act of trying to reverse-engineer an application by staring at the sniffer until it feels like my head is bleeding is a really hard way to do things. But while I have not come up with a lot of amazing answers to those questions, I have learned one thing:

I can’t solve a problem if I don’t try.

There are a lot of times it feels like I’m squeezing a turnip. But the truth is I don’t know what I’m squeezing. It’s like sticking my hand in a bag and grabbing something, and squeezing it, and after a long time I get some blood out of it - in which case I find that it wasn’t a turnip. And sometimes I get nothing but turnip guts.

So I just said to him - “I have no idea.” And I kept on squeezing.

I’d like to conclude this post by telling you about the amazing discovery I made in the packet trace. Unfortunately that didn’t happen. What did happen is I was able to determine that when the application freezes up, it isn’t waiting for anything from the file server. The application was getting a response that looked “complete” (for you packet monkeys, it had the PUSH flag set on the last packet of the response), the application system responded with an immediate ACK, and then sat there for a long time before doing anything else. Then the application system sent a packet and things started up again. I saw this happen multiple times during “freezing” episodes.
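For the packet monkeys, here is a rough sketch of the pattern I was looking for, written as a search over a simplified list of packets. This is not what the sniffer does internally and it doesn't read a real capture file. The `Packet` record, the "server" and "app" labels, and the one-second threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical, simplified packet record - a real trace has far more detail.
@dataclass
class Packet:
    time: float       # seconds since the start of the capture
    sender: str       # "server" (file server) or "app" (application system)
    flags: str        # TCP flags as letters, e.g. "PA" for PSH+ACK, "A" for ACK
    payload_len: int  # bytes of data carried

def stalls_after_complete_response(
    packets: List[Packet], min_gap: float = 1.0
) -> List[Tuple[float, float]]:
    """Find places where the server finished a response (PSH set), the
    application system ACKed it, and the application system then sent
    nothing for at least min_gap seconds.

    Returns (time of the ACK, length of the silence) for each stall.
    """
    stalls = []
    for i in range(len(packets) - 2):
        resp, ack, nxt = packets[i], packets[i + 1], packets[i + 2]
        response_done = (
            resp.sender == "server" and "P" in resp.flags and resp.payload_len > 0
        )
        bare_ack = ack.sender == "app" and ack.flags == "A" and ack.payload_len == 0
        app_resumes = nxt.sender == "app" and nxt.payload_len > 0
        if response_done and bare_ack and app_resumes:
            gap = nxt.time - ack.time
            if gap >= min_gap:
                stalls.append((ack.time, gap))
    return stalls
```

If every long silence starts right after the application system has ACKed a complete response, the wait is happening on the application side and not on the wire - which is the conclusion I reached by eyeballing the trace.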

What does it mean? Well, it means the problem isn’t a delay in getting information from the file server. There could be a problem in the contents of the response, and being unfamiliar with the application itself I couldn’t speak to that. Or there could be something happening on the application system causing it to freeze that has nothing to do with the network traffic.

This information didn’t solve the problem for the application folks. It did get the file server admins off the hook, and it pretty well proved the network infrastructure wasn’t at issue, and it gave the application admins and their vendor a little push in the direction of looking at their own system a little harder. I hope it helped.

If there is a message here, it’s this - troubleshooting can be a painful, frustrating, and sometimes ultimately unrewarding process. Problems can be really complicated, the tools can be hard to use, and the whole thing can just be a lot of work. Even when you try your best you don’t always come up with a big win. But if you don’t try, you don’t stand a chance. I think a lot of people - including a lot of network people - think that problems can't be solved with a sniffer, or maybe that they can't solve them, so they don't try. All I can say is, I've done it often enough to know it's not impossible. Working a problem with a sniffer isn't always fruitless. So the moral of the story?

Keep squeezing.

Monday, August 25, 2014

What's The Problem, Anyway?

The first step in troubleshooting a problem is knowing that you have one. Hopefully you have some sort of monitoring system in place that can alert you to the existence of a problem in a timely manner. Unfortunately this isn't always the case, and problems are reported to us by users, system or application administrators, or in the worst case by customers.

Once we know there is a problem, the second step is to get a clear description of the symptoms (which will hopefully lead us to an actual technical definition of the problem). And herein lies one of the biggest headaches for a troubleshooter, because the reports we get are often vague, inaccurate or misleading. An important skill for the troubleshooter is therefore the ability to extract accurate information from the people reporting the problem, to get detailed descriptions, and weed out what is just plain wrong.

There are various reasons why we can't simply trust early problem reporting, some of which have to do with exactly who is making the report. In particular, getting people to concentrate on describing the symptoms rather than jumping to conclusions can be a real chore. Here are some issues I see frequently with problem reporting:

End users frequently tend to describe what they feel rather than what they see, and to generalize - a lot. Descriptions such as "Everything is slow" are common. Users who can't get to a specific web site sometimes report that "The Internet is down."
People who have experienced one kind of problem in the past sometimes think that every new problem is the same as the old one. A recent example occurred at my office when the users of an externally hosted web application experienced extreme slowness and broken app sessions due to packet loss along the path to the external hosting site. A couple of weeks after that was resolved, there were problems with a server hosting the application, and it was reported to my team that the network problem had come back, despite the fact that the symptoms were different (and that users were getting server-side error messages displayed onscreen).
There may be one or more "human layers" between the people with the problem and the people troubleshooting, and they can muddy the waters. For example, many of our problems come to us by way of a helpdesk which takes problem calls. They provide a vital function, but inexperienced or untrained personnel may not ask the right questions, or they may provide their own interpretation before passing along the report.
People sometimes report inaccurate information, and once it's been reported it may be hard to correct. In the case described above where users were having trouble with performance of an external site, a manager who received initial complaints from his users concluded that only users of Windows XP and older versions of Internet Explorer were affected, but that users on Windows 7 and newer browsers were fine. This incorrect information was the result of failing to gather enough data before calling the helpdesk - but it went into the ticket. The problem got kicked around for several days by other areas before landing in my team's lap, but although the manager had learned during that time that his Windows 7 users were indeed affected, that information never made it into the problem ticket. Our team started the troubleshooting process with inaccurate information.
People sometimes think they already know what the problem is, and try to lead the troubleshooter to a particular conclusion that may not be warranted. A lot of the problems that come our way start out like this: "We need you to check the firewall, our app server can't get to the database server." Of course it may be the firewall, but troubleshooters who allow themselves to be led this way often lose precious time following false trails.

It's difficult to always keep these issues from occurring, but a good troubleshooter knows the importance of getting an accurate description of the symptoms. Here are a few ways it's done:

Whenever possible, talk to the people experiencing the problem. I know a lot of IT people who just HATE this - we like having the helpdesk act as a buffer between us and our users or customers (who may be in a foul mood by the time they call in a problem). But the more layers there are between us and the people who are actually experiencing the issue, the harder it will be to make sure the right questions are asked.
Concentrate on the basics. What is the user doing when the problem happens? What application are they using? What web site are they accessing? What specific function within the application or site are they accessing? What is it supposed to do that it isn't doing? When did the problem start? Did it use to work and now it doesn't, or is it something we've never seen work properly? How many people are affected? Is the user aware of a change - a new operating system or browser, or maybe a patch that got pushed out? Did an application administrator push out a new code release?
Try to see the problem for yourself. Can you try to run the same application under the same circumstances as the user? Can you remotely access a workstation in the same place and do the same thing? Can you shadow or monitor a user's session so you can see what they see? If you can't see it yourself, can someone reproduce the problem and give you a description? Can you get someone to take a screen shot, or send you an error message from an application screen or from a server or application log? (On that note, it's best if you can avoid having people try to write down or type an error message, as it may not be faithfully transmitted to you - a screenshot or actual snippet from an error log is better).
Try to recognize the difference between a problem description and a conclusion drawn by someone else about the nature of the problem - in other words, try not to be led. If the problem report tells you what needs to be checked this should be an immediate red flag. It's especially difficult to avoid if you know the person making the report and you have some respect for their technical skills, but you need to think things through for yourself - which may mean getting the reporter to back up and walk you through the symptom. If they want to describe how they came to their conclusion, that's fine as long as you can resist the temptation to let them do your work for you.

This isn't meant to be rude or disrespectful, but remember this - problem reports can be wildly inaccurate or so vague as to be nearly useless. An important part of troubleshooting is to get a clear, accurate description of the symptoms. Without that, you're half-blind and may waste a lot of valuable time and effort on the wrong path.

Thursday, August 21, 2014

It's The Network! (why network engineers get so much experience troubleshooting)

After I'd been on the job for a while I began to notice a disturbing trend - the network gets blamed for a lot of problems. At first I thought it was something unique to my company and our IT staff, but I have learned that this is a common occurrence. Every few months a vendor will come in trying to sell us some new-fangled network monitoring tool, and the opening pitch is always something like this:

"Are you tired of having to defend the network all the time? With (insert product name here) you can instantly PROVE that it's not the network causing problems, and refocus your troubleshooting on finding the REAL cause!"

The fact that a market exists for such tools, and that pretty much every vendor chooses the same pitch to get network engineers to buy them, tells me that this problem is widespread. It's very common for server and application administrators to blame the network when their systems aren't working the way they expect, and this attitude is seen in management as well.

There are a number of reasons why this is so - I'll list and comment on some of them here, and later blog posts will explore them in further detail.

The word "network" means something different to network engineers and to pretty much everyone else. To a network engineer, the "network" is a collection of routers, switches, firewalls, and VPN devices. When we're feeling generous it may also include other devices that can alter or affect traffic flow - security devices like IDS/IPS, load-balancers, etc. But to many people the "network" is defined as "everything other than the system I'm responsible for." This means that if a server or application administrator is having a problem and they don't see something wrong with their own systems (and frankly, they may not know how to look), they are going to toss it over the wall to the network team.
Some parts of the network are designed to block traffic. It's true - the very definition of a "firewall" is a system that blocks everything by default, and only allows traffic by explicit exceptions. And intrusion prevention systems can interfere with traffic that fits (or fails to fit) a particular profile. Which leads us to this little gem:
Sometimes, it really IS the network. No part of a large IT system is immune to problems. Firewalls may be blocking traffic if a network engineer has failed to correctly configure a necessary exception (or if the application owner has failed to request it). IPS systems can mis-identify traffic as malicious. Switch ports, line cards, and routers can have hardware problems, and as with any software, the operating code on these systems can be buggy. And that leads to yet another item:
It was the network last time, so it's the network this time. I call this The Problem of Experience. If a server or application admin has ever been the victim of a missing or misconfigured firewall rule or a bad IPS signature or a flaky switch port, the next time they have any sort of problem they're more likely to conclude that the network is causing it THIS time, too.
The network team has unique powers of observation. In addition to our ability to look at our own systems - our switches, routers, firewalls, VPN devices - we network engineers can also look at traffic. We are usually the folks who own and operate the packet capture and analysis devices - which makes a certain kind of sense given that we have to configure the network to copy traffic to them. Even when someone is kind enough not to actually blame the network, they often come straight to the network team for a "sniff" (and some expert assistance with the analysis) as a shortcut to resolving their issues.
We're good at troubleshooting. I addressed this briefly in my introductory post on this blog, but it comes down to one of those self-reinforcing cycles. We get lots of problems, so we develop skill at solving them. We get good at checking our own systems first and then tackling other people's issues, and then because we're good at it, we get asked to do it some more, so we get better at it...you see where this goes, right?
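
To make the "blocks everything by default" point from the list above a little more concrete, here is a minimal Python sketch of default-deny matching. It is an illustration only, not how any particular firewall is implemented: the `Rule` class, the `is_allowed` function, and the addresses in `RULES` are all made up for the example. A rule here is just the familiar set of four things: source, destination, protocol, and port.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class Rule:
    """One explicit exception (hypothetical structure, for illustration)."""
    source: str       # source network in CIDR notation
    destination: str  # destination network in CIDR notation
    protocol: str     # e.g. "tcp" or "udp"
    port: int         # destination port


def is_allowed(rules, src, dst, protocol, port):
    """Return True only if an explicit rule matches; otherwise deny."""
    for rule in rules:
        if (ip_address(src) in ip_network(rule.source)
                and ip_address(dst) in ip_network(rule.destination)
                and protocol.lower() == rule.protocol.lower()
                and port == rule.port):
            return True
    # No rule matched: default deny.
    return False


# Made-up example: one exception allowing HTTPS from a user subnet to a server.
RULES = [Rule("10.1.0.0/16", "10.2.3.4/32", "tcp", 443)]

print(is_allowed(RULES, "10.1.5.9", "10.2.3.4", "tcp", 443))   # matches the rule
print(is_allowed(RULES, "10.1.5.9", "10.2.3.4", "tcp", 8443))  # no rule for this port
```

The second call is the case that generates the phone calls: nothing is broken, but nobody requested (or nobody configured) the exception, so the traffic is dropped.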

So if you're a network engineer wondering if it's just you, or just your company or your admins or your users...the answer is "No." We get the same thing everywhere - if there's a problem, someone is bound to blame the network. If you're lucky you will survive long enough to develop some skill at solving problems, and if you're really lucky you will eventually convince the people around you that it's not always the network. But don't hold your breath waiting for them to stop asking for help.

Welcome!

Welcome! You have found the blog home of James V. Fields (that's me, on the left there, looking just the right amount of suave and geeky all at once). I'm a network engineer for a semi-large company, but I spend a lot of time - a HUGE amount of it, actually - troubleshooting.

We have thousands of users, thousands of desktop computers and servers, more than a thousand pieces of network gear, and lots of network connections, public and private. We do a lot of in-house software development, as well as using plenty of off-the-shelf stuff.

When things go wrong I'm one of the people they call to figure out the what and the why. And not just network issues - I end up troubleshooting server configurations, application problems - you name it, I've probably had to work on it.

There are various reasons why this is the case, not the least of which is that I'm fairly good at it. It also helps that I don't shrug off problems or try to get people to leave me alone - I have a strong sense that regardless of where a problem lies, if I have the ability to help, then it's my duty to pitch in. I'm not really sure if I get called so much because I'm good at the work, or if I'm good because I get a lot of calls - I guess it's a little of both.

It has occurred to me from time to time that there are themes to the work - issues that crop up again and again. Some of these are technical issues, while others have more to do with the psychology of the whole thing, the approach to troubleshooting (or, as is often the case, the lack of one). I turn this stuff over and over in my head. Sometimes my brain just gets tired of wrestling with it all, but sometimes I snag a little piece of "truth" about how we deal with complex problems. Thus the blog - this is a place I can record my observations and revelations, and maybe try to put a little structure to them.

I'm not a professional blogger or writer, and it's hard to say where this will go. I don't know if anyone will see it, but if you make it here, feel free to comment or drop me a line - I'd love to hear from you.

James V. Fields