Avoid Using Those Troubleshooting Skills

When I re-read yesterday’s blog post on troubleshooting steps, I felt guilty. Instantly. Why? Because I had failed to mention something to you in that post and failed to follow advice I had just given in my SQL Rally presentation. *In my own defense, I did mention that one of the bullets I used in that presentation was hard to type: “We need to move away from ‘do as I say.’”

I’ll come clean with it here, just don’t tell anyone…

That Starter Replacement I Was So Proud Of?

It didn’t have to wait until I had my family stuck in the truck. It didn’t have to wait until I had three kids ages five and under getting cranky strapped into their seats on a hot (and apparently rare this year) New England day. Nope. If you read that post, you’ll notice I gave a quick tell when I said, “They agreed and added, ‘starter problems almost always start intermittently, probably was better in winter because of how the metal moves’….”

See It?

I started having odd issues long before the truck got stuck in my driveway, which ended up being one of the reasons I replaced the starter myself. I had stutters, situations where it felt like the starter motor was running longer (like I was grinding my keys after start), etc. I mentioned it in passing to a garage while the truck was in for other service in the fall. They couldn’t reproduce it (it was incredibly intermittent), so we dropped it.

So What?

Had I dug deeper, maybe brought the starter someplace (do they still test components like starters?) or just had a shop replace it (of course it would have cost more money and I wouldn’t have that satisfaction of getting my own hands dirty), I wouldn’t have had to break out the troubleshooting skills in a pinch. I wouldn’t have had to hassle the family, move car seats and kill 30 minutes of an otherwise good day for planting the lilies we were off to pick up.

For Our Day Jobs…

Troubleshooting skills are critical. I was clear on that in the previous post and all the referenced posts. Shoddy troubleshooting breaks servers, increases downtime, stresses everyone else out and kills customer relationships. So we need to really put some effort into better troubleshooting skills. BUT… We also need to be great at keeping on top of maintenance and best practices during the “rest” of the time.

Especially as DBAs… We’ll need good troubleshooting skills at some point in our careers. Even if we did everything perfectly, never missed a beat, kept our instances well oiled and tuned, we’d still have to fix some problem sometime. Why? Because we get blamed for it, so it’s in our interests to show off our troubleshooting skills from time to time ;-) But seriously… We’ll have to fix problems from time to time. Let’s do our best to reduce the times we have to do urgent troubleshooting, though. Okay?

We can do things like test our restores, verify our backups, check database consistency, perform routine maintenance, check the vital signs of our instances and look for variances from those baselines, be tough with what we allow to wind up in production, lock down our production servers, do code reviews, etc. Really. Just stay on top of the instances.
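To make the "look for variances from those baselines" idea a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the metric names, the sample numbers and the three-standard-deviation threshold are all made up, and how you actually collect the numbers from your instances is up to you.

```python
from statistics import mean, stdev


def find_variances(baseline, current, threshold=3.0):
    """Return metrics whose current value sits more than `threshold`
    standard deviations away from the mean of its baseline history.

    `baseline` maps a metric name to a list of past readings and
    `current` maps a metric name to the latest reading. Both shapes
    are hypothetical and chosen only for this example.
    """
    flagged = {}
    for name, history in baseline.items():
        # Skip metrics we have no current reading for, or too little history.
        if name not in current or len(history) < 2:
            continue
        mu = mean(history)
        sigma = stdev(history)
        value = current[name]
        if sigma == 0:
            # A perfectly flat baseline: any change at all is a variance.
            if value != mu:
                flagged[name] = (value, mu)
            continue
        if abs(value - mu) / sigma > threshold:
            flagged[name] = (value, mu)
    return flagged


# Made-up readings for two hypothetical vital signs.
baseline = {
    "batch_requests_per_sec": [410, 395, 402, 420, 398],
    "page_life_expectancy_sec": [5200, 5100, 5350, 5000, 5250],
}
current = {"batch_requests_per_sec": 405, "page_life_expectancy_sec": 900}

flagged = find_variances(baseline, current)
for name, (value, usual) in flagged.items():
    print(f"{name}: now {value}, usually around {usual}")
```

With these made-up numbers only the second metric gets flagged. That is the "starter stutter" moment: nothing is broken yet, but it is worth a look before you are stuck in the driveway.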

You may find something wrong that isn’t critical today (like my starter last fall when it started the odd behavior). I guess I’ll leave the final choice of handling up to you, though –

A.) Whoa… I was scared for a second there, looks good now. I’ll check it out sometime and watch for it again.

or

B.) Whoa. That could have been much worse and it may be much worse next time… Let’s have a look at why that happened and verify it won’t again…

I’ll give you a hint: A is less expensive, quicker, easier, and feels right, today. B gets to be a lot more expensive, longer to figure out, more difficult and feels pretty miserable once the circulating fan blades make their destined contact with the solid waste in the future…

So I’ll leave you with the general thought I had in a recent SQL University contribution – How are your instances doing? Do you even know what they look like anymore? Go visit them, say hi to them and ask them how they’re doing.

7 Comments

  1. Amit Banerjee May 17, 2011 at 09:20 #

    Very good post and I always believe that “prevention is always better than cure”. What you invest in today to figure out will save you valuable time and $$ tomorrow. I am engaged in this kind of work daily where proactive measures to keep a problem from becoming a major show stopper tomorrow do go a long way in keeping the blood pressure levels of the parties involved at a controlled level!

    • Mike Walsh May 17, 2011 at 09:25 #

      Thanks for the comment. That old adage “An ounce of prevention beats a pound of cure” really does work. Still important to know how to respond and train to events (look at the US Air flight that had to land in the Hudson… Those pilots were prepared, the preventative maintenance was up to date, the plane was healthy until those birds decided they were too cold… Their troubleshooting skills, calm, methodical approach and training paid off in big dividends) but a lot of our problems are caused by ourselves not thinking of the future and trying to save now.

      • Tim Radney May 17, 2011 at 09:35 #

        Great blog post. I always enjoy reading your blogs.

        • Mike Walsh May 17, 2011 at 09:36 #

          Thanks, Tim. I enjoy reading your updates as well, sir.

  2. Stuco May 18, 2011 at 08:29 #

    Thanks for this good advice and the “bring it home” illustration.

    • Mike Walsh May 18, 2011 at 09:24 #

      Thanks for the comment. Glad you find it useful!

Trackbacks/Pingbacks

  1. Don’t Splint Your Database Server To Death | Straight Path Solutions, a SQL Server Consultancy - February 4, 2012

    [...] Avoid Using Those Troubleshooting Skills – Acquiring troubleshooting skills is an important endeavor for folks. But what if you handled your environments in such a way you needed them less and less? [...]
