(Disclaimer: I am not a doctor, so I can’t guarantee that anything I say about medical practices is accurate.)

Recently, while watching an episode of House M.D., I realized how similar differential diagnosis is to software debugging.

Like any good doctor, the fictional Dr. House first seeks to verify the cause of a patient’s ailment before prescribing treatment. House is a modern-day Sherlock Holmes, a master at noticing the mundane details that hint at what condition a person may have.

This art of observation is not limited to the present. Talking to patients and understanding their medical history is as important as performing tests and taking vital signs. Medical tests give important insight into a person’s current state, but often it is what happened before that tells a doctor how the patient should be treated.

If I walk into the ER with a bite on my arm, the doctor is going to ask what happened. I could have been bitten by a dog, a venomous insect, or a rabid squirrel, and each case would need a different (and potentially life-saving) treatment. Sure, they could clean up my arm and prescribe antibiotics, but there are too many potential causes to treat them all at once.

Differential diagnosis is defined as “the distinguishing of a disease or condition from others presenting with similar signs and symptoms”.[1] Because we can rarely be 100% certain, or have time to collect every possible piece of data, differential diagnosis provides a framework for choosing the treatment with the highest chance of success.

Software troubleshooting starts in much the same way: list several high-level possibilities and disprove them until you’re left with a single option (or as few as possible). For example, suppose your client complains that a website isn’t working. Some possible diagnoses are: (1) your client’s internet connection isn’t working, (2) there’s a problem with the network between the client and the web server, or (3) there’s a problem on the web server that affects anyone who tries to access it. You could have your client ping the server and try accessing other websites to rule out (1) and (2). Or you could check the server yourself, find error logs that point to a problem on the server, and confirm (3). You don’t need to confirm or rule out every possibility: once you find clear evidence for one of them, you can usually stop looking.
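The elimination process above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a real diagnostic tool: the observation names (client_reaches_other_sites, client_pings_server, other_users_can_load_site, server_logs_show_errors) and the rule-out and confirm checks are hypothetical stand-ins for what you would learn by pinging the server, trying other sites, or reading the server logs. A missing observation means that check hasn’t been run yet, so the hypothesis stays on the list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# Results of checks run so far, keyed by hypothetical check names.
# A missing key means that check has not been run yet.
Observations = Dict[str, bool]


@dataclass
class Hypothesis:
    name: str
    # True when the evidence disproves this hypothesis.
    rules_out: Callable[[Observations], bool]
    # True when the evidence is strong enough to stop looking.
    confirms: Callable[[Observations], bool] = lambda obs: False


HYPOTHESES = [
    Hypothesis(
        "client's internet connection is down",
        rules_out=lambda obs: obs.get("client_reaches_other_sites") is True,
        confirms=lambda obs: obs.get("client_reaches_other_sites") is False,
    ),
    Hypothesis(
        "network problem between client and server",
        rules_out=lambda obs: obs.get("client_pings_server") is True,
    ),
    Hypothesis(
        "problem on the web server",
        rules_out=lambda obs: obs.get("other_users_can_load_site") is True,
        confirms=lambda obs: obs.get("server_logs_show_errors") is True,
    ),
]


def differential(hypotheses: List[Hypothesis], obs: Observations) -> List[str]:
    """Return the names of hypotheses still standing after the checks so far."""
    remaining = []
    for h in hypotheses:
        if h.confirms(obs):
            # Clear evidence for one cause: no need to rule out the rest.
            return [h.name]
        if not h.rules_out(obs):
            remaining.append(h.name)
    return remaining
```

With no observations, all three diagnoses remain. Once the client can reach other sites and ping the server, only the web server is left; and finding errors in the server logs confirms (3) immediately, without checking (1) and (2) at all.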

Confirming (3) doesn’t strictly rule out (1) and (2): your client could have lost internet access at the same moment the website went down. Similarly, confirming (1) or (2) doesn’t mean there isn’t also a problem with the website. Such coincidences are possible, but unlikely.

This is why Occam’s razor is one of the central ideas of differential diagnosis: simple explanations are preferred over complex ones. Pursuing the most likely scenario gives us the greatest chance of solving the issue, and it saves us from wasting precious time exploring more obscure scenarios.
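Occam’s razor can be turned into a simple triage order. The sketch below assumes you can put a rough likelihood on each remaining explanation; the numbers are made up purely for illustration, and the assumptions count is a crude, hypothetical stand-in for “simplicity”. Candidates are investigated from most to least likely, with the simpler one first when two are equally likely.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Candidate:
    name: str
    likelihood: float  # rough estimate between 0 and 1; illustrative only
    assumptions: int   # how many separate things must have gone wrong


def triage(candidates: List[Candidate]) -> List[str]:
    """Order candidates: most likely first, then simplest first on ties."""
    ranked = sorted(candidates, key=lambda c: (-c.likelihood, c.assumptions))
    return [c.name for c in ranked]


candidates = [
    Candidate("client offline and web server down", 0.05, 2),
    Candidate("client's connection is down", 0.3, 1),
    Candidate("web server is down", 0.6, 1),
]
print(triage(candidates))
# ['web server is down', "client's connection is down", 'client offline and web server down']
```

The coincidence described earlier, where the client is offline and the server is down at the same time, lands last because it requires two independent failures.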

One of the most frustrating parts of IT troubleshooting is finding a problem, believing it’s the problem, and later discovering it isn’t the root cause. What you found might be a side effect of the original issue, or a completely separate issue that has always existed.[2]

Exceptions happen. It’s important to act on the most likely problem, but it’s just as important to be willing to scrap everything and start over when better information comes along.

I know how frustrating it can be when a doctor is wrong. When I was sick with mononucleosis, two doctors told me it “probably wasn’t mono”. Thankfully, the second doctor decided the probability was high enough to warrant a test. They called me the next day to admit they were wrong: it was indeed mono.

The doctor was following differential diagnosis: treating me for the most likely condition while following up to make sure it wasn’t something less common. This is preferable to being misdiagnosed with a rare condition when you only have a common cold.[3]

Most software engineers perform differential diagnosis while troubleshooting without thinking about it. It’s a skill learned from years of experience (though it should be taught in school, too). However, I’ve seen many engineers show a bias toward assuming the root cause lies in their own area of expertise.

A user interface designer will approach a problem from how it’s displayed on a screen. A database administrator will approach a problem from how the data is structured and accessed. A system administrator will approach from the perspective of configuration and recent changes to the system. When you ask someone to solve a problem, they will spend the most time considering it from their area of expertise. Their solution will be similar to how they have solved problems in the past.

This isn’t a bad thing. No single way of thinking is “wrong”. It just means you should ensure your team is diverse and that different types of engineers talk to each other.

I think the same thing happens in medicine. If you tell your eye doctor you’ve been having headaches, they’ll check whether you need reading glasses. If you tell your dentist, they’ll see if you grind your teeth. If you tell your primary doctor, they might refer you to a neurologist. You could end up with several explanations for the same symptom depending on who you ask. Each may be a valid diagnosis, but not every one necessarily contributes to the symptoms you’ve experienced.

Most professionals will say: “From the perspective of [my specialty], [factor] could be contributing to [symptom], but besides that, I see nothing else to conclude about you from a [specialty] perspective.” I have the utmost respect for professionals with this level of honesty and humility.

Be wary of anyone who is quick to conclude your problem is squarely in their realm of expertise.

I’m sure several cognitive biases apply to this situation. My favorite is the law of the instrument: “If all you have is a hammer, everything looks like a nail.”

Medicine and software are both enormous fields, and it’s impossible for any one person to be knowledgeable in all specializations. Being aware of one’s knowledge gaps is important, but there are also tools that can aid in decision making.

IBM’s Watson shows promise in diagnosing disease. Watson can consume information faster and with greater recall than humans. This could make it a useful tool for confirming a diagnosis, or for alerting a doctor to other potential diagnoses they are unaware of.

Software will always need troubleshooting, but some fundamental practices make it easier to monitor the health of a system. Proper logging gives precious insight into both the history and the present operation of a system. Rare errors happen, and if you aren’t logging the right data, you may never know what caused them. Good logging and analytics can even help you spot and prevent problems before they happen.
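As an illustration of logging enough context to diagnose a rare error later, here is a minimal sketch using Python’s standard logging module. The order-pricing function and the logger name “orders” are hypothetical; the point is that the inputs and the traceback of the failure end up in the log instead of being lost. The handler writes to an in-memory stream only to keep the example self-contained.

```python
import io
import logging
from typing import Optional

# In-memory stream keeps the example self-contained; a real system
# would write to a file or a log collection service instead.
stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s"))

logger = logging.getLogger("orders")  # hypothetical component name
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # avoid duplicate output if the root logger is configured


def average_item_price(order_id: str, total: float, item_count: int) -> Optional[float]:
    """Hypothetical function with a rare failure mode (an empty order)."""
    # Record the inputs so the history is available if something goes wrong.
    logger.info("Pricing order %s: total=%.2f, items=%d", order_id, total, item_count)
    try:
        return total / item_count
    except ZeroDivisionError:
        # logger.exception logs at ERROR level and appends the traceback.
        logger.exception("Could not price order %s (items=%d)", order_id, item_count)
        return None


average_item_price("A-1", 30.0, 3)
average_item_price("A-2", 30.0, 0)
print(stream.getvalue())
```

The ERROR entry for order A-2 records the item count along with a ZeroDivisionError traceback, which is the kind of history you need when a rare failure can’t be reproduced on demand.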

Whichever field we consider, having the right data seems to solve a lot of problems. The key lies in sifting through the data and finding actionable insights in a reasonable amount of time. Machines can process data without favoring one specialty over another. But put an overwhelming amount of data in front of a human, and they will start looking for what they can do with their tool of choice.

Set down your hammer today.

Footnotes

  1. Merriam-Webster, definition of “differential diagnosis”.
  2. In my experience, when you find something wrong that has been wrong for a long time (e.g., you can verify this in the code history), it’s not causing the new problem you’re trying to fix.
  3. On a similar note, overprescription of antibiotics is a big problem.