Not all software development processes are equal

Coding is easy / software engineering is hard – we need more engineers!

Foreword: This post has been in my drafts for ages. I’m hesitant to publish because…well…how could little old me have a valid critique of such a software engineering giant? Yet I keep rereading it, and I’m really happy with what I’ve written and the light it sheds on the topic. I’m finally clicking “Publish”. I hope you like it too!

I was shown Uncle Bob’s 2012 blog post “After the Disaster” a while back – an interesting piece from a very experienced software developer about how “If [people] find that [software developers] are disciplined, and self regulating, then perhaps they’ll leave us mostly alone. But if they find that we are undisciplined hacks then you know that they’ll impose all manner of horrible regulation upon us.”

He makes a really good point about how software is encroaching more and more into our daily lives, and that software development can be undisciplined and therefore potentially dangerous. And he’s right. “Coding” has become popular and very easy to learn. But software development – the broader craft of solving complex problems in maintainable, testable, well-structured, easily-understandable ways – is harder and needs more discipline than you get by just learning to code.

Software development is a lot more than just coding. Just yesterday, a Quora Q&A thread titled “Why is programming so easy?” appeared in my inbox. One of my favourite answers is “Coding is easy. In fact it is the easiest part of a programming job imo.” And in my opinion the world’s ratio of “software engineers” to “coders” is too low. This needs to change.

So I get and agree with Uncle Bob’s point. But I feel I need to give some critique of his explanation of it, and the examples that he uses.

Not all software is equal

Bob says: “Is there software in your cell phone…your watch…the light switch on the wall…the light bulbs…the intercom…the doorbell…the thermostat…your furnace…your air conditioner…your refrigerator, dishwasher, washing-machine, drier?”. And yes, he’s right. Software is all around us.

But he then seamlessly switches into this: “How about your car; and all the other cars on the road? How about the traffic signals? Did you ride an elevator today? Get in a plane or a train? How about an escalator? Do you have a pacemaker? An insulin pump?”

So let’s start by making it clear that not all software is equal. The first list of things is very different to the second list of things. The probability that something very bad will happen if software in the first list of things goes wrong is pretty low.

Yes, you can concoct scenarios like “my fridge got too hot and I got food poisoning” but there are other mitigating factors that prevent that hazard (like the fridge getting too hot) turning into harm (food poisoning). We’ll come back to this.

The probability that something very bad will happen if software in the second list of things goes wrong is pretty high. Most of those things move large, heavy chunks of metal around at great velocities, and are then subsequently responsible for stopping them moving too. The others are medical equipment that keeps people alive.

Let’s be clear – these are different things. A lightbulb and an airplane have very different requirements for the integrity of their software.

Not all software is equal.

Therefore, not all software processes are equal

Bob then says this: “How many times per day do you put your life in the hands of an ‘if’ statement written by some twenty-two year old at three in the morning, while strung out on vodka and redbull?”

And the answer is probably pretty close to “zero”.

I worked in safety-critical software for four years. I was actually writing and testing safety-critical code when I was twenty-two. And I can tell you that it is definitely NOT the case that a young, junior software developer would be able to commit code to the stable codebase of software for a car, traffic signal, elevator, train, plane, escalator or piece of medical equipment at 2am on a whim.

The software processes for these kinds of things involve:

rigid, formal software specifications
coding using appropriate tools and languages that minimise coding errors and help test that code does what it should and meets the spec
correct levels of testing
code review by peers, senior engineers, and often external third parties
compliance with relevant safety standards
production of safety cases that analyse the potential hazards, risk probabilities, and levels of harm to minimise danger

AND…in addition to all that there will be hardware fail-safes and other physical mechanisms that prevent a problem with software becoming something harmful.

In this context, I’ve worked on:

static analysis of compiled code at assembler level
review of compiled assembler code against the high-level source code
testing so thorough that I discovered an (already documented, but unknown to my team) bug in a microprocessor – yes, our spec and testing were so good that when the test failed, we turned to the microprocessor manual to see what was up!

In another, related post, Bob writes: “Where’s the guy who makes sure all the errors are checked, and that references can’t be null, and that variables are thread-safe?”

Actually, in these industries, languages with provable properties are used. You can eliminate null references and entire classes of basic coding errors and performance issues. And in the safety-critical world these tools are often mandated.

These processes are expensive. They are used in this context because people’s lives are clearly at risk. Rigid, complicated, expensive processes are not needed for all forms of software. The budget for these projects MUST pay for the appropriate level of engineering, testing and approval.

I currently build websites. In most cases, if my code is wrong, an image is displayed in the wrong place on someone’s screen. Linting, code review and static analysis are helpful, but they can be safely omitted if the budget is not large enough.

Different software needs different software processes.

The Disaster

Bob then says: “Some time in the not too distant future, there’s going to be an event. Thousands of people will die. And it will be the fault of some errant piece of code written by some poor schmuck under hellish pressure facing impossible deadlines. Perhaps it will be an airline crash, or a cruise ship sinking. Perhaps it’ll be an explosion at a factory, or a train accident involving toxins. Perhaps it’ll be a simple clerical error at a medical research lab that causes a vial of smallpox or ebola to be improperly disposed of.”

Basically, there may well be a disaster as a result of code. But please, an airline crash or a cruise ship sinking or a train accident will not be caused in this way.

“Poor schmucks” are not employed to write code like this. “Hellish pressure” is not allowed in this context. “Impossible deadlines” exist, but there are protections… You cannot…you are not allowed to…ship dangerous code in a safety-critical context like this.

Even if bad software does get in, the number of things that would need to happen between a software failure on a train and spillage of toxins in transit is huge and the probability of each of them is very low. And there are non-digital things – handling measures, hardware precautions, and so on – that try to prevent software hazards becoming physical harm.

“Will they find that developers work at all hours of the day and night, are under hellish pressure and impossible deadlines? Will they find that there are no professional standard, practices, or disciplines. Will they discover that we are all really just a bunch of undisciplined hacks?”

Well. Umm. No, Bob. In the industries that you keep suggesting will cause the disaster, you will not find these things.

We’re already After the Disaster

And Bob’s conclusion: “The population will scream for protection, and the lawmakers will respond with self-righteous indignation. In their toolkit they’ll have regulations, restrictions, licensing requirements, and certification tests…if they find that we are undisciplined hacks then you know that they’ll impose all manner of horrible regulation upon us….They might tell us what languages to use. They might tell us what process to use…we’ll work in a government regulated profession.”

So, let me introduce you to the Civil Aviation Authority, the Office of Rail Regulation, the Federal Aviation Administration, the National Highway Traffic Safety Administration and the Office for Nuclear Regulation.

Is it really thought that development of software in safety-critical applications has no government-backed processes and procedures, legal and regulatory frameworks, restrictions, certification tests?

Is it really thought that these things don’t exist? And even if they didn’t, wouldn’t it be a good thing if a regulatory body told us to use a language that could be statically analysed to prove that it had no out-of-bounds array accesses, undeclared variable uses, type errors, etc.?

These industries are already regulated. And rightly so. You can’t be a lazy, undisciplined coder in these places. It’s not allowed. The “disaster” won’t lead to regulation because the regulation already exists.

So what IS Bob advocating?

“If they find that we are disciplined, and self regulating, then perhaps they’ll leave us mostly alone.”

It’s a call to arms. We need to impose discipline and regulation on ourselves, “Otherwise we’ll work in a government regulated profession. And then life will be hell.”

In contrast to what, I wonder? To working drunk at 2am on code for a lightbulb? Well, on one thing we agree! Both ends of this spectrum are bad. We need something in the middle.

We need to stop working when tired, drunk and powered by energy drinks. We need to do testing and code review. We need to use the tools, languages, and processes that make our software better. All of this we should do.

But we should also be pragmatic. Not every application is a fly-by-wire aircraft control system. And a good engineer is pragmatic, making trade-offs, assessing risks, defining requirements, building and testing code responsibly.

A lot of applications aren’t going to cause “the disaster”, and not all code needs to be built to prevent it.

So while I strongly disagree with Bob’s examples, and with his account of some hypothetical disaster and its implications, he is right. Our industry is a mess. We need to take ourselves more seriously.

Don’t just ship it, coders. Learn your craft well. Work responsibly. Charge properly. Say “no”. Ask difficult questions. Don’t make assumptions.

People might not be dying at the hands of your if statements, but that doesn’t mean you shouldn’t care.