
Investigative Gumption

Robert Heaton recently wrote an excellent piece on the importance of gumption in programming.

This post really chimed with some problems I’ve recently experienced while debugging unfamiliar code. It became clear to me that gumption was a skill I’d neglected.

Stop asking for help

The primary manifestation of the problem was asking for help too soon. Plenty of other coders were more familiar with the area in which the problem occurred – why not ask them for help? Well:

  • Being shown the answer doesn’t give you the same level of understanding as finding out for yourself
  • Never working through to a conclusion will atrophy your investigative skills
  • Finding the root cause without assistance feels great
  • Exhausting the gumption of other nearby, better programmers will eventually drive them away

…and who will you ask for help then, when you’re actually stuck?

Symptoms and a diagnosis

Caveats: These are only the ones I noticed. There may be more. Yours may be different.

  • Feeling incredulous that the software could misbehave in such a way
  • Spreading that incredulity to others through a subjective retelling of the problem
  • Frustration at being unable to come up with a working theory of how the failure could occur

These are really three symptoms of a quieter underlying cause: fatigue. I’d exhausted my initial search tree of potential theories and I was out of gas. I’d run out of gumption.

Treatment

I’m making a conscious effort to try to get better at this, and I feel like I’ve been making progress. I don’t think I’ve increased in gumption, but perhaps I have increased my gumption efficiency. Here are some directions that I’ve attempted to follow:

  • Fight that feeling of incredulity. Software is (usually) deterministic. Cosmic rays are not the culprit.
  • Using the debugger is not a personal failure (this isn’t the TDD game!). Reproduce the problem with a debugger attached as soon as possible; it’ll often invalidate the wrong assumption that’s been getting in the way of your understanding.
  • If you can’t reproduce the problem, check your environment. Still can’t reproduce it? Check it again. Is it really the same as where the problem happened? Really really?
  • Trust your instincts – one investigation I failed to finish turned out to be a ThreadLocal misunderstanding. I’d seen the ThreadLocal usage hours earlier, winced, and then forgotten about it because I was trying to be a good scientist. The brain is good at remembering causes of past pain; trust it when it reacts negatively.
  • Know your limits. I can get a temporary gumption boost from a Snickers or those terrific peanut butter KitKats. Once that’s worn off, it’s time to stop. Either seek help, or go home and sleep on it.
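The post doesn’t say what the ThreadLocal misunderstanding actually was. As an illustration of the general kind of surprise that deserves a wince (an assumption, not the bug described above), here is a common one: a ThreadLocal set by one task on a pooled thread leaks into the next task on that same thread. All class and variable names here are hypothetical. Note that the behaviour is entirely deterministic, which fits the advice above that cosmic rays are rarely the culprit.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

class ThreadLocalLeak {
    // Hypothetical per-request context, stored per thread.
    static final ThreadLocal<String> CURRENT_USER = ThreadLocal.withInitial(() -> "anonymous");

    // Runs two "requests" on a single pooled thread and returns what the second one sees.
    static String runTwoRequests() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // Request 1 sets the value and never calls CURRENT_USER.remove().
            pool.submit(() -> CURRENT_USER.set("alice")).get();
            // Request 2 reuses the same thread, so it sees request 1's stale value.
            Callable<String> readUser = CURRENT_USER::get;
            return pool.submit(readUser).get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runTwoRequests()); // prints "alice", not "anonymous"
    }
}
```

The usual remedy is to call `CURRENT_USER.remove()` in a `finally` block at the end of each task, so the pooled thread goes back into the pool clean.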

Corollary: Making systems that are easy to debug lowers the fixed gumption tax incurred when an investigation is required. Once you discover the cause of a knotty issue and want to begin a fix, first spend some time unravelling any unnecessary complexity you found on your way there.

Hopefully the gumption invested in this blogpost will help others to more efficiently expend their gumption (That’s enough gumption – Ed).

Some reminders about feature toggles

Feature toggles feature occasionally where I work. Today they featured in a lunchtime discussion.

Dare you switch?

Let’s say a particular toggle has been ‘off’ for six weeks. How confident are you that you can turn it on again, safely?

Well, where does that confidence come from? For us, it comes from our acceptance test suite, i.e. are there tests that verify the externally visible behaviour in both the on and off states?

Toggle implementations

Compile time constants

We weren’t too worried by these – each time we rebuild, the tests are rerun – and we only need to run the old or new behaviour tests, depending on the toggle’s value.
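A minimal sketch of a compile time toggle, assuming it is a `static final boolean` (the `Toggles`, `FeeCalculator` and `NEW_FEE_SCHEDULE` names and the fee rates are hypothetical). Because such a field is a compile-time constant, javac inlines its value into the classes that read it, so flipping it means a rebuild, and the rebuild reruns the test against whichever branch is now live.

```java
// Hypothetical compile time toggle: a static final boolean compile-time constant.
final class Toggles {
    static final boolean NEW_FEE_SCHEDULE = false;

    private Toggles() {}
}

// Hypothetical feature guarded by the toggle.
class FeeCalculator {
    // Fee in whole pence for a trade notional given in pence.
    long feePence(long notionalPence) {
        if (Toggles.NEW_FEE_SCHEDULE) {
            return notionalPence / 1000; // new behaviour: 0.1%
        }
        return notionalPence / 500;      // old behaviour: 0.2%
    }
}

class FeeCalculatorTest {
    // Only the branch selected in this build is checked; the expected value
    // follows the constant, so a rebuild with the other value checks the other branch.
    static void feeMatchesConfiguredSchedule() {
        long fee = new FeeCalculator().feePence(1_000_000);
        long expected = Toggles.NEW_FEE_SCHEDULE ? 1_000 : 2_000;
        if (fee != expected) {
            throw new AssertionError("expected " + expected + " but was " + fee);
        }
    }

    public static void main(String[] args) {
        feeMatchesConfiguredSchedule();
        System.out.println("fee test passed for NEW_FEE_SCHEDULE=" + Toggles.NEW_FEE_SCHEDULE);
    }
}
```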

Runtime variables

These generally get implemented as a piece of mutable state that changes on receipt of a particular message (e.g. a JMX call). Here our test coverage was less convincing. For at least one case, only one of the toggle states had coverage. The developers who created the tests might have run them both ways, but in the automated build/test run, that wasn’t happening.
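A minimal sketch of a runtime toggle as mutable state, assuming an `AtomicBoolean` holder. The `RuntimeToggle` and `OrderValidator` names are hypothetical, and the JMX wiring that would call `set()` in a real system is deliberately omitted. The toggle is read on every call, so a flip takes effect immediately without a rebuild or restart.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical runtime toggle: mutable state flipped while the process runs.
// In a real system set() would be invoked from a management endpoint such as JMX.
class RuntimeToggle {
    private final AtomicBoolean enabled;

    RuntimeToggle(boolean initial) {
        this.enabled = new AtomicBoolean(initial);
    }

    boolean isEnabled() {
        return enabled.get();
    }

    void set(boolean value) {
        enabled.set(value);
    }
}

// Hypothetical feature guarded by the toggle; the toggle is consulted on every call.
class OrderValidator {
    private final RuntimeToggle strictLimitCheck;

    OrderValidator(RuntimeToggle strictLimitCheck) {
        this.strictLimitCheck = strictLimitCheck;
    }

    boolean accept(long price, long limit) {
        if (strictLimitCheck.isEnabled()) {
            return price > 0 && price <= limit; // new behaviour
        }
        return price > 0;                       // old behaviour
    }
}

class RuntimeToggleDemo {
    public static void main(String[] args) {
        RuntimeToggle toggle = new RuntimeToggle(false);
        OrderValidator validator = new OrderValidator(toggle);
        System.out.println(validator.accept(150, 100)); // true: toggle off, old behaviour
        toggle.set(true);                               // e.g. triggered by a JMX call
        System.out.println(validator.accept(150, 100)); // false: toggle on, new behaviour
    }
}
```

A test suite that only ever constructs the toggle in its default state never executes the `isEnabled()` branch, which is exactly the coverage gap described above.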

Feature Toggle? These are more like kill switches (or zombie buttons)

The majority of the runtime toggles were actually kill switches. Either an experimental feature of uncertain outcome was being prototyped, and needed an emergency stop button, or a feature long thought dead was disabled, and a switch left to revive it when, inevitably, the user of the feature turned up post release.

In these cases, the right path appears to be having tests that assert the positive case works when the toggle is in the appropriate state, and that nothing happens otherwise. More explicitly:

  • For an experimental new feature, add new tests describing the behaviour, with one test that verifies nothing happens if the toggle is off
  • When removing a dead feature, keep tests describing old behaviour running by pushing the zombie button before running them. Add one extra test that verifies that nothing happens if the revive button is left untouched.
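The zombie button case might look something like this minimal sketch, written as plain Java checks rather than a test framework. The `LegacyReportExporter` feature and its `reviveButton` are hypothetical. The old behaviour tests keep running by pushing the button first, and one extra test verifies that nothing happens when the button is left alone.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical dead feature, disabled by default but revivable at runtime.
class LegacyReportExporter {
    final AtomicBoolean reviveButton = new AtomicBoolean(false); // the zombie button
    final List<String> exported = new ArrayList<>();

    void onTradeClosed(String tradeId) {
        if (!reviveButton.get()) {
            return; // feature is dead: do nothing
        }
        exported.add("REPORT " + tradeId);
    }
}

class LegacyReportExporterTests {
    // Existing behaviour test, kept alive by pushing the zombie button first.
    static void exportsReportWhenRevived() {
        LegacyReportExporter exporter = new LegacyReportExporter();
        exporter.reviveButton.set(true);
        exporter.onTradeClosed("T1");
        check(exporter.exported.equals(Collections.singletonList("REPORT T1")), "expected one report");
    }

    // The one extra test: with the button untouched, nothing happens.
    static void doesNothingWhenButtonUntouched() {
        LegacyReportExporter exporter = new LegacyReportExporter();
        exporter.onTradeClosed("T1");
        check(exporter.exported.isEmpty(), "expected no reports");
    }

    private static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        exportsReportWhenRevived();
        doesNothingWhenButtonUntouched();
        System.out.println("all tests passed");
    }
}
```

The experimental feature case is the mirror image: the new behaviour tests switch the toggle on, and a single test checks that nothing happens while it is off.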

Really these are special cases of the more general rule: If you liked it then you should have put a test round it.

Reminder 1: Match the scope of your feature toggle with the scope of your tests

Some very simple corollaries:

  • Runtime feature toggles should have tests that exercise both states in a given test run
  • Compile time feature toggles are ok to just exercise the currently configured toggle value
  • Reader exercise: what about configuration level feature toggles?
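For the first corollary, one way to make sure a runtime toggle is exercised both ways in every test run is a small helper that runs the same acceptance check once per state and then restores the toggle. This is a minimal sketch assuming the toggle is an `AtomicBoolean`; `ToggleTesting.exerciseBothStates` and the `Greeter` feature are hypothetical names, not an existing library API.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

final class ToggleTesting {
    private ToggleTesting() {}

    // Runs the same check once with the toggle off and once on, within a single
    // test run, then restores the toggle's original value. Returns the number of runs.
    static int exerciseBothStates(AtomicBoolean toggle, Consumer<Boolean> check) {
        boolean original = toggle.get();
        int runs = 0;
        try {
            for (boolean state : new boolean[] {false, true}) {
                toggle.set(state);
                check.accept(state); // the check decides what "correct" means for this state
                runs++;
            }
        } finally {
            toggle.set(original); // leave the toggle as we found it for later tests
        }
        return runs;
    }
}

// Hypothetical feature with a runtime toggle.
class Greeter {
    final AtomicBoolean informalGreeting = new AtomicBoolean(false);

    String greet(String name) {
        return informalGreeting.get() ? "Hi " + name + "!" : "Hello, " + name + ".";
    }
}

class GreeterAcceptanceTest {
    public static void main(String[] args) {
        Greeter greeter = new Greeter();
        int runs = ToggleTesting.exerciseBothStates(greeter.informalGreeting, on -> {
            String expected = on ? "Hi Ada!" : "Hello, Ada.";
            String actual = greeter.greet("Ada");
            if (!actual.equals(expected)) {
                throw new AssertionError("toggle=" + on + ": expected " + expected + " but got " + actual);
            }
        });
        System.out.println("checked " + runs + " toggle states");
    }
}
```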

Reminder 2: Feature toggles aren’t free, and runtime toggles are particularly not free

Runtime feature toggles are (at least) twice as expensive to maintain as compile time constants. The payoff might be that you don’t need a release cycle to change them, but you’re paying for it every time you have to run both sets of tests, or change the code underneath that toggle.

Corollary: Feature toggles that don’t change should be removed. In general, once a feature is proven, removing the toggle should be a no-brainer. The answer to the original question should really be “How has it survived for so long?”.

Reminder 3: Martin Fowler realised all this years ago

Corollary: There’s nothing new under the sun (but at least this blog is reasonably titled).