The Evil Of ToString

To the tune of Iron Maiden’s The Evil That Men Do.

I keep coming across this (anti)pattern, and this time it annoyed me enough that I wanted to write in more detail about it.

Object’s contract

Let’s kick off by examining the documentation of Object’s default toString(). N.B. The docs on Object are actually very good; I recommend reading the whole lot while you’re there.

    /**
     * Returns a string representation of the object. In general, the 
     * <code>toString</code> method returns a string that 
     * "textually represents" this object. The result should 
     * be a concise but informative representation that is easy for a 
     * person to read.
     * It is recommended that all subclasses override this method.
     * 
     * The <code>toString</code> method for class <code>Object</code> 
     * returns a string consisting of the name of the class of which the 
     * object is an instance, the at-sign character `<code>@</code>', and 
     * the unsigned hexadecimal representation of the hash code of the 
     * object. In other words, this method returns a string equal to the 
     * value of:
     *
     * getClass().getName() + '@' + Integer.toHexString(hashCode())
     *
     * @return  a string representation of the object.
     */
    public String toString() {
        return getClass().getName() + "@" + Integer.toHexString(hashCode());
    }

So, a few interesting points to mention in there. First off:

“Returns a string representation of the object.”

This is the contract of toString: you call it, and you get something that represents the object you’re calling it on. Debuggers make use of this by implicitly toStringing variables in the current scope, and it’s very helpful. Presumably that’s why we then get the later advice: “It is recommended that all subclasses override this method.”
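As a minimal sketch of the kind of override the docs seem to have in mind, here is a small value class whose toString exists purely to help a person reading a debugger or a log. The class HeartbeatToken and its fields are hypothetical, invented for this illustration.

```java
// Hypothetical value class, used only to illustrate a debug-friendly toString.
final class HeartbeatToken {
    private final String nodeName;
    private final long sequence;

    HeartbeatToken(String nodeName, long sequence) {
        this.nodeName = nodeName;
        this.sequence = sequence;
    }

    // Concise and human-readable, intended for debuggers and log lines only.
    // Nothing in the program should parse or depend on this format.
    @Override
    public String toString() {
        return "HeartbeatToken{nodeName='" + nodeName + "', sequence=" + sequence + "}";
    }
}
```

Without the override, the same object would show up as the class name followed by '@' and a hex hash code, which is rarely what you want to see in a watch window.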

An abuse of toString

Now you know about toString, let’s examine a particular abuse of it.

    final String requestBody =
          new JsonMessageBuilder("heartbeat")
            .addValue(NodeName.TOKEN, "PC Nick Rowan").toString();

Here, toString is used in the StringBuilder/StringBuffer style; i.e. toString is a synonym for build. Two (rhetorical) questions:

  • Does that fulfil Object’s contract?
  • Is there a name that works better?

Or, both questions combined: Why isn’t that method called build?
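To make the alternative concrete, here is a rough sketch of a builder with an explicitly named build method. The internals of the real JsonMessageBuilder aren’t shown in this post, so everything below (the fields, the plain String key standing in for NodeName.TOKEN, the lack of any escaping) is an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the JsonMessageBuilder above; not the real class.
final class JsonMessageBuilder {
    private final String type;
    private final Map<String, String> values = new LinkedHashMap<>();

    JsonMessageBuilder(String type) {
        this.type = type;
    }

    JsonMessageBuilder addValue(String name, String value) {
        values.put(name, value);
        return this;
    }

    // The business operation gets its own, searchable name.
    // Sketch only: no escaping of quotes or control characters.
    String build() {
        StringBuilder json = new StringBuilder("{\"type\":\"").append(type).append('"');
        for (Map.Entry<String, String> entry : values.entrySet()) {
            json.append(",\"").append(entry.getKey()).append("\":\"")
                .append(entry.getValue()).append('"');
        }
        return json.append('}').toString();
    }

    // toString goes back to being a debugging aid.
    @Override
    public String toString() {
        return "JsonMessageBuilder{type=" + type + ", values=" + values + "}";
    }
}
```

A ‘find usages’ on build now returns only the places where the message is wanted for real, and the debugger still gets something readable.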

Other problems

Instincts aside, it turns out that this overriding of toString also creates some difficulties across a codebase in general. It’s difficult to tell which call sites want the built String for a business reason and which want it for debug output. Plenty of libraries allow you to get into the bad habit of passing them an object that they implicitly toString (I’m looking at you, log4j).

Try a ‘find usages’ on an overridden toString method in your preferred IDE. Good luck working out which of those occurrences actually involve your object; particularly if the object in question implements one or more interfaces.
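Part of the reason the search is so unhelpful is that many calls to toString never appear in your source at all. The sketch below uses a hypothetical class, Chatty, that counts how often its toString runs; none of the four call sites mentions Chatty.toString by name.

```java
// Hypothetical class: counts how often its toString is invoked.
final class Chatty {
    int calls = 0;

    @Override
    public String toString() {
        calls++;
        return "chatty";
    }
}

final class ImplicitToStringDemo {
    static int countImplicitCalls() {
        Chatty chatty = new Chatty();
        Object viaObject = chatty;               // static type hides the override

        String a = "value: " + chatty;           // string concatenation
        String b = String.valueOf(viaObject);    // library call taking Object
        String c = String.format("%s", chatty);  // formatter
        StringBuilder d = new StringBuilder().append(viaObject);

        return chatty.calls;                     // one call per line above
    }
}
```

Every one of those lines is a legitimate usage of the override, and an IDE can only find them by also listing every other implicit toString in the codebase.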

Theoretical alternative

If this were any other method, we might be suspicious that a concrete override was at the heart of the design. How could we change Java to ameliorate this?

Well, the default implementation isn’t that useful. It just tells us the class name and the hashCode of our instance. We can probably lose this method completely; both pieces of information are available separately. We can discuss the problems with hashCode (more concrete overrides, odd name for default implementation) another day.

If we want to let the debugger know about a Stringy description of ourselves, why not be explicit, and use something like hamcrest’s SelfDescribing?
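Here is roughly what that explicit opt-in might look like. The two interfaces are cut-down stand-ins shaped like hamcrest’s SelfDescribing and Description, not the library’s own types, and PlainTextDescription and Heartbeat are hypothetical names made up for this sketch.

```java
// Minimal stand-ins modelled loosely on hamcrest's SelfDescribing and
// Description. These are NOT the library's types, just same-shaped sketches.
interface Description {
    Description appendText(String text);
}

interface SelfDescribing {
    void describeTo(Description description);
}

// Hypothetical Description that simply accumulates text.
final class PlainTextDescription implements Description {
    private final StringBuilder out = new StringBuilder();

    @Override
    public Description appendText(String text) {
        out.append(text);
        return this;
    }

    String value() {
        return out.toString();
    }
}

// Hypothetical domain class that explicitly opts in to being described.
final class Heartbeat implements SelfDescribing {
    private final String nodeName;

    Heartbeat(String nodeName) {
        this.nodeName = nodeName;
    }

    @Override
    public void describeTo(Description description) {
        description.appendText("heartbeat from ").appendText(nodeName);
    }
}
```

The appeal is that only classes that choose to implement the interface can be described, and a search for implementers of describeTo finds exactly those classes.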

Side note: the implementations of appendValue in Description allow passing in an arbitrary Object. Imagine my disgust.

Conclusion

Don’t override toString to provide business logic.

Corollary: alarms should sound as soon as you start writing tests for a class’s toString. Move that functionality to a method that better describes the behaviour you want.

This post was brought to you by the following reasonably reliable general principles

Investigative Gumption

Robert Heaton wrote recently and excellently on the importance of gumption in programming here.

This post really chimed with some problems I’ve recently experienced while debugging unfamiliar code. It became clear to me that it was a skill I’ve neglected.

Stop asking for help

The primary manifestation of the problem was asking for help too soon. Plenty of other coders were more familiar with the area in which the problem occurred – why not ask them for help? Well:

  • Being shown the answer doesn’t give you the same level of understanding as finding out for yourself
  • Never working through to a conclusion will atrophy your investigative skills
  • Finding the root cause without assistance feels great
  • Exhausting this particular skill in other nearby, better programmers will cause them to leave

…and who will you ask for help then, when you’re actually stuck?

Symptoms and a diagnosis

Caveats: These are only the ones I noticed. There may be more. Yours may be different.

  • Feeling incredulous that the software could misbehave in such a way
  • Spreading that incredulity to others through subjective retelling of the problem
  • Frustration at lack of ability to come up with a working theory of how the failure could occur

These are really three symptoms of a quieter underlying cause: fatigue. I’d exhausted my initial search tree of potential theories and I was out of gas. I’d run out of gumption.

Treatment

I’m making a conscious effort to try to get better at this, and I feel like I’ve been making progress. I don’t think I’ve increased in gumption, but perhaps I have increased my gumption efficiency. Here are some directions that I’ve attempted to follow:

  • Fight that feeling of incredulity. Software is (usually) deterministic. Cosmic rays are not the culprit.
  • Using the debugger is not a personal failure (this isn’t the TDD game!). Reproduce the problem with a debugger attached as soon as possible; it’ll often invalidate the wrong assumption that’s been getting in the way of your understanding.
  • If you can’t reproduce the problem, check your environment. Still can’t reproduce it? Check it again. Is it really the same as where the problem happened? Really really?
  • Trust your instincts – one investigation I failed to finish ended up being a ThreadLocal misunderstanding. I’d seen the usage of ThreadLocal hours earlier, winced, and then forgotten about it because I was trying to be a good scientist. The brain is good at remembering causes of past pain; trust it when it reacts negatively.
  • Know your limits. I can get a temporary gumption boost from a Snickers or those terrific peanut butter KitKats. Once that’s worn off, it’s time to stop. Either seek help, or go home and sleep on it.

Corollary: Making systems that are easy to debug lowers the fixed gumption tax incurred when an investigation is required. Once you discover the cause of a knotty issue and want to begin a fix, first spend some time unravelling any unnecessary complexity you found on your way there.

Hopefully the gumption invested in this blogpost will help others to more efficiently expend their gumption (That’s enough gumption – Ed).

Some reminders about feature toggles

Feature toggles feature occasionally where I work. Today they featured in a lunchtime discussion.

Dare you switch?

Let’s say a particular toggle has been ‘off’ for six weeks. How confident are you that you can turn it on again, safely?

Well, where does that confidence come from? For us, it comes from our acceptance test suite, i.e. are there tests that verify the externally visible behaviour in both the on and off state?

Toggle implementations

Compile time constants

We weren’t too worried by these – each time we rebuild, the tests are rerun – and we only need to run the new or old behaviour test based on the toggle’s value.
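For illustration, a compile time toggle can be as small as the sketch below. The names Toggles, NEW_PRICING_ENABLED and PriceCalculator, and the discount itself, are all hypothetical.

```java
// Hypothetical compile-time toggle: changing it means rebuilding,
// and rebuilding means the tests are rerun.
final class Toggles {
    static final boolean NEW_PRICING_ENABLED = false;
}

final class PriceCalculator {
    static long priceInPence(long basePence) {
        if (Toggles.NEW_PRICING_ENABLED) {
            return basePence * 90 / 100; // new behaviour: 10% discount
        }
        return basePence;                // old behaviour
    }
}
```

A test for this only needs to check whichever behaviour the constant currently selects, because the other branch cannot be reached without a new build.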

Runtime variables

These generally get implemented as a piece of mutable state that changes on receipt of a particular message (e.g. a JMX call). Here our test coverage was less convincing. For at least one case, only one of the toggle states had coverage. The developers who created the tests might have run them both ways, but in the automated build/test run, that wasn’t happening.
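A runtime toggle of this kind might look something like the following sketch. RuntimeToggle and Greeter are hypothetical names, and the set method stands in for whatever message handler (a JMX operation, say) flips the state in a real system.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical runtime toggle: mutable state flipped by an external message.
final class RuntimeToggle {
    private final AtomicBoolean enabled;

    RuntimeToggle(boolean initiallyEnabled) {
        this.enabled = new AtomicBoolean(initiallyEnabled);
    }

    // Assumed to be called by a message handler, e.g. a JMX operation.
    void set(boolean value) {
        enabled.set(value);
    }

    boolean isEnabled() {
        return enabled.get();
    }
}

// Hypothetical feature whose behaviour depends on the toggle at call time.
final class Greeter {
    private final RuntimeToggle friendlyGreeting;

    Greeter(RuntimeToggle friendlyGreeting) {
        this.friendlyGreeting = friendlyGreeting;
    }

    String greet(String name) {
        return friendlyGreeting.isEnabled() ? "Hello, " + name + "!" : name;
    }
}
```

Because the same running Greeter can take either branch, both branches are live in production, whatever the toggle happened to be set to when the tests last ran.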

Feature Toggle? These are more like kill switches (or zombie buttons)

The majority of the runtime toggles were actually kill switches. Either an experimental feature of uncertain outcome was being prototyped, and needed an emergency stop button, or a feature long thought dead was disabled, and a switch left to revive it when, inevitably, the user of the feature turned up post release.

In these cases, the right path appears to be having tests that assert the positive case works when the toggle is on/off appropriately. More explicitly:

  • For an experimental new feature, add new tests describing the behaviour, with one test that verifies nothing happens if the toggle is off
  • When removing a dead feature, keep tests describing old behaviour running by pushing the zombie button before running them. Add one extra test that verifies that nothing happens if the revive button is left untouched.
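The first of those two cases might be sketched like this, using plain methods in place of a real test framework. AuditTrail, its toggle and the check names are hypothetical; the point is only that both states get exercised in the same run.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical experimental feature guarded by a runtime toggle.
final class AuditTrail {
    private final AtomicBoolean featureEnabled;
    private final List<String> entries = new ArrayList<>();

    AuditTrail(AtomicBoolean featureEnabled) {
        this.featureEnabled = featureEnabled;
    }

    void record(String event) {
        if (!featureEnabled.get()) {
            return; // toggle off: nothing happens
        }
        entries.add(event);
    }

    List<String> entries() {
        return entries;
    }
}

final class AuditTrailChecks {
    // Describes the new behaviour with the toggle on.
    static boolean recordsEventsWhenOn() {
        AuditTrail trail = new AuditTrail(new AtomicBoolean(true));
        trail.record("login");
        return trail.entries().size() == 1 && trail.entries().get(0).equals("login");
    }

    // The one extra test: nothing happens if the toggle is off.
    static boolean recordsNothingWhenOff() {
        AuditTrail trail = new AuditTrail(new AtomicBoolean(false));
        trail.record("login");
        return trail.entries().isEmpty();
    }
}
```

The zombie button case is the mirror image: the existing tests push the button first, and the single extra test leaves it alone.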

Really these are special cases of the more general rule: If you liked it then you should have put a test round it.

Reminder 1: Match the scope of your feature toggle with the scope of your tests

Some very simple corollaries:

  • Runtime feature toggles should have tests that exercise both states in a given test run
  • Compile time feature toggles are ok to just exercise the currently configured toggle value
  • Reader exercise: what about configuration level feature toggles?

Reminder 2: Feature toggles aren’t free, and runtime toggles are particularly not free

Runtime feature toggles are (at least) twice as expensive to maintain as compile time constants. The payoff might be that you don’t need a release cycle to change them, but you’re paying for it every time you have to run both sets of tests, or change the code underneath that toggle.

Corollary: Feature toggles that don’t change should be removed. In general, once a feature is proven, removing the toggle should be a no brainer. The answer to the original question should really be “How has it survived for so long?”.

Reminder 3: Martin Fowler realised all this years ago

Corollary: There’s nothing new under the sun (but at least this blog is reasonably titled).

ACCU 2011 Nuggets: 2. Move semantics

Or: “This is C++: everything is a lie”

This nugget is from Scott Meyers’s talk on Perfect Forwarding, slides here.

I should point out that I’m pretty late to the scene here; the original proposal is more than four years old, and far sharper people closer to the source have already written concisely and exhaustively on this topic.

Essentially, the problem that needed solving is the following:

  vector<string> split(string const & to_be_split, string const & splitter) { 
    vector<string> result;
    // how can we return a vector here without lots of copying
    // or an annoying out parameter polluting our method signature?
    // or by having to bind the result to a const &?
    return result;
  }

N.B. The original version of this post involved changing the return type of split to be an rvalue reference. That is somewhat out of date, and also very dangerous! In fact, if you’re using an up to date compiler and you implement split with the signature above, returning by value, the result is moved out using vector’s move constructor (or the copy is elided altogether), so no copying occurs.

  // Here, splitByComma would be move constructed from the result of
  // split. Even in older C++ versions, you still might get lucky and find
  // the copy is elided by return value optimization
  vector<string> const splitByComma(split(string("foo,bar,baz"), string(",")));

Further (and better) motivations and examples of move semantics are detailed here (I highly recommend reading some of this before progressing).

So, we want to write code that will take advantage of this new move support in C++11. In order to do this, we need an understanding of what is safely movable from! It turns out that C++ already has concepts within it to allow us to reason about this without completely losing our sanity: lvalues, rvalues, and lvalue references.

Scott provides the following handy guide for identifying whether a particular variable is an lvalue, an lvalue reference or an rvalue.

lvalues: things you can take the address of

  • string const foo("asdf") // foo is an lvalue
  • int i, *pInt // all of i, pInt and *pInt are lvalues

lvalue references: what you think of as references now (both of our arguments to split from earlier)

rvalues: things you cannot take the address of (unnamed temporaries)

  • split("foo, bar, baz", ","); // the temporary strings built from the two literals are rvalues (the literals themselves are lvalues)
  • function returns (result from split)

In general:

  • You can’t move from an lvalue (other people can still use it – what happens when they do?). Never do this. OK, almost never (see alternative title).
  • You can move from an rvalue (no-one can get at it – you’re safe!). One might say you should move from an rvalue, unless the cost of copying is irrelevant.

These rules are handy, but there’s one more major gotcha to go. Let’s assume that someone has already made std::string compatible with move semantics (so it has a move constructor: std::string(std::string &&)).

  class StringMoverPuzzle {
  public:
    StringMoverPuzzle(string && to_be_moved) :
      moved_string_(to_be_moved) // do we copy or move here?
    {}
  private:
    string const moved_string_;
  };

The answer is annoying – in fact, the copy constructor is called! Outrage! We labelled our parameter with &&, it should be an rvalue, damnit! Look further though, it has a name (to_be_moved), and, if we were in the body of the constructor, we could take its address. Shock, horror, it’s an lvalue!

So when we try to construct our moved_string_, the compiler looks for a constructor that takes an lvalue rather than the rvalue we were hoping for, and we end up copying (curses!).

The solution?

    FixedStringMover(string && to_be_moved) :
      moved_string_(std::move(to_be_moved))
    {}

That handy call to std::move is actually just a cast – we’re casting away the lvalueness of to_be_moved so that the correct string constructor is used.

This isn’t anything like the end of the move story (and I haven’t got anywhere near ‘perfect’ forwarding), but I thought the mind break of a named rvalue reference being an lvalue was worth a post of its own.

For further material, I highly recommend Scott’s slides, or the video of the session here.

ACCU 2011 Nuggets: 1. Hide and seek

ACCU 2011 Nuggets: short distillations of ideas presented at the 2011 ACCU conference.

My first nugget is from Giles Colborne’s talk on simplicity (slides here).

Giles categorised the types of usability improvements you could make into four species: Reduce, Reorganise, Hide, and Displace.

I want to discuss the hiding case, then share Giles’s illustrative example of how best to execute this.

Hide!

We need to make two good choices in order for hiding to work.

  1. What features should we hide?
  2. Having hidden them, how does our user access these features?

Well (you say), you could hide the less used features, then have some sort of ‘reveal’ button to allow access again. You could even adaptively work out which features to hide over the first week or so of real usage. In extreme cases, you could even talk to your users to find out what they might want.

It turns out that it’s very hard to get this approach right. Colborne cited various examples where this was very badly done: the menus in Microsoft Office (or your Start Menu) being a prime suspect. Think of the video remote control with the button you always need hidden behind the protective “expert” cover (hat tip Ewan Milne for that example).

Is there a better way?

Colborne regaled us with the tale of a time when he was browsing the New York Times website, and came across a word he didn’t know: “Bodega”. Naturally, he highlighted the word, ready for the standard copy-paste into Google, but before he could navigate away, a clickable “?” appeared above the word, successfully tempting him to click it for a definition, rather than looking for it elsewhere.

You can try this feature on any article on the NYT site (try this one?), and while it isn’t perfect (it opens a new window, for shame), it’s a great little example of how to take advantage of habitual user behaviour to reveal features at the exact moment when they’re needed.

Unfortunately this then leads to the perhaps more difficult question of figuring out what the habits of your users are (if they exist!) – I refuse to ruin any hopes I have of keeping this short by discussing this here, so this is left as an exercise for the reader.

Is there a good name for this idea?

I really don’t want to have to call it user-habits-as-unhide-hints, which is the best I have so far. Further examples of this sort of technique would be good, too!