Monthly Archives: June 2013

Haskell’s benevolent influence

without even a mention of the ‘M’ word

One of the things that attracted me to SPA 2013 was the Neural Net workshop on the Sunday. It turned out to be even better than I hoped; although we didn’t get quite as far as we could have done.

The first thing we wanted to do was implement a Neuron. In this particular case, a neuron took two inputs (let’s say they are doubles, for the time being) and then performs a computation on those inputs to determine a single output.

In this case, our neuron’s computation was as follows:

output = inputOne * weightOne + inputTwo * weightTwo > threshold ? 1.0 : 0.0

My pair and I quickly put together an implementation of this in Java that looked a bit like this:

package net.digihippo;

public class NeuronImpl implements Neuron {
    private final double threshold;
    private final double[] weights;

    public NeuronImpl(double threshold, double[] weights) {
        this.threshold = threshold;
        this.weights = weights;
    }

    @Override
    public double run(double[] inputs) {
        double sum = 0.0;
        for (int i = 0; i < weights.length; ++i) {
            sum += (weights[i] * inputs[i]);
        }
        return sum > threshold ? 1.0 : 0.0;
    }
}

You could obviously implement this as a static function, but suitable hints were given that the threshold and weights were properties of a given neuron, and so fields it was.

Many connected neurons – a neural network

To cut a long story short; we eventually want to plumb several neurons together to form a network. A single neuron is limited in the functions it can express, for reasons I won’t explore in this post; networks of them do not suffer in the same way.

Handily, once the weight and threshold of a given neuron are fields, the call to a neuron matches that to a network of neurons. But we needed a way to plumb a tree of neurons together, like so:

inputOne ->|          |
           | neuron 1 |_  
inputTwo ->|          | \_|          |
                          | neuron 3 |---> output
inputOne ->|          |  _|          |
           | neuron 2 |_/ 
inputTwo ->|          |

N.B Each neuron on the far left of the network is passed the same pair of inputs.

We can see that each Neuron either has precisely two ancestors or none at all (proof by terrible ASCII picture + some induction). A light, well, actually it was more a series of small, excited explosions, went off in my head. Now I just needed to remember that Node a Tree a Tree a magic and I would be laughing. My pair was working in Java, however, and somewhere between the magic and the .java files being edited in Eclipse, something wasn’t working. I needed to get it onto the screen before I could translate it across. Ten minutes or so later, I had this Haskell version of Neuron (which incorporates the concept of a network):

data Neuron = Neuron Double Double Double
            | NeuralNetwork Neuron Neuron Neuron
              deriving (Show)

runNet :: Double -> Double -> Neuron -> Double
runNet i1 i2 (Neuron t w1 w2) = bool2doub $ (i1 * w1) + (i2 * w2) > t
runNet i1 i2 (NeuralNetwork n n1 n2) = runNet n (runNet i1 i2 n1) (runNet i1 i2 n2)

bool2doub :: Bool -> Double
bool2doub True = 1.0
bool2doub False = 0.0

Side note: Feedback on any Haskell naïveté within this post would be greatly appreciated!

As you can see we have a neat, recursively defined, data type, and we defer any actual neuron run until we reach a “leaf” (which might have been a better name for the first constructor) at the far left of the ASCII diagram from before. It should be possible to translate this to Java without too much difficulty; we can map the data type Neuron to an interface, and the two constructors to two implementations of that interface.

Compiling Haskell to Java via human.

The sharper eyed readers will have noticed that we were on the way to doing this already in the Java source; NeuronImpl implements Neuron. An accidental bit of foreshadowing there, apologies. So, our NeuronImpl doesn’t have to change, but what does NeuralNetwork look like?

N.B Apologies for the use of above and below – they are extremely poor names for the two parent neurons.

package net.digihippo;

public class NeuralNetwork implements Neuron {
    private final Neuron neuron;
    private final Neuron above;
    private final Neuron below;

    public NeuralNetwork(Neuron neuron, Neuron above, Neuron below) {
        this.neuron = neuron;
        this.above = above;
        this.below = below;
    }

    @Override
    public double run(double[] inputs) {
        return neuron.run(new double[]{above.run(inputs), below.run(inputs)});
    }
}

Remarkably similar, it turns out! The Haskell version is somewhat terser, but the idea appears to have survived translation. It’s quite pleasing to know that lessons from Haskell can (albeit in this simple example) be applied in less fun languages. One wonders if there are places where we cannot translate Haskell data types and their constructors into interfaces and their implementations.

A new requirement arrives

Two inputs? Bah! We laugh in the face of two inputs. We want to be able to deal with n inputs. N.B This means that each level of the network has n times more neurons than the next. Perhaps this might be where our Haskell -> Java translation starts to creak? Let’s add the idea of multiple inputs to our Haskell implementation.

data Neuron = Neuron Double [Double]
            | NeuralNetwork Neuron [Neuron]
              deriving (Show)

runNet :: [Double] -> Neuron -> Double
runNet inputs (Neuron t weights) = bool2doub $ (sum (zipWith (*) inputs weights)) > t
runNet inputs (NeuralNetwork n ns) = runNet (map (runNet inputs) ns) n

Nice and simple – if anything, the variable input version is actually simpler than the fixed, two input version. The use of sum (zipWith ... is particularly pleasing, to my eye. Let’s run it through the mangler and see what we get out in Java…

package net.digihippo;

public class NeuralNetwork implements Neuron {
    private final Neuron neuron;
    private final Neuron[] parents;

    public NeuralNetwork(Neuron neuron, Neuron[] parents) {
        this.neuron = neuron;
        this.parents = parents;
    }

    @Override
    public double run(double[] inputs) {
        final double[] results = new double[inputs.length];
        for (int i = 0; i < inputs.length; ++i) results[i] = parents[i].run(inputs);
        return neuron.run(results);
    }
}

Not bad

Conclusion

Translating a well meaning Haskell implementation into Java:

  • can work
  • might even be a good idea.
  • involves writing some Haskell, and is therefore good for you.

A horrible warning

There is recursion here. Mind how you go.

A laziness trick to relax your code

So, this week on Distilled Derivatives, some more laziness. And also, exceptions. Everyone loves them, really useful things for letting you move error handling into a single location that they are. Now and again though, their misuse can really get in the way of getting something done.

Retrieve the content of the first functioning url

Given n urls in some ordered collection, get the content of the first “working” url, where working is defined to mean “has http code 200 and non empty response content”. Once you’ve found it, put the response body of the url into some local file.

This should be easy, thought I. How wrong I was.

First problem

All the java http libraries throw IOException. Ok, that’s not too bad, we can get past that with only a little evil. Here’s our first go.

  def withUrlConnection() : String = {
    for (url <- urls) {
      try
      {
        val actualUrl = new URL(url)
        val connection = actualUrl.openConnection().asInstanceOf[HttpURLConnection]
        return stringFromHttpUrlConnection(connection)
      }
      catch {
        case e: IOException => {}
      }
    }
    throw new RuntimeException("all failed")
  }

This isn’t too bad – we’re only teetering on the edge of breaking the “don’t use exception for control flow” rules, rather than falling into the abyss. We can simply take the resulting String and write it to a file (implementation elided). Oh, and in case you were curious, here’s the implementation of stringFromHttpURLConnection (we’ll need it again later).

N.B I’ve completely elided any closing of any connection everywhere in this article, because like the technique I’m trying to describe, I am lazy.

  val stringFromHttpUrlConnection: (HttpURLConnection) => String = { connection : HttpURLConnection =>
    Source.fromInputStream(connection.getInputStream).getLines().mkString("\n")
  }

The problem worsened, however, when some kind soul tried to help out by creating a function like the following:

Problem 2 – an unhelpful helper

  def withUnhelpfulFunction(path: String) {
    var done = false
    for (url <- urls) {
      try {
        if (!done) {
          downloadFile(url, path)
          done = true
        }
      } catch {
        case e: IOException => {}
      }
    }
  }

  def downloadFile(url: String, path: String) {
    // lobs url response body into path on success, IOException if not. Implementation too boring to include.
  }

At first glance this function looks like it might save us some code, but once you start talking about failure or multiple urls, we’re screwed. It’d be perfect if we wanted a one shot download that noisily failed, and that’s probably the use case the original author had in mind. In general the approach the function takes is a bit evil – it’s performing IO and letting us know we succeeded by not throwing an exception. That’s not the world’s greatest signal.

So, two quite ugly solutions with exceptions. Is there another way?

My hobby: getting lazy evaluation to write my program for me

Functional programming has given us some excellent tools to help decompose this problem, and there’s a neat lazy evaluation trick that can get us out of jail here. What do we need? A function that will:

  • synchronously resolve a url to the response body
  • include the possibility of failure in its return type

Roughly, this is equivalent to URL -> Maybe String (or Option[String], in Scala land). Try as I might, I can not find a scala or java http client library that follows this approach. Dispatch looks terrific but is also asynchronous, and I don’t really want to talk about futures in this post. Handily, numerous other sandbox projects I’ve tinkered with have already forced me into writing my own, so I have a handy function lying around.

  def withHttpLib() : String = {
    val requestor: HttpRequestor = new InsecureHttpRequestor()
    val results = urls.map(url => requestor.performGet(url, stringFromHttpUrlConnection))
    results.find(_.wasSuccessful()).getOrElse(throw new RuntimeException("all failed")).getResult
  }

Look ma, no for or if! Even better, this approach appears much shorter, however, this is mostly because much of the responsibility has moved into the little http library. Let’s walk through the execution of this program.

A little secret

  val urls = Stream("www.digihippo.net/home.html",
                    "www.digihippo.net/index.php",
                    "www.digihippo.net/foo.bar")

The structure the urls are sitting in is actually Scala’s excellent Stream abstraction. By default, operations on Streams are applied lazily. For example, if we invoked map on a particular instance of Stream, execution of the function we pass in is deferred until someone tries to extract an element from the Stream. In our earlier examples, we use Scala’s for comprehension to iterate across urls. In our favourite implementation, you can see that we’ve started applying map to it instead.

N.B These won’t actually work if you do new URL(url) on them. They’re in the current format because my library doesn’t yet work out whether to connect securely or not from the passed in url String.

What’s this http library doing then?

    val requestor: HttpRequestor = new InsecureHttpRequestor()

The requestor here is an object from my http library. It’s insecure because it doesn’t deal with https (it doesn’t lack confidence), just so you know. To execute a given request, one provides a url, and a response decoder – the requestor returns you a type parametrized Http object, which may contain the thing you wanted to decode, or an error. What’s happening underneath? Well, something like this:

  • Attempt to get url
  • On failure, return Http.failure() with some helpful details
  • On success, attempt to parse using provided response decoder
  • If response decoder succeeds, return whatever it decoded
  • If it failed, return Http.failure() with some helpful details

Hold on though – why does our decoder get the opportunity to return any sort of Http? Why doesn’t it just attempt a parse and give up? Well, we have to really think about what makes any given http request a failure. The definition in this post is not necessary universal – sometimes we expect a 404, or a 500, and still wish to parse something from the resulting connection.

As such, at the moment, the decoder can examine any aspect of the created HttpURLConnection and decide for itself, if it wants to, that this request was a failure. Our decoder does no such thing though – it just attempts to grab the response body and whack it into a string. The library will helpfully catch any exception resulting from playing with HttpURLConnection and return you an unsuccessful Http instance with the error information stashed inside.

What is this line really doing?

    val results = urls.map(url => requestor.performGet(url, stringFromHttpUrlConnection))

So, hands up – who thinks we’re firing off http requests as we invoke map here? No-one? Good. Glad to see people are keeping up. What we’re really doing here is creating a new Stream, which at the moment is actually a collection of functions that, when called, will return an Http[String]. So this line actually has no side effects, we’re really just defining how to compute the next elements of results, rather than computing them.

This is where the magic happens

    results.find(_.wasSuccessful()).getOrElse(throw new RuntimeException("all failed")).getResult

Now, the density of this line is considerably higher than the others. If that’s a problem for you, you might be a n00b. So, what the hell is happening?

Well, that find invocation would be better named findFirst. It traverses results, examining each element to see if it was a success. That means it needs to compute the element! So, find actually does this:

  • Execute function to obtain element. N.B this finally executes our map function
  • Examine result to see if it was successful
  • If not, repeat with next element

So, in fact, just by defining results in the way that we have means that find expresses the control flow of our application for us. One could legitimately criticize the first two solutions for repeating this standard library code, and, in fact, functional languages go out of their way to make extracting this sort of legwork easier. If you’re wondering why this is important, measure how many times you write try/catch/finally in java (or similar) next week.

The remainder of the line does the necessarily evil to fulfil something like the original contract, where we wanted a String that we could later write out to a file. We need to extract the actual value from the find result (find actually gives us back Option[Http[String]], because there’s no guarantee any of our results matched), throwing if none were successful, and then we need to extract the String out of the found successful request.

An apology

Purist functional programmers will likely be groaning at this point. The side effects of our little getter don’t happen at the top of our program, necessarily, and the usage of Option (and the Http type, which is pretty similar to Option) is not particularly idiomatic (hitting get like functions on monadic types is frowned upon).

Some side notes

Interestingly this very same function was implemented as part of a bash build script elsewhere, and it embodies a lot of the lessons that we’ve seen in the code here. Here’s a cut down equivalent.

#!/bin/bash

function noisy_curl() {
    echo "issuing request to get $1"
    return `curl -s -f $1`
}

noisy_curl http://www.digihippo.net/main.html ||
noisy_curl http://www.digihippo.net/index.php ||
noisy_curl http://www.digihippo.net/site.html

This is, given how much effort we’ve gone to set up the same effect in Scala, embarrassingly trivial. One can start to see the elegance of the unix way of doing one simple task, and writing the result to stdout, while also indicating success with a return code. Here we use the -f flag to tell curl to ‘fail’ if we get a suitably erroneous looking response code, and then just set up a series of curl invocations such that the first success short circuits the others. One starts to agree with all those people on the internet who think UNIX pipelining looks a helluva lot like composition as seen in functional programs.

We’ve come a long way baby.

So let’s stop (for now). We could certainly expand on our example by doing something more interesting that putting that result String into a file. We could consider approaches where we never parse the whole String but instead hold on to the InputStream instead. These deserve to be posts in their own right though, so we will not discuss them further here.

What did we learn?

  • Lazy evaluation allows you to elide boring control flow from your code. More generally, a functional approach can help you separate the valuable parts of your code from arduous syntactic toil
  • Wrapping exceptions into return types at the point where they are thrown makes life so much easier
  • Communicating errors in exceptions is really only good for one shot calling patterns

…and, again, more generally, exceptions are vastly overused. How often per day do you get a broken web page when browsing? Is it really exceptional? Are browsers dealing with 404 or 500 via an exception path? I seriously hope not. Exception abuse seems particularly prevalent in java, perhaps because so many libraries use IOException as their IO alarm, and possibly also because the language makes it so difficult to work with something like the Http type. Once again, I find myself thinking that Java 8 really cannot arrive soon enough.

Moral of the story

Stop throwing all that stuff around and relax, dude.