Monthly Archives: September 2013

The Classpath

Before we begin – I really feel like “The Class Path” ought to be the name of a TNG episode. Perhaps something to suggest to the TNG_S8 twitter account.

Anyway…

There comes a point in every java program’s lifetime where it must answer one of life’s fundamental questions: how am I going to talk to my config file (or, indeed, any file-like resource that’s not another class)?

As good test driven developers, we’ll usually have a short digression about the definition of unit testing before agreeing that there should be at least one test that reads from a real file in the same way that the production code will, and then start to write that test.

I have noticed a prevalence of projects that choose to use the src/test/resources area of a project in order to get the configuration file onto the classpath. The production code then reads that file from the root of the classpath and everything is fine. The discussion of whether this is the right thing to do appears to have happened long ago, unbeknownst to me, and that is a little problematic, because I am not convinced it is a good idea. I would much prefer it if most resources of this sort could live on the file system.

Why?

We’ll get to that shortly (if you must know the answer now, skip ahead to “Why is this bad?”). First, though:

A little history

Perhaps readers have come across the concept of “maven layout”. This is a convention for arranging the source code in a project so that maven can easily understand and build it. Maven layout has become standard even in projects not built by maven, and that is, for the most part, no bad thing; in particular, I like that it allows for multiple languages within the same src tree.

One unfortunate introduction that maven makes, however, is the inclusion of src/main|test/resources. This is an area in which we can drop resources and expect them to turn up on the classpath without really thinking about it. As such, there is temptation to avoid answering the question of “where should we store this file” when there’s an easy get out of jail answer right in front of us.

This is fine right until the point where we stop using maven: resources in that directory no longer get bundled on to the classpath at the desired time, and a cacophony of alarm bells sounds when we try to run our program or its tests.

An easy way out

When in maven layout, do as maven does: try to follow the classpath and assembly rules that maven applies to that filesystem structure accordingly. That’s not the point I want to make, however (and why use !maven if you’re going to make your specific !maven exactly like maven?), so forget I said anything.

Why is this bad?

..and why is the filesystem a better solution?

#1 Bad diagnostics

This is probably the one that sways me most. If we mis-specify a file path in our program, figuring out where it was supposed to be is as simple as logging the result of invoking getAbsolutePath on the java.io.File object that’s not finding bits where it should be.

What about loading a resource from the classpath? Well, a standard call to getResource will return us a java.net.URL. That sounds, initially, as if it might be quite useful. How about when the resource isn’t found, though? Well, in that case, we get back null. That is considerably worse than the file case; we have no idea where the classloader looked, and therefore no idea how to fix the problem.
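To make the contrast concrete, here is a minimal sketch of the two failure modes. The class name, method names and the file name are made up for illustration; the only library calls used are File.getAbsolutePath and ClassLoader.getResource.

```java
import java.io.File;
import java.net.URL;

final class ResourceDiagnostics {
    // Describes a file lookup, including the absolute path that was actually tried.
    static String describeFile(String path) {
        File file = new File(path);
        return (file.exists() ? "found " : "missing ") + file.getAbsolutePath();
    }

    // Describes a classpath lookup. On a miss, getResource hands back null,
    // so the resource name is the only thing left to report.
    static String describeClasspathResource(String name) {
        URL url = ResourceDiagnostics.class.getClassLoader().getResource(name);
        return url == null
                ? "missing " + name + " (no hint of where the classloader looked)"
                : "found " + url;
    }

    public static void main(String[] args) {
        // Hypothetical file name, assumed not to exist anywhere.
        System.out.println(describeFile("no-such-config.txt"));
        System.out.println(describeClasspathResource("no-such-config.txt"));
    }
}
```

The first message tells us exactly which directory to go and look in; the second can only repeat the name we already knew.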

If we’re lucky, we can get an idea of what the classpath is by looking in a debugger (sun.misc.URLClassPath is relatively easy to figure out). If we’re in a more complicated world with custom classloaders, however, this gets far more difficult.

[
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/jce.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/rhino.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/charsets.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/resources.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/jsse.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/compilefontconfig.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/management-agent.jar,
file:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/rt.jar,
file:/usr/lib/jvm/java-6-openjdk-amd64/jre/lib/javazic.jar,
file:/usr/share/java/java-atk-wrapper.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/ext/localedata.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/ext/sunpkcs11.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/ext/pulse-java.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/ext/dnsns.jar,
file:/usr/lib/jvm/java-6-openjdk-common/jre/lib/ext/sunjce_provider.jar,
file:/home/james/blogs/the_class_path/out/production/the_class_path/,
file:/home/james/idea-IC-123.94/lib/idea_rt.jar
]

A sample collection of paths from a URLClassLoader.

#2 Ambiguity

So, let’s now imagine that we’re successfully running our program, and we want to know which config file is in use. We work out the list of places on the classpath, and find that config.txt is in three of them. Which one is in use? Yes, usually it is the one which is found first (just as it would be for an actual object file), but the order of finding is an implementation detail of a specific classloader, not a guarantee. With this ambiguity comes the dubious ability to override configuration by racing to put config.txt in the first listed place on the classpath.
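If we suspect this is happening, one way to see every candidate is ClassLoader.getResources, which returns all matches rather than just the winner. A small sketch follows; the class and method names are invented here, and config.txt is simply the name used in the example above.

```java
import java.io.IOException;
import java.net.URL;
import java.util.Collections;
import java.util.List;

final class ResourceCandidates {
    // Lists every location from which the given classloader can see the named
    // resource, in whatever order that particular loader chooses to report them.
    static List<URL> allMatches(ClassLoader loader, String name) {
        try {
            return Collections.list(loader.getResources(name));
        } catch (IOException e) {
            throw new IllegalStateException("Could not enumerate " + name, e);
        }
    }

    public static void main(String[] args) {
        ClassLoader loader = ResourceCandidates.class.getClassLoader();
        for (URL url : allMatches(loader, "config.txt")) {
            System.out.println(url);
        }
    }
}
```

More than one line of output means we are relying on ordering to pick our configuration.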

With a file – well, there is a possible ambiguity in that if we specify a relative path to a file, our lookup may be affected by the current working directory of the java process we’re in. If this error does occur though, we can trivially create a diagnostic message that tells us the absolute expected path of the resource we are seeking, and amend appropriately.

So what is the classpath for, then?

The classpath is a path, or collection of paths, where compiled java object files are found, just as LD_LIBRARY_PATH is a collection of places for the linker to look for shared objects. Wikipedia’s definition provides some more detail – and also avoids even the slightest hint of loading anything other than classes from it, to my delight.

Exceptions to the rule

Some java deployment platforms take the file system option off the table. Is this an excuse to start bundling non class resources on to the classpath? No.

  • The best ones provide better APIs for storage/retrieval of non-class resources.
  • A file is really only one type of URL. We could try loading configuration from a different protocol, like http.
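As a rough illustration of the second point, here is a sketch that reads properties from any location expressed as a URL string. The class name and the choice of java.util.Properties as the config format are assumptions made for the example, not something the platforms above prescribe.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.util.Properties;

final class UrlConfigLoader {
    // Loads key=value configuration from a URL; the failure message names
    // the exact location we tried, whatever the protocol.
    static Properties load(String location) {
        Properties properties = new Properties();
        try (InputStream in = URI.create(location).toURL().openStream()) {
            properties.load(in);
        } catch (IOException e) {
            throw new IllegalStateException("Could not read configuration from " + location, e);
        }
        return properties;
    }
}
```

The same call should accept a file: location on a developer machine and an http: one in a hosted deployment, and in both cases the location appears in full when something goes wrong.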

Conclusion, and tl;dr for the impatient

The class path is for classes. While it does provide enough scaffolding to create a general purpose resource loading solution, using it as such is error prone and unclear to both users and maintainers.

Snappier Conclusions

  • The clue is in the name.
  • The classpath. A path for classes.
  • This ‘filesystem’ concept. Might it be useful for storing files?

Close Quietly (with no exception)

A post about closing resources in Java. First though, an aside.

“Is it a good idea for destructors to throw?”

— Common C++ interview question

A quick bit of code can show us.

#include <iostream>
#include <stdexcept>

namespace {
  class Foo {
  public:
    void someUsage() {
      std::cout << "Using foo" << std::endl;
    }
    ~Foo() {
      std::cout << "Destroying foo" << std::endl;
      throw std::runtime_error("Exception 2");
    }
  };
}

int main() {
  std::cout << "Program started" << std::endl;
  try {
    Foo f;
    f.someUsage();
    throw std::runtime_error("Exception 1");
  } catch (std::exception const & e) {
    std::cout << "Caught exception!" << std::endl;
  }
  return 0;
}

Consider the above case where an object with a throwing destructor is popped from the stack. We naïvely hope our catch block will save us, but no, an exception has already been thrown from another frame. We now have two exceptions to deal with and our program calls std::terminate (thanks to Alan for a correction here – I originally claimed this was UB).

Program started
Using foo
Destroying foo
terminate called after throwing an instance of 'std::runtime_error'
  what():  Exception 2
Aborted (core dumped)

This StackOverflow thread provides an excellent discussion of this issue from a C++ perspective. Many of the answers and comments gel nicely with the remaining content of this post, so it is well worth a read (but only once you’re done here, obviously).

What of our safe, garbage collected languages, though? What approach do we take to closing resources not managed by the VM?

If at first you don’t succeed…

try, catch, finally. This is the typical resource usage pattern in java – we acquire a resource (outside of the block – if we fail to acquire it, no close is required), use it in the try block, perform any desired exception handling in the catch block, and then clean up (usually invoking the close method on the object in question) in the finally block, which is (roughly) guaranteed to be called, regardless of what happens during try and catch.

Frequently the close method is marked as throwing a checked exception. This is annoying, as it means that even though we’ve performed some resource cleanup, our finally clause must also handle that exception type (or our function must propagate it upwards). Typically, we get around this by writing a closeQuietly method (or using one from an appropriate apache library) that catches the exception, suppresses it, then perhaps logs that a resource has failed to close.
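For anyone who hasn’t met one, a hand-rolled closeQuietly might look something like the sketch below. The class name, the boolean return value and the use of System.err as a “logger” are choices made for this example; library versions of the same idea differ in the details.

```java
import java.io.Closeable;
import java.io.IOException;

final class QuietCloser {
    // Closes the resource, swallowing any IOException after reporting it.
    // Returns true if the close succeeded or there was nothing to close.
    static boolean closeQuietly(Closeable resource) {
        if (resource == null) {
            return true;
        }
        try {
            resource.close();
            return true;
        } catch (IOException e) {
            // Stand-in for whatever logging the application really uses.
            System.err.println("Failed to close " + resource + ": " + e);
            return false;
        }
    }
}
```

A finally block can then call QuietCloser.closeQuietly(stream) without needing a try/catch of its own.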

This is absolutely fine for single resource usages – like reading all the data out of a file, or performing an http request to a remote server.

A more complicated world

Why’d you have to go and make things so complicated?

— Avril Lavigne

Commonly we will want to write applications that keep several resources of different species open for the lifetime of our application (or at least, considerably longer than a single function call). Perhaps a collection of sockets, some shared memory accessed through JNI and a sprinkling of semaphores.

What happens when we want to cleanly stop our application? These different resources will presumably be looked after by different pieces of our object graph, so we will have to traverse that graph calling close where appropriate.

In a perverse world, our code might look like this:

    public static void main(String[] args) {
        final SharedMemory memory = new SharedMemory();
        final CustomSocket customSocket = new CustomSocket();
        try {
            runApplication(memory, customSocket);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        } finally {
            try {
                memory.close();
            } catch (SharedMemoryException e) {
                e.printStackTrace(System.err);
            }
            try {
                customSocket.close();
            } catch (SocketCloseException e) {
                e.printStackTrace(System.err);
            }
        }
    }

This may be a rather extreme example. Neither resource implements java.io.Closeable (which would make things easier as we could extract a single method), and both give feedback only in the form of a checked exception. I’ve left the try/catch blocks in place to illustrate just how annoying this is. How much worse this gets per different resource species is left as an exercise to the reader – one hopes it is obvious that this is another place that exception throwing fails to scale.

An alternative approach

We could mandate extension of Closeable for all such resources, but all this buys us is the ability to have just one static closeQuietly function. That would be a start. Can we do better though?

Groans from the crowd. Is this another anti-exception post? You bet it is. Let’s consider this alternative interface:

package net.digihippo;

import java.util.Collection;

public interface Closeable {
    public Collection<? extends Exception> close();
}

A couple of things to note here before we continue.

  • The interface isn’t actually that important – here’s the previous program with resources that follow this pattern without implementing the interface:
    public static void main(String[] args) {
        final SharedMemory memory = new SharedMemory();
        final CustomSocket customSocket = new CustomSocket();
        try {
            runApplication(memory, customSocket);
        } catch (Exception e) {
            e.printStackTrace(System.err);
        } finally {
            final List<Exception> closeProblems = new ArrayList<Exception>();
            closeProblems.addAll(memory.close());
            closeProblems.addAll(customSocket.close());
            for (final Exception e: closeProblems) {
                e.printStackTrace(System.err);
            }
        }
    }
  • Exception may not be the type that we want here. Once we have a return type, it’s possible that we just want a Collection of String messages that tell us why close has failed, rather than the considerably more expensive Exception. I chose it mostly because it was nearest, and possibly because having the stack could be useful for post-hoc rationalization.

These particular resources, in our example, share the program’s lifetime, and so the only customer for the close mechanism is the programmer trying to correctly order a whole bundle of unrelated shutdown and close calls. We will make their life a complete misery if we take the lazy option of failing to close by throwing – what on earth do we expect our shutdown writer to do with it?
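Seen from the resource side, the pattern might look like the sketch below. Everything in it is hypothetical: ReportingCloseable is just the interface above under a name that doesn’t clash with java.io.Closeable, and the two resources are toy stand-ins for things like SharedMemory and CustomSocket.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.Collections;
import java.util.List;

final class ShutdownSketch {
    // Same shape as the interface proposed above, renamed for this example.
    interface ReportingCloseable {
        Collection<? extends Exception> close();
    }

    // A toy resource whose close always succeeds, so it has nothing to report.
    static final class HealthyResource implements ReportingCloseable {
        public Collection<? extends Exception> close() {
            return Collections.<Exception>emptyList();
        }
    }

    // A toy resource whose close fails, and says so through its return value.
    static final class BrokenResource implements ReportingCloseable {
        public Collection<? extends Exception> close() {
            return Collections.singletonList(new IllegalStateException("socket already closed"));
        }
    }

    // Closes every resource in the order given, collecting all reported problems.
    // One failure never prevents the remaining resources from being closed.
    static List<Exception> closeAll(ReportingCloseable... resources) {
        List<Exception> problems = new ArrayList<Exception>();
        for (ReportingCloseable resource : resources) {
            problems.addAll(resource.close());
        }
        return problems;
    }
}
```

The shutdown writer’s job shrinks to choosing the order of the arguments and deciding what to do with the returned list.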

Conclusion

For resource library creators

C++ has given us a good hint – throwing from close functions (and other functions in that family, like stop and shutdown) is unhelpful. In particular, throwing a custom checked exception type from a close method verges on malicious. It makes writing shutdown code tedious, and tedium is not a fate we should wish on anyone. We should do our brave shutdown hook writers a favour and provide our error output in a sanitized format. If we really, really must throw (and by goodness we need a superb excuse to do so), implementing java.io.Closeable is mandatory.

For shutdown writers

We are in a pretty poor place right now. From a brief scan of libraries I use, those pesky resource library creators (Presumably this includes the author? – Ed) have been out to get us – almost all of their close methods throw checked exception types that we have no hope of recovering from. Try to wrap their laziness as close to the source as possible; who knows, perhaps the idea will take off, and our lives infinitesimally improve.
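One possible shape for that wrapping, assuming the offending resource at least implements java.io.Closeable, is sketched here. The class and method names are invented for the example.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;

final class CloseAdapter {
    // Calls close on a throwing resource and converts any IOException
    // into a returned value, so callers never need a try/catch.
    static Collection<Exception> closeAndReport(Closeable resource) {
        try {
            resource.close();
            return Collections.<Exception>emptyList();
        } catch (IOException e) {
            return Collections.<Exception>singletonList(e);
        }
    }
}
```

Resources with their own custom checked exception types would each need a similar, equally small, adapter of their own.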

Postscript

Having written no C++ for some time, I fashioned the top example without having to look up what header std::runtime_error was in, and the program compiled and ran at the first attempt. Get in. Admittedly, it didn’t do what I thought it would (my Foo was optimized away), but even that small victory made me smile. Having written this rather smug post-script I now eagerly await -Wpedantic like feedback from less rusty C++ writers!