Handling errors

You're reading a file, and right in the middle of doing it, you hear a buzzing noise, and a series of clicks, as the disk drive tries multiple times to read the same sector. But, uh oh, it's corrupt! Not really surprising given the way you treat your floppies! Humidity matters, you dumb-dumb!

Anyway, what on Earth is your program supposed to do about it? This is pretty much fatal to whatever operation your program was trying to do! And how do you find out?

(Note, this is basically me noting thoughts based upon a Mastodon discussion I had with some friends. I'm just putting it somewhere more permanent.)

ON ERROR GOTO

I'm largely mentioning this for completeness, but if you still program in BASIC then... OK, let's just go to the next one. Well, OK, this is a variant of “exceptions” except that every error is considered a single exception, and it's not easy to distinguish between different issues. It's ugly and better left unexplored.

errno

The second way you might find out is from code that tells you that the last operation you called didn't work. C has an “errno” global that acts that way. (Technically, since C11 errno is thread-local rather than a true global, so each thread gets its own copy.) You can write something like:

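errno = 0;
while(!errno) {
     size_t count = fread(buffer, 1, BUFFSIZE, fp);
     if(count == 0) break;     /* end of file, or an error */
     process(buffer, count);
}

if(errno) raise_hell();        /* POSIX sets errno when fread fails; ISO C doesn't promise it */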

Note C has no mechanism for making sure you test errno.

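There are several clear problems with errno, which is why few programming languages copy the pattern: nothing makes you check it, it's an opaque integer with no room for context, and it's a global variable.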

Go

I'm covering Go's method here because while it's intended as an alternative to exceptions (see below) it's also in many ways a cleaned up errno.

Go functions that return errors tend to return multiple values, in effect a tuple. The first object in the tuple is the real return value. The second is an error value, which is nil if there's no error, and an error object if there is one.

fh, err := os.Open(srcName)

In essence it's errno again, with three differences. You're forced to unwrap a tuple, which in turn means you have to actively decide to ignore an error. The error object is not an opaque integer with no ability to carry context. And no global variables are involved. Together these resolve most of the problems with errno.

Go itself provides only limited syntactic help with functions that return error objects. So you essentially have to test for errors immediately upon making a function call, with an explicit if statement. This means your core algorithm can get lost in a sea of if statements, and one-liners become impossible if one of the functions called in that line might return an error. But this is more of a problem with Go than with the concept of using tuples. Nothing stops Go from having error handling functionality; its authors just choose not to add more streamlined ways to handle errors.

A positive is that no Go programmer has ever been enthusiastic about returning errors. If correct, well behaved, code can be written to avoid returning an error status, the code is written that way. This is mostly a good thing. The bad news is that the libraries don't seem to reflect that attitude. Oh well.

Exceptions

Now we get to the good stuff. Exceptions cause a physical break in the code, where control is switched to an exception handler, the most recent active handler for that exception being selected. A function/method can choose to “throw an exception” instead of returning a value. Then whatever called it can include code to handle those exceptions, so you might see the above written as:

try {
    while(!fp.eof()) {
         process(fp.read(BUFFSIZE));
    }
} catch(IOException ex) {
    raise_hell();
}

At first sight, this looks great. The best part is that if you look closely, you'll see the algorithm itself is not sullied with checks to see if anything's gone wrong. So it's a huge improvement on readability. There are multiple issues though:

  1. In C++ there's a whole collection of issues with who owns the Exception object that need to be resolved. OK, that's a C++ specific issue, but.
  2. There's something called unwinding to be done, and that can be a problem as the compiler can't tell necessarily what needs to be unwound.
  3. It's very easy to abuse for situations that aren't exceptional.

The first, in fairness, is mostly a C++ issue, and not an issue with Java or other more modern languages that track created objects at the runtime level.

Now, as an example of 2 and 3, it's tempting, for example, to see that “raise_hell” could be simply a return value sent to the caller to say “Wasn't able to read the file”. And it's only one step from there to say that return value could itself be an exception.

Also if I was an idiot, and I am, I might be tempted to implement EOF itself as an exception. After all, it's exceptional right? That is, most calls to “read” won't fail due to EOF. Surely the programmers using my code would be happy if I sent an exception whenever the code tries to read beyond the file limit because they could then just catch the exception to leave the loop.

So our algorithm, expanded into a function, turns into:

SomeObject readTheFile(string filename) throws IOException {
    SomeObject r = new SomeObject();
    fp = new FileReader(filename);
    
    try {
        for(;;) r.process(fp.read(BUFFSIZE));
    } catch(EOFException) {
         // Yay
    }

    fp.close();
    r.processed();

    return r;
}

Looks OK, right?

Well, the handling of EOFException is ugly in practice. Although... oddly, technically it's probably slightly more efficient than checking for EOF each time. Regardless, making something that isn't an error into an exception has made the code a little uglier and harder to follow, with an unnecessary catch block that does nothing.

But the main problem here is we open a file (fp = new FileReader()) but we only close it if nothing goes wrong. If the loop throws an IOException, then readTheFile will drop all the remaining code and hand control back to whatever called readTheFile, which means the FileReader will never be closed, and as that presumably has an operating system file handle open, that's a problem. (Java belatedly added features to encourage developers to track resources they open, but these are completely optional because of Java's obsession with backward compatibility.)
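The resource-tracking feature I have in mind is try-with-resources, which arrived in Java 7. Here's a minimal sketch of the read loop using it. SomeObject is a hypothetical stand-in for the one in the text, BUFFSIZE is an assumed value, the method takes any Reader so it can be tried without a real file, and end of file is signalled by read() returning -1 rather than by an exception.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;

class TryWithResourcesSketch {
    static final int BUFFSIZE = 4; // assumed value, small so the loop runs more than once

    // Hypothetical stand-in for the SomeObject used in the text.
    static class SomeObject {
        final StringBuilder seen = new StringBuilder();
        boolean done = false;

        void process(char[] block, int count) { seen.append(block, 0, count); }
        void processed() { done = true; }
    }

    // A FileReader would be passed in exactly the same way as any other Reader.
    static SomeObject readAll(Reader source) throws IOException {
        SomeObject r = new SomeObject();
        char[] buffer = new char[BUFFSIZE];

        // The reader is closed when this block exits, whether normally
        // or because read() threw an IOException.
        try (Reader fp = source) {
            int count;
            while ((count = fp.read(buffer)) != -1) {
                r.process(buffer, count);
            }
        }

        r.processed();
        return r;
    }

    public static void main(String[] args) throws IOException {
        SomeObject r = readAll(new StringReader("hello, floppy"));
        System.out.println(r.seen);
    }
}
```

If read() throws, the IOException still travels up to the caller, but the reader gets closed first.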

Looking at the same example, the implied rethrowing of IOException is a very common pattern in Java, largely because it's taught at a very early stage as an option to handle exceptions which fixes code that the compiler is generating errors for. It's quick and dirty and many programmers simply assume that there's nothing wrong with the approach if you're not able to handle the exception immediately.

Of course, that's rarely true, and there's an additional problem with returning the same exception that was received by the code – it's typically less and less relevant the further it gets from the code that generated the error. Should readTheFile be returning IOException or would it be better for it to create its own exception for this case? What if it's not really a problem? Maybe readTheFile was written to read a configuration file and if the file can't be opened defaults will be used? For the most part an exception dropping through multiple levels of code should be considered an anti-pattern. It might be appropriate in a small number of cases, but perhaps it should be handled manually in all of them.
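To make the two alternatives above concrete, here's a rough sketch of a configuration loader that treats a missing file as “use the defaults” and reports any other I/O failure in its own terms. ConfigException, loadConfig and the “colour” setting are all made up for illustration; only the java.nio.file and java.util.Properties calls are real.

```java
import java.io.IOException;
import java.io.Reader;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.util.Properties;

class ConfigSketch {
    // Hypothetical exception type: says what the caller was trying to do,
    // and keeps the low-level error as the cause.
    static class ConfigException extends Exception {
        ConfigException(String message, Throwable cause) { super(message, cause); }
    }

    static Properties defaults() {
        Properties p = new Properties();
        p.setProperty("colour", "beige"); // made-up default
        return p;
    }

    static Properties loadConfig(Path file) throws ConfigException {
        Properties config = defaults();
        try (Reader in = Files.newBufferedReader(file)) {
            config.load(in); // values in the file override the defaults
        } catch (NoSuchFileException e) {
            // Not really a problem: no file means the defaults stand.
        } catch (IOException e) {
            // A real problem, reported in terms the caller understands.
            throw new ConfigException("could not read configuration " + file, e);
        }
        return config;
    }
}
```

The caller never sees a bare IOException: it gets a usable configuration, or an error that names the thing it actually asked for.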

It'd be remiss of me not to mention the technically correct but extremely dubious claim that the stack unwinding associated with exceptions is inefficient. Why is it dubious? I'll get to that shortly, but I will say now that much of this criticism comes from C++, where it's more accurate to say handling stack unwinding is clumsy.

(And also, for some reason, a fair amount comes from people who think software interrupts are involved somehow and each time you throw an Exception the OS has to get involved. Yes, many decades ago that was something someone told me... though as she was part of a circle of self-described “trolls” I'm not entirely sure she was serious.)

Anyway... yes, there is a CPU overhead to throwing and catching exceptions, but it's honestly not that big, especially for languages like Java where it's just a matter of resetting the context to that of the handler for that exception and letting the mark and sweep garbage collector do the clean up later. The biggest overhead in Java is actually creating the exception object itself. The stack unwinding is not likely significantly rougher than the regular return statement in most JVM implementations.
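As far as I know, most of the cost of creating an exception object is the stack trace it records when it's constructed. Here's a small sketch of an exception that opts out of that, using the four-argument Throwable constructor added in Java 7. EndOfData is a made-up name, and I'm not claiming any particular speed-up, just showing where the knob is.

```java
class CheapExceptionSketch {
    // Hypothetical exception that opts out of recording a stack trace.
    static class EndOfData extends Exception {
        EndOfData(String message) {
            // Arguments: message, cause, enableSuppression, writableStackTrace
            super(message, null, false, false);
        }
    }

    public static void main(String[] args) {
        Exception ordinary = new Exception("ordinary");
        Exception cheap = new EndOfData("cheap");

        // An ordinary exception normally carries one frame per active call;
        // the cheap one carries none.
        System.out.println("ordinary frames: " + ordinary.getStackTrace().length);
        System.out.println("cheap frames: " + cheap.getStackTrace().length);
    }
}
```

The trade-off is obvious enough: an exception with no stack trace is a lot less use when you're debugging.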

Rust

Rust is the creation of C++ programmers who are trying to fix everything wrong with C++ while retaining what they believe to be its qualities. One of the major changes was to try to find a workable alternative to Exceptions because they're difficult to use in a streamlined, efficient way in C++.

Rust has a Result type, a sort of union (they're called enums in Rust for some reason) that can contain either an OK value or an Error value. OK means “No problems, here's the data you wanted!” And there are tools to “unwrap” it. Also “OK” is written as “Ok” to annoy English language pedants. Result in C would look something like:

struct Result {
    enum { Ok, Err } type;
    union {
        struct wantedvalue okvalue;
        struct error err_value;
    };
};

(Not perfect as Result uses Generics to determine the types in that union, but it'll pass for now.)
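For what the Generics buy you, here's a rough approximation in Java. Java has no union, so the sketch fakes one with two fields and a flag. ResultSketch, match, unwrapOr and parsePort are names I've made up; this is not Rust's actual API or any real Java library.

```java
import java.util.function.Function;

// Rough Java approximation of a Rust-style Result; not a real library type.
final class ResultSketch<T, E> {
    private final T value;    // meaningful only when ok is true
    private final E error;    // meaningful only when ok is false
    private final boolean ok;

    private ResultSketch(T value, E error, boolean ok) {
        this.value = value;
        this.error = error;
        this.ok = ok;
    }

    static <T, E> ResultSketch<T, E> ok(T value) { return new ResultSketch<>(value, null, true); }
    static <T, E> ResultSketch<T, E> err(E error) { return new ResultSketch<>(null, error, false); }

    // Stand-in for Rust's match: the caller has to supply both branches.
    <R> R match(Function<T, R> onOk, Function<E, R> onErr) {
        return ok ? onOk.apply(value) : onErr.apply(error);
    }

    // Stand-in for swapping an error for a default value.
    T unwrapOr(T fallback) { return ok ? value : fallback; }

    // Example producer: parsing either yields a number or a message.
    static ResultSketch<Integer, String> parsePort(String text) {
        try {
            return ok(Integer.parseInt(text));
        } catch (NumberFormatException e) {
            return err("not a number: " + text);
        }
    }

    public static void main(String[] args) {
        System.out.println(parsePort("8080").unwrapOr(-1));
        System.out.println(parsePort("floppy").match(v -> "port " + v, e -> "error: " + e));
    }
}
```

The type parameters are the point: the same wrapper works for any combination of wanted value and error, which the C struct can't express.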

Here's what something in Rust looks like, albeit (for consistency) translated to a generic Javaish syntax as with the above but adding one Rust construct so it doesn't look exceptionally clumsy (especially as Java doesn't have unions!).

char[] buffer = new char[BUFFSIZE];
boolean ohnoanerror = false;
while(!fp.eof()) {
    match(fp.readChunk(buffer)) {
        Ok(count) => process(buffer, count);
        Err(_) => { ohnoanerror = true; break; }
    };
}

Unlike Go, Rust offers some shortcuts. Most assume you're perfectly happy with your program terminating (panicking) if you use them (or have set a panic handler, which is generally not recommended); some allow you to exchange a default value for an error if one is available. And most importantly, “?” can be used as a “rethrow this” type thing, e.g.:

Result readTheFile(string filename) {
    SomeObject r = new SomeObject();
    fp = new FileReader(filename)?;

    while(!fp.eof()?) r.process(fp.read(BUFFSIZE)?);

    fp.close()?;
    r.processed();

    return new Result<>(OK, r);
}

This behaves similarly to the Exception based examples above. But it's subject to the same problem I mentioned above about blindly rethrowing errors being largely an antipattern.

If you want to handle the errors, however (beyond using a default value), you generally need to break down your code as above and handle them immediately.

So... upsides and downsides?

Like Exceptions and the Go thing, you can't ignore an error. Well, to put it another way, you have to intentionally indicate you want to do nothing if you detect it. Alas also like exceptions you're practically encouraged to send errors, context free, downstream if you don't feel like handling them yourself.

Unlike exceptions, there's an overhead on each and every call you make to a method that might return with an error. That is, the overhead is with both the exceptional behavior and the non-exceptional behavior. This is why I called the criticism of Exceptions as “inefficient” dubious. The overhead takes the form both of mandatory parsing of the return object's error status, and the returning of additional data even in instances where there is no error.

Also unlike Exceptions you cannot choose to separate error handling from an algorithm, which means code written using these methods tends to be longer and more difficult to understand. The underlying algorithm is hidden as every call to a function or method is surrounded by conditionals determining what to do if the last thing failed or not.

Other options

One option, jQuery being an example, is to build libraries that reflect the likely use and allow code to be structured around them in a way that separates the algorithm from the problem handling. For example, this is what an AJAX call looks like in jQuery:

$.ajax({
   url: "/ping",
   success: function (data, status, raw) {
      // Processing stuff goes here //
   },
   error: function(raw, errorstatusmsg, errormsg) {
       // Error handling here
   }
});

Closures have their own overheads, but this isn't a bad use of them. We can rewrite our file processing thing above as:

boolean ohnoanerror = false;
FileReader fp = new FileReader(filename);
fp.foreachBlock(BUFFSIZE, boolean (byte[] block) -> { 
    process(block);
    return true;
}, () -> {
    ohnoanerror = true;
});

if(ohnoanerror) ...

Error handling within the main algorithm then ends up being only those errors the algorithm itself needs to address.
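For the curious, here's one way the callback version could look in real Java. It's a sketch under assumptions: foreachBlock is a hypothetical static helper rather than a method on FileReader, blocks are handed over as Strings, BUFFSIZE is an assumed value, and a StringReader stands in for the file.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import java.util.function.Predicate;

class BlockReaderSketch {
    static final int BUFFSIZE = 4; // assumed value

    // Hypothetical helper: feeds blocks to onBlock until it returns false or
    // the input ends; any IOException goes to onError instead of the caller.
    static void foreachBlock(Reader source, int size,
                             Predicate<String> onBlock,
                             Consumer<IOException> onError) {
        char[] buffer = new char[size];
        try (Reader fp = source) {
            int count;
            while ((count = fp.read(buffer)) != -1) {
                if (!onBlock.test(new String(buffer, 0, count))) break;
            }
        } catch (IOException e) {
            onError.accept(e);
        }
    }

    public static void main(String[] args) {
        StringBuilder seen = new StringBuilder();
        foreachBlock(new StringReader("hello, floppy"), BUFFSIZE,
                block -> { seen.append(block); return true; },
                error -> System.err.println("read failed: " + error.getMessage()));
        System.out.println(seen);
    }
}
```

One wrinkle: real Java lambdas can't assign to a local variable like ohnoanerror, so a flag would have to live in a field, a one-element array or an AtomicBoolean.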

Better options?

Ugly though Exceptions are, I'm inclined to think most newer attempts to create a better error handling system have actually made things a little worse. I understand why C++ programmers are desperate to get away from Exceptions, but most languages are avoiding the problems C++ has anyway. Exceptions prevent normal behavior from having an overhead, and allow the core algorithm to be coded without being drowned and hidden in error checks. If every single return value has to be checked for a potential error, you end up with messy code, the underlying algorithm obscured by constant error checks, and code that has additional error checking overhead for non-exceptional situations, which is what you're trying to avoid.

But there's no denying exceptions just aren't that great either. Exceptions are generally overused and implementations tend to over-engineer them. And there is some complexity if the underlying language doesn't use mark and sweep garbage collection – although again, the fact the complexity is purely during exceptions makes the trade-off worth it. Most implementations encourage programmers to just rethrow exceptions, making it easier to implement that antipattern than to handle exceptions properly.

Some would argue, rightly, that checking that there was a problem is ultimately part of the overall algorithm even if it isn't the core algorithm and so it should be clearer where the checks are. That is, simply rethrowing an exception or throwing it later doesn't make it clear where your core algorithm may stop due to an issue. Exceptions, alas, do have issues here too. A bug can be hidden in an unclear jump that can happen under certain circumstances.

The jQuery example above is an interesting, albeit bloated, way of building an API around a likely use case that results in errors being handled in a more organic way. No exceptions are thrown, no code is constantly having to check if something returned has a problem. But it's also very tied to a specific use case. And it's still syntactically verbose, because few languages are particularly clean when it comes to closures.

A possible improvement on Exceptions would be to make it harder to just rethrow the same exception without being explicit enough that the same code could easily be rewritten to process it properly.

From a developer's standpoint, the aim should be both that algorithms are clear, and that error handling is clear. Before throwing an exception or returning an error, a developer needs to consider how their function will be used and whether their code is being designed in an appropriate way that's friendly to the code calling it. This sounds obvious, but judging by the number of language libraries that blindly copy historical patterns and throw errors at awkward times, everyone understands it until they start coding, at which point it's forgotten.