March 6, 2017
Most code can fail in multiple places and for multiple reasons. Handling these failures seems pretty trivial, the kind of thing you'd cover in a basic tutorial for your programming language. Actually, I think that doing it well can be surprisingly subtle, and that it ultimately dances around the control flow constructs of your programming language of choice. Let me illustrate with a simple example in Python, my second-favorite programming language:
```python
import requests

url = "https://www.example.com/api/v1/status/"
r = requests.get(url)
data = r.json()
status = data["results"]["status"]
```
As you can see, this code has three functional lines: it GETs a URL from an example API, tries to parse the resulting content as JSON, and then tries to read a "status" from the parsed data. (In a more complete example, I might add a timeout to the GET or check the HTTP status code, but let's keep things simple.)
In this post, we'll talk about a half-dozen distinct ways of handling the different exceptions that can result from these three lines, and consider the advantages and disadvantages of each. We'll want both to catch the exceptions and to figure out where they came from. Without further ado:
The simplest idea is just to wrap the entire code block in a try-except block:
```python
try:
    r = requests.get(url)
    data = r.json()
    status = data["results"]["status"]
except:
    status = "Error"
```
Advantages: The code is very easy to read and understand. It's also pretty bombproof: after the code executes, the status variable is guaranteed to be defined. (Other possibilities for mischief, such as waiting on the URL forever, crashing the Python interpreter, or having the OS kill your Python process, would affect all of the examples in this post and are thus out of scope. I'll quote Jeff Dean here, from a talk I heard him give on how Google improved their latency. Someone asked what would happen if a data center burned down. Jeff: "We handle that at a higher abstraction layer.")
Disadvantages: There's no differentiation between different kinds of errors. We can't tell which of the three lines failed: a JSON parsing error is indistinguishable from a connectivity issue, for example. We also can't tell "expected" errors apart from "unexpected" or "programming" errors. You expect occasional connectivity errors, but you don't expect that "status" was supposed to be capitalized as "Status" (a possible programming error), or that you accidentally misspelled a variable name. Yet these errors are caught and masked too.
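To make the masking concrete, here is a minimal sketch that needs no network access. It swaps `requests` for the standard library's `json.loads` applied to a canned response string (an assumption for illustration), and `get_status_giant` is a hypothetical name. The key is deliberately miscapitalized, yet the giant try-except quietly turns the bug into an ordinary-looking "Error":

```python
import json


def get_status_giant(response_text):
    """Giant try-except applied to a canned JSON string (no network)."""
    try:
        data = json.loads(response_text)
        # Programming error: the API's key is "status", not "Status".
        status = data["results"]["Status"]
    except:  # bare except, as in the approach above
        status = "Error"
    return status


good_response = '{"results": {"status": "ok"}}'
print(get_status_giant(good_response))  # prints "Error" despite a valid response
```

Nothing in the result distinguishes this typo from a genuine outage.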
An obvious next thing to consider, then, is taking advantage of Python's ability to catch specific exceptions:
```python
try:
    r = requests.get(url)
    data = r.json()
    status = data["results"]["status"]
except requests.RequestException:
    status = "Connectivity error"
except ValueError:
    status = "JSON parsing error"
except (IndexError, KeyError):
    status = "JSON format error"
```
Advantages: As before, it's pretty easy to read this code. This time, though, it distinguishes between different kinds of errors, and doesn't catch and mask most programming errors.
Disadvantages: While the code is easy to read, it's not obviously correct. Unless you're very familiar with the requests library, I doubt that you knew that RequestException is the base exception class in requests (I had to look it up myself). Similarly, are you sure that requests will throw a ValueError if it can't parse the response content as JSON? In fact, after taking a closer look at the code above, I now see that it isn't correct as written: if data is a list instead of the expected dictionary, then data["results"] will throw an uncaught TypeError (which should be classified as a "JSON format error").
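One way to patch the gap just described is to add `TypeError` to the format-error clause. The sketch below stands in `json.loads` on a string for `r.json()` (an assumption, so it runs offline) and leaves out the connectivity branch; `classify` is a hypothetical name. `json.JSONDecodeError` is a subclass of `ValueError`, which is why the parsing branch still works here:

```python
import json


def classify(response_text):
    """Fine-grained except clauses, with TypeError added for non-dict JSON."""
    try:
        data = json.loads(response_text)
        status = data["results"]["status"]
    except ValueError:  # json.JSONDecodeError subclasses ValueError
        status = "JSON parsing error"
    except (IndexError, KeyError, TypeError):
        # TypeError covers JSON that parses to a list, e.g. "[1, 2]",
        # where data["results"] fails because list indices must be integers.
        status = "JSON format error"
    return status


print(classify('{"results": {"status": "ok"}}'))  # ok
print(classify("<html>oops</html>"))              # JSON parsing error
print(classify("[1, 2]"))                         # JSON format error
```

Of course, this only fixes the one hole I happened to notice, which is rather the point of the disadvantage above.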
Another key disadvantage is that exceptions raised in different places can be coalesced into the same handler. For example, if requests.get(url) threw a KeyError for some reason, it would be misclassified as a JSON format error instead of an error getting the URL. This coalescing makes the exception handling code fragile in the face of future changes. For example, perhaps a programmer will later generalize this code to pull the URL from a configuration system. They might update the GET line to requests.get(CrazyConfigClass.get_url()), which could easily throw a ValueError or KeyError that would be caught in the wrong place.
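The coalescing problem is easy to reproduce offline. In this sketch, `CONFIG`, `fetch_text`, and `get_status` are hypothetical stand-ins (a dict for the configuration system and a function for `requests.get(url).text`), and the built-in `ConnectionError` plays the role of `requests.RequestException`. The missing configuration key raises a `KeyError` on the fetch line, but it gets reported as a JSON format error:

```python
import json

CONFIG = {}  # hypothetical configuration store; "status_url" was never set


def fetch_text(url):
    """Hypothetical stand-in for requests.get(url).text."""
    return '{"results": {"status": "ok"}}'


def get_status():
    try:
        text = fetch_text(CONFIG["status_url"])  # KeyError is raised here...
        data = json.loads(text)
        status = data["results"]["status"]
    except ConnectionError:
        status = "Connectivity error"
    except ValueError:
        status = "JSON parsing error"
    except (IndexError, KeyError, TypeError):
        status = "JSON format error"  # ...but it is classified here
    return status


print(get_status())  # JSON format error
```

The same thing can happen inside requests itself. If I understand recent releases correctly (2.27 and later), `r.json()` raises `requests.exceptions.JSONDecodeError`, which inherits from both `RequestException` and `ValueError`. Since the `except requests.RequestException` clause comes first in the Fine-Grained Exceptions code, a parse failure there would be reported as a connectivity error.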
Another approach you might try, then, is to nest multiple try-except blocks, one for each line:
```python
try:
    r = requests.get(url)
except:
    status = "Connectivity error"
else:
    try:
        data = r.json()
    except:
        status = "JSON parsing error"
    else:
        try:
            status = data["results"]["status"]
        except:
            status = "JSON format error"
```
Advantages: This code correctly isolates exceptions that arise from each of the three functional lines. Unlike in the above Fine-Grained Exceptions approach, we don't have the coalescing issue where different functional lines that throw the same exception get treated the same.
Disadvantages: This code is extremely hard to read, at least for me. The "happy path" executes on lines 2, 7, and 12, at three different indentation levels. If we had four or five functional lines, instead of our three, this code would be even less readable. I feel like I'm playing chess when I try to verify that each of the four cases--happy case, connectivity error, parsing error, and format error--executes correctly. And I'm not very good at chess.
(This code also suffers from the issue, discussed in the Giant Try-Except approach, of masking unexpected and programming errors. You can fix this by replacing the blanket except statements with ones that catch more specific errors, subject to the same caveat about making the code less obviously correct that I discussed in the Fine-Grained Exceptions approach.)
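Here is a sketch of the nested version with the blanket excepts narrowed as just suggested. To keep it runnable offline, `FAKE_RESPONSES` and `fetch_text` are hypothetical stand-ins for the network, the built-in `ConnectionError` stands in for `requests.RequestException`, and `json.loads` stands in for `r.json()`:

```python
import json

# Hypothetical canned responses; any other URL simulates a network failure.
FAKE_RESPONSES = {
    "good": '{"results": {"status": "ok"}}',
    "garbled": "<html>not json</html>",
    "wrong-shape": '{"results": []}',
}


def fetch_text(url):
    """Hypothetical stand-in for requests.get(url).text."""
    if url not in FAKE_RESPONSES:
        raise ConnectionError(f"could not reach {url}")
    return FAKE_RESPONSES[url]


def get_status(url):
    try:
        text = fetch_text(url)
    except ConnectionError:
        status = "Connectivity error"
    else:
        try:
            data = json.loads(text)
        except ValueError:
            status = "JSON parsing error"
        else:
            try:
                status = data["results"]["status"]
            except (IndexError, KeyError, TypeError):
                status = "JSON format error"
    return status


for url in ["good", "garbled", "wrong-shape", "missing"]:
    print(url, "->", get_status(url))
```

Narrowing the excepts stops the masking, but it does nothing for the staircase of indentation.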
The fundamental reason that the code above is so nested is that Python doesn't have a goto statement: ideally, after setting the status in an exception handler, you'd like to jump straight past the rest of the code. Breaking out of a loop is an implicit way of doing this, so let's try that:
```python
while True:
    try:
        r = requests.get(url)
    except:
        status = "Connectivity error"
        break
    try:
        data = r.json()
    except:
        status = "JSON parsing error"
        break
    try:
        status = data["results"]["status"]
    except:
        status = "JSON format error"
    break  # also leave the loop on the happy path
```
Advantages: This code correctly isolates exceptions that arise from each of the three functional lines, and is somewhat more readable than the above Nested Try-Excepts approach.
Disadvantages: The while True is pretty scary to me--what if you forgot to break somewhere? Fortunately, you can fix this by replacing the "while True" with "for _ in [None]", so that the loop more obviously terminates and is more robust to programming errors. But even if you do that you'll still be left with the problem of remembering to break after handling each error.
Another disadvantage is that this code is a bit too "clever". It would not necessarily be obvious to a colleague or a future version of yourself what you were attempting to do here without thinking about it a bit. Who uses a loop as a bastardized goto?!
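Here is a sketch of the `for _ in [None]` variant, again with hypothetical offline stand-ins (`FAKE_RESPONSES`, `fetch_text`) and specific exception types in place of the bare excepts. Because the loop body runs at most once, a forgotten `break` can no longer cause an infinite loop, though it can still let execution fall through into the next step:

```python
import json

# Hypothetical canned responses; any other URL simulates a network failure.
FAKE_RESPONSES = {
    "good": '{"results": {"status": "ok"}}',
    "garbled": "<html>not json</html>",
}


def fetch_text(url):
    """Hypothetical stand-in for requests.get(url).text."""
    if url not in FAKE_RESPONSES:
        raise ConnectionError(f"could not reach {url}")
    return FAKE_RESPONSES[url]


def get_status(url):
    for _ in [None]:  # executes once; break jumps to the line after the loop
        try:
            text = fetch_text(url)
        except ConnectionError:
            status = "Connectivity error"
            break
        try:
            data = json.loads(text)
        except ValueError:
            status = "JSON parsing error"
            break
        try:
            status = data["results"]["status"]
        except (IndexError, KeyError, TypeError):
            status = "JSON format error"
    return status


for url in ["good", "garbled", "missing"]:
    print(url, "->", get_status(url))
```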
Another kind of goto that I think is a bit more natural in this case is that of exception raising and catching:
```python
class MyUniqueException(Exception):
    def __init__(self, status):
        super().__init__()
        self.status = status


try:
    try:
        r = requests.get(url)
    except:
        raise MyUniqueException(status="Connectivity error")
    try:
        data = r.json()
    except:
        raise MyUniqueException(status="JSON parsing error")
    try:
        status = data["results"]["status"]
    except:
        raise MyUniqueException(status="JSON format error")
except MyUniqueException as e:
    status = e.status
```
(Technically, in this toy example we don't need to define our own exception--we could have just used Exception and pulled the argument out with e.args. But if we were more specific about the kinds of exceptions we caught for the functional lines, we would want to define our own exception class so that we don't mistakenly catch things we didn't intend to).
Advantages: This code correctly isolates exceptions that arise from each of the three functional lines, and is a bit less clever than the While True approach.
Disadvantages: That said, I think it's inelegant to define your own exception class just for control flow issues. I suppose that if this style of exception handling was the one you settled upon for an entire codebase, perhaps you'd define a ControlFlowException or similar that you'd share throughout your code, instead of defining new exceptions for each block you wanted to execute. But even in that case, I personally feel somewhat uncomfortable raising exceptions in my code when there isn't a serious issue, even if I later go on to catch them. The semantics of raising an exception, at least to me, should be that despite your best efforts you don't know what to do, so you're throwing your hands up and giving up.
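Following up on the earlier parenthetical about defining your own exception class once you catch more specific exceptions, here is a sketch of that combination. `StatusLookupFailed`, `FAKE_RESPONSES`, and `fetch_text` are hypothetical names, `ConnectionError` stands in for `requests.RequestException`, and `json.loads` stands in for `r.json()`. The `raise ... from err` form (standard Python exception chaining) keeps the original exception available as `__cause__` if you want to log it:

```python
import json


class StatusLookupFailed(Exception):
    """Hypothetical control-flow exception carrying the status to report."""

    def __init__(self, status):
        super().__init__(status)
        self.status = status


# Hypothetical canned responses; any other URL simulates a network failure.
FAKE_RESPONSES = {"good": '{"results": {"status": "ok"}}', "list": "[1, 2]"}


def fetch_text(url):
    """Hypothetical stand-in for requests.get(url).text."""
    if url not in FAKE_RESPONSES:
        raise ConnectionError(f"could not reach {url}")
    return FAKE_RESPONSES[url]


def get_status(url):
    try:
        try:
            text = fetch_text(url)
        except ConnectionError as err:
            raise StatusLookupFailed("Connectivity error") from err
        try:
            data = json.loads(text)
        except ValueError as err:
            raise StatusLookupFailed("JSON parsing error") from err
        try:
            status = data["results"]["status"]
        except (IndexError, KeyError, TypeError) as err:
            raise StatusLookupFailed("JSON format error") from err
    except StatusLookupFailed as e:
        status = e.status
    return status


for url in ["good", "list", "missing"]:
    print(url, "->", get_status(url))
```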
Our final approach to this problem uses the goto nature of function returns:
```python
def get_status():
    try:
        r = requests.get(url)
    except:
        return "Connectivity error"
    try:
        data = r.json()
    except:
        return "JSON parsing error"
    try:
        return data["results"]["status"]
    except:
        return "JSON format error"


status = get_status()
```
Advantages: This code correctly isolates exceptions that arise from each of the three functional lines, and I think it's clean and readable. The fact that it can use "return" means that we don't need to add an additional "break" line like we did in the While True approach. This keeps the line count down a bit.
Disadvantages: You might have noticed that, as written, get_status doesn't take any arguments; instead, it uses the url that was defined in the enclosing scope. That was deliberate: in general, the functional lines might use many locally defined variables rather than just one. Some people frown upon nested function definitions (i.e., defining functions inside other functions), so they might want to define the function outside of this scope. (The functional lines, in general, could be deep inside some function or method, rather than at the top-level scope as in this toy example.) In that case, they'd need to pass every local variable the functional lines use as an argument to get_status, which would mean fairly frequent changes to its signature as those functional lines changed.
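For readers who prefer to avoid nested functions, here is a sketch of the same early-return structure as a module-level function whose inputs are passed in explicitly. The `fetch` parameter (a callable standing in for `requests.get(url).text`), `fake_fetch`, and the use of `ConnectionError` and `json.loads` in place of their requests equivalents are all assumptions made so the sketch runs offline. Every additional local the body needs would become another parameter, which is the signature churn described above:

```python
import json


def get_status(url, fetch):
    """Early-return status lookup; `fetch` is a hypothetical url -> text callable."""
    try:
        text = fetch(url)
    except ConnectionError:
        return "Connectivity error"
    try:
        data = json.loads(text)
    except ValueError:
        return "JSON parsing error"
    try:
        return data["results"]["status"]
    except (IndexError, KeyError, TypeError):
        return "JSON format error"


def fake_fetch(url):
    """Hypothetical offline stand-in that always returns the same body."""
    return '{"results": {"status": "ok"}}'


status = get_status("https://www.example.com/api/v1/status/", fake_fetch)
print(status)  # ok
```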
Personally, defining nested functions and having them use the variables in their enclosing scope doesn't bother me much. I actually feel it's better style in this case, since the function we've defined isn't really general-purpose: it's only good for this one thing, so defining it only in the place where it can and should be used makes more sense to me. This approach is my favorite of the six I've discussed above.
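To illustrate that nested style, here is a sketch in which the single-purpose helper lives inside a larger function and reads `url` from its enclosing scope. `refresh_dashboard`, `fetch_text`, `FAKE_RESPONSES`, and the `base_url`/`endpoint` split are hypothetical, and the exception types are again offline stand-ins for the requests ones:

```python
import json

# Hypothetical canned responses keyed by URL; anything else "fails to connect".
FAKE_RESPONSES = {
    "https://www.example.com/api/v1/status/": '{"results": {"status": "ok"}}',
}


def fetch_text(url):
    """Hypothetical stand-in for requests.get(url).text."""
    if url not in FAKE_RESPONSES:
        raise ConnectionError(f"could not reach {url}")
    return FAKE_RESPONSES[url]


def refresh_dashboard(base_url, endpoint):
    url = base_url + endpoint  # a local that the nested helper closes over

    def get_status():
        # Single-purpose helper: each `return` jumps straight out of it.
        try:
            text = fetch_text(url)
        except ConnectionError:
            return "Connectivity error"
        try:
            data = json.loads(text)
        except ValueError:
            return "JSON parsing error"
        try:
            return data["results"]["status"]
        except (IndexError, KeyError, TypeError):
            return "JSON format error"

    status = get_status()
    return f"{endpoint}: {status}"


print(refresh_dashboard("https://www.example.com", "/api/v1/status/"))
```

If the outer function later grows another local that the helper needs, nothing about `get_status`'s signature has to change.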
Let's summarize some of the characteristics of the approaches with a nice table:
| Approach | Isolates Exception Source | Readable | Robust to Programmer Errors | Clever |
|---|---|---|---|---|