Friday, February 02, 2018

An Extremely Common and Terrible Line of Java

At my last start-up, there was a day our system nearly died. The day before, everything was fine but one morning the load on the database inexplicably spiked 1000x. We frantically looked through the check-ins from yesterday and found nothing. In fact, we found no code that even touched the database at all. What happened?

The issue was extremely well-hidden. Months earlier a wayward programmer had put a database call into an object's hashcode() method. None of our code much called that hashcode() method until a change that went in yesterday needed to put a few thousand of those objects into a HashMap. Whammo - that code unknowingly slammed the database.

You might be thinking of the evil surrounding such an act. While terrible person did this? While I can't agree more it was a bad idea, object-orientation and typing does not dictate performance. It's an externality. It's not called hashCodeAndShouldBeFast(), it's called hashcode() and if you override it you darn well better return a hashcode. That's your only real duty.

In that spirit, there's a far less impactful but so far more common piece of code that bothers me along the exact same lines. I doubt I'm going to change the useless fad by writing about it here, and in practice the harm here is quite small, but at a minimum writing this line of code on the whiteboard has become one of my favorite conversation-starters in technical interviews. Without further adieu:

List<String> myList = new ArrayList<String>();

Looks familiar doesn't it? You can replace String with anything you like, that's not the important part. You can even change the ArrayList to any other common Java List that makes you happy too - that's also not what I feel is wildly broken. Feel free to instantiate the list using tricky Guava for those haters of the new keyword - that indeed, is also not the part I'm talking about.

The part I think is broken is where the list is casted to the interface. That's a big object-oriented paradigm isn't it? "Program to the abstraction", not the concrete class.

Sounds great and quite often it is. Like most methods of shooting your foot off in programming, abstraction however can be a -edged sword.

Abstraction is a fantastic concept. It is a foundational piece of a type system that allows us to think in non-specific terms when we don't need to. It hides implementation. Abstraction in the general sense always has a cost. Eventually the JVM or computer removes abstraction to actually execute the code. In an object-oriented hierarchy, as you become more abstract you lose detail - which rather becomes the point.

If I have a method that accepts a Shape that I wish to draw(), I don't generally want to create code instances for every type of Shape that might come along in the future. At this level of abstraction I don't care "how" the Shape is drawn (the shape will take care of that), I just care that all Shapes do indeed know how to draw() themselves.

And therein lies the rub. There are many kinds of lists in Java but for most part, there are two primary ones:  ArrayList and LinkedList. There is exactly one reason you'd ever choose one rather than the other.

Their respective performance characteristics.

LinkedList is fantastically fast at removing elements, it's also fantastically slow at finding a given element at an index. ArrayList is the exact opposite - it's woefully terrible at removing elements and wonderful at indexed lookups. From a performance standpoint, these could not be less interchangeable.

"Programming to the abstraction" makes incredible sense when staying with the confines of object-oriented typing. But different Lists exist only for their performance characteristics. Sure both Lists can get(int) and remove(int) but that's akin to saying both Usain Bolt and I can run the 100 yard dash (and an object model casting Usain and Paul to Runner, then calling run100ydDash() is completely valid - just expect different concrete results).

Types and object models don't convey externalities. They don't convey performance or security concerns among many other things.

By casting concrete Lists to the interface, you throw away the performance information. You can send that List to any method which accepts one that may do who-knows-what to it. Using abstraction to hide performance characteristics is dangerous at best and this line of extremely common code does just that.

Of course the great benefit of casting ArrayList or LinkedList to List is... um, actually I can't think of one. I can't think of a time I don't want to know the performance (and memory) characteristics of a data structure I'm using. That's the entire point of choosing a data structure at all - and it's being thrown away. I see this code constantly (in other people's code). As far as I can tell people just robotically following a venerable OO paradigm and mis-applying it to no benefit.

Incidentally we made some changes at that startup. We started using naming to illustrate performance when it wasn't obvious (the "Principle of Least Surprise" and all). We had methods like getNameInASurprisinglyExpensiveWay instead of getName when appropriate. We made a conscious effort to apologize to our users every time we hit the database or network. Performance regressions hide well but we did our best to expose them.

Oh and hashcode() - hashcode() got a new rule - two sign-off code reviews for hashcode(). Watch out for those things, who knows what you'll find in there.

No comments: