The arrival of the Programmer-Archeologist

Or, how to copy and paste from StackOverflow for fun and profit!

Compared to other professions, software engineering is still in its infancy. But we have almost reached a point where the code still running at the bottom of many large systems wasn’t written in living memory, and there are now early signs that this infancy may finally be passing.

The day-to-day life faced by most programmers today rarely involves writing large amounts of code. Opening the editor on an entirely new project is a memorable event. Instead they spend time refactoring, tracking down bugs, and sticking disparate systems together with glue code.

“The word for all this is a ‘mature programming environment.’ Basically, when hardware performance has been pushed to its final limit, and programmers have had several centuries to code, you reach a point where there is far more significant code than can be rationalised…” — Vernor Vinge, A Deepness in the Sky.

The term “Programmer-Archaeologist” was invented by science fiction author Vernor Vinge to describe the environment faced by programmers onboard Bussard ramjet powered slow ships travelling between the stars. However, I think it applies far sooner than he imagined; in fact, we may well be quite close to that point right now.

If you look at any modern system where there has been time to accumulate legacy code — most big banks, for instance — I think we’re already there. Layers of legacy code, encapsulating institutional knowledge. A legacy software system is years of undocumented corner cases, bug fixes, and codified procedures, all wrapped inside software. If you start from scratch you’ll miss things. There is no guarantee that you’ll end up in a better situation, just a different one.

If you try to reimplement a large code base in a “modern” language, it turns out that even getting your new code to give the same answers as the old one is fraught with complexity, because different languages implement even things as basic as adding numbers together in subtly different ways. And that’s before you get to the domain-specific corner cases.
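The arithmetic pitfalls alone are real. A minimal Python sketch (the values are illustrative, but the behaviour follows from standard double-precision floating point, which most languages share):

```python
# Floating-point addition is not associative: two "equivalent"
# reimplementations that merely sum in a different order can disagree,
# before any domain logic is involved at all.
values = [1e16, 1.0, -1e16]

# Summing left to right: 1e16 + 1.0 rounds back to 1e16,
# so the 1.0 is silently lost.
left_to_right = (values[0] + values[1]) + values[2]

# Letting the large terms cancel first preserves the 1.0.
cancel_first = (values[0] + values[2]) + values[1]

print(left_to_right)  # 0.0
print(cancel_first)   # 1.0
```

Two code bases that both “just add up the numbers” can therefore produce different totals, which is exactly the kind of discrepancy a from-scratch rewrite has to chase down line by line.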

I’ve yet to speak to anyone involved in a project to reimplement a large legacy code base from scratch who has anything good to say about the idea. Document it, improve the build system, modernise the infrastructure around it, write tests. But don’t throw it away. Just getting a modern build system on top of a legacy code base, getting it into a revision control system, and writing tests is years of work right there. But afterwards you’ll be left with something you can build and test easily, which is what you wanted in the first place.

Because even when modern programmers do write new code, the number of third-party libraries and frameworks the code sits on top of means that the lines of code ‘under the surface’, which the programmer doesn’t necessarily understand, far outnumber the lines they wrote themselves.

While most companies are now looking for “Full Stack Developers”, pretty much everyone in the business of writing software is secretly laughing at them, while of course still putting it on their résumé to satisfy the recruiters. Anyone who genuinely believes they understand the “full stack” is probably lying to themselves, because these days, and arguably for the last ten or twenty years, that mostly hasn’t been possible. Soon, nobody will be able to understand the code all the way down.

Even the rites of passage for a new developer, their journeyman work, the piece of code they write to prove to themselves that maybe they’re actually pretty good at this stuff, are very different from what they once were. The first big piece of code most programmers now write is a moderate-sized piece of glue code. Even if they’re writing a new ‘service’ from scratch, they are working with tools, and in an environment, where the abstractions they’re addressing are just layers on top of work done by a previous generation of programmers.

That’s not a bad thing. Modern technology is a series of Russian nesting dolls. To understand the overall layering, and to know how to search for things that will fill the gaps and corners, making disparate parts into a cohesive whole, is a skill. It may even be among the most valuable skills a modern programmer has.

“The Programmer-Archaeologist churns through this maddening nest of ancient languages and hidden/forgotten tools to repair existing programs or to find odd things that can be turned to unanticipated uses…” — Vernor Vinge, A Deepness in the Sky.

The age of the Hero-Programmer, that almost mythic figure who builds systems by themselves from the ground up, is mostly over. Not entirely, not quite yet; I know several. But far fewer people are doing fundamentally new things than think they are, and most of the time, instead of writing new code, they should be gluing together existing code.

Real-time image processing is now a party trick, and has been for some years; license plate recognition takes just 57 lines of code. But it isn’t that object recognition has reached the same level of ubiquity as a bubble sort, or balancing a binary tree. It’s that to do it you no longer have to think about how it works; you don’t really need to know how it works at all. Our level of abstraction has changed again.

After all, if something works, why would you waste your time replacing it? Most coding is an attempt to carry out a task, to get something done. Once you can do something, that system tends to stick around and get used again.

Systems in use tend to remain in use; the more critical they are, the more inertia they have. But, perhaps worryingly, a lot of those critical systems are maintained by free labour, and breaking changes in underlying dependencies can have unexpected consequences. These changes don’t even have to be in the software itself.

Just changing the license on a widely used dependency can have far-reaching consequences for the larger systems that rely on it. This is because there aren’t really any large systems any more, just a lot of loosely coupled smaller ones connected by many layers of dependencies and abstraction.

However, these added layers of abstraction are starting to appear not just in software, but also in the physical world. The cost of computing has dropped to the point where it is now basically free, and realistically battery power is far more of a constraint on many systems than processor power. A decade after software was declared to be “eating the world,” machine learning is in turn now eating software. It is this trend that is allowing us to add abstraction to the physical world. We’re increasingly using software and machine learning to “fix” legacy hardware, allowing mechanical dials and readouts to be monitored remotely and automatically.

Which leaves us with the problem of what happens when the software and machine learning go wrong, or just aren’t very good. Because the whole point of adding abstraction, of wrapping complicated systems inside black boxes, is to make them more accessible. This is all just fine, right up until the last person who can open the black box retires. Or worse yet, dies.

Which is where the Programmer-Archaeologist comes in, and the next big job in software becomes fixing old code rather than writing it.

