Forensic Design Documentation

NOTE: This article is also posted on Medium.

“Prediction is very difficult, especially about the future.”

Niels Bohr
Juggling Knives
Don’t Try This At Home, Kids!

WARNING!

TL;DR: Don’t try this at home, kids.

The process I describe herein is not one that I recommend for teams, or relatively inexperienced engineers! Doing it this way could result in truly Brobdingnagian disasters!

It needs to be done by someone with a lot of experience. Book-larnin’ is not likely to be as valuable as scars and limps.

Also, the person implementing this needs to be pretty damn obsessive.

That’s me.

ABSTRACT

I spent most of my career at large, “classic” corporations. Ones with long, storied experience in engineering.

They pretty much all used a “classic waterfall” development methodology. It’s an easy-to-learn, process-oriented methodology that managers can understand.

NEVER underestimate the necessity of making sure that managers can understand your development process. I suspect that many engineers dismiss the importance of their bosses understanding what they do, and the consequences of that attitude can be dire.

I was an engineering manager for 25 years. I know of what I speak.

So no matter how “cool” your methodology is, if you can’t sell it to management, the chances are better than even that your project is gonna end up Waterfall, even if they call it “agile.”

I think that I can pretty much guarantee that what I’ll describe here will never be acceptable for managers. It’s way too tenuous, and depends on some very difficult-to-quantify metrics. It’s far beyond agile. It’s “wet noodle.”

However, it’s how I work alone, and it works very, very well for me. I produce exceptionally high-quality software, complete with tests and documentation, quite quickly, and I ship it.

In this article, I’ll refer to my published open-source work; in particular, the BAOBAB Application Server project (no longer open-source), which I used as a self-teaching project to reestablish my engineering chops.

THE “NAPKIN SKETCH”

I don’t write large upfront design specs (qualifier: when working alone). That’s because I consider them to be “concrete galoshes.” I try to avoid structure as much as possible, if I am given a choice (NOTE: If you are working in a team, then you MUST have some structure, or Bad Things Will Happen).

I start with an overall system diagram; what I call my “napkin sketch.”

Here’s the one I made for the BAOBAB Server:

I had worked on a similar application before, so I had a fairly good idea of the type of architecture I’d need.

I decided to use two databases, so a security database could be segregated and supplied with forensic tools, while the main database could be optimized.

I knew that it would need to be implemented in layers; with each layer being given a specific scope and responsibility.

I knew that I wanted the final access to be bottlenecked through a semantic REST API, as opposed to any kind of direct presentation via Web access. This meant that SDKs would be pretty much required, as REST is stateless, and isn’t necessarily the easiest interface to work with directly.
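Because the REST layer keeps no session state, an SDK usually carries that state on the caller’s behalf. Here’s a minimal Python sketch of the idea (Python standing in for whatever language an SDK would target). The `SDKSession` class, the `/login` path, and the `X-User`, `X-Password` and `X-Token` headers are all hypothetical, not BAOBAB’s actual API, and the transport is an injected function so the example runs without a server.

```python
from typing import Callable, Dict, Optional, Tuple

# A transport sends (method, path, headers) and returns (status, body).
# Injecting it keeps the sketch runnable without a real HTTP server.
Transport = Callable[[str, str, Dict[str, str]], Tuple[int, str]]


class SDKSession:
    """Hypothetical SDK wrapper around a stateless REST API.

    The server hands back a token at login; the SDK remembers it and
    attaches it to every later request, so callers never manage it.
    """

    def __init__(self, transport: Transport) -> None:
        self._transport = transport
        self._token: Optional[str] = None

    def login(self, user: str, password: str) -> bool:
        status, body = self._transport(
            "POST", "/login", {"X-User": user, "X-Password": password}
        )
        self._token = body if status == 200 else None
        return self._token is not None

    def get(self, path: str) -> Tuple[int, str]:
        headers = {"X-Token": self._token} if self._token else {}
        return self._transport("GET", path, headers)


def fake_server(method: str, path: str, headers: Dict[str, str]) -> Tuple[int, str]:
    """Stand-in server: every request must carry its own credentials."""
    if path == "/login":
        ok = headers.get("X-User") == "admin" and headers.get("X-Password") == "secret"
        return (200, "tok123") if ok else (401, "")
    if headers.get("X-Token") == "tok123":
        return 200, f"data for {path}"
    return 403, ""
```

The server never remembers anything between calls; all of the “session” lives in the SDK object.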

I wanted to be able to completely abstract the data layer, because I wanted to be able to “swap out” the model, if necessary, to accommodate technical limitations or scaling.
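One way to get that kind of swappable Model is to have every upper layer talk only to an abstract interface. This is a minimal Python sketch under that assumption; `DataStore`, `MemoryStore`, `SQLiteStore` and `rename()` are hypothetical names, and SQLite stands in for whatever database a real backend would use.

```python
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class DataStore(ABC):
    """Hypothetical Model interface: upper layers call only these methods."""

    @abstractmethod
    def put(self, key: int, value: str) -> None: ...

    @abstractmethod
    def get(self, key: int) -> Optional[str]: ...


class MemoryStore(DataStore):
    """Backend that keeps records in a dictionary."""

    def __init__(self) -> None:
        self._records: Dict[int, str] = {}

    def put(self, key: int, value: str) -> None:
        self._records[key] = value

    def get(self, key: int) -> Optional[str]:
        return self._records.get(key)


class SQLiteStore(DataStore):
    """Backend that keeps records in an in-memory SQLite table."""

    def __init__(self) -> None:
        self._db = sqlite3.connect(":memory:")
        self._db.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, value TEXT)")

    def put(self, key: int, value: str) -> None:
        self._db.execute(
            "INSERT OR REPLACE INTO records (id, value) VALUES (?, ?)", (key, value)
        )

    def get(self, key: int) -> Optional[str]:
        row = self._db.execute("SELECT value FROM records WHERE id = ?", (key,)).fetchone()
        return row[0] if row else None


def rename(store: DataStore, key: int, new_value: str) -> Optional[str]:
    """An 'upper layer' operation that never knows which backend it was given."""
    store.put(key, new_value)
    return store.get(key)
```

Swapping the backend means changing only which class gets constructed; `rename()` itself never changes.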

In my initial plan, I decided to use PHP 7 as the base technology for the server, leaving the choice of database open, as expressed through PHP’s PDO. PDO allows limited database abstraction, and its prepared statements also help protect against SQL injection attacks. The main reason that I chose PHP was that I was already a highly experienced PHP developer, and would be working on this completely alone. I didn’t want to have to deal with learning a new language. The principal reason for this project was to re-establish my engineering process skills and habits, not to learn a new language. I’d already decided to develop my Swift skills once I had re-established my engineering creds.
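To illustrate the SQL-injection point: PDO’s prepared statements (`PDO::prepare()` followed by `PDOStatement::execute()` with bound values) keep user input out of the SQL text. The sketch below shows the same principle using Python’s built-in `sqlite3` module as a stand-in for PDO; the `users` table is hypothetical.

```python
import sqlite3
from typing import List, Tuple


def find_user(db: sqlite3.Connection, login: str) -> List[Tuple[int]]:
    # The "?" placeholder is bound by the driver, so `login` is always treated
    # as data and never parsed as SQL (the same idea as a PDO prepared statement).
    return db.execute("SELECT id FROM users WHERE login = ?", (login,)).fetchall()


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, login TEXT)")
db.executemany("INSERT INTO users (login) VALUES (?)", [("alice",), ("bob",)])

print(find_user(db, "alice"))         # [(1,)]
print(find_user(db, "x' OR '1'='1"))  # [] (the injection attempt is just a strange login)
```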

Despite all the shade thrown at it, PHP is still a highly capable server language, and won’t be going away any time soon. It has adequate object handling, and its standard install includes the functions I needed to implement the necessary capabilities.

More importantly, though, BAOBAB was designed to be used by non-profit organizations that wouldn’t have the means or the talent to implement more “buzzword-compliant” technologies.

I decided to decouple the security and main databases, with no database relations. This allows each one to be completely abstracted, and possibly implemented using different technologies.

I decided to use an “ultra-simple” DB schema, as that would allow the most flexibility in swapping out Model technologies. Each database would have just a single table with a single schema, and the base of that schema would be identical in both databases, allowing me to assign a common abstract base class to parse them.
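A rough sketch of what a shared base schema might look like, in Python rather than PHP. The column names, the `access_class` dispatch, and the class names are assumptions for illustration, not BAOBAB’s real schema. Note that the security token is stored as a plain integer, with no foreign key pointing into the other database.

```python
from dataclasses import dataclass
from typing import Any, Dict, Mapping, Type

# Hypothetical columns present in the single table of BOTH databases.
BASE_COLUMNS = ("id", "access_class", "object_name", "read_security_id", "write_security_id")


@dataclass
class BaseRecord:
    """Common base class that parses the shared columns."""
    id: int
    access_class: str       # names the subclass that should parse the row
    object_name: str
    read_security_id: int   # a plain integer, not a cross-database foreign key
    write_security_id: int

    @staticmethod
    def base_fields(row: Mapping[str, Any]) -> Dict[str, Any]:
        return {col: row[col] for col in BASE_COLUMNS}

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "BaseRecord":
        raise NotImplementedError  # each concrete record type supplies its own


@dataclass
class DataRecord(BaseRecord):
    """A row from the main database: base columns plus a free-form payload."""
    payload: str = ""

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "DataRecord":
        return cls(**cls.base_fields(row), payload=row.get("payload", ""))


@dataclass
class SecurityRecord(BaseRecord):
    """A row from the security database: base columns plus a login ID."""
    login_id: str = ""

    @classmethod
    def from_row(cls, row: Mapping[str, Any]) -> "SecurityRecord":
        return cls(**cls.base_fields(row), login_id=row.get("login_id", ""))


# Because the base is identical in both databases, one dispatcher parses either.
CLASSES: Dict[str, Type[BaseRecord]] = {
    "DataRecord": DataRecord,
    "SecurityRecord": SecurityRecord,
}


def parse(row: Mapping[str, Any]) -> BaseRecord:
    return CLASSES[row["access_class"]].from_row(row)
```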

I decided that I would design the initial release with support for both MySQL and PostgreSQL. These are both free alternatives (see “nonprofits,” above).

I designed a sequestered admin handler and security access. I assumed that I might want to change the way they were implemented, but having them listed as separate entities meant that I wouldn’t forget about them as I wrote the system. In reality, they would probably be mixed in.

Importing data to the system is an important capability. I designed an importer at the same level as PDO. This was another “don’t forget” design element. In reality, I’d probably want to implement it at a much higher level, but I wasn’t ready to think about that, quite yet.

External data gateways would be another issue, but I had a feeling that the top REST layer would end up handling them.

I also wanted to avoid dependencies. Since this was being aimed at NPOs, with limited budgets (including salaries for tech folks), I wanted to make sure that the system could be stood up on its own, on a cheap, low-overhead server host.

This is the sum total of all pre-development documentation. As you can see, a lot of it was not committed to “paper,” and remained in my head.

BADGER: The First Layer

I needed the “bottom” layer to be absolutely rock-solid, and to provide a flexible, robust foundation for the rest of the stack. I chose the Honey Badger as its spirit animal, as The Honey Badger Don’t Care:

It needs to be tough as nails, hard to kill, flexible, low to the ground, and rather primal in its priorities.

Plus, I decided that the server would have an East African aspect.

I already knew that I’d be using PDO, so BADGER would be the layer that wraps PDO.

Tests and Code Grow Up Together

An important aspect of my process, and the project, would be how it would be tested.

It’s critical to think about four things from the very start of the project:

  • Testing (and Quality Metrics)
  • Security
  • Localization
  • Error Handling

These four aspects will all incur an incredible amount of technical debt if they are not handled from the very start.

Localization wouldn’t really be much of an issue. I could assign a “tag” column in the DB schema for a localization token, and leave it at that. Since the server, itself, would not be presenting any localizable content, this tag would allow its consumers to implement their own localization.
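The tag approach might look something like this minimal Python sketch: the server stores and returns only a language-neutral token, and the consumer owns the string tables. The tag values and the `STRINGS` table are hypothetical.

```python
from typing import Dict, List

# Rows as the server might return them: only a language-neutral tag is stored.
rows: List[Dict[str, object]] = [
    {"id": 7, "tag": "venue.main_hall"},
    {"id": 8, "tag": "venue.annex"},
]

# The consumer (an app, not the server) owns the translations.
STRINGS: Dict[str, Dict[str, str]] = {
    "en": {"venue.main_hall": "Main Hall", "venue.annex": "Annex"},
    "fr": {"venue.main_hall": "Grande salle", "venue.annex": "Annexe"},
}


def localize(tag: str, lang: str) -> str:
    # Fall back to the raw tag if the language or the string is missing.
    return STRINGS.get(lang, {}).get(tag, tag)


french_names = [localize(str(row["tag"]), "fr") for row in rows]
```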

Security is a fairly involved process that would be expressed in many ways. It’s really too big a topic for this article, but suffice it to say that I act as my own “red team” when designing and coding, with every line of code accompanied by the thought “I am a blackhat with code execution privileges. How can I leverage this line of code to dig in deeper?”

I also use a fairly classic “token-based” system. It enforces data access “at the bone”: token filtering is actually done in SQL, so blocked data never even makes it into query responses. It does not read everything in, then sort it out after caching.
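Here’s a minimal sketch of filtering “at the bone,” assuming a hypothetical `read_security_id` column and Python’s `sqlite3` standing in for PDO. The caller’s tokens go into the `WHERE` clause as bound parameters, so rows the caller can’t see never come back from the database.

```python
import sqlite3
from typing import List, Sequence


def visible_names(db: sqlite3.Connection, tokens: Sequence[int]) -> List[str]:
    """Return only rows whose read token the caller holds.

    The filter is part of the SQL WHERE clause, so rows the caller may not
    see are never returned by the database at all.
    """
    if not tokens:
        return []
    # Only "?" placeholders are formatted into the SQL; token values are bound.
    marks = ",".join("?" for _ in tokens)
    sql = f"SELECT name FROM items WHERE read_security_id IN ({marks}) ORDER BY id"
    return [name for (name,) in db.execute(sql, list(tokens))]


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT, read_security_id INTEGER)")
db.executemany(
    "INSERT INTO items (name, read_security_id) VALUES (?, ?)",
    [("public notice", 0), ("staff roster", 5), ("board minutes", 9)],
)
```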

I designed in a fairly simple, yet effective error-handling system that would allow errors to be trapped and reported. I don’t like exceptions, so I tried to keep it to a rather traditional model.
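This is only a guess at the general shape of a traditional, exception-free error model, sketched in Python: a function returns `None` on failure and leaves a structured error (code, name, description) on the object for the caller to inspect. `LayerError`, `UserStore` and the error codes are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class LayerError:
    code: int
    name: str
    description: str = ""


class UserStore:
    def __init__(self) -> None:
        self.error: Optional[LayerError] = None  # the most recent error, if any
        self._users: Dict[str, int] = {"alice": 1}

    def user_id(self, login: str) -> Optional[int]:
        self.error = None  # clear any stale error at the start of each call
        if not login:
            self.error = LayerError(100, "MISSING_LOGIN")
            return None
        uid = self._users.get(login)
        if uid is None:
            self.error = LayerError(101, "UNKNOWN_LOGIN", login)
        return uid
```

The caller checks the return value and, on `None`, reads `store.error` to decide what to report.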

I like the idea behind TDD. However, TDD requires the design to be far more mature, far earlier in a project’s lifecycle, than I planned for, so I needed to figure out a way to get the testing benefits of TDD without the concrete galoshes.

I decided to “pair program” my tests and functions. As I start to implement a function, I write out its signature, and start to write out a test for it.

I start with a test harness. In some cases, there’s one already available; in other cases, I need to “roll my own.”

In this case, I opted to “roll my own.” PHPUnit is a pretty good test harness, and, normally, I would have used it. However, it’s not a trivial process to set up PHPUnit. I would have no problem doing it, myself, but I didn’t want to raise the bar too high for users of the system.

Most of my test harnesses are pretty modular. This is because I’m not sure what tests I’ll be writing when I start. I try to allow as much flexibility as possible from the start.
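A roll-your-own harness doesn’t need to be big. This minimal Python sketch (not the actual BAOBAB harness, which was written in PHP) registers test functions into named groups, say one per layer, so new groups can be added as the stack grows. `register_test`, `run` and the `add_row` example are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple

TestFn = Callable[[], bool]

# Tests are grouped (for example, one group per layer), so new groups can be
# added as the stack grows without touching the runner.
TESTS: Dict[str, List[TestFn]] = {}


def register_test(group: str) -> Callable[[TestFn], TestFn]:
    """Decorator that registers a test function under a group name."""
    def register(fn: TestFn) -> TestFn:
        TESTS.setdefault(group, []).append(fn)
        return fn
    return register


def run(groups: Optional[Iterable[str]] = None) -> Tuple[int, List[str]]:
    """Run all tests (or only the named groups); return (passed, failed names)."""
    wanted = set(groups) if groups is not None else None
    passed, failed = 0, []
    for group, fns in TESTS.items():
        if wanted is not None and group not in wanted:
            continue
        for fn in fns:
            try:
                ok = bool(fn())
            except Exception:
                ok = False  # a test that crashes counts as a failure
            if ok:
                passed += 1
            else:
                failed.append(f"{group}.{fn.__name__}")
    return passed, failed


# A function and its tests, written side by side.
def add_row(rows: List[str], value: str) -> int:
    rows.append(value)
    return len(rows)


@register_test("badger")
def add_row_returns_count() -> bool:
    return add_row([], "x") == 1


@register_test("badger")
def add_row_keeps_order() -> bool:
    rows: List[str] = []
    add_row(rows, "a")
    add_row(rows, "b")
    return rows == ["a", "b"]
```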

As I set up the project for the layer, I have already decided on its domain and constraints. I can use that to guide the test harness.

In most cases, the test harness is the first thing I write.

So, as I develop each function, I start testing it almost immediately. At first, the tests are fairly manual, and rely on “internal access,” but that changes, as I go up the stack.

The important thing is that as much code as possible is exercised as soon as possible after creation.

Here’s the final block design for BADGER:

The “forensic” part is that I develop this AFTER actually writing the code. It is “ground truth.”

I write all my code to be exhaustively documented by Doxygen or Jazzy. I rely on that to provide a great deal of the “internal user manual” for the software.

That means that I need to write SERIOUSLY GOOD code comments: not just a simple catalog of parameters and arguments, but a discussion about WHY things are the way they are.

The good thing is that I no longer have “concrete galoshes,” yet I also have complete documentation for the project.

No, I am Not Going to Discuss Every Layer

That would be tedious in the extreme. I just wanted to show you how I start. The process is similar for each layer, with the test harnesses getting more powerful as I climb the stack.

Also, I am not so concerned with the tests becoming obsolete as I walk up the stack. If they do get knocked out by a fix that I make after the fact, it’s not that big a deal, as I have even more complete tests running on the higher layer. I also know that it probably wouldn’t take much to go back and adjust the test to match the new ground truth, if I should so desire.

The simple fact of the matter is that every layer is tested brutally before moving on to the next layer. Also, each layer’s tests are designed to exercise the previous layer, as well, so it’s really a “stack test.”

Experience also plays a big part, here. I sometimes need to write out an API between layers, but try to avoid that, if possible, as every word I write adds some concrete to the bucket. If I have a clear vision of the layer connection, I can map out the interface fairly easily. Experience means that I don’t need to go back and change it after I move up to the next layer.

However, I Will Mention the BAOBAB Test Harness Project

That is the “top layer” project that implements the “deliverable” of the BAOBAB Server. Its test harness is fairly intense. It’s automated, and uses cURL to drive the server through its REST interface. It takes nearly half an hour to run on my laptop, but considerably less on a hosted instance.

This is the design that ended up reflecting the BAOBAB Server as it was released in March of 2019:

Note the “REST PLUGINS” section at the top. That’s where future extensions can be written to handle services like OAuth (like Vault). Also, there are REST commands for doing things like loading external data into the server (replacing the “Bulk Importer” in the Napkin Sketch). Gateways are handled by the REST API, and it is possible to write REST plugins that implement gateways.
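A plugin system like this can be as simple as routing the first path segment to whichever handler registered that name. This is a minimal Python sketch under that assumption; the names and the routing rule are hypothetical and not BAOBAB’s actual plugin interface.

```python
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, str]
# A handler receives the HTTP method and the path segments after its own name.
Handler = Callable[[str, List[str]], Response]

PLUGINS: Dict[str, Handler] = {}


def register_plugin(name: str, handler: Handler) -> None:
    PLUGINS[name] = handler


def dispatch(method: str, path: str) -> Response:
    """Route /<plugin>/<rest...> to whichever plugin registered that name."""
    segments = [s for s in path.split("/") if s]
    if not segments:
        return 200, "plugin list: " + ",".join(sorted(PLUGINS))
    handler = PLUGINS.get(segments[0])
    if handler is None:
        return 404, "unknown plugin"
    return handler(method, segments[1:])


# A hypothetical plugin that just echoes what it was asked for; a gateway
# plugin would forward the request to an external service instead.
def echo_plugin(method: str, segments: List[str]) -> Response:
    return 200, f"{method} {'/'.join(segments)}"


register_plugin("echo", echo_plugin)
```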

If you were to “swap out” the Model, it would happen at the ANDISOL layer. That is the layer that completely abstracts the database from the rest of the server.

The SDKs and whatnot were added after the server was complete, and are projects in their own right.

Another thing that I did, and it isn’t obvious at all from the diagram, is that I allowed the server implementor to add a few callout functions that would do things like validate logins (so you could completely ignore the internal login credentials, and use an entirely separate system), trace database interactions, or track all server interactions in logfiles.
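Callouts like these can be modeled as optional hooks that the server checks before falling back to its built-in behavior. A minimal Python sketch; the hook names, the `Server` class and the plaintext credential store are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Callouts:
    """Optional hooks an implementor can supply (all names hypothetical)."""
    validate_login: Optional[Callable[[str, str], bool]] = None
    trace_db: Optional[Callable[[str], None]] = None
    log_request: Optional[Callable[[str], None]] = None


class Server:
    def __init__(self, callouts: Optional[Callouts] = None) -> None:
        self.callouts = callouts or Callouts()
        # Built-in credentials; plaintext only to keep the sketch short.
        self._logins: Dict[str, str] = {"admin": "secret"}

    def handle(self, request: str) -> str:
        if self.callouts.log_request:
            self.callouts.log_request(request)  # e.g. append to a logfile
        return "ok"

    def login(self, user: str, password: str) -> bool:
        if self.callouts.validate_login:
            # The implementor's external system decides; internal logins are ignored.
            return self.callouts.validate_login(user, password)
        if self.callouts.trace_db:
            self.callouts.trace_db(f"lookup login {user}")
        return self._logins.get(user) == password
```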

LESSONS LEARNED

There was no way that I could have predicted many of the issues that I encountered as I was developing the BAOBAB Server. It needed to happen organically.

If I had followed my Napkin Sketch completely, the product would have been lower-quality, less secure and far less flexible.

This is very much like “paving the bare spots”: waiting to see where people actually walk, then paving the paths they’ve worn into the grass.

HAVE YOU BEEN EXPERIENCED?

As I mentioned at the start, DON’T DO IT THIS WAY UNLESS YOU ARE EXPERIENCED!

let capable = (Smart != Experienced) && (Educated != Experienced) && (Hardworking != Experienced)

I’m quite aware that I could have done this in many ways other than the one I chose. I’m also aware that there are dozens of places where a more efficient algorithm could have been used, or an alternative design could have been implemented, giving me greater flexibility or robustness.

But that was not what happened. I wanted to write SOFTWARE THAT SHIPS.

“Shipping is the #1 feature of any software project.”
-A Manager I Worked With, Years Ago.

Even though the project ended up being an open-source, MIT-licensed server, I wanted to treat it as a shippable product; employing a true engineering, delivery-oriented workflow.

The entire thing was written by me -ENTIRELY ALONE- in less than seven months.

The deliverable was a top-quality codebase that I had no problem throwing open, complete with extensive documentation and tests.

CONCLUSION

The whole idea behind “Forensic Documentation” is that I keep the project fluid and flexible, the entire way, but at the same time, I need to have the benefits of a rigid, disciplined structure, and the end results need to be damn near perfect. I have extremely high standards.

Forensic Documentation gives that to me.

Yeah, this is kind of a “one-off” project. Most folks don’t get to write their projects from soup to nuts alone. Also, the “no dependencies” thing kind of makes this a “mutant” project; it could have been done in a third of the time if I had used some pre-established libraries.

The flexibility, modular nature and REST plugin system gives BAOBAB a lot of oomph towards using dependencies, and it gives the implementor iron-fisted control of how the dependency integrates. It’s quite possible to sandbox dependencies.

However, I feel that we still haven’t quite emerged from the “dependency dark ages” yet. We are seeing some really good, robust stacks, but they are mixed with some really awful ones that look just as good. I think we probably have another three years before the dust truly settles, and we will have a reliable, well-understood, tested, secure and robust platform of frameworks and libraries.

Also, remember that my personal goal was to exercise my “engineering muscles.” I had been a manager for twenty-five years, and all my open-source work was done on nights and weekends. I needed to get back to full-time, no-distractions, delivery-oriented coding. BAOBAB was my thesis.

I am now working on a much smaller, but (to me) far more interesting scope: Device control, using Swift, for Apple devices. I can use the same techniques for these projects, and I barely need to think about them. Also, the Apple development ecosystem gives me a lot more flexibility and power in the tools at my disposal. It’s easier for me to do this with Xcode and Swift.