META: This is also posted on Medium
“The most erroneous stories are those we think we know best -and therefore never scrutinize or question.”
Stephen Jay Gould
“Concrete Galoshes” is the nickname I give to project assets that “weigh the project down”: those aspects of the project that prevent it from changing, or at least make changes more difficult.
“Concrete Galoshes” are not always bad. In fact, projects that have no structure are often referred to in the vernacular as “failures.”
STRUCTURE VS. FLEXIBILITY
A project needs to have structure. We need to be able to calculate and track costs, coordinate releases, validate bug fixes, test and address usability problems, ensure branding cohesiveness, write documentation, etc. “No structure” is not an option; especially for extremely complex projects that cross department boundaries and involve large teams.
The problem comes when the structure defines the project. In some cases, this is entirely appropriate. For example, algorithm engines should be absolutely rock-solid: tested, reliable, trusted, documented, and supported. They shouldn’t change often, and when they do, it should be a big deal.
A splash screen, on the other hand, is a completely different matter. It could change on a seasonal basis. Some apps change the splash screen almost every time you launch them (think many games).
Each has its priorities (and project sponsors). The algorithm engine is the “heart” of a feature. If it’s a central feature of the app (think Google’s search core), then it’s incredibly important, and should be sponsored by some of the top corporate officers. It may be cared for by the CTO and the top corporate scientists. The splash screen, on the other hand, would most likely be sponsored by the Marketing department. It may be a primary vehicle for brand reinforcement, which could be every bit as critical to the corporate bottom line as the algorithm engine. Its sponsors might include the CMO.
Which is more important? The solid, science-backed algorithm engine, or the “flighty” splash screen? Truth be told, they are both critically important to the company. Without either one being treated as a “first-class citizen,” the entire corporation could go down the drain.
Yet each also represents an entirely different project structure requirement. If the algorithm is changed frequently, and primarily managed by Marketing, it’s likely to develop severe quality and performance issues. The teams that rely on it would suddenly lose faith in it, and it could weigh them down with the need for excessive testing and validation. If the splash screen is prevented from changing to fit the needs of a rapidly-changing consumer environment, then it completely loses its effectiveness and power as a brand reinforcement and customer engagement tool.
LET’S LOOK AT AN EXAMPLE
Let’s say that we are in a company that has 5 different applications, most based upon a central algorithm, and some featuring a branded splash screen (we’ll say the butterfly logo is “the brand”):
PROJECT B and PROJECT E are what the corporation has decreed to be “critical” apps. Let’s assume PROJECT B is a heavy-duty, workhorse application of the central algorithm, and PROJECT E is an interactive documentation and support portal app.
Notice that 4 of the 5 projects all use the central algorithm. PROJECT B doesn’t use the splash screen, and is the corporate gold mine. It sells for hundreds of dollars a seat, and is used by folks with lots of letters after their names.
It doesn’t use the splash screen. We would assume this is because it’s “serious software,” and no one cares about “frivolities.” If we sat down and chatted with folks that know the history of the project, we’d see that the application was first developed when the CEO was still in college, and thoughts of marketing and “bling” were nowhere to be found. The CEO was an engineer first, and that guided her work. It wasn’t until the stock had split that someone thought to set up a Marketing and Branding division. It wasn’t because it’s “serious software.” It was because no one had commoditized the splash screen until the project was a well-established code base people didn’t want to mess with. If someone insisted on adding the splash screen now, and there were problems, heads would roll. No one wants to stick their neck out.
Another issue is the two projects are run by two completely separate organizations. The Core Algorithm Group is an entity unto itself. The project leader reports directly to the CEO, who still insists on having a hand in the project (It’s her “baby,” after all). It also is the entity that develops and maintains PROJECT B. All the other projects are run by the CTO and the Software Engineering division. The CTO is a relatively new hire. Even though he has a higher position than the Lead for the Core Algorithm team, he doesn’t have the same influence, and won’t be trusted with stewardship of the Core Algorithm project.
Welcome to corporate reality and basic human nature.
Now, the CTO is a very capable chap. One of the first things he did, upon arriving, was to have lunch with the CMO, and analyze the requirements for the market. The CMO was, at the time, beginning a project to establish and build the corporate brand, and the CTO saw that he could help by developing a simple-to-use splash screen module. That’s why it features in 3 of the 5 main apps. Since doing that, the brand recognition has gone through the roof, and the company has become synonymous with its principal product (PROJECT B). It wasn’t just the splash screen, but that helped. The reason it helped, despite not being used in PROJECT B, is because only people who are already sold use PROJECT B. Their bosses tend to use PROJECT E, which is why PROJECT E is such a high-priority project. The folks that make the decisions see the branding, and the recommendations from their skunkworks people will come, regardless of branding.
The CTO runs a fairly “agile” shop. He has a very responsive, iterative development process that is basically constantly working on the product. He uses TDD and has a very data-driven process. The Lead of the Core Algorithm project, on the other hand, uses a classic “Waterfall” process. The project is quiescent, much of the time, with work done in carefully-planned, linear sub-projects. Nothing gets done until the sub-project has been rigorously specified, the specifications vetted, and a careful, test-driven project plan has been established. Both methodologies are modern (Waterfall can be modern, too, if done right), well-structured and disciplined. Each is basically perfectly suited to its project.
THE REST OF THE STORY
Now, let’s zoom into that “frivolous” splash screen project.
Let’s say it’s composed of 4 modules:
- The Display Module, which has graphics drivers and image processing routines to present the various assets and corporate messages
- The UI Module, which runs a minimal UI that allows the user to have some interaction during the loading process
- The Communication Module, which is responsible for querying the corporate servers for updates and new assets
- The “Brand” module, which is really just a resource manager
Let’s think about each of these, in terms of complexity, risk and propensity to be changed:
The Display Module
This is a very dynamic module. Graphics hardware is constantly changing, and the landscape is fluid. There’s a fair amount of realtime rendering and even video playback in the splash screen, so this needs to be at peak performance, while consuming as few CPU resources as possible (remember that a splash screen tends to be shown while the main app is doing something else in the background, so the splash screen can’t be a hog).
The UI Module
This is a very basic module. There’s not much UI, but there is one button that can be dynamically repositioned, and there is a requirement for a radio button set to be occasionally displayed. It’s important to be reactive.
The Communication Module
This is an important module. It connects to the corporate servers, and exchanges information, picking up instructions on what to display, which is transmitted using a lightweight JSON protocol. Communications are encrypted, as it’s important not to let someone use this as a “backdoor” to the system, while also allowing the corporation to control what goes into the splash screen. It may be used as a registration validation.
The Brand Module
This contains branding elements in an internal cache, updated via the Communication Module. It’s vital that the branding elements never get corrupted, and are available quickly and with little processor or data bus overhead.
Development Process Alternatives
Looking at these four modules, we see we have different priorities, and, one would think, different development processes. For example, the UI Module probably doesn’t change often, isn’t particularly complex, but must work correctly and effectively.
The UI Module mostly uses the manufacturer SDK and the need for user interaction is low (It’s a splash screen, after all), so it’s not a disaster if there are issues.
The Display Module is fairly complex, and undergoes a great deal of change. It also needs to be tested against many different hardware configurations. However, it is also the one best suited to rapid turnaround, and uses manufacturer SDKs, making it slightly less risky than other modules.
The Communication Module probably doesn’t change too often, but is quite complex, with proprietary code and glossaries. It uses encryption, and security is of utmost importance. It needs to be rigorously tested whenever a change is made (In reality, this would usually be a separate library that applies to the entire project, but this is a simple example, and we’re making believe the splash screen has its own comm module).
The Brand Module is the most “important” of the four modules. The branding elements must always be in pristine condition, and in sync with the current corporate brand guidance. In addition to graphical assets, it also has talking points and localization glossaries.
Of course, the whole project needs to be tested, to some degree, whenever any of these modules are changed.
Let’s display these in a 3-dimensional quadrant chart:
The X-axis is complexity, from least complex on the left, to most complex on the right.
The Y-axis is risk, from least risky on the top, to most risky on the bottom.
The Z-axis is likelihood of being changed, from least likely in the back (smaller square), to most likely in front (larger square).
This should give us a good idea of how each project should be managed. It looks like the Brand Module should be treated in a similar manner to the Algorithm Core project. It could even have a waterfall process. The UI Module is a bit of a toss-up. It isn’t that risky, but is also fairly simple and won’t change too often. You could apply a number of processes to it.
The two scary projects are the Display Module and the Communication Module.
The Display Module practically cries for an iterative, agile process. The Communication Module probably needs an agile process, but also needs really, really rigorous testing. A screwup in the Display Module is embarrassing, but if the Communication Module gets borked, you could have a huge security problem. An “ELE,” so to speak.
HERE’S WHERE THE CONCRETE GALOSHES COME IN
As I said in the beginning, “concrete galoshes” refers to the assets and aspects of a project that “tie it down.” In my experience, the two biggest “galoshes” are:
- Specifications
These often represent many man-months of work. They can be enormous documents that need to have sixteen layers of approval for even the smallest change.
- Documentation
These can be gigantic asset libraries, incorporating hundreds of pages, Web sites, video, promotional material (bobblehead dolls, for instance), training modules, etc.
With TDD taking over, there’s another HUGE pair of boots: TDD Unit Tests. The TDD process requires tests to become permanent project assets. The basic premise is that you write a unit test (before writing the code), work on the code until the unit test passes, then keep the unit test as part of the permanent project infrastructure, so that you are constantly rerunning it whenever a build is made. Whenever a new release is made, you should apply the entire library of tests to ensure there are no “atavisms.” This is an excellent practice, and goes a long way towards creating extremely high-quality code that has “lasting power.”
That said, it’s also one hell of a “concrete galosh.” You don’t want to introduce new changes that could invalidate or cause those tests to fail.
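To make the test-first cycle concrete, here is a minimal sketch in Python. The 8-byte header layout (a little-endian length followed by a packet type) and the names `parse_header` and `ParseHeaderTest` are assumptions made up for illustration; they are not taken from any real SDK. The tests are written first, the parser is then written until they pass, and the test class stays in the project to be rerun on every build.

```python
import struct
import unittest


def parse_header(packet):
    """Decode a hypothetical 8-byte header: little-endian length, then type."""
    if len(packet) < 8:
        raise ValueError("packet too short for header")
    length, packet_type = struct.unpack_from("<II", packet, 0)
    # The declared length is assumed to cover the whole packet, header included.
    if length != len(packet):
        raise ValueError("declared length does not match packet size")
    return length, packet_type


class ParseHeaderTest(unittest.TestCase):
    """Written before parse_header existed; kept as a permanent asset."""

    def test_decodes_length_and_type(self):
        packet = struct.pack("<II", 12, 1) + b"\x00" * 4
        self.assertEqual(parse_header(packet), (12, 1))

    def test_rejects_truncated_packet(self):
        with self.assertRaises(ValueError):
            parse_header(b"\x01\x02")
```

Saved as a module, this could be run with `python -m unittest`. Notice the “galosh” effect even here: if the header layout ever changes, both tests have to change with it.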
STORY TIME
A few years ago, I wrote an Objective-C static library. It was an iOS/MacOS WiFi ISO 15740 (PTP-IP) driver SDK for controlling cameras (but was never used).
At the time, TDD was starting to pick up steam, so I applied TDD to the process, writing tests for each of the packet parsers (there were a LOT of them), then writing the code to pass the tests. I used an aggregation model (as opposed to a class hierarchy model), so I could easily slap in a mock.
Once I had the SDK’s packets all decoding and encoding (verified by piece-by-piece tests of simple data structures) properly, I made a mock for many of the various datasets. It was a huge pain, and took a couple of months, but I was able to make a passable build-time unit test, featuring a rather primitive, but effective mock. I ran builds, and everything passed swimmingly. Woo-Hoo! This TDD stuff is AWESOME!
GET YOUR HANKY – HERE COMES THE SAD PART
Then I connected the TCP/IP transport, bringing in the system’s BSD Sockets and working up from there. I wrote a Network and a Transport layer, and connected it to the same spigot I used for the mock. I then hooked up a shipping device, so I knew I would get good data, and hit “Build.”
BANG
Everything went to hell. Xcode lit up like a Christmas tree.
You see, despite the ironclad rules in the standard, there were still avenues for interpretation, and the devices took liberties, like using two simultaneous sockets for the data transfer and the heartbeat (I’m embarrassed to admit how long it took me to figure that out).
Now, I could have gone back and spent another couple of months fixing the mock and tests to reflect the “ground truth,” but, to be honest, I would have rather spent the time banging rusty carpet tacks through my thumb.
So I removed all those dozens of tests from the lowest-tier parsers.
I know, I know, I’m going to hell, and Satan will torture me by forcing me to program Newtons in COBOL, but I had to do it.
The tests had done their job. They had proven the lower-level parsers and generators. I just couldn’t keep them as permanent project assets. Removing them meant I added considerable risk to working on the parsers and generators, but I reasoned that they wouldn’t be changing in the foreseeable future. The vendor-specific packets were interpreted at a higher level that could still have tests.
There. You’ve got my confession. Can I have my carpet tacks back?
Now, I was faced with what is known as “ground truth.” It happens almost all the time. I’ve learned that the old dream of being able to catch all possibilities in the specification phase died about the time the last IBM 360 hit the landfill.
There’s an old Swiss Army saying:
“When the map and the terrain disagree; believe the terrain.”
The map in my hand said “6-lane interstate,” but the terrain in front of me said “game trail.”
Truth be told, I may have been able to do a better job of writing the testing infrastructure, so the tests wouldn’t have blown up quite so badly, but that would have introduced another showstopper: The mock.
Writing a testing mock can be a HUGE deal. When you have a communication substructure, the mock project could easily become larger than the main project, and the mock needs to be just as high quality as the software it’s supposed to test.
Since I was working on this project completely alone, I had limited options. The mock was gonna remain as a simple playback harness of pre-written JSON code. That didn’t reflect the reality of the actual devices in that nasty, messy, signal-attenuating-packet-dropping-device-going-to-sleep-in-the-middle-of-the-conversation-like-Grandpa-when-he’s-been-at-the-schnapps real world out there.
If you’ve never written low-level communication software before, take my word for it. It is a FAR from perfect world. There’s a reason for all those blasted handshakes and checksum packets (Frankly, there’s seldom a need for it, these days. The built-in libraries tend to take care of all that crap for us. This was a “one-off” project).
So the mock went down the drain, as well. It was now officially worthless. In fact, it was downright dangerous, as it failed to anticipate the vagaries of the real device interfaces, and encouraged me to write bad software. I had become TOO reliant on my “in a perfect world” tests.
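For readers who haven’t built one, a playback mock of the kind described above might look roughly like the following Python sketch. Everything in it is hypothetical: the class names, the hex-string JSON recording format, and the packet layout are assumptions for illustration, not the original Objective-C code. The driver holds its transport by aggregation, so a mock can be slapped in where a socket-backed transport would normally go.

```python
import json


class PlaybackTransport:
    """Hypothetical mock: replays pre-recorded packets from JSON, in order."""

    def __init__(self, recording_json):
        # The recording is assumed to be a JSON array of hex-encoded packets.
        self._packets = [bytes.fromhex(h) for h in json.loads(recording_json)]

    def receive(self):
        if not self._packets:
            raise EOFError("recording exhausted")
        return self._packets.pop(0)


class Driver:
    """Holds a transport (aggregation) rather than inheriting from one."""

    def __init__(self, transport):
        self._transport = transport

    def next_packet_type(self):
        packet = self._transport.receive()
        # Assumed layout: bytes 4..7 carry a little-endian packet type.
        return int.from_bytes(packet[4:8], "little")


# Usage: the driver cannot tell the mock from a real transport.
recording = '["0c0000000100000000000000", "0800000002000000"]'
driver = Driver(PlaybackTransport(recording))
first_type = driver.next_packet_type()  # 1
```

The weakness is visible in the sketch itself: the playback never drops a packet, never goes to sleep, and never opens a second socket, so code that passes against it has only been proven in a perfect world.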
All that said, the project was one of the best communication drivers I’ve ever written, and I can credit the rigorous testing I did at the beginning. It was never used, but I also wrote the project unsolicited, so I couldn’t have expected it to be used.
YEAH, YEAH. SHUDDUP AND GET BACK ON TOPIC.
Where were we? Oh, yeah…the Splash screen project.
So, it looks like we can’t apply the same development process to all 4 modules in the Splash Screen, and we should determine who should be fitted for concrete galoshes.
- The Display Module: Should have some permanent unit tests, but we need to be careful of ones that could get scragged by real-world problems (Doc Martens).
- The UI Module: Probably will do fine with a simple test harness and a couple of light-duty unit tests. Doesn’t need much documentation, and the specs are pretty light (Sandals or Sneakers).
- The Communication Module: Needs a hard-core set of unit tests, good documentation, including cookbooks, an audit plan, and heavy-duty specs (Definitely Concrete Galoshes).
- The “Brand” module: Only a few unit tests are probably necessary. It mostly needs audits by human eyeballs, and good specifications. It may actually be managed by an outside department (Either Doc Martens or Concrete Galoshes).
USE THE RIGHT TOOL
In the examples given above, a NASA-like SEI CMMI Level 5 process would probably be a good choice for the Algorithm Engine, but some of these projects would be absolutely destroyed by the kind of rigor that would be applied to the Algorithm Engine. In those cases, a CMMI Level 3 (or even 2) might be better.
Here’s a really good reason to reduce the degree of “concrete galosh” in a project:
In almost every project I’ve ever worked on, I have encountered unexpected, unplanned-for “curve balls.” Sometimes, these were bugs, sometimes, the competitive environment changed, but most times, we just ran into an unexpected usage. People clicked on the right link, even though the left link was the one covered in affordance. People didn’t understand the icon, even though it was based on an ISO icon. People never even knew it had that capability because they never saw that screen, and so on.
In many of these occasions, the fix was absurdly easy, but was never done.
Because the manual had already been printed. Because no one wanted to talk to the cranky VP and get his signature on the change order. Because Joanne didn’t care whether or not the users preferred the right link, because THE USERS ARE WRONG, and Joanne is in charge of the signoff. And so on.
In other cases, the fix required a fairly drastic refactor, but it was easy to do, because there was so little extra baggage. In my experience, doing the coding isn’t actually a huge deal, once you’ve figured out the problem. Making sure it works properly (and was done properly) takes a bit more time.
“Once a new technology rolls over you, if you’re not part of the steamroller, you’re part of the road.”
— Stewart Brand
Anyone who has been in the software industry lately knows that six months is a lifetime. Vast change can sweep through an entire segment like a tornado. If we aren’t able to change with it, then we will be in trouble.
The other side of the equation is that “concrete galoshes” exist for a reason. There’s some stuff that JUST. SHOULDN’T. BE. CHANGED. ON. A. WHIM., no matter what the industry is doing. We shouldn’t always chase transient fads or freak out every time someone says “boo!”
IT’S NOT AN “EITHER/OR” CHOICE
People who are committed to one design philosophy or discipline tend to cast these types of choices as “binary.” You either use THEIR way, or the wrong way. You are either “THEIR” kind of engineer, or you are a poor, benighted soul; bereft of clue. I’ve been encountering this since I first started programming. With development discipline, it’s almost always cast as “Super-Competent NASA Engineer/Crazy Undisciplined Party Animal,” or “Visionary Artiste/Uptight Obstructionist”, depending on who you talk to.
BE CAREFUL OF BANDWAGONS
We’re constantly encountering people standing on soapboxes, megaphone in hand, declaring that they’ve found “The New Way™,” after reading a book, attending a conference or having some really…interesting…punch at a party. It can get exhilarating…and infuriating. They insist on getting everyone frightened to death that “they’re missing the boat,” and that we need to follow them, put down all other “primitive” tools, grab a tuba, and climb aboard.
…AND TRAIN WRECKS…
What inevitably happens is that we encounter a situation where the “new way” is inappropriate, and, instead of saying “Well, I guess we could try something different here, do you have any ideas?” we say “No, no, no! You’re looking at it wrong! Here, twist it like THIS, then bend it like THIS, and squeeze it like THIS…you there, gimme that hammer, willya? -BANG- -BANG- -BANG- See? Fits perfectly!”
At that point, all credibility goes right into the bog. “The New Way™” gets written off as a fad, and the people who fear change feel vindicated; especially if the person selling “The New Way™” was particularly grating or patronizing in their approach (I’m sure that NEVER happens </s>).
“We should be careful to get out of an experience only the wisdom that is in it -and stop there; lest we be like the cat that sits down on a hot stove-lid. She will never sit down on a hot stove-lid again -and that is well; but she will never sit down on a cold one anymore.”
–Mark Twain
This is not an optimal outcome, any way that you look at it. “The New Way™” usually becomes popular EXACTLY BECAUSE it’s an excellent idea, and a very important technique/technology/philosophy/whatever. Ignoring it because of one instance where it doesn’t work is destructive, as it will also not be used in places where it does work. This is something that has happened to me – many times. I have been the arrogant fool in the circus outfit, trying to convince skeptical, seasoned veterans that they are doing it wrong (I can’t IMAGINE why they didn’t want to listen to me </s>).
This is a really sensitive issue, and we can sometimes get so caught up in our own zeal we can forget “the other side” also has very valid points and concerns. My experience is that there are very few “lazy” software engineers. Instead, there’s a lot of uncertainty and fear. Failure in the industry is rampant, and we seem to go through an endless succession of “The One, True Path™”. It can get exhausting, trying to keep up. Expensive and demoralizing, too. In my experience, there is no “Philosopher’s Stone” for all matters software development, and we need to be careful not to drown in Kool-Aid. Here’s an example from my own experience of reality being orthogonal to dogma.
FEAR IS THE MIND-KILLER
In over 30 years of working in the industry, I have never once encountered anyone who was willfully destructive or malign. I have encountered a hell of a lot of people who were scared; often because they felt like they were “in over their heads.” That’s a big reason I insisted on being a full working engineer during my tenure as a manager. That kind of fear is a killer, and managers often succumb to it. Since managers also tend to have a lot of power, Bad Things can Happen.
I’ve watched frightened and insecure people do a great deal more damage than deliberately malevolent people could ever dream of.
Some managers believe a state of anxiety and fear is “productive.” The theory is that scared and insecure people work harder than people who are secure. That probably works great in some departments, but I’ve yet to see it yield positive results in the Engineering Department. Scared people don’t think straight, and don’t “think outside the box.” I have always believed an atmosphere of fear is counterproductive, and, as a manager, I worked to reduce that particular type of tension.
Of course, when we encounter a “crunch time,” we need to pull up our socks and roll up our sleeves, but this should be an exception; not a rule.
HERE’S A CRAZY IDEA
Try this new-fangled team-building technique called “Talking to Each Other™.” Sit down and find out what the others are afraid of. Tell them your fears, and, for God’s sake, give them some respect, even if it seems they are ignorant or willfully destructive.
The goal of this new-fangled team-building exercise is NOT TO CONVINCE THEM TO TAKE YOUR SIDE. It’s to work out what is best for the project, and drain the fear factor out of the project.
I know this is just crazy talk, but hear me out: THEY JUST MIGHT BE RIGHT, or maybe partially correct. The issue they mention could happen, but there’s a vanishingly small chance of it actually happening. That said, it can sometimes be easy to throw in a quick mitigation step or error handler, instead of adding a new chapter to the specification, or writing an enormous suite of tests.
There’s a lot of talk about technical debt. People will say that skimping on the structure will increase technical debt. Usually, they are absolutely correct. That said, excessive structure also brings serious debt. Debt comes from different sources, and you need to calculate TOTAL debt, not just what fits your frame. If you can’t change a project to meet a new industry need, or you damage your brand by not updating your branding elements in a timely fashion, all sorts of nasty things could happen, including shuttering the company or canceling the project (I know this effect quite well: all the structure in the world won’t help you if you can’t deliver something the users like and use). We need to realistically analyze our plans, and look at ALL ramifications; not just the ones the textbooks and keynote speakers highlight.
Concrete galoshes are often the product of fear. In some cases, the fear is justified, and in others, people are so convinced that no one else “gets it,” that they need to throw as many speed bumps into the path as possible to keep the project from “going over the cliff.” If we can get folks of varying disciplines and experience together, working with each other, instead of fearing each other, we can make sure that the various subprojects are fitted for exactly the right shoes, whether it be a pair of flip-flops, or 16-hole, steel-toed DMs.