The great pyramid
This article is a translation of the original French post on Arolla’s blog.
Pluto Krath was approaching his office, almost running in anger. He paid no attention to the indistinct voice.
— K.I.T.T., remind me to buy some milk when we’re back at F.L.A.G.
He entered the room, put his laptop on his desk:
— What don’t they get about the test pyramid? Even a kid could get it.
He turned around and froze, his eyes screaming, his mouth professionally shut.
— What the frog are you doing here, David?
Colonel Shepard was again missing from his Sharknado: The 4th Awakens poster. David Hasselhoff was standing near the window. He raised his watch to his mouth.
— Forget about the milk, K.I.T.T. This is gonna take a long time.
— Got it, Michael.
— Now you’re the Knight Rider, fabulous. Nonsense down to your fingertips.
— So, the test pyramid, huh?
Pluto sat on his chair.
— Fine, let’s pretend. Yeah, the devs don’t get it. Our automatic test campaign lasts forever and unit tests don’t catch any bugs. I’m trying to explain to them how to do unit, service, and UI tests, and all they tell me is “it won’t work here”. How many times have I heard that?
David looked out the window.
— The test pyramid, that was a great metaphor. Many unit tests, few end-to-end tests. You’re right, everyone understands it. You must be doing something wrong. How did you help them get it?
— I told them that in order to have a quicker campaign, they should turn more integration tests into unit tests. That would even allow them to test more edge cases, thus making their testing more robust.
— Were integration tests part of the testing pyramid?
— No, the pyramid only talked about unit, service, and UI tests. But you see my point. You said it yourself: many unit tests, few end-to-end tests.
David winked mischievously at Pluto.
— Oops, I’m using the wrong terms. Probably because I never really understood the pyramid myself. And you didn’t get that I didn’t get it. So I’m wondering what you got that they didn’t get. Words are important when you explain something. Dev is all about detail.
— Oh come on. Everybody knows what a unit test is. You test a method or a function, so that it doesn’t do any I/O. It makes your test quick, and you can run tens of thousands of them in seconds.
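To make Pluto’s definition concrete, here is a minimal sketch in Python. The `cart_total` function and its discount rule are hypothetical, invented purely for illustration. The only point is that the code under test touches no file, network, or database, which is what keeps such tests fast.

```python
import unittest


def cart_total(prices, discount_rate=0.0):
    """Hypothetical pure function: it only computes, it performs no I/O."""
    if not 0.0 <= discount_rate <= 1.0:
        raise ValueError("discount_rate must be between 0 and 1")
    return round(sum(prices) * (1.0 - discount_rate), 2)


class CartTotalTest(unittest.TestCase):
    def test_empty_cart_costs_nothing(self):
        self.assertEqual(cart_total([]), 0.0)

    def test_discount_is_applied_to_the_sum(self):
        self.assertEqual(cart_total([10.0, 20.0], discount_rate=0.5), 15.0)

    def test_invalid_discount_is_rejected(self):
        # Edge cases like this one are cheap to cover at this level.
        with self.assertRaises(ValueError):
            cart_total([10.0], discount_rate=1.5)
```

Saved in a file such as `test_cart.py` (a hypothetical name), these tests can be run with `python -m unittest test_cart`. Whether this still counts as a "unit" in C, Haskell, or shell is exactly the question David raises next.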
— And service tests? Integration tests? System tests? End-to-end tests? UI tests? I can point you to several pyramid alternatives that differentiate those. Because the pyramid needed amendments for different contexts. Still, none of these pyramids has reached any consensus. I can also guarantee you that unit test advocates don’t agree on what a unit test means, for that matter.
— Sure, you can split hairs. Still, the test pyramid has never failed me. In all the teams I worked with, we separated 2 or 3 categories of tests, and we always had a solid test campaign. I’m trying to help the team with my experience, here.
— Here are a few other common features of your past products. They were all small textbook monoliths, with very few dependencies on other teams. One was a textbook Spring/React shopping cart, the other was a textbook nightly file aggregation batch. And last but not least, none of them was the product you’re working on now. Context is everything. How is your current product different?
Pluto leaned forward. He suddenly got enthusiastic.
— It’s a set of 17 micro-services, developed in several languages, sometimes by people outside of the team. Its features sometimes depend on services provided by other teams. It’s distributed, running non-stop, and services are deployed independently.
— And the micro-services are more or less close to the OS resources, more or less critical from a performance point of view, developed at various times, have various levels of maturity, depend more or less on directory trees or scheduling of shell scripts. They sometimes get split, their contracts evolve, as the team learns about what it must and can do.
— What does that have to do with unit or service tests?
— Well, what do you call the service under test? The user service at the product boundary, or the micro-service HTTP interface? What do you call a method or function to unit test in C? In Haskell? In shell? How do you talk with another team to test the service globally?
— Again, that’s nitpicking. We’ll just make our definitions of Mike Cohn’s words explicit, and we’ll be fine.
— First, congratulations. You just said you’d make definitions explicit, and that’s a move worth making. People arrive with their culture, with different definitions of the same words. If it’s as simple as you think, it won’t take much time. Second, so you want to keep the original three words.
— Yes, the magic number.
— Remember your Spring project, where you bundled everything that was neither unit nor end-to-end as integration tests. You could feel the opportunities you lost by trying to shoehorn each of those tests into a common framework. That felt heavy, didn’t it?
Pluto looked back through his memories.
— Oh yes, it did. There were quite a few fights about it. So let’s just name a few categories in our context, and use them everywhere.
— Exactly! And don’t forget to keep this list alive. Now you need to find what the context is. Do you have a single context? What about the department? Does technology matter? Domain? Culture?
— Do you mean we should have different sets of categories for different services? Or sets of services? Or that we should share some vocabulary with other teams?
— All, none, I don’t know. Talk about it. Keep whatever makes sense. Make things explicit and relevant.
— So the only remaining thing about the pyramid is the triangle.
David sat on the sofa and stared at Pluto.
— What? Tests are either quick and reliable, or they represent reality and are fatter and flakier. Aren’t they?
— Hmm, that’s interesting.
— Are you playing the shrink?
— Am I playing the shrink?
— You’re making your point by yourself. Tests have several desired properties, and you want to maximize all of them. Among those properties, you can find:
- Speed, to run loads of tests.
- Expressiveness, in code and in diagnostics. When some test fails, you want to know why.
- Isolation, of runtime environment and of tested features.
- Reliability, i.e. the absence of flakiness. Tests that embed a lot of stuff are the flakiest.
- Representativeness. For example, is your test representative of real life if it does not include the scheduled scripts that move input files to your input folder? Is the production environment representative if you use a test flag on the incoming request?
- Coverage. Imagine the combinatorial explosion of edge cases you need to test.
- And many more. Don’t ask me for a mnemonic acronym.
You can see that most of them conflict with each other. For each context, you’ll find several toolboxes with a fair compromise among these properties. In some contexts, they’ll be tightly ordered as a pyramid. In some others, you’ll be able to maximize most of the properties in a single category. The properties don’t balance each other on a linear gradient. It depends.
— It’s more a set of multi-dimensional bubbles than a pyramid.
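One way to picture David’s point that the properties do not line up on a single gradient is the rough Python sketch below. The category names and every score in it are made up for illustration, not measurements from any real product. The sketch only checks whether one category is at least as good as another on every property at once.

```python
from dataclasses import dataclass

# Properties taken from David's list; a higher score means "better".
PROPERTIES = ("speed", "expressiveness", "isolation",
              "reliability", "representativeness", "coverage")


@dataclass
class Category:
    name: str
    scores: dict  # property name -> invented score from 1 (poor) to 5 (good)


def dominates(a, b):
    """True if `a` is at least as good as `b` everywhere and better somewhere."""
    at_least_as_good = all(a.scores[p] >= b.scores[p] for p in PROPERTIES)
    strictly_better = any(a.scores[p] > b.scores[p] for p in PROPERTIES)
    return at_least_as_good and strictly_better


def undominated(categories):
    """Keep the categories that no other category beats on every property."""
    return [c for c in categories
            if not any(dominates(other, c) for other in categories)]


# Hypothetical categories and numbers: a real team would argue about each one.
categories = [
    Category("function-level tests", dict(zip(PROPERTIES, (5, 4, 5, 5, 1, 4)))),
    Category("single-service HTTP tests", dict(zip(PROPERTIES, (3, 3, 3, 4, 3, 3)))),
    Category("cross-team end-to-end tests", dict(zip(PROPERTIES, (1, 2, 1, 2, 5, 2)))),
]

for category in undominated(categories):
    print(category.name)
```

With these invented scores, no category beats another on all properties, so all three are kept. Each one is a different compromise, which is why the choice depends on the context rather than on a fixed ordering.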
— But “Testing shows the presence, not the absence of bugs”, doesn’t it?
— I never said otherwise. I talked about fair compromises. You’ll identify the right bubbles using your own judgment. It’s about trust and confidence, it’s very subjective. Oh, and of course, we’re only talking about automated stuff, not about actual testing.
— So in the end, nothing is left of the pyramid. The Knight Rider beats Mike Cohn.
— Not at all, are you crazy? The metaphor is great, I told you. When I discovered it, it was an epiphany. We’re talking about your precise context, not a generic frame of mind. The pyramid is just a model, so it’s wrong, that’s all. This one is too simplistic to survive contact with the field, but it’s still very useful as a basis for thinking.
Pluto leaned back in his chair, rubbing his nose, and stretched his legs.
— Real life is more nuanced than the illustration, I get it. You did it again, I’ll have to get back to the team.
When Pluto opened his eyes, David Hasselhoff was back in the poster.
— I really need to sleep more.