
The great pyramid

This article is a translation of the original French post on Arolla’s blog.

Pluto Krath was approaching his office, almost running in anger. He paid no attention to the indistinct voice.

— K.I.T.T., remind me to buy some milk when we’re back to F.L.A.G.

He entered the room, put his laptop on his desk:

— What don’t they get about the test pyramid? Even a kid could get it.

He turned around and froze, his eyes screaming, his mouth professionally shut.

— What the frog are you doing here, David?

Colonel Shepard was again missing from his Sharknado The 4th Awakens poster. David Hasselhoff was standing near the window. He raised his watch to his mouth.

— Forget about the milk, K.I.T.T. This is gonna take a long time.
— Got it, Michael.
— Now you’re the Knight Rider, fabulous. Nonsense down to your fingertips.
— So, the test pyramid, huh?

Pluto sat on his chair.

— Fine, let’s pretend. Yeah, the devs don’t get it. Our automated test campaign lasts forever and unit tests don’t catch any bugs. I’m trying to explain to them how to do unit, service, and UI tests, and all they tell me is “it won’t work here”. How many times have I heard that?

David glanced out the window.

— The test pyramid, that was a great metaphor. Many unit tests, few end-to-end tests. You’re right, everyone understands it. You must be doing something wrong. How did you help them get it?
— I told them that in order to have a quicker campaign, they should turn more integration tests into unit tests. That would even allow them to test more edge cases, thus making their testing more robust.
— Were integration tests part of the testing pyramid?
— No, the pyramid only talked about unit, service, and UI tests. But you see my point. You said it yourself: many unit tests, few end-to-end tests.

[The test pyramid diagram, copyright Mountain Goat Software and Mike Cohn]

David winked mischievously at Pluto.

— Oops, I’m using the wrong terms. Probably because I never really understood the pyramid myself. And you didn’t get that I didn’t get it. So I’m wondering what you got that they didn’t get. Words are important when you explain something. Dev is all about detail.
— Oh come on. Everybody knows what a unit test is. You test a method or a function, making sure it doesn’t do any I/O. That makes your test quick, and you can run tens of thousands of them in seconds.
— And service tests? Integration tests? System tests? End-to-end tests? UI tests? I can point you to several pyramid alternatives that differentiate those, because the pyramid needed amendments for different contexts. Still, none of these pyramids commands any consensus. And I can also guarantee you that unit test advocates don’t agree on what a unit test means, for that matter.
— Sure, you can split hairs. Still, the test pyramid has never failed me. In all the teams I worked with, we separated 2 or 3 categories of tests, and we always had a solid test campaign. I’m trying to help the team with my experience, here.
— Here are a few other common features of your past products. They were all small textbook monoliths, with very few dependencies on other teams. One was a textbook Spring/React shopping cart, the other a textbook nightly file-aggregation batch. And last but not least, none of them was the product you’re working on now. Context is everything. How is your current product different?

Pluto leaned forward. He suddenly got enthusiastic.

— It’s a set of 17 micro-services, developed in several languages, sometimes by people outside of the team. Its features sometimes depend on services provided by other teams. It’s distributed, running non-stop, and services are deployed independently.
— And the micro-services are more or less close to the OS resources, more or less critical from a performance point of view, developed at various times, with various levels of maturity, more or less dependent on directory trees or on the scheduling of shell scripts. They sometimes get split, and their contracts evolve as the team learns what it must and can do.
— What’s the link with a unit or a service test?
— Well, what do you call the service under test? The user service at the product boundary, or the micro-service HTTP interface? What do you call a method or function to unit test in C? In Haskell? In shell? How do you talk with another team to test a service globally?
— Again, that’s nitpicking. We’ll just make our own definitions of Mike Cohn’s words explicit, and we’ll be fine.
— First, congratulations. You just said you’d make definitions explicit, and that’s a move worth making. People arrive with their own culture, with different definitions of the same words. If it’s as simple as you think, it won’t take much time. Second, so you want to keep the original three words?
— Yes, the magic number.
— Remember your Spring project, where you bundled everything that was neither unit nor end-to-end as integration tests. You could feel the opportunities you lost by trying to shoehorn each of those tests into a common framework. That felt heavy, didn’t it?

Pluto looked down, sifting through his memories.

— Oh yes, it did. There were quite a few fights about it. So let’s just name a few categories in our context, and use them everywhere.
— Exactly! And don’t forget to keep this list alive. Now you need to find what the context is. Do you have a single context? What about the department? Does technology matter? Domain? Culture?
— Do you mean we should have different sets of categories for different services? Or sets of services? Or that we should share some vocabulary with other teams?
— All, none, I don’t know. Talk about it. Keep whatever makes sense. Make things explicit and relevant.
— So the only remaining thing about the pyramid is the triangle.

David sat on the sofa and stared at Pluto.

— …
— What? Tests are either quick and reliable, or they represent reality and are fatter and more flaky. Aren’t they?
— Hmm, that’s interesting.
— Are you doing the shrimp?
— Am I doing the shrimp?
(sigh)
— You’re making your point by yourself. Tests have several desired properties, and you want to maximize all of them. Among those properties, you can find:

  • Speed, to run loads of tests.
  • Expressiveness, in code and in diagnostics. When a test fails, you want to know why.
  • Isolation, of the runtime environment and of the tested features.
  • Reliability, i.e. no flakiness. Tests that embed a lot of stuff are the flakiest.
  • Representativity. For example, is your test representative of real life if it does not include the scheduled scripts that move input files to your input folder? Is a production environment representative if you use a test flag on the incoming request?
  • Coverage. Imagine the combinatorial explosion of edge cases you need to test.
  • And many more. Don’t ask me for a mnemonic acronym.

You can see that most of them conflict with each other. For each context, you’ll find several toolboxes with a fair compromise among these properties. In some contexts, they’ll be tightly ordered as a pyramid. In some others, you’ll be able to maximize most of the properties in a single category. The properties don’t balance each other on a linear gradient. It depends.
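To make these conflicts concrete, here is a minimal sketch in Python (pytest style; parse_line and move_inputs.sh are hypothetical names, not from any real product). The first test maximizes speed, isolation, and reliability; the second maximizes representativity by running the actual file-moving script, at the cost of everything else:

    import subprocess
    from pathlib import Path

    def parse_line(line: str) -> dict:
        # Pure function under test: no I/O at all.
        name, value = line.split(";")
        return {"name": name, "value": int(value)}

    # Fast, isolated, reliable: thousands of these run in seconds, and
    # edge cases are cheap to add. But it says nothing about scripts,
    # folders, or scheduling.
    def test_parse_line():
        assert parse_line("cpu;42") == {"name": "cpu", "value": 42}

    # Representative: it runs the (hypothetical) shell script that moves
    # files into the input folder. Slower, needs a real filesystem, and
    # will flake if the script or the OS misbehaves.
    def test_file_ingestion(tmp_path: Path):
        incoming = tmp_path / "incoming"
        input_folder = tmp_path / "input"
        incoming.mkdir()
        input_folder.mkdir()
        (incoming / "metrics.csv").write_text("cpu;42\n")
        subprocess.run(["./move_inputs.sh", str(incoming), str(input_folder)],
                       check=True)
        assert (input_folder / "metrics.csv").exists()

Neither test is better in absolute terms; each one maximizes a different subset of the properties above.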
— It’s more a set of multi-dimensional bubbles than a pyramid.
— Coughnerdcough.
— But “Testing shows the presence, not the absence of bugs”, doesn’t it?
— I never said the opposite. I talked about fair compromises. You’ll identify the right bubbles with your own sensibility. It’s about trust and confidence, it’s very subjective. Oh, and of course, we’re only talking about automated stuff, not about actual testing.
— Finally, nothing is left from the pyramid. The Knight Rider beats Mike Cohn.
— Not at all, are you crazy! The metaphor is great, I told you. When I discovered it, it was an epiphany. We’re talking about your precise context, not a generic frame of mind. The pyramid is just a model, so it’s wrong, that’s all. This one is too simplistic to survive contact with the field, but it’s still very useful as a basis for thinking.

Pluto leaned back in his chair, rubbing his nose, and stretched his legs.

— Real life is more nuanced than the illustration, I get it. You did it again; I’ll have to get back to the team.

When Pluto opened his eyes, David Hasselhoff was back in the poster.

— I really need to sleep more.

A veto killed the team

You’ve worked for several days on a complex problem with a couple of teammates. You are proud of your approach and your code, and the three of you high-five when the CI server and testing confirm everything works fine. But then the guy shows up, has a look at your code, and tells you that this detail is not the way we work here. You ask him for details and try to convince him your solution is OK, but he doesn’t agree, and he tells you how to rewrite that part of the code. You know he’s considered right, so you just do as he says. The consequences are huge. You get 2 additional days of refactoring to comply with the rules, and you end up with code you’re not so proud of.

Next time, you know you will ask the guy first. You will wait for his approval up-front, take his instructions into account, and do as he says. As you ask him, he will tell you every detail of how he would have done it. You know you’d better take notes of every detail, because you don’t want to rewrite your stuff next time. And if anything doesn’t work as expected, you know you’ll ask for his advice.

You got a veto.

A veto is a smell: it happens after something has already been decided, discussed, or done. Which means waste.

A veto tends to prevent team empowerment. If you need approval from someone who is likely to use a veto, the result comes very quickly: people wait for detailed specifications, or micro-management, from this guy. He becomes the only brain in the team, and the bottleneck.

Counter-measures:

  • Allow mistakes. A team may be wrong when it decides something, and that is OK. If you accept mistakes, you may have good surprises. What you consider a mistake up-front might turn out to be a great solution.
  • Limit batch size. Small batch size means lower risk. Lower risk means higher resilience to mistakes.
  • Make sure vetoers coach up-front instead of playing the card afterwards. With great power comes great responsibility. Everybody’s job is to make themselves useless. If you have veto power, ask questions, like “what if”, “how will you know”, “how should we mitigate”… Get along with people, coach them.
  • Make sure vetoers use vetoes on actual issues with high risk and high impact: not on potential issues, not on actual issues that will be detected right away, not on actual issues with low impact, not on actual issues with known counter-measures.
  • Make sure your team is made up of opposing powers. This is the only guarantee of balance. If the vetoer shuts down a silent minority, you have a smell; only with attention can you detect it. If you have as much power as the vetoer, pay attention, so that a veto only occurs in extreme situations.
  • If alignment is needed between teams, explain why and discuss it. Note that alignment is needed far less often than you may think.
  • Limit veto power.
  • Include people with veto power from the start of decisions (which is rarely possible, as sufficiently powerful people are not available).
  • If nothing else can be done, remove the vetoer from the team. Every time I saw a supposedly indispensable person leave a team, willingly or not, the outcome was surprisingly better. At least, you can run the experiment and see how it works.

The veto is a very dangerous tool. If you use it in your team, use it with extra care, only if all other tools have failed, and only if the consequences of not using it are too serious. If you couldn’t convince people of your perspective beforehand, or enable them to take what you consider the right decision on their own, you may consider it a failure. And you shouldn’t use dictatorship to cheer yourself up. Your teammates don’t have this tool.

Everybody has great ideas and their own perspective. Take advantage of that. Help the teams become autonomous, and you’ll get great results.

Sweet WIP limit

Like many teams, we adopted a WIP limit. And like most of them, we could see the benefits of this practice right away.

WIP stands for work in progress. Limiting WIP thus means limiting the number of items the team works on at the same time. When your team reaches the maximum number of items it can work on, it simply stops starting new items.

By limiting the WIP we create slack. Which is exactly how I finally introduced it in the team: “what if, when we reach the limit, we just did nothing?” Though it’s not as simple as that, fear of the void is the main issue to overcome when you want a team to adopt a WIP limit: ultimately, when you reach the WIP limit, you stop picking up the top-priority items. Instead, you help keep the items you started flowing toward done. But when you can’t help anymore, you just do nothing. In practice, the rules, when we reach the WIP limit, are:

  • You don’t start any additional item.
  • You help finish started items.
  • If nothing can be done, you can do whatever you want, as long as it can be interrupted at any time to work on the project’s priorities.

So we created slack. And, as often described, the benefits of slack are huge:

  • Started items get done quicker.
  • As a consequence, there is less uncertainty.
  • More reactivity.
  • Fewer interruptions.
  • Less WIP means fewer dependency issues within the team.
  • We tend to pair more. Thus more quality, more tests, more refactoring.
  • More knowledge sharing, because we help each other more.
  • We do more to facilitate other teams’ flow.
  • We develop more tools that help us speed up our flow.

Let’s look at the concrete stuff. We did it in a slightly different way than the textbook approach, or than the other teams on the project. Our flow is quite standard for a dev team.

[Diagram: our flow]

Usually, you would have one WIP limit on Dev and Dev Done, and another one on Test. We tried something else. As we are a team of 4, we set a WIP limit of 3 across Dev, Dev Done, AND Test (actually it’s 2 backlog items + 1 bug of at least major severity, but that doesn’t matter here).

[Diagram: our flow with the WIP limit]

We did this for several reasons:

  • As we have no dedicated tester in the project, we often test our backlog items ourselves. 
  • When a backlog item is rejected in test, it goes back to dev directly.

A WIP limit works because it’s applied to a production unit: a team. So it’s natural to apply the WIP limit to everything the team covers. And even when testing isn’t in our scope, we still have to take it into account, as an item being tested can come back to dev. Testing doesn’t always say yes. Kanban comes from manufacturing, where an invalid item is usually dumped, or sent to a specialized team. In our case, an invalid item keeps consuming a slot in the dev team that produced it.
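As a minimal sketch (Python; the item names are made up, the columns and the limit of 3 come from the flow described above), the rule boils down to counting every item the team owns, whatever column it sits in:

    # An item sitting in Test still consumes a slot, because testing
    # doesn't always say yes and the item can bounce back to dev.
    IN_PROGRESS_COLUMNS = {"Dev", "Dev Done", "Test"}
    WIP_LIMIT = 3  # for our team of 4

    def can_start_new_item(board: dict) -> bool:
        wip = sum(len(items) for column, items in board.items()
                  if column in IN_PROGRESS_COLUMNS)
        return wip < WIP_LIMIT

    board = {
        "To Do": ["item-4", "item-5"],
        "Dev": ["item-1"],
        "Dev Done": ["item-2"],
        "Test": ["item-3"],
    }
    # 3 items in flight: nobody starts item-4; we help finish,
    # or do interruptible slack work instead.
    assert not can_start_new_item(board)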

After a couple of iterations with this rule, I can confirm that the fear of the void is still the main impediment to overcome. We are cheating on the WIP limit here and there, especially when managing bugs. We are still tweaking the WIP limit to make it fully functional.

I’ll keep you posted when I have significant news. In the meantime, if you have any experience to share, please feel free to help.

Why technical user stories are a dead-end

As said in my previous post, our team works in the intermediate layers of the product. I didn’t get into details, but one of the reasons we need horizontal slices is that we have a huge job ahead: improving the performance of a 4-year-old, very technical product. The task will take several months. We have a direction, but we don’t know in detail how to get there. We need to iterate on it anyway, as a long tunnel is the biggest risk in development. The common way of doing this is to iterate on small user stories. This is where the problems begin.

First of all, user stories start with a user and a story. We have neither. We just have a big technical tool to implement. If we wanted to write user stories for each iteration, with a relevant title, each of them would be something like “as any user, I would like the product to be more reactive, because fast is fun”. Maybe we could suffix the titles with 1, 2, …37 to tell them apart. Some would prefer stating technical stuff like “as Johnny the developer of the XYZ module, I would like more kryptonite in the temporal convector, because I need a solar field”… and would call it a user story. You see my point.

And we need a good title for a user story… sorry… a backlog item. Because it’s the entry point for the PO, the customer, the developers, the testers… to understand WHAT is to be done and, more importantly, WHY (which is why I prefer beginning with the why in the title). Stating how we do stuff is irrelevant, because it’s only a dev matter; it doesn’t help us understand each other in any way. Which means we don’t have any clue about how we will agree on its doneness. In other words, it’s nonsense to commit on such a story. So we have a first dead-end: we need good descriptions, but we can’t provide them in a “user story” way.

The performance project is big because the new engine, which “caches” the default one, needs to implement a lot of the default engine’s rules. And releasing it is not relevant until it implements enough of these rules. Dividing the work by rules makes no sense: first because we need to build the basis of the new engine, which is already a big job; second because implementing a rule in the new engine is not equivalent to implementing it in the default one. The algorithms, the engines, the approaches are different. What is natural in one environment is a lot of work in the other, and vice versa. In addition, the work is so huge no one can wrap their brain around the godzillions of details we need to implement. Finally, we don’t know in detail which rules the default engine implements. Said another way, we continuously discover what needs to be done, along the way. Even within each backlog item, we are discovering so much that we can’t estimate precisely enough at the beginning, or even make sure we agree on the same thing to be done. Not to mention longer-term planning (which we will see in another post). Why deny it? Discovery is our everyday reality.

So we have no user, no story, no clear short-term scope, but we agree that we need small items to iterate on. But wait! Agile has the solution: spikes! We timebox a short study, which we might iterate once or twice, and then we know how to go on. Except that’s what we have already done for 3 or 4 iterations with a team of 4 developers; we are progressing, and we still don’t know clearly what we will do in the next iteration. So spikes might sound nice, but when you are at the bottom of a technical Everest, they need to be taken to the next level. We are not picking a logging library among 3 candidates.

So following the basic agile rules, we’re stuck. But there is a solution we’ll talk about in the next post.

Code re-use can be as bad as code duplication

Code duplication is one of the top evils in software development. But as strange as it may sound, code re-use can be a pain in the neck too. Let me explain.

Our team works on the intermediary layers of the product. Yes, sometimes horizontal slices are the way to go. Our product is about monitoring. It’s made up of a home-made database, an intermediate layer that defines and computes analytics, a framework to define specific applications, and the applications built on top of the platform. Our team works on the analytics. The main thing here is that we build frameworks. It’s actually what I’ve always done. The problem with frameworks is that you can never anticipate every use case. They are there to provide tools, not end-user features.

For this kind of code, BDD-like tools are difficult to apply. BDD requires you to know precisely enough what you’re aiming for when you commit to a backlog item. And it requires coming up with a limited set of features to test. With such technical code, though, you can hardly state the new required behaviors as user stories. And the main thing you want is for the whole to remain stable and efficient. Before validating new features, you want to check for regressions. After a while, it becomes difficult to test manually. And automated tests are not perfect by nature: the more stuff you embed in an automated test, the less maintainable and the slower the campaign will be. You thus make trade-offs, which is what testing is all about. You can hardly test a re-used tool correctly.

We can even go down another step. A piece of code that is re-used by a flip load of modules is a nightmare to maintain, because we never really know how it’s used. Even if you run a set of unit (or otherwise automated) tests you rightfully trust, you are not sure the piece of code is used the way you test it. It’s particularly true for code that’s been around for a while: when an additional module uses your code, it’s a miracle if the guy implementing the calling code thinks of adding the automated test that guarantees the called code does what he expects. And please allow me to look somewhere else with a strange face if you tell me that 100% code coverage guarantees your test base is nuclear-bomb proof.
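One partial counter-measure (a sketch in Python with hypothetical names, not a silver bullet): have each calling module keep its own small test stating what it relies on in the shared code. When the shared code changes behavior, the failure names the caller it breaks:

    # Hypothetical shared helper, re-used by many modules.
    def normalize_id(raw: str) -> str:
        return raw.strip().lower()

    # This test lives next to the *calling* module (billing), not next
    # to the helper: it pins down what this particular caller relies on.
    def test_normalize_id_as_used_by_billing():
        # Billing relies on leading zeros being preserved.
        assert normalize_id("  00042-A ") == "00042-a"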

A few years ago, when we were having regression issues on our code base, I was surprised to realize that we wouldn’t have had those problems if we had not shared one particular piece of code. It went against a basic rule I had believed in for years. And then I realized I was simply facing yet another trade-off we always need to balance: sustainability vs. sustainability.

Open source mindset to enable scaling

In order to scale agile, we need to bring several teams onto the same page. We need them to share a business and user understanding, a vocabulary, a pace, a backlog, and an emerging architecture. Teams must balance functionality and architecture, business value and quality, across a common project.

When the number of teams increases, we often end up with some teams more specialized in technical components than others. If we don’t, we sometimes struggle to get a consensus from all the teams about what needs to be done, and how, on very sensitive pieces of code. At the end of the day, even if the agile manifesto discourages silos, software engineering is still a craft, and people are better at some things than others. The agile manifesto also encourages small teams, so you need some kind of borders between them.

So we might create silos… not a good idea. Yet projects out there already do it, and succeed. Have a look at open source projects that do very technical stuff: security, message queues, databases, servers, and so on. They make components that are far from the final features delivered by the projects that integrate them. Only a few people have the final word on what gets developed. But in the end, they provide what integrators expect from them, and generally respond quickly enough to requests. You can reliably integrate them in order to provide features. And they build quality. Why does this work?

Because everything is transparent. They have wikis, tutorials, forums, JIRAs, GitHubs; they interact with their users, respond to requests, take suggestions into account, and make decisions and roadmaps clear to everyone. You can even get and debug the code, fix it, and propose new feature implementations, so that they are included in the next release. At the end of the day, if the component is useful to the community, a team of a few people gets proposals from dozens or hundreds of people.

Now let’s get back to our in-house multi-team project. As input, you have a business backlog. You refine requirements and split them across teams. When assigning them, you come up with direct requirements for core components. As teams study functional requirements (or develop them), they will also come up with requests for core components. At this point, you have 2 solutions:

  • either all teams can modify core components while developing functional stories, in which case you need to somehow align architectural decisions taken in several places;
  • or a few teams are responsible for the consistency of core components, in which case you must ensure they are responsive enough to current and mid-term requirements, at a sustainable pace (sustainable pace includes quality code and just-enough development).

To support the 1st option, you need the right people organized the right way. It can be hands-on tech leads within teams, or super-brains above us common mortals. I never believed super-brains are the optimal solution, but apparently some organizations have the right profiles to make it work. And synchronizing tech leads might be difficult if you have too many teams, too much pressure, too many sites, different priorities between teams, and so on. Which is why you end up with the 2nd solution in some contexts.

So why not try an open source way of working within your organization? Make your source code public to other teams, support them, and make sure your PO takes their requirements into account in your backlog, for instance by organizing votes for feature requests. And when you see you’re on the same page with other developers, you can even give them the right to check in some code directly. This may seem obvious and simple to you, dear agilists. But phrasing it as an open source mindset will instantly ring a bell with the nerds you’ll be explaining it to. And frameworks, i.e. governance models, already exist out there (Apache, Eclipse, …).
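As a sketch, assuming your code lives on a forge such as GitHub or GitLab, a CODEOWNERS file is one low-ceremony way to implement this: any team can read the code and propose changes, while the owning team keeps the final review on its component (the paths and team names below are made up):

    # Hypothetical CODEOWNERS file at the root of a shared repository.
    # Anyone may open a merge request; these teams review whatever
    # touches their component before it gets in.
    /core-engine/   @acme/core-team
    /analytics/     @acme/analytics-team
    # Everything else can be merged after any peer review.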

What do you think?