Introduction
Imagine a scenario like this: Tuesday morning, 5 AM. The air is charged with excitement as the final software release of the year is about to go live. All of a sudden, the pipeline fails. A feature that was not supposed to go live has found its way into the release branch. The Lead Developer is already on vacation. You receive a call to help out. The issue is quickly identified and the pipeline is fixed. But then, a few hours later on production, chaos ensues. A small bug is reported by the client and a hotfix is rushed out. However, the hotfix only leads to more problems: caching issues on some parts of the infrastructure impair the productivity of the client team. Eventually, the issue is resolved and you finish a chaotic work day full of interruptions.
Surely, this is an exaggerated scenario, but I believe everyone has either experienced something similar or heard such stories. While reading the above, you might also have spotted some red flags and issues that you would address. But how can we ensure that everyone on the team learns from these chaotic situations and has the opportunity to improve in the future? Let’s delve into the world of After-Action Reports (AARs)!
After-Action Reports (AARs)
Originally used in the military and emergency services, an After-Action Report is a powerful tool for reflection and learning. It is a structured review process for analyzing what happened, why it happened, and how it can be done better. Key elements include a clear description of events, an analysis of the outcome, and actionable recommendations for future improvement. Importantly, an AAR is not reserved for failures: it is written after every operation. We can learn from successful operations too, by asking why they were successful and what can be done even better next time. Another benefit is that AARs serve as a form of documentation over time, because every action is reported and archived.
Medicine uses a similar approach, one that most developers know because we adopted it for reviewing a project after it has ended: Post Mortems. In software development, popular examples include analyses of projects like Diablo and Doom (Diablo Postmortem, Doom Postmortem). While these are fun to watch for game development or IT projects, the term has a much more serious context in medicine. A Post Mortem is carried out after (post) a death (mortem), for example when a patient dies during surgery. This way, it can be analyzed what went wrong and whether there was any chance it could have been avoided.
Structure
An effective AAR can be structured into three main components:
- The Context: Understanding the scope, original plans, goals, and involved parties.
- The Report: A chronological account of what occurred, deviations from expected results, successes, and areas for improvement.
- The Conclusion: Key learnings and actionable steps for future projects.
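To make the three components concrete, here is a minimal sketch of how an AAR could be captured as a small data structure. The class name, the field names, and the `is_complete` check are assumptions made for illustration; they are not part of any standard AAR format.

```python
from dataclasses import dataclass, field


@dataclass
class AfterActionReport:
    # The Context: scope, goals, and involved parties
    scope: str
    goals: list[str]
    participants: list[str]
    # The Report: chronological events, successes, areas for improvement
    events: list[str] = field(default_factory=list)
    went_well: list[str] = field(default_factory=list)
    to_improve: list[str] = field(default_factory=list)
    # The Conclusion: key learnings and actionable steps
    learnings: list[str] = field(default_factory=list)
    action_items: list[str] = field(default_factory=list)

    def is_complete(self) -> bool:
        """Hypothetical rule: a report needs events, learnings, and action items."""
        return bool(self.events and self.learnings and self.action_items)


report = AfterActionReport(
    scope="Final release of the year",
    goals=["Ship the planned features without downtime"],
    participants=["Release team", "Client team"],
)
print(report.is_complete())  # no events, learnings, or action items recorded yet
```

Whether you keep this in code, a wiki template, or a shared document matters less than making sure all three parts are filled in every time.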
AAR applied to Software Development
Reflecting on our initial scenario, implementing an AAR could significantly alter future project outcomes. Let’s explore this:
Setting the stage
There are various questions the team could ask themselves, starting with the context in which the release was scheduled. Answering some or all of them helps explain the circumstances of the release:
- Could it have been released earlier with a narrower scope?
- Should the release have waited for the Lead Developer’s return?
- What were the preparation steps for the team?
- Had the team managed similar-sized releases without the Lead Developer before?
- Why was it scheduled for 5 AM?
Understanding the roles of team members and the planned features, timeline, and unique aspects of the release adds further clarity.
The report
What happened in chronological order? Imagine having a clear log of actions: when the release branch was prepared, when the pipelines ran, when they were fixed, when the system was deployed, when the incident was reported, and when the fix was deployed. Perhaps the log shows that the release was prepared too late and the pipelines should have run the day before, or that the client reported the incident after working hours. All of this information creates a clearer picture of the situation. With it, you can analyze the situation much better and ask better questions about how to improve in the future.
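As a rough sketch of such a log, the snippet below sorts a handful of events and shows the time that passed between them. The timestamps and the `build_timeline` helper are invented for illustration and only loosely follow the scenario from the introduction.

```python
from datetime import datetime

# Hypothetical log entries (timestamp, what happened), deliberately out of order.
raw_events = [
    ("2024-12-17 09:40", "Client reports a bug on production"),
    ("2024-12-17 05:00", "Release pipeline fails"),
    ("2024-12-17 10:30", "Hotfix deployed"),
    ("2024-12-17 05:45", "Pipeline fixed, release deployed"),
    ("2024-12-17 11:15", "Caching issues reported"),
]


def build_timeline(events):
    """Sort events chronologically and add the minutes since the previous event."""
    parsed = sorted(
        (datetime.strptime(stamp, "%Y-%m-%d %H:%M"), text) for stamp, text in events
    )
    timeline = []
    previous = None
    for moment, text in parsed:
        gap = 0 if previous is None else int((moment - previous).total_seconds() // 60)
        timeline.append((moment, gap, text))
        previous = moment
    return timeline


for moment, gap, text in build_timeline(raw_events):
    print(f"{moment:%H:%M}  (+{gap:>3} min)  {text}")
```

Seeing the gaps next to each event makes it easier to ask targeted questions, such as why several hours passed between the deployment and the first report.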
What went well? It’s important to not only state the negatives and focus too much on the errors but also be aware of everything that went well. Maybe you learned something from the last AAR and implemented a change in process that worked perfectly. Mention it.
“A [hu]man must be big enough to admit his mistakes, smart enough to profit from them, and strong enough to correct them.”
– John C. Maxwell
Why did problems occur? There is one important thing I haven’t said before: AARs are not about finger-pointing! The idea of this process is to improve as a team. Therefore, it’s not healthy to blame individuals for their mistakes. In the end, we are all humans and make mistakes from time to time. At the same time, if it was you that made a mistake that could’ve been avoided and you want to draw your learnings from this, own it. Own your mistakes and commit to not repeating them in the future by letting your team know why they happened and how they can be avoided. But maybe they weren’t even problems. It could as well be that you noticed things that could be improved. For example, you identified that the pipelines are taking a long time because of one specific job that can be optimized or parallelized in the future.
Conclusion
What have we learned? Our exaggerated example above illustrates several missteps and opportunities for improvement. All of these should be written down. I believe you should focus on realistic learnings rather than hypothetical scenarios. Keep the circumstantial parameters in place! Of course, everything could have worked better if we had assumed a completely different context from the beginning, but that does not help the team next time.
Action Steps. Learnings don’t help you a lot if you don’t put them into action. It’s the same as having a retrospective where all things have been mentioned and discussed and you know exactly what needs to improve but no one owned an action item and therefore nothing changed. Make sure that you add one or multiple owners to an action item, describe what needs to be done as the next step(s), and describe how you can measure if the task has been accomplished (similar to Acceptance Criteria in a ticket). You can also think about adding a timeline on the action item to be sure it is not simply added to the Backlog and never touched again.
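The checklist above (owners, next step, a measurable outcome, a timeline) can be sketched as a small record with a sanity check. The `ActionItem` class, its field names, and the rules in `problems` are assumptions for illustration, not a prescribed format.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class ActionItem:
    title: str
    next_step: str
    done_when: str  # measurable criterion, similar to Acceptance Criteria
    owners: list[str] = field(default_factory=list)
    due: Optional[date] = None
    done: bool = False

    def problems(self, today: date) -> list[str]:
        """Return reasons why this item risks being forgotten in the backlog."""
        issues = []
        if not self.owners:
            issues.append("no owner")
        if self.due is None:
            issues.append("no due date")
        elif not self.done and self.due < today:
            issues.append("overdue")
        return issues


item = ActionItem(
    title="Speed up the slow pipeline job",
    next_step="Profile the job and split it into parallel steps",
    done_when="Pipeline run time is below a threshold the team agrees on",
)
print(item.problems(date(2025, 1, 15)))
```

Running a check like this at the start of the next AAR or retrospective is one way to notice items that nobody picked up.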
Summary
In summary, After-Action Reports (AARs) are not just tools for reflection; they are catalysts for continuous improvement in software development. Through the structured analysis of events, successes and failures alike, AARs provide valuable insights for future projects. By documenting actions and fostering a culture of open communication and learning, they increase accountability and trust within the team and help it evolve. Embracing AARs in your processes is more than a best practice; it’s a commitment to excellence and a testament to a team’s dedication to growth.
I can only recommend everyone try it and think about new ways to involve AARs in their critical processes. Have you tried it? What were your experiences with this approach? Send me a message, and let me know about your experiences!