Production Support and Scrum

How should Scrum teams plan for support?

28 March 2008

Geoff Watts
Inspect & Adapt

Teams adopting Scrum not only have to deal with the normal project complexities of prioritisation, estimation, and turning product backlog items into potentially deployable increments of functionality, they also often have to support a system in production or address bugs that come back to them during development. How do we track and prioritize these support activities? How do we handle emergencies? Who performs production support?

Bug of the Day

Production support can be seen as a disruption to teams that just want to get on with things but is often very dear to the heart of the system users and, therefore, the Product Owner. It can be a tricky issue to handle as it adds an extra degree of complexity to the prioritization debate. Often teams will be asked to take an approach of “production support is a necessity, so just deal with it.” After all, if the system is down, what is the point in adding more features to it? The priority must be to get it back up and running straight away.

However, this approach to production support can’t be planned. Just dealing with issues as they come up can lead us away from the most appropriate decisions, because we get caught up in the “bug of the day” scenario. It’s easy to lose sight of our vision and strategic plan by just dealing with whatever the latest system problem is.

Scrum asks us to prioritize our effort, sometimes making difficult decisions, to create maximum business value as early as possible and help us achieve our strategic goals. One of those difficult decisions may be to question the relative value of fixing bugs against adding new functionality. Take a user story that helps us comply with a new law. Our initial instinct might be “we have to do it, it’s the law,” yet there might be a scenario where, from a business perspective, the fine for not complying is less than the value obtained by implementing an alternative story. Choosing the alternative story is a difficult decision, but arguably the right one for the company. Applying the same reasoning to production support, we might prefer to live with an inconvenience and gain a new feature rather than fix the problem right away. We should at least consider the relative merits.

If we follow this route, there is a perceived risk that our bugs will never get fixed. I understand that concern, but in my experience, that risk doesn’t become a reality. There still remains the concern of how we plan for production support. I have seen teams deal with the issues of production support in different ways.

First Steps

The most common solution teams employ initially is to effectively have two backlogs—one for development features and one for production support issues. The Product Owner sets a guideline ratio for planning whereby the team will take, for example, 70 percent from the development backlog and 30 percent from the support backlog. This is arguably not much different than the team just reducing their capacity to account for support tasks and is effectively hiding the issue, shying away from the conversation around priority and increasing the risk of spending effort on sub-optimal work.
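As a rough illustration of the guideline ratio, the split can be worked out from the sprint capacity. The sketch below assumes capacity is measured in story points and uses hypothetical names (`split_capacity`, `support_share`); the 70/30 figures are only the example used above, not a recommendation.

```python
def split_capacity(capacity_points: int, support_share: float = 0.30) -> tuple[int, int]:
    """Return (development_points, support_points) for one sprint.

    capacity_points and support_share are illustrative assumptions:
    the team's planned capacity and the Product Owner's guideline ratio.
    """
    if not 0.0 <= support_share <= 1.0:
        raise ValueError("support_share must be between 0 and 1")
    # Round the support allowance, then give the remainder to development
    # so the two parts always add up to the full capacity.
    support_points = round(capacity_points * support_share)
    development_points = capacity_points - support_points
    return development_points, support_points


development, support = split_capacity(40)
print(development, support)
```

Seen this way, the ratio is just a capacity reduction under another name, which is why it does not settle the question of priority.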

In this scenario, because production support is usually not in the easily estimable form of user stories and often crops up during the sprint, teams will have a burndown for their development stories and a burnup for their production support. The team can then review their feature/bug ratio in the daily scrum, which allows them to raise issues and tradeoffs with the Product Owner. There is a risk of burning through the whole production support allowance before the sprint ends, but the shorter the sprint, the smaller the potential impact of such a situation.
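One way to picture the burndown/burnup pair is as two running totals reviewed each day. The following is a minimal sketch only: the class name `SprintTracker`, its fields, and the assumption that development and support effort are recorded in the same unit are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SprintTracker:
    development_remaining: float  # burndown: planned story work still to do
    support_budget: float         # support allowance agreed at sprint planning
    support_spent: float = 0.0    # burnup: support effort used so far

    def record_day(self, development_done: float, support_done: float) -> None:
        """Update both totals with one day's work."""
        self.development_remaining = max(0.0, self.development_remaining - development_done)
        self.support_spent += support_done

    def support_budget_exhausted(self) -> bool:
        """True once the support burnup has reached the agreed allowance."""
        return self.support_spent >= self.support_budget


tracker = SprintTracker(development_remaining=28, support_budget=12)
tracker.record_day(development_done=3, support_done=5)
tracker.record_day(development_done=2, support_done=7)
if tracker.support_budget_exhausted():
    print("Support allowance used up: raise the tradeoff with the Product Owner")
```

Checking `support_budget_exhausted()` in the daily scrum gives the team an early prompt to have the tradeoff conversation rather than discovering the overrun at the sprint review.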

Bugs as Feature Requests

In the above scenario, teams placed production support items onto a product backlog, with an associated business value and size estimate. This raises the question: why split the product backlog into two? By combining the two backlogs, we can explicitly confront the issue of whether we are doing the “right” things, which is ultimately preferable in my opinion. While this involves more complexity from a Product Owner point of view, it allows the team to concentrate on what’s important. The opportunity is still there for the team to pick synergistic items to maximise overall value, which leads us to another interesting way of dealing with production issues.
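To make the single-backlog idea concrete, here is a small sketch that merges features and support items and orders them together. The ordering rule (business value divided by size) is purely an assumption for illustration; in practice the Product Owner decides the order, and the item names and numbers are invented.

```python
from dataclasses import dataclass


@dataclass
class BacklogItem:
    title: str
    kind: str            # "feature" or "support"
    business_value: int  # relative value assigned by the Product Owner
    size: int            # relative size estimate, must be greater than zero


def combined_backlog(features, support_items):
    """Merge both lists and order them by value per unit of size (assumed rule)."""
    merged = list(features) + list(support_items)
    return sorted(merged, key=lambda item: item.business_value / item.size, reverse=True)


features = [
    BacklogItem("Export report", "feature", business_value=8, size=5),
    BacklogItem("New dashboard", "feature", business_value=13, size=13),
]
support_items = [
    BacklogItem("Fix login timeout", "support", business_value=9, size=3),
]

for item in combined_backlog(features, support_items):
    print(item.kind, item.title)
```

The point of the sketch is that a support item competes on the same terms as a feature: here the bug fix lands at the top only because its value relative to its size is highest, not because it is a bug.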

A number of teams add existing production issues as acceptance criteria of functional user stories, so that when the team opens up, or touches, that area of the system it already has some tests that define when the feature is “done.” The added benefit here is that some of those bugs that might never have gotten fixed if prioritised on their own get addressed while the team is focussed on the most important features. Ideally, any new problem will be defined in terms of acceptance tests that need to pass; these will then become part of the evolutionary design documentation of the system.

The key here is to try to avoid getting into an argument over whether something constitutes production support, a fault in development, or a change request. This is easier said than done, but teams that can absorb this discussion rather than draw absolute lines will be the more effective and productive teams. Finding out that the acceptance criteria pass but the feature isn’t really potentially deployable should be neither an opportunity for the Product Owner to introduce scope creep nor an excuse for the development team to hide behind a requirement spec (“you asked for this”). Rather, it is an opportunity to be pragmatic and, most importantly, to learn about what we did so we can improve the system and our test coverage for future stories.

This is additional valuable information. If production support issues come up, it usually means we missed something (perhaps an acceptance criterion) in our initial development. This is an opportunity for us to increase the test coverage of our system – a vital part of the team inspecting and adapting and maintaining (and increasing) the integrity of the system.

Emergencies

We still have the issue of what to do with the emergencies that crop up and aren’t part of the product backlog. The ScrumMaster or Product Owner should assess whether the issue is an actual emergency. By working in sprints we reduce the wait time before we can plan to work on these items, which often results in those “critical items” becoming not quite so critical.

If the issue is a true emergency, the Product Owner should have the authority to play the “emergency card,” as long as they are aware of the costs of doing so: not completing the items we planned to and, potentially, jeopardizing the sprint goal. If this happens frequently, then it might be worth considering a maintenance sprint to clear up some of the technical debt that might be causing a lot of these problems. Another option is to shorten the sprints, thereby reducing the potential waste in these scenarios.

Who Does It?

The other issue of planning for production support is deciding who will do it. Production support issues are considered “boring,” so there is often a reluctance to sign up for them. Assigning someone (or even a pair) as the “support person” is not a popular idea and having a “support team” causes an unnecessary split along with the associated confusion. There are a number of benefits to not splitting the team, not least of which is the synergy of keeping all efforts within a sprint and, probably most importantly, the sense of responsibility the team has for a system that it not only develops but also maintains—the team is acutely aware of the impact of getting things wrong.

Some teams will rotate the support role either on a sprint-by-sprint basis or weekly basis, coupled with a rule of “if you start it, you finish it.” This can also have the secondary benefits of expanding the overall knowledge of the system for all team members and increasing cross-functionality within the self-organising team. Doing this is difficult and will take more time if the team members are highly specialised in their particular skillsets or areas of the system but, arguably, this is a risk that could benefit from being managed proactively anyway.

Different options are available for teams to deal with the issue of production support and, although there is no “right way,” I have seen the biggest benefits accrued by teams that treat production support and feature requests as equivalent. I am very encouraged when a team is willing to take on the challenge of confronting the prioritization issues that come with putting support issues on a par with the development backlog items, thus sticking to one product backlog. These teams look for opportunities to improve the system as they work on it and use production support issues as already-written acceptance tests. This approach not only gives us the greater likelihood of doing the “right” things but also shows the team is up for the potentially difficult decisions during its journey towards a more agile way of working.

Comments

Sherrie Polk, CSM, 3/31/2008 4:19:19 PM
Geoff- Thanks - I was talking to my manager about these very issues this afternoon! This gives me a few more ideas on how to handle the production vs. support work.
Mike Lowery, CSP,CSM,CSPO, 4/1/2008 5:00:48 PM
Great article, Geoff. It's an ongoing issue for many teams.
When the teams I worked with came up against this issue, we did the following.
1. Reduced our velocity / capacity to have some support slack.
2. Major bugs / oversights were treated as new stories.
3. Each team member had a task card in the sprint with the support hours they had committed to on that card.
4. When they burnt the time to zero we asked other team members to help using their support commitment.
5. If we could not transfer the support or ran out of commitment we went to the Product Owner for them to make the operational choice of fixing the issue or producing new features.
This worked well for us.
James Peckham, CSP,CSM,CSPO, 4/4/2008 7:26:55 AM
This sounds just like the discussion we had. We ended up taking route #1 because our product owner couldn't handle prioritizing production issues... they always stayed at the bottom. Definitely a bandaid, but it seems to be working sort of decently. We're still 'calibrating' to figure out how many 'points' per iteration of production we do.
Giovani Salvador, CSM, 4/30/2008 2:17:23 PM
In the team I currently work in, we have to rotate support activities. When I am on support, my availability may be reduced for that sprint. For example, in the sprint planning meeting we can plan that support issues will require 1 hour per day of my time, so we reduce my hours for that specific sprint by one hour per day for the time I am on support. It may seem like a magic number, but we have some facts to back it up. If support activities take more than that, someone else can take my task, or we move the remaining tasks of the user stories to the next sprint if the client agrees.
Harald Walker, CSM, 5/4/2008 4:39:21 PM
My team also had to deal with such a situation. After a long time of development the product went into production while development continues. Support issues are being managed in Jira and once they have been verified and accepted we add them to our normal Scrum backlog. Bugs are usually combined in one story as they are often too small to represent a full user story. Which bugs make it into the next Sprint depends on various factors like the priority of the bug (as specified in Jira), commitments towards customers and related user stories. Estimating bug fixing tasks can of course be tricky due to the nasty nature of bugs but on average it seems to work. Many reported bugs are actually change or feature requests and will be added as normal prioritized backlog items. For customer support (the developers are 3rd line support) and emergency bug fixing we reduced the velocity and in the worst case it might affect the scope of the sprint. Emergency bug fixing usually also means that we are facing extra maintenance releases (independent of the regular Sprint iterations and planned releases), which is something I would prefer to avoid but sometimes it just can't wait.
Elizabeth Johnson, CSM, 5/14/2008 11:12:01 AM
This is a great article! The issue that we have is that most of our support issues and/or bugs are not related to the project that we are currently working on. Therefore, the product owner for our scrum team has nothing to do with those issues and isn't responsible for prioritizing them. So what we end up doing is what some of the other people have mentioned - we pad some time for production support issues (emergencies only). The others go onto a backlog and are worked on during a Maintenance sprint. We have also talked about having a scrum team dedicated to Maintenance/support issues. Has anyone done or heard of anything like this? We're interested in your ideas/comments/experiences.
Tom Reynolds, CSP,CSM, 7/15/2008 9:56:02 AM
An interesting article and it is not an easy one to solve.

I currently have this problem on my existing project; we are in an in-store prototype situation with a live trading pilot due shortly. We currently have two code streams: one is the maintenance branch (the software that will go to pilot), and the other is our tip code, where we continue to work on new developments for delivery in our on-going sprints.

Defects found in prototype are prioritised by the product owner, including whether a fix is required for pilot or not. If it’s not required for pilot then it gets picked up in subsequent sprints; if it is required for pilot we fix it in the maintenance branch as well as our tip code.

All defects are addressed by the scrum team who have built the product so they retain ownership and responsibility. We do however reduce our velocity for new developments to allow for support and fix activities. At the same time we are actively introducing measures identified in our retrospectives to improve quality upfront to reduce our support burden moving forwards. The scrum team will handle all support issues until after pilot sign-off when the team will then hand live support over to our support team. The scrum team will become 4th line support at this stage but still retain all responsibility for rectifying any software defects found in a live site, this responsibility is never removed from the team.
Manohar Venkataraman, CSM, 3/2/2010 2:56:11 PM
As QA Manager I've been tasked to do some research on Production Support roles, and fortunately stumbled on this page. I'm actually looking for a more granular approach, i.e. end user needs one point of contact (Level 1). We have to decide whether we need to hire someone for that or use an existing Infrastructure person, in which case they will need a knowledge base they can start and maintain regarding expected issues. If the situation appears either critical or unknown, then it gets escalated to Level 2. Who is level 2? I'm leaning towards a rotation of the Business Analysts who can determine whether the issue is critical or can wait until the next/future iteration. If it's critical, then I think we just have to bite the bullet and release a hotfix built on the current Production branch, but I don't really see a more efficient way to do things.