Develop an Availability and Capacity Management Plan
- It is crucial for capacity managers to provide capacity in advance of need to maximize availability.
- In an effort to ensure maximum uptime, organizations are overprovisioning (an average of 59% for compute, and 48% for storage). With budget pressure mounting (especially on the capital side), the cost of this approach can’t be ignored.
- Half of organizations have experienced capacity-related downtime, and almost 60% wait more than three months for additional capacity.
Our Advice
Critical Insight
- All too often capacity management is left as an afterthought. The best capacity managers bake capacity management into their organization’s business processes, becoming drivers of value.
- Communication is key. Build bridges between your organization’s silos, and involve business stakeholders in a dialog about capacity requirements.
Impact and Result
- Map business metrics to infrastructure component usage, and use your organization’s own data to forecast demand.
- Project future needs in line with your hardware lifecycle. Never suffer availability issues as a result of a lack of capacity again.
- Establish infrastructure as a driver of business value, not a “black hole” cost center.
Develop an Availability and Capacity Management Plan Research & Tools
Start here – read the Executive Brief
Read our concise Executive Brief to find out why you should build a capacity management plan, review Info-Tech’s methodology, and understand the four ways we can support you in completing this project. Besides the small introduction, subscribers and consulting clients within this management domain have access to:
- Develop an Availability and Capacity Management Plan – Phases 1-4
1. Conduct a business impact analysis
Determine the most critical business services to ensure availability.
- Develop an Availability and Capacity Management Plan – Phase 1: Conduct a Business Impact Analysis
- Business Impact Analysis Tool
2. Establish visibility into core systems
Craft a monitoring strategy to gather usage data.
- Develop an Availability and Capacity Management Plan – Phase 2: Establish Visibility into Core Systems
- Capacity Snapshot Tool
3. Solicit and incorporate business needs
Integrate business stakeholders into the capacity management process.
- Develop an Availability and Capacity Management Plan – Phase 3: Solicit and Incorporate Business Needs
- Capacity Plan Template
4. Identify and mitigate risks
Identify and mitigate risks to your capacity and availability.
- Develop an Availability and Capacity Management Plan – Phase 4: Identify and Mitigate Risks
Workshop: Develop an Availability and Capacity Management Plan
Workshops offer an easy way to accelerate your project. If you are unable to do the project yourself, and a Guided Implementation isn't enough, we offer low-cost delivery of our project workshops. We take you through every phase of your project and ensure that you have a roadmap in place to complete your project successfully.
1 Conduct a Business Impact Analysis
The Purpose
Determine the most important IT services for the business.
Key Benefits Achieved
Understand which services to prioritize for ensuring availability.
Activities
1.1 Create a scale to measure different levels of impact.
1.2 Evaluate each service by its potential impact.
1.3 Assign a criticality rating based on the costs of downtime.
Outputs
RTOs/RPOs
List of gold systems
Criticality matrix
2 Establish Visibility Into Core Systems
The Purpose
Monitor and measure usage metrics of key systems.
Key Benefits Achieved
Capture and correlate data on business activity with infrastructure capacity usage.
Activities
2.1 Define your monitoring strategy.
2.2 Implement your monitoring tool/aggregator.
Outputs
RACI chart
Capacity/availability monitoring strategy
3 Develop a Plan to Project Future Needs
The Purpose
Determine how to project future capacity usage needs for your organization.
Key Benefits Achieved
Data-based, systematic projection of future capacity usage needs.
Activities
3.1 Analyze historical usage trends.
3.2 Interface with the business to determine needs.
3.3 Develop a plan to combine these two sources of truth.
Outputs
Plan for soliciting future needs
Future needs
4 Identify and Mitigate Risks
The Purpose
Identify potential risks to capacity and availability.
Develop strategies to ameliorate potential risks.
Key Benefits Achieved
Proactive approach to capacity that addresses potential risks before they impact availability.
Activities
4.1 Identify capacity and availability risks.
4.2 Determine strategies to address risks.
4.3 Populate and review completed capacity plan.
Outputs
List of risks
List of strategies to address risks
Completed capacity plan
Further reading
Develop an Availability and Capacity Management Plan
Manage capacity to increase uptime and reduce costs.
ANALYST PERSPECTIVE
The cloud changes the capacity manager’s job, but it doesn’t eliminate it.
"Nobody doubts the cloud’s transformative power. But will its ascent render “capacity manager” an archaic term to be carved into the walls of datacenters everywhere for future archaeologists to puzzle over? No. While it is true that the cloud has fundamentally changed how capacity managers do their jobs, the process is more important than ever. Managing capacity – and, by extent, availability – means minimizing costs while maximizing uptime. The cloud era is the era of unlimited capacity – and of infinite potential costs. If you put the infinity symbol on a purchase order… well, it’s probably not a good idea. Manage demand. Manage your capacity. Manage your availability. And, most importantly, keep your stakeholders happy. You won’t regret it."
Jeremy Roberts,
Consulting Analyst, Infrastructure Practice
Info-Tech Research Group
Availability and capacity management transcend IT
This Research Is Designed For:
✓ CIOs who want to increase uptime and reduce costs
✓ Infrastructure managers who want to deliver increased value to the business
✓ Enterprise architects who want to ensure stability of core IT services
✓ Dedicated capacity managers
This Research Will Help You:
✓ Develop a list of core services
✓ Establish visibility into your system
✓ Solicit business needs
✓ Project future demand
✓ Set SLAs
✓ Increase uptime
✓ Optimize spend
This Research Will Also Assist:
✓ Project managers
✓ Service desk staff
This Research Will Help Them:
✓ Plan IT projects
✓ Better manage availability incidents caused by lack of capacity
Executive summary
Situation
- IT infrastructure leaders are responsible for ensuring that the business has access to the technology needed to keep the organization humming along. This requires managing capacity and availability.
- Dependencies go undocumented. Services are provided on an ad hoc basis, and capacity/availability are managed reactively.
Complication
- Organizations are overprovisioning an average of 59% for compute, and 48% for storage. This is expensive. With budget pressure mounting, the cost of this approach can’t be ignored.
- Lead time to respond to demand is long. Half of organizations have experienced capacity-related downtime, and almost 60% wait 3+ months for additional capacity. (451 Research, 3)
Resolution
- Conduct a business impact analysis to determine which of your services are most critical, and require active capacity management that will reap more in benefits than it produces in costs.
- Establish visibility into your system. You can’t track what you can’t see, and you can’t see when you don’t have proper monitoring tools in place.
- Develop an understanding of business needs. Use a combination of historical trend analyses and consultation with line of business and project managers to separate wants from needs. Overprovisioning used to be necessary, but is no longer required.
- Project future needs in line with your hardware lifecycle. Never suffer availability issues as a result of a lack of capacity again.
Info-Tech Insight
- Components are critical. The business doesn’t care about components. You, however, are not so lucky…
- Ask what the business is working on, not what they need. If you ask them what they need, they’ll tell you – and it won’t be cheap. Find out what they’re going to do, and use your expertise to service those needs.
- Cloud shmoud. The role of the capacity manager is changing with the cloud, but capacity management is as important as ever.
Save money and drive efficiency with an effective availability and capacity management plan
Overprovisioning happens because of the old style of infrastructure provisioning (hardware refresh cycles) and because capacity managers don’t know how much they need (either as a result of inaccurate or nonexistent information).
According to 451 Research, 59% of enterprises have had to wait 3+ months for new capacity. It is little wonder, then, that so many opt to overprovision. Capacity management is about ensuring that IT services are available, and with lead times like that, overprovisioning can be more attractive than the alternative. Fortunately, there is hope. An effective availability and capacity management plan can help you:
- Identify your gold systems
- Establish visibility into them
- Project your future capacity needs
Balancing overprovisioning and spending is the capacity manager’s struggle.
Availability and capacity management go together like boots and feet
Availability and capacity are not the same, but they are related and can be effectively managed together as part of a single process.
If an IT department is unable to meet demand due to insufficient capacity, users will experience downtime or a degradation in service. To be clear, capacity is not the only factor in availability – reliability, serviceability, etc. are significant as well. But no organization can effectively manage availability without paying sufficient attention to capacity.
"Availability Management is concerned with the design, implementation, measurement and management of IT services to ensure that the stated business requirements for availability are consistently met."
– OGC, Best Practice for Service Delivery, 12
"Capacity management aims to balance supply and demand [of IT storage and computing services] cost-effectively…"
– OGC, Business Perspective, 90
Integrate the three levels of capacity management
Successful capacity management involves a holistic approach that incorporates all three levels.
Level | Description | Example |
---|---|---|
Business | The highest level of capacity management, business capacity management, involves predicting changes in the business’ needs and developing requirements in order to make it possible for IT to adapt to those needs. | Influx of new clients from a failed competitor. |
Service | Service capacity management focuses on ensuring that IT services are monitored to determine if they are meeting pre-determined SLAs. The data gathered here can be used for incident and problem management. | Increased website traffic. |
Component | Component capacity management involves tracking the functionality of specific components (servers, hard drives, etc.), and effectively tracking their utilization and performance, and making predictions about future concerns. | Insufficient web server compute. |
The C-suite cares about business capacity as part of the organization’s strategic planning. Service leads care about their assigned services. IT infrastructure is concerned with components, but not for their own sake. Components support services, which are ultimately designed to facilitate business.
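The three examples above (new clients, increased website traffic, insufficient web server compute) form a chain from business level to component level. The following is a minimal sketch of that translation, not a method prescribed by this research. The function name, the requests-per-client rate, the per-server throughput, and the target utilization are all hypothetical assumptions that you would replace with your own monitoring data.

```python
import math

def web_servers_needed(clients, peak_requests_per_client_per_sec,
                       requests_per_sec_per_server, target_utilization=0.7):
    """Translate a business metric (clients) into a component count (web servers).

    Every rate here is a hypothetical input, not a figure from the research.
    """
    # Business level -> service level: clients drive peak request volume.
    peak_requests_per_sec = clients * peak_requests_per_client_per_sec
    # Service level -> component level: keep each server below the target utilization.
    usable_capacity_per_server = requests_per_sec_per_server * target_utilization
    return math.ceil(peak_requests_per_sec / usable_capacity_per_server)

# Hypothetical scenario: client base grows from 10,000 to 15,000.
current = web_servers_needed(10_000, 0.05, 200)
after_influx = web_servers_needed(15_000, 0.05, 200)
print(current, after_influx)
```

The point of the sketch is the direction of reasoning: start from what the business expects to happen and derive the component requirement, rather than asking the business for a server count.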
A healthcare organization practiced poor capacity management and suffered availability issues as a result
CASE STUDY
Industry: Healthcare
Source: Interview
New functionalities require new infrastructure
There was a project to implement an Elasticsearch feature. This had to correlate all the organization’s member data from an Oracle data source and their own data warehouse, and pool them all into an Elasticsearch index so that it could be used by the provider portal search function. In estimating the amount of space needed, the infrastructure team assumed that all the data would be stored in a single place. They didn’t account for the architecture of Elasticsearch, in which indexes are divided into shards that are distributed across multiple nodes.
Beware underestimating demand and hardware sourcing lead times
As a result, they vastly underestimated the amount of space that was needed and ended up short by a terabyte. The infrastructure team frantically sourced more hardware, but the rush hardware order arrived physically damaged and had to be returned to the vendor.
Sufficient budget won’t ensure success without capacity planning
The project’s budget had been more than sufficient to pay for the extra capacity, but because a poor understanding of the infrastructure impact led to improper forecasting, the project ended up at a standstill.
Manage availability and keep your stakeholders happy
If you run out of capacity, you will inevitably encounter availability issues like downtime and performance degradation. End users do not like downtime, and neither do their managers.
There are three variables that are monitored, measured, and analyzed as part of availability management more generally (Valentic).
- Uptime: The availability of a system is the percentage of time the system is “up” (and not degraded), which can be calculated using the following formula: uptime / (uptime + downtime) × 100%. The more components there are in a system, the lower the availability, as a rule.
- Reliability: The length of time a component/service can go before there is an outage that brings it down, typically measured in hours.
- Maintainability: The amount of time it takes for a component/service to be restored in the event of an outage, also typically measured in hours.
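The uptime formula, and the rule that more components mean lower availability, can be illustrated with a short sketch. It assumes that the components sit in series (all must be up for the system to be up) and fail independently, which is a simplification. The sample hours and percentages are hypothetical.

```python
from math import prod

def availability_pct(uptime_hours, downtime_hours):
    """Availability as a percentage: uptime / (uptime + downtime) x 100."""
    return uptime_hours / (uptime_hours + downtime_hours) * 100

def series_availability_pct(component_pcts):
    """Availability of a system whose components must ALL be up.

    Assumes independent failures, so the component availabilities multiply.
    """
    return prod(p / 100 for p in component_pcts) * 100

# Hypothetical month of 720 hours with 2 hours of downtime.
single = availability_pct(718, 2)
# Three hypothetical components, each 99.5% available, chained in series.
chain = series_availability_pct([99.5, 99.5, 99.5])
print(round(single, 2), round(chain, 2))
```

The same first function accepts reliability and maintainability figures: passing the average hours between outages as uptime and the average hours to restore as downtime gives the long-run availability of a component.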
Enter the cloud: changes in the capacity manager role
There can be no doubt – the rise of the public cloud has fundamentally changed the nature of capacity management.
Features of the public cloud | Implications for capacity management |
---|---|
Instant, or near-instant, instantiation | Lead times drop; capacity management is less about ensuring equipment arrives on time. |
Pay-as-you-go services | Capacity no longer needs to be purchased in bulk. Pay only for what you use and shut down instances that are no longer necessary. |
Essentially unlimited scalability | Potential capacity is infinite, but so are potential costs. |
Offsite hosting | Redundancy, but at the price of the increasing importance of your internet connection. |
Vendors will sell you the cloud as a solution to your capacity/availability problems
Traditionally, increases in capacity have come in bursts as a reaction to availability issues. This model inevitably results in overprovisioning, driving up costs. Access to the cloud changes the equation. On-demand capacity means that, ideally, nobody should pay for unused capacity.
Reality check: even in the cloud era, capacity management is necessary
Vendors are likely to nurture a gap between your expectations and reality. That can be damaging.
The cloud reality does not look like the cloud ideal. Even with the ostensibly elastic cloud, vendors like the consistency that longer-term contracts offer. Enter reserved instances: vendors offer lower hourly rates in exchange for a fee for a reserved instance. Usage beyond the reserved amount is billed at a higher hourly rate. In order to determine where that line should be drawn, you should engage in detailed capacity planning. Unfortunately, even when done right, this process will result in some overprovisioning, though it does provide convenience from an accounting perspective. The key is to use on-demand instances where demand is exceptional and bounded. Example: A university registration server that experiences exceptional demand at the start of term but at no other time.
Use best practices to optimize your cloud resources
Even in the era of elasticity, capacity planning is crucial. On-demand instances – the spikes in the graph above – are more expensive per hour, but if your capacity needs vary substantially, reserving instances for all of the space you need can cost even more money. Efficiently planning capacity will help you draw this line.
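Where to draw the line between reserved capacity and pay-as-you-go burst capacity can be framed as a small cost comparison. The sketch below is illustrative only: the rates, the demand profile, and the function names are hypothetical, and real vendor pricing (upfront fees, commitment terms) is more involved than a single hourly rate.

```python
def total_cost(hourly_demand, reserved_units, reserved_rate, burst_rate):
    """Cost of covering demand with a reserved baseline plus pay-as-you-go burst.

    hourly_demand: units of capacity needed in each hour.
    reserved_units: units paid for every hour, whether used or not.
    Rates are hypothetical per-unit hourly prices, not vendor pricing.
    """
    reserved_cost = reserved_units * reserved_rate * len(hourly_demand)
    burst_unit_hours = sum(max(0, d - reserved_units) for d in hourly_demand)
    return reserved_cost + burst_unit_hours * burst_rate

def best_reservation(hourly_demand, reserved_rate, burst_rate):
    """Try every reservation level from 0 up to peak demand; return the cheapest."""
    levels = range(max(hourly_demand) + 1)
    return min(levels,
               key=lambda r: total_cost(hourly_demand, r, reserved_rate, burst_rate))

# Hypothetical registration server: steady load of 4 units, brief spike to 20.
demand = [4] * 95 + [20] * 5
level = best_reservation(demand, reserved_rate=1.0, burst_rate=2.5)
cost_at_best = total_cost(demand, level, 1.0, 2.5)
cost_at_peak = total_cost(demand, max(demand), 1.0, 2.5)
print(level, cost_at_best, cost_at_peak)
```

With this made-up profile, reserving only the steady baseline and bursting for the spike is far cheaper than reserving for peak, even though the burst rate is 2.5 times the reserved rate.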
Evaluate business impact; not all systems are created equal
Limited resources are a reality. Detailed visibility into every single system is often not feasible and could be too much information.
Simple and effective. Sometimes a simple display can convey all of the information necessary to manage critical systems. In cars it is important to know your speed, how much fuel is in the tank, and whether or not you need to change your oil/check your engine.
Where to begin?! Specialized information is sometimes necessary, but it can be difficult to navigate.
Take advantage of a business impact analysis to define and understand your critical services
Ideally, downtime would be minimal. In reality, though, downtime is a part of IT life. It is important to have realistic expectations about its nature and likelihood.
Step | Activity | Description |
---|---|---|
1 | Record applications and dependencies | Utilize your asset management records and document the applications and systems that IT is responsible for managing and recovering during a disaster. |
2 | Define impact scoring scale | Ensure an objective analysis of application criticality by establishing a business impact scale that applies to all applications. |
3 | Estimate impact of downtime | Leverage the scoring criteria from the previous step and establish an estimated impact of downtime for each application. |
4 | Identify desired RTO and RPO | Define what the RTOs/RPOs should be based on the impact of a business interruption and the tolerance for downtime and data loss. |
5 | Determine current RTO/RPO | Conduct tabletop planning and create a flowchart of your current capabilities. Compare your current state to the desired state from the previous step. |
Info-Tech Insight
According to end users, every system is critical and downtime is intolerable. Of course, once they see how much totally eliminating downtime can cost, they might change their tune. It is important to have this discussion to separate the critical from the less critical – but still important – services.
Establish visibility into critical systems
You may have seen “If you can’t measure it, you can’t manage it” or a variation thereof floating around the internet. This adage is consumable and makes sense…doesn’t it?
"It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth."
– W. Edwards Deming, statistician and management consultant, author of The New Economics
While it is true that total monitoring is not absolutely necessary for management, when it comes to availability and capacity – objectively quantifiable service characteristics – a monitoring strategy is unavoidable. Capturing fluctuations in demand, and adjusting for those fluctuations, is among the most important functions of a capacity manager, even if hovering over employees with a stopwatch is poor management.
Solicit needs from line of business managers
Unless you head the world’s most involved IT department (kudos if you do), you’re going to have to determine your needs from the business.
Do | Do not |
---|---|
✓ Develop a positive relationship with business leaders responsible for making decisions. | X Be reactive. |
✓ Make yourself aware of ongoing and upcoming projects. | X Accept capacity/availability demands uncritically. |
✓ Develop expertise in organization-specific technology. | X Ask line of business managers for specific computing requirements unless they have the technical expertise to make informed judgments. |
✓ Make the business aware of your expenses through chargebacks or showbacks. | X Treat IT as an opaque entity where requests go in and services come out (this can lead to irresponsible requests). |
✓ Use your understanding of business projects to predict business needs; do not rely on business leaders’ technical requests alone. | |
Demand: manage or be managed
You might think you can get away with uncritically accepting your users’ demands, but this is not best practice. If you provide it, they will use it.
The company meeting
“I don’t need this much RAM,” the application developer said, implausibly. Titters wafted above the assembled crowd as her IT colleagues muttered their surprise. Heads shook, eyes widened. In fact, as she sat pondering her utterance, the developer wasn’t so sure she believed it herself. Noticing her consternation, the infrastructure manager cut in and offered the RAM anyway, forestalling the inevitable crisis that occurs when seismic internal shifts rock fragile self-conceptions. Until next time, he thought.
"Work expands as to fill the resources available for its completion…"
– C. Northcote Parkinson, quoted in Klimek et al.
Combine historical data with the needs you’ve solicited to holistically project your future needs
Predicting the future is difficult, but when it comes to capacity management, foresight is necessary.
Critical inputs
In order to project your future needs, the following inputs are necessary.
- Usage trends: While it is true that past performance is no indication of future demand, trends are still a good way to validate requests from the business.
- Line of business requests: An understanding of the projects the business has in the pipeline is important for projecting future demand.
- Institutional knowledge: Read between the lines. As experts on information technology, the IT department is well-equipped to translate needs into requirements.
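One way to combine the first two inputs is to extend the historical usage trend and then layer known business projects on top of it. The sketch below assumes a simple linear trend and evenly spaced monthly observations. The usage figures, the project increment, and the function names are hypothetical, and a real forecast may need seasonality or other adjustments.

```python
def linear_trend(values):
    """Least-squares slope and intercept for evenly spaced observations."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
             / sum((x - mean_x) ** 2 for x in xs))
    return slope, mean_y - slope * mean_x

def project_demand(history, months_ahead, planned_increments=None):
    """Extend the historical trend, then add demand from known business projects.

    planned_increments maps a future month offset (1 = next month) to the extra
    capacity a project is expected to need from that month onward.
    """
    planned_increments = planned_increments or {}
    slope, intercept = linear_trend(history)
    forecast = []
    for ahead in range(1, months_ahead + 1):
        trend = intercept + slope * (len(history) - 1 + ahead)
        extra = sum(v for m, v in planned_increments.items() if m <= ahead)
        forecast.append(trend + extra)
    return forecast

# Hypothetical storage usage in TB over six months, plus a project that
# is expected to add 5 TB starting two months from now.
usage_tb = [40, 42, 44, 46, 48, 50]
forecast = project_demand(usage_tb, 3, {2: 5})
print(forecast)
```

Keeping the trend and the project increments as separate inputs makes it easier to challenge either one: the trend validates what the business asks for, and the project list explains departures from the trend.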
Follow best practice guidelines to maximize the efficiency of your availability and capacity management process
Understand how the key frameworks relate and interact
BAI04: Manage availability and capacity
- Current state assessment
- Forecasting based on business requirements
- Risk assessment of planning and implementation of requirements
Availability management
- Determine business requirements
- Match requirements to capabilities
- Address any mismatch between requirements and capabilities in a cost-effective manner
Capacity management
- Monitoring services and components
- Tuning for efficiency
- Forecasting future requirements
- Influencing demand
- Producing a capacity plan
Availability and capacity management
- Conduct a business impact analysis
- Establish visibility into critical systems
- Solicit and incorporate business needs
- Identify and mitigate risks
Disaster recovery and business continuity planning are forms of availability management
The scope of this project is managing day-to-day availability, largely but not exclusively in the context of capacity. For additional important information on availability, see the following Info-Tech projects.
- Develop a Business Continuity Plan
If your focus is on ensuring process continuity in the event of a disaster.
- Establish a Program to Enable Effective Performance Monitoring
If your focus is on flow mapping and transaction monitoring as part of a plan to engage APM vendors.
- Create a Right-Sized Disaster Recovery Plan
If your focus is on hardening your IT systems against major events.
Info-Tech’s approach to availability and capacity management is stakeholder-centered and cloud ready
Phase 1: Conduct a business impact analysis | Phase 2: Establish visibility into core systems | Phase 3: Solicit and incorporate business needs | Phase 4: Identify and mitigate risks |
---|---|---|---|
1.1 Conduct a business impact analysis 1.2 Assign criticality ratings to services | 2.1 Define your monitoring strategy 2.2 Implement monitoring tool/aggregator | 3.1 Solicit business needs 3.2 Analyze data and project future needs | 4.1 Identify and mitigate risks |
Info-Tech offers various levels of support to best suit your needs
DIY Toolkit
“Our team has already made this critical project a priority, and we have the time and capability, but some guidance along the way would be helpful.”
Guided Implementation
“Our team knows that we need to fix a process, but we need assistance to determine where to focus. Some check-ins along the way would help keep us on track.”
Workshop
“We need to hit the ground running and get this project kicked off immediately. Our team has the ability to take this over once we get a framework and strategy in place.”
Consulting
“Our team does not have the time or the knowledge to take this project on. We need assistance through the entirety of this project.”
Diagnostics and consistent frameworks used throughout all four options
Availability & capacity management – project overview
 | Conduct a business impact analysis | Establish visibility into core systems | Solicit and incorporate business needs | Identify and mitigate risks |
---|---|---|---|---|
Best-Practice Toolkit | 1.1 Create a scale to measure different levels of impact 1.2 Assign criticality ratings to services | 2.1 Define your monitoring strategy 2.2 Implement your monitoring tool/aggregator | 3.1 Solicit business needs and gather data 3.2 Analyze data and project future needs | 4.1 Identify and mitigate risks |
Guided Implementations | Call 1: Conduct a business impact analysis | Call 1: Discuss your monitoring strategy | Call 1: Develop a plan to gather historical data; set up plan to solicit business needs Call 2: Evaluate data sources | Call 1: Discuss possible risks and strategies for risk mitigation Call 2: Review your capacity management plan |
Onsite Workshop | Module 1: Conduct a business impact analysis | Module 2: Establish visibility into core systems | Module 3: Develop a plan to project future needs | Module 4: Identify and mitigate risks |
Workshop overview
Contact your account representative or email Workshops@InfoTech.com for more information.
 | Workshop Day 1 | Workshop Day 2 | Workshop Day 3 | Workshop Day 4 |
---|---|---|---|---|
 | Conduct a business impact analysis | Establish visibility into core systems | Solicit and incorporate business needs | Identify and mitigate risks |
Activities | 1.1 Conduct a business impact analysis 1.2 Create a list of critical dependencies 1.3 Identify critical sub-components 1.4 Develop best practices to negotiate SLAs | 2.1 Determine indicators for sub-components 2.2 Establish visibility into components 2.3 Develop strategies to ameliorate visibility issues | 3.1 Gather relevant business-level data 3.2 Gather relevant service-level data 3.3 Analyze historical trends 3.4 Build a list of business stakeholders 3.5 Directly solicit requirements from the business 3.6 Map business needs to technical requirements 3.7 Identify inefficiencies and compare historical data | |
PHASE 1
Conduct a Business Impact Analysis
Step 1.1: Conduct a business impact analysis
This step will walk you through the following activities:
- Record applications and dependencies in the Business Impact Analysis Tool.
- Define a scale to estimate the impact of various applications’ downtime.
- Estimate the impact of applications’ downtime.
This involves the following participants:
- Capacity manager
- Infrastructure team
Outcomes of this step
- Estimated impact of downtime for various applications
Execute a business impact analysis (BIA) as part of a broader availability plan
1.1a Business Impact Analysis Tool
Business impact analyses are an invaluable part of a broader IT strategy. Conducting a BIA benefits a variety of processes, including disaster recovery, business continuity, and availability and capacity management.
Step | Activity | Description |
---|---|---|
1 | Record applications and dependencies | Utilize your asset management records and document the applications and systems that IT is responsible for managing and recovering during a disaster. |
2 | Define impact scoring scale | Ensure an objective analysis of application criticality by establishing a business impact scale that applies to all applications. |
3 | Estimate impact of downtime | Leverage the scoring criteria from the previous step and establish an estimated impact of downtime for each application. |
4 | Identify desired RTO and RPO | Define what the RTOs/RPOs should be based on the impact of a business interruption and the tolerance for downtime and data loss. |
5 | Determine current RTO/RPO | Conduct tabletop planning and create a flowchart of your current capabilities. Compare your current state to the desired state from the previous step. |
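The comparison in the final step, current state versus desired state, amounts to a gap calculation per application. A minimal sketch follows. The class and field names are illustrative assumptions and are not taken from the Business Impact Analysis Tool.

```python
from dataclasses import dataclass

@dataclass
class RecoveryObjectives:
    rto_hours: float  # recovery time objective: tolerated downtime
    rpo_hours: float  # recovery point objective: tolerated data loss

def recovery_gaps(desired, current):
    """Hours by which current capability misses the desired RTO/RPO (0 if it meets it)."""
    return {
        "rto_gap_hours": max(0.0, current.rto_hours - desired.rto_hours),
        "rpo_gap_hours": max(0.0, current.rpo_hours - desired.rpo_hours),
    }

# Hypothetical application: the business wants a 4-hour RTO and 1-hour RPO,
# while tabletop planning shows a 12-hour RTO and 1-hour RPO today.
gaps = recovery_gaps(RecoveryObjectives(4, 1), RecoveryObjectives(12, 1))
print(gaps)
```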
Info-Tech Insight
Engaging in detailed capacity planning for an insignificant service draws time and resources away from more critical capacity planning exercises. Time spent tracking and planning use of the ancient fax machine in the basement is time you’ll never get back.
Control the scope of your availability and capacity management planning project with a business impact analysis
Don’t avoid conducting a BIA because of a perception that it’s too onerous or not necessary. If properly managed, as described in this blueprint, the BIA does not need to be onerous and the benefits are tangible.
A BIA enables you to identify appropriate spend levels, continue to drive executive support, and prioritize disaster recovery planning for a more successful outcome. For example, an Info-Tech survey found that a BIA has a significant impact on setting appropriate recovery time objectives (RTOs) and appropriate spending.
Terms
No BIA: lack of a BIA, or a BIA based solely on the perceived importance of IT services.
BIA: based on a detailed evaluation or estimated dollar impact of downtime.
Source: Info-Tech Research Group; N=70
Select the services you wish to evaluate with the Business Impact Analysis Tool
1.1b 1 hour
In large organizations especially, collating an exhaustive list of applications and services is going to be onerous. For the purposes of this project, a subset should suffice.
Instructions
- Gather a diverse group of IT staff and end users in a room with a whiteboard.
- Solicit feedback from the group. Questions to ask:
  - What services do you regularly use? What do you see others using? (End users)
  - Which service inspires the greatest number of service calls? (IT)
  - What services are you most excited about? (Management)
  - What services are the most critical for business operations? (Everybody)
Input
- Applications/services
Output
- Candidate applications for the business impact analysis
Materials
- Whiteboard
- Markers
Participants
- Infrastructure manager
- Enterprise architect
- Application owners
- End users
Info-Tech Insight
Include a variety of services in your analysis. While it might be tempting to jump ahead and preselect important applications, don’t. The process is inherently valuable, and besides, it might surprise you.
Record the applications and dependencies in the BIA tool
1.1c Use tab 1 of the Business Impact Analysis Tool
- In the Application/System column, list the applications identified for this pilot as well as the Core Infrastructure category. Also indicate the Impact on the Business and Business Owner.
- List the dependencies for each application in the appropriate columns:
  - Hosted On-Premises (In-House) – If the physical equipment is in a facility you own, record it here, even if it is managed by a vendor.
  - Hosted by a Co-Lo/MSP – List any dependencies hosted by a co-lo/MSP vendor.
  - Cloud (includes “as a Service”) – List any dependencies hosted by a cloud vendor.
Note: If there are no dependencies for a particular category, leave it blank.
Example
ID is optional. It is a sequential number by default.
In-House, Co-Lo/MSP, and Cloud dependencies; leave blank if not applicable.
Add notes as applicable – e.g. critical support services.
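If you also want the inventory in a machine-readable form, the same columns map naturally onto a small record type. The sketch below is an assumption-laden illustration, not a representation of the tool itself: the field names, the sample applications, and the shared-dependency helper are all hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ApplicationRecord:
    """One row of a dependency inventory; field names are illustrative only."""
    name: str
    business_owner: str
    impact_on_business: str
    in_house: list = field(default_factory=list)
    colo_msp: list = field(default_factory=list)
    cloud: list = field(default_factory=list)
    notes: str = ""

    def all_dependencies(self):
        # Empty categories contribute nothing, mirroring "leave it blank".
        return self.in_house + self.colo_msp + self.cloud

def shared_dependencies(records):
    """Map each dependency used by more than one application to those applications."""
    users = {}
    for record in records:
        for dep in record.all_dependencies():
            users.setdefault(dep, []).append(record.name)
    return {dep: apps for dep, apps in users.items() if len(apps) > 1}

# Hypothetical entries for illustration.
erp = ApplicationRecord("ERP", "Finance", "Order processing",
                        in_house=["SQL cluster", "SAN"], cloud=["Identity provider"])
portal = ApplicationRecord("Customer portal", "Sales", "Online orders",
                           in_house=["SAN"], cloud=["Identity provider", "CDN"])
shared = shared_dependencies([erp, portal])
print(shared)
```

Dependencies that several applications share are good candidates for closer capacity monitoring, since a shortfall there affects more than one service.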
Define a scoring scale to estimate different levels of impact
1.1d Use tab 2 of the Business Impact Analysis Tool
Modify the Business Impact Scales headings and Overall Criticality Rating terminology to suit your organization. For example, if you don’t have business partners, use that column to measure a different goodwill impact or just ignore that column in this tool (i.e. leave it blank). Estimate the different levels of potential impact (where four is the highest impact and zero is no impact) and record these in the Business Impact Scales columns.
Estimate the impact of downtime for each application
1.1e Use tab 3 of the Business Impact Analysis Tool
In the BIA tab columns for Direct Costs of Downtime, Impact on Goodwill, and Additional Criticality Factors, use the drop-down menu to assign a score of zero to four based on levels of impact defined in the Scoring Criteria tab. For example, if an organization’s ERP is down, and that affects call center sales operations (e.g. ability to access customer records and process orders), the impact might be as described below:
- Loss of Revenue might score a two or three depending on the proportion of overall sales lost due to the downtime.
- The Impact on Customers might be a one or two depending on the extent that existing customers might be using the call center to purchase new products or services, and are frustrated by the inability to process orders.
- The Legal/Regulatory Compliance and Health or Safety Risk might be a zero.
On the other hand, if payroll processing is down, this may not impact revenue, but it certainly impacts internal goodwill and productivity.
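As a rough illustration of the scoring described above, the sketch below records zero-to-four scores for the ERP example and totals them. The category names come from the example; the specific scores picked and the idea of summing them into a single number are assumptions for illustration, not the Business Impact Analysis Tool's actual formula.

```python
# Hypothetical scores (0 = no impact, 4 = highest impact) for an ERP outage.
# The values are illustrative picks from the ranges discussed in the text.
erp_scores = {
    "Loss of Revenue": 3,
    "Impact on Customers": 2,
    "Legal/Regulatory Compliance": 0,
    "Health or Safety Risk": 0,
}


def validate_scores(scores):
    """Raise ValueError if any score falls outside the 0-4 scale."""
    for category, score in scores.items():
        if not 0 <= score <= 4:
            raise ValueError(f"{category}: score {score} is outside 0-4")


def total_impact(scores):
    """Sum the category scores (an assumed aggregation, not the tool's)."""
    validate_scores(scores)
    return sum(scores.values())


print(total_impact(erp_scores))
```

A simple total like this is only a starting point for discussion; your organization may weight some categories more heavily than others.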
Rank service criticality: gold, silver, and bronze
Gold
Mission critical services. An outage is catastrophic in terms of cost or public image/goodwill. Example: trading software at a financial institution.
Silver
Important to daily operations, but not mission critical. Example: email services at any large organization.
Bronze
Loss of these services is an inconvenience more than anything, though they do serve a purpose and will be missed if they are never brought back online. Example: ancient fax machines.
Info-Tech Best Practice
Info-Tech recommends gold, silver, and bronze because of this typology’s near universal recognition. If you would prefer a particular designation (it might help with internal comprehension), don’t hesitate to use that one instead.
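If it helps to make the typology concrete, the following sketch sorts services into gold, silver, and bronze from a single impact score. The cut-off values, the service scores, and the function name `criticality_tier` are all hypothetical; the sorting exercise with stakeholders, not a formula, should make the final call.

```python
# Assumed cut-offs on a hypothetical total impact score; tune to your own scale.
TIER_THRESHOLDS = [(10, "gold"), (5, "silver")]


def criticality_tier(total_score):
    """Return the first tier whose minimum score is met, else bronze."""
    for minimum, tier in TIER_THRESHOLDS:
        if total_score >= minimum:
            return tier
    return "bronze"


# Hypothetical scores for the three examples named in the tier definitions.
services = {"Trading software": 14, "Email": 7, "Fax machines": 2}

# List the most critical services first.
ranked = sorted(services.items(), key=lambda item: item[1], reverse=True)
for name, score in ranked:
    print(f"{name}: {score} -> {criticality_tier(score)}")
```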
Use the results of the business impact analysis to sort systems based on their criticality
1.1f 1 hour
Every organization has its own rules about how to categorize service importance. For some (consumer-facing businesses, perhaps) reputational damage may trump immediate costs.
Instructions
- Gather a group of key stakeholders and project the completed Business Impact Analysis Tool onto a screen for them.
- Share the definitions of gold, silver, and bronze services with them (if they are not familiar), and begin sorting the services by category. Questions to consider:
- How long would it take to notice if a particular service went out?
- How important are the non-quantifiable damages that could come with an outage?
Input
- Results of the business impact analysis exercise
Output
- List of gold, silver, and bronze systems
Materials
- Projector
- Business Impact Analysis Tool
- Capacity Plan Template
Participants
- Infrastructure manager
- Enterprise architect
Leverage the rest of the BIA tool as part of your disaster recovery planning
Disaster recovery planning is a critical activity, and while it is a sort of availability management, it is beyond this project’s scope. You can, however, complete the rest of the business impact analysis (including RTOs and RPOs) as part of a complete disaster recovery package.
See Info-Tech’s Create a Right-Sized Disaster Recovery Plan blueprint for instructions on how to complete your business impact analysis.
Step 1.2: Assign criticality ratings to services
This step will walk you through the following activities:
- Create a list of dependencies for your most important applications.
- Identify important sub-components.
- Use best practices to develop and negotiate SLAs.
This involves the following participants:
- Capacity manager
- Infrastructure team
Outcomes of this step
- List of dependencies of most important applications
- List of important sub-components
- SLAs based on best practices
Determine the base unit of the capacity you’re looking to purchase
Not every IT organization should approach capacity the same way. Needs scale, and larger organizations will inevitably deal in larger quantities.
Large cloud provider | Local traditional business |
---|---|
| |
"Cloud capacity management is not exactly the same as the ITIL version because ITIL has a focus on the component level. I actually don’t do that, because if I did I’d go crazy. There’s too many components in a cloud environment."
– Richie Mendoza, IT Consultant, SMITS Inc.
Consider the relationship between component capacity and service capacity
End users’ thoughts about IT are based on what they see. They are, in other words, concerned with service availability: does the organization have the ability to provide access to needed services?
Service
- CRM
- ERP
Component
- Switch
- SMTP server
- Archive database
- Storage
"You don’t ask the CEO or the guy in charge ‘What kind of response time is your requirement?’ He doesn’t really care. He just wants to make sure that all his customers are happy."
– Todd Evans, Capacity and Performance Management SME, IBM.
One telco solved its availability issues by addressing component capacity issues
CASE STUDY
Industry: Telecommunications
Source: Interview
Coffee and Wi-Fi – a match made in heaven
In tens of thousands of coffee shops around the world, patrons make ample use of complimentary Wi-Fi. Wi-Fi is an important part of customers’ coffee shop experience, whether they’re online to check their email, do a YouTube, or update their Googles. So when one telco that provided Wi-Fi access for thousands of coffee shops started encountering availability issues, the situation was serious.
Wi-Fi, whack-a-mole, and web woes
The team responsible for resolving the issue took an ad hoc approach to resolving complaints, fixing issues as they came up instead of taking a systematic approach.
Resolution
Looking at the network as a whole, the capacity manager took a proactive approach by using data to identify and rank the worst service areas, and then directing the team responsible to fix those areas in order of the worst first, then the next worst, and so on. Soon the availability of Wi-Fi service was restored across the network.
Create a list of dependencies for your most important applications
1.2a 1.5 hours
Instructions
- Work your way down the list of services outlined in step 1, starting with your gold systems. During the first iteration of this exercise select only 3-5 of your most important systems.
- Write the name of each application on a sticky note or at the top of a whiteboard (leaving ample space below for dependency mapping).
- In the first tier below the application, include the specific services that the general service provides.
- This will vary based on the service in question, but an example for email is sending, retrieving, retrieving online, etc.
Input
- List of important applications
Output
- List of critical dependencies
Materials
- Whiteboard
- Markers
- Sticky notes
Participants
- Infrastructure manager
- Enterprise architect
Info-Tech Insight
Dependency mapping can be difficult. Make sure you don’t waste effort creating detailed dependency maps for relatively unimportant services.
Ride sharing cannot work, at least not at maximum effectiveness, without these constituent components. When one or more of these components are absent or degraded, the service will become unavailable. This example illustrates some challenges of capacity management; some of these components are necessary, but beyond the ride-sharing company’s control.
Leverage a sample dependency tree for a common service
Info-Tech Best Practice
Email is an example here not because it is necessarily a “gold system,” but because it is common across industries. This is a useful exercise for any service, but it can be quite onerous, so it should be conducted on the most important systems first.
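One way to capture a dependency tree once the whiteboard is erased is as a nested structure. The sketch below is an assumption-laden example: the specific services (sending, retrieving, retrieving online) come from the exercise above, but which component sits under which service is a hypothetical mapping chosen for illustration.

```python
# Hypothetical dependency tree for email:
# general service -> specific services -> sub-components (bottom layer).
email_tree = {
    "Email": {
        "Sending": {"SMTP server": {}, "Switch": {}},
        "Retrieving": {"Storage": {}, "Switch": {}},
        "Retrieving online": {"Archive database": {}, "Storage": {}},
    }
}


def leaf_components(tree):
    """Return the sorted, de-duplicated bottom layer of a nested tree."""
    leaves = set()
    for name, children in tree.items():
        if children:
            # Not a leaf: keep descending into its dependencies.
            leaves.update(leaf_components(children))
        else:
            leaves.add(name)
    return sorted(leaves)


print(leaf_components(email_tree))
```

The bottom layer returned here is the list of sub-components that the next activity asks you to review.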
Separate the wheat from the chaff; identify important sub-components and separate them from unimportant ones
1.2b 1.5 hours
Use the bottom layer of the pyramid drawn in step 1.2a for a list of important sub-components.
Instructions
- Record a list of the gold services identified in the previous activity. Leave space next to each service for sub-components.
- Go through each relevant sub-component. Highlight those that are critical and could reasonably be expected to cause problems.
- Has this sub-component caused a problem in the past?
- Is this sub-component a bottleneck?
- What could cause this component to fail? Is such an occurrence feasible?
Input
- List of important applications
Output
- List of critical dependencies
Materials
- Whiteboard
- Markers
Participants
- Infrastructure manager
- Enterprise architect
Understand availability commitments with SLAs
With the rise of SaaS, cloud computing, and managed services, critical services and their components are increasingly external to IT.
- IT’s lack of access to the internal working of services does not let them off the hook for performance issues (as much as that might be the dream).
- Vendor management is availability management. Use the dependency map drawn earlier in this phase to highlight the components of critical services that rely on capacity that cannot be managed internally.
- For each of these services ensure that an appropriate SLA is in place. When acquiring new services, ensure that the vendor SLA meets business requirements.
In terms of service provision, capacity management is a form of availability management. Not all availability issues are capacity issues, but all capacity issues are availability issues.
Info-Tech Insight
Capacity issues will always cause availability issues, but availability issues are not inherently capacity issues. Availability problems can stem from outages unrelated to capacity (e.g. power or vendor outages).
Use best practices to develop and negotiate SLAs
1.2c 20 minutes per service
When signing contracts with vendors, you will be presented with an SLA. Ensure that it meets your requirements.
- Use the business impact analysis conducted in this project’s first step to determine your requirements. How much downtime can you tolerate for your critical services?
- Once you have been presented with an SLA, be sure to scour it for tricks. Remember, just because a vendor offers “five nines” of availability doesn’t mean that you’ll actually get that much uptime. It could be that the vendor is comfortable eating the cost of downtime or that the contract includes provisions for planned maintenance. Whether or not the vendor anticipated your outage does little to mitigate the damage an outage can cause to your business, so be careful of these provisions.
- Ensure that the person ultimately responsible for the SLA (the approver) understands the limitations of the agreement and the implications for availability.
Input
- List of external component dependencies
Output
- SLA requirements
Materials
- Whiteboard
- Markers
Participants
- Infrastructure manager
- Enterprise architect
Info-Tech Insight
Vendors are sometimes willing to eat the cost of violating SLAs if they think it will get them a contract. Be careful with negotiation. Just because the vendor says they can do something doesn’t make it true.
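To see what an availability figure means in practice, it can help to convert the percentage into minutes of permitted downtime. The sketch below does that arithmetic for a 365-day year. The function name and the planned-maintenance figure in the usage lines are assumptions for illustration; real SLAs define their own measurement windows and exclusions.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; ignores leap years


def allowed_downtime_minutes(availability_percent, planned_maintenance_minutes=0.0):
    """Minutes per year a service can be down without breaching the SLA.

    planned_maintenance_minutes models maintenance windows that the contract
    excludes from the availability calculation (an assumed provision).
    """
    unavailable_fraction = 1 - availability_percent / 100
    return MINUTES_PER_YEAR * unavailable_fraction + planned_maintenance_minutes


# "Five nines" on paper: a little over five minutes a year.
print(round(allowed_downtime_minutes(99.999), 3))

# The same SLA with a hypothetical four hours of excluded maintenance per month.
print(round(allowed_downtime_minutes(99.999, planned_maintenance_minutes=4 * 60 * 12), 3))
```

Comparing the two results shows how much a planned-maintenance provision can widen the gap between the advertised figure and the uptime you actually get.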
Negotiate internal SLAs using Info-Tech’s rigorous process
Talking past each other can drive misalignment between IT and the business, inconveniencing all involved. Quantify your needs through an internal SLA as part of a comprehensive availability management plan.
See Info-Tech’s Improve IT-Business Alignment Through an Internal SLA blueprint for instructions on why you should develop internal SLAs and the potential benefits they bring.
If you want additional support, have our analysts guide you through this phase as part of an Info-Tech workshop.
Book a workshop with our Info-Tech analysts:
- To accelerate this project, engage your IT team in an Info-Tech workshop with an Info-Tech analyst team.
- Info-Tech analysts will join you and your team onsite at your location or welcome you to Info-Tech’s historic Toronto office to participate in an innovative onsite workshop.
- Contact your account manager (www.infotech.com/account), or email Workshops@InfoTech.com for more information.
The following are sample activities that will be conducted by Info-Tech analysts with your team:
1.2
Create a list of dependencies for your most important applications
Using the results of the business impact analysis, the analyst will guide workshop participants through a dependency mapping exercise that will eventually populate the Capacity Plan Template.
Phase 1 Guided Implementation
Call 1-888-670-8889 or email GuidedImplementations@InfoTech.com for more information.
Complete these steps on your own, or call us to complete a guided implementation. A guided implementation is a series of 2-3 advisory calls that help you execute each phase of a project. They are included in most advisory memberships.
Guided Implementation 1: Conduct a business impact analysis. Proposed Time to Completion: 1 week | |
---|---|
Step 1.1: Create a scale to measure different levels of impact. Review your findings with an analyst: Discuss how you arrived at the rating of your critical systems and their dependencies. Consider whether your external SLAs are appropriate. Then complete these activities… With these tools & templates: Business Impact Analysis Tool | Step 1.2: Assign criticality ratings to services. Review your findings with an analyst: Discuss how you arrived at the rating of your critical systems and their dependencies. Consider whether your external SLAs are appropriate. Then complete these activities… With these tools & templates: Capacity Snapshot Tool |
Phase 1 Results & Insights: | |
PHASE 2
Establish Visibility Into Core Systems
Step 2.1: Define your monitoring strategy
This step will walk you through the following activities:
- Determine the indicators you should be tracking for each sub-component.
This involves the following participants:
- Capacity manager
- Infrastructure team
Outcomes of this step
- List of indicators to track for each sub-component
Data has its significance—but also its limitations
The rise of big data can be a boon for capacity managers, but be warned: not all data is created equal. Bad data can lead to bad decisions – and unemployed capacity managers.
Your findings are only as good as your data. Remember: garbage in, garbage out. There are three characteristics of good data:*
- Accuracy: is the data exact and correct? More detail and confidence is better.
- Reliability: is the data consistent? In other words, if you run the same test twice will you get the same results?
- Validity: is the information gleaned believable and relevant?
*National College of Teaching & Leadership, “Reliability and Validity”
"Data is king. Good data is absolutely essential to [the capacity manager] role."
– Adrian Blant, Independent Capacity Consultant, IT Capability Solutions
Info-Tech Best Practice
Every organization’s data needs are different; your data needs are going to be dictated by your services, delivery model, and business requirements. Make sure you don’t confuse volume with quality, even if others in your organization make that mistake.
Take advantage of technology to establish visibility into your systems
Managing your availability and capacity involves important decisions about what to monitor and how thresholds should be set.
- Use the list of critical applications developed through the business impact analysis and the list of components identified in the dependency mapping exercise to produce a plan for effectively monitoring component availability and capacity.
- The nature of IT service provision – the multitude of vendors providing hardware and services necessary for even simple IT services to work effectively – means that it is unlikely that capacity management will be visible through a single pane of glass. In other words, “email” and “CRM” don’t have a defined capacity. It always depends.
- Establishing visibility into systems involves identifying what needs to be tracked for each component.
Too much monitoring can be as bad as the inverse
In 2013, a security breach at US retailer Target compromised more than 70 million customers’ data. The company received an alert, but it was thought to be a false positive because the monitoring system produced so many false and redundant alerts. As a result of the daily deluge, staff did not respond to the breach in time.
Info-Tech Insight
Don’t confuse monitoring with management. While establishing visibility is a crucial step, it is only part of the battle. Move on to this project’s next phase to explore opportunities to improve your capacity/availability management process.
Determine the indicators you should be tracking for each sub-component
2.1a Tab 3 of the Capacity Snapshot Tool
It is nearly impossible to overstate the importance of data to the process of availability and capacity management. But the wrong data will do you no good.
Instructions
- Open the Capacity Snapshot Tool to tab 2. The tool should have been populated in step 1.2 as part of the component mapping exercise.
- For each service, determine which metric(s) would most accurately tell the component’s story. Consider the following questions when completing this activity (you may end up with more than one metric):
- How would the component’s capacity be measured (storage space, RAM, bandwidth, vCPUs)?
- Is the metric in question actionable?
Info-Tech Insight
Bottlenecks are bad. Use the Capacity Snapshot Tool (or another tool like it) to ensure that when the capacity manager leaves (on vacation, to another role, for good) the knowledge that they have accumulated does not leave as well.
Understand the limitations of this approach
Although we’ve striven to make it as easy as possible, this process will inevitably be cumbersome for organizations with a complicated set of software, hardware, and cloud services.
Tracking every single component in significant detail will produce a lot of noise for each bit of signal. The approach outlined here addresses that concern in two ways:
- A focus on gold services
- A focus on sub-components that have a reasonable likelihood of being problematic in the future.
Despite this effort, however, managing capacity at the component level is a daunting task. Ultimately, tools provided by vendors like SolarWinds and AppDynamics will fill in some of the gaps. Nevertheless, an understanding of the conceptual framework underlying availability and capacity management is valuable.
Step 2.2: Implement your monitoring tool/aggregator
This step will walk you through the following activities:
- Clarify visibility.
- Determine whether or not you have sufficiently granular visibility.
- Develop strategies to ameliorate any visibility issues.
This involves the following participants:
- Capacity manager
- Infrastructure team
- Applications personnel
Outcomes of this step
- Method for measuring and monitoring critical sub-components
Companies struggle with performance monitoring because 95% of IT shops don’t have full visibility into their environments
CASE STUDY
Industry: Financial Services
Source: AppDynamics
Challenge
- Users are quick to provide feedback when there is downtime or application performance degradation.
- The challenge for IT teams is that while they can feel the pain, they don’t have visibility into the production environment and thus cannot identify where the pain is coming from.
- The most common solution that organizations rely on is leveraging the log files for issue diagnosis. However, this method is slow and often unable to pinpoint the problem areas, leading to delays in problem resolution.
Solution
- Application and infrastructure teams need to work together to develop infrastructure flow maps and transaction profiles.
- These diagrams will highlight the path that each transaction travels across your infrastructure.
- Ideally at this point, teams will also capture latency breakdowns across every tier that the business transaction flows through.
- This will ultimately kick start the baselining process.
Results
- Ninety-five percent of IT departments don’t have full visibility into their production environment. As a result, a slow business transaction will often require a war-room approach where SMEs from across the organization gather to troubleshoot.
- Having visibility into the production environment through infrastructure flow mapping and transaction profiling will help IT teams pinpoint problems.
- At the very least, teams will be able to identify common problem areas and expedite the root-cause analysis process.
Source: “Just how complex can a Login Transaction be? Answer: Very!,” AppDynamics
Monitor your critical sub-components
Establishing a monitoring plan for your capacity involves answering two questions: can I see what I need to see, and can I see it with sufficient granularity?
- Having the right tool for the job is an important step towards effective capacity and availability management.
- Application performance management tools (APMs) are essential to the process, but they tend to be highly specific and vertically oriented, like using a microscope.
- Some product families can cover a wider range of capacity monitoring functions (SolarWinds, for example). It is still important, however, to codify your monitoring needs.
"You don’t use a microscope to monitor an entire ant farm, but you might use many microscopes to monitor specific ants."
– Fred Chagnon, Research Director, Infrastructure Practice, Info-Tech Research Group
Monitor your sub-components: clarify visibility
2.2a Tab 2 of the Capacity Snapshot Tool
The next step in capacity management is establishing whether or not visibility (in the broad sense) is available into critical sub-components.
Instructions
- Open the Capacity Snapshot Tool and record the list of sub-components identified in the previous step.
- For each sub-component answer the following question:
- Do I have easy access to the information I need to monitor to ensure this component remains available?
- What tool provides the information? Where can it be found?
Monitor your sub-components: determine whether or not you have sufficiently granular visibility
2.2b Tab 2 of the Capacity Snapshot Tool
Like ideas and watches, not all types of visibility are created equal. Ensure that you have access to the right information to make capacity decisions.
Instructions
- For each of the sub-components clarify the appropriate level of granularity for the visibility gained to be useful. In the case of storage, for example, is raw usage (in gigabytes) sufficient, or do you need a breakdown of what exactly is taking up the space? The network might be more complicated.
- Record the details of this ideation in the adjacent column.
- Select “Yes” or “No” from the drop-down menu to track the status of each sub-component.
For most mobile phone users, this breakdown is sufficient. For some, more granularity might be necessary.
Info-Tech Insight
Make note of monitoring tools and strategies. If anything changes, be sure to re-evaluate the visibility status. An outdated spreadsheet can lead to availability issues if management is unaware of looming problems.
Develop strategies to ameliorate any visibility issues
2.2c 1 hour
The Capacity Snapshot Tool color-codes your components by status. Green – visibility and granularity are both sufficient; yellow – visibility exists, though not at sufficient granularity; and red – visibility does not exist at all.
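The color-coding rule above can be written down as a small function. This is a sketch of the rule as described, not the tool's internals; the sub-component names and their answers are hypothetical.

```python
def visibility_status(has_visibility, sufficient_granularity):
    """Map the two yes/no answers to the green/yellow/red status."""
    if not has_visibility:
        return "red"  # visibility does not exist at all
    if sufficient_granularity:
        return "green"  # visibility and granularity are both sufficient
    return "yellow"  # visibility exists, but not at sufficient granularity


# Hypothetical answers recorded for three sub-components:
# (has visibility?, granularity sufficient?)
answers = {
    "Storage": (True, True),
    "Network": (True, False),
    "SMTP server": (False, False),
}

statuses = {name: visibility_status(*flags) for name, flags in answers.items()}

# Yellow and red sub-components are the ones to bring to the whiteboard.
needs_attention = [name for name, status in statuses.items() if status != "green"]
print(statuses)
print(needs_attention)
```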
Instructions
- Write each of the yellow and red sub-components on a whiteboard or piece of chart paper.
- Brainstorm amelioration strategies for each of the problematic sub-components.
- Does the current monitoring tool have sufficient functionality?
- Does it need to be further configured/customized?
- Do we need a whole new tool?
Input
- Sub-components
- Capacity Snapshot Tool
Output
- Amelioration strategies
Materials
- Whiteboard
- Markers
- Capacity Snapshot Tool
Participants
- Infrastructure manager
Info-Tech Best Practice
It might be that there is no amelioration strategy. Make note of this difficulty and highlight it as part of the risk section of the Capacity Plan Template.
See Info-Tech’s projects on storage and network modernization for additional details
Leverage other products for additional details on how to modernize your network and storage services.
The process of modernizing the network is fraught with vestigial limitations. Develop a program to gather requirements and plan.
As part of the blueprint, Modernize Enterprise Storage, the Modernize Enterprise Storage Workbook includes a section on storage capacity planning.
The following are sample activities that will be conducted by Info-Tech analysts with your team:
2.2
Develop strategies to ameliorate visibility issues
The analyst will guide workshop participants in brainstorming potential solutions to visibility issues and record them in the Capacity Snapshot Tool.
Phase 2 Guided Implementation
Guided Implementation 2: Establish visibility into core systems. Proposed Time to Completion: 3 weeks | |
---|---|
Step 2.1: Define your monitoring strategy. Review your findings with an analyst: Discuss your monitoring strategy and ensure you have sufficient visibility for the needs of your organization. Then complete these activities… With these tools & templates: | Step 2.2: Implement your monitoring tool/aggregator. Review your findings with an analyst: Discuss your monitoring strategy and ensure you have sufficient visibility for the needs of your organization. Then complete these activities… With these tools & templates: |
Phase 2 Results & Insights: | |
PHASE 3
Solicit and Incorporate Business Needs
Step 3.1: Solicit business needs and gather data
This step will walk you through the following activities:
- Build relationships with business stakeholders.
- Analyze usage data and identify trends.
- Correlate usage trends with business needs.
This involves the following participants:
- Capacity manager
- Infrastructure team members
- Business stakeholders
Outcomes of this step
- System for involving business stakeholders in the capacity planning process
- Correlated data on business level, service level, and infrastructure level capacity usage
Summarize your capacity planning activities in the Capacity Plan Template
The availability and capacity management summary card pictured here is a handy way to capture the results of the activities undertaken in the following phases. Note its contents carefully, and be sure to record specific outputs where appropriate. One such card should be completed for each of the gold services identified in the project’s first phase. Make note of the results of the activities in the coming phase; they will help you populate the Capacity Snapshot Tool.
Info-Tech Best Practice
The Capacity Plan Template is designed to be a part of a broader mapping strategy. It is not a replacement for a dedicated monitoring tool.
Analyze historical trends as a crucial source of data
The first place to look for information about your organization is not industry benchmarks or your gut (though those might both prove useful).
- Where better to look than internally? Use the data you’ve gathered from your APM tool or other sources to understand your historical capacity needs and to highlight any periods of unavailability.
- Consider monitoring the status of the capacity of each of your crucial components. The nature of this monitoring will vary based on the component in question. It can range from a rough Excel sheet all the way to a dedicated application performance monitoring tool.
"In all cases the very first thing to do is to look at trending…The old adage is ‘you don’t steer a boat by its wake,’ however it’s also true that if something is growing at, say, three percent a month and it has been growing at three percent a month for the last twelve months, there’s a fairly good possibility that it’s going to carry on going in that direction."
– Mike Lynch, Consultant, CapacityIQ
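The three-percent-a-month example lends itself to a quick projection. The sketch below assumes steady compound growth, which is exactly the assumption the quote warns may not hold forever; the 60 TB and 100 TB figures and the function names are hypothetical.

```python
import math


def project_usage(current, monthly_growth_rate, months):
    """Usage after the given number of months of compound growth."""
    return current * (1 + monthly_growth_rate) ** months


def months_until_full(current, capacity, monthly_growth_rate):
    """Whole months until usage reaches capacity, or None if it never will."""
    if current >= capacity:
        return 0
    if monthly_growth_rate <= 0:
        return None
    exact = math.log(capacity / current) / math.log(1 + monthly_growth_rate)
    return math.ceil(exact)


# Hypothetical storage pool: 60 TB used of 100 TB, growing 3% a month.
print(round(project_usage(60, 0.03, 12), 2))  # projected usage in a year
print(months_until_full(60, 100, 0.03))       # months before the pool is full
```

Treat the result as a prompt for conversation with the business rather than a forecast to be trusted blindly.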
Gather relevant data at the business level
3.1a 2 hours per service
A holistic approach to capacity management involves peering beyond the beaded curtain partitioning IT from the rest of the organization and tracking business metrics.
Instructions
- Your service/application owners know how changes in business activities impact their systems. Business level capacity management involves responding to those changes. Ask service/application owners what changes will impact their capacity. Examples include:
- Business volume (net new customers, number of transactions)
- Staff changes (new hires, exits, etc.)
Input
- Brainstorming
- List of gold services
Output
- Business level data
Materials
- In-house solution or commercial tool
Participants
- Capacity manager
- Application/service owners
Gather relevant data at the service level
3.1b 2 hours per service
One level of abstraction down is the service level. Recall that service level capacity management is about ensuring that IT is meeting SLAs in its service provision.
Instructions
- There should be internal SLAs for each service IT offers. (If not, that’s a good place to start. See Info-Tech’s research on the subject.) Prod each of your service owners for information on the metrics that are relevant for their SLAs. Consider the following:
- Peak hours, requests per second, etc.
- This will usually include some APM data.
Input
- Brainstorming
- List of gold services
Output
- Service level data
Materials
- In-house solution or commercial tool
Participants
- Capacity manager
- Application/service owners
Leverage the visibility into your infrastructure components and compare all of your data over time
You established visibility into your components in the second phase of this project. Use this data, and that gathered at the business and service levels, to begin analyzing your demand over time.
- Different organizations will approach this issue differently. Those with a complicated service catalog and a dedicated capacity manager might employ a tool like TeamQuest. If your operation is small, or you need to get your availability and capacity management activities underway as quickly as possible, you might consider using simple spreadsheet software like Excel.
- If you choose the latter option, select a level of granularity (monthly, weekly, etc.) and produce a line graph in Excel.
- Example: Employee count (business metric)
Jan | Feb | Mar | Apr | May | June | July |
---|---|---|---|---|---|---|
74 | 80 | 79 | 83 | 84 | 100 | 102 |
Note: the strength of this approach is that it is easy to visualize. Use the same timescale to facilitate simple comparison.
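For readers who prefer a script to a spreadsheet, the employee-count example can also be summarized with a straight-line trend. The sketch below fits an ordinary least-squares line to the seven monthly values from the example; treating the growth as linear and extending it to August are assumptions for illustration.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "June", "July"]
employees = [74, 80, 79, 83, 84, 100, 102]  # values from the example above


def fit_line(values):
    """Least-squares slope and intercept, using 0, 1, 2, ... as the x-axis."""
    n = len(values)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(values) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(employees)
august_estimate = intercept + slope * len(employees)  # next point on the line

print(f"Average change: {slope:.1f} employees per month")
print(f"Straight-line estimate for August: {august_estimate:.0f}")
```

Note that the jump between May and June dominates the fitted slope; a step change like that is worth raising with the business rather than smoothing over.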
Manage, don’t just monitor; mountains of data need to be turned into information
Information lets you make a decision. Understand the questions you don’t need to ask, and ask the right ones.
"Often what is really being offered by many analytics solutions is just more data or information – not insights."
– Brent Dykes, Director of Data Strategy, Domo
Info-Tech Best Practice
You can have all the data in the world and absolutely nothing valuable to add. Don’t fall for this trap. Use the activities in this phase to structure your data collection operation and ensure that your organization’s availability and capacity management plan is data driven.
Analyze historical trends and track your services’ status
3.1c Tab 3 of the Capacity Snapshot Tool
At-a-glance – it’s how most executives consume all but the most important information. Create a dashboard that tracks the status of your most important systems.
Instructions
- Consult infrastructure leaders for information about lead times for new capacity for relevant sub-components and include that information in the tool.
- Look to historical lead times. (How long does it traditionally take to get more storage?)
- If you’re not sure, contact an in-house expert or speak to your vendor.
This tool collates and presents information gathered from other sources. It is not a substitute for a performance monitoring tool.
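One way to reason about lead times is to compare them with how long the current headroom is likely to last. The sketch below is illustrative Python, not the logic of the Capacity Snapshot Tool; the red/yellow/green rule, the field names, and the sample figures are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    capacity: float            # total capacity, in any consistent unit
    used: float                # current usage, same unit
    growth_per_month: float    # observed average monthly growth in usage
    lead_time_months: float    # time needed to source additional capacity


def months_of_headroom(c: Component) -> float:
    """Months until usage reaches capacity at the current growth rate."""
    if c.growth_per_month <= 0:
        return float("inf")  # flat or shrinking demand never exhausts capacity
    return max(c.capacity - c.used, 0) / c.growth_per_month


def status(c: Component, safety_factor: float = 2.0) -> str:
    """Assumed rule: red if headroom is no longer than the lead time,
    yellow if it is no longer than safety_factor times the lead time."""
    headroom = months_of_headroom(c)
    if headroom <= c.lead_time_months:
        return "red"
    if headroom <= safety_factor * c.lead_time_months:
        return "yellow"
    return "green"


# Made-up sample data for illustration only.
components = [
    Component("storage array", capacity=100.0, used=90.0,
              growth_per_month=4.0, lead_time_months=3.0),
    Component("web tier RAM", capacity=512.0, used=300.0,
              growth_per_month=40.0, lead_time_months=3.0),
    Component("backup network", capacity=10.0, used=4.0,
              growth_per_month=0.1, lead_time_months=1.0),
]

for c in components:
    print(f"{c.name}: {status(c)}")
```

Tying the status to lead time rather than to a fixed utilization percentage reflects the point above: a component is only at risk if more capacity cannot arrive before it runs out.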
Build a list of key business stakeholders
3.1d 10 minutes
Stakeholder analysis is crucial. Lines of authority can be diffuse. Understand who needs to be involved in the capacity management process early on.
Instructions
- With the infrastructure team, brainstorm a group of departments, roles, and people who may impact demand on capacity.
- Go through the list with your team and identify stakeholders from two groups:
- Line of business: who in the business makes use of the service?
- Application owner: who in IT is responsible for ensuring the service is up?
Input
- Gold systems
- Personnel Information
Output
- List of key business stakeholders
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Best Practice
Consider which departments are most closely aligned with the business processes that fuel demand. Prioritize those that have the greatest impact. Consider the stakeholders who will make purchasing decisions for increasing infrastructure capacity.
Organize stakeholder meetings
3.1e 10 hours
Establishing a relationship with your stakeholders is a necessary step in managing your capacity and availability.
Instructions
- Gather as many of the stakeholders identified in the previous activity as you can and present information on availability and capacity management.
- If you can’t get everyone in the same room, a virtual meeting or even an email blast could get the job done.
- Consider highlighting the trade-offs between cost and availability.
Input
- List of business stakeholders
- Hard work
Output
- Working relationship, trust
- Regular meetings
Materials
- Work ethic
- Executive brief
Participants
- Capacity manager
- Business stakeholders
Info-Tech Insight
The best capacity managers develop new business processes that more closely align their role with business stakeholders. Building these relationships takes hard work, and you must first earn the trust of the business.
Bake stakeholders into the planning process
3.1f Ongoing
Convince, don’t coerce. Stakeholders want the same thing you do. Bake them into the planning process as a step towards this goal.
- Develop a system to involve stakeholders regularly in the capacity planning process.
- Your system will vary depending on the structure and culture of your organization.
- See the case study on the following slide for ideas.
- It may be as simple as setting a recurring reminder in your own calendar to touch base with stakeholders.
- Ensure stakeholders have reasonable expectations about IT’s available resources, the costs of providing capacity, and the lead times required to source additional needed capacity.
Input
- List of business stakeholders
- Ideas
Output
- Capacity planning process that involves stakeholders
Materials
- Meeting rooms
Participants
- Capacity manager
- Business stakeholders
- Infrastructure team
A capacity manager in financial services wrangled stakeholders and produced results
CASE STUDY
Industry: Financial Services
Source: Interview
In financial services, availability is king
In the world of financial services, availability is absolutely crucial. High-value trades occur at all hours, and any institution that suffers outages runs the risk of losing tens of thousands of dollars, not to mention reputational damage.
People know what they want, but sometimes they have to be herded
While line of business managers and application owners understand the value of capacity management, it can be difficult to establish the working relationship necessary for a fruitful partnership.
Proactively building relationships keeps services available
The capacity manager built relationships with all the department heads on the business side and with all the application owners.
- He met with department heads quarterly.
- He met with application owners and business liaisons monthly.
He established a steering committee for capacity.
He invited stakeholders to regular capacity planning meetings.
- The first half of each meeting was high-level outlook, such as business volume and IT capacity utilization, and included stakeholders from other departments.
- The second half of the meeting was more technical and served the needs of the infrastructure team.
He scheduled lunch and learn sessions with business analysts and project managers.
- These are the gatekeepers of information, and should know that IT needs to be involved when things come down the pipeline.
Step 3.2: Analyze data and project future needs
This step will walk you through the following activities:
- Solicit needs from the business.
- Map business needs to technical requirements, and technical requirements to infrastructure requirements.
- Identify inefficiencies in order to remedy them.
- Compare the data across business, component, and service levels, and project your capacity needs.
This involves the following participants:
- Capacity manager
- Infrastructure team members
- Business stakeholders
Outcomes of this step
- Model of how business processes relate to technical requirements and their demand on infrastructure
- Method for projecting future demand for your organization’s infrastructure
- Comparison of current capacity usage to projected demand
“Nobody tells me anything!” – the capacity manager’s lament
Sometimes “need to know” doesn’t register with sales or marketing. Nearly every infrastructure manager can share a story about a time when someone has made a decision that has critically impacted IT infrastructure without letting anyone in IT in on the “secret.”
In brief
Imagine working for a media company as an infrastructure capacity manager. Now imagine that the powers that be have decided to launch a content-focused web service. Seems like something they would do, right? Now imagine you find out about it the same way the company’s subscribers do. This actually happened – and it shouldn’t have. But a similar lack of alignment makes this a real possibility for any organization. If you don’t establish a systematic plan for soliciting and incorporating business requirements, prepare to lose a chunk of your free time. The business should never be able to say, in response to “nobody tells me anything,” “nobody asked.”
Directly solicit requirements from the business
3.2a 30 minutes per stakeholder
Once you’ve established, firmly, that everyone’s on the same team, meet individually with the stakeholders to assess capacity.
Instructions
- Schedule a one-on-one meeting with each line of business manager (stakeholders identified in 3.1). Ideally this will be recurring.
- Experienced capacity managers suggest doing this monthly.
- Ask about upcoming changes, for example: what are some upcoming major initiatives?
- Is the department going to expand or contract in a noticeable way?
- Have customers taken to a particular product more than others?
Input
- Stakeholder opinions
Output
- Business requirements
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Insight
Sometimes line of business managers will evade or ignore you when you come knocking. They do this because they don’t know and they don’t want to give you the wrong information. Explain that a best guess is all you can ask for and allay their fears.
Below, you will find more details about what to look for when soliciting information from the line of business manager you’ve roped into your scheme.
- Consider the following:
- Projected sales pipeline
- Business growth
- Seasonal cycles
- Marketing campaigns
- New applications and features
- New products and services
Directly solicit requirements from the business (optional)
3.2a 1 hour
IT staff and line of business staff come with different skillsets. This can lead to confusion, but it doesn’t have to. Develop effective information solicitation techniques.
Instructions
- Gather your IT staff in a room with a whiteboard. As a group, select a gold service/line of business manager you would like to use as a “practice dummy.”
- Have everyone write down a question they would ask of the line of business representative in a hypothetical business/service capacity discussion.
- As a group discuss the merits of the questions posed:
- Are they likely to yield productive information?
- Are they too vague or specific?
- Is the person in question likely to know the answer?
- Is the information requested a guarded trade secret?
Input
- Workshop participants’ ideas
Output
- Interview skills
Materials
- Whiteboard
- Markers
- Sticky notes
Participants
- Capacity manager
- Infrastructure staff
Map business needs to technical requirements, and technical requirements to infrastructure requirements
3.2b 5 hours
When it comes to mapping technical requirements, IT alone has the ability to effectively translate business needs.
Instructions
- Use your notes from stakeholder meetings to assess the impact of any changes on gold systems.
- For each system, brainstorm with infrastructure staff (and any technical experts as necessary) about what the information gleaned from stakeholder discussions means for that system. Consider the following discussion points:
- How has demand for the service been trending? Does it match what the business is telling us?
- Have we had availability issues in the past?
- Has the business been right with their estimates in the past?
- E.g. how much RAM does a new email user require?
Input
- Business needs
Output
- Technical and infrastructure requirements
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Insight
Adapt the analysis to the needs of your organization. One capacity manager called the one-to-one mapping of business process to infrastructure demand the Holy Grail of capacity management. If this level of precision isn’t attainable, develop your own working estimates using the higher-level data.
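The RAM-per-email-user question from this activity can be turned into a simple working estimate. The following Python sketch assumes per-unit resource figures that are placeholders, not measurements; substitute values derived from your own monitoring data. The function names and the hiring scenario are hypothetical.

```python
# Per-unit infrastructure demand for one business metric (a new email user).
# All figures are placeholders for illustration, not measured values.
PER_EMAIL_USER = {
    "ram_gb": 0.25,
    "storage_gb": 50.0,
    "licenses": 1,
}


def infrastructure_demand(new_units, per_unit):
    """Translate a change in a business metric into component-level demand."""
    return {resource: amount * new_units for resource, amount in per_unit.items()}


def shortfall(demand, available):
    """Return only the resources where demand exceeds what is currently free."""
    return {
        resource: needed - available.get(resource, 0)
        for resource, needed in demand.items()
        if needed > available.get(resource, 0)
    }


# Hypothetical scenario: the business reports a plan to hire 40 people.
demand = infrastructure_demand(40, PER_EMAIL_USER)
free_capacity = {"ram_gb": 16.0, "storage_gb": 1500.0, "licenses": 25}

print("Additional demand:", demand)
print("Shortfall:", shortfall(demand, free_capacity))
```

Even a rough table of per-unit figures like this gives the team a shared starting point, and it can be refined as historical data shows how accurate each figure was.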
Avoid putting too much faith in the cloud as a solution to your problem
Has the rise of on-demand, functionally unlimited services eliminated the need for capacity and availability management?
Capacity management: The role of the capacity manager is changing, but it still has a purpose.
Availability management: Ensuring services are available is still IT’s wheelhouse, even if that means a shift to a brokerage model.
Info-Tech Insight
The cloud comes at the cost of detailed performance data. Sourcing a service through an SLA with a third party increases the need to perform your own performance testing of gold level applications. See performance monitoring.
Beware Parkinson’s law
As a consequence of our infinite capacity for creativity, people have the enviable skill of making work. In 1955, C. Northcote Parkinson pointed out this fact in The Economist. What are the implications for capacity management?
"It is a commonplace observation that work expands so as to fill the time available for its completion. Thus, an elderly lady of leisure can spend the entire day in writing and despatching a postcard to her niece at Bognor Regis. An hour will be spent in finding the postcard, another in hunting for spectacles, half-an-hour in a search for the address, an hour and a quarter in composition, and twenty minutes in deciding whether or not to take an umbrella when going to the pillar-box in the next street."
C. Northcote Parkinson, The Economist, 1955
Info-Tech Insight
If you give people lots of capacity, they will use it. Most shops are overprovisioned, and in some cases that’s throwing perfectly good money away. Don’t be afraid to prod if someone requests something that doesn’t seem right.
Optimally align demand and capacity
When it comes to managing your capacity, look for any additional efficiencies.
Questions to ask:
- Are there any infrastructure services that are not being used to their full potential, sitting idle, or allocated to non-critical or zombie functions?
- Are you managing your virtual servers? If, for example, you experience a seasonal spike in demand, are you leaving virtual machines running after the fact?
- Do your organization’s policies and your infrastructure setup allow for the use of development resources for production during periods of peak demand?
- Can you make organizational or process changes in order to satisfy demand more efficiently?
In brief
Who isn’t a sports fan? Big games mean big stakes for pool participants and armchair quarterbacks—along with pressure on the network as fans stream games from their work computers. One organization suffered from this problem, and, instead of taking a hardline and banning all streams, opted to stream the game on a large screen in a conference room where those interested could work for its duration. This alleviated strain on the network and kept staff happy.
Shutting off an idle cloud to cut costs
CASE STUDY
Industry: Professional Services
Source: Interview
24/7 AWS = round-the-clock costs
A senior developer realized that his development team had been leaving AWS instances running without any specific reason.
Why?
The development team appreciated the convenience of an always-on instance and, because the people spinning them up did not handle costs, the problem wasn’t immediately apparent.
Resolution
In his spare time over the course of a month, the senior developer wrote a program to manage the servers, including shutting them down during times when they were not in use and providing remote-access start-up when required. His team alone saved $30,000 in costs over the next six months, and his team lead reported that it would have been more than worth paying the team to implement such a project on company time.
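The source does not describe how the developer's program worked. As a hedged illustration of the general idea, the Python sketch below decides whether a development instance should be running based on an assumed working-hours schedule. The schedule, the function names, and the keep-alive override are hypothetical, and the actual start/stop calls to a cloud provider are deliberately left out.

```python
from datetime import datetime, time

# Assumed schedule: development instances are needed on weekdays, 08:00 to 19:00.
WORK_START = time(8, 0)
WORK_END = time(19, 0)
WORK_DAYS = {0, 1, 2, 3, 4}  # Monday=0 ... Friday=4


def should_be_running(now: datetime, keep_alive: bool = False) -> bool:
    """Return True if an instance should be up at the given moment.

    keep_alive models a remote start-up request that overrides the schedule.
    """
    if keep_alive:
        return True
    in_hours = WORK_START <= now.time() < WORK_END
    return now.weekday() in WORK_DAYS and in_hours


def plan_action(is_running: bool, now: datetime, keep_alive: bool = False) -> str:
    """Compare the current state with the desired state and name the action."""
    wanted = should_be_running(now, keep_alive)
    if wanted and not is_running:
        return "start"
    if not wanted and is_running:
        return "stop"
    return "none"


# Example: a Saturday afternoon with the instance still running.
print(plan_action(is_running=True, now=datetime(2024, 6, 1, 14, 0)))
```

Separating the decision from the provider calls keeps the schedule easy to test and adjust before it is allowed to shut anything down.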
Identify inefficiencies in order to remediate them
3.2c 20 minutes per service
Instructions
- Gather the infrastructure team together and discuss existing capacity and demand. Use the inputs from your data analysis and stakeholder meetings to set the stage for your discussion.
- Solicit ideas about potential inefficiencies from your participants:
- Are VMs effectively allocated? If you need 7 VMs to address a spike, are those VMs being reallocated post-spike?
- Are developers leaving instances running in the cloud?
- Are particular services massively overprovisioned?
- What are the biggest infrastructure line items? Are there obvious opportunities for cost reduction there?
Input
- Gold systems
- Data inputs
Output
- Inefficiencies
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Insight
The most effective capacity management takes a holistic approach and looks at the big picture in order to find ways to eliminate unnecessary infrastructure usage, or to find alternate or more efficient sources of required capacity.
Dodging the toll troll by rerouting traffic
CASE STUDY
Industry: Telecommunications
Source: Interview
High-cost lines
The capacity manager at a telecommunications provider mapped out his firm’s network traffic and discovered they were using a number of VP circuits (inter building cross connects) that were very expensive on the scale of their network.
Paying the toll troll
These VP circuits were supplying needed network services to the telecom provider’s clients, so there was no way to reduce this demand.
Resolution
The capacity manager analyzed where the traffic was going and compared this to the cost of the lines they were using. After performing the analysis, he found he could re-route much of the traffic away from the VP circuits and save on costs while delivering the same level of service to their users.
Compare the data across business, component, and service levels, and project your capacity needs
3.2d 2-hour session/meeting
Make informed decisions about capacity. Remember: retain all documentation. It might come in handy for the justification of purchases.
Instructions
- Using either a dedicated tool or generic spreadsheet software like Excel or Sheets, evaluate capacity trends. Ask the following questions:
- Are there times when application performance degraded, and the service level was disrupted?
- Are there times when certain components or systems neared, reached, or exceeded available capacity?
- Are there seasonal variations in demand?
- Are there clear trends, such as ongoing growth of business activity or the usage of certain applications?
- What are the ramifications of trends or patterns in relation to infrastructure capacity?
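Two of the questions above, whether a component neared its capacity and whether demand is seasonal, lend themselves to a quick check on historical data. This Python sketch uses invented monthly utilization figures and an assumed 80% warning threshold; both are placeholders, and the function names are hypothetical.

```python
# Two years of monthly peak utilization (%) for one component.
# The numbers are invented for illustration.
utilization = {
    2022: [52, 50, 55, 58, 60, 57, 54, 56, 63, 70, 84, 91],
    2023: [58, 57, 60, 64, 66, 62, 60, 63, 69, 77, 90, 97],
}
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]
WARNING_THRESHOLD = 80  # assumed: flag anything at or above 80% of capacity


def near_capacity(data, threshold):
    """List (year, month, value) for every month at or above the threshold."""
    return [
        (year, MONTHS[i], value)
        for year, values in sorted(data.items())
        for i, value in enumerate(values)
        if value >= threshold
    ]


def seasonal_index(data):
    """Average each calendar month across years, relative to the overall mean.

    An index well above 1.0 suggests a recurring seasonal peak in that month.
    """
    years = list(data.values())
    overall = sum(sum(v) for v in years) / (12 * len(years))
    return {
        MONTHS[i]: (sum(v[i] for v in years) / len(years)) / overall
        for i in range(12)
    }


flagged = near_capacity(utilization, WARNING_THRESHOLD)
index = seasonal_index(utilization)
peak_month = max(index, key=index.get)

print("Months near capacity:", flagged)
print("Strongest seasonal month:", peak_month, round(index[peak_month], 2))
```

With only two years of data the seasonal index is a hint rather than proof, so confirm any apparent pattern with the business stakeholders before acting on it.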
Compare current capacity to your projections
3.2e Section 5 of the Capacity Plan Template
Capacity management (and, by extension, availability management) is a combination of two balancing acts: cost against capacity, and supply against demand.*
Instructions
- Compare your projections with your reality. You already know whether or not you have enough capacity given your lead times. But do you have too much? Compare your sub-component capacity projections to your current state.
- Highlight any outliers. Is there a particular service that is massively overprovisioned?
- Evaluate the reasons for the overprovisioning.
- Is the component critically important?
- Did you get a great deal on hardware?
- Is it an oversight?
*Office of Government Commerce 2001, 119.
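The comparison described in these instructions can be expressed as a ratio of provisioned capacity to projected peak demand. In the Python sketch below, the service names, the figures, and the 1.5x outlier cutoff are assumptions chosen for illustration; they do not come from the Capacity Plan Template.

```python
# Provisioned capacity vs. projected peak demand, in the same unit per service.
# All values are invented for illustration.
services = {
    "email":      {"provisioned": 120, "projected_peak": 100},
    "crm":        {"provisioned": 400, "projected_peak": 150},
    "file share": {"provisioned": 90,  "projected_peak": 110},
}
OUTLIER_RATIO = 1.5  # assumed cutoff for "massively overprovisioned"


def classify(provisioned, projected_peak, outlier_ratio=OUTLIER_RATIO):
    """Label a service by how its capacity compares with projected demand."""
    ratio = provisioned / projected_peak
    if ratio < 1.0:
        return ratio, "underprovisioned"
    if ratio >= outlier_ratio:
        return ratio, "overprovisioned outlier"
    return ratio, "within range"


report = {name: classify(**figures) for name, figures in services.items()}

for name, (ratio, label) in report.items():
    print(f"{name}: {ratio:.2f}x projected demand ({label})")
```

An outlier flag is a prompt for the questions listed above, not a verdict: a critically important component may justify a high ratio.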
In brief
The fractured nature of the capacity management space means that every organization is going to have a slightly different tooling strategy. No vendor has dominated, and every solution requires some level of customization. One capacity manager (a cloud provider, no less!) relayed a tale about a capacity management Excel sheet programmed with 5,000+ lines of code. As much work as that is, a bespoke solution is probably unavoidable.
If you want additional support, have our analysts guide you through this phase as part of an Info-Tech workshop.
Book a workshop with our Info-Tech analysts:
- To accelerate this project, engage your IT team in an Info-Tech workshop with an Info-Tech analyst team.
- Info-Tech analysts will join you and your team onsite at your location or welcome you to Info-Tech’s historic Toronto office to participate in an innovative onsite workshop.
- Contact your account manager (www.infotech.com/account), or email Workshops@InfoTech.com for more information.
The following are sample activities that will be conducted by Info-Tech analysts with your team:
3.2
Map business needs to technical requirements and technical requirements to infrastructure requirements
The analyst will guide workshop participants in using their organization’s data to map out the relationships between applications, technical requirements, and the underlying infrastructure usage.
Phase 3 Guided Implementation
Call 1-888-670-8889 or email GuidedImplementations@InfoTech.com for more information.
Complete these steps on your own, or call us to complete a guided implementation. A guided implementation is a series of 2-3 advisory calls that help you execute each phase of a project. They are included in most advisory memberships.
Guided Implementation 3: Solicit and incorporate business needs. Proposed Time to Completion: 2 weeks | |
---|---|
Step 3.1: Solicit business needs and gather data. Review your findings with an analyst: discuss the effectiveness of your strategies to involve business stakeholders in the planning process and your methods of data collection and analysis. Then complete these activities… With these tools & templates: Capacity Plan Template | Step 3.2: Analyze data and project future needs. Review your findings with an analyst: discuss the effectiveness of your strategies to involve business stakeholders in the planning process and your methods of data collection and analysis. Then complete these activities… With these tools & templates: Capacity Snapshot Tool, Capacity Plan Template |
Phase 3 Results & Insights: | |
PHASE 4
Identify and Mitigate Risks
Step 4.1: Identify and mitigate risks
This step will walk you through the following activities:
- Identify potential risks.
- Determine strategies to mitigate risks.
- Complete your capacity management plan.
This involves the following participants:
- Capacity manager
- Infrastructure team members
- Business stakeholders
Outcomes of this step
- Strategies for reducing risks
- Capacity management plan
Understand what happens when capacity/availability management fails
- Services become unavailable. If availability and capacity management are not constantly practiced, downtime or a reduction in service quality is an inevitable consequence. Critical sub-component failures can knock out important systems on their own.
- Money is wasted. In response to fears about availability, it’s entirely possible to massively overprovision or switch entirely to a pay-as-you-go model. This, unfortunately, brings with it a whole host of other problems, including overspending. Remember: infinite capacity means infinite potential cost.
- IT remains reactive and is unable to contribute more meaningfully to the organization. If IT is constantly putting out capacity/availability-related fires, there is no room for optimization and activities to increase organizational maturity. Effective availability and capacity management will allow IT to focus on other work.
Mitigate availability and capacity risks
Availability: how often a service is usable (that is to say up and not too degraded to be effective). Consequences of reduced availability can include financial losses, impacted customer goodwill, and reduced faith in IT more generally.
Causes of availability issues:
- Poor capacity management – a service becomes unavailable when there is insufficient supply to meet demand.
- Scheduled maintenance – services go down for maintenance with some regularity. This needs to be baked into service-level negotiations with vendors.
- Vendor outages – sometimes vendors experience unplanned outages. There is typically a contract provision that covers unplanned outages, but that doesn’t change the fact that your service will be interrupted.
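The definition of availability above is commonly expressed as the percentage of a period during which the service was usable. The Python sketch below shows that arithmetic along with the downtime budget implied by a target; the 30-day period, the 99.9% target, and the downtime figures are assumptions for illustration.

```python
def availability_percent(total_minutes, downtime_minutes):
    """Share of the period during which the service was usable."""
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(target_percent, total_minutes):
    """Downtime budget implied by an availability target over the period."""
    return total_minutes * (1 - target_percent / 100.0)


MINUTES_IN_30_DAYS = 30 * 24 * 60

# Hypothetical month: 90 minutes of scheduled maintenance, 45 of vendor outage.
downtime = 90 + 45
print(f"Availability: {availability_percent(MINUTES_IN_30_DAYS, downtime):.3f}%")
print(f"Budget at 99.9%: {allowed_downtime_minutes(99.9, MINUTES_IN_30_DAYS):.1f} minutes")
```

Whether scheduled maintenance counts against the figure is a matter for the SLA, which is one reason it needs to be part of service-level negotiations.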
Capacity: a particular component’s/service’s/business’ wiggle room. In other words, its usage ceiling.
Causes of capacity issues:
- Poor demand management – allowing users to run amok without any regard for how capacity is sourced and paid for.
- Massive changes in legitimate demand – more usage means more demand.
- Poor capacity planning – predictable changes in demand that go unaddressed can lead to capacity issues.
Add potential causes of availability and capacity risks as needed
4.1a 30 minutes
Availability and capacity issues can stem from a number of different causes. Include a list in your availability and capacity management plan.
Instructions
- Gather the group together. Go around the room and have participants provide examples of incidents and problems that have been the result of availability and capacity issues.
- Pose questions to the group about the source of those availability and capacity issues.
- What could have been done differently to avoid these issues?
- Was the availability/capacity issue a result of a faulty internal/external SLA?
Input
- Capacity Snapshot Tool results
Output
- Additional sources of availability and capacity risks
Materials
- Capacity Plan Template
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Insight
Availability and capacity problems result in incidents, critical incidents, and problems. These are addressed in a separate project (incident and problem management), but information about common causes can streamline that process.
Identify capacity risks and mitigate them
4.1b 30 minutes
Based on your understanding of your capacity needs (through written SLAs and informal but regular meetings with the business) highlight major risks you foresee.
Instructions
- Make a chart with two columns on a whiteboard. They should be labelled “risk” and “mitigation” respectively.
- Record risks to capacity you have identified in earlier activities.
- Refer to the Capacity Snapshot Tool for components that are highlighted in red and yellow. These are specific components that present special challenges. Identify the risk(s) in as much detail as possible. Include service and business risks as well.
- Examples: a marketing push will put pressure on the web server; a hiring push will require more Office 365 licenses; a downturn in registration will mean that fewer VMs will be required to run the service.
Input
- Capacity Snapshot Tool results
Output
- Capacity risks
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Insight
It’s an old adage, but it checks out: don’t come to the table armed only with problems. Be a problem solver and prove IT’s value to the organization.
Identify capacity risks and mitigate them (cont.)
4.1b 1.5 hours
Instructions (cont.)
- Begin developing mitigation strategies. Options for responding to known capacity risks fall into one of two camps:
- Acceptance: responding to the risk is costlier than acknowledging its existence without taking any action. For gold systems, acceptance is typically not acceptable.
- Mitigation: limiting/reducing, eliminating, or transferring risk (Herrera) comprise the sort of mitigation discussed here.
- Limiting/reducing: taking steps to improve the capacity situation, but accepting some level of risk (spinning up a new VM, pushing back on demands from the business, promoting efficiency).
- Eliminating: the most comprehensive (and most expensive) mitigation strategy, elimination could involve purchasing a new server or, at the extreme end, building a new datacenter.
- Transfer: “robbing Peter to pay Paul,” in the words of capacity manager Todd Evans, is one potential way to limit your exposure. Is there a less critical service that can be sacrificed to keep your gold service online?
Input
- Capacity Snapshot Tool results
Output
- Capacity risk mitigations
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Identify availability risks and mitigate them
4.1c 30 minutes
While capacity management is a form of availability management, it is not the only form. In this activity, outline the specific nature of threats to availability.
Instructions
- Make a chart with two columns on a whiteboard. They should be labelled “risk” and “mitigation” respectively.
- Begin brainstorming general availability risks based on the following sources of information/categories:
- Vendor outages
- Disaster recovery
- Historical availability issues
Input
- Capacity Snapshot Tool results
Output
- Availability risks and mitigations
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Info-Tech Best Practice
A dynamic central repository is a good way to ensure that availability issues stemming from a variety of causes are captured and mitigated.
Identify availability risks and mitigate them (cont.)
4.1c 1.5 hours
Although it is easier said than done, identifying potential mitigations is a crucial part of availability management as an activity.
Instructions (cont.)
- Begin developing mitigation strategies. Options for responding to known availability risks fall into one of two camps:
- Acceptance – responding to the risk is costlier than taking it on. Some unavailability is inevitable, between maintenance and unscheduled downtime. Record this, though it may not require immediate action.
- Mitigation strategies:
- Limiting/reducing – taking steps to increase availability of critical systems. This could include hot spares for unreliable systems or engaging a new vendor.
- Eliminating – the most comprehensive (and most expensive) mitigation strategy. It could include selling.
- Transfer – “robbing Peter to pay Paul,” in the words of capacity manager Todd Evans, is one potential way to limit your exposure. Is there a less critical service that can be sacrificed to keep your gold service online?
Input
- Capacity Snapshot Tool results
Output
- Availability risks and mitigations
Materials
- Whiteboard
- Markers
Participants
- Capacity manager
- Infrastructure staff
Iterate on the process and present your completed availability and capacity management plan
The stakeholders consulted as part of the process will be interested in its results. Share them, either in person or through a collaboration tool.
The current status of your availability and capacity management plan should be on the agenda for every stakeholder meeting. Direct the stakeholders’ attention to the parts of the document that are relevant to them, and solicit their thoughts on the document’s accuracy. Over time you should get a pretty good idea of who among your stakeholder group is skilled at projecting demand, and who over- or underestimates, and by how much. This information will improve your projections and, therefore, your management over time.
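Tracking who over- or underestimates, and by how much, can be as simple as comparing past estimates with actuals. The Python sketch below is one possible approach, with invented stakeholder names and numbers; the bias measure (mean relative error) and the correction step are assumptions rather than a prescribed method.

```python
# Past demand estimates from stakeholders vs. what actually happened.
# Names and numbers are invented for illustration.
history = {
    "sales":     [(120, 100), (150, 125), (90, 80)],    # (estimated, actual)
    "marketing": [(40, 60), (55, 70), (30, 45)],
    "finance":   [(20, 21), (25, 24), (30, 30)],
}


def mean_bias(pairs):
    """Average relative error: positive means the stakeholder overestimates."""
    return sum((est - act) / act for est, act in pairs) / len(pairs)


def adjust(estimate, bias):
    """Correct a new estimate using the stakeholder's historical bias."""
    return estimate / (1 + bias)


biases = {name: mean_bias(pairs) for name, pairs in history.items()}

for name, bias in biases.items():
    print(f"{name}: {bias:+.0%} average bias")

# A new estimate of 200 units from sales, corrected for past overestimation.
print(round(adjust(200, biases["sales"])))
```

A handful of data points per stakeholder is a thin basis for correction, so treat the adjusted figure as a conversation starter in the next meeting rather than a replacement for the stakeholder's estimate.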
Info-Tech Insight
Use the experience gained and the artifacts generated to build trust with the business. The meetings should be regular, and demonstrating that you’re actually using the information for good is likely to make hesitant participants in the process more likely to open up.
If you want additional support, have our analysts guide you through this phase as part of an Info-Tech workshop.
The following are sample activities that will be conducted by Info-Tech analysts with your team:
4.1
Identify capacity risks and mitigate them
The analyst will guide workshop participants in identifying potential risks to capacity and determining strategies for mitigating them.
Phase 4 Guided Implementation
Guided Implementation 4: Identify and mitigate risks. Proposed Time to Completion: 1 week |
---|
Step 4.1: Identify and mitigate risks. Review your findings with an analyst. Then complete these activities… With these tools & templates: Capacity Snapshot Tool, Capacity Plan Template |
Phase 4 Results & Insights: |
Insight breakdown
Insight 1
Components are critical to availability and capacity management.
The CEO doesn’t care about the SMTP server. She cares about meeting customer needs and producing profit. For IT capacity and availability managers, though, the devil is in the details. It only takes one faulty component to knock out a service. Keep track and keep the lights on.
Insight 2
Ask what the business is working on, not what they need.
If you ask them what they need, they’ll tell you – and it won’t be cheap. Find out what they’re going to do, and use your expertise to service those needs. Use your IT experience to estimate the impact of business and service level changes on the components that secure the availability you need.
Insight 3
Cloud shmoud.
The role of the capacity manager might be changing with the advent of the public cloud, but it has not disappeared. Capacity managers in the age of the cloud are responsible for managing vendor relationships, negotiating external SLAs, projecting costs and securing budgets, reining in prodigal divisions, and so on.
Summary of accomplishment
Knowledge Gained
- Impact of downtime on the organization
- Gold systems
- Key dependencies and sub-components
- Strategy for monitoring components
- Strategy for soliciting business needs
- Projected capacity needs
- Availability and capacity risks and mitigations
Processes Optimized
- Availability management
- Capacity management
Deliverables Completed
- Business Impact Analysis
- Capacity Plan Template
Project step summary
Client Project: Develop an Availability and Capacity Management Plan
- Conduct a business impact analysis
- Assign criticality ratings to services
- Define your monitoring strategy
- Implement your monitoring tool/aggregator
- Solicit business needs and gather data
- Analyze data and project future needs
- Identify and mitigate risks
Info-Tech Insight
This project can be delivered in any of the following formats:
- Onsite workshop by Info-Tech Research Group consulting analysts.
- Do-it-yourself with your team.
- Remote delivery via Info-Tech Guided Implementation.
Research contributors and experts
Adrian Blant, Independent Capacity Consultant, IT Capability Solutions
Adrian has over 15 years' experience in IT infrastructure. He has built capacity management business processes from the ground up, and focused on ensuring a productive dialogue between IT and the business.
James Zhang, Senior Manager Disaster Recovery, AIG Technology
James has over 20 years' experience in IT and 10 years' experience in capacity management. Throughout his career, he has focused on creating new business processes to deliver value and increase efficiency over the long term.
Mayank Banerjee, CTO, Global Supply Chain Management, HelloFresh
Mayank has over 15 years' experience across a wide range of technologies and industries. He has implemented highly automated capacity management processes as part of his role of owning and solving end-to-end business problems.
Mike Lynch, Consultant, CapacityIQ
Mike has over 20 years' experience in IT infrastructure. He takes a holistic approach to capacity management to identify and solve key problems, and has developed automated processes for mapping performance data to information that can inform business decisions.
Paul Waguespack, Manager of Application Systems Engineering, Tufts Health Plan
Paul has over 10 years' experience in IT. He has specialized in implementing new applications and functionalities throughout their entire lifecycle, and integrating with all aspects of IT operations.
Richie Mendoza, IT Consultant, SMITS Inc.
Richie has over 10 years' experience in IT infrastructure. He has specialized in using demand forecasting to guide infrastructure capacity purchasing decisions, to provide availability while avoiding costly overprovisioning.
Rob Thompson, President, IT Tools & Process
Rob has over 30 years’ IT experience. Throughout his career he has focused on making IT a generator of business value. He now runs a boutique consulting firm.
Todd Evans, Capacity and Performance Management SME, IBM
Todd has over 20 years' experience in capacity and performance management. At Kaiser Permanente, he established a well-defined mapping of the business's workflow processes to technical requirements for applications and infrastructure.