Beyond this brief introduction, subscribers and consulting clients in this management domain have access to guidance that helps them:
Determine the most critical business services to ensure availability.
Craft a monitoring strategy to gather usage data.
Integrate business stakeholders into the capacity management process.
Identify and mitigate risks to your capacity and availability.
Workshops offer an easy way to accelerate your project. If you are unable to do the project yourself, and a Guided Implementation isn't enough, we offer low-cost delivery of our project workshops. We take you through every phase of your project and ensure that you have a roadmap in place to complete your project successfully.
Determine the most important IT services for the business.
Understand which services to prioritize for ensuring availability.
1.1 Create a scale to measure different levels of impact.
1.2 Evaluate each service by its potential impact.
1.3 Assign a criticality rating based on the costs of downtime.
RTOs/RPOs
List of gold systems
Criticality matrix
Monitor and measure usage metrics of key systems.
Capture and correlate data on business activity with infrastructure capacity usage.
2.1 Define your monitoring strategy.
2.2 Implement your monitoring tool/aggregator.
RACI chart
Capacity/availability monitoring strategy
Determine how to project future capacity usage needs for your organization.
Data-based, systematic projection of future capacity usage needs.
3.1 Analyze historical usage trends.
3.2 Interface with the business to determine needs.
3.3 Develop a plan to combine these two sources of truth.
Plan for soliciting future needs
Future needs
Identify potential risks to capacity and availability.
Develop strategies to ameliorate potential risks.
Proactive approach to capacity that addresses potential risks before they impact availability.
4.1 Identify capacity and availability risks.
4.2 Determine strategies to address risks.
4.3 Populate and review completed capacity plan.
List of risks
List of strategies to address risks
Completed capacity plan
"Nobody doubts the cloud’s transformative power. But will its ascent render “capacity manager” an archaic term to be carved into the walls of datacenters everywhere for future archaeologists to puzzle over? No. While it is true that the cloud has fundamentally changed how capacity managers do their jobs, the process is more important than ever. Managing capacity – and, by extension, availability – means minimizing costs while maximizing uptime. The cloud era is the era of unlimited capacity – and of infinite potential costs. If you put the infinity symbol on a purchase order… well, it’s probably not a good idea. Manage demand. Manage your capacity. Manage your availability. And, most importantly, keep your stakeholders happy. You won’t regret it."
Jeremy Roberts,
Consulting Analyst, Infrastructure Practice
Info-Tech Research Group
✓ CIOs who want to increase uptime and reduce costs
✓ Infrastructure managers who want to deliver increased value to the business
✓ Enterprise architects who want to ensure stability of core IT services
✓ Dedicated capacity managers
✓ Develop a list of core services
✓ Establish visibility into your system
✓ Solicit business needs
✓ Project future demand
✓ Set SLAs
✓ Increase uptime
✓ Optimize spend
✓ Project managers
✓ Service desk staff
✓ Plan IT projects
✓ Better manage availability incidents caused by lack of capacity
According to 451 Research, 59% of enterprises have had to wait 3+ months for new capacity. It is little wonder, then, that so many opt to overprovision. Capacity management is about ensuring that IT services are available, and with lead times like that, overprovisioning can be more attractive than the alternative. Fortunately, there is hope. An effective availability and capacity management plan can help you:
Balancing overprovisioning and spending is the capacity manager’s struggle.
If an IT department is unable to meet demand due to insufficient capacity, users will experience downtime or a degradation in service. To be clear, capacity is not the only factor in availability – reliability, serviceability, etc. are significant as well. But no organization can effectively manage availability without paying sufficient attention to capacity.
"Availability Management is concerned with the design, implementation, measurement and management of IT services to ensure that the stated business requirements for availability are consistently met."
– OGC, Best Practice for Service Delivery, 12
"Capacity management aims to balance supply and demand [of IT storage and computing services] cost-effectively…"
– OGC, Business Perspective, 90
Level | Description | Example |
---|---|---|
Business | The highest level of capacity management, business capacity management, involves predicting changes in the business’s needs and developing requirements so that IT can adapt to those needs. | Influx of new clients from a failed competitor. |
Service | Service capacity management focuses on monitoring IT services to determine whether they are meeting predetermined SLAs. The data gathered here can be used for incident and problem management. | Increased website traffic. |
Component | Component capacity management involves tracking the functionality, utilization, and performance of specific components (servers, hard drives, etc.) and making predictions about future concerns. | Insufficient web server compute. |
The C-suite cares about business capacity as part of the organization’s strategic planning. Service leads care about their assigned services. IT infrastructure is concerned with components, but not for their own sake: components support services, which ultimately exist to facilitate the business.
Industry: Healthcare
Source: Interview
New functionalities require new infrastructure
A project was underway to implement an Elasticsearch feature. It had to correlate all of the organization’s member data from an Oracle data source and its own data warehouse, and pool that data into an Elasticsearch index that the provider portal’s search function could use. In estimating the amount of space needed, the infrastructure team assumed that all the data would be stored in a single place. They didn’t account for Elasticsearch’s architecture, in which indexes are split into shards that are distributed across multiple nodes.
Beware underestimating demand and hardware sourcing lead times
As a result, they vastly underestimated the amount of space that was needed and ended up short by a terabyte. The infrastructure team frantically sourced more hardware, but the rush hardware order arrived physically damaged and had to be returned to the vendor.
Sufficient budget won’t ensure success without capacity planning
The project’s budget had been more than sufficient to pay for the extra capacity, but because a poor understanding of the infrastructure impact led to improper forecasting, the project came to a standstill.
There are three variables that are monitored, measured, and analyzed as part of availability management more generally (Valentic).
Availability: the percentage of time the system is “up” (and not degraded), calculated as uptime / (uptime + downtime) × 100%. As a rule, the more components a system depends on in series, the lower its availability.
Mean time between failures (MTBF): the length of time a component/service can go before an outage brings it down, typically measured in hours.
Mean time to restore service (MTRS): the amount of time it takes to restore a component/service after an outage, also typically measured in hours.
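The three variables above are related. As a minimal sketch, assuming steady-state operation and components arranged in series (any single failure takes the whole service down), availability can be computed from observed uptime and downtime, estimated from the mean time between failures (MTBF) and mean time to restore (MTRS), or combined across dependent components by multiplying their availabilities. The function names are illustrative and not part of any Info-Tech tool.

```python
from math import prod
from typing import Iterable


def availability_pct(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = uptime / (uptime + downtime) x 100%."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("observed time must be positive")
    return uptime_hours / total * 100.0


def availability_from_mtbf_pct(mtbf_hours: float, mtrs_hours: float) -> float:
    """Steady-state estimate: MTBF / (MTBF + MTRS) x 100%."""
    return availability_pct(mtbf_hours, mtrs_hours)


def serial_availability_pct(component_pcts: Iterable[float]) -> float:
    """Availability of a service that needs every component (components in series)."""
    return prod(pct / 100.0 for pct in component_pcts) * 100.0


if __name__ == "__main__":
    # One hour of downtime in a 720-hour (30-day) month.
    print(f"{availability_pct(719, 1):.2f}%")
    # Three components, each 99.9% available, all required by one service.
    print(f"{serial_availability_pct([99.9, 99.9, 99.9]):.2f}%")
```

Three components at 99.9% each yield roughly 99.70% for the service as a whole, which is why adding dependencies tends to lower availability.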
Features of the public cloud | Implications for capacity management |
---|---|
Instant, or near-instant, instantiation | Lead times drop; capacity management is less about ensuring equipment arrives on time. |
Pay-as-you-go services | Capacity no longer needs to be purchased in bulk. Pay only for what you use and shut down instances that are no longer necessary. |
Essentially unlimited scalability | Potential capacity is infinite, but so are potential costs. |
Offsite hosting | Built-in redundancy, but your internet connection becomes increasingly important. |
Traditionally, increases in capacity have come in bursts as a reaction to availability issues. This model inevitably results in overprovisioning, driving up costs. Access to the cloud changes the equation. On-demand capacity means that, ideally, nobody should pay for unused capacity.
The cloud reality does not look like the cloud ideal. Even with the ostensibly elastic cloud, vendors like the consistency that longer-term contracts offer. Enter reserved instances: in exchange for a fee, vendors offer reserved instances at lower hourly rates. Usage beyond the reserved capacity is billed at a higher, on-demand hourly rate. To determine where that line should be drawn, you need detailed capacity planning. Unfortunately, even when done right, this process will result in some overprovisioning, though it does provide convenience from an accounting perspective. The key is to use on-demand instances where demand is exceptional and bounded. Example: a university registration server that experiences exceptional demand at the start of term but at no other time.
Even in the era of elasticity, capacity planning is crucial. On-demand instances – the spikes in the graph above – are more expensive, but if your capacity needs vary substantially, reserving instances for all of the capacity you might need can cost even more. Efficiently planning capacity will help you draw this line.
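To make the trade-off concrete, here is a minimal sketch that picks how many instances to reserve for a given demand profile. It assumes a deliberately simplified pricing model with hypothetical rates: reserved instances are billed every period whether used or not, and any demand above the reservation is billed at the on-demand rate. Real providers add upfront fees, term lengths, and discounts that this ignores.

```python
from typing import Sequence


def total_cost(demand: Sequence[int], reserved: int,
               reserved_rate: float, on_demand_rate: float) -> float:
    """Cost of a demand profile (instances needed per period) for one reservation level."""
    cost = 0.0
    for needed in demand:
        cost += reserved * reserved_rate                    # paid whether used or not
        cost += max(0, needed - reserved) * on_demand_rate  # overflow billed on demand
    return cost


def best_reservation(demand: Sequence[int],
                     reserved_rate: float, on_demand_rate: float) -> int:
    """Reservation level (0..peak) with the lowest total cost under this simple model."""
    return min(range(max(demand) + 1),
               key=lambda r: total_cost(demand, r, reserved_rate, on_demand_rate))


if __name__ == "__main__":
    # Hypothetical registration server: steady load of 4 instances, 20 at start of term.
    demand = [4, 4, 4, 4, 4, 4, 4, 4, 4, 20]
    r = best_reservation(demand, reserved_rate=0.06, on_demand_rate=0.10)
    print(r, round(total_cost(demand, r, 0.06, 0.10), 2))
```

Under these assumptions, reserving the steady baseline of four instances and paying on-demand rates for the spike is cheapest; reserving for the peak of 20 would cost three times as much.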
Simple and effective. Sometimes a simple display can convey all of the information necessary to manage critical systems. In cars it is important to know your speed, how much fuel is in the tank, and whether or not you need to change your oil/check your engine.
Where to begin?! Specialized information is sometimes necessary, but it can be difficult to navigate.
Step | Task | Description |
---|---|---|
STEP 1 | Record applications and dependencies | Utilize your asset management records and document the applications and systems that IT is responsible for managing and recovering during a disaster. |
STEP 2 | Define impact scoring scale | Ensure an objective analysis of application criticality by establishing a business impact scale that applies to all applications. |
STEP 3 | Estimate impact of downtime | Leverage the scoring criteria from the previous step and establish an estimated impact of downtime for each application. |
STEP 4 | Identify desired RTO and RPO | Define what the RTOs/RPOs should be based on the impact of a business interruption and the tolerance for downtime and data loss. |
STEP 5 | Determine current RTO/RPO | Conduct tabletop planning and create a flowchart of your current capabilities. Compare your current state to the desired state from the previous step. |
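Step 5 is essentially a gap analysis. A minimal sketch, assuming RTOs and RPOs are recorded in hours and using hypothetical application names and values, flags every application whose current capability falls short of its desired target:

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RecoveryObjectives:
    application: str
    desired_rto_hours: float  # maximum tolerable downtime
    desired_rpo_hours: float  # maximum tolerable data loss
    current_rto_hours: float  # what tabletop planning says is achievable today
    current_rpo_hours: float


def recovery_gaps(apps: List[RecoveryObjectives]) -> List[Tuple[str, float, float]]:
    """Return (application, RTO shortfall, RPO shortfall) for apps that miss a target."""
    gaps = []
    for app in apps:
        rto_gap = max(0.0, app.current_rto_hours - app.desired_rto_hours)
        rpo_gap = max(0.0, app.current_rpo_hours - app.desired_rpo_hours)
        if rto_gap > 0 or rpo_gap > 0:
            gaps.append((app.application, rto_gap, rpo_gap))
    return gaps


if __name__ == "__main__":
    portfolio = [
        RecoveryObjectives("ERP", 4, 1, 24, 24),    # hypothetical values
        RecoveryObjectives("Email", 8, 4, 6, 4),
    ]
    for name, rto_gap, rpo_gap in recovery_gaps(portfolio):
        print(f"{name}: RTO short by {rto_gap} h, RPO short by {rpo_gap} h")
```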
According to end users, every system is critical and downtime is intolerable. Of course, once they see how much totally eliminating downtime can cost, they might change their tune. It is important to have this discussion to separate the critical from the less critical – but still important – services.
"It is wrong to suppose that if you can’t measure it, you can’t manage it – a costly myth."
– W. Edwards Deming, statistician and management consultant, author of The New Economics
While it is true that total monitoring is not absolutely necessary for management, when it comes to availability and capacity – objectively quantifiable service characteristics – a monitoring strategy is unavoidable. Capturing fluctuations in demand, and adjusting for those fluctuations, is among the most important functions of a capacity manager, even if hovering over employees with a stopwatch is poor management.
Do | Do not |
---|---|
✓ Develop a positive relationship with business leaders responsible for making decisions. ✓ Make yourself aware of ongoing and upcoming projects. ✓ Develop expertise in organization-specific technology. ✓ Make the business aware of your expenses through chargebacks or showbacks. ✓ Use your understanding of business projects to predict business needs; do not rely on business leaders’ technical requests alone. | X Be reactive. X Accept capacity/availability demands uncritically. X Ask line-of-business managers for specific computing requirements unless they have the technical expertise to make informed judgments. X Treat IT as an opaque entity where requests go in and services come out (this can lead to irresponsible requests). |
The company meeting
“I don’t need this much RAM,” the application developer said, implausibly. Titters wafted above the assembled crowd as her IT colleagues muttered their surprise. Heads shook, eyes widened. In fact, as she sat pondering her utterance, the developer wasn’t so sure she believed it herself. Noticing her consternation, the infrastructure manager cut in and offered the RAM anyway, forestalling the inevitable crisis that occurs when seismic internal shifts rock fragile self-conceptions. Until next time, he thought.
"Work expands as to fill the resources available for its completion…"
– C. Northcote Parkinson, quoted in Klimek et al.
Critical inputs
In order to project your future needs, the following inputs are necessary.
If your focus is on ensuring process continuity in the event of a disaster.
If your focus is on flow mapping and transaction monitoring as part of a plan to engage APM vendors.
If your focus is on hardening your IT systems against major events.
Phase 1: Conduct a business impact analysis | Phase 2: Establish visibility into core systems | Phase 3: Solicit and incorporate business needs | Phase 4: Identify and mitigate risks |
---|---|---|---|
1.1 Conduct a business impact analysis; 1.2 Assign criticality ratings to services | 2.1 Define your monitoring strategy; 2.2 Implement monitoring tool/aggregator | 3.1 Solicit business needs; 3.2 Analyze data and project future needs | 4.1 Identify and mitigate risks |
“Our team has already made this critical project a priority, and we have the time and capability, but some guidance along the way would be helpful.”
“Our team knows that we need to fix a process, but we need assistance to determine where to focus. Some check-ins along the way would help keep us on track.”
“We need to hit the ground running and get this project kicked off immediately. Our team has the ability to take this over once we get a framework and strategy in place.”
“Our team does not have the time or the knowledge to take this project on. We need assistance through the entirety of this project.”
Option | Conduct a business impact analysis | Establish visibility into core systems | Solicit and incorporate business needs | Identify and mitigate risks |
---|---|---|---|---|
Best-Practice Toolkit | 1.1 Create a scale to measure different levels of impact; 1.2 Assign criticality ratings to services | 2.1 Define your monitoring strategy; 2.2 Implement your monitoring tool/aggregator | 3.1 Solicit business needs and gather data; 3.2 Analyze data and project future needs | 4.1 Identify and mitigate risks |
Guided Implementations | Call 1: Conduct a business impact analysis | Call 1: Discuss your monitoring strategy | Call 1: Develop a plan to gather historical data; set up plan to solicit business needs. Call 2: Evaluate data sources | Call 1: Discuss possible risks and strategies for risk mitigation. Call 2: Review your capacity management plan |
Onsite Workshop | Module 1: Conduct a business impact analysis | Module 2: Establish visibility into core systems | Module 3: Develop a plan to project future needs | Module 4: Identify and mitigate risks |
Phase 1 Results:
|
Phase 2 Results:
|
Phase 3 Results:
|
Phase 4 Results:
|
Contact your account representative or email Workshops@InfoTech.com for more information.
Agenda | Workshop Day 1 | Workshop Day 2 | Workshop Day 3 | Workshop Day 4 |
---|---|---|---|---|
Theme | Conduct a business impact analysis | Establish visibility into core systems | Solicit and incorporate business needs | Identify and mitigate risks |
Activities | 1.1 Conduct a business impact analysis; 1.2 Create a list of critical dependencies; 1.3 Identify critical sub-components; 1.4 Develop best practices to negotiate SLAs | 2.1 Determine indicators for sub-components; 2.2 Establish visibility into components; 2.3 Develop strategies to ameliorate visibility issues | 3.1 Gather relevant business-level data; 3.2 Gather relevant service-level data; 3.3 Analyze historical trends; 3.4 Build a list of business stakeholders; 3.5 Directly solicit requirements from the business; 3.6 Map business needs to technical requirements; 3.7 Identify inefficiencies and compare historical data | |
Business impact analyses are an invaluable part of a broader IT strategy. Conducting a BIA benefits a variety of processes, including disaster recovery, business continuity, and availability and capacity management.
Engaging in detailed capacity planning for an insignificant service draws time and resources away from more critical capacity planning exercises. Time spent tracking and planning use of the ancient fax machine in the basement is time you’ll never get back.
A BIA enables you to identify appropriate spend levels, continue to drive executive support, and prioritize disaster recovery planning for a more successful outcome. For example, an Info-Tech survey found that a BIA has a significant impact on setting appropriate recovery time objectives (RTOs) and appropriate spending.
Terms
No BIA: lack of a BIA, or a BIA based solely on the perceived importance of IT services.
BIA: based on a detailed evaluation or estimated dollar impact of downtime.
In large organizations especially, collating an exhaustive list of applications and services is going to be onerous. For the purposes of this project, a subset should suffice.
Instructions
Input
Output
Materials
Participants
Include a variety of services in your analysis. While it might be tempting to jump ahead and preselect important applications, don’t. The process is inherently valuable, and besides, it might surprise you.
Note: If there are no dependencies for a particular category, leave it blank.
Example
ID is optional. It is a sequential number by default.
In-House, Co-Lo/MSP, and Cloud dependencies; leave blank if not applicable.
Add notes as applicable – e.g. critical support services.
Modify the Business Impact Scales headings and Overall Criticality Rating terminology to suit your organization. For example, if you don’t have business partners, use that column to measure a different goodwill impact or just ignore that column in this tool (i.e. leave it blank). Estimate the different levels of potential impact (where four is the highest impact and zero is no impact) and record these in the Business Impact Scales columns.
In the BIA tab columns for Direct Costs of Downtime, Impact on Goodwill, and Additional Criticality Factors, use the drop-down menu to assign a score of zero to four based on levels of impact defined in the Scoring Criteria tab. For example, if an organization’s ERP is down, and that affects call center sales operations (e.g. ability to access customer records and process orders), the impact might be as described below:
On the other hand, if payroll processing is down, this may not impact revenue, but it certainly impacts internal goodwill and productivity.
Mission critical services. An outage is catastrophic in terms of cost or public image/goodwill. Example: trading software at a financial institution.
Important to daily operations, but not mission critical. Example: email services at any large organization.
Loss of these services is an inconvenience more than anything, though they do serve a purpose and will be missed if they are never brought back online. Example: ancient fax machines.
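The gold/silver/bronze assignment can be expressed as a simple rule over the zero-to-four scores recorded in the BIA. The sketch below shows one possible rule, not the one built into the Business Impact Analysis Tool: it takes the highest score across direct costs, goodwill, and additional criticality factors (so a severe reputational impact alone can make a service gold), with hypothetical thresholds you would tune to your organization.

```python
from dataclasses import dataclass


@dataclass
class ImpactScores:
    service: str
    direct_costs: int  # 0 = no impact ... 4 = highest impact
    goodwill: int
    additional: int


def criticality_tier(scores: ImpactScores, gold_at: int = 4, silver_at: int = 2) -> str:
    """Map the worst-case impact score to a tier (hypothetical thresholds)."""
    values = (scores.direct_costs, scores.goodwill, scores.additional)
    if any(not 0 <= v <= 4 for v in values):
        raise ValueError(f"scores for {scores.service} must be between 0 and 4")
    worst = max(values)
    if worst >= gold_at:
        return "gold"
    if worst >= silver_at:
        return "silver"
    return "bronze"


if __name__ == "__main__":
    for s in (ImpactScores("Trading platform", 4, 4, 3),
              ImpactScores("Email", 2, 2, 1),
              ImpactScores("Fax machine", 0, 1, 0)):
        print(s.service, criticality_tier(s))
```

Taking the maximum rather than an average reflects the point that, for some organizations, one dimension (such as reputational damage) can outweigh the others; a weighted sum is an equally valid alternative if your organization prefers it.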
Info-Tech recommends gold, silver, and bronze because of this typology’s near universal recognition. If you would prefer a particular designation (it might help with internal comprehension), don’t hesitate to use that one instead.
Every organization has its own rules about how to categorize service importance. For some (consumer-facing businesses, perhaps) reputational damage may trump immediate costs.
Instructions
Input
Output
Materials
Participants
See Info-Tech’s Create a Right-Sized Disaster Recovery Plan blueprint for instructions on how to complete your business impact analysis.
"Cloud capacity management is not exactly the same as the ITIL version because ITIL has a focus on the component level. I actually don’t do that, because if I did I’d go crazy. There’s too many components in a cloud environment."
– Richie Mendoza, IT Consultant, SMITS Inc.
Service
Component
"You don’t ask the CEO or the guy in charge ‘What kind of response time is your requirement?’ He doesn’t really care. He just wants to make sure that all his customers are happy."
– Todd Evans, Capacity and Performance Management SME, IBM.
Industry: Telecommunications
Source: Interview
Coffee and Wi-Fi – a match made in heaven
In tens of thousands of coffee shops around the world, patrons make ample use of complimentary Wi-Fi. Wi-Fi is an important part of customers’ coffee shop experience, whether they’re online to check their email, do a YouTube, or update their Googles. So when one telco that provided Wi-Fi access for thousands of coffee shops started encountering availability issues, the situation was serious.
Wi-Fi, whack-a-mole, and web woes
The team responsible for resolving the issue took an ad hoc approach to resolving complaints, fixing issues as they came up instead of taking a systematic approach.
Resolution
Looking at the network as a whole, the capacity manager took a proactive approach by using data to identify and rank the worst service areas, and then directing the team responsible to fix those areas in order of the worst first, then the next worst, and so on. Soon the availability of Wi-Fi service was restored across the network.
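The capacity manager’s approach amounts to ranking service areas by a measured availability metric and working down the list. A minimal sketch, assuming downtime minutes per area have been collected over a 30-day window (area names and figures are hypothetical):

```python
from typing import Dict, List, Tuple

MINUTES_IN_30_DAYS = 30 * 24 * 60  # 43,200


def worst_first(downtime_minutes: Dict[str, float],
                window_minutes: float = MINUTES_IN_30_DAYS) -> List[Tuple[str, float]]:
    """Return (area, availability %) pairs sorted from least to most available."""
    ranked = [
        (area, (window_minutes - down) / window_minutes * 100.0)
        for area, down in downtime_minutes.items()
    ]
    return sorted(ranked, key=lambda pair: pair[1])


if __name__ == "__main__":
    outages = {"Store 12": 90, "Store 7": 600, "Store 3": 15}
    for rank, (area, pct) in enumerate(worst_first(outages), start=1):
        print(f"{rank}. {area}: {pct:.2f}% available")
```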
Instructions
Input
Output
Materials
Participants
Dependency mapping can be difficult. Make sure you don’t waste effort creating detailed dependency maps for relatively unimportant services.
Ride sharing cannot work, at least not at maximum effectiveness, without these constituent components. When one or more of these components are absent or degraded, the service will become unavailable. This example illustrates some challenges of capacity management; some of these components are necessary, but beyond the ride-sharing company’s control.
Email is an example here not because it is necessarily a “gold system,” but because it is common across industries. This is a useful exercise for any service, but it can be quite onerous, so it should be conducted on the most important systems first.
Use the bottom layer of the pyramid drawn in step 1.2a for a list of important sub-components.
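A dependency pyramid can be held as a simple mapping from each service or component to the things it depends on. The sketch below uses a hypothetical, incomplete map of email dependencies. It lists the bottom layer of sub-components and treats the service as unavailable whenever any dependency, direct or indirect, is down. It assumes the map has no cycles and treats degraded components as down.

```python
from typing import Dict, List, Set

# Hypothetical dependency map: each key depends on every item in its list.
DEPENDENCIES: Dict[str, List[str]] = {
    "Email": ["Mail server", "Network", "Directory service"],
    "Mail server": ["Compute", "Storage"],
    "Network": ["Internet connection"],
}


def leaf_components(deps: Dict[str, List[str]], node: str) -> List[str]:
    """Bottom layer of the pyramid: components with no further dependencies."""
    children = deps.get(node, [])
    if not children:
        return [node]
    leaves: List[str] = []
    for child in children:
        for leaf in leaf_components(deps, child):
            if leaf not in leaves:  # keep order, drop duplicates
                leaves.append(leaf)
    return leaves


def is_available(deps: Dict[str, List[str]], node: str, down: Set[str]) -> bool:
    """A node is available only if it and all of its dependencies are up."""
    if node in down:
        return False
    return all(is_available(deps, child, down) for child in deps.get(node, []))


if __name__ == "__main__":
    print(leaf_components(DEPENDENCIES, "Email"))
    print(is_available(DEPENDENCIES, "Email", down={"Storage"}))
```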
Instructions
Input
Output
Materials
Participants
In terms of service provision, capacity management is a form of availability management. Not all availability issues are capacity issues, but the converse is true.
Capacity issues will always cause availability issues, but availability issues are not inherently capacity issues. Availability problems can stem from outages unrelated to capacity (e.g. power or vendor outages).
When signing contracts with vendors, you will be presented with an SLA. Ensure that it meets your requirements.
Input
Output
Materials
Participants
Vendors are sometimes willing to eat the cost of violating SLAs if they think it will get them a contract. Be careful with negotiation. Just because the vendor says they can do something doesn’t make it true.
See Info-Tech’s Improve IT-Business Alignment Through an Internal SLA blueprint for instructions on why you should develop internal SLAs and the potential benefits they bring.
1.2
Create a list of dependencies for your most important applications
Using the results of the business impact analysis, the analyst will guide workshop participants through a dependency mapping exercise that will eventually populate the Capacity Plan Template.
Complete these steps on your own, or call us to complete a guided implementation. A guided implementation is a series of 2-3 advisory calls that help you execute each phase of a project. They are included in most advisory memberships.
Guided Implementation 1: Conduct a business impact analysis Proposed Time to Completion: 1 week | |
---|---|
Step 1.1: Create a scale to measure different levels of impact Review your findings with an analyst Discuss how you arrived at the rating of your critical systems and their dependencies. Consider whether your external SLAs are appropriate. Then complete these activities…
With these tools & templates: Business Impact Analysis Tool |
Step 1.2: Assign criticality ratings to services Review your findings with an analyst Discuss how you arrived at the rating of your critical systems and their dependencies. Consider whether your external SLAs are appropriate. Then complete these activities…
With these tools & templates: Capacity Snapshot Tool |
Phase 1 Results & Insights:
|
Your findings are only as good as your data. Remember: garbage in, garbage out. There are three characteristics of good data:*
*National College of Teaching & Leadership, “Reliability and Validity”
"Data is king. Good data is absolutely essential to [the capacity manager] role."
– Adrian Blant, Independent Capacity Consultant, IT Capability Solutions
Every organization’s data needs are different; your data needs are going to be dictated by your services, delivery model, and business requirements. Make sure you don’t confuse volume with quality, even if others in your organization make that mistake.
Too much monitoring can be as bad as the inverse
In 2013, a security breach at US retailer Target compromised more than 70 million customers’ data. The company received an alert, but it was thought to be a false positive because the monitoring system produced so many false and redundant alerts. As a result of the daily deluge, staff did not respond to the breach in time.
Info-Tech Insight
Don’t confuse monitoring with management. While establishing visibility is a crucial step, it is only part of the battle. Move on to this project’s next phase to explore opportunities to improve your capacity/availability management process.
It is nearly impossible to overstate the importance of data to the process of availability and capacity management. But the wrong data will do you no good.
Instructions
Bottlenecks are bad. Use the Capacity Snapshot Tool (or another tool like it) to ensure that when the capacity manager leaves (on vacation, to another role, for good) the knowledge that they have accumulated does not leave as well.
Tracking every single component in significant detail will produce a lot of noise for each bit of signal. The approach outlined here addresses that concern in two ways:
Despite this effort, however, managing capacity at the component level is a daunting task. Ultimately, tools provided by vendors like SolarWinds and AppDynamics will fill in some of the gaps. Nevertheless, an understanding of the conceptual framework underlying availability and capacity management is valuable.
Industry: Financial Services
Source: AppDynamics
Challenge
Solution
Results
Source: “Just how complex can a Login Transaction be? Answer: Very!,” AppDynamics
"You don’t use a microscope to monitor an entire ant farm, but you might use many microscopes to monitor specific ants."
– Fred Chagnon, Research Director, Infrastructure Practice, Info-Tech Research Group
The next step in capacity management is establishing whether or not visibility (in the broad sense) is available into critical sub-components.
Instructions
Like ideas and watches, not all types of visibility are created equal. Ensure that you have access to the right information to make capacity decisions.
Instructions
For most mobile phone users, this breakdown is sufficient. For some, more granularity might be necessary.
Make note of monitoring tools and strategies. If anything changes, be sure to re-evaluate the visibility status. An outdated spreadsheet can lead to availability issues if management is unaware of looming problems.
The Capacity Snapshot Tool color-codes your components by status. Green – visibility and granularity are both sufficient; yellow – visibility exists, though not at sufficient granularity; and red – visibility does not exist at all.
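The color-coding rule above is simple enough to express directly. This sketch mirrors the rule as described; the function and sample components are hypothetical and are not taken from the Capacity Snapshot Tool itself.

```python
from enum import Enum
from typing import Dict, Tuple


class Visibility(Enum):
    GREEN = "visibility and granularity are both sufficient"
    YELLOW = "visibility exists, though not at sufficient granularity"
    RED = "visibility does not exist at all"


def visibility_status(has_visibility: bool, granularity_sufficient: bool) -> Visibility:
    if not has_visibility:
        return Visibility.RED  # granularity is irrelevant without any visibility
    return Visibility.GREEN if granularity_sufficient else Visibility.YELLOW


if __name__ == "__main__":
    components: Dict[str, Tuple[bool, bool]] = {
        "Web server CPU": (True, True),
        "SAN storage": (True, False),
        "ISP link": (False, False),
    }
    for name, (visible, granular) in components.items():
        print(f"{name}: {visibility_status(visible, granular).name}")
```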
Instructions
Input
Output
Materials
Participants
It might be that there is no amelioration strategy. Make note of this difficulty and highlight it as part of the risk section of the Capacity Plan Template.
The process of modernizing the network is fraught with vestigial limitations. Develop a program to gather requirements and plan.
As part of the Modernize Enterprise Storage blueprint, the Modernize Enterprise Storage Workbook includes a section on storage capacity planning.
2.2
Develop strategies to ameliorate visibility issues
The analyst will guide workshop participants in brainstorming potential solutions to visibility issues and record them in the Capacity Snapshot Tool.
Call 1-888-670-8889 or email GuidedImplementations@InfoTech.com for more information.
Complete these steps on your own, or call us to complete a guided implementation. A guided implementation is a series of 2-3 advisory calls that help you execute each phase of a project. They are included in most advisory memberships.
Guided Implementation 2: Establish visibility into core systems Proposed Time to Completion: 3 weeks | |
---|---|
Step 2.1: Define your monitoring strategy Review your findings with an analyst Discuss your monitoring strategy and ensure you have sufficient visibility for the needs of your organization. Then complete these activities…
With these tools & templates:
|
Step 2.2: Implement your monitoring tool/aggregator Review your findings with an analyst Discuss your monitoring strategy and ensure you have sufficient visibility for the needs of your organization. Then complete these activities…
With these tools & templates:
|
Phase 2 Results & Insights:
|
The availability and capacity management summary card pictured here is a handy way to capture the results of the activities undertaken in the following phases. Note its contents carefully, and be sure to record specific outputs where appropriate. Complete one such card for each of the gold services identified in the project’s first phase. Record the results of the activities in the coming phase; they will help you populate the Capacity Snapshot Tool.
The Capacity Plan Template is designed to be a part of a broader mapping strategy. It is not a replacement for a dedicated monitoring tool.
"In all cases the very first thing to do is to look at trending…The old adage is ‘you don’t steer a boat by its wake,’ however it’s also true that if something is growing at, say, three percent a month and it has been growing at three percent a month for the last twelve months, there’s a fairly good possibility that it’s going to carry on going in that direction."
– Mike Lynch, Consultant, CapacityIQ
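Lynch’s three-percent example is a compound-growth argument. The Python sketch below projects usage forward and estimates how many months remain before a ceiling is reached, assuming the monthly growth rate stays constant. The function names and the 60-versus-100 figures are hypothetical; treat the output as a starting point for a conversation with the business, not a forecast.

```python
import math

def project_usage(current: float, monthly_growth: float, months: int) -> float:
    """Project usage forward assuming a constant compound monthly growth rate."""
    return current * (1 + monthly_growth) ** months

def months_until_ceiling(current: float, ceiling: float, monthly_growth: float) -> float:
    """Months until usage reaches the ceiling at a constant compound growth rate."""
    if current >= ceiling:
        return 0.0  # already at or over the ceiling
    if monthly_growth <= 0:
        return math.inf  # flat or shrinking usage never reaches the ceiling
    return math.log(ceiling / current) / math.log(1 + monthly_growth)

# Hypothetical example: a resource at 60 (of a ceiling of 100), growing 3% a month
print(round(project_usage(60.0, 0.03, 12), 1))           # 85.5
print(round(months_until_ceiling(60.0, 100.0, 0.03), 1))  # 17.3
```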
A holistic approach to capacity management involves peering beyond the beaded curtain partitioning IT from the rest of the organization and tracking business metrics.
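One way to start tracking business metrics alongside infrastructure metrics is to check how closely they move together. This minimal sketch computes a Pearson correlation coefficient between a business series and a capacity series; the weekly order counts and CPU figures are invented for illustration. A strong correlation suggests the business metric may be a useful leading indicator, but it does not by itself prove that one drives the other.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / sqrt(var_x * var_y)

# Hypothetical weekly figures: orders processed vs. average database CPU (%)
orders = [1200, 1350, 1100, 1600, 1750, 1500]
db_cpu = [41, 45, 38, 55, 60, 51]
print(f"r = {pearson(orders, db_cpu):.2f}")
```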
One level of abstraction down is the service level. Recall that service level capacity management is about ensuring that IT meets its SLAs when providing services.
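A service-level check usually comes down to comparing measured performance against the SLA target. The sketch below assumes, purely for illustration, an SLA of "95% of responses within 500 ms" and tests it with the nearest-rank percentile method; real SLAs define their own metrics, measurement windows, and percentiles.

```python
import math

def nearest_rank_percentile(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    # Multiply before dividing so integer inputs avoid floating-point rounding surprises.
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

def meets_sla(samples_ms, target_ms=500, pct=95):
    """True if the pct-th percentile response time is within the target.
    The 95th-percentile / 500 ms defaults are hypothetical, not from any real SLA."""
    return nearest_rank_percentile(samples_ms, pct) <= target_ms

# Hypothetical response times (ms) sampled from one service over a reporting window
response_times = [120, 180, 210, 250, 300, 320, 410, 450, 480, 900]
print(meets_sla(response_times))          # False: the 95th percentile is 900 ms
print(meets_sla(response_times, pct=90))  # True: the 90th percentile is 480 ms
```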
Jan | Feb | Mar | Apr | May | June | July
---|---|---|---|---|---|---
74 | 80 | 79 | 83 | 84 | 100 | 102
Note: the strength of this approach is that it is easy to visualize. Use the same timescale to facilitate simple comparison.
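The monthly figures in the table above can also be summarized numerically. This sketch computes the month-over-month change, the compound average monthly growth rate, and any month whose jump exceeds a threshold. The threshold of 10 is an arbitrary assumption, and the table does not state its units.

```python
months = ["Jan", "Feb", "Mar", "Apr", "May", "June", "July"]
usage = [74, 80, 79, 83, 84, 100, 102]  # figures from the table; units as in the source

# Month-over-month change in absolute terms
changes = [later - earlier for earlier, later in zip(usage, usage[1:])]

# Compound average monthly growth rate between the first and last month
periods = len(usage) - 1
avg_growth = (usage[-1] / usage[0]) ** (1 / periods) - 1

# Flag unusually large jumps; the threshold is an illustrative assumption
STEP_THRESHOLD = 10
jumps = [months[i + 1] for i, delta in enumerate(changes) if delta > STEP_THRESHOLD]

print(changes)              # [6, -1, 4, 1, 16, 2]
print(f"{avg_growth:.1%}")  # 5.5%
print(jumps)                # ['June']
```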
"Often what is really being offered by many analytics solutions is just more data or information – not insights."
– Brent Dykes, Director of Data Strategy, Domo
You can have all the data in the world and absolutely nothing valuable to add. Don’t fall for this trap. Use the activities in this phase to structure your data collection operation and ensure that your organization’s availability and capacity management plan is data driven.
At-a-glance – it’s how most executives consume all but the most important information. Create a dashboard that tracks the status of your most important systems.
This tool collates and presents information gathered from other sources. It is not a substitute for a performance monitoring tool.
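A dashboard of this kind typically reduces each system to a traffic-light status. The sketch below maps peak utilization to green, amber, or red. The 70% and 85% thresholds and the system names are placeholders, and in practice the utilization figures would come from your monitoring tool rather than being typed in by hand.

```python
def status(utilization_pct: float, warn: float = 70.0, critical: float = 85.0) -> str:
    """Map a utilization percentage to a traffic-light status.
    The 70/85 thresholds are placeholders; set them per system."""
    if utilization_pct >= critical:
        return "RED"
    if utilization_pct >= warn:
        return "AMBER"
    return "GREEN"

# Hypothetical gold systems and their latest peak utilization (%)
latest_peaks = {"ERP database": 62.0, "Email gateway": 78.5, "Web front end": 91.0}

# Show the most heavily used systems first
for system, peak in sorted(latest_peaks.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{system:<15} {peak:5.1f}%  {status(peak)}")
```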
Stakeholder analysis is crucial. Lines of authority can be diffuse. Understand who needs to be involved in the capacity management process early on.
Consider which departments are most closely aligned with the business processes that fuel demand. Prioritize those that have the greatest impact. Consider the stakeholders who will make purchasing decisions for increasing infrastructure capacity.
Establishing a relationship with your stakeholders is a necessary step in managing your capacity and availability.
The best capacity managers develop new business processes that more closely align their role with business stakeholders. Building these relationships takes hard work, and you must first earn the trust of the business.
Convince, don’t coerce. Stakeholders want the same thing you do. Bake them into the planning process as a step towards this goal.
Industry: Financial Services
Source: Interview
In financial services, availability is king
In the world of financial services, availability is absolutely crucial. High-value trades occur at all hours, and any institution that suffers outages runs the risk of losing tens of thousands of dollars, not to mention reputational damage.
People know what they want, but sometimes they have to be herded
While line of business managers and application owners understand the value of capacity management, it can be difficult to establish the working relationship necessary for a fruitful partnership.
Proactively building relationships keeps services available
He built relationships with all the department heads on the business side, and all the application owners.
He established a steering committee for capacity.
He invited stakeholders to regular capacity planning meetings.
He scheduled lunch and learn sessions with business analysts and project managers.
Sometimes “need to know” doesn’t register with sales or marketing. Nearly every infrastructure manager can share a story about a time when someone has made a decision that has critically impacted IT infrastructure without letting anyone in IT in on the “secret.”
In brief
Imagine working for a media company as an infrastructure capacity manager. Now imagine that the powers that be have decided to launch a content-focused web service. Seems like something they would do, right? Now imagine you find out about it the same way the company’s subscribers do. This actually happened – and it shouldn’t have. But a similar lack of alignment makes this a real possibility for any organization. If you don’t establish a systematic plan for soliciting and incorporating business requirements, prepare to lose a chunk of your free time. The business should never be able to say, in response to “nobody tells me anything,” “nobody asked.”
Pictured: an artist’s rendering of the capacity manager in question.
Once you’ve firmly established that everyone’s on the same team, meet individually with stakeholders to assess their capacity needs.
Sometimes line of business managers will evade or ignore you when you come knocking. Often this is because they don’t know the answer and don’t want to give you the wrong information. Explain that a best guess is all you are asking for, and allay their fears.
IT staff and line of business staff come with different skillsets. This can lead to confusion, but it doesn’t have to. Develop effective information solicitation techniques.
When it comes to mapping technical requirements, IT alone has the ability to effectively translate business needs.
Adapt the analysis to the needs of your organization. One capacity manager called the one-to-one mapping of business process to infrastructure demand the Holy Grail of capacity management. If this level of precision isn’t attainable, develop your own working estimates using the higher-level data.
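Where you can estimate what a single transaction of each business process costs in infrastructure terms, translating a business forecast into capacity demand is simple arithmetic. The sketch below assumes hypothetical per-transaction CPU and storage figures and a linear relationship between volume and demand. In practice, shared overheads and non-linear behavior at high load mean these numbers are estimates to refine over time.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass(frozen=True)
class ProcessProfile:
    """Infrastructure consumed by ONE transaction of a business process (hypothetical)."""
    cpu_seconds: float
    storage_mb: float

# Illustrative figures only; real values would come from load testing or monitoring data.
PROFILES: Dict[str, ProcessProfile] = {
    "new_order": ProcessProfile(cpu_seconds=0.8, storage_mb=0.05),
    "invoice": ProcessProfile(cpu_seconds=0.3, storage_mb=0.20),
}

def infrastructure_demand(forecast: Dict[str, int]) -> Tuple[float, float]:
    """Translate forecast transaction volumes per process into (CPU-hours, GB of storage)."""
    cpu_seconds = sum(PROFILES[name].cpu_seconds * volume for name, volume in forecast.items())
    storage_mb = sum(PROFILES[name].storage_mb * volume for name, volume in forecast.items())
    return cpu_seconds / 3600, storage_mb / 1024

# Hypothetical monthly forecast gathered from line of business managers
cpu_hours, storage_gb = infrastructure_demand({"new_order": 90_000, "invoice": 60_000})
print(f"{cpu_hours:.1f} CPU-hours, {storage_gb:.1f} GB")  # 25.0 CPU-hours, 16.1 GB
```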
Capacity management: The role of the capacity manager is changing, but it still has a purpose. Consider this:
Availability management: Ensuring services are available is still IT’s wheelhouse, even if that means a shift to a brokerage model:
The cloud comes at the cost of detailed performance data. Sourcing a service through an SLA with a third party increases the need to perform your own performance testing of gold-level applications. See performance monitoring.
"It is a commonplace observation that work expands so as to fill the time available for its completion. Thus, an elderly lady of leisure can spend the entire day in writing and despatching a postcard to her niece at Bognor Regis. An hour will be spent in finding the postcard, another in hunting for spectacles, half-an-hour in a search for the address, an hour and a quarter in composition, and twenty minutes in deciding whether or not to take an umbrella when going to the pillar-box in the next street."
– C. Northcote Parkinson, The Economist, 1955
If you give people lots of capacity, they will use it. Most shops are overprovisioned, and in some cases that’s throwing perfectly good money away. Don’t be afraid to prod if someone requests something that doesn’t seem right.
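To find candidates worth prodding about, compare what each workload was allocated with what it has actually used at peak. This sketch flags anything whose peak stays below an arbitrary 40% of its allocation. The resource names, the units (vCPUs here), and the threshold are all assumptions to tune to your environment.

```python
def overprovisioned(allocated, peak_used, threshold=0.4):
    """Return (resource, peak/allocated) pairs for resources whose observed peak
    stays below `threshold` of their allocation, lowest ratio first.
    The 40% default is an arbitrary illustration, not a recommended value."""
    flagged = []
    for name, capacity in allocated.items():
        if capacity <= 0:
            continue  # nothing allocated, nothing to reclaim
        ratio = peak_used.get(name, 0.0) / capacity
        if ratio < threshold:
            flagged.append((name, round(ratio, 2)))
    return sorted(flagged, key=lambda pair: pair[1])

# Hypothetical vCPU allocations and the highest usage observed over the last quarter
allocated_vcpus = {"reporting-vm": 16, "crm-app": 8, "build-server": 32}
peak_vcpus = {"reporting-vm": 3.2, "crm-app": 6.5, "build-server": 30}

for name, ratio in overprovisioned(allocated_vcpus, peak_vcpus):
    print(f"{name}: peak uses {ratio:.0%} of its allocation")
```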
Questions to ask:
In brief
Who isn’t a sports fan? Big games mean big stakes for pool participants and armchair quarterbacks, along with pressure on the network as fans stream games from their work computers. One organization suffered from this problem and, instead of taking a hard line and banning all streams, opted to stream the game on a large screen in a conference room where those interested could work for its duration. This alleviated strain on the network and kept staff happy.
Industry: Professional Services
Source: Interview
24/7 AWS = round-the-clock costs
A senior developer realized that his development team had been leaving AWS instances running without any specific reason.
Why?
The development team appreciated the convenience of an always-on instance and, because the people spinning them up did not handle costs, the problem wasn’t immediately apparent.
Resolution
In his spare time over the course of a month, the senior developer wrote a program to manage the servers, including shutting them down during times when they were not in use and providing remote-access start-up when required. His team alone saved $30,000 in costs over the next six months, and his team lead reported that it would have been more than worth paying the team to implement such a project on company time.
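The core of a program like this is a scheduling decision. The sketch below decides whether a development instance should be running, allows an on-demand start outside working hours, and returns the action a scheduler would take. The working window and function names are hypothetical, and the calls that actually start or stop an instance are deliberately left out because the case study does not describe how the developer made them.

```python
from datetime import datetime, time
from typing import Optional

# Hypothetical schedule; the case study does not describe the developer's actual rules.
WORK_START = time(8, 0)
WORK_END = time(19, 0)
WORKDAYS = {0, 1, 2, 3, 4}  # datetime.weekday(): Monday=0 ... Friday=4

def should_be_running(now: datetime, override_until: Optional[datetime] = None) -> bool:
    """Decide whether a development instance should be up at `now`.

    Outside the working window the instance stays off unless someone has
    requested an on-demand start, which keeps it up until `override_until`.
    """
    if override_until is not None and now < override_until:
        return True
    return now.weekday() in WORKDAYS and WORK_START <= now.time() < WORK_END

def action(is_running: bool, now: datetime, override_until: Optional[datetime] = None) -> str:
    """Return 'start', 'stop', or 'none' for a scheduler to carry out."""
    wanted = should_be_running(now, override_until)
    if wanted and not is_running:
        return "start"
    if not wanted and is_running:
        return "stop"
    return "none"
```

With the assumed 11-hour weekday window, an instance runs 55 of the week’s 168 hours instead of all of them.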
The most effective capacity management takes a holistic approach and looks at the big picture in order to find ways to eliminate unnecessary infrastructure usage, or to find alternate or more efficient sources of required capacity.
Industry: Telecommunications
Source: Interview
High-cost lines
The capacity manager at a telecommunications provider mapped out his firm’s network traffic and discovered they were using a number of VP circuits (inter-building cross-connects) that were very expensive relative to the rest of their network.
Paying the toll troll
These VP circuits were supplying needed network services to the telecom provider’s clients, so there was no way to reduce this demand.
Resolution
The capacity manager analyzed where the traffic was going and compared this to the cost of the lines they were using. After performing the analysis, he found he could re-route much of the traffic away from the VP circuits and save on costs while delivering the same level of service to their users.
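The analysis in this case compares where traffic goes with what each line costs. For a single traffic demand split across parallel routes, each with a fixed capacity and a linear cost per unit, filling the cheapest route first yields the lowest total cost. The sketch below shows that calculation; the route names, capacities, and costs are invented, and real networks add constraints (latency, redundancy, contract minimums) that it ignores.

```python
def cheapest_allocation(demand, routes):
    """Split `demand` (e.g. Mbps) across parallel routes, cheapest per unit first.

    `routes` maps a route name to (capacity, cost_per_unit). Returns the
    amount placed on each route and the total cost. With linear costs and a
    single demand, this greedy fill gives the minimum total cost.
    """
    plan, remaining, total = {}, demand, 0.0
    for name, (capacity, unit_cost) in sorted(routes.items(), key=lambda kv: kv[1][1]):
        if remaining <= 0:
            break
        take = min(capacity, remaining)
        plan[name] = take
        total += take * unit_cost
        remaining -= take
    if remaining > 0:
        raise ValueError(f"insufficient capacity: {remaining} units left unplaced")
    return plan, total

# Hypothetical routes: (capacity, cost per unit per month)
routes = {
    "vp_circuit": (1000, 5.0),
    "backbone_a": (600, 1.5),
    "backbone_b": (300, 2.0),
}
plan, cost = cheapest_allocation(1200, routes)
print(plan)  # {'backbone_a': 600, 'backbone_b': 300, 'vp_circuit': 300}
print(cost)  # 3000.0, versus 6000.0 if everything stayed on the VP circuit
```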
Make informed decisions about capacity. Remember: retain all documentation. It might come in handy for the justification of purchases.
Capacity management (and, by extension, availability management) is a combination of two balancing acts: cost against capacity, and supply against demand.*
In brief
The fractured nature of the capacity management space means that every organization is going to have a slightly different tooling strategy. No vendor has dominated, and every solution requires some level of customization. One capacity manager (a cloud provider, no less!) relayed a tale about a capacity management Excel sheet programmed with 5,000+ lines of code. As much work as that is, a bespoke solution is probably unavoidable.
3.2
Map business needs to technical requirements and technical requirements to infrastructure requirements
The analyst will guide workshop participants in using their organization’s data to map out the relationships between applications, technical requirements, and the underlying infrastructure usage.
Guided Implementation 3: Solicit and incorporate business needs
Proposed Time to Completion: 2 weeks
Step 3.1: Solicit business needs and gather data
Review your findings with an analyst: discuss the effectiveness of your strategies to involve business stakeholders in the planning process and your methods of data collection and analysis. Then complete these activities…
With these tools & templates: Capacity Plan Template
Step 3.2: Analyze data and project future needs
Review your findings with an analyst: discuss the effectiveness of your strategies to involve business stakeholders in the planning process and your methods of data collection and analysis. Then complete these activities…
With these tools & templates: Capacity Snapshot Tool, Capacity Plan Template
Phase 3 Results & Insights:
Availability: how often a service is usable (that is, up and not so degraded that it is ineffective). Consequences of reduced availability can include financial losses, impacted customer goodwill, and reduced faith in IT more generally.
Causes of availability issues:
Capacity: a particular component’s/service’s/business’ wiggle room. In other words, its usage ceiling.
Causes of capacity issues:
Availability and capacity issues can stem from a number of different causes. Include a list in your availability and capacity management plan.
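The definitions of availability and capacity above can be made concrete with simple percentages. This sketch computes availability over a period (counting severe degradation as downtime, in line with the definition of "usable") and the remaining headroom below a usage ceiling. The function names and sample figures are hypothetical.

```python
def availability_pct(period_minutes: float, downtime_minutes: float) -> float:
    """Share of the period during which the service was usable, as a percentage."""
    if period_minutes <= 0:
        raise ValueError("period must be positive")
    return 100.0 * (period_minutes - downtime_minutes) / period_minutes

def headroom_pct(current_usage: float, usage_ceiling: float) -> float:
    """Remaining wiggle room below the usage ceiling, as a percentage of the ceiling."""
    if usage_ceiling <= 0:
        raise ValueError("ceiling must be positive")
    return 100.0 * (usage_ceiling - current_usage) / usage_ceiling

# Hypothetical: a 30-day month (43,200 minutes) with 90 minutes of outage or severe degradation
print(f"{availability_pct(43_200, 90):.3f}%")  # 99.792%
# Hypothetical: 7.2 TB used on a volume whose ceiling is 9 TB
print(f"{headroom_pct(7.2, 9.0):.1f}%")        # 20.0%
```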
Availability and capacity problems result in incidents, critical incidents, and problems. These are addressed in a separate project (incident and problem management), but information about common causes can streamline that process.
Based on your understanding of your capacity needs (through written SLAs and informal but regular meetings with the business), highlight major risks you foresee.
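A common way to rank the risks you identify, though not one prescribed in this blueprint, is a likelihood-times-impact score. The sketch below assumes 1-to-5 scales for both and sorts risks from highest to lowest score; the example risks are illustrative only.

```python
from dataclasses import dataclass

@dataclass
class Risk:
    description: str
    likelihood: int  # assumed scale: 1 (rare) to 5 (almost certain)
    impact: int      # assumed scale: 1 (negligible) to 5 (severe)

    def __post_init__(self):
        for field_name in ("likelihood", "impact"):
            if not 1 <= getattr(self, field_name) <= 5:
                raise ValueError(f"{field_name} must be between 1 and 5")

    @property
    def score(self) -> int:
        return self.likelihood * self.impact

def prioritize(risks):
    """Order risks from highest to lowest likelihood x impact score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)

risks = [
    Risk("Storage array reaches its ceiling before the next purchase cycle", 4, 4),
    Risk("Marketing campaign doubles web traffic without notice", 3, 5),
    Risk("Single WAN link to the DR site fails", 2, 5),
]
for risk in prioritize(risks):
    print(f"{risk.score:>2}  {risk.description}")
```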
It’s an old adage, but it checks out: don’t come to the table armed only with problems. Be a problem solver and prove IT’s value to the organization.
While capacity management is a form of availability management, it is not the only form. In this activity, outline the specific nature of threats to availability.
A dynamic central repository is a good way to ensure that availability issues stemming from a variety of causes are captured and mitigated.
Although it is easier said than done, identifying potential mitigations is a crucial part of availability management as an activity.
The stakeholders consulted as part of the process will be interested in its results. Share them, either in person or through a collaboration tool.
The current status of your availability and capacity management plan should be on the agenda for every stakeholder meeting. Direct the stakeholders’ attention to the parts of the document that are relevant to them, and solicit their thoughts on the document’s accuracy. Over time you should get a pretty good idea of who among your stakeholder group is skilled at projecting demand, and who over- or underestimates, and by how much. This information will improve your projections and, therefore, your management over time.
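Learning who over- or underestimates, and by how much, can be as simple as comparing each stakeholder’s past forecasts with what actually happened. This sketch computes an average actual-to-forecast ratio per stakeholder and uses it to scale their next forecast. The names and history are invented, and a handful of data points makes for a rough correction at best.

```python
from statistics import mean

# Hypothetical history of (forecast, actual) demand per stakeholder, in the same units
history = {
    "Sales": [(100, 130), (80, 100), (120, 150)],
    "Marketing": [(200, 150), (100, 80)],
}

def bias_factor(pairs):
    """Average ratio of actual to forecast demand.
    Above 1.0 means the stakeholder tends to underestimate; below 1.0, overestimate."""
    return mean(actual / forecast for forecast, actual in pairs)

def adjusted_forecast(stakeholder, new_forecast):
    """Scale a new forecast by the stakeholder's historical bias factor."""
    return new_forecast * bias_factor(history[stakeholder])

for name, pairs in history.items():
    print(f"{name}: bias factor {bias_factor(pairs):.2f}")
print(f"Sales says 150, plan for about {adjusted_forecast('Sales', 150):.0f}")
```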
Use the experience gained and the artifacts generated to build trust with the business. Hold the meetings regularly: demonstrating that you’re actually putting the information to good use will make hesitant participants more likely to open up.
4.1
Identify capacity risks and mitigate them
The analyst will guide workshop participants in identifying potential risks to capacity and determining strategies for mitigating them.
Guided Implementation 4: Identify and mitigate risks
Proposed Time to Completion: 1 week
Step 4.1: Identify and mitigate risks
Review your findings with an analyst. Then complete these activities…
With these tools & templates: Capacity Snapshot Tool, Capacity Plan Template
Phase 4 Results & Insights:
Components are critical to availability and capacity management.
The CEO doesn’t care about the SMTP server. She cares about meeting customer needs and producing profit. For IT capacity and availability managers, though, the devil is in the details. It only takes one faulty component to knock out a service. Keep track and keep the lights on.
Ask what the business is working on, not what they need.
If you ask them what they need, they’ll tell you – and it won’t be cheap. Find out what they’re going to do, and use your expertise to service those needs. Use your IT experience to estimate the impact of business and service level changes on the components that secure the availability you need.
Cloud shmoud.
The role of the capacity manager might be changing with the advent of the public cloud, but it has not disappeared. Capacity managers in the age of the cloud are responsible for managing vendor relationships, negotiating external SLAs, projecting costs and securing budgets, reining in prodigal divisions, and so on.
Client Project: Develop an Availability and Capacity Management Plan
This project has the ability to fit the following formats:
Adrian Blant, Independent Capacity Consultant, IT Capability Solutions
Adrian has over 15 years' experience in IT infrastructure. He has built capacity management business processes from the ground up, and has focused on ensuring a productive dialogue between IT and the business.
James Zhang, Senior Manager Disaster Recovery, AIG Technology
James has over 20 years' experience in IT and 10 years' experience in capacity management. Throughout his career, he has focused on creating new business processes to deliver value and increase efficiency over the long term.
Mayank Banerjee, CTO, Global Supply Chain Management, HelloFresh
Mayank has over 15 years' experience across a wide range of technologies and industries. He has implemented highly automated capacity management processes as part of his role of owning and solving end-to-end business problems.
Mike Lynch, Consultant, CapacityIQ
Mike has over 20 years' experience in IT infrastructure. He takes a holistic approach to capacity management to identify and solve key problems, and has developed automated processes for mapping performance data to information that can inform business decisions.
Paul Waguespack, Manager of Application Systems Engineering, Tufts Health Plan
Paul has over 10 years' experience in IT. He has specialized in implementing new applications and functionalities throughout their entire lifecycle, and integrating with all aspects of IT operations.
Richie Mendoza, IT Consultant, SMITS Inc.
Richie has over 10 years' experience in IT infrastructure. He has specialized in using demand forecasting to guide infrastructure capacity purchasing decisions, to provide availability while avoiding costly overprovisioning.
Rob Thompson, President, IT Tools & Process
Rob has over 30 years’ IT experience. Throughout his career he has focused on making IT a generator of business value. He now runs a boutique consulting firm.
Todd Evans, Capacity and Performance Management SME, IBM
Todd has over 20 years' experience in capacity and performance management. At Kaiser Permanente, he established a well-defined mapping of the business’s workflow processes to technical requirements for applications and infrastructure.
451 Research. “Best of both worlds: Can enterprises achieve both scalability and control when it comes to cloud?” 451 Research, Nov. 2016. Web.
Allen, Katie. “Work Also Shrinks to Fit the Time Available: And We Can Prove It.” The Guardian. 25 Oct. 2017.
Amazon. “Amazon Elastic Compute Cloud.” Amazon Web Services. N.d. Web.
Armandpour, Tim. “Lies Vendors Tell about Service Level Agreements and How to Negotiate for Something Better.” Network World. 12 Jan. 2016.
“Availability Management.” ITIL and ITSM World. 2001. Web.
Availability Management Plan Template. Purple Griffon. 30 Nov. 2012. Web.
Bairi, Jayachandra, B. Murali Manohar, and Goutam Kumar Kundu. “Capacity and Availability Management by Quantitative Project Management in the IT Service Industry.” Asian Journal on Quality 13.2 (2012): 163-76. Web.
BMC Capacity Optimization. BMC. 24 Oct. 2017. Web.
Brooks, Peter, and Christa Landsberg. Capacity Management in Today’s IT Environment. MentPro. 16 Aug. 2017. Web.
"Capacity and Availability Management." CMMI Institute. April 2017. Web.
Capacity and Availability Management. IT Quality Group Switzerland. 24 Oct. 2017. Web.
Capacity and Performance Management: Best Practices White Paper. Cisco. 4 Oct. 2005. Web.
"Capacity Management." Techopedia.
“Capacity Management Forecasting Best Practices and Recommendations.” STG. 26 Jan. 2015. Web.
Capacity Management from the Ground up. Metron. 24 Oct. 2017. Web.
Capacity Management in the Modern Datacenter. Turbonomic. 25 Oct. 2017. Web.
Capacity Management Maturity: Assessing and Improving the Effectiveness. Metron. 24 Oct. 2017. Web.
“Capacity Management Software.” TeamQuest. 24 Oct. 2017. Web.
Capacity Plan Template. Purainfo. 11 Oct. 2012. Web.
“Capacity Planner—Job Description.” Automotive Industrial Partnership. 24 Oct. 2017. Web.
Capacity Planning. CDC. Aug. 2017. Web.
"Capacity Planning." TechTarget. 24 Oct. 2017. Web.
“Capacity Planning and Management.” BMC. 24 Oct. 2017. Web.
"Checklist Capacity Plan." IT Process Wiki. 24 Oct. 2017. Web.
Dykes, Brent. “Actionable Insights: The Missing Link Between Data and Business Value.” Forbes. 26 Apr. 2016. Web.
Evolved Capacity Management. CA Technologies. Oct. 2013. Web.
Francis, Ryan. “False positives still cause threat alert fatigue.” CSO. 3 May 2017. Web.
Frymire, Scott. "Capacity Planning vs. Capacity Analytics." ScienceLogic. 24 Oct. 2017. Web.
Glossary. Exin. Aug. 2017. Web.
Herrera, Michael. “Four Types of Risk Mitigation and BCM Governance, Risk and Compliance.” MHA Consulting. 17 May 2013.
Hill, Jon. How to Do Capacity Planning. TeamQuest. 24 Oct. 2017. Web.
“How to Create an SLA in 7 Easy Steps.” ITSM Perfection. 25 Oct. 2017. Web.
Hunter, John. “Myth: If You Can’t Measure It, You Can’t Manage It.” W. Edwards Deming Institute Blog. 13 Aug. 2015. Web.
IT Service Criticality. U of Bristol. 24 Oct. 2017. Web.
"ITIL Capacity Management." BMC's Complete Guide to ITIL. BMC Software. 22 Dec. 2016. Web.
“Just-in-time.” The Economist. 6 July 2009. Web.
Kalm, Denise P., and Marv Waschke. Capacity Management: A CA Service Management Process Map. CA. 24 Oct. 2017. Web.
Klimek, Peter, Rudolf Hanel, and Stefan Thurner. “Parkinson’s Law Quantified: Three Investigations in Bureaucratic Inefficiency.” Journal of Statistical Mechanics: Theory and Experiment 3 (2009): 1-13. Aug. 2017. Web.
Landgrave, Tim. "Plan for Effective Capacity and Availability Management in New Systems." TechRepublic. 10 Oct. 2002. Web.
Longoria, Gina. “Hewlett Packard Enterprise Goes After Amazon Public Cloud in Enterprise Storage.” Forbes. 2 Dec. 2016. Web.
Maheshwari, Umesh. “Understanding Storage Capacity.” Nimble Storage. 7 Jan. 2016. Web.
Mappic, Sandy. “Just how complex can a Login Transaction be? Answer: Very!” AppDynamics. 11 Dec. 2011. Web.
Miller, Ron. “AWS Fires Back at Larry Ellison’s Claims, Saying It’s Just Larry Being Larry.” TechCrunch. 2 Oct. 2017. Web.
National College for Teaching & Leadership. “The role of data in measuring school performance.” National College for Teaching & Leadership. N.d. Web.
Newland, Chris, et al. Enterprise Capacity Management. CETI, Ohio State U. 24 Oct. 2017. Web.
Office of Government Commerce. Best Practice for Service Delivery. London: Her Majesty’s Stationery Office, 2001.
Office of Government Commerce. Best Practice for Business Perspective: The IS View on Delivering Services to the Business. London: Her Majesty’s Stationery Office, 2004.
Parkinson, C. Northcote. “Parkinson’s Law.” The Economist. 19 Nov. 1955. Web.
“Parkinson’s Law Is Proven Again.” Financial Times. 25 Oct. 2017. Web.
Paul, John, and Chris Hayes. Performance Monitoring and Capacity Planning. VMware. 2006. Web.
“Reliability and Validity.” UC Davis. N.d. Web.
"Role: Capacity Manager." IBM. 2008. Web.
Ryan, Liz. “‘If You Can’t Measure It, You Can’t Manage It’: Not True.” Forbes. 10 Feb. 2014. Web.
S, Lalit. “Using Flexible Capacity to Lower and Manage On-Premises TCO.” HPE. 23 Nov. 2016. Web.
Snedeker, Ben. “The Pros and Cons of Public and Private Clouds for Small Business.” Infusionsoft. 6 Sept. 2017. Web.
Statement of Work: IBM Enterprise Availability Management Service. IBM. Jan. 2016. Web.
“The Road to Perfect AWS Reserved Instance Planning & Management in a Nutshell.” Botmetric. 25 Oct. 2017. Web.
Transforming the Information Infrastructure: Build, Manage, Optimize. Asigra. Aug. 2017. Web.
"Unify IT Performance Monitoring and Optimization." IDERA. 24 Oct. 2017. Web.
Valentic, Branimir. "Three Faces of Capacity Management." ITIL/ISO 20000 Knowledge Base. Advisera. 24 Oct. 2017. Web.
"What is IT Capacity Management?" Villanova U. Aug. 2017. Web.
Wolstenholme, Andrew. Final internal Audit Report: IT Availability and Capacity (IA 13 519/F). Transport For London. 23 Feb. 2015. Web.