Operational Changes When Transitioning to Cloud Computing

When an organization migrates to, or adds, a cloud service alongside a traditional data center (or managed service), it must make distinct changes to its Concept of Operations (CONOPS).  These changes matter less for a public cloud service, since the provider handles most of these functions for you; they apply mainly to private cloud models, where the customer is involved in most operations.  The operational topics below follow ITIL (Information Technology Infrastructure Library) format and nomenclature, to match the many organizations that have adopted ITIL as their service management model.

Note: Every organization and cloud service is unique, so there is no realistic way to capture every change that may need to be addressed.  The areas below are based on significant experience with customers in both Federal Government and commercial organizations.

Request Management

Ordering of cloud services will be done through the cloud management system, usually a web-based portal.  This portal includes a service catalog of all available offerings and options.  All orders, cancellations, and usage tracking for billing purposes are handled within this system.  Legacy methods for ordering services will normally be retired, with the service catalog becoming the new ordering method, even where no money changes hands, as in some private or community clouds.
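To make the ordering flow concrete, here is a minimal Python sketch of a portal that accepts orders only for items in the service catalog and records cancellations so usage can be tracked for billing.  The catalog entries, prices, and class names are hypothetical; a real cloud management system exposes its own catalog and interfaces.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Optional

# Hypothetical service catalog: offering name -> monthly price (illustrative only).
CATALOG: Dict[str, float] = {
    "vm-small": 50.0,
    "vm-medium": 100.0,
    "vm-large": 200.0,
}


@dataclass
class Subscription:
    order_id: int
    customer: str
    offering: str
    started: datetime
    cancelled: Optional[datetime] = None


@dataclass
class CloudPortal:
    """Toy stand-in for the ordering functions of a cloud management portal."""
    subscriptions: Dict[int, Subscription] = field(default_factory=dict)
    next_id: int = 1

    def order(self, customer: str, offering: str, when: datetime) -> Subscription:
        # Orders are only accepted for items published in the service catalog.
        if offering not in CATALOG:
            raise ValueError(f"{offering!r} is not in the service catalog")
        sub = Subscription(self.next_id, customer, offering, when)
        self.subscriptions[sub.order_id] = sub
        self.next_id += 1
        return sub

    def cancel(self, order_id: int, when: datetime) -> None:
        # Recording the cancellation time stops the usage clock for billing.
        self.subscriptions[order_id].cancelled = when

    def active(self, customer: str) -> List[Subscription]:
        return [s for s in self.subscriptions.values()
                if s.customer == customer and s.cancelled is None]


portal = CloudPortal()
sub = portal.order("dept-a", "vm-small", datetime(2024, 1, 1))
portal.cancel(sub.order_id, datetime(2024, 1, 15))
```

Keeping start and cancellation timestamps on each subscription is what lets the same system drive both the catalog and the usage-based billing described above.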

There may be a link from the cloud management portal to a traditional support ticketing system to allow customers to request assistance.  These will be handled in the same manner as any other legacy user request/support ticket.

Incident Management

A common change to incident management is the monitoring of additional event logs within the cloud services.  Since most cloud compute provisioning is performed automatically, careful tracking of the event logs and creation of alerts are essential to detect any failures in the automated processes.  For example, if the cloud runs out of available memory or storage space (something you should be monitoring for proactively), all new orders for virtual machines would fail.  As the cloud manager, you will have two areas in which to monitor and manage incidents:

  • Cloud Infrastructure.  You must manage the cloud infrastructure itself – meaning the data center facilities, server farms, storage, networks, security, and applications.
  • Customer services.  You will also need to detect when a new virtual machine or other resource is automatically provisioned, so that you can begin monitoring it immediately.  Since these services are used by customers, you may be monitoring individual virtual server instances as well as tracking how many resources each customer consumes for billing purposes.
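A minimal sketch of the two monitoring areas above, assuming the cloud management system writes provisioning events as simple records.  The field names ("type", "status", "resource", "detail") and the 500 GB threshold are hypothetical; real event formats depend on the cloud management product.

```python
from typing import Dict, List, Tuple

# Hypothetical event records from the cloud management system's logs.
Event = Dict[str, str]


def process_events(events: List[Event], free_storage_gb: float,
                   min_free_storage_gb: float = 500.0) -> Tuple[List[str], List[str]]:
    """Return (alerts, newly provisioned resources to start monitoring)."""
    alerts: List[str] = []
    to_monitor: List[str] = []
    for e in events:
        if e["type"] != "provision":
            continue
        if e["status"] == "failed":
            alerts.append(f"Provisioning failed for {e['resource']}: {e['detail']}")
        elif e["status"] == "succeeded":
            # Customer services: start monitoring the new resource right away.
            to_monitor.append(e["resource"])
    # Cloud infrastructure: warn before free storage runs out, not after orders fail.
    if free_storage_gb < min_free_storage_gb:
        alerts.append(f"Low storage: {free_storage_gb} GB free "
                      f"(threshold {min_free_storage_gb} GB)")
    return alerts, to_monitor


events = [
    {"type": "provision", "status": "succeeded", "resource": "vm-101", "detail": ""},
    {"type": "provision", "status": "failed", "resource": "vm-102",
     "detail": "no storage available"},
    {"type": "login", "status": "succeeded", "resource": "portal", "detail": ""},
]
alerts, to_monitor = process_events(events, free_storage_gb=120)
```

Here the failed order and the low-storage condition both raise alerts, while the successfully provisioned VM is handed to monitoring immediately.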

Change Management

Because of online customer ordering, approval workflows, and automated provisioning within the cloud service, change control will be significantly affected, and existing processes will need to adapt.

  • New Virtual Machines.  When a customer places an order, the VM(s) are provisioned automatically, all based on pre-approved and security-certified operating system images, applications, and patch levels.  The cloud service portal should be programmed to automatically generate a change control request, with a completed status, for every successful automated provisioning event.  Any exceptions or errors in the automated provisioning process are handled through alerts, and generate a “completed” change ticket once the VM is online.
  • Changes to Servers/Hosts.  Customer-requested changes follow the normal change control procedures already in place, as do routine maintenance, updates, security patches, and new software revisions.  One typical exclusion is the Dev/Test service, whose VMs are often provisioned behind a firewall to keep non-certified development applications isolated from production networks.  VMs in this service do not require change control, so developers can do their work without change control slowing them down.
  • Updates to Common Operating Environment (COE). Cloud services automatically deploy templates or build-images of standard configurations.  These COE templates will be created by the cloud provider or customer with all updates, patches, and security certifications completed.  These COE images can be automatically deployed within the cloud environment without going through the typical manual accreditation process for each server.   The cloud provider is usually required to update COEs at least every 6 months to keep the catalog of available operating systems and applications up-to-date.  All updated COEs will again go through the manual security approval process, and then can be ordered and deployed using the cloud’s automated systems.
  • Server (VM) Add to Network and Domain.  As each machine is automatically provisioned, it is automatically added to the network domain.  Although this is an automated process, the specific steps required, along with the change and security control policies involved, need to be adjusted to allow it; in the past, joining the domain typically required manual security approval.
  • User/Admin Permissions to New Servers/Hosts.  Similar to “Server (VM) Add to Network and Domain” above, as new machines are automatically added to the network, permission to log into the new operating system is granted to the cloud management system, usually through a Service Account.  The specific steps for automating this, and the corresponding adjustments to existing security processes, will need to be defined.
  • Network Configuration Requests.  Every VM-based server has a pre-configured network setup.  For an individual machine, physical or virtual, the standard OS and applications installed require outbound-initiated traffic within the production network, and possibly to the Internet.
    • All network configuration, load balancing, or firewall change requests follow existing procedures.  When possible, the cloud management self-service control panel will allow customers to configure some of this by themselves, although advanced network changes will need to go through normal change control and possibly security approval.
    • Some VM templates or COEs deploy multiple servers as a single package.  For example, a complex COE may include one or more database servers, middleware application servers, and possibly front-end web servers; this collection of VMs is called a Platform.  In these situations, the virtual machines have already been configured, as part of the overall platform package, to communicate with each other via the virtual networking built into the hypervisor.  In this example, only the front-end web servers would have a production network address, while all other servers are essentially “hidden” within the virtual machine network enclave.
    • Customers may submit requests to have production firewalls, load balancers, or other network systems custom configured for their needs.  When evaluating these requests, the cloud provider should always default to making the changes within the hypervisor’s virtual network environment, since many requests can be handled with virtual networking settings alone.  Only if that is not sufficient should the provider consider changing physical data center switches, routers, and firewalls.
  • Virtual Machine Configuration Changes.  Customers may be able to upgrade or downgrade their virtual machine CPU, memory, or disk space within the cloud management portal.  Changing this configuration usually requires a reboot of the customer’s virtual machine, but no loss of data.
    • If a customer requests a change manually through a support ticket, the cloud provider should make the change using the cloud management software, so that billing and the VM configuration records are updated automatically.
    • Do NOT change VM configurations directly in the back-end hypervisor: the cloud management system, billing, and configuration management will have no knowledge of the new settings, and the downstream asset and change control databases will not be updated.
  • Release Management.  All VM templates, COEs, and software will be fully tested in an offline lab or staging network, then quality checked and security approved, before any change to the production cloud service is scheduled.
    • Updates to Common Operating Environment (COE).  Cloud compute services automatically deploy templates or build-images of standard operating environments.  These COE templates will be created by the cloud provider or customer with all updates, patches, and security certifications completed.  These COE images can be automatically deployed within the cloud environment without going through the typical manual security process for each server.  The cloud provider is usually required to update COEs at least every 6 months to keep the catalog of available operating systems and applications up-to-date.  All updated or new COEs will again go through the manual security approval process before they can be ordered and deployed using the cloud’s automated systems.
    • Customers must be given advance notice (10 days, for example) before any changes or updates are made to already-deployed customer VMs.  Customers may “opt out” of any planned upgrade within this window if they believe it will negatively affect their project, timeline, or code stability.
      • It is the cloud provider’s goal to keep all new and existing VMs up-to-date; therefore, the cloud provider should adequately document the need, importance, testing results, and impact of each upgrade to encourage customer adoption of the new updates.
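The “completed” change record described under New Virtual Machines can be sketched as a small provisioning callback.  This is a minimal sketch, assuming the portal can call a function after each provisioning run; the image names, ticket numbering, and record fields are hypothetical rather than any particular ticketing system’s API.

```python
from dataclasses import dataclass
from datetime import datetime
from itertools import count
from typing import List, Optional

# Hypothetical set of pre-approved, security-certified COE image names.
APPROVED_IMAGES = {"win2019-coe-v3", "rhel8-coe-v5"}
_ticket_numbers = count(1000)  # illustrative ticket numbering


@dataclass
class ChangeRecord:
    ticket_id: int
    vm_name: str
    image: str
    status: str
    closed_at: datetime


def on_provisioning_event(vm_name: str, image: str, vm_online: bool,
                          when: datetime, alerts: List[str]) -> Optional[ChangeRecord]:
    """Create a completed change record for a VM built from an approved image.

    If the VM is not online, raise an alert instead; the completed record is
    generated later, once the VM does come online.
    """
    if image not in APPROVED_IMAGES:
        # Only pre-certified images may bypass manual change approval.
        raise ValueError(f"{image!r} is not a pre-approved COE image")
    if not vm_online:
        alerts.append(f"Provisioning of {vm_name} has not completed; investigate")
        return None
    return ChangeRecord(next(_ticket_numbers), vm_name, image, "completed", when)
```

The design choice is that the change record is created only after the fact, which is what allows automated provisioning without waiting for a change board.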

Configuration Management

The cloud management system will automatically add every VM to the configuration management database as part of the automated provisioning process.  Since cloud compute services can be ordered, approved, and automatically deployed at any time of day, whenever the customer desires, this automated update to configuration management is critical.

  • Changes to the underlying host servers and their software should be treated differently from changes to customer-owned virtual machines.  Normally the cloud provider upgrades its server farm first, then schedules any necessary customer upgrades in a separate maintenance window.
  • VMs running within the Dev/Test sub-networks do not require the same level of configuration management as production VMs, because they are sandboxes for developers to work in.  Enforcing strict change and configuration management there serves little purpose and only slows down the developers’ efforts.  Only when the VMs are deployed into production must they begin to follow all change, security, and configuration management policies.
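A minimal sketch of automated configuration management under the policy above: every VM is registered at provisioning time, but only production VMs are flagged for change control.  The environment labels, record fields, and function names are hypothetical, and a plain dictionary stands in for the CMDB.

```python
from dataclasses import dataclass
from typing import Dict

# Environment labels are assumptions; use whatever names your CMDB defines.
ENVIRONMENTS = ("dev-test", "production")


@dataclass
class ConfigItem:
    name: str
    environment: str
    vcpus: int
    memory_gb: int
    change_controlled: bool


def register_vm(cmdb: Dict[str, ConfigItem], name: str, environment: str,
                vcpus: int, memory_gb: int) -> ConfigItem:
    """Add or update a VM's configuration item during automated provisioning."""
    if environment not in ENVIRONMENTS:
        raise ValueError(f"unknown environment {environment!r}")
    # Dev/Test sandboxes are recorded but not placed under change control.
    item = ConfigItem(name, environment, vcpus, memory_gb,
                      change_controlled=(environment == "production"))
    cmdb[name] = item  # upsert: a later resize overwrites the same record
    return item


def promote_to_production(cmdb: Dict[str, ConfigItem], name: str) -> ConfigItem:
    """Once deployed to production, the VM falls under full change control."""
    item = cmdb[name]
    item.environment = "production"
    item.change_controlled = True
    return item
```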

IT Asset Management

All existing procedures for Asset Management will be followed; however, the automation within the cloud management platform will automatically update asset databases.  This automatic real-time update is often part of Federal Government IT security requirements.

  • As the number of customer orders increases, additional physical blade servers and SAN storage will be required, so capacity planning and monitoring are critical to success.  As new servers or storage are added, the Asset Management system will be updated per normal procedures.
  • Virtual machines running within DTaaS (Dev/Test as a Service, pre-production) and IaaS/PaaS (production) networks must have all assets tracked, including each VM itself and potentially the applications it contains.

Service Desk Function

Most cloud providers, certainly public cloud ones, do not provide tier 1 user support; customers normally provide this function themselves or contract a third party.  The cloud provider manages all devices and software within its cloud service, and customers typically manage only their applications or VMs.  However, issues can be escalated to the cloud provider through the management portal, email, or telephone, depending on the offering and its terms and conditions.

  • It should be noted that customers within DTaaS may attempt to submit tickets about software development programs or problems found in their custom applications; these are development issues that should be handled by the customer’s development staff, not the cloud provider.

Service Level Management

While the cloud provider normally establishes service levels, customers may request additional or enhanced SLAs.  Accepting modified terms is ultimately up to the cloud provider; public providers normally do not change their SLAs, but this flexibility is a primary benefit of private deployments.  The provider should give customers some form of reporting, such as:

  • Online “dashboards” as well as monthly “manual” reports included with invoices.
  • Dashboard utilization metrics covering what is normally measured and reported: VM CPU utilization, memory and disk usage, network and disk throughput, and uptime.
  • Billing history, with all reports and metering broken down per customer and department.
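The per-customer, per-department breakdown mentioned above amounts to a simple roll-up of metered usage.  This is a minimal sketch; the usage-row layout and unit prices are hypothetical, and real providers meter many more dimensions.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical unit prices; real rates come from the provider's price list.
RATES = {"cpu_hour": 0.05, "storage_gb_month": 0.10}

# (customer, department, vm, cpu_hours, storage_gb_month)
UsageRow = Tuple[str, str, str, float, float]


def monthly_report(rows: Iterable[UsageRow]) -> Dict[Tuple[str, str], Dict[str, float]]:
    """Roll metered usage up per (customer, department) for dashboards and invoices."""
    totals: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(
        lambda: {"cpu_hours": 0.0, "storage_gb_month": 0.0, "charge": 0.0})
    for customer, department, _vm, cpu_hours, storage in rows:
        t = totals[(customer, department)]
        t["cpu_hours"] += cpu_hours
        t["storage_gb_month"] += storage
        t["charge"] += (cpu_hours * RATES["cpu_hour"]
                        + storage * RATES["storage_gb_month"])
    return dict(totals)


report = monthly_report([
    ("agency-x", "finance", "vm-1", 720.0, 100.0),
    ("agency-x", "finance", "vm-2", 360.0, 50.0),
    ("agency-x", "hr", "vm-3", 720.0, 100.0),
])
```

The same totals can feed both the online dashboard and the monthly report included with invoices.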

Availability Management

Availability statistics should be included in the cloud management portal for customer visibility.  See also Service Level Management.
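The availability figure shown in the portal is commonly computed with the ITIL-style formula (agreed service time minus downtime) divided by agreed service time, times 100.  A minimal sketch, assuming planned maintenance windows have already been excluded from the agreed service time; whether they count as downtime depends on the SLA.

```python
def availability_pct(agreed_minutes: float, downtime_minutes: float) -> float:
    """Availability = (agreed service time - downtime) / agreed service time x 100."""
    if agreed_minutes <= 0:
        raise ValueError("agreed service time must be positive")
    if not 0 <= downtime_minutes <= agreed_minutes:
        raise ValueError("downtime must be between 0 and the agreed service time")
    return (agreed_minutes - downtime_minutes) / agreed_minutes * 100


# A 30-day month is 43,200 minutes; 43.2 minutes of unplanned downtime
# works out to roughly 99.9% availability.
month = 30 * 24 * 60
monthly = availability_pct(month, 43.2)
```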

Capacity Management

Constant monitoring of the cloud compute servers and storage systems is required.  Since ordering and provisioning happen automatically 24×7, the system can easily run out of available physical servers or storage, causing new orders to fail during provisioning.  Purchasing, installing, configuring, and certifying new equipment takes lead time, so monitoring and establishing alert thresholds is critical; the cloud provider needs sufficient warning to add capacity.  The cloud provider could over-purchase capacity that sits idle until used, but that capacity costs money to procure, power, and cool, and those costs get passed on to customers.  It is far preferable to keep a reasonable amount of extra capacity, with rapid replenishment plans in place.
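One way to turn the lead-time argument into an alert threshold is to compare the remaining runway with the procurement cycle.  This is a minimal sketch assuming roughly linear growth; the 14-day safety margin and the example figures are hypothetical, and real demand is often lumpier than a straight line.

```python
import math


def days_until_exhausted(capacity: float, used: float, daily_growth: float) -> float:
    """Days until capacity runs out, assuming roughly linear growth."""
    if daily_growth <= 0:
        return math.inf
    return max(capacity - used, 0) / daily_growth


def should_order_capacity(capacity: float, used: float, daily_growth: float,
                          lead_time_days: float, safety_days: float = 14) -> bool:
    """Alert when remaining runway is shorter than procurement lead time plus a margin."""
    runway = days_until_exhausted(capacity, used, daily_growth)
    return runway <= lead_time_days + safety_days


# Example: 200 TB usable, 150 TB allocated, growing about 0.5 TB/day gives
# 100 days of runway; with a 90-day purchase/install/certify cycle plus a
# 14-day margin, it is already time to order.
print(should_order_capacity(200, 150, 0.5, lead_time_days=90))  # True
```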

  • It should be noted that several sizes of VMs are available to the customer; therefore, the more often a “large” or “extra-large” VM is ordered, the more CPU, memory, and disk are allocated.  This means fewer VMs will fit on a physical blade server, and the cloud provider will need to add additional capacity sooner.
  • Note that capacity management must account for the following technologies deployed in the cloud environment, which affect the methods and calculations used for capacity planning:
    • All cloud compute physical servers normally run a hypervisor product such as VMware or Microsoft Hyper-V.  These servers and their VMs boot from a storage area network (SAN) and have no local hard drives.
    • Thin provisioning is commonly used throughout the SAN, thus you need to carefully calculate actual disk usage versus what has been sold and what is remaining in capacity.
    • Thin provisioning free-space reclamation may be a scheduled process rather than an automatic one.  Automatic reclamation is preferable, but not all SAN systems support it.
    • If over-subscription of CPU or memory was calculated within the hypervisor configuration, monitoring of system performance and capacity is even more critical.
    • Usable capacity on the SAN does not include the additional space needed to hold daily backups or snapshots, so the actual capacity required will be 25 to 50% higher.
    • Consider having the SAN supplier provide a “utility storage” agreement, whereby they stage additional SAN capacity at the cloud provider data centers but do not charge them until it is utilized.  This shares the costs and risk of managing extra storage capacity between the cloud provider and their SAN vendor.
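The considerations above can be combined into a rough capacity calculation: how many VMs of a given size fit on one blade (with optional CPU or memory over-subscription), and how thin-provisioned storage that has been sold compares with data actually written plus backup and snapshot overhead.  This is a minimal sketch; the VM sizes, blade specifications, and default 25% overhead (the low end of the range above) are assumptions, and it ignores hypervisor memory overhead.

```python
import math

# Hypothetical catalog VM sizes: (vCPUs, memory GB, disk GB).
VM_SIZES = {
    "small": (1, 2, 50),
    "medium": (2, 4, 100),
    "large": (4, 8, 200),
    "xlarge": (8, 16, 400),
}


def vms_per_blade(size: str, blade_cores: int, blade_mem_gb: int,
                  cpu_ratio: float = 1.0, mem_ratio: float = 1.0) -> int:
    """VMs of one size that fit on a blade; ratios > 1.0 mean over-subscription."""
    vcpus, mem_gb, _disk_gb = VM_SIZES[size]
    by_cpu = math.floor(blade_cores * cpu_ratio / vcpus)
    by_mem = math.floor(blade_mem_gb * mem_ratio / mem_gb)
    return min(by_cpu, by_mem)  # whichever resource runs out first


def storage_position(raw_usable_tb: float, provisioned_tb: float,
                     written_tb: float, protection_overhead: float = 0.25) -> dict:
    """Compare thin-provisioned (sold) storage with actual consumption.

    protection_overhead is the extra fraction needed for backups and snapshots.
    """
    needed_tb = written_tb * (1 + protection_overhead)
    return {
        "oversubscription": provisioned_tb / raw_usable_tb,
        "needed_tb": needed_tb,
        "headroom_tb": raw_usable_tb - needed_tb,
    }


# A 32-core, 256 GB blade holds only 4 "xlarge" VMs without over-subscription.
xlarge_per_blade = vms_per_blade("xlarge", 32, 256)
pos = storage_position(raw_usable_tb=100, provisioned_tb=150, written_tb=60)
```

Taking the minimum across CPU and memory reflects that a blade is full as soon as either resource is exhausted, which is why a rising share of large VM orders pulls capacity purchases forward.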

IT Service Continuity

Most cloud services are deployed with multiple server farms and data centers to provide continuous operations, even during maintenance or disasters.  If you are hosting your own cloud service, there are numerous technologies and products that can facilitate the load balancing, failover, and continuity functions.

Financial Management

Depending on the cloud provider and how billing occurs, customers may need to adapt the way they procure the services, amortize IT assets, and manage their budgets.  Customers may be able to use ongoing operational funding instead of capital funding to procure cloud services, as was earlier discussed.

Customers may desire to establish pools of funding, or purchase orders, so that individual cloud service orders (called subscriptions) are charged against this pool of money.  To avoid the finance or procurement department being involved in every micro-transaction and subscription, these pools of money have proven to be a more acceptable financial management technique.
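A funding pool behaves like a pre-approved balance that each subscription charge draws down.  This is a minimal sketch; the purchase order number and subscription identifiers are hypothetical, and amounts are kept in integer cents to avoid floating-point rounding.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class FundingPool:
    """A pre-approved pool of money (for example, a purchase order) drawn down by subscriptions."""
    po_number: str                 # hypothetical purchase order identifier
    balance_cents: int
    charges: List[Tuple[str, int]] = field(default_factory=list)

    def charge(self, subscription_id: str, amount_cents: int) -> bool:
        if amount_cents < 0:
            raise ValueError("charges must be non-negative")
        if amount_cents > self.balance_cents:
            # Insufficient funds: finance replenishes the pool instead of
            # approving individual subscriptions.
            return False
        self.balance_cents -= amount_cents
        self.charges.append((subscription_id, amount_cents))
        return True


pool = FundingPool("PO-0001", balance_cents=100_000)
pool.charge("sub-vm-small-17", 5_000)
```

Finance only touches the pool when it is created or replenished, not on every subscription.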

Security Management

Security management will be significantly involved in certifying a private cloud offering.  VM templates or Common Operating Environments (COEs) will need to be pre-certified by security teams, so users can order services at any time and have the automated cloud management system launch everything immediately.  This pre-certification is often the most significant change to the way organizations operate today, but it is critical to letting cloud automation save time and money for the end customer.

Security will also be involved with any networking change controls and any custom COEs created or requested.  Internal network changes between VMs in the cloud environment also need security approval, unless the network settings are part of a pre-approved COE, in which case security has already approved them.

Monitoring and scanning of all physical servers and customer virtual machines must be continuously performed; data scanning of all new VMs to safeguard against sensitive data loss may also be necessary.  The key to success here is to use the cloud management system to automatically add new compute devices to the monitoring systems, so security personnel are immediately aware of new systems, and the monitoring can begin immediately.
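The key step, having the cloud management system enroll each new machine in security monitoring automatically, can be sketched as a provisioning hook.  This is a minimal sketch assuming the cloud management system can invoke a callback after each VM is provisioned; the registry class and notification function are hypothetical stand-ins for real monitoring and scanning tools.

```python
from typing import Callable, List, Set


class MonitoringRegistry:
    """Hypothetical stand-in for the security monitoring and scanning systems."""

    def __init__(self) -> None:
        self.monitored: Set[str] = set()
        self.scan_queue: List[str] = []


def make_provisioning_hook(registry: MonitoringRegistry,
                           notify: Callable[[str], None]) -> Callable[[str, bool], None]:
    """Build a callback the cloud management system could invoke after each VM is provisioned."""
    def on_vm_provisioned(hostname: str, needs_data_scan: bool) -> None:
        registry.monitored.add(hostname)          # monitoring starts immediately
        if needs_data_scan:
            registry.scan_queue.append(hostname)  # queue a sensitive-data scan
        notify(f"New system {hostname} added to security monitoring")
    return on_vm_provisioned


messages: List[str] = []
registry = MonitoringRegistry()
hook = make_provisioning_hook(registry, messages.append)
hook("vm-042", True)
```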

Technical Support

For Dev/Test services, no developer-level support is available from most cloud providers.  All cloud compute VMs are managed up to the OS level, with customers managing only their application development.

The cloud provider technical support staff will need to become familiar with hypervisor and cloud management systems in order to conduct normal operations, troubleshooting, patching, and upgrades.
