Nick Walter, Principal Architect
Bandwidth, Bandwidth, Bandwidth
In my role with House of Brick, I have the opportunity to assist many of our clients with migrating their business-critical databases to the public cloud. While every client (and every migration plan) is unique, I’ve observed a trend of common mistakes or misconceptions about the process of migrating to the cloud that occur again and again among my clients. Lately, I’ve taken to reviewing these common pitfalls right at the beginning of migration planning, which has been successful in preventing my clients from experiencing issues that slow down or derail cloud migration plans. I thought it would be negligent not to share all these issues with readers of our House of Brick blog, so today I present the second blog in my series on common database cloud migration pitfalls.
In part two, we focus on the topic of bandwidth, specifically bandwidth between on-premises data centers and public cloud platforms. I’ve seen clients procure such bandwidth in a variety of ways, including IPsec VPN tunnels, AWS Direct Connect, and Azure ExpressRoute. Regardless of the means of cloud connection, I have yet to see a client properly forecast the necessary bandwidth for a cloud migration. Everyone underestimates the bandwidth needed, and my goal today is to explain how and why, so others can avoid making a similar mistake.
Data Copy Bandwidth
To explain why so many organizations make mistakes when estimating their bandwidth needs, let’s first analyze what bandwidth gets used for in an organization undertaking a cloud migration. The first, and most obvious, bandwidth need for a cloud connection is data copy bandwidth. During a cloud migration, large amounts of data must be copied from an on-premises data center into the cloud. This includes entire virtual machines or databases, as well as supporting data sets for cloud file storage. Many IT organizations think of this bandwidth need, and only of this need, during public cloud migration planning. They then perform a simplistic calculation: the data footprint of the servers to be migrated, divided by the desired migration window, yields a needed bandwidth rate. This is a valid calculation if the cloud connection is limited to just a one-time migration of data. But it never is.
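To make that simplistic calculation concrete, here is a minimal Python sketch of it. The function name, the 50 TB footprint, the 30-day window, and the optional efficiency factor are all illustrative assumptions of mine, not figures from any real migration.

```python
def required_mbps(data_tb: float, window_days: float, efficiency: float = 1.0) -> float:
    """Sustained rate in megabits per second needed to copy data_tb
    (decimal terabytes) within window_days.

    efficiency is a hypothetical fudge factor (0 < efficiency <= 1) for
    protocol overhead and idle time; 1.0 reproduces the naive calculation.
    """
    if data_tb < 0 or window_days <= 0 or not 0 < efficiency <= 1:
        raise ValueError("invalid input")
    bits = data_tb * 1e12 * 8            # terabytes -> bytes -> bits
    seconds = window_days * 24 * 3600    # days -> seconds
    return bits / seconds / efficiency / 1e6


# Hypothetical example: 50 TB of servers and databases, 30-day window.
naive = required_mbps(50, 30)
padded = required_mbps(50, 30, efficiency=0.7)
print(f"naive: {naive:.0f} Mbps, with 70% link efficiency: {padded:.0f} Mbps")
```

Note that this number covers only the one-time copy. It says nothing about any other traffic that will share the same connection.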
Application Interoperation Traffic
The second bandwidth need, which only a minority of shops anticipate, is application interoperation traffic. Application stacks that have been migrated to the cloud will often need to perform transactions or requests against application stacks that are still on-premises, and vice versa. Sometimes this is only an interim state, when some application stacks have been migrated but others are awaiting their turn. In other cases, it is a permanent state of affairs, with some stacks migrated to the cloud and others destined to remain on-premises. Regardless, this is a traffic source that few IT organizations account for in their forecasts. Yet it’s vital traffic to protect. In many cases, applications trying to interoperate across a cloud connection that is already nearly choked with migration traffic interpret the unsurprising slowness of their communication as a timeout or failure, and thus cause entire application stacks to fail in a less-than-graceful fashion.
Operational Tools
The third, and often entirely overlooked, source of cloud connection bandwidth consumption is what I refer to as operational tools. These are applications loaded onto virtual servers in the cloud for operational purposes, which don’t directly tie into the main application purpose. In this category, I include things like virus scanners, backup tools, remote connectivity tools, log centralizing tools, and security and compliance scanners. Many of these tools want to reach out to a centralized master service and upload data or status information. The bandwidth needs of these tools vary, but in some cases, like those of backup tools or log centralizers, they can be very high indeed. Thus, an IT organization that has already accounted for the other bandwidth needs I’ve outlined may still find its cloud connections choked and its application stacks reporting perceived failures when these operational tool bandwidth needs hit after the first wave of application migrations to the cloud.
Outbound Internet Traffic
The final, and often most unanticipated, source of bandwidth usage on cloud connections is outbound Internet traffic. For reasons of security and compliance, many IT organizations put policies and network designs in place that forbid their public cloud-based application stacks from reaching out directly to the public Internet. Instead, they prefer that all Internet-bound network requests from cloud resources first flow back across their cloud connection to their on-premises network, and then flow outbound from there. From a security perspective, this is a great way to ensure that applications in the cloud are subject to the same network and security controls as on-premises applications. From a cloud connection perspective, however, this is a great way to ensure that bandwidth is exhausted and the cloud migration is perceived to be riddled with issues and failures. This can be a very large source of bandwidth consumption at unpredictable times, as server or application administrators can innocently attempt to download operating system or application updates on cloud-based systems without realizing that outbound Internet traffic is vying for the same bandwidth as all the other purposes I’ve already outlined.
In the end, an IT organization that takes all of these factors into account can adequately forecast and provision bandwidth to ensure that a cloud migration is not only successful, but also perceived as smooth and problem-free by the rest of the organization. I strongly advocate that all organizations looking into large-scale cloud migrations not only consider their ongoing bandwidth needs, but also allocate extra bandwidth on a short-term basis to meet the needs of the migration itself. Splitting the regular day-to-day traffic of application interoperation and operational tools, logically or physically, onto a different network path than the one used for bulk migration of data or servers will lead to a much better experience overall.
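As a rough illustration of forecasting with all four traffic sources in mind, the sketch below sums estimated peak demand separately for the bulk migration path and the day-to-day path, then pads each with headroom. The class and function names, the Mbps figures, and the 25% headroom are hypothetical placeholders; real estimates have to come from measuring your own environment. Summing peaks is deliberately conservative, since it assumes every source can peak at the same time.

```python
from dataclasses import dataclass


@dataclass
class TrafficSource:
    name: str
    peak_mbps: float      # estimated peak demand, megabits per second
    bulk: bool = False    # True for one-time migration copy traffic


def forecast_mbps(sources, headroom=0.25):
    """Sum peak demand per network path and pad it by a headroom fraction."""
    totals = {"bulk": 0.0, "operational": 0.0}
    for source in sources:
        path = "bulk" if source.bulk else "operational"
        totals[path] += source.peak_mbps
    padded = {path: mbps * (1 + headroom) for path, mbps in totals.items()}
    # If both paths share one physical link, it must carry the combined load.
    padded["combined"] = padded["bulk"] + padded["operational"]
    return padded


# Placeholder estimates for the four sources discussed in this post.
sources = [
    TrafficSource("data copy", 200.0, bulk=True),
    TrafficSource("application interoperation", 50.0),
    TrafficSource("operational tools", 80.0),
    TrafficSource("outbound Internet", 70.0),
]

result = forecast_mbps(sources)
for path, mbps in result.items():
    print(f"{path}: {mbps:.0f} Mbps")
```

Keeping the bulk figure separate makes it easier to size short-term migration capacity without shortchanging the traffic that remains after the migration is done.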