(This post originally appeared in The New Stack on June 8th of last year. We’re reprising it here because it is as relevant now as it was then, if not more so.)
Building the infrastructure to support cloud computing now accounts for more than a third of all IT spending worldwide, according to research from IDC. With spending anticipated to reach nearly $500 billion in 2023, organizations need to be mindful that all this investment doesn’t lead to redundant and disparate efforts that are impossible to govern. Many organizations are only now beginning to experience the challenges of running container orchestration platforms in production. We’re going to walk through these challenges, explaining why they happen and how developers can simplify their processes to avoid failure in production on Day 2.
The Complexity of Managing Containers in Production
As organizations continue to scale and shift their operations to a hybrid mix of on-prem, cloud, and edge infrastructure, the rapid deployment of Kubernetes clusters and workloads is creating a new challenge. While some teams are building these clusters on a standard distribution, other teams are building their own Kubernetes stack and management tools. This often results in dozens of clusters that are deployed and managed independently throughout the organization with very little uniformity, making for a complex DevOps landscape, increased maintenance costs and lost business opportunities.
Organizations are also facing challenges when scaling their cloud architectures. With multiple bespoke Kubernetes stacks comes the management of the various point solutions required to handle security, observability, upgrades and other Day 2 operations tasks. This complexity has a tangible cost to organizations, as time, resources and money are poured into redundant operations and maintenance efforts. Aside from the initial spending on container platforms, tooling and additional services, the overhead needed to maintain multiple stacks built from loosely coupled open source components increases significantly, as a typical production stack consists of over a dozen components. Each component has its own release schedule, and compatibility issues are common when new versions are released. Because many organizations lack the ability to upgrade their homegrown solutions automatically without disrupting workloads, they avoid upgrading, which increases their exposure to security issues and bugs in outdated versions.
The Need for Central Governance
With no central governance across the organization, DevOps teams are spread thin. Enterprises without centralized governance or visibility across the clusters deployed throughout the organization simply do not have the resources for effective management. Within the stack, compliance, regulatory and IP requirements already dictate where application resources can be used, and meeting them consumes much-needed support and time. As a result, security operations teams, for example, are unable to ensure proper versioning for vulnerability management. Additionally, the lack of a standardized set of observability tools across the organization makes support difficult, as it takes longer to diagnose problems within the clusters. Organizations need to centrally govern these clusters and their associated workloads to ensure consistency, security and performance, and to enforce proper configuration and policy management across the entire footprint.
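To make the idea of a central baseline concrete, here is a minimal sketch in Python of the kind of check a governance process might run against a cluster inventory. Everything in it is an assumption for illustration: the `Cluster` record, the `audit` function, the cluster names, the version baseline and the tool names are all hypothetical, and a real inventory would come from whatever management platform the organization uses rather than a hard-coded list.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical inventory record for one Kubernetes cluster."""
    name: str
    kubernetes_version: str  # e.g. "1.27.4" or "v1.27.4"
    observability_tools: set = field(default_factory=set)


def parse_minor(version: str) -> tuple:
    """Return (major, minor) from a version string such as 'v1.27.4'."""
    major, minor, *_ = version.lstrip("v").split(".")
    return int(major), int(minor)


def audit(clusters, minimum_version, required_tools):
    """Map each non-compliant cluster name to a list of findings."""
    floor = parse_minor(minimum_version)
    findings = {}
    for cluster in clusters:
        issues = []
        # Flag clusters running below the organization-wide version baseline.
        if parse_minor(cluster.kubernetes_version) < floor:
            issues.append(
                f"Kubernetes {cluster.kubernetes_version} is below "
                f"the {minimum_version} baseline"
            )
        # Flag clusters missing any tool from the standard observability set.
        missing = sorted(required_tools - cluster.observability_tools)
        if missing:
            issues.append("missing observability tools: " + ", ".join(missing))
        if issues:
            findings[cluster.name] = issues
    return findings


# Illustrative inventory; names, versions and tools are made up.
inventory = [
    Cluster("payments-prod", "v1.27.4", {"prometheus", "fluentd"}),
    Cluster("edge-retail", "1.24.9", {"prometheus"}),
    Cluster("analytics-dev", "1.28.1", set()),
]

findings = audit(inventory, "1.26", {"prometheus", "fluentd"})
for name, issues in findings.items():
    print(name, "->", "; ".join(issues))
```

The point of the sketch is only that a single, shared definition of "compliant" can be applied uniformly to every cluster, which is hard to do when each team maintains its own stack and tooling.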
The Need to Focus on Day 2 Operations
We are seeing organizations struggle to deploy and manage their Kubernetes clusters due to the increasing level of oversight required and the current lack of attention during the planning phase. Day 2 operations can be a “sink or swim” time for these organizations. Without effective Day 2 operations, organizations will face challenges scaling their IT environment and will not be ready to handle new threats to security and availability.
Cloud native development is an agile process that, when implemented correctly, accelerates innovation within an organization and creates a competitive advantage. Its agile nature means that changes are made frequently to applications and the underlying infrastructure, often multiple times per day. Cloud native applications are also designed to scale elastically with demand, which results in even more infrastructure changes. Organizations need to ensure that their Day 2 operations strategy and the software they use to implement it are able to support this constant change.
While containers are bringing a wide variety of benefits to your business, they are also introducing new complexities that need to be properly navigated. As complexity within cloud native environments and container strategy increases, so does the need for continuous oversight, organization, and streamlined management. From the number of clusters that need to be managed, to proactively preparing for “Day 2” success, organizations can alleviate the challenges of managing their open source stacks by creating a strong DevOps strategy that simplifies the process.