A shot at solving the world of cloud infrastructure and integrations

A customary TL;DR

Integration and provisioning of infrastructure takes time. Doing it well takes a ton of it. It is usually repetitive work, done in a non-optimal manner, with an end result that’s probably the umpteenth duplicate of the same exact structure. We’re working on a platform to integrate anything with an API, employ best practices, and generate duplicatable and modular environments that are secure and fast. It empowers developers by putting power back in their hands. It will allow ops engineers to go back to their zone of innovation and give everyone the capabilities of simple yet high-quality infrastructure in the palm of their hand. Ambitious? It is. But so are we.

Problem

Modern software companies usually build their own infrastructure from scratch. Those responsible are referred to as “DevOps” / “SRE” / “Operations” engineers. Servers, networking, security and the entire system architecture are usually planned and executed with the resources the company has, usually by the aforementioned personnel. The process does not end there, though: this is the point where infrastructure deployment ends and where integration and maintenance begin. Integration is always manual and specific, and 9 times out of 10 it is not documented (let alone automated) as code.

In recent years, a growing number of organizations discovered the cloud - a theoretically infinite pool of resources and services, ready to be dispatched and scaled in any direction or extent at any time. Engineers handling these resources are usually doing repetitive work, which has probably been done before by one of their peers, and most certainly by someone else in the world. Here’s a blameless accusation: more often than not, infrastructure work does not meet a perfect quality standard, for numerous reasons. Some of these are multitasking, a heavy workload, and unfortunately sometimes just a plain lack of knowledge, which translates to a lower-quality product. As mentioned, there’s no one to blame. When engineers are assigned more responsibilities than one can handle (and ops engineers usually are), quality tends to decrease.

Whether the gaps in domain knowledge, the workload or the unstructured plans are caused by a shortage of personnel or by educational reasons will not be discussed here (but is left as food for thought). The sad fact is that they exist, and they hurt us all. We’re attempting to remedy what we find to be a major factor.

Existing solutions / Methods of operations

A word of warning: this section is a discussion and a long breakdown of concepts we believe are a crucial prerequisite to discussing the bigger matter. You’re welcome to skip it, but it would be smart to at least consider the 4-point list below before moving on.

The ways of leveraging the cloud can be divided into a few general segments, listed here and followed by a breakdown of their implications and future projections. This is a good place to note that the integration part of the process is completely absent in the first three and rarely ever seen in the fourth:

100% manual work, which usually means a user interacting with a web console
Manually using cloud-internal platforms that wrap other services and allow automating the provisioning of infrastructure resources and configurations, e.g. AWS’s Elastic Beanstalk
Using external tools to provision and arrange the cloud infrastructure or parts of it, e.g. Rancher for deploying Kubernetes clusters
Following the concept of Infrastructure as Code: leveraging a tool like HashiCorp’s Terraform or AWS’s CloudFormation to create, maintain and manage infrastructure resources as code in a VCS, which brings advantages of its own, i.e. version control, change management, gated deployments, code reviews, and the major by-product: quality

Let’s break these down into pros and cons

1. Manual work

As seeing creatures, we human beings tend to always prefer a visual interface when interacting with systems. It’s no accident that this is the first place we land and play with when exploring a new tool or platform. The public cloud world is no exception, but while manual mode seems like the quickest way to get X done, it’s also very dirty.

QUICK AND DIRTY MEME

Manual work on cloud platforms:

Is good for learning and playing around
Can apply urgent changes relatively quickly
Is hard to repeat / clone / reproduce
Is ephemeral
Cannot be backed up
Is not versioned and cannot live in a version control system (unlike code of any sort)

One of the basic principles of the well-architected framework is handling disaster recovery, which cannot be done without applying basic automation to the creation of an organization’s infrastructure. Taking it a step further and applying Chaos Engineering principles also explains the need to store infrastructure as code (even if the concept often feels far-fetched).

Another great feature that cannot be used when working manually is the ability to clone infrastructure. It is a simple yet powerful ability that allows creating environment stages, i.e. production / staging / development / UAT and so on, in seconds. It’s even more prominent when developers require separate environments for development and testing. When companies grow, the number of services and engineers developing them grows exponentially. To harness that growth and translate it into speed and quality, while obviously preventing chaos in the process, one must be able to clone and deploy parts of the infrastructure in minutes. Just consider the queue of engineers waiting for an environment to clear in order to test a new feature. Even if the waiting time is used for other features or relevant affairs, quality drops immediately when contexts are switched, and with it the speed and efficiency that translate directly to $$$.
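To make the idea of cloning concrete, here is a minimal sketch in Python. It assumes an environment can be boiled down to a tiny description; the `Environment` class, its fields and the `clone_for_stages` helper are all hypothetical names made up for this illustration, not part of any real tool. Each clone keeps the original settings but receives its own non-overlapping address range.

```python
from dataclasses import dataclass, replace
import ipaddress


@dataclass(frozen=True)
class Environment:
    """A hypothetical, heavily simplified description of one environment."""
    stage: str
    cidr: str
    instance_count: int


def clone_for_stages(base, stages, supernet="10.0.0.0/8", prefix=16):
    """Clone `base` once per stage, giving each clone its own address range."""
    # Carve the supernet into equally sized blocks, one per stage.
    blocks = ipaddress.ip_network(supernet).subnets(new_prefix=prefix)
    return [
        replace(base, stage=stage, cidr=str(block))
        for stage, block in zip(stages, blocks)
    ]


production = Environment(stage="production", cidr="10.0.0.0/16", instance_count=3)
environments = clone_for_stages(
    production, ["production", "staging", "development", "uat"]
)
for env in environments:
    print(env)
```

Once the description lives in code, adding a fifth stage or a per-developer sandbox is one more entry in a list rather than another afternoon in a web console.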

Lastly, integration of services, such as hooking up a monitoring system or a new APM service, will always require repetitive manual work. In the best (and rare) case it would be part of an automated script running API calls, which would still have to be triggered manually every time it’s required. Writing the script, by the way, is a repetitive task of its own, something that was probably done somewhere else in the world, and chances are it was already done better.

2. Using automated architecture frameworks

One of the widely used solutions for the problems described in the passage above is the use of tools like Elastic Beanstalk, a framework that generates most of the infrastructure required to deploy a stable application, in a relatively simple flow. It will handle monitoring, scaling, security, deployment, uptime and some other nifty features which would otherwise take days to implement.

While the idea and implementation are great, and Elastic Beanstalk and its rivals are widely used and adopted, they do not enforce the use of infrastructure as code (one could argue they encourage the opposite), and they only take the user from a specific point onward, leaving a lot of preparation work to be done first. One example is setting up a well-architected VPC, which in order to be deployed well needs networking that is correctly configured, secured and thoughtfully designed. Integration is a big part of such a service, with automated monitoring and scaling implemented by default, yet it only allows a few basic integrations, cannot be extended, and, as mentioned, usually involves a lot of manual preparation.

3. Using external tools for managing and deploying architecture

Hard-to-deploy infrastructure is a problem many companies and open source projects are trying to solve. Take Kubernetes for example, which is probably the hottest buzzword in the container world today, but holds a completely opposite reputation when it comes to deploying and managing clusters. The range of deployment tools and managers, which still don’t provide a good enough solution to fit all, created a vacuum in which cloud providers launched their own SaaS versions of the famous orchestrator, i.e. AWS’s EKS, Google’s GKE and Microsoft’s AKS.

Another good example is Apache Kafka - one of the world’s most used data streaming solutions, which is considered a huge pain to deploy and manage. It’s no wonder its creators launched their own SaaS and are making a fortune selling it as a product.

While the names above are also examples of brilliant solutions, they fail to properly address the process of deployment. The surrounding infrastructure and its integration are not addressed at all. Deploying applications together with their surrounding infrastructure is not an option, not to mention the ability to hook them up to one another or to external services in the same step. As always, provisioning is the first part, followed by a usually longer phase of manual integrations.

4. Writing infrastructure as code (IAC)

The optimal solution for handling disaster recovery and for cloning and redeploying infrastructure as modular bits is writing it as templated code. It is implied, but should be mentioned, that the code would be managed in Git (or any VCS), adding version tracking, rollbacks, patches and all the good stuff we get when managing any kind of source code.

And yet, IAC is painful:

Code needs to be written, and that’s no easy task
It’s usually developed in a custom way that may allow cloning, but usually not into a completely different environment such as another cloud account, let alone a different company.
Since the work is being done internally, it usually won’t meet the perfect standards intended by the developer, e.g. deploying a templated AWS VPC doesn’t mean it uses secure private subnets or separated CIDRs with proper Access Control Lists. Architecture design matters down to the smallest of details, and we all fail to follow through on them fully when it’s not our focus or priority. That’s a disappointing fact.
Services cannot be integrated amongst themselves. Well, to be honest, they can, by customizing the platform in use. When was the last time anyone utilized the power of CloudFormation “Custom Resources”? Or created their own Terraform Provider? Statistically, lim --> 0.
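The VPC point above is the kind of detail that is easy to miss by eye and cheap to check in code. The following is only a sketch in Python using the standard `ipaddress` module; the `vpc_issues` function and the layout format (a subnet name mapped to a CIDR and a public/private flag) are assumptions invented for this example, and it covers only subnet ranges and privacy, not Access Control Lists.

```python
import ipaddress
from itertools import combinations


def vpc_issues(vpc_cidr, subnets):
    """Return a list of human-readable problems found in a VPC layout.

    `subnets` maps a subnet name to a (cidr, is_public) tuple.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    nets = {name: ipaddress.ip_network(cidr) for name, (cidr, _) in subnets.items()}
    issues = []
    # Every subnet must live inside the VPC's own range.
    for name, net in nets.items():
        if not net.subnet_of(vpc):
            issues.append(f"{name} ({net}) is outside the VPC range {vpc}")
    # No two subnets may share addresses.
    for (a, net_a), (b, net_b) in combinations(nets.items(), 2):
        if net_a.overlaps(net_b):
            issues.append(f"{a} ({net_a}) overlaps {b} ({net_b})")
    # At least one subnet should be private.
    if all(is_public for _, is_public in subnets.values()):
        issues.append("no private subnet is defined")
    return issues


layout = {
    "web": ("10.0.0.0/24", True),
    "app": ("10.0.0.128/25", True),
    "db": ("10.1.0.0/24", True),
}
for issue in vpc_issues("10.0.0.0/16", layout):
    print(issue)
```

A template can be perfectly deployable and still fail all three checks, which is exactly the gap between "templated" and "well designed".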

“Bonus”: Homegrown tools

While the vast majority of organizations fall into one of the above categories, a few try to overcome the problem by developing their own tools and platforms. Without digging too deep into the obvious reasons against sidetracking into something that’s not the team’s core business, it’s important to note that homegrown tools suffer from trust issues. [Rude generalization alert]: Developers tend to disrespect and quickly lose interest in infrastructure platforms developed internally. That’s not always the case, but in our 8 years of experience as a DevOps consultancy we’ve yet to see a full-blown platform developed internally that’s trusted and used with pleasure across all departments. Moreover, in our experience, 95% of infrastructure-related homegrown solutions were actually in the process of deprecation. Users cannot simply be judged as dumb; their instincts of trust represent the bigger picture: building infrastructure is not their company’s business.

Yeah, ok, the world sucks, everything is bad and quality is down the drain. Leave us the f alone.

Well, yes, but also, no: there’s pain, but it is curable.

How awesome would it be if there were a magic wand that

Doesn’t require code yet generates duplicatable infrastructure
Doesn’t require knowing best practices but implements them by default
Doesn’t require describing every bit of the infrastructure but provides it in full
Lets users draw their environment with a simple drag and drop visual interface, having the system do the connections for them
Allows full integration of every part of the end result, deployable with the same button that instantiates and provisions the entire thing

Consider drawing your own environment in just a few minutes

Adding monitoring, log collection, data metrics, container orchestration together with your application
Hitting a button and getting your deployed app running in the cloud, monitored, collected and visualized, integrated with whichever resources you wished and added to the map
All in no more than a few minutes of work.

Sounds impossible, huh? Well, we used to think so too.

Meet the impossible Devek.

For years as DevOps consultants, we’ve committed every sin I described. We were caught not reusing our own code, sometimes not using code at all. We too sometimes neglected best practices for speed and what we thought was efficiency when wearing our developer’s costume. All the while, we advocated theory and perfectly designed architectures when playing the role of consultants.

The hour has come to put an end to this madness.

We’ve designed a platform that we personally needed, one that would allow us to reuse our own deployed code while perfecting it with every new deployment. We wanted to rapidly drag and drop our infrastructure while separating the dirty from the quick. But above all, we implemented integration design abilities; we wanted to be able to ship our infrastructure and make every bit of it play with everything else right from the start. No more separate deployments, phases of configuration or repetitive work of connecting services to one another!

Now we (and the world) can ship an application, with its surrounding infrastructure and relevant tooling, having them all fully operational right from step one. In simple words, we translated our knowledge, mixed it up with the pains we’ve been experiencing for years, into one perfectly fabricated medication, and we want you to enjoy it too.

How

In order to employ our theory, Devek doesn’t only design and ship infrastructure, it integrates every bit of it through tailored API communication. For example, a design can include an AWS account, mixed with best-practice secured networking, together with a Kubernetes cluster already running the user’s applications as pods. We can add monitoring with Prometheus, logging with the Elastic stack, some error tracking with Sentry.io, and alerting with PagerDuty. Working through a simple design makes sure the user has provided the minimum details necessary to make all of the above play together right from the start.
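To illustrate what "the minimum details necessary" might look like, here is a purely hypothetical sketch in Python. It does not show Devek’s actual data model or API; the component names, the `REQUIRED_DETAILS` catalogue and the `missing_details` function are all invented for this example. The idea is simply that a design is a set of components, and each component declares the details it needs before it can be wired to the rest.

```python
# Hypothetical catalogue: the details each component needs before it can be
# wired to the others. None of these names come from Devek itself.
REQUIRED_DETAILS = {
    "aws_account": {"account_id", "region"},
    "kubernetes": {"cluster_name"},
    "prometheus": {"scrape_target"},
    "pagerduty": {"service_key"},
}


def missing_details(design):
    """Map each component in `design` to the required details it still lacks."""
    gaps = {}
    for component, provided in design.items():
        required = REQUIRED_DETAILS.get(component, set())
        missing = required - set(provided)
        if missing:
            gaps[component] = sorted(missing)
    return gaps


design = {
    "aws_account": {"account_id": "123456789012", "region": "eu-west-1"},
    "kubernetes": {"cluster_name": "apps"},
    "prometheus": {"scrape_target": "kubernetes"},
    "pagerduty": {},
}
print(missing_details(design))
```

In this made-up design only the alerting component is incomplete, so it is the only thing the user would be asked about before hitting deploy.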

We can extend Devek to integrate and flawlessly connect any API-oriented service on the internet. This means you can deploy your server on Google Cloud, consuming a database from Compose.io, sending metrics to New Relic, and using AWS’s CDN service CloudFront on top. With Devek, the sky is the limit, and the matrix of abilities is endless.

Join us on our path to revolutionize the world of integration. Help us put the power back in developers’ hands.

Help us make the internet great again!

Devek is in alpha. Contact me for early open access to the system to try it out.
