Please forgive me, dear readers, as this is an article that needs to be written, and has been a few times around the internet. One of the best examples out there is by Jason Scott, entitled “FUCK THE CLOUD”, and I honestly couldn’t have said it better myself. In fact, you should check out Jason’s other articles linked at the bottom of that one, especially the one about Ma.gnolia, where he makes pretty much the same points that I do. In this article, I’m not merely going to rant about “the cloud” as an abstraction, nor about how the suits and marketing folks took a simple abstraction, traditionally drawn on a whiteboard to mean an external network outside your own control, made it seem fluffy and in turn made you cede all control over your infrastructure. Instead, I’m going to compare the two very real and vastly different paradigms that are actually on offer here when it comes to your infrastructure: infrastructure as a service (the cloud), and infrastructure as code. Yes, both can complement each other quite well. However, reliance on infrastructure as code brings great benefits and value – the cloud brings added benefits, added value and greater risk. Allow me to explain in the most Penn and Teller way I can.
Infrastructure As A Service (Sadly known as “The Cloud”)
IaaS is actually a very useful tool when you need it, such as when you’re experiencing a greater than expected load on your systems and need some fall-back to handle the extra load. That is the perfect scenario where this type of stuff comes in so very handy. There are of course other applications of IaaS, such as large-scale data storage when bandwidth is an issue for your application. Perfect examples of this would be anything that serves up large media files, where offerings such as Content Delivery Networks (CDNs) are the ideal solution if you don’t want to, or can’t, roll your own. Anyone who has utilized Akamai or S3 knows the benefits. It might seem reasonable to serve from “the cloud”, and it is, but the problem I have is that companies all over the world store everything they have on outside third-party infrastructure such as Amazon Web Services. People, this is bad! The marketing folks at Amazon and the fanboys won’t tell you this, but this is not what you’re supposed to do! Any self-respecting network or system architect would never offer up the entire crown jewels of their company and place their faith in someone like Amazon or Microsoft. Re-read that: faith. You’re putting your faith in a company to store all your services, your data, your customers’ data and probably even your backups. Faith is not a backup strategy. It’s a lousy, lazy and downright ugly idea, yet most of these so-called “experts” have gotten it into their heads that it’s the best solution to all their problems. It isn’t. It’s a bloody nightmare with additional overhead that cedes all sorts of control over your infrastructure, to the point where it truly is not worth it.
Infrastructure as a service is a huge business risk. When done incorrectly.
To understand why it is a business risk, we must re-learn the value of our company’s intellectual property. If we truly value it and do not want it stolen or compromised through poor security, then we have to take that power into our own hands. Yes, it is not easy, but if you actually value these things then you’ll hire people who actually know how to properly administer, secure and deploy systems at all levels. Most companies now do not do that; they’re ceding far too much of their control, and with it their security, to firms like Amazon. Again, you are putting your faith in Amazon, or worse Microsoft, that they have this “security thing” figured out and it won’t be a problem. You think you only have to worry about the security of your own image or your own application? News flash: most companies, when they deploy an image, rarely if ever do any kind of security hardening on it beyond the basics they get when they create it through the AWS wizard. There is a reason that we have traditionally drawn external infrastructure outside our control as a cloud – it is nebulous in all its parts. This is not an added benefit; this is a risk. How are we to accurately audit the security of the full stack of such a nebulous system?
You cannot. Believing you can is bullshit.
Most system operators/administrators, or “DevOps” as they sometimes incorrectly like to be called these days, will not be able to tell you about the security of their infrastructure if it is hosted in the cloud. If your job is to push buttons on a website to administer the operations of your stack, or simply to edit configuration files, then you’re not allowed to call yourself a “DevOps guy” and need to hand in your DevOps card – and probably your man card with it. It’s sad that the creation and growth of what is unquestionably a fantastic technology, virtualization, has made traditional (and up-and-coming) system operators dumber and more ignorant than a bag of rocks. Most administrators, or “cloud babysitters”, will not be able to tell you if their images and the software contained within them have full RELRO, if stack canaries are inside their binaries, or if they are position-independent executables (PIE). Ask any typical administrator what these are and they’ll draw a blank. RELRO (Relocation Read-Only) is a simple exploit mitigation technique applied during the build of your binaries (if the options are set) that can be effective in mitigating return-oriented exploitation. Stack canaries are simple protections for the stack whereby a random integer is pushed onto the stack just after the function return pointer has been pushed on; the canary value is then checked before the function returns. This is a major annoyance if you’re an attacker trying to smash the stack. Whilst none of this is the be-all and end-all of exploit mitigation, it does make an attacker work much harder and is generally an effective pain in the ass, though it isn’t perfect. Explaining all of these security measures in your binaries is well outside the scope of this article. Thankfully, with Fedora 23, all packages are going to be rebuilt with all of these security measures enabled.
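If you want to check your own images right now, all three mitigations can be inspected on any ELF binary with readelf from binutils. This is a minimal sketch; `/bin/ls` is just a stand-in for whatever binary you actually want to audit.

```shell
# Inspect an ELF binary for the mitigations discussed above.
# /bin/ls is a placeholder; point BIN at your own binaries.
BIN=/bin/ls

# RELRO: a GNU_RELRO program header gives partial RELRO;
# adding the BIND_NOW dynamic flag makes it full RELRO.
relro=none
readelf -lW "$BIN" | grep -q 'GNU_RELRO' && relro=partial
readelf -dW "$BIN" | grep -Eq 'BIND_NOW|FLAGS_1.*NOW' && [ "$relro" = partial ] && relro=full

# Stack canaries: the binary references __stack_chk_fail if it was
# built with -fstack-protector.
canary=no
readelf -sW "$BIN" | grep -q '__stack_chk_fail' && canary=yes

# PIE: a position-independent executable has ELF type DYN rather than EXEC.
pie=no
readelf -hW "$BIN" | grep -q 'Type:[[:space:]]*DYN' && pie=yes

echo "RELRO=$relro CANARY=$canary PIE=$pie"
```

Run it against the binaries actually serving traffic in your images; the answers are often sobering.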
For the most sensitive of services, these are already enabled and have been for a while – even across many distributions. Thankfully, it’s easy to enable this yourself, as rebuilding a Fedora package is insanely easy, and there are a few packages I have rebuilt this way for my own use. I wonder how many system administrators or cloud babysitters do the same thing these days? Probably none at all. Most of them barely understand how their systems truly work. You’re supposed to be the master of your domain, the king of your kingdom. Instead, system administration has become the twenty-first-century equivalent of paper pushing.
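For the curious, enabling this on your own Fedora rebuilds amounts to one line in the spec file: the `_hardened_build` macro tells redhat-rpm-config to build with PIE and full RELRO. The snippet below is a sketch against a stub spec; substitute the real spec of the package you are rebuilding.

```shell
# Stub spec file standing in for the package you want to rebuild.
cat > mypackage.spec <<'EOF'
Name:    mypackage
Version: 1.0
EOF

# Prepend the hardening macro unless the spec already sets it.
grep -q '_hardened_build' mypackage.spec || \
  sed -i '1i %global _hardened_build 1' mypackage.spec

head -n 1 mypackage.spec
# Then rebuild as usual: rpmbuild -ba mypackage.spec
```

After the rebuild, the readelf checks from earlier will confirm whether the flags actually took effect.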
I’ve seen, interviewed with and contracted with companies who have their entire infrastructure on the cloud. Including their backups. What if everything you had was compromised? What if the keys to your AWS instances were stolen (I’ve seen it happen) and, with them, your backups were taken too? In the last several years, the cloud has become an attack vector against companies. You see it all the time now.
That cloud is actually a fart (CAPEX vs OPEX)
If you buy into the typical drivel that is peddled by the suits at Amazon, Microsoft or SAP, then you belong to one of the most gullible and idiotic groups of people, shared by flat-earthers, Scientology and viewers of the Fox News Channel. I’ve seen the same marketing material trotted out by all three companies to try and convince the gullible CIO/CTO/CEO to buy into this newfangled buzzword. The first place they go is the capital expenditure (CAPEX) versus operational expenditure (OPEX) argument, where they try to convince people that running all their services on the cloud and writing a big cheque every month will be cheaper than the up-front investment in servers and services. Yes, there’s no up-front cost for buying the physical hardware that you need, that’s true. However, that’s typically not the case in the long term, and as I’ve pointed out above, there are great IT security and business risks involved in a total buy-in to the cloud which have to be taken into consideration for this type of decision. With the increase of competition in this space, I’m sure the prices will be reasonable enough for most people to consider it, even start-ups. It’s a compromise. These services should always be involved in infrastructure planning and high-load contingencies, ready to complement the rest of your services when load spikes. Not replace them.
If you buy some of my cloud, I can buy a Jaguar F-Type
The debate should not be about CAPEX vs OPEX; rather, it should be about operational and business value. If your company owns its own hardware, that is an asset added to the company’s book value. Compared to all the other assets in your company, you might think that a bunch of servers doesn’t really add all that much value. Sure, like all other assets there is a depreciation factor associated with them, but that’s less of a problem than it was ten years ago, as the hardware inside servers will remain useful for longer than previous generations did, regardless of what dollar value the bean-counters assign to them. But if they don’t have much value “in the grand scheme of things”, then how do you value your intellectual property? Would you store all your engineering documents, designs and schematics in a desk drawer of another company? Would you first have that other company look through all your patent documents as they’re mailed between your legal team, your engineers and the patent office? That’s what you’re doing if you offload all your email to Microsoft and Google. A lot of my infrastructure contracts over the last few years have been to design, deploy and administer secure email systems, because businesses, especially engineering and scientific companies, do not trust such valuable information to be hosted on someone else’s servers.
The other value you get from owning these systems yourselves is that you will also have a more knowledgeable, responsible and smarter team of people within your company who know how to do system operations properly. By offloading all your technology to the cloud, the only planning you truly have is faith in those systems. I don’t have faith in any system; that’s why I design and plan extensively to mitigate any scenario that might arise. If you have an operations team whose answer is always, automatically, “we’ll do it on the cloud”, you need a new operations team. Fire them and get people who are actually skilled and take pride in their craft. Hire actual engineers to do this; these days, you have to. You need people with experience writing code. Recently, I interviewed with a company for a DevOps position and was rejected because I was too “development focused” and I asked too many questions, so they thought I was unable to take direction. At first I was upset, because it was somewhere that I truly wanted to work; now I’m still laughing at how ridiculous it sounds. It seems as if they don’t actually know what DevOps is. If you hire good people, the value they add to your business will pay off every single day with every single transaction. Even if it’s something small, such as starting out by bringing your email, DNS and corporate website in house.
The cloud is not always the answer to scalability
If you will indulge me channelling my inner Penn and Teller, this one really pisses me off: organisations viewing the cloud as the only answer to scalability. It isn’t. Usually it’s sheer laziness at work, where they attempt to throw more hardware, more instances, at a problem. People these days have forgotten how to optimize. They’re scared to look at creative or “non-traditional” approaches. This is especially prevalent within web companies who make heavy use of Django or Rails and are too scared to try to increase throughput in their application servers, instead resorting to “fire up another instance”. All of my DevOps friends who work in those environments have a single stack set up per instance and don’t even fully utilize the CPU cores. Rather than scale up on that box first and run multiple instances per box, their solution is just to scale out. It’s a waste of resources.
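To make the point concrete, here is a hedged sketch of scaling up before scaling out: generate one local app-server backend per CPU core and front them with an nginx upstream block, instead of paying for a new half-idle instance per worker process. The file name and port range are illustrative.

```shell
# One app-server backend per core, load-balanced locally by nginx,
# instead of one under-utilized cloud instance per process.
CORES=$(getconf _NPROCESSORS_ONLN)
BASE_PORT=8000

# Generate an nginx upstream block, one backend per core.
{
  echo "upstream app_servers {"
  i=0
  while [ "$i" -lt "$CORES" ]; do
    echo "    server 127.0.0.1:$((BASE_PORT + i));"
    i=$((i + 1))
  done
  echo "}"
} > upstream.conf

cat upstream.conf
# Each backend would then be started on its own port, e.g. for Django:
#   gunicorn myapp.wsgi --bind 127.0.0.1:8000 --workers 2
```

Only once every core on the box is earning its keep does “fire up another instance” become a defensible answer.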
Yes, the cloud can greatly aid you in your need to scale, but you have to be doing things right in the first place. As per the previous examples that I have cited, the cloud is perfect for scaling media serving or even the size of your data if you so desire or have the need.
Every cloud has a silver lining
If there’s anything that I’m trying to sell you on, it’s responsibility and accountability. The cloud definitely has a place in the world and provides a service that can add value to your organisation, if used responsibly. As I have already stated, it’s perfect for excess load, distribution and so forth, but the responsibility to your organisation and infrastructure is yours. Properly designing systems is not hard. In fact, it’s quite easy if you have the skills, precisely because you actually have control.
So what is the correct paradigm to look at your infrastructure?
Infrastructure As Code
The reason that many people have been attracted to the cloud in the last few years is that they falsely assume their own server farms are too hard to manage or maintain. Bullshit. These days, we have fantastic tools available to us, and the absolute best and champion of “infrastructure as code” is Puppet. The guys over at PuppetLabs have unquestionably made everyone’s lives so much easier. Not just in the cloud environment, but also for in-house server farms. Being able to automate and easily manage your infrastructure is truly a blessing. The entire concept of infrastructure as code is something that we had been waiting on for a very long time. It’s actually somewhat disappointing that none of this was already mature in the industry ten years before the guys at Puppet showed up. It was the ultimate missing link for administration, and if you’ve ever used it, you’ll know why. It’s impossible to go back to the old days before Puppet. Not only has it eased the management and deployment of production environments, but when combined with tools such as Vagrant it becomes invaluable within your own development environment. Infrastructure as code that is versioned in git repositories, combined with virtualization, is the holy grail of system operation and deployment in the twenty-first century. This is a pattern that will live on for decades.
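For readers who haven’t used it, here is a minimal sketch of what “infrastructure as code” looks like with Puppet: a declarative manifest you keep in git and apply to every node. The resources and file name are illustrative.

```shell
# Write a minimal, declarative Puppet manifest describing a web tier.
cat > site.pp <<'EOF'
# Ensure nginx is installed, enabled at boot, and running.
package { 'nginx':
  ensure => installed,
}
service { 'nginx':
  ensure  => running,
  enable  => true,
  require => Package['nginx'],
}
EOF

cat site.pp
# On a node you would apply it with: puppet apply site.pp
# (or serve it from a puppet master, with site.pp versioned in git).
```

The point is that the desired state lives in a text file under version control, not in someone’s memory of which buttons they clicked on a web console.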
One of the primary reasons that people started to switch to the cloud in the first place was to ease the burden of setup. I remember all too well the (now agonizing) days when each and every server or VM image had to be set up manually. I also remember trying to maintain massive bash or Perl scripts just to set up the basics of the system, or your software upon it. It was a nightmare. Now, the friction of having your own hardware is much lower than it used to be. Most of the problems are gone; the only one that really remains is hardware cost, which is easily managed and mitigated by using the cloud only when you’re under peak load. If your business truly does require you to invest in more hardware servers due to a stable growth trend, then do it. Doing so is a true investment that will pay off, and having the right team in place to manage it will add untold amounts of value to your operations. There is absolutely no reason why a company should have a single point of reliance on a sole vendor such as Amazon or Microsoft to host their services; likewise, you shouldn’t keep all your hardware servers located in one place. All the new DevOps tools now allow you to manage a multiple-site setup with ease, which mitigates your reliance on a single co-location vendor. It’s painlessly easy to migrate running VM guests to VM hosts on the other side of the world from your laptop – I’ve done it. The annoying wait for additional hardware doesn’t seem to be a problem these days either. Most datacenters will have hardware available for you to put in, or you can have new hardware delivered overnight to your co-location facility to be installed. If your business plans ahead properly and your operations team is part of those plans, with metrics in hand, then the old headache of waiting for the next hardware cycle is largely averted. This is not hard to do.
Leverage the power of infrastructure as code as much as you can. I plan to go more in depth in future articles about DevOps, Puppet and, yes, even cloud infrastructure as much as possible. The intent for all categories is to showcase their power and what they’re best used for, to increase adoption of the right tool for the right job. In essence, that is my main problem with the over-enthusiastic push towards putting absolutely everything on the cloud: it is not the right tool for the job all of the time. You wouldn’t hire a steel post pile driver to nail a carpet down, would you?
Another article that I aim to write sometime this week is about how some cloud services are actually detrimental to your privacy, and in part, that is also what some of this article is about. Whether for personal use or for business, the cloud involves you having to say, “Yes, I relinquish control”.