redbluemagenta

a blog curated/written/whatever by christian 'ian' paredes

DevOps: Fully Automated Infrastructure

(Inspired by the dev2ops post on the same subject.)

I am extremely interested in the “DevOps” movement, if only for the methodologies it aims to put into practice. The ideas aren’t necessarily new; there’s a lot of literature - typically published by USENIX - on new methods of looking at infrastructure provisioning and maintenance. However, the DevOps movement is probably the largest movement that aims to actually implement these ideas in web operations (and hopefully the movement will reach other markets as well). One of the ideas that is highly appealing to me, and I presume to many other systems administrators as well, is that the infrastructure should be completely automated and abstract enough for us to operate on the systems without touching any one of them directly. Having a fully automated infrastructure allows us to treat each box in the system as simply a commodity, and not as a unique box with special customizations on it.

As soon as we have an infrastructure where we only worry about service levels, and not about any particular box being up or down (because we can just reprovision another box to take up its role), we can start adding more layers that change how we view our systems. Instead of worrying whether httpd is running on a-box-009.prod, we look at metrics that tell us, “hits per second have gone down from 100/s to 20/s.” We observe trends in our data and provision more boxes as necessary, instead of having to tweak one special box for one special set of hardware in order to squeeze performance out of it (hardware is cheap, labor is not). If we do any sort of performance tuning, we do it only once, on a single box of a certain class, and then bake the configuration into our specification and roll it out to all of the other servers in that class. Here, a class is whatever grouping of servers we specify: small EC2 instances could belong in one class, whereas medium EC2 instances could belong in another. Probably even better is to classify servers by role AND by hardware specs.
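To make the "service levels, not boxes" idea a bit more concrete, here is a rough sketch in Python. Everything in it is made up for illustration: the `Server` record, the host names other than a-box-009.prod, the baseline numbers, and the 50% tolerance are all assumptions, not part of any real monitoring tool. It just groups boxes by (role, hardware) class and flags the classes whose total hits per second have dropped well below what we expect.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Server:
    name: str
    role: str            # e.g. "web" (hypothetical role name)
    hardware: str        # e.g. "ec2-small" (hypothetical hardware class)
    hits_per_sec: float  # latest observed traffic on this box


def hits_by_class(servers):
    """Sum hits/s per (role, hardware) class, ignoring individual boxes."""
    totals = defaultdict(float)
    for s in servers:
        totals[(s.role, s.hardware)] += s.hits_per_sec
    return dict(totals)


def classes_below_baseline(servers, baseline, tolerance=0.5):
    """Return classes whose total hits/s fell below tolerance * baseline."""
    current = hits_by_class(servers)
    return sorted(
        cls
        for cls, expected in baseline.items()
        if current.get(cls, 0.0) < tolerance * expected
    )


# Illustrative data only: the small web class used to serve 100 hits/s.
fleet = [
    Server("a-box-009.prod", "web", "ec2-small", 8.0),
    Server("a-box-010.prod", "web", "ec2-small", 12.0),
    Server("b-box-001.prod", "web", "ec2-medium", 60.0),
]
baseline = {("web", "ec2-small"): 100.0, ("web", "ec2-medium"): 50.0}

degraded = classes_below_baseline(fleet, baseline)
```

With this data, `degraded` contains only the small web class, since its total has gone from 100/s to 20/s. Note that nothing here asks whether httpd is up on any one box; the question is only whether the class as a whole is delivering.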

We should recognize that hardware is cheap, and that we should be able to reproduce everything we do. It’s not enough to set a service up on one box and call it good: if we can’t quickly roll out the configuration to many boxes, along with monitoring and metrics, we aren’t finished with our job of rolling out the service. We do this by using Puppet or some other configuration management system; adding a new package in Puppet should entail adding a declaration that adds Nagios and Munin/Cacti/etc. monitoring for that package as well.
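As a sketch of what "a package always ships with its monitoring" might look like, here is a small Python model. To be clear, this is not Puppet syntax and not Puppet's API; the resource type names (`package`, `nagios_check`, `munin_plugin`) and both functions are hypothetical, chosen only to show the shape of the rule.

```python
# Resource kinds we assume every service must declare (hypothetical names).
REQUIRED_TYPES = {"package", "nagios_check", "munin_plugin"}


def declare_service(package, port):
    """Build the full set of resources for a service in one declaration."""
    return [
        {"type": "package", "name": package, "ensure": "installed"},
        {"type": "nagios_check", "name": f"check_{package}", "port": port},
        {"type": "munin_plugin", "name": f"{package}_stats"},
    ]


def is_fully_declared(resources):
    """True only if package, alerting, and metrics are all present."""
    return REQUIRED_TYPES <= {r["type"] for r in resources}


httpd = declare_service("httpd", 80)
package_only = [{"type": "package", "name": "httpd", "ensure": "installed"}]
```

Here `is_fully_declared(httpd)` holds, while `is_fully_declared(package_only)` does not. The point of bundling the three together is that a service with no monitoring can't be declared by accident; in a real configuration management system the same idea would live in a module or class that pulls in the monitoring resources itself.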

Though our unique knack for figuring out core problems by just looking at how a system behaves is still useful in this new view of systems administration, we should strive to take our craft and mold it into a profession that has more of an engineering/architectural bent to it. Instead of sniffing around and relying on intuition to solve issues one at a time, we should be able to abstract our craft away from individual issues and start looking at systems in a more holistic way. This new view, something that DevOps aims to bring to fruition, will help us build scalable systems quickly and reliably, and will free up our time to tackle larger projects and to listen more to customer concerns.
