Computer Weekly today published “Why networking is the last frontier of IT automation”.

It should be, but the reality for the IT team in most SME organisations is very different.

One suggestion in the article is that “If you mess up a SQL server, you can just press a button and have another one up”.

While the IT Managers and sysadmins we deal with would like this to be true in their organisation, I think most would say this ideal is not quite their reality.

So how do we make this work in practice?

To answer that, we need to widen our focus beyond SQL Server to any server-based workload, whether Windows or Unix / Linux. There are four steps:

  1. Documenting your configuration – infrastructure-as-code
  2. Spinning it up
  3. Testing it
  4. Rinse and repeat

Documenting your configuration – infrastructure-as-code

Most companies we come into contact with have servers which have been in place for some time – often years, set up by previous colleagues or outsourced companies who are no longer around. Configuration has been changed many times, and probably hasn’t been documented along the way.

Infrastructure-as-code is the idea that every individual aspect of your server or network’s configuration can be expressed as a piece of code which can be re-run whenever you like, to restore that piece of configuration to its original state.
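As a minimal sketch of what “a piece of code which can be re-run whenever you like” means in practice, the Python function below enforces a single setting in a simple key=value file. The file format, the function name `ensure_setting` and the setting names are assumptions made purely for illustration; real servers will have many such pieces, usually managed by a dedicated tool rather than hand-written scripts.

```python
from pathlib import Path


def ensure_setting(config_path: Path, key: str, value: str) -> bool:
    """Make sure `key=value` is present in a simple key=value file.

    Returns True if the file had to be changed, False if it was
    already in the desired state. Hypothetical helper for illustration.
    """
    lines = config_path.read_text().splitlines() if config_path.exists() else []
    desired = f"{key}={value}"

    result = []
    found = False
    for line in lines:
        # Compare on the part before the first "=" so other settings are untouched.
        if line.split("=", 1)[0].strip() == key:
            found = True
            result.append(desired)
        else:
            result.append(line)
    if not found:
        result.append(desired)

    changed = result != lines
    if changed:
        # Only write when something actually differs from the desired state.
        config_path.write_text("\n".join(result) + "\n")
    return changed
```

Running it a second time changes nothing, and running it after someone has edited the value by hand puts the setting back. That property (safe to re-run, always ending in the same state) is what makes the code a usable record of the configuration.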

For Unix / Linux users, this won’t be difficult. For Windows users it’s much harder: it relies on your ability to use command-line tools such as PowerShell to work out all the different pieces of your configuration.

If you’re running a modern version of your operating system or infrastructure software, you’ll be able to query, store and reproduce every aspect of the configuration of your server and turn it into a piece of code which can be re-run later. If you’re running older or unsupported versions, this will be much harder.

Configuration management

Once you’ve got your infrastructure all defined as code, it’s still going to keep on changing. You need to keep track of that – otherwise, all your hard work so far will be lost.

By checking your infrastructure-as-code into version control, you’ll be able to keep track of how it changes over time, and tie those changes directly back to individual user or management requests in your ticketing system.

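But that’s not the whole story. Version control only records what the configuration should be. To apply that configuration to your servers and keep them in line with it, you need a configuration management tool like Chef, Ansible, Puppet or PowerShell Desired State Configuration.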
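To illustrate the general idea behind these tools, here is a small Python sketch that compares a declared configuration against what is actually found on a server and reports the differences. It is not how Chef, Ansible, Puppet or Desired State Configuration are implemented; the function `find_drift`, the `ABSENT` marker and the setting names are all hypothetical.

```python
ABSENT = "<absent>"  # marker for a setting that exists on one side only


def find_drift(desired: dict, actual: dict) -> dict:
    """Return {setting: (wanted, found)} for every setting that differs."""
    drift = {}
    for key in sorted(set(desired) | set(actual)):
        want = desired.get(key, ABSENT)
        have = actual.get(key, ABSENT)
        if want != have:
            drift[key] = (want, have)
    return drift


# Hypothetical example: what version control says versus what the server reports.
desired_state = {"ntp_server": "time.example.com", "ssh_root_login": "no"}
actual_state = {
    "ntp_server": "time.example.com",
    "ssh_root_login": "yes",
    "telnet": "enabled",
}
report = find_drift(desired_state, actual_state)
```

In this example `report` flags the changed `ssh_root_login` value and the undeclared `telnet` setting. A real configuration management tool would go one step further and correct the differences, not just list them.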

Spinning it up

Once you’ve got your infrastructure all defined as code, you’ll need a process and a place to re-build it. This could be a public cloud provider like Amazon Web Services or Microsoft Azure, or it could be an on-site test environment using VMware, Hyper-V or QEMU. Wherever you choose to do this, it needs to be segregated from your live environment entirely.

The ability to spin up and configure cloud resources using tools like Chef, Ansible, Puppet or PowerShell Desired State Configuration means you can build a consistent, repeatable environment which is identical to your live one, but entirely separate from it.

You can use the new environment for testing, and you can spin it up any number of times, use it, break it, change it, and when you’re done, throw it away. It’s disposable. Infrastructure which would previously take weeks or months to build now takes minutes.
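The “use it, break it, throw it away” pattern can be sketched in Python as a context manager that always tears the environment down, even when a test fails part-way through. The `FakeProvider` class below is a hypothetical in-memory stand-in, not the API of any real cloud provider or hypervisor; in practice its two methods would call whatever tooling you use to build and destroy machines.

```python
from contextlib import contextmanager
from itertools import count


class FakeProvider:
    """Hypothetical in-memory stand-in for a cloud or hypervisor API."""

    def __init__(self):
        self.running = {}      # environment id -> definition it was built from
        self._ids = count(1)

    def provision(self, definition: dict) -> str:
        env_id = f"env-{next(self._ids)}"
        self.running[env_id] = dict(definition)
        return env_id

    def destroy(self, env_id: str) -> None:
        self.running.pop(env_id, None)


@contextmanager
def disposable_environment(provider, definition: dict):
    """Build an environment from its definition and guarantee it is removed."""
    env_id = provider.provision(definition)
    try:
        yield env_id
    finally:
        # Runs whether the work inside the `with` block succeeded or failed.
        provider.destroy(env_id)
```

Anything run inside `with disposable_environment(provider, definition) as env_id:` gets a fresh copy built from the same definition every time, and nothing is left behind afterwards.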

Rinse and repeat

Once you’ve perfected the ability to spin up a complete copy of your production IT infrastructure, you’re in a completely new place.

You can:

Experiment – spin up a complete copy of your production systems to test anything you like. It doesn’t matter if you break everything – just throw away your environment and start again.

Develop features – If you’re running applications or services on top of your IT infrastructure, you can start to ship features regularly and often, but only when you’re confident you can rebuild the underlying infrastructure easily and quickly in the event of a problem.

Test your backups – Many people have backups, but hardly anyone ever tests them. That’s often because they don’t have an identical replica of their live system to restore into without breaking anything. Unless you can restore your backups and access the data contained in them, you don’t have any backups.

Practise disaster recovery – If you have a capability to tear down and rebuild your infrastructure reliably, you can do it regularly. Many IT disaster recovery plans fail because they are only tested every few months. In the meantime, your IT infrastructure has changed, and the plan hasn’t been updated. If you’re rebuilding parts of your infrastructure regularly as part of your routine, the plan gets tested every time you do that, and you get lots of practice.
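As a rough illustration of the backup-testing point above, the Python sketch below restores an archive into a scratch directory and checks that every file comes back with the same checksum it had when the backup was taken. It assumes a plain `.tar.gz` archive of a directory; the function names are hypothetical, and a real backup product will have its own restore procedure.

```python
import hashlib
import tarfile
import tempfile
from pathlib import Path


def checksums(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def backup(source: Path, archive: Path) -> None:
    """Write everything under `source` into a gzip-compressed tar archive."""
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            tar.add(p, arcname=p.relative_to(source).as_posix(), recursive=False)


def restore_and_verify(archive: Path, expected: dict) -> bool:
    """Restore into a throwaway directory and compare against known checksums."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(scratch)
        return checksums(Path(scratch)) == expected
```

The idea is to record `checksums(source)` at backup time, then run `restore_and_verify` on a schedule against the stored archive. The restore happens in a temporary directory, so the live data is never touched, and a `False` result means the backup cannot be relied on.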