AWS Meetup - recommendations from end users

A few nights ago I attended the AWS meetup in Sydney
http://www.meetup.com/AWS-Sydney/events/222805209/

The topic was "what I have learnt from using AWS", with one perspective presented by Paul Wakeford from Fairfax (large AWS footprint / expenditure) and another by Andrew Boag from Catalyst IT (smaller annual spend). Each covered what they might have done differently if starting again, along with useful tips on scaling, security, costing and so on.

Summary

It was refreshing to see that several cloud initiatives I've played with are in line with how more-experienced folk are deploying their environments into AWS (although there's much more I could be doing better).
Some of these "best practices from end users" are high level and don't apply to all situations.

Tagging

"Tag EVERYTHING"

  • Tags are a way of logically labelling resources in AWS using custom key-value pairs.
  • AWS allows up to 10 tags per resource, applicable for most resource types.
  • Some resource types (e.g. auth keys, auto-scaling launch configurations etc) don’t have tags available, however tags can be easily used for standard resources (e.g. VMs, security groups, volumes, etc)
  • Suggestion to use a standard format for tags, e.g.
  • Tag 1 = name (e.g. device name, security group name, etc) - note that AWS automatically sets a random hostname so this may be useful for setting something more readable
  • Tag 2 = customer/consumer (e.g. Travel web team)
  • Tag 3 = group responsible for managing the resource (e.g. Web development Team A)
  • Tag 4 = prod/non-prod
  • Tag 5 = who provisioned the resource (e.g. web_auto_scaler, user-robbie, etc)
  • Put scripting in place to automatically tag child objects (e.g. if a VM is created, copy the tags from the VM to the block storage devices, snapshots etc)
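The tag-copying idea above can be sketched as a small merge function. This is a hypothetical stand-in for a script that would sit behind the EC2 API (e.g. via boto3's `create_tags`); the dict-based resource shapes and tag keys are assumptions, not real AWS structures:

```python
def copy_tags(parent_tags, child_tags):
    """Merge a parent resource's tags into a child's.
    The child keeps any tags it already sets (e.g. its own name);
    everything else (customer, team, prod/non-prod, etc) is inherited."""
    merged = dict(parent_tags)
    merged.update(child_tags)
    return merged

# Example: a VM's tags propagated to its root volume.
vm_tags = {
    "name": "web-frontend-01",
    "customer": "Travel web team",
    "managed_by": "Web development Team A",
    "environment": "prod",
    "provisioned_by": "web_auto_scaler",
}
volume_tags = copy_tags(vm_tags, {"name": "web-frontend-01-root"})
```

Running this on instance creation (and again on snapshot creation) keeps billing and ownership tags consistent down the resource tree.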

Naming conventions

  • Ensure you have a standard convention for naming and it is well understood
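One cheap way to make a convention "well understood" is to enforce it in provisioning scripts. A minimal sketch, assuming a made-up `role-team-name` convention (the pattern itself is an illustration, not a recommendation):

```python
import re

# Hypothetical convention: role-team-name, all lowercase,
# e.g. "dev-web-frontend" or "ops-data-etl".
NAME_RE = re.compile(r"^(dev|ops|admin)-[a-z]+-[a-z0-9]+$")

def valid_name(name):
    """Return True if a resource name matches the naming convention."""
    return bool(NAME_RE.match(name))
```

A provisioning script can refuse to create resources whose names fail the check, which keeps the convention enforced rather than merely documented.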

Billing

"Engineers aren’t accountants"

  • Use different AWS accounts to easily split billing (can be easier than splitting by tag)
  • AWS costs can skyrocket with increased usage. The more you can drill down into "per project", "per department" etc billing, the easier it will be to justify cost increases
  • Use tags to further split billing
  • USD credit card rates - a few percentage points can make a huge difference. Check your bank fees and consider using a different provider for cheaper USD conversions
  • Set up billing alerts (e.g. an email for every $10,000 spent) and/or set up a dashboard to monitor spend longer term
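The "alert every $10,000" idea boils down to tracking which spend thresholds have been crossed since the last check. A minimal sketch of that logic (in practice AWS offers CloudWatch billing alarms; the function and dollar step here are illustrative):

```python
def alerts_due(previous_total, current_total, step=10_000):
    """Return the dollar thresholds crossed between two readings of
    cumulative spend, one alert per `step` dollars."""
    first = (int(previous_total) // step + 1) * step
    return list(range(first, int(current_total) + 1, step))

# Spend grew from $9,500 to $21,000 since the last check:
# both the $10,000 and $20,000 thresholds were crossed.
crossed = alerts_due(9_500, 21_000)
```

Each returned threshold would trigger one notification, so a spike that blows through several thresholds at once still produces an alert per $10,000.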

Automatic instance termination

  • (ALWAYS?) set policies to automatically scale down idle/dev resources outside of business hours etc.
  • Wait for a consumer to request a resource, rather than pre-emptively starting resources
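The scale-down policy above is essentially a per-instance decision based on tags and the clock. A sketch of that decision, assuming a prod/non-prod tag (as suggested in the tagging section) and made-up business hours of Mon-Fri 08:00-18:00:

```python
from datetime import datetime

def should_stop(tags, now):
    """Decide whether an instance should be stopped right now.
    Prod instances are never auto-stopped; non-prod instances are
    stopped outside (assumed) business hours: Mon-Fri, 08:00-18:00."""
    if tags.get("environment") == "prod":
        return False
    in_hours = now.weekday() < 5 and 8 <= now.hour < 18
    return not in_hours

# A dev box at 10pm on a Saturday is a candidate for stopping.
stop_it = should_stop({"environment": "dev"}, datetime(2015, 6, 13, 22, 0))
```

A scheduled job would run this against every instance's tags and stop (not terminate) the matches, so dev boxes come back with their state intact the next morning.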

Architecting for the cloud

  • Not every application is suitable for AWS
  • AWS isn’t designed for a "lift and shift" migration of VMs. Instead, each workload should be designed to scale up, scale down and re-deploy
  • Stateful / snowflake systems should be avoided. Try to make immutable (immutable = "state doesn't mutate - the only way to change it is to re-deploy / rebuild") environments which can be easily destroyed and re-deployed.
  • In AWS, if you claim a resource (i.e. run a VM with 16 CPU cores), you pay for it and don’t benefit from any thin provisioning, even if your server is idle. Therefore always-on systems with large amounts of resources for occasional spikes in workload are probably more expensive to run in AWS as compared to your own data centre / ESX cluster
  • Consider redesigning the application to spread across multiple hosts, use a load balancer and scale up/down when required
  • Be prepared to migrate back to internal systems (or privately hosted etc). Hyper-scale cloud (e.g. AWS) may end up being more expensive; sometimes there comes a point where it makes more sense to run it in house. Generally, if you can't easily re-deploy from scratch, I'd question whether it is valid to put it in AWS
  • Be very careful about reserved instances (pre-paying for a set period of time at a fixed cost). Most of the time your requirements change, and you will rarely reap the full benefit of this approach
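The "always-on vs scale-to-demand" point is easy to see with rough arithmetic. The hourly rates below are made-up placeholders, not real AWS pricing; the shape of the comparison is what matters:

```python
HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(hourly_rate, hours_running=HOURS_PER_MONTH):
    """Cost of one instance for a month, billed only while running."""
    return hourly_rate * hours_running

# Hypothetical: one large 16-core box running 24/7, versus four small
# boxes behind a load balancer where three of them only run during a
# ~4-hour daily peak (about 120 hours/month).
always_on_large = monthly_cost(1.00)                       # 24/7
scaled_small = monthly_cost(0.25) + 3 * monthly_cost(0.25, 120)
```

With these placeholder rates the scaled design comes out well under half the always-on cost, which is the core argument for redesigning spiky workloads rather than lifting and shifting a large idle server.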

Multiple AWS accounts

  • Create separate AWS accounts as early as possible - it is challenging to move resources between accounts if you split them out later.
  • Separation into different accounts also helps for role-based profiles - it can be difficult to split user permissions within an account but easy to restrict certain actions for a user to a single AWS account.
  • As an example, if you're running multiple major websites, you may run 30 separate accounts (one account per product - i.e. one for news website, one for fishing website, one for real-estate website, etc).

Expiry dates for snapshots (and perhaps dev VMs?)

  • Set a tag indicating an expiry date. Then (via scripting) this resource (e.g. VM, snapshot etc) will be automatically destroyed on the expiry date.
  • This can be difficult to introduce in a culture of "keep everything"
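The expiry-tag sweep can be sketched as a filter over tagged resources. The `expires` tag name and ISO date format are assumptions; a real version would list resources via the AWS API and delete (or flag) the matches:

```python
from datetime import date

def expired(resources, today):
    """Return the resources whose 'expires' tag date has passed.
    Resources without an 'expires' tag are never selected - i.e.
    untagged resources are kept, matching a cautious default."""
    out = []
    for r in resources:
        tag = r.get("tags", {}).get("expires")
        if tag and date.fromisoformat(tag) < today:
            out.append(r)
    return out

snapshots = [
    {"id": "snap-1", "tags": {"expires": "2015-06-01"}},
    {"id": "snap-2", "tags": {}},  # no expiry: kept forever
]
to_delete = expired(snapshots, date(2015, 7, 1))
```

Defaulting to "keep" for untagged resources makes the script safe to introduce gradually: nothing is deleted until someone opts a resource in with an expiry tag.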

Auth

  • Use multi factor authentication (username + password, as well as a software token such as Google Authenticator)
  • Assign users in a parent/root AWS account, rather than in individual accounts
  • Use a hardware token for the root/master user/password and keep it locked away in a physical safe. There should be no need to ever use this for day to day admin.
  • Use IAM roles/instance profiles instead of service accounts with long-lived credentials
  • Use a good/consistent naming convention for users etc