I went along to the first Sydney Puppet User Group meetup for 2016. Below are some notes.
Richard Tomkinson - Cloudten Industries
"Using AWS Buckets and IAM roles"
A very interesting talk. It went through a brief case study, outlining what worked well, the challenges, and other considerations. Some of the notes don't relate directly to the case study (e.g. multiple ways of managing code repos, etc.).
Case study requirements:
- Multiple environments
- Developers need access to push code and read logs
- Don't want to give devs full access to the OS, though
AWS Auto Scaler specifics
- Scale up based on metrics such as CPU utilisation, memory, etc.
- Likewise, if utilisation drops below a certain threshold, an instance destroys itself (scale down).
- Each instance is stateless: no permanent data is stored on the instance itself.
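The scale-up/scale-down behaviour above can be sketched as a simple threshold check. The metric and the thresholds here (70% / 20% CPU) are illustrative assumptions, not values from the talk:

```python
# Illustrative sketch of the auto-scaling decision described above.
# The 70% / 20% CPU thresholds are assumptions for the example.

def scaling_decision(cpu_percent: float,
                     scale_up_at: float = 70.0,
                     scale_down_at: float = 20.0) -> str:
    """Return 'up', 'down', or 'hold' for a CPU utilisation reading."""
    if cpu_percent >= scale_up_at:
        return "up"    # launch another stateless instance
    if cpu_percent <= scale_down_at:
        return "down"  # an instance terminates itself; no state is lost
    return "hold"
```

Because the instances are stateless, the "down" branch is safe: terminating an instance loses nothing that isn't already in S3 or the code repo.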
Instance Build workflow:
- Vanilla AMI
Not much is built into it except a specific version of the Puppet client.
- Bootstrap script
When the AMI boots, it pulls down further instructions from a code repository. This way, if improvements / bug fixes etc need to be made, the AMI doesn't need to change, only the code which is pulled down when booting. The remainder of the install steps are driven by the bootstrap script.
This script may also look after minor details such as setting initial hostname, network, defining base Yum repos, etc.
- Install Packages
Basic packages such as puppet and facter can be installed here.
- Pull code for puppet manifests
To perform the local puppet apply, the Puppet code needs to be synced to this box. Code is pulled from an S3 bucket using git, svn, curl, etc.
- Puppet Apply
Selected puppet manifests are applied to the local instance, to configure the environment (monitoring, networking, etc) and install/activate the application.
Code for the application is pulled from an S3 bucket.
- Amazon Simple Queueing Service - application code updates
(details here are quite high level, some of the specifics might not be quite accurate).
It is a distributed queueing service; a consumer typically runs on each node in the environment.
Each dev team has their own code repo (e.g. git, SVN) and their own S3 bucket for code storage.
They push to a code repo (e.g. Stash), which pushes out to the AWS S3 bucket via a post-commit hook (e.g. when they merge into the master branch or, for a dev instance, when they merge into a specific staging branch).
This post commit hook also places a trigger file into the bucket.
The AWS SQS regularly polls the S3 bucket (containing app code and/or puppet code) for code updates. In this situation, a trigger file is used (e.g. a temp file called "newcode.push.$environment" is deployed into the bucket).
When this trigger file is detected, the SQS:
- pulls the new code for the app and the puppet manifests, then queues the deployment for each app server
- Each node subscribing to the queue
- is notified of the pending code release to be applied via a trigger file
- sleeps for a random period (to reduce the risk of this application reloading at the same time as other nodes), then
- applies the new code and restarts the app
- removes the temp file from S3 once the application reload has been completed. (not entirely sure when this occurs, it may be earlier once the SQS subsystem has accepted it, then SQS takes care of the remainder of the queueing).
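The node-side flow above can be sketched roughly as follows. The S3/SQS operations are injected as plain callables so the logic can be shown (and tested) without real AWS calls; only the trigger-file naming (`newcode.push.$environment`) comes from the talk, the rest is an assumption:

```python
import random
import time

def trigger_key(environment: str) -> str:
    """Name of the trigger file the post-commit hook drops into the bucket."""
    return f"newcode.push.{environment}"

def handle_code_release(environment, bucket_keys, deploy, restart_app,
                        delete_key, max_jitter_seconds=30):
    """Sketch of what each subscribed node does when a release is queued.

    `bucket_keys`, `deploy`, `restart_app`, and `delete_key` are injected
    stand-ins; in practice these would be S3 list/get/delete operations
    and a service restart.
    """
    key = trigger_key(environment)
    if key not in bucket_keys:
        return False  # no pending release for this environment
    # Sleep for a random period so all nodes don't reload at the same time.
    time.sleep(random.uniform(0, max_jitter_seconds))
    deploy()         # pull new app code and puppet manifests, puppet apply
    restart_app()    # restart the application on the new code
    delete_key(key)  # remove the trigger file from the bucket
    return True
```

The random jitter is the interesting detail: it is a cheap way to avoid every app server restarting simultaneously without needing any coordination between nodes.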
AWS IAM is used for S3 bucket access control.
Security credentials are queried from the special AWS instance metadata service, accessed over HTTP.
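That metadata lookup looks roughly like this. The endpoint path is the real EC2 instance metadata URL for IAM role credentials (IMDSv1-style); the `http_get` callable is injected here so the sketch stays self-contained:

```python
import json

# EC2 instance metadata endpoint for IAM role credentials.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

def fetch_role_credentials(role_name, http_get):
    """Fetch temporary credentials for the instance's IAM role.

    `http_get` is a function url -> response body (str); in practice this
    would be urllib or curl against the metadata service. The response is
    JSON including AccessKeyId, SecretAccessKey, and Token.
    """
    body = http_get(METADATA_URL + role_name)
    creds = json.loads(body)
    return {
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "token": creds["Token"],
    }
```

Because the credentials come from the instance's IAM role, no long-lived keys need to be baked into the AMI or the bootstrap script.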
What is puppet used for?
- Install patches, kernel hardening etc
- Define custom package repos
- AWS tagging using Facter
- Package installation
- Config management, auth/keys, logging, etc
- Possibly some application configuration (e.g. nginx deployment) - although app specific code may simply be pulled as a separate exercise
Why this approach?
- Fewer instances to manage
- Developers don't need access to the infrastructure
- Don't need puppet enterprise
- Don't need to manage puppet certificates or puppet node lifecycle
Why an SQS queue?
- Don't need Bamboo, Jenkins etc. It is very simple, using files to trigger pulls.
- Also don't need to worry about configuring post-commit hooks to interface with the application directly.
- As the latest code revision always exists in the S3 bucket, if a new instance starts and isn't issued with the "update code" instruction, it will still retrieve the correct/latest revision of the code via the bootstrapping workflow.
Issues they faced
Some useful thoughts when designing such a solution: a high-level, non-exhaustive list of possible issues to consider.
- Enforced package versions which are removed from repos cause the build to break (as the package versions are hard coded into the puppet manifests)
- Autoscaling delay - the time taken to scale up; new instances may be initialised but still waiting for the build to complete. Measure how long a build takes and account for this in the auto scaling.
- S3 sync didn't handle zero byte files properly
- Event trigger mechanism required a lot of tweaking
- Developers started storing files directly on local instances (bypassing git, etc.). On the plus side, if you can automate the build workflow (and perhaps regularly - e.g. daily - terminate and rebuild each instance), developers will prefer the proper workflow, since storage on instances won't be reliable.
- They tried using S3FS (shared filesystem), which caused much grief. They worked around this by having each instance post objects back to the S3 bucket (e.g. for logs and other persistent storage). Other approaches are also valid here: collectd, syslog, etc.
- Needed to find a balance between longer build time (but highly customisable) vs short build time by hard-coding more into the base AMI (which becomes less flexible).
- S3 doesn't exist inside the VPC; it sits outside it. Additional work was required to set up gateways for accessing S3 securely whilst maintaining VPC separation.
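The S3FS workaround mentioned above (each instance posting objects back to S3 rather than mounting a shared filesystem) might look like this. The key-naming scheme and the `upload` callable are assumptions for illustration:

```python
import datetime

def log_object_key(hostname, log_name, now=None):
    """Build an S3 key for a pushed log file.

    The logs/<host>/<date>/<file> layout is a guess, not the scheme
    from the talk.
    """
    now = now or datetime.datetime.utcnow()
    return f"logs/{hostname}/{now:%Y-%m-%d}/{log_name}"

def push_log(hostname, log_name, data, upload, now=None):
    """Post one log file back to the bucket via an injected `upload`
    callable (in practice an S3 PUT)."""
    key = log_object_key(hostname, log_name, now)
    upload(key, data)
    return key
```

A one-way push like this sidesteps the consistency and locking problems that made S3FS painful: each instance only ever writes its own keys.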
Other possible improvements
- Deploy new nodes, check then scale down old nodes (that way, if a build fails, Prod isn't affected)
- Create multiple environments and give each developer (or dev team) their own environment
Are there any smarts to check for a build failure (and cancel queued rebuilds, etc.)?
- No. Limited business impact and they can re-deploy old versions quickly.
Are you using AWS-generated hostnames or manually set hostnames? Have you played with both approaches?
- Tagging policy: customer name, prod/dev, server type (e.g. sql/web), instance-id. They use a mixture of instance-id and other customer-specific data for the hostname. Smart: keep the hostnames dynamic but still encode some useful, human-readable info in them.
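A hostname built from those tags might be composed like this; the exact ordering and separator are guesses based on the tagging policy described:

```python
def build_hostname(customer, environment, server_type, instance_id):
    """Compose a dynamic-but-readable hostname from instance tags.

    The customer/env/type/instance-id ordering is an assumption; the
    instance-id keeps the name unique, the rest keeps it readable.
    """
    return f"{customer}-{environment}-{server_type}-{instance_id}".lower()
```

The instance-id component means autoscaled nodes never collide, while the customer/environment/type prefix still tells a human at a glance what the box is.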
Andrew - VCE
Brief presentation from the hosts of this meetup.
Who VCE are:
Started as a VMware/Cisco/EMC joint venture. Now owned by Dell.
Focussed on converged systems.
- VCE VBlock
- VCE VxBlock
- VCE Technology Extensions
- VCE VxRack Systems
VCE act as a single integrator and support provider (including software/firmware update centralisation). They also centrally manufacture and preconfigure technology stacks prior to deployment to the customer's environment.
VCE, Puppet and Docker
Puppet modules are used for managing VCE VBlock (etc.) technology stacks through VCE Vision. They are not published on the Puppet Forge yet, so this approach is still quite new.
VCE Vision deployed into docker containers to manage dependencies.
For me, this has limited use for now, but it was interesting to discover Puppet edge cases like this, beyond just managing Linux / Windows operating systems.