I went along to the first Sydney Puppet User Group meetup for 2016. Below are some notes.
Richard Tomkinson - Cloudten Industries
"Using AWS Buckets and IAM roles"
A very interesting talk. It went through a brief case study, outlining what worked well, the challenges, and other considerations. Some of the notes don't relate directly to the case study (e.g. multiple ways of managing code repos, etc.).
Case study requirements:
- Multiple environments
- Developers need access to push code and read logs
- Don't want to give devs full access to the OS, though
AWS Auto Scaler specifics
- Scale up based on metrics such as CPU utilisation, memory, etc.
- Likewise, if utilisation drops below a certain threshold, an instance destroys itself (scale down).
- Each instance is stateless: no permanent data is stored on the instance itself.
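The scale-up/scale-down behaviour above can be sketched as a simple threshold check. The metric and the thresholds here (70% / 20% CPU) are illustrative assumptions, not values from the talk:

```python
# Illustrative sketch of the auto-scaling decision described above.
# The 70% / 20% CPU thresholds are assumptions for the example.

def scaling_decision(cpu_percent: float,
                     scale_up_at: float = 70.0,
                     scale_down_at: float = 20.0) -> str:
    """Return 'up', 'down', or 'hold' for a CPU utilisation reading."""
    if cpu_percent >= scale_up_at:
        return "up"    # launch another stateless instance
    if cpu_percent <= scale_down_at:
        return "down"  # an instance terminates itself; no state is lost
    return "hold"
```

Because the instances are stateless, the "down" branch is safe: terminating an instance loses nothing that isn't already in S3 or the code repo.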
Instance Build workflow:
- Vanilla AMI
Not much is built into it except a specific version of the Puppet client.
- Bootstrap script
When the AMI boots, it pulls down further instructions from a code repository. This way, if improvements / bug fixes etc need to be made, the AMI doesn't need to change, only the code which is pulled down when booting. The remainder of the install steps are driven by the bootstrap script.
This script may also look after minor details such as setting initial hostname, network, defining base Yum repos, etc.
- Install Packages
Basic packages such as puppet and facter can be installed here.
- Pull code for puppet manifests
To perform the local puppet apply, the Puppet code needs to be synced to this box. Code is pulled from an S3 bucket using git, svn, curl, etc.
- Puppet Apply
Selected puppet manifests are applied to the local instance, to configure the environment (monitoring, networking, etc) and install/activate the application.
Code for the application is pulled from an S3 bucket.
- Amazon Simple Queueing Service - application code updates
(details here are quite high level, some of the specifics might not be quite accurate).
It is a distributed queueing service; a consumer typically runs on each node in the environment.
Each dev team has their own code repo (e.g. git, SVN) and their own S3 bucket for code storage.
They push to a code repo (e.g. Stash), which pushes out to the AWS S3 bucket via a post-commit hook (e.g. when they merge into the master branch or, for a dev instance, when they merge into a specific staging branch).
This post commit hook also places a trigger file into the bucket.
The AWS SQS regularly polls the S3 bucket (containing app code and/or puppet code) for code updates. In this situation, a trigger file is used (e.g. a temp file called "newcode.push.$environment" is deployed into the bucket).
When this trigger file is detected, the SQS:
- pulls the new code for the app and the puppet manifests, then queues the deployment for each app server
- Each node subscribing to the queue
- is notified of the pending code release to be applied via a trigger file
- sleeps for a random period (to reduce the risk of this application reloading at the same time as other nodes), then
- applies the new code and restarts the app
- removes the temp file from S3 once the application reload has been completed. (not entirely sure when this occurs, it may be earlier once the SQS subsystem has accepted it, then SQS takes care of the remainder of the queueing).
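The node-side flow above can be sketched roughly as follows. The S3/SQS operations are injected as plain callables so the logic can be shown (and tested) without real AWS calls; only the trigger-file naming (`newcode.push.$environment`) comes from the talk, the rest is an assumption:

```python
import random
import time

def trigger_key(environment: str) -> str:
    """Name of the trigger file the post-commit hook drops into the bucket."""
    return f"newcode.push.{environment}"

def handle_code_release(environment, bucket_keys, deploy, restart_app,
                        delete_key, max_jitter_seconds=30):
    """Sketch of what each subscribed node does when a release is queued.

    `bucket_keys`, `deploy`, `restart_app`, and `delete_key` are injected
    stand-ins; in practice these would be S3 list/get/delete operations
    and a service restart.
    """
    key = trigger_key(environment)
    if key not in bucket_keys:
        return False  # no pending release for this environment
    # Sleep for a random period so all nodes don't reload at the same time.
    time.sleep(random.uniform(0, max_jitter_seconds))
    deploy()         # pull new app code and puppet manifests, puppet apply
    restart_app()    # restart the application on the new code
    delete_key(key)  # remove the trigger file from the bucket
    return True
```

The random jitter is the interesting detail: it is a cheap way to avoid every app server restarting simultaneously without needing any coordination between nodes.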
AWS IAM is used for S3 bucket access control.
Security credentials are queried from the special AWS instance metadata service, accessed over HTTP.
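That metadata lookup looks roughly like this. The endpoint path is the real EC2 instance metadata URL for IAM role credentials (IMDSv1-style); the `http_get` callable is injected here so the sketch stays self-contained:

```python
import json

# EC2 instance metadata endpoint for IAM role credentials.
METADATA_URL = "http://169.254.169.254/latest/meta-data/iam/security-credentials/"

def fetch_role_credentials(role_name, http_get):
    """Fetch temporary credentials for the instance's IAM role.

    `http_get` is a function url -> response body (str); in practice this
    would be urllib or curl against the metadata service. The response is
    JSON including AccessKeyId, SecretAccessKey, and Token.
    """
    body = http_get(METADATA_URL + role_name)
    creds = json.loads(body)
    return {
        "access_key": creds["AccessKeyId"],
        "secret_key": creds["SecretAccessKey"],
        "token": creds["Token"],
    }
```

Because the credentials come from the instance's IAM role, no long-lived keys need to be baked into the AMI or the bootstrap script.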
What is puppet used for?
- Install patches, kernel hardening etc
- Define custom package repos
- AWS tagging using Facter
- Package installation
- Config management, auth/keys, logging, etc
- Possibly some application configuration (e.g. nginx deployment) - although app specific code may simply be pulled as a separate exercise
Why this approach?
- Fewer instances to manage
- Developers don't need access to the infrastructure
- Don't need puppet enterprise
- Don't need to manage puppet certificates or puppet node lifecycle
Why an SQS queue?
- Don't need Bamboo, Jenkins etc. It is very simple, using files to trigger pulls.
- Also don't need to worry about configuring post-commit hooks to interface with the application directly.
- As the latest code revision always exists in the S3 bucket, if a new instance starts and isn't issued with the "update code" instruction, it will still retrieve the correct/latest revision of the code via the bootstrapping workflow.
Issues they faced
Some useful thoughts when designing such a solution: a high-level, non-exhaustive list of possible issues to consider.
- Enforced package versions which are removed from repos cause the build to break (as the package versions are hard coded into the puppet manifests)
- Autoscaling delay - the time taken to scale up; new instances may be initialised but still waiting for the build to complete. Measure how long a build takes and account for this in the auto scaling.
- S3 sync didn't handle zero byte files properly
- Event trigger mechanism required a lot of tweaking
- Developers started storing files directly on local instances (bypassing git, etc.). On the plus side, if you can automate the build workflow (and perhaps regularly - e.g. daily - terminate and rebuild each instance), developers will prefer the proper workflow, since storage on instances won't be reliable.
- They tried using S3FS (shared filesystem), which caused much grief. They worked around this by having each instance post objects back to the S3 bucket (e.g. for logs and other persistent storage). Other approaches are also valid here: collectd, syslog, etc.
- Needed to find a balance between longer build time (but highly customisable) vs short build time by hard-coding more into the base AMI (which becomes less flexible).
- S3 doesn't exist inside the VPC; it sits outside it. Additional work was required to set up gateways for accessing S3 securely whilst maintaining VPC separation.
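The S3FS workaround mentioned above (each instance posting objects back to S3 rather than mounting a shared filesystem) might look like this. The key-naming scheme and the `upload` callable are assumptions for illustration:

```python
import datetime

def log_object_key(hostname, log_name, now=None):
    """Build an S3 key for a pushed log file.

    The logs/<host>/<date>/<file> layout is a guess, not the scheme
    from the talk.
    """
    now = now or datetime.datetime.utcnow()
    return f"logs/{hostname}/{now:%Y-%m-%d}/{log_name}"

def push_log(hostname, log_name, data, upload, now=None):
    """Post one log file back to the bucket via an injected `upload`
    callable (in practice an S3 PUT)."""
    key = log_object_key(hostname, log_name, now)
    upload(key, data)
    return key
```

A one-way push like this sidesteps the consistency and locking problems that made S3FS painful: each instance only ever writes its own keys.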
Other possible improvements
- Deploy new nodes, check then scale down old nodes (that way, if a build fails, Prod isn't affected)
- Create multiple environments and give each developer (or dev team) their own environment
Are there any smarts to check for a build failure (and cancel queued rebuilds, etc.)?
- No. Limited business impact and they can re-deploy old versions quickly.
Are you using AWS-generated hostnames or manually set hostnames? Have you played with both approaches?
- Tagging policy: customer name, prod/dev, server type (e.g. sql/web), instance-id. They use a mixture of instance-id and other customer-specific data for the hostname. Smart: keep the hostnames dynamic but still encode some useful, human-readable info in them.
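A hostname built from those tags might be composed like this; the exact ordering and separator are guesses based on the tagging policy described:

```python
def build_hostname(customer, environment, server_type, instance_id):
    """Compose a dynamic-but-readable hostname from instance tags.

    The customer/env/type/instance-id ordering is an assumption; the
    instance-id keeps the name unique, the rest keeps it readable.
    """
    return f"{customer}-{environment}-{server_type}-{instance_id}".lower()
```

The instance-id component means autoscaled nodes never collide, while the customer/environment/type prefix still tells a human at a glance what the box is.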
Andrew - VCE
Brief presentation from the hosts of this meetup.
Who VCE are:
Started as a VMware/Cisco/EMC joint venture. Now owned by Dell.
Focussed on converged systems.
- VCE VBlock
- VCE VxBlock
- VCE Technology Extensions
- VCE VxRack Systems
VCE act as a single integrator and support provider (including software/firmware update centralisation). They also centrally manufacture and preconfigure technology stacks prior to deployment to the customer's environment.
VCE, Puppet and Docker
Puppet modules are used for managing VCE VBlock (etc.) technology stacks through VCE Vision. They are not published on the Puppet Forge yet, so this approach is still quite new.
VCE Vision deployed into docker containers to manage dependencies.
For me, this has limited use for now, but it was interesting to discover Puppet edge cases like this, beyond just managing Linux / Windows operating systems.