Puppet meetup Sept 2015

Some notes from tonight's Puppet meetup at Optiver Sydney. Thanks to Johnny, Kieran and Pedram for some very interesting discussion points.

  • ScienceLogic - promising enterprise tool for managing puppet performance across multiple tenants
  • Puppet at scale was a very interesting talk - some great approaches for overcoming scaling limits.

ScienceLogic - Johnny Miza

Only a very quick overview of ScienceLogic, but it's good to know the tool exists. It might be useful in the enterprise; it looks like it achieves much the same as Graphite (monitoring), Puppet Enterprise, etc. I'm sure this is an over-simplified view.

  • Multi-tenant view of the Puppet environment
  • Secure Multi Tenancy focus - used by many MSPs (managed service providers).
  • Centralised enterprise management platform
  • Automated discovery of Puppet resources
  • ScienceLogic provides performance metrics across the Puppet environment
  • Puppet often used for deploying ScienceLogic collectors too
  • RBAC available - you can set permissions for certain data sets.
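
The RBAC point can be illustrated with a generic sketch of tenant-scoped permissions. This is not ScienceLogic's API or data model: the `Role`/`User` types, the `(tenant, data_set)` grant pairs and the `"*"` wildcard for MSP-wide roles are all hypothetical, chosen only to show how per-data-set permissions combine with multi-tenancy.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Tuple

# Hypothetical grant format: a (tenant, data_set) pair; "*" as tenant means every tenant.
Grant = Tuple[str, str]


@dataclass(frozen=True)
class Role:
    name: str
    grants: FrozenSet[Grant]


@dataclass
class User:
    name: str
    roles: List[Role] = field(default_factory=list)


def can_read(user: User, tenant: str, data_set: str) -> bool:
    """True if any of the user's roles grants this data set for this tenant (or all tenants)."""
    return any(
        (tenant, data_set) in role.grants or ("*", data_set) in role.grants
        for role in user.roles
    )


# Hypothetical roles: an MSP operator sees Puppet metrics for every tenant,
# a tenant admin sees only their own tenant's data sets.
msp_operator = Role("msp-operator", frozenset({("*", "puppet-metrics")}))
tenant_a_admin = Role(
    "tenant-a-admin",
    frozenset({("tenant-a", "puppet-metrics"), ("tenant-a", "inventory")}),
)

alice = User("alice", [tenant_a_admin])
bob = User("bob", [msp_operator])

print(can_read(alice, "tenant-a", "inventory"))     # True
print(can_read(alice, "tenant-b", "inventory"))     # False
print(can_read(bob, "tenant-b", "puppet-metrics"))  # True
print(can_read(bob, "tenant-b", "inventory"))       # False
```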

Secure Multi Tenancy

This is achieved by establishing a secure tunnel from the collector (i.e. the agent within the tenant's secure environment) into the centralised ScienceLogic database.

Order in a world of Snowflakes - Kieran Sweet & Pedram Sanayei (Sourced)

Sourced are a systems integrator in the financial services sector, focusing on configuration management, automation, etc.

They gave an overview of an engagement from about three years ago with a company that had a heap of cloud instances (Azure, AWS, etc.).

Each cloud environment instance is referred to as a "snowflake".

Their Cloud Broker helped manage all of these components; however, it was a very simple set of scripts that only looked for an exit(0) return code from each script. When a script failed, the Cloud Broker didn't know what to do.
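
A minimal sketch of why an exit-code-only broker struggles, assuming each broker step is an external command. The helper names and the failing "script" are hypothetical. The naive check collapses every failure into a single boolean; keeping the return code and stderr at least gives the broker something to act on.

```python
import subprocess
import sys
from typing import Dict, List


def naive_step(cmd: List[str]) -> bool:
    """Broker as described: the only signal is whether the script exited with 0."""
    return subprocess.run(cmd, capture_output=True, text=True).returncode == 0


def observed_step(cmd: List[str]) -> Dict[str, object]:
    """Same call, but keep the exit code and stderr so a failure can be diagnosed."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return {
        "ok": result.returncode == 0,
        "code": result.returncode,
        "stderr": result.stderr.strip(),
    }


# Stand-in "provisioning script" that fails with a hypothetical message and exit code 2.
failing = [
    sys.executable, "-c",
    "import sys; sys.stderr.write('NFS mount unavailable'); sys.exit(2)",
]

print(naive_step(failing))     # False, and nothing more to go on
print(observed_step(failing))  # {'ok': False, 'code': 2, 'stderr': 'NFS mount unavailable'}
```

With the richer result, the broker could, for example, retry some exit codes and escalate others, rather than treating every non-zero exit the same way.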

Initial Solution (up to 1,000 instances):
  • Puppet
  • Stash source control
  • Cloud broker writing to NFS
  • NFS being spread across multiple sites

Initially Puppet fixed a lot of their issues, but it wasn't scalable across multiple sites.

Network faults etc. caused Puppet runs to fail and the Puppet Enterprise console to show a heap of errors.

They also needed the Broker to act as the External Node Classifier (ENC), but also wanted other devices to be registrable without going through the Broker, i.e. independently of it.
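
For context, Puppet's exec node terminus runs an ENC executable with the node's certname as its argument and reads a YAML document (classes, parameters, environment) from stdout. Below is a minimal sketch of a broker-backed ENC with a fallback for nodes registered outside the broker. `BROKER_NODES`, `role::webserver` and `role::base` are hypothetical stand-ins, and a real version would query the broker rather than a dict. It prints JSON, which also parses as YAML for simple documents like this one.

```python
import json
import sys
from typing import Dict, List

# Hypothetical snapshot of what the Cloud Broker knows; a real ENC would query the broker.
BROKER_NODES: Dict[str, dict] = {
    "web01.example.com": {"classes": ["role::webserver"], "parameters": {"tenant": "tenant-a"}},
}

# Fallback for nodes registered independently of the broker.
DEFAULT_NODE = {"classes": ["role::base"], "parameters": {}}


def classify(certname: str) -> dict:
    """Build the ENC document: broker data if the node is known, else the default."""
    node = BROKER_NODES.get(certname, DEFAULT_NODE)
    return {
        "classes": list(node["classes"]),
        "parameters": dict(node["parameters"]),
        "environment": "production",
    }


def main(argv: List[str]) -> int:
    """Puppet passes the certname as the single argument; a non-zero exit signals failure."""
    if len(argv) != 2:
        print("usage: enc <certname>", file=sys.stderr)
        return 1
    print(json.dumps(classify(argv[1]), indent=2))
    return 0

# As an installed ENC script, finish with: sys.exit(main(sys.argv))
```

The fallback is what keeps the broker optional: a node it has never heard of still gets a baseline classification instead of a failed run.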

2nd Solution

Puppet Enterprise

  • eYAML for encrypted secrets in hieradata
  • MCollective and r10k
  • Puppet Enterprise supports PKI everywhere
  • Node Classifier extends how the product works and reduces reliance on NFS

3rd approach needed

Queries (when nodes check in for a Puppet run): three options

  • Query the broker via an API
      ◦ More than 3 node sessions per second caused the broker to crash.
  • Push data into node groups
      ◦ Values are pushed via the node classifier into node groups; the data is then fetched out of Puppet directly when a node checks in.
      ◦ This scaled to about 4,000 node groups (each group contained only one node).
  • Redis (managed caching tier)
      ◦ A middle cache: the Cloud Broker pushes data into the cache.
      ◦ Redis can sustain 50,000 requests per second.
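
A rough back-of-the-envelope check on these numbers, assuming agents check in at Puppet's default 30-minute run interval, spread evenly, with a configurable number of backend queries per check-in (an assumption; the talk didn't give a figure). Even spread is optimistic: after a network fault many agents can check in at once.

```python
import math

RUNINTERVAL_S = 1800  # Puppet agent's default runinterval: 30 minutes


def required_rate(nodes: int, queries_per_checkin: int = 1,
                  interval_s: int = RUNINTERVAL_S) -> float:
    """Average backend requests per second if check-ins are spread evenly."""
    return nodes * queries_per_checkin / interval_s


def max_nodes(rate_limit: float, queries_per_checkin: int = 1,
              interval_s: int = RUNINTERVAL_S) -> int:
    """Largest fleet a backend can serve at a sustained rate, under the same even spread."""
    return math.floor(rate_limit * interval_s / queries_per_checkin)


print(round(required_rate(4000), 2))               # 2.22 check-ins/s for 4,000 nodes
print(max_nodes(3))                                # 5400 nodes at the broker's ~3/s ceiling
print(max_nodes(50_000, queries_per_checkin=20))   # 4500000 nodes at 50,000 req/s
```

Under these assumptions the broker tops out around 5,400 nodes even in the ideal case, and 4,000 nodes already average about 2.2 check-ins per second, leaving little headroom for bursts. Redis at 50,000 requests per second leaves far more room even if each check-in performs many lookups.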

Reliant Security has written a Hiera back-end (hiera-redis) to fetch values directly out of Redis.

Hiera-redis supports complex data structures: you can push JSON to Redis and reference these values in hieradata. This lets end users set application configurations in JSON, simplifying management across large numbers of nodes.
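
A minimal sketch of the push-then-lookup flow described above, assuming a `scope:key` naming scheme and a most-specific-first hierarchy. Both are hypothetical and not necessarily hiera-redis's actual key layout. `FakeRedis` is an in-memory stand-in so the example runs without a server; a redis-py client exposes `set`/`get` with the same shape, and its `get` returns bytes, which `json.loads` also accepts.

```python
import json
from typing import Any, Dict, List, Optional


class FakeRedis:
    """In-memory stand-in for a Redis client's get/set so the sketch runs without a server."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def set(self, key: str, value: str) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)


def broker_push(cache: FakeRedis, scope: str, key: str, value: Any) -> None:
    """Cloud Broker side: serialise structured data to JSON and push it into the cache."""
    cache.set(f"{scope}:{key}", json.dumps(value))


def lookup(cache: FakeRedis, hierarchy: List[str], key: str, default: Any = None) -> Any:
    """Hiera-style first-match lookup: walk the hierarchy from most to least specific."""
    for scope in hierarchy:
        raw = cache.get(f"{scope}:{key}")
        if raw is not None:
            return json.loads(raw)
    return default


cache = FakeRedis()
broker_push(cache, "common", "app_config", {"log_level": "info", "workers": 4})
broker_push(cache, "nodes/web01.example.com", "app_config", {"log_level": "debug", "workers": 8})

print(lookup(cache, ["nodes/web01.example.com", "common"], "app_config"))
# {'log_level': 'debug', 'workers': 8}
print(lookup(cache, ["nodes/db01.example.com", "common"], "app_config"))
# {'log_level': 'info', 'workers': 4}
```

The broker only writes; nodes only read at check-in time, so the broker is no longer on the hot path of every Puppet run.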