Usenix ATC 2016 - Day 1

This is the first day of the USENIX Annual Technical Conference, June 2016, held in Denver, Colorado, USA.
Specifically, the "HotStorage" and "HotCloud" events were held over the first two days of the conference.

As each presentation was delivered by a single researcher, only this main presenter is referenced in the text below. The links to the Usenix conference pages provide more information, as well as copies of the papers submitted for the conference.

HotCloud

Session 1 looked at cloud spot pricing strategies, limitations and some possible approaches for improvements both for consumers and providers.

How not to bid in the cloud

Prateek Sharma - University of Massachusetts Amherst

https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/kim

This talk looked at the consumption of cloud services under supply/demand-driven pricing strategies. The focus was on consumption (i.e. how to consume efficiently) rather than on how the pricing is set.

Transient servers in the cloud - spot instances are suitable for cost-effective compute which is not latency-sensitive

Many complex bidding strategies have been proposed to optimise cost whilst maintaining availability.

Prateek's paper looked at spot pricing across multiple regions and AZs, plus multiple instance types/sizes.

  • At the cheap end, availability is consistently high, but not 100%
  • There is no penalty for high bids: you only pay the spot price, not your bid price

Mean time between revocations

  • No matter how high your bid, you will not avoid revocations
  • Price spikes are normally far too high to simply bid high to maintain availability

Revocation gap between different regions

  • Some analysis has been done on the crossover in revocation timing between regions/AZs, however there isn't much predictability to this (when comparing the same type of compute).

How about sharing statistical data? If everyone were to look at the same patterns and then follow the same advice to optimise their cloud usage, wouldn't this simply saturate the available environments?
The uptake is probably low, so it's not a problem yet. If, however, this data is published and widely consumed (particularly via APIs etc.), it could become more of a challenge.

Follower vs trend-setter: if people come up with their own algorithms/logic for spreading across independent cloud zones they will be OK, but if they simply follow published advice (which others will follow too) they are more likely to see saturation.

Spot price characteristics are set by internal supply and demand within AWS, driven by factors such as

  • total resources available
  • consumed on-demand EC2 resources
  • consumed reserved instances
  • consumed spot instances
  • maintenance of AWS infrastructure
Overall observations based on their data:

  • 90% of bids yield availability, cost and MTBR which are near-optimal.
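The bidding mechanics described above can be illustrated with a toy simulation. This is a sketch against an invented hourly price trace (all numbers are hypothetical, not from the paper): you pay the current spot price rather than your bid, and you are revoked whenever the price rises above your bid.

```python
# Toy simulation of spot bidding against a recorded price trace.
# Key mechanics from the talk: you pay the current spot price (not your
# bid), and you are revoked whenever the spot price exceeds your bid.
# The price trace below is made up for illustration.

def simulate_bid(bid, price_trace):
    """Return (availability, avg_cost_per_available_hour, revocations)."""
    hours_up = 0
    cost = 0.0
    revocations = 0
    running = False
    for price in price_trace:          # one sample per hour
        if price <= bid:
            hours_up += 1
            cost += price              # pay the spot price, never the bid
            running = True
        else:
            if running:                # price spiked above our bid
                revocations += 1
            running = False
    availability = hours_up / len(price_trace)
    avg_cost = cost / hours_up if hours_up else 0.0
    return availability, avg_cost, revocations

# Hypothetical trace: mostly cheap, with one spike no sane bid can dodge.
trace = [0.05] * 90 + [2.50] * 2 + [0.06] * 8

low = simulate_bid(0.10, trace)
high = simulate_bid(1.00, trace)
# Both bids ride the cheap periods at identical cost per hour, and
# neither avoids the 2.50 spike -- matching the "no penalty for high
# bids, but no immunity from revocation either" observation.
```

Under a trace like this, any bid between the baseline price and the spike price produces the same availability, cost and MTBR, which is one intuition behind the "90% of bids are near-optimal" result.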

Quality of consumption in the cloud

Eric Keller - University of Colorado

https://www.usenix.org/conference/hotcloud16/workshop-program/presentation/kablan

QoS metrics for services in the cloud include
  • Throughput
  • Response time
  • Packet loss rate
  • Uptime
  • latency
  • etc

A lot of effort has gone into defining and measuring these.

There are two sides to performance metrics

  • The provider's perspective
  • The consumer's perspective

The provider sets these QoS metrics based on affordability, differentiation, ease of measurement, security, etc.

Two main types of consumers

  • stable, predictable, etc
  • unpredictable, unpatched, bursty performance etc

Quality of Consumption (QoX) - measurement of how effectively consumers use cloud

Why would cloud providers care?

  • Concern about attacks on a cloud platform
  • Optimisation based on common usage patterns
  • Perhaps if a certain consumer / workload is more prone to attacks, these workloads could be moved to more conservative/protected infrastructure.
  • competitive advantage

They could track consumers using a credit-rating type of approach.
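As a toy sketch of what such a credit-rating-style QoX score might look like: the field names, weights and thresholds below are entirely invented for illustration; the talk only floated the general idea.

```python
# Hypothetical "credit rating" for cloud consumers (QoX): score
# consumers on how stably and predictably they use resources.
# All fields and weights here are invented for illustration.

def qox_score(usage):
    """Score 0-100: higher = more stable, predictable consumption."""
    score = 100.0
    score -= 30.0 * usage["burstiness"]           # 0 (flat) .. 1 (very spiky)
    score -= 25.0 * usage["unpatched_ratio"]      # fraction of unpatched hosts
    score -= 20.0 * usage["abuse_reports"] / 10.0 # scaled incident penalty
    return max(0.0, min(100.0, score))

steady = {"burstiness": 0.1, "unpatched_ratio": 0.0, "abuse_reports": 0}
risky  = {"burstiness": 0.9, "unpatched_ratio": 0.5, "abuse_reports": 5}

# A provider could schedule low-scoring consumers onto more
# conservative/protected infrastructure, as suggested in the talk.
```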

Security and privacy around sharing usage data are a concern. If this data were available to consumers, it could be used to exploit known-vulnerable environments (e.g. if you can determine which zones/infrastructure contain vulnerable environments, you could reverse engineer them and attack in a more targeted fashion).

Another avenue: SaaS providers (between a rock and a hard place). How do they consume cloud infrastructure while also protecting against excessive usage of components such as internal IO (e.g. network traffic from one instance to another within the same cloud)?

Cloud Spot Markets are not sustainable

Supreeth Subramanya - University of Massachusetts Amherst

https://www.usenix.org/conference/hotcloud16/workshop-program/presentation/subramanya

Mature commodity spot markets are inherently volatile and unpredictable.

Compute, compared to all other commodities, is stateful. There is always an overhead for this, whether it is

  • losing data
  • implementing high availability
  • implementing a deployment methodology to overcome stateful requisites

The presenter's conclusion was that as cloud spot markets mature, the value of the resources decreases.

The volatility of a spot market decreases (the market stabilises) as the instance types it trades become more deprecated. Older instance types are perhaps more reliable (but less efficient).

Spot-priced products aren't the leading offering from cloud providers, so the providers may not want to implement features that make spot easier to consume than reserved instances.

My own concerns here

Overall, however, I'm somewhat concerned that the future seems to align more with guaranteeing infrastructure availability than with encouraging application developers to expect outages. One approach mentioned was to introduce regular consistency-point flushing into applications (e.g. writing to a distributed, consistent, stateful data platform), perhaps combined with triggers on a pending spin-down (e.g. a 30-second warning should trigger a consistency point to be written, then prevent further writes).
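The consistency-point idea can be sketched as follows. This is a minimal, self-contained model, not anyone's actual implementation: the flush function is injected rather than tied to a real data platform, and the warning trigger (in practice something like polling the provider's instance metadata for a termination notice) is invoked directly.

```python
# Sketch of consistency-point flushing on a revocation warning: when the
# warning fires (e.g. a 30-second spin-down notice, as discussed in the
# talk), flush a consistency point to durable storage, then refuse all
# further writes. The flush target is injected to keep this self-contained.

class CheckpointingApp:
    def __init__(self, flush_fn):
        self.flush_fn = flush_fn   # writes state to a durable, shared store
        self.read_only = False
        self.state = {}

    def write(self, key, value):
        if self.read_only:
            raise RuntimeError("shutting down: writes refused")
        self.state[key] = value

    def on_termination_warning(self):
        self.flush_fn(dict(self.state))  # write the consistency point...
        self.read_only = True            # ...then prevent further writes

checkpoints = []
app = CheckpointingApp(checkpoints.append)
app.write("k", 1)
app.on_termination_warning()  # in practice: triggered by a metadata poll
# app.write("k", 2) would now raise RuntimeError
```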

Feeding the pelican - using archive hard drives for cold storage

Austin Donnelly, Microsoft

https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/black

The Pelican was a rack of the cheapest and most dense storage available: 1152 disk drives controlled by two servers.

(the previous Pelican technical paper can be found here: https://www.usenix.org/conference/osdi14/technical-sessions/presentation/balakrishnan )

This platform is classed as cold storage. Most drives are not active at any given time, only about 8% (drives are spun up on demand). This ensured the rack's power budget was not exceeded.

Cold storage is typically written once and read rarely (probably never again).

Drive lifetime is measured in TB per year. The greatest risk of head issues (e.g. dust etc.) is while the head is actually reading/writing: thermal expansion is used to fly the head closer to the disk platter surface to read/write, and when idle the head is kept further away to prevent damage.
TB per year therefore reflects how much time the head has spent close to the disk platter.

Pelican looks at several factors

  • reliability of drives (failure rates etc)
  • Spin up time
  • Spin down time
  • Power draw (inrush and ongoing) above standby
  • This also looks at power differences between standby, spinup, normal operation, spin-down

These drives are performing around

  • 100,000 spin-ups per year.
  • 60TB IO per year
  • 250 Powered On hours per year
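The quoted per-drive figures can be sanity-checked with a little arithmetic: 60 TB of IO spread over 250 powered-on hours implies a sustained rate of roughly 67 MB/s, which is plausible for large sequential archive transfers.

```python
# Back-of-the-envelope check on the per-drive figures quoted above:
# what sustained throughput do 60 TB/year over 250 powered-on hours imply?

tb_per_year = 60
powered_on_hours = 250

bytes_per_year = tb_per_year * 1e12
seconds_on = powered_on_hours * 3600
mb_per_s = bytes_per_year / seconds_on / 1e6
print(round(mb_per_s, 1))   # ~66.7 MB/s sustained while powered on
```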

Drive failures vs temperature

The warmer drives don't fail as regularly as the cooler drives

  • The Pelican drives are seeing higher humidity. Warmer drives burn off the humidity
  • Dust in the environment may also be related

Failure rate is below 4% per year (in line with industry norms).

Failure rates are much lower for the first year of life compared to normal drives; after that they become roughly the same as the industry norm.

RAID is not suitable for Pelican (cold storage), as it would require all drives in a set to be online to complete a write commit to disk.

Conclusion

  • Archive drives are effective and reliable storage
  • Temperature and humidity are major factors in failure rates
  • Regular spin up / spin down does not decrease drive life

Data Management Approach for SMR

Adam Manzanares

https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/black

plus Fenggang Wu
Evaluating host-aware SMR storage

https://www.thomas-krenn.com/en/wiki/Linux_Storage_Stack_Diagram

(Very cool diagram.)

Shingled Magnetic Recording Storage achieves high density storage using narrow tracks which partially overwrite adjacent tracks.

Traditional drives use PMR (Perpendicular Magnetic Recording); however, this has faced increasing issues with scalability.

Generally SMR is only useful in sequential archive storage.

Random writes must first be buffered, to convert the random write pattern into a sequential one. The metadata used to manage the media cache differs across drive types:

  • Host-Managed SMR
  • Host-Aware SMR
  • Drive-Managed SMR

The conclusion from today's discussions was that Host-Aware SMR provides the best overall benefits for handling metadata related to random writes.

Non-sequential writes still perform worse than sequential ones due to garbage collection (flushing the media cache). GC can become a blocking factor, preventing further writes from occurring until it finishes.
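The media-cache behaviour described above can be sketched as a toy model of a drive-managed SMR disk (all sizes and structures are invented for illustration): random writes are staged in a small cache, and when it fills, a blocking GC pass flushes them to the shingled zones in sequential order.

```python
# Toy model of a drive-managed SMR media cache: random writes land in a
# staging area and are later flushed sequentially into shingled zones.
# While the flush (GC) runs, the incoming write blocks -- the blocking
# behaviour described above. Capacity is arbitrary.

class SmrModel:
    def __init__(self, cache_capacity=4):
        self.cache = {}                  # lba -> data (media cache)
        self.cache_capacity = cache_capacity
        self.zones = {}                  # persistent shingled storage
        self.gc_flushes = 0

    def write(self, lba, data):
        if len(self.cache) >= self.cache_capacity:
            self._gc()                   # writer blocks here until GC finishes
        self.cache[lba] = data

    def _gc(self):
        # Flush cached writes to the shingled zones in LBA (sequential) order.
        for lba in sorted(self.cache):
            self.zones[lba] = self.cache[lba]
        self.cache.clear()
        self.gc_flushes += 1

    def read(self, lba):
        return self.cache.get(lba, self.zones.get(lba))

d = SmrModel(cache_capacity=4)
for lba in [7, 3, 9, 1, 5]:              # a random write pattern
    d.write(lba, f"blk{lba}")
# The 5th write triggered one blocking GC flush of the first four blocks.
```

Host-aware and host-managed variants move parts of this bookkeeping up to the host, which is why the metadata handling differs across the three drive types listed above.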

Workload analysis with SSDs in mind

Gala Yadgar - Israel Institute of Technology

https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/yadgar

Workloads inspire optimisations of media. Disks are designed with specific workloads in mind.

Some key SSD characteristics:

  • read/write quickly
  • erase (garbage collection) slowly
  • limited lifetime

Flash: partition by temperature (access patterns) to minimise write amplification.
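Temperature partitioning can be sketched very simply (the threshold and page names below are invented): group pages by update frequency so that frequently rewritten "hot" pages churn in their own blocks, while "cold" blocks rarely need garbage collection.

```python
# Sketch of temperature-based partitioning for flash: split pages into
# hot and cold sets by update frequency, so hot pages can be placed in
# their own blocks. Mixing hot and cold pages forces GC to repeatedly
# copy cold data out of mostly-invalid blocks, inflating write
# amplification. The threshold is arbitrary.

def partition_by_temperature(update_counts, hot_threshold=10):
    """Split page ids into 'hot' and 'cold' sets by update frequency."""
    hot = {p for p, n in update_counts.items() if n >= hot_threshold}
    cold = set(update_counts) - hot
    return hot, cold

counts = {"p0": 120, "p1": 2, "p2": 45, "p3": 0, "p4": 11}
hot, cold = partition_by_temperature(counts)
# Hot pages are grouped together; cold pages sit in blocks that GC can
# mostly leave alone.
```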

Application specific optimisation for NVMe SSDs

Hyeong-Jun Kim, Sungkyunkwan University

https://www.usenix.org/conference/hotstorage16/workshop-program/presentation/kim

Latency of different storage types

  • HDD = 10s of milliseconds
  • NAND = 10s of microseconds
  • 3D XPoint = 10s of nanoseconds
  • DRAM = nanoseconds

Stack

IO flows through the following layers

  • App
  • Filesystem
  • Block
  • Request queue
  • SCSI
  • SAS Driver

This stack is being optimised. Lower latencies present opportunities to minimise layers such as queueing.
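A quick back-of-the-envelope shows why this matters now and didn't before. The per-IO software cost below is an assumed illustrative figure, not a measured one: a fixed stack overhead is noise next to an HDD seek but dominates access to fast NVM.

```python
# Back-of-the-envelope: what fraction of total IO latency is spent in
# the software stack? The ~20 us per-IO stack cost is an assumed,
# illustrative figure, not a measurement from the talk.

software_overhead_us = 20.0            # assumed per-IO kernel stack cost

def software_fraction(media_latency_us):
    total = media_latency_us + software_overhead_us
    return software_overhead_us / total

hdd = software_fraction(10_000.0)      # ~10 ms HDD seek
nand = software_fraction(80.0)         # 10s of us NAND access
# For the HDD the stack is ~0.2% of the total; for NAND it is ~20%.
# The faster the media, the bigger the win from removing layers.
```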

Optimising the kernel

  • can't implement any policy which favours a certain application

Previous optimisations included direct access to storage devices (raw PV access etc)

One proposed approach is similar to Solarflare Onload, but for storage: use a driver to bypass the kernel and allow an application to access the storage directly.

One use case where significant improvements have been seen is Redis, with lots of very small writes.
The observed performance increase was 15% more IOPS and about 13% lower latency.

Still very much in its infancy, but there are good opportunities for development and adoption in write-latency-sensitive workloads, as well as for optimising or replacing memcached.