Thursday, January 19, 2012

Thoughts on Amazon's DynamoDB


Amazon just announced their latest addition to the AWS suite.  DynamoDB is a "fast, highly reliable and cost-effective NoSQL database service designed for internet scale applications".  I haven't had a chance to try out the service for myself, but while reading Werner Vogels' blog post about it, some key statements caught my attention.
 Amazon DynamoDB stores data on Solid State Drives (SSDs) and replicates it synchronously across multiple AWS Availability Zones in an AWS Region to provide built-in high availability and data durability.
SSDs have become a popular way to speed up storage performance.  I've seen setups where SSDs are used just as a caching tier in front of a fairly traditional storage solution.  The most notable example here would be Facebook's Flashcache, a Linux kernel module which provides "a simple write back persistent block cache designed to accelerate reads and writes from slower rotational media by caching data in SSD's".  Amazon, however, chose to use SSD as the primary storage.  Obviously this is calculated into its storage pricing, currently $1 USD per GB per month.
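To make that write-back idea concrete, here is a toy, in-memory sketch of the pattern Flashcache implements at the block device level (the class and its methods are made up for illustration; a real block cache also has to handle eviction, persistence and crash recovery):

    class WriteBackCache:
        """Toy write-back cache: reads and writes hit the fast tier first,
        dirty blocks are flushed to the slow tier later."""

        def __init__(self, slow_store):
            self.slow_store = slow_store   # stands in for rotational media
            self.fast_store = {}           # stands in for the SSD tier
            self.dirty = set()             # blocks written but not yet flushed

        def read(self, block_id):
            # Serve from the fast tier when possible, fall back to the slow tier.
            if block_id not in self.fast_store:
                self.fast_store[block_id] = self.slow_store.get(block_id)
            return self.fast_store[block_id]

        def write(self, block_id, data):
            # Acknowledge the write as soon as the fast tier has it...
            self.fast_store[block_id] = data
            self.dirty.add(block_id)

        def flush(self):
            # ...and push dirty blocks down to the slow tier in the background.
            for block_id in self.dirty:
                self.slow_store[block_id] = self.fast_store[block_id]
            self.dirty.clear()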
What I found most interesting about this architectural detail is that Amazon is capable of designing flexible services while still allowing themselves to make specific hardware choices, and of creating automation processes to support those choices.  Today it is popular to standardize on commodity hardware, which is relatively easy to manage, and build services on top.  Having the flexibility to pick specialized hardware for a particular end-user service allows for higher-quality services.
 Each service encapsulates its own data and presents a hardened API for others to use.  Most importantly, direct database access to the data from outside its respective service is not allowed. This architectural pattern was a response to the scaling challenges that had challenged Amazon.com through its first 5 years, when direct database access was one of the major bottlenecks in scaling and operating the business.
Encapsulation is one of those things everyone will agree with in discussion.  In practice, however, it seems to be a bit more challenging.  There are excuses everywhere for getting an exception and creating tightly coupled services.  Performance is often used as the excuse, or just pure laziness.
Designing encapsulated services and orchestrating them in a true service-oriented architecture requires discipline and planning.  Even more, I'm of the opinion that organizations should be structured around such coarse-grained services.  People in those service teams should identify themselves with the services, and pride themselves on availability numbers and adoption.  It's much more than a technical decision.
Amazon's success here is good evidence that the required investment pays off.
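As a trivial sketch of what encapsulation means in practice (all names below are hypothetical), compare reaching into another team's database directly with calling the small API the owning service exposes:

    # Tightly coupled: another team queries our tables directly, so any
    # schema change on our side breaks them.
    #
    #   orders = db.execute("SELECT * FROM orders WHERE customer_id = ?", cid)

    # Encapsulated: callers only ever see the hardened API; the storage
    # behind it (schema, engine, even the hardware) can change freely.
    class OrderService:
        def __init__(self, storage):
            self._storage = storage  # private detail, never exposed to callers

        def orders_for_customer(self, customer_id):
            """The only supported way for other services to read order data."""
            return self._storage.fetch_orders(customer_id)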
 As we spoke to many senior engineers and service owners, we saw a clear pattern start to emerge in their explanations of why they didn't adopt Dynamo more broadly: while Dynamo gave them a system that met their reliability, performance, and scalability needs, it did nothing to reduce the operational complexity of running large database systems. …  Dynamo might have been the best technology in the world at the time but it was still software you had to run yourself. And nobody wanted to learn how to do that if they didn’t have to. Ultimately, developers wanted a service.
The whole cloud business is sometimes dismissed as just the next hype and a lot of hot air.  However, the art of hiding complexity while creating large-scale, distributed solutions is, in my opinion, exactly what cloud is about.  Yes, we might have been doing that to a certain degree well before 'cloud' emerged, but hey… now we have a name for it.
There's a lesson in here as well.  Since the technology allows for it, it is tempting to create private IaaS, PaaS and cloud-like solutions for use within enterprises.  However, if those services lack a good usability strategy, they will either see little adoption or your operations teams will get bombarded with support tickets from people who don't understand how to use them.  So the lesson is clear: ease of use is just as important as delivering a high-quality service.  Your users want to create business apps, and you can't expect them to be experts in the service you just created.
Consider usability as the (self-)marketing for your service.  A product with a lousy marketing strategy is often doomed to fail.
 Throughput reservations are elastic, so customers can increase or decrease the throughput capacity of a table on-demand
Throughput needs to be reserved to get the best results.  This means that as an end user, you need to have a pretty good idea of how you're going to be using the service, and what throughput you expect.  This may sound logical, but applications are often created and deployed without a good understanding of how they'll be used, or how busy they will be.
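For a sense of what reserving and later adjusting throughput looks like from code, here is a rough sketch using the boto3 Python SDK; the table name and capacity numbers are invented, and picking the right numbers is exactly the homework discussed below:

    import boto3

    dynamodb = boto3.client("dynamodb", region_name="us-east-1")

    # Reserve read/write capacity up front when the table is created.
    dynamodb.create_table(
        TableName="sessions",  # hypothetical table
        AttributeDefinitions=[{"AttributeName": "session_id", "AttributeType": "S"}],
        KeySchema=[{"AttributeName": "session_id", "KeyType": "HASH"}],
        ProvisionedThroughput={"ReadCapacityUnits": 100, "WriteCapacityUnits": 50},
    )

    # Later, dial the reservation up or down as real traffic patterns emerge.
    dynamodb.update_table(
        TableName="sessions",
        ProvisionedThroughput={"ReadCapacityUnits": 400, "WriteCapacityUnits": 200},
    )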
The above statement is an argument for doing your homework.  Large-scale applications require a lot of upfront analysis and planning, maybe even traffic simulations, to understand what your infrastructure needs are going to be.  This reminds me of another recent article with a great punch line: "Cloud is complex - deal with it".