Archive for September, 2012

IEEE Computer Society Talk 20120924 Big Data, Clouds, and Low Latency

2012/09/25

The topics of Big Data and Cloud Computing have become far too big to cover in a one-hour talk. Choosing a slice through them was a challenge, but putting a Low Latency spin on it opens up a formerly taboo slice.

After an introduction to Big Data, Clouds, and Hadoop, the talk discusses low latency and basically concludes that if you need low latency, you probably need a private cloud.

That said, it appears that there are now people working on the Hadoop infrastructure who are interested in low latency, and it will certainly be interesting to see how that shakes out.  The optimization of keeping MapReduce processes in memory for subsequent execution on new data was added by Google in the early days of MapReduce.  This helps, but it doesn’t do everything, and it certainly doesn’t address the need for special hardware (switches, firewalls, fiber-optic boosters and repeaters, motherboard paths to I/O, caching, and of course nanosecond clocks).

The bleeding-edge folks for low latency in the cloud are the electronic high-frequency securities traders.  Where a millisecond advantage is worth hundreds of millions of dollars, no expense is spared on equipment or software.  These traders have achieved mind-blowing results.

The talk also examines the new Open Compute server designs sponsored initially by Facebook.  Unfortunately, Facebook has few needs for low latency.  As with Google, Yahoo, et al., the only low-latency need is at the user interface, where users expect “instant” results from their queries.  Even the version 2 Open Compute designs seem a little lacking, although the AMD design is probably better than Intel’s in this arena.

The slides for the talk are here.

-gayn

How hard is 99.999% availability? The GoDaddy outages.

2012/09/15

On June 22, 2010, GoDaddy suffered a four-hour outage.  On June 14, 2011, they were down for three hours, and most recently, on September 10, 2012, their DNS service was down for six hours.

What is their availability rating for these three years, assuming no other outages?  The calculation is easy.  Three years equals 26,298 hours (8,766 hours per year).  They were down 13 hours, and up 26,285 hours.  Their availability is 26285/26298 = 99.95%, or about three nines.
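As a quick sanity check, here is a minimal Python sketch of that arithmetic (assuming an average year of 365.25 days, i.e. 8,766 hours):

    HOURS_PER_YEAR = 8766              # 365.25 days * 24 hours

    total_hours = 3 * HOURS_PER_YEAR   # 26,298 hours in three years
    down_hours = 4 + 3 + 6             # the three outages, in hours
    up_hours = total_hours - down_hours

    availability = up_hours / total_hours
    print(f"{availability:.4%}")       # 99.9506% -- roughly three nines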

Now GoDaddy promises its DNS customers 99.999% availability.  If we just focus on their DNS service, with its recent six-hour outage, how many years of 100% DNS up-time, counting 2012, will it take to recover to this five-nines goal?  Well, there are 8,766 hours per year, so we need to solve (8766*n – 6)/(8766*n) = 0.99999.  The answer is a staggering n = 68.45 years!
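Rearranging that same equation for n gives n = 6 / (8766 × (1 − 0.99999)); a minimal Python sketch of the solution:

    HOURS_PER_YEAR = 8766   # average hours per year (365.25 days)
    TARGET = 0.99999        # the advertised five-nines goal
    OUTAGE_HOURS = 6        # the September 2012 DNS outage

    # (8766*n - 6) / (8766*n) = 0.99999  =>  n = 6 / (8766 * (1 - 0.99999))
    years_needed = OUTAGE_HOURS / (HOURS_PER_YEAR * (1 - TARGET))
    print(f"{years_needed:.2f} years")  # about 68.45 years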

Well, there is mathematics and there is marketing.  After ten years of perfect up-time, I’m sure any company would advertise “Ten years of perfect up-time” and sweep the earlier outages under the rug.  This might even be reasonable from a practical point of view, at least after ten years.  That said, with the cloud vendors and even the infrastructure vendors struggling to get over three nines, how does a company that needs five nines make its decisions?

The answer, of course, is to never put all your eggs in one basket.  In this era of immature cloud computing, that will be a challenge, and an expensive one.