Archive for the ‘High Availability’ Category

The recent Github attack


Last Thursday (3/26/2014) a DDoS attack on the code sharing site began, targeting github code pages for, a non-profit that mirrors the web content of sites censored by China, and, a mirror of the Chinese version of the Times, also censored by China. Connections to are via https and are encrypted. Thus “code” posted can be any content, and China’s Great Firewall, can’t filter it when an ordinary citizen retrieves it. China tried to block all of in 2013, but its software technology sector objected to the point that the block was removed.

It appears that there were a number of DDoS attack vectors and techniques, but the one that interested me the most, described in detail by insight-labs, was to hijack the Chinese browser Baidu’s user tracking javascript code (similar to Google Analytics code) and to insert a loop that opened and every two seconds. Thus everyone in China that uses this popular browser became an attack site against github! Since China controls its inner network and the Internet border, it was more or less trivial to insert this MITM (or as calls it, a “man on the side”) attack. Apparently only a small fraction of Baidu retrieved pages get injected with this attack; most pages are retrieved normally. Thus Baidu users rarely notice a glitch.

The bottom line is that China’s Great Firewall has been converted from a censorship tool to an attack tool. The folks at Baidu claim to know nothing about this, and frankly, I’m hard pressed to figure out what they could have done to their browser to prevent this hijack. Could our NSA hijack Microsoft’s Internet Explorer in this way?


IEEE Computer Society Talk 20120924 Big Data, Clouds, and Low Latency


The topics of Big Data and/or Cloud Computing have become way too big to cover in a one hour talk. Choosing  a slice through these topics was a challenge, but putting a Low Latency spin on it seems to open up a formally taboo slice.

After an introduction to Big Data, Clouds, and Hadoop, the talk discusses low latency and basically concludes that if you need low latency, you probably need a private cloud.

That said, it appears that there are now people in the Hadoop infrastructure that are interested in low latency, and it will certainly be interesting to see how that shakes out.  The optimization of keeping MapReduce processes in memory for subsequent execution on new data was added by Google in the early days of MapReduce.  This helps, but doesn’t do everything, and it certainly doesn’t address the need for special hardware (switches, firewalls, fiber optic boosters and repeaters, motherboard paths to IO, caching, and of course nanosecond clocks.)

The bleeding edge folks for low latency in the cloud are the electronic high frequency securities traders.  Where a millisecond advantage is worth 100’s of millions of dollars, no expense is spared on equipment or software.  These traders have achieved mind-blowing results.

The talk also examines the new Open Compute server designs sponsored initially by Facebook.  Unfortunately, Facebook has few needs for low latency.  As with Google, Yahoo, et al, the only low latency need is at the user interface where the users expect “instant” results from their queries.  Even the version 2 Open Compute designs seem a little lacking; although the AMD design is probably better than that of Intel in this arena.

The slides for the talk are here.


How hard is 99.999% availability? The GoDaddy outages.


On June 22, 2010 GoDaddy suffered a four hour outage.  On June 14, 2011 they were down for three hours, and recently on Sept 10, 2012 their DNS service was down for 6 hours.

What is their availability rating for these three years, assuming no other outages?  The calculation is easy. Three years equals 26,298 hours.  They were down 13 hours, and up 26,285 hours.  Their availability is 26285/26298 = 99.95%.  or about three nines.

Now GoDaddy promises its DNS customers 99.999% availability.  If we just focus on their DNS service, with its recent 6 hour outage, how many years, counting 2012, will it take for 100% DNS up-time, to recover to this five nines goal?  Well, there are 8766 hours per year, so we need to solve (8766*n – 6)/(8766*n) = 0.99999.  The answer is a staggering n = 68.45 years!

Well, there is mathematics and there is marketing. After 10 years of perfect up-time, I’m sure any company would advertise “Ten years of perfect up-time” and sweep the previous outages under the rug.  This might even be reasonable from a practical point of view, at least after ten years.  That said, with the cloud vendors and even the infrastructure vendors struggling to get over three nines, how does a company that needs five nines make its decisions?

The answer of course is to never put all you eggs in one basket.  In the era of immature cloud computing, it will be a challenge, and this challenge will be expensive.

Crashes on Wall Street


It is of course interesting when cloud vendors have problems.  Cloud computing is like the 1849 Gold Rush to California.  If you are not heading there, you are at least talking about it.

Less discussed are recent trading crashes on “Wall Street”.  Maybe it is the distance of the moon to the earth, but there sure have been a noticeable number of such crashes.  Now Wall Street is a little more secretive than the big cloud vendors.  Maybe it is because there is less technical scrutiny on Wall Street, and they don’t perceive the advantages of openly discussing technical problems.

What prompts this post is an article in the business section of today’s LA Times about Knight Capital’s recent crash having its root cause a total lack of adequate quality assurance, but I’m getting ahead of myself …

Let’s start with the automated trading system BATS.  In a sense, BATS is a competitor of all the established stock exchanges, and is now the third largest stock exchange in the US.  It is totally automated, replacing floor traders with software.  It started in October 2008, and it was doing so well, that it decided to have an IPO – an entrepreneur’s dream exit – on March 23, 2012.  They, not surprisingly, picked themselves to be the exchange to list and sell their stock.  But a funny thing happened on the way to the market, just as their stock was about to trade, their system crashed!  Not just stopped, but many trades, including Apple’s stock that morning, were corrupted.  They finally pulled the plug, but the damage was done.  I assume they were able to mop up the corrupted trades, but their embarrassment was so great that they withdrew from the planned public offering. (BATS recently announced they will try the IPO route again.) I tried hard to find the root cause, but secrecy prevailed.  They did release a statement that they rated their system as 99.9% available.  They had had a few crashes prior to the March debacle, but that wasn’t warning enough. Such a low availability rating is inexcusable for a stock trading system, and it appeared to me that their testing was just totally inadequate.

Imagine how unhappy all the Facebook investors were when the NASDAQ IPO software couldn’t process all the buy requests in the early minutes of the Facebook IPO.  NASDAQ claimed they spent “thousands of hours” running “hundreds of scenarios” for their testing.  Again I haven’t seen a technical root cause analysis for the NASDAQ problems.

I’m going to skip over the JP Morgan billion dollar loss, but it was due to a rogue trader executing more than risky trades.  I will, however, wonder aloud why adequate reporting software didn’t exist to flag such a series of risky trades.

Public exchanges are not the only sources of computer bugs.  AXA Rosenberg had to pay $217 million to pay for investor losses due to a “significant error” in its software systems.  It also paid a $25 million penalty to regulators for … guess what … hiding the error!  “The secretive structure and lack of oversight of quantitative investment models, as this case demonstrates, cannot be used to conceal errors and betray investors.”

Not to totally pick on the US, and to give this blog a bit of an international flair, consider Madrid’s Bolsas y Mercados Espanoles, which suffered a four hour outage when communication servers crashed.  What, no redundancy?  No duplicate Internet connections?  Sigh…  This breakdown affected two multilateral trading platforms operated by NYSE Euronext: Smartpool and NYSE Arca Europa where orders could be submitted but not traded.  Note that a bug in one exchange can affect other exchanges!

The Tokyo Stock Exchange just had its second major problem in the last year.  The root cause was reported to be a router going down, and the failover to a backup router also failed.  First kudos to the managing vendor Hitachi for disclosing the problems.  Second, the key lesson to learn is to test your failover mechanisms!  BUT, it took 95 minutes for the on-site staff to diagnose the problem and to affect a manual failover.  Way too long.  The lesson here is that to manage Mean Time To Repair (MTTR), training and practice is essential.  Good diagnostic software might also have helped identify the problem faster.

Now, back to the US and Knight Capital:  In less than an hour, trades that were supposed to be spread out over days, were executed essentially one after the next.  The result was a $440 million dollar loss for the firm.  Had not investors led by Jefferies Group, Ltd.  provided $400 million, Knight Capital would have gone under.

Now comes today’s LA Times article, by-lined Bloomberg News.  It appears that Knight Capital installed some new software designed to interface to the NYSE’s new retail liquidity program for small investors.  The “Law of Untended Consequences” bit them.  The installation of this new software somehow activated previously dormant software, which started multiplying trades by 1000.  What a bug!  ANY KIND OF TESTING would have discovered such a huge bug.

There is a theme here.  Software on financial exchanges executes trades at dizzying speeds.  A bug can very quickly cause millions of dollars in bad trades.  I’m a little shocked that this software is obviously not tested adequately.


References and interesting links:

The costliest bug ever:

How software updates are destroying Wall Street (Bloomberg and Businessweek):

Two Years After the Flash Crash (of 2010), Are Markets any Safer?   [I strongly recommend this article!]

SEC judgement on AXA Rosenberg Entities:

Lessons from More Microsoft Azure and Amazon AWS Outages


It’s been awhile since this blog has discussed cloud outages (for example: quick link).  The recent reports of Microsoft and Amazon outages gives one pause to contemplate.  First, cloud outages are facts of life, and they in no way should deter anyone from embracing cloud technology – public, private, or hybrid.  On the other hand, anyone adopting any technology should have some deep thoughts about how to deal with the inevitable failure that this technology will bring to your operation.

First, what has happened?  Microsoft Azure has recently has two outages, and Amazon AWS has had one.

Let’s start with an Azure outage on Feb 29, 2012 where a leap year bug took out VM services for 13 hours and 23 minutes.  Apparently, when a VM was initiated on Feb 29, 2012 for a year, it was given a certificate stating that it was valid until Feb 29, 2013, which is an illegal date.  This initialization failed with an erroneous interpretation that the server itself had failed, and an attempt to initialize this VM on another physical server was attempted.  This next attempt failed for the same reasons, and  you can see that further attempts created a real mess.  Note that this mess was caused by TWO bugs, not just the leap year bug.

OK, so it took 13 hours and 23 minutes to patch this bug on all but seven Azure clusters. Those clusters were in the middle of a different upgrade.  What to do?  Microsoft’s effort bombed.  They attempted to roll back and patch, but they failed to  revert to an earlier version of a network plug-in that configures a VM’s network.  The new plug-in was incompatible with the older, patched, host and guest agents, and all VMs in these 7 clusters were immediately disconnected from the network!  Cleaning up this new mess took until 2:15 AM the next day.  The total lack of full functionality lasted over 26 hours.

What to fix?  Clearly the sophomoric leap year bug was fixed along with some testing for date/time incompatibilities among software components.  Fixed also was the problem of declaring an entire server bad, when just a VM had problems.  Finally, Microsoft intelligently added graceful degradation to VM management by blocking new VMs or extending old ones, instead of rashly shutting down the entire platform due to a small problem.

Because their customer service lines were swamped this sad leap year day, Microsoft also upgraded its error detection software to detect problems faster, and it upgraded its customer dashboard to improve its availability in the presence of system problems. Outage notification via Twitter and Facebook has now been at least partially implemented.

Next, on June 14, Amazon’s AWS center in Virginia experienced severe storms and consequently the failure of back-up generators (See [8] for a frank  and excellent root-cause analysis that includes bugs found and plans to improve backup power and to fix the bugs.)  Related power-related problems took down portions of the data center. Multiple services and some hosted web sites were down for several hours.  It was reported [6] that this was a “Once in a lifetime storm”, but it took down Netflix, Instagram, and Pinterest.  Once power was restored, Amazon went to work restoring “instances” (running jobs) and storage volumes.  Amazon also reported unusually high error rates for awhile, the cause of which will probably not be known.  Amazon calculated [7] their outage to be 5 hours and 20 minutes.

Next, on July 26, 2012 11:09 AM, Microsoft announced [3] “an availability issue” for Windows Azure in the West Europe sub-region. At 1:33 PM, they announced that this issue was resolved.  This was an outage of 2 hours and 24 minutes, although Microsoft totaled the outage as having duration 3.5 hours [4].  Apparently storage and running applications were not affected.  As of this post writing, I can find no root cause analysis that has been published by Microsoft.

OK, what can we learn from these (and other) outages?   First cloud technology is new, and even the most experienced pioneer, Amazon, has problems.  Second, putting all of one’s computational eggs in one cloud vendor’s data center basket is not going to give you a five nines system, and you may well lose important data.

The obvious, and expensive, solution is to duplicate your cloud based systems across multiple geographies and to have a fail-over strategy from your primary system location to your secondary location.  I’ve seen people recommend the use of two different cloud vendors, but I find it hard to believe that the pain and cost of two different vendors are worth it.  The disaster data seem to indicate low probability of systemic and instantaneous errors occurring across a vendor’s entire set of data centers.  (Although Amazon’s EC2 and EBS failures in April 2011 did affect two “Availability Zones.” )  In addition, while cloud vendors have at best skimpy data center fail-over services, you might as well use what they have.  It is interesting that [9, 10] Adrian Cockroft of Netflix argues for using three availability zones (presumably in different geographic locations) with no extra (live) instances.

What about private clouds?  Well, they are great, and they provide improved availability.  On the other hand, they are just as susceptible to disasters as regular private data centers.  The good news, is that fail-over to a public cloud may give reduced performance, but it may be a cost-effective business continuity strategy.  This fail-over to a public cloud may also take a long time to “spin up”, because public cloud vendors take a long time (20 to 40 minutes) to instantiate a job as big as a private cloud, even if the data are all on its file system and the capacity is “reserved.”

An alternative, of course, is to duplicate your private cloud at a second geographic location.  The spin up time would be much less. Private clouds are usually local in order to provide high bandwidth and low latency to the clients. This performance advantage would be lost at a remote location, but the business continuity may be worth this degradation in network performance until the local private cloud is up again.

All data center fail-over mechanisms require  a reasonably continuous backup stream to the remote data site and the ability to launch the private cloud system at the remote site.  Those transactions that didn’t get recorded at the remote site will probably be lost, and a restart (from a checkpoint) mechanism is essential.

Let’s analyze availability ratings for Azure and AWS.  Recall, the availability of a system is the percent of time the system is fully functional.   Microsoft’s Azure was not fully functional, due to the two outages discussed here, for 29.5 hours.  If this was the only time the system was not fully functional (most likely there would be other shorter and unreported partial outages) for the year, then Azure would have no greater than a 99.66% availability rating.  For Amazon, the loss of full functionality in 2011 (see  quick link) was several days although within 12 hours 60 percent of the instances were restored.  Let’s estimate 48 hours of not full functionality in 2011 and just 5.3 hours (so far) 48 in 2012. This would average 99.7% availability (less if I didn’t use zero downtime for the rest of 2012.)

Now some could argue that these outages didn’t affect all their datacenters, and that the availability ratings  – based on say 10 data centers – should be an order of magnitude better.  Well, ok, this would rate them both at three nines not two nines and change.  But the point is that this is VERY FAR from five nines.  The conclusion is again that depending on only a single data centers represents very low availability even before one factors in bugs and other problems in the customer’s system.

Finally, I note that many of the large cloud vendor’s problems are due to the fact that they are large and hence complex.  A private cloud would well have less complexity, have a simple redundancy story, and hence a higher availability rating.  Sadly, I haven’t seen such an example.







[6]  (This site has (or had) an apparent AWS ad with the cute caption “Cloudy with a chance of fail.”  Clicking on it led to )





Big Data, Cloud Computing, and Hadoop


I gave a talk to the IEEE on June 23, 2012 on Consulting Opportunities Using Apache Hadoop.   As appropriate for the IEEE, the talk went over big data and the hardware and data centers being used to process big data.  We discussed Google’s failure data for clusters of machines with 100’s to 1000’s of machines.  We went over Google’s File System, and the Hadoop open source analog HDFS, and then we plunged into several examples and techniques for writing applications that utilized the MapReduce infrastructure.

Throughout the talk, we discussed many consulting opportunities in this relatively new, fast growing, space.

The slides are here:  IEEE Gayn Winters Hadoop Slides 20120623

Please add questions and suggestions for a future talk/seminar.




– – I wrote out this specification many years ago, partially to practice writing formal specifications, and partly to get clear definitions of the basic operations on tables in a relational database management system (RDBMS). Too many times, I see vague definitions and a couple of examples.  I wanted very precise definitions.  Somehow I can’t find those original notes, but since I was looking at Hadoop and Hbase, I decided to redo them.

– – The notation used is a variant of one developed by Butler Lampson, who taught it to me when we were both at Digital Equipment Corporation. It is simple and intuitive, and it is defined using simple set notation. Types have both state and operations (functions and procedures) defined on them to form a class. The parameters N and V in RDBMS(N, V) are classes with implied operators. Think of N as “names” which have an order operation and a concatenation operation. V is a class of “values” which can be thought of as a union of names and numbers. There is an order operation defined on V. where the numbers sort before the names.  Other interpretations of V are possible.

– – The notation X → Y denotes partial functions from X to Y, i.e. a function that is not necessarily defined for all x in X. If f: X → Y is a partial function, then f!x for x in X means that f is defined at x, and the value is then denoted f(x) as usual. We think of f in two ways: First f is a set of pairs (x, y) such that If (x,y1) In f And (x,y2) In f Then y1=y2. The second way is to think of X = {x1, x2, …xn} so that (f(x1), f(x2), … f(xn)) is a point in n-space. This is particularly fruitful when X is a set of names and the n axes are labeled with the names in X. This second way of thinking will be used in this spec for a RDBMS.

– – The notation VAR x: X defines a state variable of type X. This construct can be followed with a “:=” to initialize x, or it can be followed with a vertical bar “|” preceding a condition. One reads the latter as “Choose an x such that condition is true.” No method to construct such an x need be given, and all construction options are open to the implementor. Details are discussed in [1].


ColumnNames = CN = N

Row = Tuple = CN → V

– – think of a row or tuple as an n-tuple whose components are labeled by column names. Geometrically, a tuple is a point in n-space, and it is sometimes helpful to think geometrically about such points. On the other hand, thinking of this n-tuple as a row vector, and soon as a row of a matrix is also fruitful. Note that both of these interpretations needs an order defined on the column names for visualization. This order is not part of the formal definition. When dealing with multiple partial functions r: CN → V , it is useful to think of their domain as large enough to include the domains of all rows of interest, in which case, a particular r may not be defined on a particular column name cn. One often says then that r is “Null” at cn.


Table = Relation = Set(Row) = Set(Tuple)

– – There are two ways to think of a table (= relation). The first is as a set of points in n-space, and the second is as a matrix whose columns have column-names, and whose rows may or may not have names. There is no defined order of columns or of rows. Thus the first of these interpretations needs to order the columns for visualization of points in n-space, and the second also needs to order the rows to obtain a matrix that one can write down and/or print. Entries in this matrix where the row partial function is not defined at a particular column name are sometimes said to have the Null value; although in these notes Null is not otherwise defined and is definitely not an element of V. A Null represents a lack of a defined value. Note that since this definition of Table is a set, a table cannot have identical rows. If this is desired for identical rows r and r’, then an additional column-name cn needs to be added with r(cn) not equal to r'(cn). With the image of a table T as a matrix T with orders on the column names c1, …, cn and rows r1, … rm, one often identifies the jth column name with the jth column whose value in row ri is tij = ri(cj).

Function domain(T: Table) Returns Set(CN) = Return (Union {r.domain| r In T})

Notation, we write T.domain for domain(T)

Function Projection(C: Set(CN), T: Table) Returns Table =
VAR T’ : Table := { }
For r In T Do
..VAR r’: Row := { }
..For cn In C Do If r!cn Then r'(cn) := r(cn) Done
..If r’ = { } Then Skip – – get a new r
..T’ += r’
End – – Projection

– – Here are three interpretations: 1. If one views T as a set of points in n-space, and C is a subset of the axes, then T’ is the projection of T onto the orthogonal space whose axes are named by C. 2. Viewing T as a matrix, then Projection(C,T) is just the matrix whose columns are the subset of those of T that are defined by C and with any all-Null rows omitted. 3. In SQL, if cn1, cn2, … cnn are the column names in C then one writes SELECT cn1, cn2, … cnn FROM T.


Predicate = Function(Row) Returns Boolean

– – A predicate p(r) is usually an equality or inequality expression involving the values of r.

Function Subset(T: Table, p: Predicate) Returns Table =
VAR T’: Table := { }
For r In T Do If p(r) Then T’ += r Done
End – – Subset

– – As above there are three interpretations: 1. If the r In T are viewed as points in n-space, and p defines the cloud of “true” points, then T’ is the set of points inside this cloud. 2. Viewing T as a matrix, T’ is the subset of rows r of T for which p(r) is true. 3. In SQL one writes SELECT * From T WHERE p(r). The * indicates to select all columns.

– – Most SELECT statements involving only one table are a projection of a subset, written
SELECT cn1, cn2, … cnn FROM T WHERE p(r). In other words, select the rows where p is true and then select some columns from the result.

Function Union (A: Table, B: Table) Returns Table =
VAR T: Table := {r: Row | r In A Or r In B}
End – – Union

– – If A and B are sets of points in n-space, then Union(A, B) is their set union. If A and B are thought of as matrices with the same column names, then Union(A, B) is the matrix whose rows are the union of the rows in A with the rows in B. If A and B have differences in their column names, then one first takes the union of their domains, extends each row of A and B with Nulls for the column names on which the row is not defined, and then takes the union of the rows. An example of Union is given in the discussion of OuterJoin below.

Function Product(A: Table, B: Table) Returns Table =
VAR AN: Set(CN) := { | cn In A.domain}, BN: Set(CN) := { | cn In B.domain}
VAR T: Table := { }
For (r,s) In (A,B) Do
..VAR t: AN + BN → V := { } – – Note the pair (r,s) defines a new row t.
..For In AN Do If r!cn Then t( := r(cn) Done
..For In BN Do If s!cn Then t( := s(cn) Done
..T += t – – add this newly defined row t to T
End – – Product

– – The row t of Product(A, B) defined above is often written t = (r, s).

– – First note that AN + BN is the (disjoint) union of the domains of tables A and B. Thus the product table T has the original column names of table A, prefixed with “a.” plus the original column names of table B, prefixed by “b.” Now since (A,B) is the Cartesian product = all pairs (r,s) with r In A and s In B, if table A has m rows and table B has n rows, then table T will have m*n rows.

– – While it blows some people’s minds, there is a geometric interpretation. If we view A as a set of points in m-space and B as a set of points in a (different) n-space, then Product(A, B) is the Cartesian product of these two sets of points.
– – In practice, while conceptually appealing, Product(A,B) is rarely formed. It is just too big. The operation that most frequently is used is called a Join. Most Joins of two tables A and B can be defined as projections of subsets of Product(A, B). Now a Join isn’t computed by first computing the product, but rather it is computed more directly, and the database folks have done a lot of work optimizing the calculation of Joins. Here we will just define them. At this point, I’ll note that Wikipedia has a very nice article on Joins at . In particular, the author of that article has a simple example, used below, that seems to illustrate all the relevant concepts.

– – The simplest, and the most common Join is defined by a single column name cn that is common both to A and to B. The subset predicate is for t = (r, s) in Product(A, B), p(r,s) is true if and only if t( = t(, i.e., r(cn) = s(cn).


A = Employee table



Rafferty 31
Jones 33
Steinberg 33
Robinson 34
Smith 34

B = Department table



31 Sales
33 Engineering
34 Clerical
35 Marketing

Note the John has not been assigned a DepartmentID, and Marketing (DepartmentID = 35) has no one in it. Since A has 6 rows and B has 4 rows, the Product(A, B) has 24 rows. Since A has 2 columns and B has 2 columns, Product(A, B) has 2+2 = 4 columns. It is a modestly interesting exercise to write out this 24 by 4 matrix.

If cn = DepartmentID, then the Join above is





Rafferty 31 31 Sales
Jones 33 33 Engineering
Steinberg 33 33 Engineering
Robinson 34 34 Clerical
Smith 34 34 Clerical

The reason John and Marketing do not appear in this result is that the predicate will not return True when evaluating Nulls.

The SQL for the above “Inner Join” has two forms. The first has an explicit Join operator and the second has an implicit Join:

SELECT * FROM Employee INNER JOIN Department ON a.DepartmentID = b.DepartmentID;


SELECT * FROM Employee, Department WHERE a.DepartmentID = b.DepartmentID;

A longer form of both uses “Employee” for “a” and “Department” for “b”.

Sometimes, the redundant column b.DepartmentID is removed by projection, and the remaining columns are renamed to yield:




Rafferty 31 Sales
Jones 33 Engineering
Steinberg 33 Engineering
Robinson 34 Clerical
Smith 34 Clerical

The SQL/92 for this SELECT * FROM Employee, Department USING DepartmentID.

Unfortunately not all RDBMS systems support the USING clause.

Exercise 0: Write out the spec for

Function InnerJoin(A: Table, B: Table, cn0) Returns Table

directly, i.e. without using Subset and Product. This code is very similar to that of Product.

Definition: A subset of columns C of a table is a primary key if for every r In T

  1. r!cn for every cn In C
  2. For every s In T, If r(cn) = s(cn) for every cn in C, Then r = s.

The column DepartmentID is a primary key for the table Department in the above examples.

Left outer join

The result of a left outer join for table A and B always contains all records of the “left” table (A), even if the join-condition does not find any matching record in the “right” table (B). This means that if the ON clause matches 0 (zero) records in B, the join will still return a row in the result—but with NULL in each column from B. This means that a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate).

Given the requirement that even if the Join cannot find a matching row in B for a row r in A, the row r remains part of the result, with Nulls filling all the columns representing B column names, this result cannot be part of Product(A, B). It follows that we must revisit and modify the code for the Function Product (or InnerJoin, if you did Exercise 0) in order to define LeftOuterJoin.

– – When cn0 is a primary key for B we can easily define/construct efficiently:

Function LeftOuterJoin(A: Table, B: Table, cn0) Returns Table =
Assert cn0 is a primary key for B
VAR AN: Set(CN) := { | cn In A.domain}, BN: Set(CN) := { | cn In B.domain}
VAR T: Table := { }
For r In A Do
..VAR t: AN + BN → V := { } – – Note each row r in A defines a new row t in the Join.
..For In AN Do If r!cn Then t( := r(cn) Done – – this loads up the side of t.
..If r!cn0 Then
….VAR s In B | s(cn0) = r(cn0) – – such s is unique since cn0 is a primary key for B
….For In BN Do If s!cn Then t( := s(cn) Done – – this loads the side of t.
..T += t – – add this newly defined row t to T, even if no matches occur in B
End – – LeftOuterJoin

– – Even when cn0 is NOT a primary key for B we can define a RightOuterJoin(A, B, cn0) which will add the rows of B for which there is no matching (on cn0) row in A.

Function RightOuterJoin(A: Table, B: Table, cn0) Returns Table =
VAR AN: Set(CN) := { | cn In A.domain}, BN: Set(CN) := { | cn In B.domain}
VAR T: Table := { }
VAR t: AN + BN → V
VAR foundNoMatchInA: Boolean
For s In B Do
..foundNoMatchInA := True
..For r In A Do
….If r!cn0 Then
….t := { }
….For In BN Do If s!cn Then t( := s(cn) Done – – loads side of t
….For In AN Do If r!cn Then t( := r(cn) Done – – loads side of t.
….T += t
….foundNoMatchInA := False
….Done – – since cn0 is not a primary key, look for more matching rows r.
..If foundNoMatchInA Then – – Add this row s anyway
….t := { }
….For In BN Do If s!cn Then t( := s(cn) Done – – t is Null on all
….T += t
End – – RightOuterJoin

There is no standard SQL for Outer Joins, but each vendor modifies their SQL to include it, e.g.


a.DepartmentID = b.DepartmentID;

Example of a LeftOuterJoin: 





Jones 33 Engineering 33
Rafferty 31 Sales 31
Robinson 34 Clerical 34
Smith 34 Clerical 34
Steinberg 33 Engineering 33

Example of RightOuterJoin which drops the information about John having no department, but adds the information that Marketing has no people:





Smith 34 Clerical 34
Jones 33 Engineering 33
Robinson 34 Clerical 34
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
NULL NULL Marketing 35

When cn0 is NOT a primary key, on can still define a LeftOuterJoin'(A, B, cn0) to be equal to RightOuterJoin(B, A, cn0). Instead of getting exactly the number of rows of A for the join, one can get more.

Function OuterJoin(A: Table, B: Table, cn0) Returns Table =


Return(Union(LeftOuterJoin(A, B, cn0), RightOuterJoin(A, B, cn0)))

End – – OuterJoin

Example of OuterJoin(A, B) where the information contains both the fact that John has no department as well as Marketing having no people:





Smith 34 Clerical 34
Jones 33 Engineering 33
Robinson 34 Clerical 34
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
NULL NULL Marketing 35

Exercises that give clues for implementation:

  1. Write out the code for LeftOuterJoin’ (the version that doesn’t assume cn0 is a primary key) without using RightOuterJoin.
  2. Write out the code for OuterJoin without using Union, LeftOuterJoin, or RightOuterJoin. Do it without the assumption that cn0 is a primary key for B, and then again when it is.

[1] Butler Lampson, et al, “Principles of Computer Science”, MIT Lecture Series, circa 1990.

Agile Data Structures


I’ve been working with a start-up that is going to need a very large database. It is an obvious cloud application, and is going to require a lot of modeling. Also, it shouldn’t be too surprising that every meeting we have changes the very nature of the database and the applications it will support. Now the Agile folks are definitely not surprised, and they claim to have just the answer: Agile everything.

OK, I like Agile philosophies (Cf. my post “Thoughts on Agile and Scrum”), but the practicality of constantly changing the schema of a huge database with many applications banging on it seems daunting, especially in a cloud environment. In addition, a start-up doesn’t have the luxury of simultaneously updating each application with every schema change; hence, I decided to see if some form of database refactoring could be designed into the system from the beginning. The goals would be not only to address continual specification and design changes, but more importantly to keep the individual application development teams on paths that allowed them to react to these changes naturally as their current work tasks (“sprints”) completed, and they had the opportunity to seriously contemplate these changes and how to address them.

I’ve already started plowing through many of Scott Ambler’s writings, cf.,, and He writes well, and it is as good a starting point as I can find.

Thus I’m going to make a series of blog entries as I think through the pros and cons of the Agile tools and data techniques out there – especially in the context of developing a large database on the cloud (or maybe I should say “on a cloud.”) Stay tuned.

Blackberry Outages


Blackberry Outages


On the heels of a Blackberry outage on December 22, 2010, several Blackberry Messenger bugs widely reported in June 2011, and the Blackberry outage in July 2011, comes a huge outage in October 2011.

The following quotes are taken from RIM’s UK service site.

Monday 10th October -15:00(BST)

We are currently working to resolve an issue impacting some of our BlackBerry customers in the Europe, Middle East and Africa region (EMEA.)

Tuesday 11th October -21:30(BST)

The messaging and browsing delays that some of you are still experiencing were caused by a core switch failure within RIM’s infrastructure. Although the system is designed to failover to a back-up switch, the failover did not function as previously tested. As a result, a large backlog of data was generated and we are now working to clear that backlog and restore normal service as quickly as possible.

Wednesday 12th October – 12.00 (BST)

We know that many of you are still experiencing service problems. The resolution of this service issue is our Number One priority right now and we are working night and day to restore all BlackBerry services to normal levels.

Thursday 13th October – 17.05 (BST)

As BlackBerry services continue to return to normal, some users may still be experiencing delays with messaging and browsing. In some cases this can be resolved by a full device reboot. To reboot your device, remove and then reinsert the battery.

OK, so here are some of my thoughts on this outage:

The stated root cause was a core switch failure coupled with a failover mechanism that did not work.  A third problem (it seems failures always come in quantities of three or more!) was that the system couldn’t handle the subsequent backlog of messages.  In other words, it didn’t have the capacity to handle a large number of essentially simultaneous messages.  Also, the system seemed to affect certain users in theUnited States.  Perhaps this was because of backlogs of messages to or from EMEA.  I couldn’t find an acknowledgement or explanation from RIM.  I also find the level of disclosure from RIM to be far below the quality and completeness of Amazon’s disclosures that I reported on earlier. The bottom line is that the outage lasted from Monday afternoon to Thursday evening, or a little more than 72 hours, and it affected many users in EMEA and some inNorth America.

Aside from the fact that a 72 hour outage spoils one’s MTTR and availability numbers, it appears that the client base was a bit upset.  This was not so much due to an outage, but it seemed to be due to a loss of confidence in RIM and its Blackberry system.  Here are a few quotes posted on

From Andrew Baker, Director, Service Operations,SWNCommunications Inc., posted onOct. 13, 2011,

“RIM has been having lots of PR problems and corporate customer problems of late.

“Based on my own personal sampling of several dozen customers who use RIM services, the sentiments generated by this set of outages are not good. There were a few hold-outs who were adamant that RIM would survive all the talk of doom and gloom, that are now looking to implement alternatives.

“This is at both the technical level and the executive level within these organizations.

Confidence has suffered considerably, and the timing for them could not be worse. And they totally botched the PR associated with this outage.

“They do not appear to have a sound strategy to deal with the many competitive challenges of their market, and they are poor communicates even in crisis. They failed to capitalize one of their core strengths, which was device security, and have undermined confidence in their other, which is their network.

“RIM is in the midst of a death spiral — the only question is how long it will take. Look for a number of highly publicized defections over the next few months, which will add fuel to the perception of their demise, and hasten it.”

From Patrick Adams, Director, Adduce:360: “In my experience the BB BES approach is still seen as the mainstream messaging solution for [corporations] – however, I do think that BB has a big challenge to maintain their position – irrespective of this recent outage.

From Adele Berenstein, Consultant and Trainer, Customer Satisfaction and Reputation Management: “I believe that RIM (the manufacturer of BB) has a unique position as the secure provider for email for corporations. Unless there is an alternative with equivalent functionality, corporations and governments will not have a choice but to forgive. I am sure the big corporations are putting pressure on RIM to fix their problems.

“When an alternative comes along, then RIM could conceivably loose their significant market share in the business and government markets.”

From Philip Sawyer, Managing Member, Voyage Media Group: “Honestly, I’m not sure RIM will ever get back on track. The well-established iPhone and Android devices have been steadily eating away at BB’s user base…and with Windows Phone rapidly gaining traction with their recent innovations, I think we’ll see another migration begin to take place, as die-hard business users begin to notice the growth of another trusted name in the corporate world (Microsoft) into the smartphone industry.”

And finally from Juan Barrera, “There is no way I am going back to BlackBerry, I waited over 5 years for them to come up with something remarkable, only saw excuses after excuses from its very-high-egos Co-CEOs. Now 100% Apple! at all levels.”

My personal opinion is that BB BES is a very nice product.  I loved it when I was using it, in spite of a few quality problems.  What is sad is to see RIM’s image erode with individual customers abandoning the BB, and with corporate customers looking for an alternative.  With competition on the horizon, RIM should have been focusing on cementing their customer base.  They clearly have not.  The moral here is that becoming lax on quality and not working seriously to protect availability can really ruin a company.


Amazon and Microsoft Cloud Outages in Europe


Amazon and Microsoft Cloud Outages in Europe.

Apparently both Amazon’s Elastic Compute Cloud (EC2) and Microsoft’s Business Productivity Online Suite (BPOS) house their cloud services near Dublin, Ireland.  I can personally attest that Dublin is a great place to be, especially if you don’t mind a lot of drizzle and rain.  If you are a business, you like Ireland because of the tax breaks, educated workforce, a cool climate to avoid excessive air conditioning of data centers, and good Internet connections to the mainland.

With the rain, at times, comes lightening, and apparently lightening knocked out both Amazon and Microsoft’s cloud facilities by knocking out a power substation nearby.  It appears that neither Amazon nor Microsoft had backup cloud facilities in Europe.  Thus their cloud customers, whose data must remain physically in Europe by law, were knocked out.

Amazon reported, “We understand at this point that a lighting strike hit a transformer from a utility provider to one of our Availability Zones in Dublin, sparking an explosion and fire,” Amazon wrote. “Normally, upon dropping the utility power provided by the transformer, electrical load would be seamlessly picked up by backup generators. The transient electric deviation caused by the explosion was large enough that it propagated to a portion of the phase control system that synchronizes the backup generator plant, disabling some of them. Power sources must be phase-synchronized before they can be brought online to load. Bringing these generators online required manual synchronization.”  According to Amazon, EC2 instances were being brought back within three hours of the lightening strike, and within about 12 hours 60 percent of instances had been restored.  However, “Due to the scale of the power disruption, a large number of EBS servers lost power and require manual operations before volumes can be restored,” said Amazon. “Restoring these volumes requires that we make an extra copy of all data, which has consumed most spare capacity and slowed our recovery process.”  This took several days to complete.

Microsoft reported, BPOS was also knocked offline for several hours by the lightning strike. According to Microsoft, European BPOS services were restored within four hours. In a statement, Microsoft said “a widespread power outage in Dublin caused connectivity issues.”

Well, some EC2 customers weren’t knocked out, in the case of Amazon.  Amazon had three “Availability Zones” (AZs) within the same facility.  An AZ is just another cloud, and a customer can design their cloud application to fail-over from one AZ to another.  Amazon must have designed their AZs in Dublin reasonably well, since NetFlix reported that they were able to execute a failover, because only one of the Dublin AZs was knocked out by the storm.

OK, so kudos for Amazon for having backup AZs there in Dublin that were resilient to minor storm damage.  (Amazon probably should have had better line conditioning so that the surge didn’t propagate to the other AZs.)  Small raspberries to Amazon for not explaining to their customers exactly how AZs worked in Dublin so that more customers could have taken advantage of them for “storm protection.”  On the other hand, big raspberries for Amazon for not having a geographically separate site, preferably far from Dublin.  Only with a second location would Amazon’s cloud (EC2) be protected from a devastating lightening strike, a major earthquake, or another disaster that would knock out the entire location.  I’ll note that a customer that takes advantage of say two AZ’s in Dublin, could easily move their backup AZ to another location whenever Amazon gets its head out of a dark place.  (To be fair, Amazon probably needs more European customers to make a second Ireland site cost effective.)

These outages weren’t the first for either Amazon nor for Microsoft.  It is clear that Cloud Services in general are still maturing.