Archive for the ‘Cloud Computing’ Category



ONVIF – formally known as the Open Network Video Interface Forum

In the early 2000s, my company Bristol Systems Inc. got into IP cameras and access control HID security cards as part of our comprehensive security program for our customers. Sadly, ONVIF had not yet been formed.

ONVIF was formed in 2008 by Axis Communications, Bosch Security Systems, and Sony. The video security market at the time consisted of companies that made and/or sold video cameras and recorders. In the worst case, each pair of such devices had proprietary programming interfaces and proprietary protocols for communication between the cameras and recorders. This was an interconnect nightmare for customers who wanted to add a camera to a system with a recorder or to update their recorder. The idea of ONVIF was to standardize communication APIs and protocols between these devices, in order to permit interoperability independent of vendor, and to be totally open to all companies and organizations in this space. Its goals, beyond interoperability, are flexibility, future-proofing (your camera will continue to work in a heterogeneous system even if its manufacturer goes belly-up), and consistent quality.

The forum has now dropped the longer name as its standards have expanded beyond video, for example, to storage and to access control. It is now known simply as ONVIF.

The ONVIF standards comprise a core standard and several additional technical standards called “profiles”. All ONVIF conformant devices must conform to the core standard and to one or more profiles. One can think of the profiles as groups of features. This grouping provides some sanity in this market: if a vendor decides a particular profile is necessary or desirable, then this vendor must implement all of the (mandatory) features of the profile. A device that only implements some of one profile and some of another cannot be ONVIF compliant.

The Core Specification 2.5 (December 2014) is rather comprehensive. This spec is around 150 pages and includes device and system management, web services, a framework for event and error handling, security, ports, services, remote device discovery (necessary for plug-and-play interoperability), and encryption for transport-level security. It includes data formats for streaming and storing video, audio, and metadata. It also includes a wide variety of service specifications, e.g., access control, analytics, imaging, pan-tilt-zoom, recording control, replay control, etc. It uses IETF and other networking standards.

The current profiles are identified by the letters: S, C, G, Q, A, and T. Thus we have Profile S, Profile C, Profile G, Profile Q, Profile A, and (draft) Profile T. To remember which is which, I use:

  • S = “streaming” for sending video and audio to or from a Profile S client or device. Basic camera controls.
  • C = “control” for basic access control such as door state and control, credential management, and event handling
  • G = “gigabyte” for storage, recording, search and retrieval
  • Q = “quick” for quick installation, device discovery and configuration
  • A = “additional access” for more on access control, configuration of the physical access control system, access rules, credentials, and schedules
  • T = “tampering” for compression, imaging, and alarms for motion and tampering detection.

In each profile, a feature is either mandatory or conditional. Support for a conditional feature becomes mandatory if the device supports any aspect of that feature. For example, Profile S specifies compliance requirements for pan-tilt-zoom that are conditional on whether the camera supports any aspect of pan-tilt-zoom. If it does, the camera has to support all Profile S features of pan-tilt-zoom. If the camera does not support pan-tilt-zoom at all, then it can still be Profile S compliant.
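To make the mandatory/conditional rule concrete, here is a minimal Python sketch. The feature groups and function names are hypothetical, heavily simplified stand-ins and not the real Profile S requirement tables; only the rule itself (all or nothing for any conditional feature the device touches) comes from the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """A named group of functions in a profile (contents are illustrative)."""
    name: str
    functions: set[str]
    mandatory: bool  # False means "conditional"


def is_conformant(profile: list[Feature], implemented: set[str]) -> bool:
    """Check the all-or-nothing rule for every feature in the profile.

    A mandatory feature must be fully implemented. A conditional feature
    must be fully implemented as soon as any part of it is implemented.
    """
    for feature in profile:
        supported = feature.functions & implemented
        required = feature.mandatory or bool(supported)
        if required and supported != feature.functions:
            return False
    return True


# Hypothetical, simplified stand-in for Profile S (assumption, not the spec).
PROFILE_S = [
    Feature("streaming", {"get_stream_uri", "get_profiles"}, mandatory=True),
    Feature("ptz", {"continuous_move", "stop", "get_status"}, mandatory=False),
]

fixed_camera = {"get_stream_uri", "get_profiles"}
partial_ptz_camera = fixed_camera | {"continuous_move"}
full_ptz_camera = fixed_camera | {"continuous_move", "stop", "get_status"}

for label, camera in [("fixed", fixed_camera),
                      ("partial PTZ", partial_ptz_camera),
                      ("full PTZ", full_ptz_camera)]:
    print(label, is_conformant(PROFILE_S, camera))
```

In this toy model the fixed camera and the full pan-tilt-zoom camera pass, while the camera that implements only part of pan-tilt-zoom does not.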

In future posts, I’ll write about selecting a video (and audio) security system for my home, and about integrating my system into the neighborhood watch, which is a heterogeneous collection of security systems. In particular, exactly how the various profiles come into equipment decisions will be detailed in depth.




A friend of mine, The Patent King, pointed out to me that the recent court decisions on patents are going to change what software can be patented. This is both a forward and a backward statement. In fact, all of the court cases are backward looking cases, and the Patent Office in its consideration of future patents will be forward looking. These new considerations are collectively called “Alice” primarily after:

  • Alice: Alice Corp. Pty. Ltd. v. CLS Bank Int’l (2014)

but quite a few other court cases come into play. The references below, all of which I found enlightening, cite such cases.

The technology issue is: What software is patentable? The two-step answer starts simply enough. Step 1: The claim must be directed to a process, machine, manufacture, or composition of matter. This is not new. Typically software patents are directed to processes or to machines, and this post will focus on these.

New is Step 2: You are almost out of luck if your claim is directed to a law of nature, a natural phenomenon, or an abstract idea; however, Alice provides some wiggle room for dealing with these “judicial exceptions.” Your claim must identify the exception and must state explicitly how your invention, as a whole, amounts to significantly more than the exception.

Of course the trick is to satisfy “significantly more”. This is similar to the Potter Stewart test for hard core pornography, “I know it when I see it.” As technologists interested in an issued or a future patent, we must work with our patent attorneys to review as many similar cases as we can and make our arguments accordingly.

The rest of this post considers some interesting exceptions mostly of type “abstract ideas”. These include mathematical formulas, mitigating risk (hedging), using advertising as currency, processing information through a clearing house, authentication, organizing information, formulas for updating alarm limits, comparing new and stored information to identify options for action, etc. The “et cetera” means there is no end to the types of abstract ideas.

Returning to the Alice case itself, the patent was about a computer system that acted as an intermediary to maintain and adjust account balances to satisfy business obligations and mitigate settlement risk. Since this is a long-standing commercial practice, and the patent cites this abstract idea, it is a judicial exception. Viewed as a whole, however, the claim failed to add significantly more to the abstract idea. In other words, just crunching the numbers on a computer system is not patentable.

The Ultramercial patent 7,346,545 (the “545 patent”) provided an interesting case. The patent concerned an eleven-step process whereby a consumer receives copyrighted material in exchange for viewing an advertisement. Ultramercial sued Hulu, YouTube, and WildTangent for patent infringement. This case bounced around the courts, but after Alice, it was determined that the eleven steps, taken as a whole, merely implemented the abstract idea of using ads as currency and did not add significantly more to this abstract concept. The 545 patent was ultimately declared invalid.

The case Bilski v. Kappos (2010) concerned Bilski’s patent on hedging to mitigate settlement risk. This patent was deemed too broad and covered well-known practices in a comprehensive way. Fundamentally, one cannot patent an invention that covers an entire abstract and well-known idea.

Mayo Collaborative Services v. Prometheus Labs. Inc. (2012) provides an example where an action (raising or lowering the amount of a drug administered) was taken based on a blood test (for metabolites). The action would be normal for any well-informed doctor. This case actually falls under the law of nature exception, but the principle applies elsewhere. If all your software does is automate what a trained practitioner does normally, then it is not patentable.

Ancora Technologies, Inc. v. Apple, Inc. is interesting and is not yet resolved by the Supreme Court. Ancora’s invention was to put authentication software in the flash reserved for the BIOS. This would make it more difficult for a hacker to get around the authentication check. Ancora sued Apple for infringement of their patent 6,411,941 (the “941 patent”). If it is accepted that authentication checks are abstract ideas, then is putting such a check in the BIOS flash significantly more than other implementations of this abstract idea? If putting such a check on a hard disk is not patentable, then why should putting such a check in the BIOS flash be patentable? Is the method of putting the check in the BIOS flash and not screwing up the BIOS a patentable significant extension of the abstract idea? Apple has appealed to the Supreme Court.

There are some interesting ramifications of Alice to the cloud, data analytics, and cyber-security worlds. Look for future posts on these topics.

Recommended Reading:

Go Ask Alice – Delightful paper by Berkeley law professor Robert Merges

Patent Eligibility in the Wake of Alice – Nice analysis by Berkowitz and Schaffner

Summary of Ancora v. Apple – by IP firm Sughrue Mion

Apple appeals Ancora Ruling – News flash from Law360

USPTO 2014 Interim Alice Training – Very good slide-set tutorial

Learning from Financial Trading Bugs


The commodities and securities trading exchanges provide challenging examples for cloud and big data application development. Their users are disparate traders worldwide. They have user requirements for high trading volumes and for low latency. They utilize enormous amounts of storage, networking, and computer processing power. My IEEE Computer Society talk, here, discusses some of the technical features for such applications and for the hardware on which they run. Ordinary public cloud systems cannot currently address these needs, and perhaps they never will. On the other hand, those of us developing big data and/or cloud software applications can learn a lot by studying these “bleeding edge” applications, their bugs, and the consequences of such bugs.

Big Data pioneers such as Yahoo!, LinkedIn, Facebook, Google, eBay, etc. have, of course, their own bugs that have economic consequences for both the companies and their customers. Larger service providers such as Amazon, Microsoft, GoDaddy, and Rackspace have outages that do serious damage to their customers. However, financial trading applications can cause millions of dollars of damage in just a few seconds, and the governmental oversight agencies eventually get involved. This has happened in a big way this year [1,2] with four incidents that seem to have galvanized these agencies into action:

  • On Feb 24, options market maker Ronin Capital injected more than 30,000 mispriced quotes into the NYSE Amex exchange.
  • On March 23, the BATS Exchange, handling its own IPO traffic on top of other traffic, crashed. (How embarrassing!) Among other losses, this caused a brief 9% price decline in Apple shares.
  • On May 18, the Facebook IPO had many orders stalled and not executed on the NASDAQ exchange. The Union Bank of Switzerland, alone, lost more than $350 Million, and curiously Knight Capital lost $35.4 Million in this incident.
  • On August 1, the Knight Capital Group lost $440 Million by flooding the NYSE with bad orders.

Since “You can’t know the players without a program…”, here is a brief cheat sheet of agency acronyms:

  • CFTC = Commodity Futures Trading Commission
  • FIA = Futures Industry Association
  • FIA-EPTA = European version of the FIA-PTG
  • FIA-PTG = FIA’s Principal Traders Group
  • FRB = Federal Reserve Bank
  • FSOC = Financial Stability Oversight Council (established by the Dodd-Frank Act)
  • IOSCO = International Organization of Securities Commissions
  • MFA = Managed Funds Association (hedge funds)
  • SEC = Securities and Exchange Commission

Of course numerous observers clamored for reform, e.g. [5,6,7,10] but the above agencies started to issue calls for action:

  • MFA requested of the SEC mandatory risk checks on all orders, new requirements on system testing, and a requirement for an individual with a “kill switch” to watch over all trading activity. (Imagine not trusting computer programs and wanting a human being to watch over automated trading!) [14]
  • The FIA PTG/EPTA issued its “Software Development and Change Management Recommendations”, March 2012. While both reasonable and comprehensive, there is nothing new in the report from an academic software development perspective. What is interesting is that they felt it was necessary to prepare it for financial application development. [13,14]
  • The FSOC made some vague recommendations in July 2012 that the SEC and the CFTC consider establishing error control and standards for exchanges, clearing houses, and other market participants that are relevant to high-speed trading. [11]
  • August 2, the FIA PTG made a “soft” statement to the SEC at their Roundtable, noting that the 2005 regulations, designed to encourage market competition, created “different safety controls” which now need “smart regulatory policies.” August 3, FIA PTG/EPTA issued a stronger statement on the “Knight Capital” problem, stating “Rapid advances in trading technology have brought very substantial benefits… but … they also have introduced new sources of risk.” They reiterated their earlier recommendations for “tests and controls” that trading firms should consider when they change their technology systems. [12, 13]
  • August 2012 The IOSCO issued a “Consultation Report” entitled “Technological Challenges to Effective Market Surveillance Issues and Regulatory Tools” which called for greater data collection for the purposes of surveillance of automatic or algorithmic trading of securities. [8] It refers to an earlier paper “Objectives and Principles of Securities Regulation” dated May 2003 that has 38 “principles” for such software development and regulation. Both papers are good reading. IOSCO further warns of the dangers of the then (and now) situation due to the neglect of these principles. [3]
  • October 1, 2012 the FRB of Chicago issued a report “How to keep markets safe in the era of high-speed trading” by Carol Clark. By interviewing various vendors, the author points out that there are a few places in the system where checks can and should be made. It makes solid recommendations on various risk limits, risk mitigation techniques, kill switches, position limits, and profit and loss limits. Good paper. [4]
  • October 4, 2012 The FIA PTG responded to the Chicago FRB’s report, supporting its recommendations. [15]
  • October 10, 2012 The FIA PTG/EPTA responded to IOSCO’s recommendations for market surveillance and audit trail quality, wanting more, especially surveillance for illegal or inappropriate conduct which might be facilitated by automated trading. [3]

Wow! Four bugs caused all this commotion? Well, no. The noticeable problems were occurring prior to 2012 and also outside of the US. (Many of these are discussed in earlier posts.) There clearly was a welling up of (and I’m not sure this is the right word, but) anger.

So, besides just being new, what is wrong? Well, in high frequency trading, speed is king, and it would appear that no one wants to slow down their software by putting in the audit trails that IOSCO recommends. Vendors force the regulators to read the code to audit their systems! Can you imagine how worthless that exercise is? No one seems to realize that such code additions would actually help test and debug their systems. Risk and profit/loss limits seem easy to implement, and while they do slow down the system a little bit, the more likely reason they are missing is that such limits are an annoyance. Again regulation is needed.

Complexity is probably the number two reason for such bugs hitting. Here comes the argument that good testing won’t find all bugs. On the other hand, most of the bugs reported (or deduced) seem well within the current art of testing. I’ve seen no bugs reported that only occur on weird combinations of extreme data. In one case, the addition of new code activated some old “dead” code [14]. Both bugs (dead code and the new activation problem) could have easily been caught by reasonable testing. I’ve read about the now boring excuse of rushing new functionality to market for competitive reasons. Give me a break. With hundreds of millions of dollars at stake, shouldn’t the vendors be able to afford decent automated test suites? Properly done, such test suites make the development go faster! On the other hand, I’d hate to see government regulations on testing. It would be a case of the ignorant policing the ignorant. My guess is that the best government regulations would be to impose massive fines and to enforce total restoration of all money lost due to a bug.  Even with proper catastrophe insurance, this should be significant motivation for quality!

For sure, a desire for high performance with complex software, made more difficult by dealing with relatively new big data infrastructure, is a recipe for lots of bugs. While I’ll discuss big data and cloud application development in subsequent posts, my thinking here is simple: Invest at least as much in your testing and its automation as you do in writing your application. Follow the IOSCO principles by adding code for debugging and for auditing. It will pay for itself. Get audited. Audits probably won’t find anything, but your financial and legal consequences will probably be less severe should a bug rear its ugly head. Also, when high performance in networking and IO is desired, go with new hardware that has built-in measurement and time-stamping features. If this is not possible, then add such measurements to your software. Finally, do some sanity checks and reasonability calculations to make sure you are not doing something fundamentally wrong.
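As an illustration of the kind of sanity checks, limits, kill switch, and audit trail discussed in this post, here is a minimal Python sketch. The class names, fields, and thresholds are all hypothetical; no real exchange or trading firm API is being described, and a production system would need far more than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Order:
    symbol: str
    quantity: int           # signed: positive buys, negative sells
    price: float
    reference_price: float  # last trusted market price for the symbol


class RiskGate:
    """Pre-trade checks with an audit trail and a simple kill switch."""

    def __init__(self, max_quantity: int, max_price_deviation: float,
                 max_position: int, max_rejects: int) -> None:
        self.max_quantity = max_quantity
        self.max_price_deviation = max_price_deviation
        self.max_position = max_position
        self.max_rejects = max_rejects
        self.position = 0
        self.rejects = 0
        self.halted = False
        self.audit_log: list[tuple[Order, str]] = []

    def _reject_reason(self, order: Order) -> Optional[str]:
        """Return why the order must be rejected, or None if it may pass."""
        if self.halted:
            return "kill switch engaged"
        if abs(order.quantity) > self.max_quantity:
            return "order size limit"
        deviation = abs(order.price - order.reference_price) / order.reference_price
        if deviation > self.max_price_deviation:
            return "price sanity check"
        if abs(self.position + order.quantity) > self.max_position:
            return "position limit"
        return None

    def submit(self, order: Order) -> bool:
        """Record every decision; halt all trading after too many rejects."""
        reason = self._reject_reason(order)
        self.audit_log.append((order, reason or "accepted"))
        if reason is None:
            self.position += order.quantity
            return True
        self.rejects += 1
        if self.rejects >= self.max_rejects:
            self.halted = True
        return False
```

With a gate like this in front of order submission, an order a thousand times larger than intended would trip the size limit, and a burst of such orders would trip the kill switch, leaving a log that an auditor can read without reading the trading code.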






[4] “How to Keep Markets Safe in the Era of High-Speed Trading”, Carol Clark, 9/17/2012, Essays on Issues, The Federal Reserve Bank of Chicago, October 2012, Number 303.











[15] October 2012 News.

IEEE Computer Society Talk 20120924 Big Data, Clouds, and Low Latency


The topics of Big Data and/or Cloud Computing have become way too big to cover in a one-hour talk. Choosing a slice through these topics was a challenge, but putting a Low Latency spin on it seems to open up a formerly taboo slice.

After an introduction to Big Data, Clouds, and Hadoop, the talk discusses low latency and basically concludes that if you need low latency, you probably need a private cloud.

That said, it appears that there are now people in the Hadoop infrastructure that are interested in low latency, and it will certainly be interesting to see how that shakes out.  The optimization of keeping MapReduce processes in memory for subsequent execution on new data was added by Google in the early days of MapReduce.  This helps, but doesn’t do everything, and it certainly doesn’t address the need for special hardware (switches, firewalls, fiber optic boosters and repeaters, motherboard paths to IO, caching, and of course nanosecond clocks.)

The bleeding edge folks for low latency in the cloud are the electronic high frequency securities traders. Where a millisecond advantage is worth hundreds of millions of dollars, no expense is spared on equipment or software. These traders have achieved mind-blowing results.

The talk also examines the new Open Compute server designs sponsored initially by Facebook.  Unfortunately, Facebook has few needs for low latency.  As with Google, Yahoo, et al, the only low latency need is at the user interface where the users expect “instant” results from their queries.  Even the version 2 Open Compute designs seem a little lacking; although the AMD design is probably better than that of Intel in this arena.

The slides for the talk are here.


How hard is 99.999% availability? The GoDaddy outages.


On June 22, 2010 GoDaddy suffered a four hour outage.  On June 14, 2011 they were down for three hours, and recently on Sept 10, 2012 their DNS service was down for 6 hours.

What is their availability rating for these three years, assuming no other outages? The calculation is easy. Three years equals 26,298 hours. They were down 13 hours, and up 26,285 hours. Their availability is 26285/26298 = 99.95%, or about three nines.

Now GoDaddy promises its DNS customers 99.999% availability. If we just focus on their DNS service, with its recent 6 hour outage, how many years of 100% DNS up-time, counting 2012, will it take to recover to this five nines goal? Well, there are 8766 hours per year, so we need to solve (8766*n - 6)/(8766*n) = 0.99999. The answer is a staggering n = 68.45 years!
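For anyone who wants to check the arithmetic, here is a small Python sketch of both calculations. The function names are mine; the figures (8766 hours per year, 13 hours of total downtime, the 6 hour DNS outage, and the five nines target) are the ones used above.

```python
HOURS_PER_YEAR = 8766  # 365.25 days * 24 hours, as used in the text


def availability(total_hours: float, down_hours: float) -> float:
    """Fraction of the period during which the service was up."""
    return (total_hours - down_hours) / total_hours


def years_to_reach(target: float, down_hours: float) -> float:
    """Years n such that (HOURS_PER_YEAR*n - down_hours)/(HOURS_PER_YEAR*n) = target.

    Rearranging gives n = down_hours / (HOURS_PER_YEAR * (1 - target)),
    assuming no further downtime during those n years.
    """
    return down_hours / (HOURS_PER_YEAR * (1 - target))


three_year = availability(3 * HOURS_PER_YEAR, 4 + 3 + 6)
recovery_years = years_to_reach(0.99999, 6)

print(f"three-year availability: {three_year:.2%}")
print(f"years of perfect up-time needed: {recovery_years:.2f}")
```

The rearranged formula also shows why the number is so large: each extra nine in the target multiplies the required years of perfect up-time by ten.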

Well, there is mathematics and there is marketing. After 10 years of perfect up-time, I’m sure any company would advertise “Ten years of perfect up-time” and sweep the previous outages under the rug.  This might even be reasonable from a practical point of view, at least after ten years.  That said, with the cloud vendors and even the infrastructure vendors struggling to get over three nines, how does a company that needs five nines make its decisions?

The answer of course is to never put all your eggs in one basket. In the era of immature cloud computing, it will be a challenge, and this challenge will be expensive.

Crashes on Wall Street


It is of course interesting when cloud vendors have problems.  Cloud computing is like the 1849 Gold Rush to California.  If you are not heading there, you are at least talking about it.

Less discussed are recent trading crashes on “Wall Street”.  Maybe it is the distance of the moon to the earth, but there sure have been a noticeable number of such crashes.  Now Wall Street is a little more secretive than the big cloud vendors.  Maybe it is because there is less technical scrutiny on Wall Street, and they don’t perceive the advantages of openly discussing technical problems.

What prompts this post is an article in the business section of today’s LA Times about Knight Capital’s recent crash having as its root cause a total lack of adequate quality assurance, but I’m getting ahead of myself …

Let’s start with the automated trading system BATS.  In a sense, BATS is a competitor of all the established stock exchanges, and is now the third largest stock exchange in the US.  It is totally automated, replacing floor traders with software.  It started in October 2008, and it was doing so well, that it decided to have an IPO – an entrepreneur’s dream exit – on March 23, 2012.  They, not surprisingly, picked themselves to be the exchange to list and sell their stock.  But a funny thing happened on the way to the market, just as their stock was about to trade, their system crashed!  Not just stopped, but many trades, including Apple’s stock that morning, were corrupted.  They finally pulled the plug, but the damage was done.  I assume they were able to mop up the corrupted trades, but their embarrassment was so great that they withdrew from the planned public offering. (BATS recently announced they will try the IPO route again.) I tried hard to find the root cause, but secrecy prevailed.  They did release a statement that they rated their system as 99.9% available.  They had had a few crashes prior to the March debacle, but that wasn’t warning enough. Such a low availability rating is inexcusable for a stock trading system, and it appeared to me that their testing was just totally inadequate.

Imagine how unhappy all the Facebook investors were when the NASDAQ IPO software couldn’t process all the buy requests in the early minutes of the Facebook IPO.  NASDAQ claimed they spent “thousands of hours” running “hundreds of scenarios” for their testing.  Again I haven’t seen a technical root cause analysis for the NASDAQ problems.

I’m going to skip over the JP Morgan billion dollar loss, but it was due to a rogue trader executing more than risky trades.  I will, however, wonder aloud why adequate reporting software didn’t exist to flag such a series of risky trades.

Public exchanges are not the only sources of computer bugs. AXA Rosenberg had to pay $217 million to cover investor losses due to a “significant error” in its software systems. It also paid a $25 million penalty to regulators for … guess what … hiding the error! “The secretive structure and lack of oversight of quantitative investment models, as this case demonstrates, cannot be used to conceal errors and betray investors.”

Not to totally pick on the US, and to give this blog a bit of an international flair, consider Madrid’s Bolsas y Mercados Espanoles, which suffered a four hour outage when communication servers crashed.  What, no redundancy?  No duplicate Internet connections?  Sigh…  This breakdown affected two multilateral trading platforms operated by NYSE Euronext: Smartpool and NYSE Arca Europa where orders could be submitted but not traded.  Note that a bug in one exchange can affect other exchanges!

The Tokyo Stock Exchange just had its second major problem in the last year. The root cause was reported to be a router going down, and the failover to a backup router also failed. First, kudos to the managing vendor Hitachi for disclosing the problems. Second, the key lesson to learn is to test your failover mechanisms! BUT, it took 95 minutes for the on-site staff to diagnose the problem and to effect a manual failover. Way too long. The lesson here is that to manage Mean Time To Repair (MTTR), training and practice are essential. Good diagnostic software might also have helped identify the problem faster.

Now, back to the US and Knight Capital: In less than an hour, trades that were supposed to be spread out over days were executed essentially one after the next. The result was a $440 million loss for the firm. Had not investors led by Jefferies Group, Ltd. provided $400 million, Knight Capital would have gone under.

Now comes today’s LA Times article, by-lined Bloomberg News. It appears that Knight Capital installed some new software designed to interface to the NYSE’s new retail liquidity program for small investors. The “Law of Unintended Consequences” bit them. The installation of this new software somehow activated previously dormant software, which started multiplying trades by 1000. What a bug! ANY KIND OF TESTING would have discovered such a huge bug.

There is a theme here.  Software on financial exchanges executes trades at dizzying speeds.  A bug can very quickly cause millions of dollars in bad trades.  I’m a little shocked that this software is obviously not tested adequately.


References and interesting links:

The costliest bug ever:

How software updates are destroying Wall Street (Bloomberg and Businessweek):

Two Years After the Flash Crash (of 2010), Are Markets any Safer?   [I strongly recommend this article!]

SEC judgement on AXA Rosenberg Entities:

Lessons from More Microsoft Azure and Amazon AWS Outages


It’s been a while since this blog has discussed cloud outages (for example: quick link). The recent reports of Microsoft and Amazon outages give one pause. First, cloud outages are facts of life, and they in no way should deter anyone from embracing cloud technology, whether public, private, or hybrid. On the other hand, anyone adopting any technology should think hard about how to deal with the inevitable failures that this technology will bring to their operation.

First, what has happened? Microsoft Azure has recently had two outages, and Amazon AWS has had one.

Let’s start with an Azure outage on Feb 29, 2012 where a leap year bug took out VM services for 13 hours and 23 minutes. Apparently, when a VM was initiated on Feb 29, 2012 for a year, it was given a certificate stating that it was valid until Feb 29, 2013, which is an invalid date. This initialization failed, the failure was erroneously interpreted as a failure of the server itself, and the system then tried to initialize the VM on another physical server. This next attempt failed for the same reason, and you can see that further attempts created a real mess. Note that this mess was caused by TWO bugs, not just the leap year bug.
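The leap year half of this failure is easy to reproduce. The following Python sketch is only an analogy (I have not seen Microsoft's code, and the function names are hypothetical): bumping just the year of Feb 29, 2012 asks for a date that does not exist, while a safer version falls back to Feb 28.

```python
from datetime import date


def naive_one_year_later(start: date) -> date:
    """Bump only the year; raises ValueError when start is Feb 29."""
    return start.replace(year=start.year + 1)


def safe_one_year_later(start: date) -> date:
    """Bump the year, falling back to Feb 28 when Feb 29 does not exist."""
    try:
        return start.replace(year=start.year + 1)
    except ValueError:
        return start.replace(year=start.year + 1, day=28)


leap_day = date(2012, 2, 29)

try:
    naive_one_year_later(leap_day)
except ValueError as error:
    print("naive expiry failed:", error)

print("safe expiry:", safe_one_year_later(leap_day))
```

Whether to fall back to Feb 28 or roll forward to Mar 1 is a policy choice; the point is that the code has to make one, and that a test run with a leap day as input would have exposed the problem.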

OK, so it took 13 hours and 23 minutes to patch this bug on all but seven Azure clusters. Those clusters were in the middle of a different upgrade.  What to do?  Microsoft’s effort bombed.  They attempted to roll back and patch, but they failed to  revert to an earlier version of a network plug-in that configures a VM’s network.  The new plug-in was incompatible with the older, patched, host and guest agents, and all VMs in these 7 clusters were immediately disconnected from the network!  Cleaning up this new mess took until 2:15 AM the next day.  The total lack of full functionality lasted over 26 hours.

What to fix?  Clearly the sophomoric leap year bug was fixed along with some testing for date/time incompatibilities among software components.  Fixed also was the problem of declaring an entire server bad, when just a VM had problems.  Finally, Microsoft intelligently added graceful degradation to VM management by blocking new VMs or extending old ones, instead of rashly shutting down the entire platform due to a small problem.

Because their customer service lines were swamped this sad leap year day, Microsoft also upgraded its error detection software to detect problems faster, and it upgraded its customer dashboard to improve its availability in the presence of system problems. Outage notification via Twitter and Facebook has now been at least partially implemented.

Next, on June 14, Amazon’s AWS center in Virginia experienced severe storms and consequently the failure of back-up generators. (See [8] for a frank and excellent root-cause analysis that includes bugs found and plans to improve backup power and to fix the bugs.) Related power problems took down portions of the data center. Multiple services and some hosted web sites were down for several hours. It was reported [6] that this was a “Once in a lifetime storm”, but it took down Netflix, Instagram, and Pinterest. Once power was restored, Amazon went to work restoring “instances” (running jobs) and storage volumes. Amazon also reported unusually high error rates for a while, the cause of which will probably not be known. Amazon calculated [7] their outage to be 5 hours and 20 minutes.

Next, on July 26, 2012 11:09 AM, Microsoft announced [3] “an availability issue” for Windows Azure in the West Europe sub-region. At 1:33 PM, they announced that this issue was resolved.  This was an outage of 2 hours and 24 minutes, although Microsoft totaled the outage as having duration 3.5 hours [4].  Apparently storage and running applications were not affected.  As of this post writing, I can find no root cause analysis that has been published by Microsoft.

OK, what can we learn from these (and other) outages? First, cloud technology is new, and even the most experienced pioneer, Amazon, has problems. Second, putting all of one’s computational eggs in one cloud vendor’s data center basket is not going to give you a five nines system, and you may well lose important data.

The obvious, and expensive, solution is to duplicate your cloud based systems across multiple geographies and to have a fail-over strategy from your primary system location to your secondary location. I’ve seen people recommend the use of two different cloud vendors, but I find it hard to believe that the pain and cost of two different vendors are worth it. The disaster data seem to indicate a low probability of systemic and instantaneous errors occurring across a vendor’s entire set of data centers. (Although Amazon’s EC2 and EBS failures in April 2011 did affect two “Availability Zones.”) In addition, while cloud vendors have at best skimpy data center fail-over services, you might as well use what they have. It is interesting that Adrian Cockcroft of Netflix argues [9, 10] for using three availability zones (presumably in different geographic locations) with no extra (live) instances.
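A back-of-the-envelope Python sketch shows why duplication across sites is attractive. It rests on two strong assumptions that are mine and not established by the outage reports: site failures are statistically independent, and fail-over is instantaneous. The 99.7% single-site figure is purely illustrative.

```python
def combined_availability(per_site: float, sites: int) -> float:
    """Availability when the system is up if at least one site is up.

    Assumes independent site failures and instantaneous fail-over,
    which real deployments only approximate.
    """
    return 1 - (1 - per_site) ** sites


for sites in (1, 2, 3):
    print(sites, f"{combined_availability(0.997, sites):.7f}")
```

Under these idealized assumptions, two sites at 99.7% each already land in five nines territory. Correlated failures and slow fail-over eat into that quickly in practice, which is why testing the fail-over path matters so much.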

What about private clouds? Well, they are great, and they provide improved availability. On the other hand, they are just as susceptible to disasters as regular private data centers. The good news is that fail-over to a public cloud may give reduced performance, but it may be a cost-effective business continuity strategy. This fail-over to a public cloud may also take a long time to “spin up”, because public cloud vendors take a long time (20 to 40 minutes) to instantiate a job as big as a private cloud, even if the data are all on its file system and the capacity is “reserved.”

An alternative, of course, is to duplicate your private cloud at a second geographic location.  The spin up time would be much less. Private clouds are usually local in order to provide high bandwidth and low latency to the clients. This performance advantage would be lost at a remote location, but the business continuity may be worth this degradation in network performance until the local private cloud is up again.

All data center fail-over mechanisms require  a reasonably continuous backup stream to the remote data site and the ability to launch the private cloud system at the remote site.  Those transactions that didn’t get recorded at the remote site will probably be lost, and a restart (from a checkpoint) mechanism is essential.

Let’s analyze availability ratings for Azure and AWS. Recall, the availability of a system is the percent of time the system is fully functional. Microsoft’s Azure was not fully functional, due to the two outages discussed here, for 29.5 hours. If this was the only time the system was not fully functional for the year (most likely there would be other shorter and unreported partial outages), then Azure would have no greater than a 99.66% availability rating. For Amazon, the loss of full functionality in 2011 (see quick link) was several days, although within 12 hours 60 percent of the instances were restored. Let’s estimate 48 hours of less than full functionality in 2011 and just 5.3 hours (so far) in 2012. This would average 99.7% availability (less if I didn’t use zero downtime for the rest of 2012).
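The arithmetic behind these ratings is easy to check. Here is a minimal Python sketch; the function name `availability_percent` and the choice of a 365-day year are my assumptions for illustration, and the downtime figures are the estimates given above.

```python
HOURS_PER_YEAR = 365 * 24  # 8760; assumes a non-leap year


def availability_percent(downtime_hours: float,
                         period_hours: float = HOURS_PER_YEAR) -> float:
    """Percent of the period during which the system was fully functional."""
    return 100.0 * (1.0 - downtime_hours / period_hours)


# Azure: 29.5 hours of downtime in one year.
azure = availability_percent(29.5)

# AWS: average of the 48-hour (2011) and 5.3-hour (2012, so far) estimates.
aws = availability_percent((48 + 5.3) / 2)

# For comparison: yearly downtime budget, in minutes, of a five nines system.
five_nines_minutes = (1 - 0.99999) * HOURS_PER_YEAR * 60

print(f"Azure: {azure:.2f}%")
print(f"AWS:   {aws:.2f}%")
print(f"Five nines allows about {five_nines_minutes:.1f} minutes per year")
```

The last figure is the sobering one: a five nines system may be down for only a few minutes per year, while both estimates above are measured in tens of hours.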

Now some could argue that these outages didn’t affect all their data centers, and that the availability ratings – based on say 10 data centers – should be an order of magnitude better. Well, ok, this would rate them both at three nines, not two nines and change. But the point is that this is VERY FAR from five nines. The conclusion is again that depending on only a single data center represents very low availability, even before one factors in bugs and other problems in the customer’s system.
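To make “two nines and change” concrete, one can count nines as the negative base-10 logarithm of the unavailable fraction. This is a sketch under my own assumptions: the helper names `nines` and `availability_fraction` are hypothetical, and the factor of 10 is the “say 10 data centers” from the paragraph above.

```python
import math

HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year


def availability_fraction(downtime_hours: float) -> float:
    """Fraction of the year during which the system was fully functional."""
    return 1.0 - downtime_hours / HOURS_PER_YEAR


def nines(availability: float) -> float:
    """Number of nines: 0.99 -> 2, 0.999 -> 3, and so on (fractional in between)."""
    return -math.log10(1.0 - availability)


single_site = nines(availability_fraction(29.5))       # downtime charged to one site
ten_sites = nines(availability_fraction(29.5 / 10))    # downtime spread over 10 sites

print(f"single data center: {single_site:.2f} nines")
print(f"ten data centers:   {ten_sites:.2f} nines")
print(f"target:             {nines(0.99999):.2f} nines")
```

Dividing the downtime by 10 adds exactly one nine, which is why spreading the outage over ten data centers still leaves the result well short of five.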

Finally, I note that many of the large cloud vendors’ problems are due to the fact that they are large and hence complex. A private cloud could well have less complexity, have a simple redundancy story, and hence a higher availability rating. Sadly, I haven’t seen such an example.







[6]  (This site has (or had) an apparent AWS ad with the cute caption “Cloudy with a chance of fail.”  Clicking on it led to )





Big Data, Cloud Computing, and Hadoop


I gave a talk to the IEEE on June 23, 2012 on Consulting Opportunities Using Apache Hadoop.   As appropriate for the IEEE, the talk went over big data and the hardware and data centers being used to process big data.  We discussed Google’s failure data for clusters of machines with 100’s to 1000’s of machines.  We went over Google’s File System, and the Hadoop open source analog HDFS, and then we plunged into several examples and techniques for writing applications that utilized the MapReduce infrastructure.

Throughout the talk, we discussed many consulting opportunities in this relatively new, fast growing, space.

The slides are here:  IEEE Gayn Winters Hadoop Slides 20120623

Please add questions and suggestions for a future talk/seminar.




– – I wrote out this specification many years ago, partly to practice writing formal specifications, and partly to get clear definitions of the basic operations on tables in a relational database management system (RDBMS). Too many times, I see vague definitions and a couple of examples. I wanted very precise definitions. Somehow I can’t find those original notes, but since I was looking at Hadoop and HBase, I decided to redo them.

– – The notation used is a variant of one developed by Butler Lampson, who taught it to me when we were both at Digital Equipment Corporation. It is simple and intuitive, and it is defined using simple set notation. Types have both state and operations (functions and procedures) defined on them to form a class. The parameters N and V in RDBMS(N, V) are classes with implied operators. Think of N as “names” which have an order operation and a concatenation operation. V is a class of “values” which can be thought of as a union of names and numbers. There is an order operation defined on V, where the numbers sort before the names. Other interpretations of V are possible.

– – The notation X → Y denotes partial functions from X to Y, i.e. a function that is not necessarily defined for all x in X. If f: X → Y is a partial function, then f!x for x in X means that f is defined at x, and the value is then denoted f(x) as usual. We think of f in two ways: First f is a set of pairs (x, y) such that If (x,y1) In f And (x,y2) In f Then y1=y2. The second way is to think of X = {x1, x2, …xn} so that (f(x1), f(x2), … f(xn)) is a point in n-space. This is particularly fruitful when X is a set of names and the n axes are labeled with the names in X. This second way of thinking will be used in this spec for a RDBMS.
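For readers who think in code, a partial function maps naturally onto a dictionary. The following Python sketch is only one possible model; the names `is_defined`, `as_pairs`, and `as_point` are mine and are not part of the spec notation.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# A partial function f: X -> Y modeled as a dict; keys are the x where f is defined.
f: Dict[str, int] = {"x1": 3, "x2": 7}  # f is not defined at "x3"


def is_defined(func: Dict[str, Any], x: str) -> bool:
    """The spec's f!x: True when func is defined at x."""
    return x in func


def as_pairs(func: Dict[str, Any]) -> Set[Tuple[str, Any]]:
    """First view: func as a set of (x, y) pairs with at most one y per x."""
    return set(func.items())


def as_point(func: Dict[str, Any], axes: List[str]) -> Tuple[Optional[Any], ...]:
    """Second view: func as a point in n-space, one coordinate per named axis.

    None marks an axis where func is undefined.
    """
    return tuple(func.get(x) for x in axes)


print(is_defined(f, "x1"), is_defined(f, "x3"))
print(as_point(f, ["x1", "x2", "x3"]))
```

A dict enforces the single-valued condition automatically, since a key can hold only one value.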

– – The notation VAR x: X defines a state variable of type X. This construct can be followed with a “:=” to initialize x, or it can be followed with a vertical bar “|” preceding a condition. One reads the latter as “Choose an x such that condition is true.” No method to construct such an x need be given, and all construction options are open to the implementor. Details are discussed in [1].


ColumnNames = CN = N

Row = Tuple = CN → V

– – Think of a row or tuple as an n-tuple whose components are labeled by column names. Geometrically, a tuple is a point in n-space, and it is sometimes helpful to think geometrically about such points. On the other hand, thinking of this n-tuple as a row vector, and soon as a row of a matrix, is also fruitful. Note that both of these interpretations need an order defined on the column names for visualization. This order is not part of the formal definition. When dealing with multiple partial functions r: CN → V, it is useful to think of their domain as large enough to include the domains of all rows of interest, in which case a particular r may not be defined on a particular column name cn. One often says then that r is “Null” at cn.


Table = Relation = Set(Row) = Set(Tuple)

– – There are two ways to think of a table (= relation). The first is as a set of points in n-space, and the second is as a matrix whose columns have column-names, and whose rows may or may not have names. There is no defined order of columns or of rows. Thus the first of these interpretations needs to order the columns for visualization of points in n-space, and the second also needs to order the rows to obtain a matrix that one can write down and/or print. Entries in this matrix where the row partial function is not defined at a particular column name are sometimes said to have the Null value; although in these notes Null is not otherwise defined and is definitely not an element of V. A Null represents a lack of a defined value. Note that since this definition of Table is a set, a table cannot have identical rows. If this is desired for identical rows r and r’, then an additional column-name cn needs to be added with r(cn) not equal to r'(cn). With the image of a table T as a matrix T with orders on the column names c1, …, cn and rows r1, … rm, one often identifies the jth column name with the jth column whose value in row ri is tij = ri(cj).

Function domain(T: Table) Returns Set(CN) = Return (Union {r.domain| r In T})

Notation: we write T.domain for domain(T).
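Here is how these definitions might look in Python. This is a sketch, not part of the spec: I am assuming a row is stored as a frozenset of (column, value) pairs so that rows are hashable and a table can be an ordinary set; the helper `row` and the sample data are illustrative only.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]  # a partial function CN -> V as (column, value) pairs
Table = Set[Row]                  # a set of rows: no order, no duplicates


def row(mapping: Dict[str, Any]) -> Row:
    """Build a hashable row; a column that is absent is Null for this row."""
    return frozenset(mapping.items())


def domain(table: Table) -> Set[str]:
    """Union of the domains of all rows in the table."""
    return {cn for r in table for cn, _ in r}


people: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Jones", "DepartmentID": 33}),  # identical row: the set keeps one copy
    row({"LastName": "John"}),                       # Null at DepartmentID
}

print(len(people))
print(sorted(domain(people)))
```

Note that Null never appears as a value here; it is simply the absence of a pair for that column, which matches the remark that Null is not an element of V.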

Function Projection(C: Set(CN), T: Table) Returns Table =
VAR T’ : Table := { }
For r In T Do
..VAR r’: Row := { }
..For cn In C Do If r!cn Then r’(cn) := r(cn) Done
..If r’ = { } Then Skip Else T’ += r’ – – an all-Null r’ is dropped
Return T’
End – – Projection

– – Here are three interpretations: 1. If one views T as a set of points in n-space, and C is a subset of the axes, then T’ is the projection of T onto the orthogonal space whose axes are named by C. 2. Viewing T as a matrix, then Projection(C,T) is just the matrix whose columns are the subset of those of T that are defined by C and with any all-Null rows omitted. 3. In SQL, if cn1, cn2, … cnn are the column names in C then one writes SELECT cn1, cn2, … cnn FROM T.
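A Python rendering of Projection may help. As before, this is a sketch that assumes rows are frozensets of (column, value) pairs; the helper `row` and the sample table are illustrative.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    """Build a hashable row; absent columns are Null."""
    return frozenset(mapping.items())


def projection(columns: Set[str], table: Table) -> Table:
    """Keep only the named columns of each row; drop rows that become all-Null."""
    result: Table = set()
    for r in table:
        projected = frozenset((cn, v) for cn, v in r if cn in columns)
        if projected:  # an empty row is all-Null, so skip it
            result.add(projected)
    return result


employee: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "John"}),  # Null at DepartmentID
}

departments_in_use = projection({"DepartmentID"}, employee)
print(len(departments_in_use))
```

Because a table is a set, the two rows with DepartmentID 33 collapse into one after projection. SQL behaves differently: a plain SELECT keeps duplicate rows unless one writes SELECT DISTINCT.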


Predicate = Function(Row) Returns Boolean

– – A predicate p(r) is usually an equality or inequality expression involving the values of r.

Function Subset(T: Table, p: Predicate) Returns Table =
VAR T’: Table := { }
For r In T Do If p(r) Then T’ += r Done
Return T’
End – – Subset

– – As above there are three interpretations: 1. If the r In T are viewed as points in n-space, and p defines the cloud of “true” points, then T’ is the set of points inside this cloud. 2. Viewing T as a matrix, T’ is the subset of rows r of T for which p(r) is true. 3. In SQL one writes SELECT * From T WHERE p(r). The * indicates to select all columns.

– – Most SELECT statements involving only one table are a projection of a subset, written
SELECT cn1, cn2, … cnn FROM T WHERE p(r). In other words, select the rows where p is true and then select some columns from the result.
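The “projection of a subset” pattern can be sketched in Python as follows. The representation (rows as frozensets of pairs), the helper `row`, and the sample data are my assumptions; the predicate receives the row as a dict for convenience.

```python
from typing import Any, Callable, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def subset(table: Table, p: Callable[[Dict[str, Any]], bool]) -> Table:
    """Rows of the table for which the predicate is true (SQL: WHERE p)."""
    return {r for r in table if p(dict(r))}


def projection(columns: Set[str], table: Table) -> Table:
    """Named columns only, with all-Null rows dropped (SQL: SELECT columns)."""
    projected = (frozenset((cn, v) for cn, v in r if cn in columns) for r in table)
    return {r for r in projected if r}


employee: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "John"}),  # Null at DepartmentID
}

# SELECT LastName FROM Employee WHERE DepartmentID = 33
in_dept_33 = lambda r: r.get("DepartmentID") == 33
names = projection({"LastName"}, subset(employee, in_dept_33))
print(sorted(dict(r)["LastName"] for r in names))
```

The row for John fails the predicate because `r.get` returns None where the row is Null, so a comparison against a Null is never true.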

Function Union(A: Table, B: Table) Returns Table =
VAR T: Table := {r: Row | r In A Or r In B}
Return T
End – – Union

– – If A and B are sets of points in n-space, then Union(A, B) is their set union. If A and B are thought of as matrices with the same column names, then Union(A, B) is the matrix whose rows are the union of the rows in A with the rows in B. If A and B have differences in their column names, then one first takes the union of their domains, extends each row of A and B with Nulls for the column names on which the row is not defined, and then takes the union of the rows. An example of Union is given in the discussion of OuterJoin below.

Function Product(A: Table, B: Table) Returns Table =
VAR AN: Set(CN) := {“a.” + cn | cn In A.domain}, BN: Set(CN) := {“b.” + cn | cn In B.domain}
VAR T: Table := { }
For (r,s) In (A,B) Do
..VAR t: AN + BN → V := { } – – Note the pair (r,s) defines a new row t.
..For cn In A.domain Do If r!cn Then t(“a.” + cn) := r(cn) Done
..For cn In B.domain Do If s!cn Then t(“b.” + cn) := s(cn) Done
..T += t – – add this newly defined row t to T
Return T
End – – Product

– – The row t of Product(A, B) defined above is often written t = (r, s).

– – First note that AN + BN is the (disjoint) union of the domains of tables A and B. Thus the product table T has the original column names of table A, prefixed with “a.” plus the original column names of table B, prefixed by “b.” Now since (A,B) is the Cartesian product = all pairs (r,s) with r In A and s In B, if table A has m rows and table B has n rows, then table T will have m*n rows.
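A small Python sketch of Product makes the prefixing and the m*n row count visible. It assumes rows are frozensets of (column, value) pairs; the helper `row` and the two sample tables are illustrative.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def product(a: Table, b: Table) -> Table:
    """Cartesian product; columns of a get the prefix "a.", columns of b get "b."."""
    result: Table = set()
    for r in a:
        for s in b:
            t = {f"a.{cn}": v for cn, v in r}
            t.update({f"b.{cn}": v for cn, v in s})
            result.add(frozenset(t.items()))
    return result


left: Table = {
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Smith", "DepartmentID": 34}),
    row({"LastName": "John"}),  # Null at DepartmentID
}
right: Table = {
    row({"DepartmentID": 33, "DepartmentName": "Engineering"}),
    row({"DepartmentID": 35, "DepartmentName": "Marketing"}),
}

pairs = product(left, right)
columns = {cn for t in pairs for cn, _ in t}
print(len(pairs), sorted(columns))
```

The prefixes keep the two DepartmentID columns apart, which is what makes the union of the domains disjoint.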

– – While it blows some people’s minds, there is a geometric interpretation. If we view A as a set of points in m-space and B as a set of points in a (different) n-space, then Product(A, B) is the Cartesian product of these two sets of points.
– – In practice, while conceptually appealing, Product(A,B) is rarely formed. It is just too big. The operation that most frequently is used is called a Join. Most Joins of two tables A and B can be defined as projections of subsets of Product(A, B). Now a Join isn’t computed by first computing the product, but rather it is computed more directly, and the database folks have done a lot of work optimizing the calculation of Joins. Here we will just define them. At this point, I’ll note that Wikipedia has a very nice article on Joins. In particular, the author of that article has a simple example, used below, that seems to illustrate all the relevant concepts.

– – The simplest, and the most common, Join is defined by a single column name cn that is common both to A and to B. The subset predicate is: for t = (r, s) in Product(A, B), p(t) is true if and only if t(“a.” + cn) = t(“b.” + cn), i.e., r(cn) = s(cn).


A = Employee table

LastName DepartmentID
Rafferty 31
Jones 33
Steinberg 33
Robinson 34
Smith 34
John NULL

B = Department table

DepartmentID DepartmentName
31 Sales
33 Engineering
34 Clerical
35 Marketing

Note that John has not been assigned a DepartmentID, and Marketing (DepartmentID = 35) has no one in it. Since A has 6 rows and B has 4 rows, the Product(A, B) has 24 rows. Since A has 2 columns and B has 2 columns, Product(A, B) has 2+2 = 4 columns. It is a modestly interesting exercise to write out this 24 by 4 matrix.

If cn = DepartmentID, then the Join above is

a.LastName a.DepartmentID b.DepartmentID b.DepartmentName
Rafferty 31 31 Sales
Jones 33 33 Engineering
Steinberg 33 33 Engineering
Robinson 34 34 Clerical
Smith 34 34 Clerical

The reason John and Marketing do not appear in this result is that the predicate will not return True when evaluating Nulls.
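The definition of this Join as a subset of the product can be checked mechanically. The Python sketch below follows the definition literally (product first, then the predicate), which is fine for ten rows but is not how a real database computes a Join. The representation of rows and the helper names are my assumptions.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def product(a: Table, b: Table) -> Table:
    """Cartesian product with "a." and "b." column prefixes."""
    return {
        frozenset({**{f"a.{cn}": v for cn, v in r},
                   **{f"b.{cn}": v for cn, v in s}}.items())
        for r in a for s in b
    }


def inner_join(a: Table, b: Table, cn: str) -> Table:
    """Subset of product(a, b) where both sides are defined at cn and agree."""
    def p(t: Row) -> bool:
        d = dict(t)
        left, right = f"a.{cn}", f"b.{cn}"
        return left in d and right in d and d[left] == d[right]
    return {t for t in product(a, b) if p(t)}


employee: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "Robinson", "DepartmentID": 34}),
    row({"LastName": "Smith", "DepartmentID": 34}),
    row({"LastName": "John"}),  # no DepartmentID assigned
}
department: Table = {
    row({"DepartmentID": 31, "DepartmentName": "Sales"}),
    row({"DepartmentID": 33, "DepartmentName": "Engineering"}),
    row({"DepartmentID": 34, "DepartmentName": "Clerical"}),
    row({"DepartmentID": 35, "DepartmentName": "Marketing"}),
}

joined = inner_join(employee, department, "DepartmentID")
print(len(product(employee, department)), len(joined))
```

John drops out because his row is not defined at DepartmentID, and Marketing drops out because no employee row has the value 35.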

The SQL for the above “Inner Join” has two forms. The first has an explicit Join operator and the second has an implicit Join:

SELECT * FROM Employee a INNER JOIN Department b ON a.DepartmentID = b.DepartmentID;

SELECT * FROM Employee a, Department b WHERE a.DepartmentID = b.DepartmentID;

A longer form of both uses “Employee” for “a” and “Department” for “b”.

Sometimes, the redundant column b.DepartmentID is removed by projection, and the remaining columns are renamed to yield:

LastName DepartmentID DepartmentName
Rafferty 31 Sales
Jones 33 Engineering
Steinberg 33 Engineering
Robinson 34 Clerical
Smith 34 Clerical

The SQL/92 for this is SELECT * FROM Employee INNER JOIN Department USING (DepartmentID);

Unfortunately not all RDBMS systems support the USING clause.
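One way to try the three forms is with SQLite, which ships with Python and does support USING. The sketch below builds the two example tables in memory; the column types and the aliases a and b are my choices, and other database systems may behave differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (LastName TEXT, DepartmentID INTEGER)")
conn.execute("CREATE TABLE Department (DepartmentID INTEGER PRIMARY KEY, DepartmentName TEXT)")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?)",
    [("Rafferty", 31), ("Jones", 33), ("Steinberg", 33),
     ("Robinson", 34), ("Smith", 34), ("John", None)],  # None is stored as NULL
)
conn.executemany(
    "INSERT INTO Department VALUES (?, ?)",
    [(31, "Sales"), (33, "Engineering"), (34, "Clerical"), (35, "Marketing")],
)

# Explicit join operator.
explicit = conn.execute(
    "SELECT * FROM Employee a INNER JOIN Department b "
    "ON a.DepartmentID = b.DepartmentID"
).fetchall()

# Implicit join: product in FROM, predicate in WHERE.
implicit = conn.execute(
    "SELECT * FROM Employee a, Department b "
    "WHERE a.DepartmentID = b.DepartmentID"
).fetchall()

# USING form: the shared column appears once in the result.
using = conn.execute(
    "SELECT * FROM Employee INNER JOIN Department USING (DepartmentID)"
).fetchall()

conn.close()

print(sorted(explicit) == sorted(implicit))
print(len(explicit), len(using))
```

The explicit and implicit forms return the same four-column rows, while the USING form returns three columns because the duplicate DepartmentID is omitted.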

Exercise 0: Write out the spec for

Function InnerJoin(A: Table, B: Table, cn0) Returns Table

directly, i.e. without using Subset and Product. This code is very similar to that of Product.

Definition: A subset of columns C of a table T is a primary key if for every r In T

  1. r!cn for every cn In C
  2. For every s In T, If r(cn) = s(cn) for every cn in C, Then r = s.

The column DepartmentID is a primary key for the table Department in the above examples.
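The two conditions of the definition translate directly into a check. In this Python sketch the function name `is_primary_key`, the row representation, and the sample data are my assumptions; since a table is a set, any two rows that share a key value are automatically different rows and therefore violate condition 2.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def is_primary_key(columns: Set[str], table: Table) -> bool:
    """True when every row is defined on all key columns and no key value repeats."""
    seen = set()
    for r in table:
        d = dict(r)
        if any(cn not in d for cn in columns):
            return False  # condition 1 fails: r is Null on a key column
        key = tuple(d[cn] for cn in sorted(columns))
        if key in seen:
            return False  # condition 2 fails: two distinct rows share the key
        seen.add(key)
    return True


department: Table = {
    row({"DepartmentID": 31, "DepartmentName": "Sales"}),
    row({"DepartmentID": 33, "DepartmentName": "Engineering"}),
    row({"DepartmentID": 34, "DepartmentName": "Clerical"}),
    row({"DepartmentID": 35, "DepartmentName": "Marketing"}),
}
employee: Table = {
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "John"}),
}

print(is_primary_key({"DepartmentID"}, department))
print(is_primary_key({"DepartmentID"}, employee))
```

DepartmentID fails as a key for the employee rows on both counts: John is Null there, and 33 appears twice.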

Left outer join

The result of a left outer join for table A and B always contains all records of the “left” table (A), even if the join-condition does not find any matching record in the “right” table (B). This means that if the ON clause matches 0 (zero) records in B, the join will still return a row in the result—but with NULL in each column from B. This means that a left outer join returns all the values from the left table, plus matched values from the right table (or NULL in case of no matching join predicate).

Given the requirement that even if the Join cannot find a matching row in B for a row r in A, the row r remains part of the result, with Nulls filling all the columns representing B column names, this result cannot be part of Product(A, B). It follows that we must revisit and modify the code for the Function Product (or InnerJoin, if you did Exercise 0) in order to define LeftOuterJoin.

– – When cn0 is a primary key for B, we can define LeftOuterJoin easily and construct it efficiently:

Function LeftOuterJoin(A: Table, B: Table, cn0) Returns Table =
Assert cn0 is a primary key for B
VAR AN: Set(CN) := {“a.” + cn | cn In A.domain}, BN: Set(CN) := {“b.” + cn | cn In B.domain}
VAR T: Table := { }
For r In A Do
..VAR t: AN + BN → V := { } – – Note each row r in A defines a new row t in the Join.
..For cn In A.domain Do If r!cn Then t(“a.” + cn) := r(cn) Done – – this loads up the “a.” side of t.
..If r!cn0 And (Exists s In B | s(cn0) = r(cn0)) Then
….VAR s In B | s(cn0) = r(cn0) – – such s is unique since cn0 is a primary key for B
….For cn In B.domain Do If s!cn Then t(“b.” + cn) := s(cn) Done – – this loads the “b.” side of t.
..T += t – – add this newly defined row t to T, even if no matches occur in B
Return T
End – – LeftOuterJoin
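In Python, the primary-key assumption lets us index B by cn0 once and then make a single pass over A. This is a sketch under the same assumptions as before (rows as frozensets of pairs, illustrative helper names and data), not a definitive implementation.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def left_outer_join(a: Table, b: Table, cn0: str) -> Table:
    """Every row of a appears once; b's columns are filled in when cn0 matches."""
    by_key: Dict[Any, Row] = {}
    for s in b:
        key = dict(s)[cn0]  # a KeyError here means cn0 is not defined on all of b
        assert key not in by_key, "cn0 must be a primary key for b"
        by_key[key] = s
    result: Table = set()
    for r in a:
        t = {f"a.{cn}": v for cn, v in r}        # load the "a." side
        d = dict(r)
        if cn0 in d and d[cn0] in by_key:        # a unique match exists in b
            t.update({f"b.{cn}": v for cn, v in by_key[d[cn0]]})  # load the "b." side
        result.add(frozenset(t.items()))         # kept even when nothing matched
    return result


employee: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "Robinson", "DepartmentID": 34}),
    row({"LastName": "Smith", "DepartmentID": 34}),
    row({"LastName": "John"}),
}
department: Table = {
    row({"DepartmentID": 31, "DepartmentName": "Sales"}),
    row({"DepartmentID": 33, "DepartmentName": "Engineering"}),
    row({"DepartmentID": 34, "DepartmentName": "Clerical"}),
    row({"DepartmentID": 35, "DepartmentName": "Marketing"}),
}

joined = left_outer_join(employee, department, "DepartmentID")
print(len(joined))
```

The result has exactly as many rows as A. John’s row carries only a.LastName; all the “b.” columns are Null for him, and Marketing does not appear at all.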

– – Even when cn0 is NOT a primary key for B we can define a RightOuterJoin(A, B, cn0) which will add the rows of B for which there is no matching (on cn0) row in A.

Function RightOuterJoin(A: Table, B: Table, cn0) Returns Table =
VAR AN: Set(CN) := {“a.” + cn | cn In A.domain}, BN: Set(CN) := {“b.” + cn | cn In B.domain}
VAR T: Table := { }
VAR t: AN + BN → V
VAR foundNoMatchInA: Boolean
For s In B Do
..foundNoMatchInA := True
..For r In A Do
….If r!cn0 And s!cn0 And r(cn0) = s(cn0) Then
……t := { }
……For cn In B.domain Do If s!cn Then t(“b.” + cn) := s(cn) Done – – loads the “b.” side of t
……For cn In A.domain Do If r!cn Then t(“a.” + cn) := r(cn) Done – – loads the “a.” side of t.
……T += t
……foundNoMatchInA := False
….Done – – since cn0 is not a primary key, look for more matching rows r.
..If foundNoMatchInA Then – – Add this row s anyway
….t := { }
….For cn In B.domain Do If s!cn Then t(“b.” + cn) := s(cn) Done – – t is Null on all “a.” columns
….T += t
Return T
End – – RightOuterJoin
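The same logic in Python, again as a sketch with assumed helper names and sample data. It makes no primary-key assumption, so each row of B is compared against every row of A.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def right_outer_join(a: Table, b: Table, cn0: str) -> Table:
    """Every row of b appears at least once; a's columns are filled in on a match."""
    result: Table = set()
    for s in b:
        ds = dict(s)
        found_no_match_in_a = True
        for r in a:
            dr = dict(r)
            if cn0 in dr and cn0 in ds and dr[cn0] == ds[cn0]:
                t = {f"b.{cn}": v for cn, v in s}        # load the "b." side
                t.update({f"a.{cn}": v for cn, v in r})  # load the "a." side
                result.add(frozenset(t.items()))
                found_no_match_in_a = False              # keep looking for more matches
        if found_no_match_in_a:
            # Add s anyway; the row is Null on all "a." columns.
            result.add(frozenset((f"b.{cn}", v) for cn, v in s))
    return result


employee: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "Robinson", "DepartmentID": 34}),
    row({"LastName": "Smith", "DepartmentID": 34}),
    row({"LastName": "John"}),
}
department: Table = {
    row({"DepartmentID": 31, "DepartmentName": "Sales"}),
    row({"DepartmentID": 33, "DepartmentName": "Engineering"}),
    row({"DepartmentID": 34, "DepartmentName": "Clerical"}),
    row({"DepartmentID": 35, "DepartmentName": "Marketing"}),
}

joined = right_outer_join(employee, department, "DepartmentID")
print(len(joined))
```

Here Marketing survives with Nulls on the “a.” side, while John, who matches no department, is dropped.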

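SQL-92 does define LEFT, RIGHT, and FULL OUTER JOIN, but support has varied from vendor to vendor, and some vendors have their own outer join syntax. A left outer join can be written, e.g.

SELECT * FROM Employee a LEFT OUTER JOIN Department b ON a.DepartmentID = b.DepartmentID;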

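Example of a LeftOuterJoin, which keeps the information that John has no department:

a.LastName a.DepartmentID b.DepartmentName b.DepartmentID
Jones 33 Engineering 33
Rafferty 31 Sales 31
Robinson 34 Clerical 34
Smith 34 Clerical 34
Steinberg 33 Engineering 33
John NULL NULL NULL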

Example of RightOuterJoin which drops the information about John having no department, but adds the information that Marketing has no people:

a.LastName a.DepartmentID b.DepartmentName b.DepartmentID
Smith 34 Clerical 34
Jones 33 Engineering 33
Robinson 34 Clerical 34
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
NULL NULL Marketing 35

When cn0 is NOT a primary key, one can still define a LeftOuterJoin’(A, B, cn0) to be equal to RightOuterJoin(B, A, cn0). Instead of getting exactly the number of rows of A for the join, one can get more.

Function OuterJoin(A: Table, B: Table, cn0) Returns Table =


Return(Union(LeftOuterJoin(A, B, cn0), RightOuterJoin(A, B, cn0)))

End – – OuterJoin

Example of OuterJoin(A, B) where the information contains both the fact that John has no department as well as Marketing having no people:

a.LastName a.DepartmentID b.DepartmentName b.DepartmentID
Smith 34 Clerical 34
Jones 33 Engineering 33
Robinson 34 Clerical 34
Steinberg 33 Engineering 33
Rafferty 31 Sales 31
John NULL NULL NULL
NULL NULL Marketing 35
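The definition of OuterJoin as a union can be sketched in Python as well. To keep the block short, both one-sided joins are written here in a general form that does not assume a primary key; the helper names (`prefixed`, `matches`, `merge`) and the row representation are my assumptions.

```python
from typing import Any, Dict, FrozenSet, Set, Tuple

Row = FrozenSet[Tuple[str, Any]]
Table = Set[Row]
NO_ROW: Row = frozenset()  # stands in for "no matching row": Null on every column


def row(mapping: Dict[str, Any]) -> Row:
    return frozenset(mapping.items())


def prefixed(r: Row, prefix: str) -> Dict[str, Any]:
    return {f"{prefix}.{cn}": v for cn, v in r}


def matches(r: Row, s: Row, cn0: str) -> bool:
    dr, ds = dict(r), dict(s)
    return cn0 in dr and cn0 in ds and dr[cn0] == ds[cn0]


def merge(r: Row, s: Row) -> Row:
    return frozenset({**prefixed(r, "a"), **prefixed(s, "b")}.items())


def left_outer_join(a: Table, b: Table, cn0: str) -> Table:
    """All rows of a, paired with each matching row of b, or with Nulls if none."""
    return {merge(r, s)
            for r in a
            for s in ([s for s in b if matches(r, s, cn0)] or [NO_ROW])}


def right_outer_join(a: Table, b: Table, cn0: str) -> Table:
    """All rows of b, paired with each matching row of a, or with Nulls if none."""
    return {merge(r, s)
            for s in b
            for r in ([r for r in a if matches(r, s, cn0)] or [NO_ROW])}


def outer_join(a: Table, b: Table, cn0: str) -> Table:
    """Union of the two one-sided joins; matched rows occur in both and merge."""
    return left_outer_join(a, b, cn0) | right_outer_join(a, b, cn0)


employee: Table = {
    row({"LastName": "Rafferty", "DepartmentID": 31}),
    row({"LastName": "Jones", "DepartmentID": 33}),
    row({"LastName": "Steinberg", "DepartmentID": 33}),
    row({"LastName": "Robinson", "DepartmentID": 34}),
    row({"LastName": "Smith", "DepartmentID": 34}),
    row({"LastName": "John"}),
}
department: Table = {
    row({"DepartmentID": 31, "DepartmentName": "Sales"}),
    row({"DepartmentID": 33, "DepartmentName": "Engineering"}),
    row({"DepartmentID": 34, "DepartmentName": "Clerical"}),
    row({"DepartmentID": 35, "DepartmentName": "Marketing"}),
}

full = outer_join(employee, department, "DepartmentID")
print(len(full))
```

The five matched rows appear in both one-sided joins, so the set union counts them once; adding John’s row and the Marketing row gives seven rows in all.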

Exercises that give clues for implementation:

  1. Write out the code for LeftOuterJoin’ (the version that doesn’t assume cn0 is a primary key) without using RightOuterJoin.
  2. Write out the code for OuterJoin without using Union, LeftOuterJoin, or RightOuterJoin. Do it without the assumption that cn0 is a primary key for B, and then again when it is.

[1] Butler Lampson, et al, “Principles of Computer Science”, MIT Lecture Series, circa 1990.

Agile Data Structures


I’ve been working with a start-up that is going to need a very large database. It is an obvious cloud application, and is going to require a lot of modeling. Also, it shouldn’t be too surprising that every meeting we have changes the very nature of the database and the applications it will support. Now the Agile folks are definitely not surprised, and they claim to have just the answer: Agile everything.

OK, I like Agile philosophies (Cf. my post “Thoughts on Agile and Scrum”), but the practicality of constantly changing the schema of a huge database with many applications banging on it seems daunting, especially in a cloud environment. In addition, a start-up doesn’t have the luxury of simultaneously updating each application with every schema change; hence, I decided to see if some form of database refactoring could be designed into the system from the beginning. The goals would be not only to address continual specification and design changes, but more importantly to keep the individual application development teams on paths that allowed them to react to these changes naturally as their current work tasks (“sprints”) completed, and they had the opportunity to seriously contemplate these changes and how to address them.

I’ve already started plowing through many of Scott Ambler’s writings. He writes well, and it is as good a starting point as I can find.

Thus I’m going to make a series of blog entries as I think through the pros and cons of the Agile tools and data techniques out there – especially in the context of developing a large database on the cloud (or maybe I should say “on a cloud.”) Stay tuned.