Posts Tagged ‘NASDAQ’

Learning from Financial Trading Bugs


The commodities and securities trading exchanges provide challenging examples for cloud and big data application development. Their users are disparate traders world wide. They have user requirements for high trading volumes and for low latency. They utilize enormous amounts of storage, networking, and computer processing power. My IEEE Computer Society talk, here, discusses some of the technical features for such applications and for the hardware on which they run. Ordinary public cloud systems cannot currently address these needs, and perhaps they never will. On the other hand, those of us developing big data and/or cloud software applications can learn a lot by studying these “bleeding edge” applications, their bugs, and the consequences of such bugs.

Big Data pioneers such as Yahoo!, Linkedin, Facebook, Google, eBay, etc. have, of course, their own bugs that have economic consequences for both the companies and their customers. Larger service providers such as Amazon, Microsoft, GoDaddy, and Rackspace have outages that do serious damage to their customers. However, financial trading applications can cause millions of dollars of damage in just a few seconds, and the governmental oversight agencies eventually get involved. This has happened in a big way this year [1,2] with four incidents that seem to have galvanized these agencies into action:

  • On Feb 24, options market maker Ronin Capital injected more than 30,000 mispriced quotes into the NYSE Amex exchange.
  • On March 23, the BATS Exchange, handling its own IPO traffic on top of other traffic, crashed. (How embarrassing!) Among other losses, this caused a brief 9% price decline in Apple shares.
  • On May 18, the Facebook IPO had many orders stalled and not executed on the NASDAQ exchange. The Union Bank of Switzerland, alone, lost more than $350 Million, and curiously Knight Capital lost $35.4 Million in this incident.
  • On August 1, the Knight Capital Group lost $440 Million by flooding the NYSE with bad orders.

Since “You can’t know the players without a program…”, here is a brief cheat sheet of agency acronyms:

  • CFTC = Commodity Futures Trading Commission
  • FIA = Futures Industry Association
  • FIA-EPTA = European version of the FIA-PTG
  • FIA-PTG = FIA’s Principal Traders Group
  • FRB = Federal Reserve Bank
  • FSOC = Financial Stability Oversight Council (established by the Dodd-Frank Act)
  • IOSCO = International Organization of Securities Commission
  • MFA = Managed Funds Association (hedge funds)
  • SEC = Securities Exchange Commission

Of course numerous observers clamored for reform, e.g. [5,6,7,10] but the above agencies started to issue calls for action:

  • MFA requested of the SEC mandatory risk checks on all orders, new requirements on system testing, and a requirement for an individual with a “kill switch” to watch over all trading activity. (Imagine not trusting computer programs and wanting a human being to watch over automated trading!) [14]
  • The FIA PTG/EPTA issued its “Software Development and Change Management Recommendations”, March 2012. While both reasonable and comprehensive, there is nothing new in the report from an academic software development perspective. What is interesting is that they felt it was necessary to prepare it for financial application development. [13,14]
  • The FSOC made some vague recommendations in July 2012 that the SEC and the CFTC consider establishing error control and standards for exchanges, clearing houses, and other market participants that are relevant to high-speed trading. [11]
  • August 2, the FIA PTG make a “soft” statement to the SEC at their Roundtable noting that the 2005 regulations, designed to encourage market competition created “different safety controls” which now need “smart regulatory policies.” August 3, FIA PTG/EPTA issued a stronger statement on the “Knight Capital” problem, stating “Rapid advances in trading technology have brought very substantial benefits… but … they also have introduced new sources of risk.” They reiterated their earlier recommendations for “tests and controls” that trading firms should consider when they change their technology systems. [12, 13]
  • August 2012 The IOSCO issued a “Consultation Report” entitled “Technological Challenges to Effective Market Surveillance Issues and Regulatory Tools” which called for greater data collection for the purposes of surveillance of automatic or algorithmic trading of securities. [8] It refers to an earlier paper “Objectives and Principles of Securities Regulation” dated May 2003 that has 38 “principles” for such software development and regulation. Both papers are good reading. IOSCO further warns of the dangers of the then (and now) situation due to the neglect of these principles. [3]
  • October 1, 2012 the FRB of Chicago issued a report “How to keep markets safe in the era of high-speed trading” by Carol Clark. By interviewing various vendors, the author points out that there are a few places in the system where checks can and should be made. It makes solid recommendations on various risk limits, risk mitigation techniques, kill switches, position limits, and profit and loss limits. Good paper. [4]
  • October 4, 2012 The FIA PTG responded to the Chicago FRB’s report, supporting its recommendations. [15]
  • October 10, 2012 The FIA PTG/EPTG responded to IOSCO’s recommendations for market surveillance and audit trail quality, wanting more, especially, surveillance for illegal or inappropriate conduct which might be facilitated by automated trading. [3]

Wow! Four bugs caused all this commotion? Well, no. The noticeable problems were occurring prior to 2012 and also outside of the US. (Many of these are discussed in earlier posts.) There clearly was a welling up of (and I’m not sure this is the right word, but) anger.

So, besides just being new, what is wrong? Well, in high frequency trading, speed is king, and it would appear that no one wants to slow down their software by putting in audit trails that IOSCO recommends. Vendors force the regulators to read the code to audit their systems! Can you imagine how worthless that exercise is? No one seems to realize that such code additions would actually help test and debug their systems. Risk and profit/loss limits seem easy to implement, but again while it does slow down the system a little bit, the more likely reason is that such limits are an annoyance. Again regulation is needed.

Complexity is probably the number two reason for such bugs hitting. Here comes the argument that good testing won’t find all bugs. On the other hand, most of the bugs reported (or deduced) seem well within the current art of testing. I’ve seen no bugs reported that only occur on weird combinations of extreme data. In one case, the addition of new code activated some old “dead” code [14]. Both bugs (dead code and the new activation problem) could have easily been caught by reasonable testing. I’ve read about the now boring excuse of rushing new functionality to market for competitive reasons. Give me a break. With hundreds of millions of dollars at stake, shouldn’t the vendors be able to afford decent automated test suites? Properly done, such test suites make the development go faster! On the other hand, I’d hate to see government regulations on testing. It would be a case of the ignorant policing the ignorant. My guess is that the best government regulations would be to impose massive fines and to enforce total restoration of all money lost due to a bug.  Even with proper catastrophe insurance, this should be significant motivation for quality!

For sure, a desire for high performance with complex software, made more difficult by dealing with relatively new big data infrastructure, is a recipe for lots of bugs. While I’ll discuss big data and cloud application development in subsequent posts, my thinking here is simple: Invest at least as much in your testing and its automation as you do in writing your application. Follow the IOSCO principles by adding code for debugging and for auditing. It will pay for itself. Get audited. Audits probably won’t find anything, but your financial and legal consequences will probably be less severe should a bug rear its ugly head. Also, when high performance in networking and IO is desired, go with new hardware that has built-in measurement and time-stamping features. It this is not possible, then add such measurements to your software. Finally, do some sanity checks and reasonability calculations to make sure you are not doing something fundamentally wrong.






[4] “How to Keep Markets Safe in the Era of High-Speed Trading”, Carol Clark, 9/17/2012, Essays on Issues, The Federal Reserve Bank of Chicago, October 2012, Number 303.











[15] October 2012 News.


Crashes on Wall Street


It is of course interesting when cloud vendors have problems.  Cloud computing is like the 1849 Gold Rush to California.  If you are not heading there, you are at least talking about it.

Less discussed are recent trading crashes on “Wall Street”.  Maybe it is the distance of the moon to the earth, but there sure have been a noticeable number of such crashes.  Now Wall Street is a little more secretive than the big cloud vendors.  Maybe it is because there is less technical scrutiny on Wall Street, and they don’t perceive the advantages of openly discussing technical problems.

What prompts this post is an article in the business section of today’s LA Times about Knight Capital’s recent crash having its root cause a total lack of adequate quality assurance, but I’m getting ahead of myself …

Let’s start with the automated trading system BATS.  In a sense, BATS is a competitor of all the established stock exchanges, and is now the third largest stock exchange in the US.  It is totally automated, replacing floor traders with software.  It started in October 2008, and it was doing so well, that it decided to have an IPO – an entrepreneur’s dream exit – on March 23, 2012.  They, not surprisingly, picked themselves to be the exchange to list and sell their stock.  But a funny thing happened on the way to the market, just as their stock was about to trade, their system crashed!  Not just stopped, but many trades, including Apple’s stock that morning, were corrupted.  They finally pulled the plug, but the damage was done.  I assume they were able to mop up the corrupted trades, but their embarrassment was so great that they withdrew from the planned public offering. (BATS recently announced they will try the IPO route again.) I tried hard to find the root cause, but secrecy prevailed.  They did release a statement that they rated their system as 99.9% available.  They had had a few crashes prior to the March debacle, but that wasn’t warning enough. Such a low availability rating is inexcusable for a stock trading system, and it appeared to me that their testing was just totally inadequate.

Imagine how unhappy all the Facebook investors were when the NASDAQ IPO software couldn’t process all the buy requests in the early minutes of the Facebook IPO.  NASDAQ claimed they spent “thousands of hours” running “hundreds of scenarios” for their testing.  Again I haven’t seen a technical root cause analysis for the NASDAQ problems.

I’m going to skip over the JP Morgan billion dollar loss, but it was due to a rogue trader executing more than risky trades.  I will, however, wonder aloud why adequate reporting software didn’t exist to flag such a series of risky trades.

Public exchanges are not the only sources of computer bugs.  AXA Rosenberg had to pay $217 million to pay for investor losses due to a “significant error” in its software systems.  It also paid a $25 million penalty to regulators for … guess what … hiding the error!  “The secretive structure and lack of oversight of quantitative investment models, as this case demonstrates, cannot be used to conceal errors and betray investors.”

Not to totally pick on the US, and to give this blog a bit of an international flair, consider Madrid’s Bolsas y Mercados Espanoles, which suffered a four hour outage when communication servers crashed.  What, no redundancy?  No duplicate Internet connections?  Sigh…  This breakdown affected two multilateral trading platforms operated by NYSE Euronext: Smartpool and NYSE Arca Europa where orders could be submitted but not traded.  Note that a bug in one exchange can affect other exchanges!

The Tokyo Stock Exchange just had its second major problem in the last year.  The root cause was reported to be a router going down, and the failover to a backup router also failed.  First kudos to the managing vendor Hitachi for disclosing the problems.  Second, the key lesson to learn is to test your failover mechanisms!  BUT, it took 95 minutes for the on-site staff to diagnose the problem and to affect a manual failover.  Way too long.  The lesson here is that to manage Mean Time To Repair (MTTR), training and practice is essential.  Good diagnostic software might also have helped identify the problem faster.

Now, back to the US and Knight Capital:  In less than an hour, trades that were supposed to be spread out over days, were executed essentially one after the next.  The result was a $440 million dollar loss for the firm.  Had not investors led by Jefferies Group, Ltd.  provided $400 million, Knight Capital would have gone under.

Now comes today’s LA Times article, by-lined Bloomberg News.  It appears that Knight Capital installed some new software designed to interface to the NYSE’s new retail liquidity program for small investors.  The “Law of Untended Consequences” bit them.  The installation of this new software somehow activated previously dormant software, which started multiplying trades by 1000.  What a bug!  ANY KIND OF TESTING would have discovered such a huge bug.

There is a theme here.  Software on financial exchanges executes trades at dizzying speeds.  A bug can very quickly cause millions of dollars in bad trades.  I’m a little shocked that this software is obviously not tested adequately.


References and interesting links:

The costliest bug ever:

How software updates are destroying Wall Street (Bloomberg and Businessweek):

Two Years After the Flash Crash (of 2010), Are Markets any Safer?   [I strongly recommend this article!]

SEC judgement on AXA Rosenberg Entities: