|
YOUR FEEDBACK
Did you read today's front page stories & breaking news?
SYS-CON.TV |
TODAY'S TOP SOA & WEBSERVICES LINKS Exception Handling When Exceptions Are the Rule
Achieving reliable and traceable service-oriented architectures
By: Sean Fitts
Sep. 10, 2005 03:00 PM
Every now and then, an IT glitch makes national news. Just a few weeks ago, I read in the paper about an airline that mistakenly sold thousands of roundtrip tickets online at a fare of just a few dollars each. The airline lost hundreds of thousands of dollars from the mistake, though that's really just the tip of the iceberg. The National Institute of Software Technology (NIST) estimates that application errors cost the U.S. economy $59.5 billion per year. Because nearly 80 percent of such errors are discovered after applications have been put into production, exceptions also have a significant impact on the productivity and effectiveness of your IT staff and production support teams. And that's to say nothing of foregone revenue due to poor customer service.
Unlike my highly publicized airline example, the people who know about exceptions are usually limited to customers, IT teams, and line-of-business managers - and they typically find out about exceptions in that order. It's the customer or the end user of an SOA-based business application who is usually the first to witness the consequences of an exception. Common symptoms include opaque messages (such as "Sorry, unable to process request at this time") on Web sites. Such seemingly mild errors eventually translate into more disruptive business exceptions such as delayed orders, lost packages, rejected insurance claims, and so on. Due to their distributed and heterogeneous nature, services-based systems are inherently vulnerable to exceptions. Exceptions in SOA environments can be broadly categorized into three classes:
Any developer will tell you that 80 percent of the time required to diagnose an exception is spent simply trying to replicate the scenario. That's due to all the effort of searching through log files and iteratively recoding to add more information to the log. With distributed SOA, possibly stretched across geographies, this task is even more challenging. Managing exceptions has traditionally been an expensive, extremely manual effort performed by often dedicated application maintenance teams. Since their clues reside in multiple messages that span different services in the business application, exceptions in SOA systems are even harder to detect and diagnose. To begin with, the applications themselves are seldom instrumented to proactively alert on specific exceptions. At best, an application surfaces exceptions as incoherent entries in an error log, such as "Error 00021C: Transaction rejected." These exceptions might be uncovered during routine maintenance of the application. However, as noted earlier, it's the phone calls from vexed customers that usually make IT staff aware of exceptions. Business operations teams seldom come to know about exceptions until it is too late to respond. So what are you to do about it? You could just shrug your shoulders, accept that exceptions are going to happen, and hope you're not next week's headline news. Or you could look for ways to detect, diagnose, and remedy exceptions before they bring your business to a standstill. Let's explore ways to go about the latter option.
Managing Exceptions in Services-Based Systems Since it's impossible to anticipate all patterns, operations teams need to cast as wide a net as possible across their business systems to trap the maximum number of exceptions. As a fallback, they must be able to trace and record all distributed transactions and diagnose this data for the root causes of exceptions. IT and business teams must know about exceptions immediately. It might be important to alert one or more individuals across different teams based on the nature of the exception. For example, the error code is of interest to the IT staff, while the rejected purchase order and the customer details are important to business types. The notified personnel must then be able to quickly analyze the situation, understand its cause, and implement a cure. To accomplish this, they need to know not only the exception message pattern but also the context of the business transaction in which it occurred. IT operations must be able to diagnose and resolve the exception in minutes and seconds instead of days and hours. Similarly, business operations must be able to learn about exceptions in real time in order to formalize a resolution before customer service is affected. For some exceptions, the resolution is clear. In such cases it's important to resolve the exception in-flight by applying automated exception-handling actions.
Why Traditional Approaches Don't Work for SOA Debugging and testing practices aim to isolate and eliminate logical errors. Quality assurance teams spend countless hours putting the software through scripted production simulations. Then it's up to the consumers of the production systems to report any exceptions to the technical support organization. IT operations staff members depend on applications and system logs to diagnose problems reported by customers. Patches are applied to applications if problems are deemed severe. Additionally, Network Systems Management (NSM) software is used to isolate runtime failures in the hardware or in elements of the physical layer and to trigger alerts. SOA WORLD LATEST STORIES
SUBSCRIBE TO THE WORLD'S MOST POWERFUL NEWSLETTERS SUBSCRIBE TO OUR RSS FEEDS & GET YOUR SYS-CON NEWS LIVE!
|
SYS-CON FEATURED WHITEPAPERS MOST READ THIS WEEK |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||