Designing IT Architectures for Real Time

By CIOinsight  |  Posted 11-10-2002

The information backbone of your company is hardly as efficient as it could be. W. Roy Schulte, Gartner Research vice president and research fellow, Application Integration and Emerging Technologies, tells CIO Insight how to design enterprise architectures around events as they happen, for savings in time and money.

CIO Insight: What is your definition of a real-time enterprise?

Schulte: It's not a slogan, and it's not something you can do just by telling people to work fast. If you want a real-time enterprise, change your business systems, change your organization chart, change your business processes, and also implement a computer system that's driven by events.

All enterprises are partially event-driven, but no enterprise should be 100 percent event-driven. Every enterprise should have the proper mix. What you want to do is make sure you're using the mix that's optimal for the business results you want.

You're saying that most businesses need to be rewired to keep pace with the accelerating speed of business today?

The pace of business is accelerating. It's a plain fact of life. Something that might have taken 20 minutes to trickle across the company in the past is now done in a matter of seconds. Consider building a PC, which used to take an average of six weeks. At Dell Computer, they now take the PC order, build it, ship it and deduct the money from your credit-card account, all in 24 hours. All of this has to do with increasing the velocity of the business processes.

Now, to get this kind of speed, we have to rethink how we design computer systems. One of the most powerful ways to do this is to apply the so-called "concept of events," which goes back to understanding what data looks like in general.

We would say that there are three kinds of data in the world—reference data, state data and event data. Reference data is stuff that doesn't change very much, such as my name, my address, how many kids I have, the number of seats on an airplane.

State data is different. It changes in the course of business. My name doesn't change every day, but I go through many states: the state of being asleep and the state of being awake, the state of being hungry and the state of being full. The location of an airplane and the balance of my bank account are state data too; all of these things change. If you're in the information systems business, your systems maintain reference data for some purposes, but mostly what they're doing is changing state data.

Event data influence the big changes. An event means that something has happened. A business event is a meaningful change in the state of a business. Although events can change reference data as well as state data, it's usually state data that is most influenced by events. Examples of business events include opening a new account, submitting an order, changing an address, making a payment, delivering a shipment to a loading dock and so forth.
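To make the three kinds of data concrete, here is a minimal Python sketch. It assumes a hypothetical bank-account example: a customer record stands in for reference data, the account balance for state data, and each payment for an event. The class and field names are illustrative only; the point is that an event is an immutable record that something happened, and applying it changes state data while leaving reference data alone.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    """Reference data: rarely changes (name, address)."""
    customer_id: str
    name: str
    address: str


@dataclass(frozen=True)
class AccountState:
    """State data: changes in the course of business."""
    customer_id: str
    balance: float


@dataclass(frozen=True)
class PaymentMade:
    """Event data: an immutable record that something happened."""
    customer_id: str
    amount: float


def apply_payment(state: AccountState, event: PaymentMade) -> AccountState:
    """An event produces a new state; reference data is not touched."""
    if event.customer_id != state.customer_id:
        raise ValueError("event belongs to a different account")
    return AccountState(state.customer_id, state.balance - event.amount)


# Usage: replaying two payment events against a starting balance.
customer = Customer("c1", "Pat Smith", "1 Main St")
state = AccountState(customer.customer_id, 100.0)
for event in [PaymentMade("c1", 30.0), PaymentMade("c1", 20.0)]:
    state = apply_payment(state, event)
print(state.balance)  # 50.0
```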

Now, those of you who are programmers know that you also use the concept of events within a program. If you click a mouse on an icon, an event is passed to the program, and the program reacts to it and does something. These are not business events; they are technical events. Technical events in software help you implement business events. So there's a strategy side of events and there's a technology side of events.

The business events that I'm talking about—such as submitting an order, changing an address, making a payment and so forth—these business events are handled by every business everywhere, even if they don't have computers.

However, when you say an enterprise is event-driven, you mean something else. You mean that the systems internally are also built on an event basis. So a business event, like submitting an order, is going to be handled as an event in the system design, and maybe even in the software itself: something that is captured when an action comes in from the outside.

Think about business strategy. Traditional business strategy is all about building to plan. In an event-driven enterprise, you build to order. So if you have a traditional enterprise, you're building to plan. You're saying, "I'm going to build 50 cars today, and the cars are going to be the following." If you're building to order, you're saying, "I'm not going to build anything today unless an order comes in, and if the order comes in, then I'm going to build that specific car." So you're not building ahead. Now, build-to-order takes a lot more agility. Your computer systems have to run differently than if you're building to plan.

Another example would be just-in-time inventory. Traditional inventory systems of years past were not just-in-time; they were planned. You'd know, for example, that every day you had to order 100 tons of steel to be delivered here. Just-in-time says, "I'm going to change the orders and probably not order until I get down to a very low amount. I'm going to have very small inventories, in various locations, and maybe very small inventory carrying costs. But I can only do that if I'm monitoring what's happening on a real-time basis or close to a real-time basis, and I've got very good information collection and very good information dissemination."
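The just-in-time idea can be expressed as event handlers rather than a fixed schedule: each consumption event updates stock on hand, and a small replenishment order goes out only when stock falls to a low threshold. The Python sketch below is a simplification under stated assumptions; the reorder point, order quantity and function names are hypothetical, and a real system would send the order to a supplier rather than append it to a list.

```python
from dataclasses import dataclass, field


@dataclass
class StockItem:
    sku: str
    on_hand: int
    reorder_point: int   # hypothetical "very low amount" that triggers an order
    order_quantity: int  # hypothetical size of each small replenishment
    on_order: int = 0    # ordered but not yet delivered
    orders_placed: list[int] = field(default_factory=list)


def on_consumption(item: StockItem, quantity: int) -> None:
    """React to a 'stock consumed' event as it happens."""
    item.on_hand -= quantity
    # Include stock already on order so one shortfall does not trigger repeat orders.
    if item.on_hand + item.on_order <= item.reorder_point:
        item.orders_placed.append(item.order_quantity)
        item.on_order += item.order_quantity


def on_delivery(item: StockItem, quantity: int) -> None:
    """React to a 'shipment delivered to the loading dock' event."""
    item.on_hand += quantity
    item.on_order -= quantity


steel = StockItem("steel-tons", on_hand=12, reorder_point=5, order_quantity=10)
for used in [3, 2, 4, 1]:
    on_consumption(steel, used)
print(steel.on_hand, steel.orders_placed)  # 2 [10]
on_delivery(steel, 10)
print(steel.on_hand, steel.on_order)  # 12 0
```

Counting stock already on order in the check is a design choice: without it, every consumption event below the threshold would trigger another order.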

Another example of event-driven business at the business level is to fix-it-as-it-fails instead of doing preventative maintenance. Common sense says you do preventative maintenance. You send a person in to change the light bulbs at a given time period because you know that statistically you can predict that light bulbs are going to start to burn out after X number of hours. You think that by doing this, you're saving money because you're saving people's time, and the people's time you're saving outweighs the cost of the extra light bulbs. So you're throwing away working light bulbs, but it's worth it to you because that way you don't have to reschedule the people and you don't have to send people in on demand.

Well, we're changing that today. Many businesses are now switching to a fix-it-as-it-fails mode. If you have something that's big enough to be worth it, you now monitor the device, whether it's a piece of equipment or a financial system, for signs of failure. If you have sensors on something, you can tell when it is starting to fail.

Now you don't have to do preventative maintenance. You can get every ounce of economic life out of something, and if you have a good enough information collection system, you can see it start to fail and go in and fix it, but not before you have to. So here's a case where you're changing the entire way you organize maintenance activities, just as you changed from building to plan to building to order. Why can you change the way you do things? Because now you have information correlated in ways you didn't have in the past. You can be smarter now about things because you know more.

Future businesses will be doing a little less planning and forecasting and a lot more acting on events as they actually occur. Now we can measure and track what's happening, and respond to what's actually there instead of what we think statistically might be there.

Design Requirements

What implications does this have for the CIO and system designers?

To design this into an IT architecture, you have to do some additional things to your computer systems. One of the attributes of a process that's event-driven is that you have to have the recipient ready and available to go when that new information comes in.

So if I'm event-driven, it does not help me to get information if I'm not going to react to it. So in an event-driven process, the person or the computer system that's supposed to do that work has to be available or they have to be working on something that can be interrupted so when the event data comes in, your company can start responding immediately.

This is all part of what it takes on a business level to get things to go fast. You omit unnecessary steps by redesigning the business process, and you reduce the start-up time for each task. You try to shorten each step as much as you can. You combine multiple steps into one step, you do steps in parallel instead of serially, and finally you try to offload as much work as you can onto the person or thing that's sending things to you, or onto the person or thing you're sending things to.

If you do all of these steps, you speed up that business process. Computer systems and business systems that are meant to be event-driven have to be designed with a specific focus on events. This is a different design philosophy from the one you're using today, or at least from the one most CIOs are using today.

Traditional computer systems keep the idea of an event within one application system, capturing the event the way an order entry system would. Any order entry system is going to say, "Aha! I recognize that an event has occurred, an order has come in, and I know about it. Within this system, I've defined what an event is and I do something with it: I put it in a database, I process something. Maybe I'll write something to a transaction file, and at the end of the day I'm going to send those transactions to somebody else."

In an event-driven system, you're thinking of events across the enterprise, or at least across a wider scope. You're agreeing with different business units and different application systems and different systems analysts on what the definition of that business event is. What are the attributes of the business event? If you agree on it, you're surfacing the idea of an event to a much more prominent place in the application design process.

There's a generation of software that has emerged on the market whose job is to notify and alert people and other machines about derived facts and events. These alerts are based on thresholds. For example, I don't want to know if the airplane is going to be 15 minutes late, but I do want to know if it's going to be 20 minutes late. Or, I don't care if the stock price hits one level, but I do care when it hits another. So you set thresholds to make this alerting happen in the way that's most useful.
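A threshold alert like the flight-delay example can be sketched in a few lines of Python. This version is edge-triggered, which is our assumption rather than anything stated in the interview: it alerts once when a reading first reaches the threshold and stays quiet until the reading drops back below, so a delay that keeps growing does not produce a stream of duplicate alerts. The class name and message format are hypothetical.

```python
from typing import Optional


class ThresholdAlert:
    """Fires once when a reading reaches the threshold; re-arms after it drops back below."""

    def __init__(self, name: str, threshold: float) -> None:
        self.name = name
        self.threshold = threshold
        self._armed = True

    def observe(self, value: float) -> Optional[str]:
        if value >= self.threshold:
            if self._armed:
                self._armed = False
                return f"ALERT {self.name}: {value} >= {self.threshold}"
            return None  # already alerted for this crossing
        self._armed = True  # below threshold: ready to alert again
        return None


# Usage: a stream of predicted delay readings, in minutes.
delay = ThresholdAlert("flight delay (min)", 20)
alerts = []
for minutes in [5, 15, 22, 25, 10, 30]:
    message = delay.observe(minutes)
    if message:
        alerts.append(message)
print(alerts)  # two alerts: at 22 and at 30 minutes
```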

Notification isn't simple anymore. Now it can be done to a person through their browser if it's between 9 a.m. and 5 p.m. If it's during the evening, maybe you want the system to call them on the phone or maybe you want to send a message to their mobile device, or maybe you want to page them.

So there are systems that handle the automatic escalation of alerts. You don't write that software; you buy software that handles alert notification and escalates properly. It looks in a directory to see who's supposed to be notified about which facts. It will also escalate when there's a problem; for example, the system can be told to look for an acknowledgement.

So if you don't get an answer back from the person saying "I got the message," the system has to be smart enough to say, "Okay, who's the backup destination here? Who else should I send that notification to?" These systems can be very sophisticated and very powerful, so if you have something that really matters for some part of your business, you want this kind of system to make sure that an event gets delivered so the proper corrective action can be taken. And, again, the recipients can be people; they don't have to be systems.
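The escalation logic described above (pick a channel based on time of day, look for an acknowledgement, fall back to a backup contact) can be sketched as follows. This is a hypothetical Python illustration, not the behavior of any particular product: the 9-to-5 browser rule, the backup directory as a plain dictionary, and the `send` callback that returns True when the person acknowledges are all assumptions. A real system would wait for the acknowledgement with a timeout rather than get an immediate answer.

```python
from datetime import time
from typing import Callable, Optional

# Hypothetical: send(person, channel, message) -> True if the person acknowledged.
SendFn = Callable[[str, str, str], bool]


def choose_channel(at: time) -> str:
    """Hypothetical policy: browser during office hours, phone otherwise."""
    return "browser" if time(9, 0) <= at < time(17, 0) else "phone"


def notify_with_escalation(
    message: str,
    first: str,
    backups: dict[str, Optional[str]],
    at: time,
    send: SendFn,
    max_attempts: int = 5,
) -> list[tuple[str, str, bool]]:
    """Notify `first`; if there is no acknowledgement, walk the backup chain."""
    attempts: list[tuple[str, str, bool]] = []
    person: Optional[str] = first
    while person is not None and len(attempts) < max_attempts:
        channel = choose_channel(at)
        acknowledged = send(person, channel, message)
        attempts.append((person, channel, acknowledged))
        if acknowledged:
            break
        person = backups.get(person)  # who is the backup destination?
    return attempts


# Usage: simulate a directory in which only bob acknowledges.
directory = {"alice": "bob", "bob": "carol", "carol": None}
attempts = notify_with_escalation(
    "Pump 3 showing signs of failure", "alice", directory, time(22, 30),
    send=lambda person, channel, msg: person == "bob",
)
print(attempts)  # [('alice', 'phone', False), ('bob', 'phone', True)]
```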

Event-based systems have an implication for software. The software that's good at pushing information is different than the software that's used for pulling information.

To have event-driven systems, we're going to start using message-oriented middleware on a much broader basis than we're using today. There are a lot of different kinds of message-oriented middleware, but to work best for real time, it has to be software that's designed with the following characteristics. First, you have to have scalability, because you may have dozens or hundreds or thousands of senders, and dozens or hundreds or thousands of receivers for one particular fact or event. So in some cases we're talking about transmitting events the same way radio and television work, broadcasting the information to a large audience.

In many cases, you want exactly one delivery. If I buy 100 shares of stock, I want that transaction delivered once, not twice because I don't want to buy 200 shares. So you have to have software where the quality of service is built into the system. Again, that's not something that most communication mechanisms have today on a technical level. You have to add it in the application or you have to buy message-oriented middleware.
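One common way to get an exactly-once effect is to pair reliable (at-least-once) delivery with a receiver that discards duplicates. The minimal Python sketch below shows only the receiving side and assumes every message carries a unique ID assigned by the sender; the class and field names are hypothetical. In practice the set of seen IDs must be stored durably, ideally in the same transaction as the update it guards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TradeMessage:
    message_id: str  # unique per logical message (assumed to be set by the sender)
    account: str
    shares: int


class DeduplicatingReceiver:
    """Applies each message at most once, even if it is delivered more than once."""

    def __init__(self) -> None:
        self._seen_ids: set[str] = set()  # would need durable storage in practice
        self.positions: dict[str, int] = {}

    def handle(self, msg: TradeMessage) -> bool:
        """Return True if applied, False if this was a duplicate redelivery."""
        if msg.message_id in self._seen_ids:
            return False
        self._seen_ids.add(msg.message_id)
        self.positions[msg.account] = self.positions.get(msg.account, 0) + msg.shares
        return True


receiver = DeduplicatingReceiver()
buy = TradeMessage("order-42", "acct-1", 100)
print(receiver.handle(buy))  # True
print(receiver.handle(buy))  # False: a retry of the same message is ignored
print(receiver.positions)    # {'acct-1': 100}
```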

You also need dynamic reconfiguration: the ability to add, delete or move senders or receivers. That is not a property of most systems, and the fundamental reason it's not part of most technical systems is that most systems are connection-oriented. In most computer communication, the sender and receiver know who each other are. They know each other down to the TCP/IP address, down to the specific process, and so forth. There are direct links between the systems, and you need some sort of intermediary to allow things like dynamic reconfiguration to take place.

You'd also like a system that can change the sender's or receiver's view of the data without having to change the other side. What you'd like is a translation capability, so that something sent in one form arrives in a different form, such as a different format.
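A translation capability can be sketched as a function applied by the intermediary between sender and receiver, so that neither side has to change its own format. In this hypothetical Python example the sender uses `cust_no` and amounts in cents while the receiver expects `customerId` and amounts in currency units; the field names and the mapping are invented for illustration.

```python
from decimal import Decimal
from typing import Any, Callable

Message = dict[str, Any]


def sender_to_receiver_format(msg: Message) -> Message:
    """Hypothetical mapping from the sender's field names and units to the receiver's."""
    return {
        "customerId": msg["cust_no"],
        "amount": Decimal(msg["amt_cents"]) / 100,  # cents -> currency units
        "currency": msg.get("ccy", "USD"),          # default when sender omits it
    }


class TranslatingChannel:
    """Sits between sender and receiver so neither has to change its format."""

    def __init__(self, translate: Callable[[Message], Message],
                 deliver: Callable[[Message], None]) -> None:
        self._translate = translate
        self._deliver = deliver

    def send(self, msg: Message) -> None:
        self._deliver(self._translate(msg))


received: list[Message] = []
channel = TranslatingChannel(sender_to_receiver_format, received.append)
channel.send({"cust_no": "C-1001", "amt_cents": 12345})
print(received[0])  # {'customerId': 'C-1001', 'amount': Decimal('123.45'), 'currency': 'USD'}
```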

Then, of course, you want store-and-forward capabilities, because you can't guarantee that System A and System B are up at the same time. With traditional computer systems, most of the traffic that happens inside a computer is very tightly coupled. Both sides had better be alive and running at the same time, or else the transfer is not going to take place. If I try to send a message or ask a question and the other system isn't running, I'm out of luck. The bits fall on the floor, and the communication stops. With store-and-forward, that doesn't happen: something in the middle holds the message in a queue, in a temporary database, until it can be delivered.
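The store-and-forward behavior can be sketched as a queue held by an intermediary: the sender hands off its message and moves on, and the queue forwards messages in order whenever the receiver is available. This Python sketch keeps the queue in memory for brevity, whereas message-oriented middleware would persist it; the class names and the `is_up` check are hypothetical.

```python
from collections import deque
from typing import Callable


class StoreAndForward:
    """Intermediary that queues messages while the receiver is unavailable."""

    def __init__(self, receiver_is_up: Callable[[], bool],
                 deliver: Callable[[str], None]) -> None:
        self._queue: deque[str] = deque()  # a real product would persist this
        self._receiver_is_up = receiver_is_up
        self._deliver = deliver

    def send(self, message: str) -> None:
        """Accept the message immediately, then try to forward it."""
        self._queue.append(message)
        self.flush()

    def flush(self) -> int:
        """Forward queued messages in order while the receiver is up."""
        delivered = 0
        while self._queue and self._receiver_is_up():
            self._deliver(self._queue.popleft())
            delivered += 1
        return delivered

    def pending(self) -> int:
        return len(self._queue)


class SystemB:
    """Hypothetical stand-in for the receiving system."""

    def __init__(self) -> None:
        self.up = False
        self.inbox: list[str] = []

    def is_up(self) -> bool:
        return self.up


system_b = SystemB()
link = StoreAndForward(system_b.is_up, system_b.inbox.append)
link.send("order 1")
link.send("order 2")   # System B is down, so both messages wait
print(link.pending())  # 2
system_b.up = True     # System B comes back
print(link.flush())    # 2
print(system_b.inbox)  # ['order 1', 'order 2']
```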

Additionally, what I call publish-and-subscribe is also very helpful for event-driven systems. With publish-and-subscribe, the subscribers define what kind of information they want to receive, and they register it with a central authority, for example message-oriented middleware or some other mechanism. What kind of information am I interested in? What are the criteria for the stuff I want to hear about?

The publishers, on the other hand, don't have to know anything about the receivers. They don't have to know who they are, they don't have to know how many there are, and they don't even have to know whether they're there. The publisher creates information and throws it over the transom, and the middleware in the center reconciles the two sides: it takes the messages, applies the subscription criteria and rules, and sends each message where it's supposed to go. So it's kind of like a magazine subscription, but not exactly. In a magazine subscription, the writers write the stuff, they put it in a magazine, and the magazine distributor has the distribution list for that information.

Further, publish-and-subscribe is many-to-many. You can have many different people sending that kind of message and many different people receiving that kind of message, and that can change dynamically during the day. Every second, you can add more senders and more receivers. It's a very powerful communication mechanism. If you don't have it, some kinds of event-driven processing are not possible. So you'll be doing more publish-and-subscribe, again both on a business level and on a technical level, to make the event-driven enterprise actually work.
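A toy, in-memory version of the publish-and-subscribe pattern shows the key properties described above: publishers name only a topic, subscribers register a topic plus optional criteria, and subscribers can be added or removed at any time. This Python sketch is an illustration under those assumptions, not the API of JMS or any other messaging product; a real broker adds persistence, networking and delivery guarantees.

```python
from itertools import count
from typing import Any, Callable

Message = dict[str, Any]
Predicate = Callable[[Message], bool]
Handler = Callable[[Message], None]


class Broker:
    """In-memory publish-and-subscribe hub; publishers never see subscribers."""

    def __init__(self) -> None:
        self._subscriptions: dict[int, tuple[str, Predicate, Handler]] = {}
        self._next_id = count(1)

    def subscribe(self, topic: str, handler: Handler,
                  criteria: Predicate = lambda m: True) -> int:
        """Register interest in a topic, optionally filtered by criteria."""
        sub_id = next(self._next_id)
        self._subscriptions[sub_id] = (topic, criteria, handler)
        return sub_id

    def unsubscribe(self, sub_id: int) -> None:
        self._subscriptions.pop(sub_id, None)

    def publish(self, topic: str, message: Message) -> int:
        """Deliver to every matching subscriber; return how many received it."""
        delivered = 0
        for sub_topic, criteria, handler in list(self._subscriptions.values()):
            if sub_topic == topic and criteria(message):
                handler(message)
                delivered += 1
        return delivered


broker = Broker()
all_orders: list[Message] = []
big_orders: list[Message] = []
broker.subscribe("orders", all_orders.append)
big_id = broker.subscribe("orders", big_orders.append,
                          criteria=lambda m: m["total"] >= 1000)
print(broker.publish("orders", {"id": 1, "total": 250}))   # 1
print(broker.publish("orders", {"id": 2, "total": 5000}))  # 2
broker.unsubscribe(big_id)  # receivers can come and go at any time
print(broker.publish("orders", {"id": 3, "total": 9000}))  # 1
```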

Picking Products

Are there products that help drive some of this?

Traditionally, I would say, these products had three fairly distinct personalities. Some products were meant for very extreme situations with lots of senders and lots of receivers: we're talking tens of thousands of messages a second in some cases, many hundreds of senders or more, and many thousands of destinations.

We also have general-purpose, information-system-type message queuing systems, of which the most widely known is IBM's MQSeries, now called WebSphere MQ. You also have products of the same general genre, like Microsoft's MSMQ. There were a dozen others. This was a whole set of vendors that came out in the late '80s and early '90s. Most of those systems have disappeared. There's been a lot of market consolidation here, a lot of it due to IBM's strength in promoting its product across many different platforms.

Last but not least, we have the newer generation of middleware, the JMS-type messaging systems. These messaging systems are based on the Java Message Service (JMS) standard, and they can be implemented a lot of different ways by a lot of different vendors. Some are even implemented as a layer sitting on top of other products. JMS is not a specification of how you do messaging internally; it is a standard that describes the behavior of a particular type of messaging system.

These different kinds of products have actually come together a lot in the last several years. In the past, most of them did not do publish-and-subscribe. Now they all do, or almost all do.

So if you just did a checklist of features on paper, they'd all get a check, but if you actually asked how well they worked, you'd still see some fairly significant differences in the personality of these products: their scalability, their security and how easy they are to manage.

A lot has been said about the next level of automation at companies and how it will create, in effect, an "enterprise nervous system."

The enterprise nervous system is the idea of the intelligent network. It says, "We've got these smart application systems, there's logic and there's data in those application systems, but now we're going to put some logic and data outside of the applications,"—you can say "in the network"—"and that makes us an enterprise nervous system."

So the network has some process smarts, and it has some semantic smarts because it can do transformation. The backbone of this in many cases is going to be some sort of message-oriented middleware; more of the traffic is going to flow over that than over anything else.

So we're going to see message-oriented middleware used within the enterprise, and by extension across enterprises in this worldwide grid. If every enterprise builds its own enterprise nervous system, and every business is connected internally and to its business partners through a smarter network that can do these things, you'll step back and say, "Wow, they're all connected to each other."

And what we essentially have is one network for the entire planet, and every network you see is just a sub-net of this worldwide network. And then the sub-networks are smart, and essentially what you're going to get is a network grid across the entire world that's a smart grid.

The products with the qualities we've been talking about here, this message-oriented middleware, are changing. In fact, we would say that in some cases they're going to fade because they're morphing into something much different. Plain message-oriented middleware came out as a commercial product in the late 1980s. These products spread because they were embedded in application systems and hidden under things like network and system management tools.

Around 1996, when we started using them for integration purposes, they became a little more visible. When they were added to Java application servers in about 2000, they became ubiquitous. Suddenly, anybody who was just buying an application server was getting a messaging system.

They may not have been using it in every case, but they were getting it. So you can get messaging in an application server, or you can buy a specialist product that's dedicated to doing messaging exceptionally well.

In the future, these systems are changing their nature. The name of the game is messaging systems and Web services systems coming together to create what we're calling enterprise service buses. Companies like Cape Clear, Sonic Software and SpiritSoft are coming up with products that are meant to talk Web services, so you can be a client or a server of a Web service and talk to these communication mechanisms.

But the quality of service they offer goes well beyond plain SOAP. In addition to plain request/reply interactions, these products are capable of store-and-forward and publish-and-subscribe: the kinds of communication patterns you need on a technical level to implement the business strategies you want. So we expect to see these services spread across many companies to help enable the event-driven enterprise.

At the moment, enterprise service buses are coming from some very small vendors, but we expect that over the next 12 to 15 months major vendors will start coming out with these products. Each will be a Web services system and a messaging system, and it will also have some of the basics you need for application integration. As these products come out, they will compete against some of the traditional integration middleware.