Best Practices for AI in Digital Transformation

The IoT Community (Internet of Things Community) was thrilled and honored to host Dr. Kristof Kloeckner, retired IBM CTO, at the IoT Slam Live 2019 event last month. Dr. Kloeckner delivered one of the most compelling presentations on IoT and AI you are likely to see. Here is the full transcription of the keynote. We invite you to sign up for complete IoT Slam Live 2019 On-Demand Access here: 

Session Abstract:

Service-based industries are under intense pressure to improve client interactions and automate service delivery. AI provides powerful tools to transform data into insights and to manage enterprise knowledge bases. Introducing AI into digital transformation initiatives requires a holistic approach, with focus on culture, organization and processes. In this presentation, we are going to discuss experiences from using AI in the transformation of the IT services lifecycle that can be applied to other industries as well.

Full transcript of presentation.


I’m delighted to be here and to follow such engaging speakers. This morning, we heard about the importance of IoT. We heard about the importance of AI for IoT.

What I want to do in the next half hour is talk about some best practices for AI in digital transformation that we learned when trying to transform the IT services life cycle.

I’m going to talk about my experience as the CTO of IBM Global Technology Services, a large IT Infrastructure Services provider, and address not just the technology aspects, but also what it takes to really get the use of AI accepted by the technical, the professional, the executive, the business community and the end users and what it takes to bring lasting transformation results.

While my experience is in IT services, I believe it can easily be translated to other services businesses and certainly to other IoT environments. After all, IT is an IoT Environment. It consists of connected, intelligent devices, very complex environments, very large environments, especially in service providers that have to cater to thousands of distinctive enterprise clients.

First, I am going to give a short overview about applying AI to the IT Services life cycle. Then I’m going to talk about how we applied lessons learned from other large strategic transformations to take a holistic and agile approach not just to technology, but also to communications, organization and skills.

In the end, this is all about establishing a data driven culture. Utilize the data that you amass to derive actionable insights that drive automation. This transforms the way the organization works and interacts with its clients and its end users.

I also want to address a very important problem that I believe always gets underestimated. AI augments human experts, and it is trained by human experts.

So, unless you foster knowledge communities, unless you essentially turn yourself into a knowledge based enterprise that takes the knowledge life cycle seriously, you may be successful with the first implementation of AI, but it will quickly turn stale and it won’t get lasting acceptance.

So just a very brief overview of the IT services life cycle: these are typically very large and complex environments. I mentioned service providers with hundreds of thousands, even millions, of connected devices, tens of millions of events per month.

And that is really the crux. You have a huge volume problem, because out of those tens of millions of events, only about 10% result in something that you need to take action on. Right? So that’s the reactive side. That’s sense and response.

The other side, of course, is: Can you get into predicting potential problems and build better systems based on these insights? And again, I think that applies to almost any IoT scenario.

The last element is these environments change all the time, and the requirements on these environments change all the time.

Remember when you talk about digital transformation, you are talking about an enterprise delivering its services in a digital fashion, so it relies more and more with its whole business model and with its business reputation on availability and performance of a digital infrastructure. In other words, an IT infrastructure.

And yet again, this becomes so complex that an individual or even a group of individuals cannot really combine all the necessary knowledge to build, maintain and support these systems. We are dealing with fragmented knowledge and very noisy data, because the situation where a device creates an event with well structured information, is not necessarily the norm.

So if you don’t actually deal with the data in a responsible, thoughtful way early on, you’re going to run into problems.

I can’t overemphasize the importance of data science in this context. You have to start with the data, right?

There is an old and well used phrase that data is the new oil. I’ve heard an expression building on this, that AI is the new electricity.

You have to turn oil into something useful.
And electricity drives machinery, drives devices, drives everything.

So, this complexity and the accelerating business cycles I talked about make management extremely difficult.

And again, that brings you to the point where you have to automate.

You have to automate because humans can’t react quickly enough to 10,000,000 events a month. How many people do you need to handle this?

More importantly, humans being dragged into doing rote, almost trivial work over and over again saps energy from these humans and prevents them from addressing proactively the more important problems.

What we’re also looking at here, when I talk about automation, is yet another instance of industrialization. So we’re actually talking about industrialization of IT.

And I think it’s very important to understand that you cannot automate unless you’ve simplified and standardized before. That in itself is a huge cultural shift.

And that, quite frankly, is something that cloud does for us or helps us realize and helps us implement.

I was the first CTO of cloud at IBM, and the most visionary CEOs that I talked to in the early days were essentially saying, Look, Cloud is interesting technology, but what it really allows me to do is drive the economics of standardization and simplification, right?

So that’s the backdrop to all of this.

That’s also the backdrop of any kind of feasible IoT.

Ultimately, what this means is moving from a manual implementation in a totally people based model to a technology based business model that augments the decision capabilities of people.

So when you think about the big use cases of applying AI to IT services there are essentially three major areas, three major challenges.

One is a big data problem, a volume problem.

One is a complexity problem,

and one is a human-machine, human-system kind of interaction problem.

So if you take it from the top, I was talking a lot about large volumes of unstructured and noisy data.

In IT, the standard currency is actually the problem ticket.

So you have to understand:
Firstly, what category does this ticket fall into?
There’s a lot of variation in there that is actually not very important, that you have to filter out.
And secondly, do you have an automation that could handle it? You have to decide that on the fly. Of course, automation doesn’t just happen. You have to write automation, so you need to have something that guides you to where automation makes sense. We used AI and deep analytics to tell us what really makes sense to automate.
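To make the two triage questions concrete, here is a minimal sketch in Python. It is an illustration and not the tooling described in the talk: the category names, keyword lists and automation names are all hypothetical, and simple keyword matching stands in for the trained classifier a real deployment would need.

```python
import re

# Hypothetical categories and keywords, for illustration only.
CATEGORY_KEYWORDS = {
    "disk_full": ("disk", "filesystem", "no space"),
    "password_reset": ("password", "locked out", "cannot log in"),
    "service_down": ("not responding", "service down", "unreachable"),
}

# Hypothetical registry: which categories already have an automation.
AUTOMATIONS = {
    "disk_full": "cleanup_temp_files",
    "password_reset": "reset_password_workflow",
}


def normalize(text):
    """Filter out variation that does not matter for categorization."""
    text = text.lower()
    text = re.sub(r"\b\d{1,3}(\.\d{1,3}){3}\b", "<ip>", text)  # IP addresses
    text = re.sub(r"\d+", "<n>", text)  # any remaining numbers
    return re.sub(r"\s+", " ", text).strip()


def categorize(text):
    """Question 1: which category does this ticket fall into?"""
    cleaned = normalize(text)
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(keyword in cleaned for keyword in keywords):
            return category
    return "unknown"


def triage(text):
    """Question 2: is there an automation for it? None means a human handles it."""
    category = categorize(text)
    return category, AUTOMATIONS.get(category)


print(triage("Disk /var on 10.0.0.12 is 98% full"))
print(triage("Service DOWN: web01 not responding"))
```

Counting how often each category comes back with no automation is one simple way to see where writing a new automation might pay off.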

And interestingly enough, automation also then allows you to standardize on the best possible response; when you have a person responding they may have, in the best case, some procedure written down, but it may not be optimal.

When you automate, you define an optimized standard operating procedure that allows you to arrive at optimal standard responses. Reliable responses, predictable responses. That’s a major transformative factor for service quality.

Incident management, automation, root cause analysis, going back through historical data, risk prediction. We found very interesting counterintuitive results by looking at risks of changes.

Everybody knows that most critical incidents happen because some change goes wrong, right?

But what organizations typically do to classify the risk of changes is entirely inadequate.

We found going through historical data that very frequently, at least in our environment, the truly risky changes were classified as low risk and incidents happened based on these supposedly low risk things.
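A check of this kind can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout (a declared risk class plus a flag for whether the change caused an incident) is hypothetical, and the sample history is invented for the example, not data from the talk.

```python
from collections import defaultdict


def incident_rate_by_risk_class(changes):
    """Compare declared risk classes with what actually happened.

    `changes` is an iterable of (declared_risk, caused_incident) pairs.
    Returns the observed incident rate for each declared risk class.
    """
    totals = defaultdict(int)
    incidents = defaultdict(int)
    for declared_risk, caused_incident in changes:
        totals[declared_risk] += 1
        if caused_incident:
            incidents[declared_risk] += 1
    return {risk: incidents[risk] / totals[risk] for risk in totals}


# Invented sample history, for illustration only.
history = [
    ("low", True), ("low", False), ("low", True), ("low", False),
    ("high", False), ("high", True), ("high", False), ("high", False),
]
print(incident_rate_by_risk_class(history))
```

If the observed rate for the "low" class comes out higher than for the "high" class, as in this invented sample, the classification scheme deserves a second look.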

All right, so detecting patterns in large volumes of unstructured and noisy data is a typical example for AI, in the industrial sector as well.

Next, understanding bodies of complex documents.

Well, human knowledge, organizational knowledge is usually captured and written down somewhere. You can get lucky and have a disciplined architect team that writes very good architectural documents. But don’t rely on that.

And don’t rely on meaningful root cause analysis documents and so on and so on.

You need a lot of preparation of these large bodies of complex documents, and just providing the right kind of information to subject matter experts at the right time, in a sense, creates a virtual buddy for this person that can share with them assembled knowledge of their professional community.
I think many of the problems that we are talking about at this conference are similar.
For instance, applying AI to the medical profession falls into the same space.
AI can give you a virtual buddy or colleague, a second opinion.

AI can flag problems that you yourself then have to make a decision on.

So let me just give you two examples.

We built a tool that pulls together information across the entire community to give IT architects that go into a client and do technical health checks the right information to do these health checks much faster, because you’ll have prepared outcomes already that you need to just verify and extend.

We built a tool that analyzes requests for proposals.

Anybody that has ever worked in a project based business knows, and dreads, ‘Requests for proposal’. Usually they are hundreds of pages, often with contradictory requirements.
Just getting all the requirements, the explicit and implicit requirements, is quite a task.

So we developed a tool with the help of our subject matter experts that essentially extracts concepts and extracts requirements, maps them to domain ontologies and then going forward, maps them to catalogs of solution components and comes up with a skeleton of a response document.

Now this is actually a fundamental advance because, instead of three weeks of frustrating back and forth, you can come back with a document within a day or so and start solution co-creation with your client. So that’s truly a transformative element.

Analysis of contracts. Well, we don’t let AI write contracts, but AI can raise red flags, right? So it is a second source of knowledge – a second opinion about the risks inherent in certain types of contracts.

We found that this really helps us remove risk factors and accelerate the engagement process.

And we don’t have very firm data yet, but this is actually something that increases revenue.

So the business results are quite compelling.

And last but not least, conversational interfaces.

That’s probably the one thing that gets the most attention, because of all the digital assistants in the consumer space, or sometimes even in the IT space, with cute names and avatars.

But what this is really all about, is understanding of the user’s intent and mapping it to the right response.

And you can start with very trivial requests, like ‘reset the password’. That, incidentally, happens to be the most high frequency service request for IT help desks. Usually people don’t say reset my password, please. They say I can’t access this or this doesn’t work, and so on.

The first step is understanding intent and then, even more difficult, maintaining a kind of diagnostic conversation over several steps.

So that requires context understanding to some degree. It’s actually a very nifty technology challenge and let me talk about technologies very briefly here.
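As a rough illustration of the first step, mapping what a user says to an intent, here is a minimal Python sketch. It is not the approach used in the talk: the intent names and example phrases are hypothetical, and word overlap (Jaccard similarity) stands in for a trained language model. It also handles a single utterance only, not a multi-step diagnostic conversation.

```python
import re

# Hypothetical intents and example phrasings, for illustration only.
INTENT_EXAMPLES = {
    "reset_password": [
        "reset my password",
        "i can't access my account",
        "i am locked out",
    ],
    "report_outage": [
        "the application doesn't work",
        "the service is down",
    ],
}


def tokens(text):
    """Lowercase word set; apostrophes are kept so "can't" stays one token."""
    return set(re.findall(r"[a-z']+", text.lower()))


def jaccard(a, b):
    """Share of words two token sets have in common."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def detect_intent(utterance, threshold=0.3):
    """Return the best matching intent, or None if nothing is close enough."""
    words = tokens(utterance)
    best_intent, best_score = None, 0.0
    for intent, examples in INTENT_EXAMPLES.items():
        for example in examples:
            score = jaccard(words, tokens(example))
            if score > best_score:
                best_intent, best_score = intent, score
    return best_intent if best_score >= threshold else None


print(detect_intent("I can't access my email"))
print(detect_intent("please order a new laptop"))
```

Returning None below the threshold matters as much as the match itself: it is the point where the assistant should ask a clarifying question or hand over to a person instead of guessing.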

What we found is that just naively picking one technology approach is not enough.

For instance, we picked convolutional neural networks for pattern detection. Yes, that’s a good starting point, but you really have to understand and prepare your training data and ultimately build a pipeline of technologies to deliver a robust result.

But again, the good news is that you don’t actually need to reinvent much technology because a lot of it is out there. As Mac Devine said in the beginning, there’s a power of open source, right?

For almost everything you need, you can at least find a starting point in places like GitHub or in the communities built around frameworks like Keras or TensorFlow.

No need for a company to suddenly become a deep expert in coding new types of neural networks. There are commercial cloud services, and there are communities like Kaggle that run competitions on standardized data sets and publish the best solutions for certain types of problems. All very, very helpful to get things started.

Again, the benefits of applying AI in IT services are evident.

Here are some of our results.

We deployed en masse, across several thousand clients.

We managed to filter out the 10% of relevant events through event correlation and auto-ticketing.
We managed to reduce the resolution time, actually the mean time to recovery, by about 84%. We found this reduction is quite stable across many incident types, 80 to 90% reduction of response time.
We extend continuously the range of incidents that we can handle automatically.
Now, we are at about 70%. 70 – 75 maybe 80% is a realistic goal. The rest will very likely always require some human interaction.
And talking of human interaction:
We have a ‘support staff assist’ that actually prepares information, pulls together information for support staff in Technology Support Services who, incidentally, go more and more into supporting IoT environments outside of IT. In this way, we managed to reduce their response times by 37%. It has probably gone up since then; this figure is about two years old.
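The event correlation mentioned in these results can be illustrated with a very small Python sketch. It assumes one simple rule that is not taken from the talk: repeats of the same kind of event on the same device within a time window belong to one problem and should open only one ticket. The `Event` fields and the five-minute window are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    timestamp: float  # seconds since some reference point
    device: str
    kind: str


def correlate(events, window_seconds=300.0):
    """Keep one ticket candidate per burst of identical events.

    An event is suppressed if the same (device, kind) was seen within
    `window_seconds` before it. The window slides, so a continuous storm
    of repeats stays collapsed into its first event.
    """
    last_seen = {}
    tickets = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.device, event.kind)
        previous = last_seen.get(key)
        if previous is None or event.timestamp - previous > window_seconds:
            tickets.append(event)
        last_seen[key] = event.timestamp
    return tickets


# Invented sample stream, for illustration only.
stream = [
    Event(0, "db01", "cpu_high"),
    Event(30, "web01", "disk_full"),
    Event(60, "db01", "cpu_high"),
    Event(120, "db01", "cpu_high"),
    Event(1000, "db01", "cpu_high"),
]
print(len(stream), "events ->", len(correlate(stream)), "ticket candidates")
```

Real correlation engines also link different event types that share a root cause; this sketch covers only the simplest case of suppressing duplicates.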

So you see, the stakes are really high and the rewards are really high.

This picture shows a little bit our philosophy.

Remember the picture with the two halves of the brain that Mac Devine showed in the beginning? This is really about partnering a cognitive brain and AI with humans.

And on the left hand side, you essentially see augmenting human intelligence with AI to drive better decisions with the human retaining the decision rights.

On the right hand side, you see increasingly autonomic reactions, and this is really what we’re trying to drive towards: making the autonomic vision finally a reality by turning IT service management into a set of interconnected feedback loops.

In this next chart, when you go from the inside to the outside, you move essentially from sense and response, that is reaction to incidents, to proactive improvements.

That is the middle loop of continual improvement. And last but not least, building better systems in the first place.

To repeat:

The first is sense and response.
It’s AI insights directly connected to automation, AI brain and automation muscle.

The second is essentially continuous learning, and the third is we’re taking a step back and considering the entire scope of the system to build a better one.

And in applying this learning, information obviously moves upwards through all these circles, right?

What you learn from historical data of sense and response, you can apply to improvements, both short term or mid term and long term. So that’s where we are moving to.

Well, if you’re interested in some of the technologies around it and some of the choices that we had to make, we published a short article in the IBM Journal of Research and Development and published a short monograph.

So now let me talk briefly about the holistic and agile approach that we took. And yes, the terms holistic and agile are a bit overused, but I’m using them consciously in the sense that we can actually apply the lessons learned from agile transformation, from lean startup approaches and so on to digital transformation with AI and IoT.

It is very important to communicate the strategy from the top. It sounds almost trivial, but this really sets the stage right. This is not the flavor of the year, where the CTO or even the CEO has read something in an airline magazine and now wants it applied in some toy project.

This is about foundational changes towards knowledge based and data driven approaches. So unless you ground it in the data strategy, your AI strategy will fail.

You have to establish this early. You also have to address what AI can and cannot do and what purpose it serves.
Otherwise your transformation is going to fail.

And in that context, I’m very pleased to see that the community, at large, is now focusing on the ethics and the implications of implementing trustworthy AI.

When you think about the old mantra of autonomic computing, we talked about visibility, control and automation.

Visibility is transparency, right? Control is you have to set the criteria for what provides benefits and then you can automate correctly. When we introduced AI into IT services, we made some mistakes along the way.

For instance, we totally underestimated the importance of visibility. I thought, ‘Oh, we’re going to automate this. We don’t need dashboards.’ Totally wrong.

People actually wanted to see what’s happening so that they could see that automation actually helped them.

But this is the kind of thinking that needs to go into applying AI.

And this is the kind of communication that needs to go into it, as well.

And last, but not least, unless you have deep engagement with your experts and address their fears and the exaggerated expectations up front, you’re also going to fail.

I’ve actually talked to many people that run projects and they say that expert engagement is one of the critical success factors, or actually lack thereof is one of the major derailment factors.

And then clearly articulate expected, measurable business outcomes.

And this is something that I learned early.

You have to take an iterative approach.

You have to take a lean startup approach.

And just as we heard from the last speaker, start with something that is simple enough that you can actually see your way through an implementation, but that also has significant benefits.
That isn’t just written off as a toy in some area that has no relevance whatsoever.

So choosing your first project well is critical, as is engaging the stakeholders, and stakeholders include both representatives of end users and experts and business leaders.

I mean, this is very much common sense, but it gets forgotten way too often.

You cannot have a CTO team sort of run away with the technology – with technology enthusiasm.

And again, I cannot emphasize enough the importance of design thinking.

Think about how people are using your advisory tool.

How are they going to give feedback?

How are you going to have this feed into a continuous improvement loop for your tool?

And last but not least, make AI part of an optimization of operational processes.

If you just put AI as the icing on the cake, you’re probably going to fail.

Don’t try to solve problems of complexity with AI that you could have solved with simplification right away.
I think, this is something that you have to constantly teach yourself.

You need to ensure a critical mass of skills.

Again. You have to have data science skills because if you use the wrong data, or use your data in the wrong way, you’re going to get the wrong output from your AI systems.

You’re going to introduce bias.

You’re going to introduce strange random effects, and you’re going to lose all trust, right?

Ensure critical mass of skills but make this a community effort.

Yes, you need a center of competency.

But the center of competency needs to organize the acquisition of competency in the entire organization.

I have two examples for this.

I recently read that the country of Finland has a very ambitious program to have every citizen gain a minimal understanding of AI.

I mean, they are starting small. They’re starting with 10,000 decision makers. But, you know, just think about this vision for your country, and you can apply this to your organization.

The lab director of a lab I once belonged to essentially encourages his entire technical population to take the foundational machine learning courses on Coursera.

All right, so I mean that is what you really need.

And again, it needs to be grounded in the goal of making sense of the data that you accumulate in your operations (and that your clients allow you to touch).

And in the outcomes, the business outcomes that follow from there. And again: open source.

I cannot overemphasize the importance of open source.

There’s no need to reinvent the wheel; there are wheels of all kinds, of all sizes and embellishments, available on GitHub.

There’s an increasing availability of technical and project best practices.

I very much like Andrew Ng’s “Machine Learning Yearning”. It tells you the major stumbling blocks you need to be aware of, so there is no need to repeat all the mistakes that somebody else has already made.

So here are a few obvious derailment risks.

I think the most important one is really choice of problem.

You choose something that isn’t well defined, that doesn’t have enough data, that has data that is way too noisy. You’re going to fail on technical grounds.

If you’re overly ambitious, you are going to fail in terms of acceptance.

You have to understand that it’s much, much easier to augment than to replace.

And if you want to replace, it’s easier to replace a task than an entire process – than an entire role, a group of people.

But I think that’s where, you know, some of the healthcare AI fell on its face, by at least implicitly aiming at replacing doctors rather than providing them with a virtual assistant.

Also, don’t underestimate massive integration efforts.

That’s where platforms come in.

If you can have a structured approach for data collection and preparation, if you’ve got APIs into your data lake and so on, that is going to help you.

Access to and engagement with experts. I talked about that.

And, you know, be humble.

Be wary of all the mistakes you’re going to make, however good you are.

We had a very highly trained research team that overtrained their neural networks.

So, you will make those mistakes.

But be aware what types of mistakes.
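Overtraining is one of the mistakes that has a well known guard: stop training when the loss on held-out validation data stops improving. The Python sketch below shows that early stopping rule on a plain list of numbers. It is a generic illustration, not what the research team in the talk did; the `patience` parameter and the sample losses are assumptions made for the example.

```python
def early_stopping_epoch(validation_losses, patience=2):
    """Return the epoch with the best validation loss seen before stopping.

    Training is considered stopped once `patience` epochs in a row have
    passed without the validation loss improving on the best value so far.
    """
    best_epoch = 0
    best_loss = float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop here
    return best_epoch


# Invented validation losses, one per epoch, for illustration only.
losses = [0.90, 0.70, 0.60, 0.65, 0.70, 0.50]
print(early_stopping_epoch(losses, patience=2))
```

Note the trade-off in the sample: with a patience of 2 the rule stops before the late improvement in the last epoch, while a larger patience would have waited for it. Deep learning frameworks ship their own versions of this rule, so in practice you would use theirs instead of writing your own.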

Then again, coming back to building a data driven culture.

In the end, it’s turning your environment into a continuous feedback loop.

Let the data drive you to the decisions. Make data and insights visible.
As I mentioned before, instrument and monitor your business processes end to end. You’d be surprised how little of that is actually happening in the average IT shop.

If you don’t instrument, you’re not going to get the kind of insights that you need.
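Instrumenting a process can start very small. The Python sketch below wraps each step of a process in a timer and collects the durations so they can be shown on a dashboard. The step names and the in-memory store are hypothetical; a real setup would send these measurements to whatever monitoring system is already in place.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# In-memory store of step durations in seconds (illustrative only).
step_durations = defaultdict(list)


@contextmanager
def instrumented(step_name):
    """Record how long the wrapped block takes, even if it raises."""
    start = time.perf_counter()
    try:
        yield
    finally:
        step_durations[step_name].append(time.perf_counter() - start)


def summary():
    """Count and mean duration per step, ready for display."""
    return {
        name: {"count": len(values), "mean_seconds": sum(values) / len(values)}
        for name, values in step_durations.items()
    }


# Hypothetical process steps, for illustration only.
with instrumented("receive_ticket"):
    time.sleep(0.01)
with instrumented("resolve_ticket"):
    time.sleep(0.02)
with instrumented("resolve_ticket"):
    time.sleep(0.02)

print(summary())
```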

I’m sure that you can apply that ultimately, to everything else.

Now let’s talk about fostering knowledge communities.

I think we learned how important it is to engage strong professional communities, to actually make it part of the role of a technical leader to foster knowledge sharing and to work on these tools.

And, quite frankly, build tools that actually provide direct benefits to the users, which in turn gives them an immediate reward for sharing their knowledge.

In summary, I hope I’ve been able to show you that applying AI to the transformation of IT brings substantial rewards, and in particular, that the lessons learned can be applied to any services based business and to IoT.

Thanks very much.



About Dr Kristof Kloeckner:

Dr. Kloeckner has extensive experience in building and transforming global teams and businesses and advising clients on digital transformation. He has led many innovative software products through their entire lifecycle from incubation to market leadership and has had P&L responsibility for software businesses. He has held executive leadership positions in business and technology in Germany, the UK and the USA. Until his retirement in September 2017 as Chief Technology Officer of IBM Global Technology Services, he was responsible for building an AI-driven technology platform for next generation infrastructure services for hybrid cloud environments, enabling a shift from a people-based to a technology-based business model for services. During his 33-year career with IBM, he has been at the forefront of technology innovation. As General Manager, Rational Software, he transformed IBM’s development tools portfolio to DevOps and Continuous Engineering for IoT. As CTO of IBM’s first Cloud Computing initiative he created the Beta for IBM’s first public cloud. As CTO of IBM Software Group, he introduced lean and agile development at scale throughout the company. As development leader for Tivoli, he evolved the portfolio towards service management. As CTO for WebSphere he developed the technical strategy for service-oriented architectures within IBM’s application integration middleware platform. As Director of the Hursley Lab in the UK and VP of Business Integration Development, he grew IBM’s message-oriented middleware to a full integration portfolio including IBM’s first Enterprise Service Bus. As Director of the German Software Lab, he developed IBM’s first workflow management system. Dr. Kloeckner holds a PhD in Mathematics from Goethe University Frankfurt and an honorary Doctorate of Science from the University of Southampton. He is also an Honorary Professor in the Department of Computer Science at the University of Stuttgart. 
He co-wrote a book on ‘Transforming the IT Services Lifecycle with AI Technologies’.