We sat down with Steven Astorino, IBM VP of Development, Hybrid Cloud, z Analytics, and Canada Lab Director, to chat about his work in machine learning and his upcoming talk at Reactive Summit in Montreal.

So, first of all, could you talk about your background? What initially drew you to machine learning and data science?

Sure. One aspect of my job really revolves around data science and machine learning. I manage IBM products like Data Science Experience, Watson Studio, SPSS Decision Optimization, and Watson Explorer; that's all on the AI, machine learning, and data science side of the house. And then I also have DB2 on the mainframe, for example, some of the most mature products in the world.

My background is in computer science. I was a programmer early in my career. Analytics has always been kind of close to me, in terms of having a real understanding of things and being able to drive insights out of information and data, whether it's on a personal level or whether it's enterprises and so forth. I've worked in different industries as well.

But most recently in this role, actually going back a few years now, we were looking at the next generation of what we should be building. IBM had a large involvement with Apache Spark and the Spark Technology Center, contributing to open source and being able to do analytics and machine learning. Through there, we really started looking at what tooling could be made available. I'm not sure if you're aware, but everything we do now at IBM is really about design thinking and a design-first methodology: understanding who the personas are, what their job is, interviewing those individuals, and really trying to create a fit-for-purpose experience that makes it much easier for them to do their job.

At that time, we were going back to which persona we should be looking at. "Data scientist" was one of the key personas we felt needed to be focused on. That's where products like Data Science Experience emerged, providing an IDE-type environment for data scientists. It's evolved into much more than that as we added IBM Research technology to it and automated the machine learning side of operationalizing and productizing models. We can push models into production, monitor them, retrain them, and so forth.

That's how this came about. Obviously, there's a real need for this. There are a lot of companies out there, including ourselves, that focus directly on the algorithms. We have a core set of algorithms that are very key to machine learning and data science. But we also focus on the tooling and the automation: providing an end-to-end lifecycle, being able to ingest the data, clean the data, build your models, and then productize them and repeat that iterative process.
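The lifecycle described here (ingest, clean, train, push to production, monitor, retrain) can be sketched generically. The following is a minimal stdlib-only illustration of that loop, not the API of Data Science Experience or Watson Studio; every function and data value in it is hypothetical.

```python
import statistics

def clean(rows):
    """Drop records with missing values (a stand-in for data preparation)."""
    return [r for r in rows if None not in r]

def train(rows):
    """'Train' a trivial model: classify x against a single learned threshold."""
    threshold = statistics.mean(x for x, _ in rows)
    return lambda x: 1 if x >= threshold else 0

def accuracy(model, rows):
    """Fraction of labeled rows the model gets right."""
    return sum(model(x) == y for x, y in rows) / len(rows)

# One pass through the lifecycle: ingest -> clean -> train.
history = [(1.0, 0), (2.0, 0), (3.0, 1), (None, 1), (4.0, 1)]
data = clean(history)
model = train(data)

# "Monitoring": if accuracy on fresh data drifts below a bar, retrain on it too.
fresh = [(0.5, 0), (2.0, 1), (5.0, 1)]
if accuracy(model, fresh) < 0.9:
    model = train(clean(data + fresh))
```

The point of the sketch is the shape of the loop, not the model: in a real deployment the monitor-and-retrain step runs continuously against production traffic.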

What would you say are the biggest challenges around operationalizing machine learning for the "real world?"

I think operationalizing machine learning can be a fairly simple task. What complicates things, if you look at large enterprise companies, is leveraging the tools to automate the business processes they already have in place. Integrating this type of technology into those business processes is one of the biggest challenges.

Having data scientists empowered to build models, then, is another. Whether it's the machine learning engineer or the IT organization, someone has to take these models, work with the line of business, and then push them into production. It's more of a culture shift than anything else. In fact, the tools we provide include the concept of collaboration specifically to enable and automate that handoff, so that it's easier to get into production.

Another challenge for enterprises is being able to really, truly explain their models. Let's say you're a large bank or insurance company. You've integrated the business processes now, and all of a sudden you find yourself in a situation where you're getting audited. How do you explain what decision a model took, and why? Reaching a detailed understanding of why a model made a recommendation, and being able to explain it in an audit situation: that's another really big one. There are some tools out there, and we're actively working on some ourselves, but it's a problem that hasn't been resolved yet. Especially at the enterprise level, in a financial setting, it's going to take some time for some of these latest technologies to be accepted.

So we're talking about making things more transparent.

Yeah, and it's not necessarily just transparency, but having things documented in a simple way. These algorithms are super complex, right? How do we explain to a human how a decision was derived, based on maybe gigabytes of historical data with different labels, and different variations in the algorithm? That's really where it gets a bit tricky and complicated.
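One common way to give an auditor that "why" for a single decision is a per-feature contribution breakdown from an interpretable model. Here is a minimal sketch assuming a hand-specified linear scoring model; the feature names and weights are made up for illustration, and this is not any IBM tool:

```python
# Hypothetical linear claim-approval score: contribution_i = weight_i * value_i.
weights = {"claim_amount": -0.8, "years_as_customer": 0.5, "prior_claims": -1.2}
applicant = {"claim_amount": 2.0, "years_as_customer": 6.0, "prior_claims": 1.0}

contributions = {f: weights[f] * applicant[f] for f in weights}
score = sum(contributions.values())
decision = "approve" if score >= 0 else "reject"

# Reason codes for the audit trail: features ranked by how strongly
# they pushed the decision in either direction.
for feature, c in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {c:+.2f}")
print(f"score={score:+.2f} -> {decision}")
```

For deep models the same question requires approximation techniques rather than an exact decomposition, which is exactly where the "tricky and complicated" part comes in.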

That sort of ties into a really interesting conversation happening right now around bias in AI. How do we address the issue of bias in a reactive environment?

As humans, we tend to make more mistakes based on our own biases and the environment around us, whereas a machine learning model will leave some of those variables out and give you a true mathematical answer on whether to approve or reject an insurance claim, as an example.

From that standpoint, there are pros and cons. On one hand, you could say, "Look, the algorithm and the tool will do a better job than a human would." On the other hand, unless an algorithm is trained to detect the things a human being would be able to detect, we have more sense than the machine does. So I think this is going to be an ongoing exercise, for us and for the technology, to do better and make sure some of these things come to fruition.
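One concrete first step in that ongoing exercise is simply measuring outcomes: comparing a model's approval rate across groups, sometimes called a demographic-parity check. A stdlib-only sketch with made-up records:

```python
from collections import defaultdict

# Each made-up record: (group, model_decision) where 1 = approved.
decisions = [("A", 1), ("A", 1), ("A", 0), ("B", 1), ("B", 0), ("B", 0)]

totals, approved = defaultdict(int), defaultdict(int)
for group, outcome in decisions:
    totals[group] += 1
    approved[group] += outcome

rates = {g: approved[g] / totals[g] for g in totals}
gap = max(rates.values()) - min(rates.values())
print(rates, f"parity gap = {gap:.2f}")  # a large gap flags the model for review
```

A gap like this doesn't prove the model is biased, but it tells you where a human needs to look, which is the division of labor being described here.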

Your keynote at Reactive Summit—who should attend that, and why?

The audience should be anyone who has already engaged with AI, machine learning, and data science, as well as the traditional IT individuals who are now AI DevOps or machine learning engineers. Really, anyone in the IT industry who is an IT architect, or who is in charge of managing systems and bringing these machine learning models into the production environment, would benefit from this.

If I look at it at a high level, machine learning continues to improve by automating tasks that humans would do. We're leveraging machine learning to automate those tasks and to make better predictions across different industries. It will eventually help us create better services, whether for ourselves, for our clients, or for healthcare. There's a lot in all kinds of industries that we're focusing on, but to me, it's about how we make better recommendations to solve bigger problems.

Cool. Is there anything else I didn't mention that you wanted to cover?

During the keynote, I will also touch on the differences between machine learning, deep learning, data science, and so forth. There's a lot of confusion in the market still, and I think it's important that we all get clarity as an industry. Then I'll go into some of the deep learning aspects, and the end-to-end lifecycle of machine learning with a reactive angle as well.