
SAP explains how Datasphere gathers together data without losing context

SMBtech spoke to SAP’s Irfan Khan who, incredibly eloquently, told us about the company’s new Datasphere product, which can gather, combine, store and examine different, historical datasets without losing the original context and metadata that help define what they are.

You can watch the video or read the article (lightly edited for clarity) below.

Video interview with Irfan Khan

Irfan Khan talks SAP Datasphere.

What’s your name and what do you do?

So firstly, it’s a pleasure to meet you, Nick. My name is Irfan Khan and I’m the President and Chief Product Officer for SAP’s HANA Database and Analytics planning portfolio, which is a broad portfolio that is part of SAP’s Business Technology Platform. These assets have been around, in some instances, for five-plus years. Some of them are relatively new. It’s a research and development organization that I run, and I have global responsibility for this portfolio.

We’re at your technology tour. What are you showing off that’s cool and interesting?

What I’m sharing, as an update for the Australia and, of course, New Zealand region, is the product we announced, called SAP Datasphere. Let me just explain the context, why we feel now is the time to announce a product like Datasphere, and what it’s built upon.

It’s built upon the BTP services – Business Technology Platform. It’s an aggregation of a variety of different services, which also includes things like integration and extensibility services, automation, AI, etc. And the foundation really is that we talk about customers’ business context. In many ways, if you think about the historical problem, technology in itself has been used to simplify and accelerate productivity for various classes of data. But the underlying problem is that we always assume that data has to follow the technology.

Imagine that you’ve got a new data warehouse in an historical on-premises environment. You almost exclusively have to move the data to those environments in order to be able to take any advantage. What happens in that particular case is that the source systems that you extracted the data from typically would’ve had the context of that data, the metadata associated with the business foundations or where the data came from.

So, automatically you’re almost paying a tax, an egress tax. But beyond an egress tax, you’re very much paying a productivity tax, because you can no longer have the business context of that data. SAP’s Datasphere comes under the umbrella of a business data fabric architecture. And, essentially, what that means is that it allows for federation of data. So, imagine that you’ve still got the data in those source systems, whatever the underlying SAP source systems may be, or non-SAP systems.

And going through this one virtual business data fabric layer, you’re able to reconstitute all of that business context independently, wherever the data resides. You don’t need to physically move the data in order to be able to have additional value. So, in other words, you can use the data in the most appropriate technology; you don’t necessarily need to be re-platforming it into a separate, new, consolidated environment. And by doing so, it means that you can drive a far greater level of productivity and greater use-cases for customers. And all you’re effectively having to deal with there is the business value, not the underlying plumbing and the integration of data after the event.
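To make the idea of federation concrete, here is a minimal sketch in Python. It is not SAP's API: the `Source` and `FederationLayer` names, the `_context` field, and the sample rows are all hypothetical, chosen only to illustrate a virtual layer that reads data where it lives and keeps the source's business metadata attached.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Row = Dict[str, object]


@dataclass
class Source:
    """A hypothetical source system; the data stays where it is."""
    name: str
    fetch: Callable[[], List[Row]]  # reads rows in place, on demand
    context: Dict[str, str]         # business metadata owned by the source


class FederationLayer:
    """Hypothetical virtual layer: holds references to sources, not copies."""

    def __init__(self) -> None:
        self._sources: Dict[str, Source] = {}

    def register(self, source: Source) -> None:
        self._sources[source.name] = source

    def query(self, name: str) -> List[Row]:
        # Rows are fetched at query time and tagged with the source's context.
        source = self._sources[name]
        return [{**row, "_context": source.context} for row in source.fetch()]


# Two illustrative sources, one SAP-like and one not (sample data only).
erp_rows = [{"invoice_id": 1, "amount": 120.0}]
lake_rows = [{"invoice_id": 1, "clicks": 42}]

layer = FederationLayer()
layer.register(Source("erp_invoices", lambda: erp_rows,
                      {"system": "ERP", "currency": "AUD"}))
layer.register(Source("lake_events", lambda: lake_rows,
                      {"system": "data lake"}))

erp_result = layer.query("erp_invoices")
lake_result = layer.query("lake_events")
```

In this sketch nothing is copied into the layer at registration time, so each query sees the current source data together with the context it would otherwise have lost on extraction.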

Where does all this data get moved to? Will it be somewhere like the hyperscalers and the big clouds?

If you look at any conventional customer landscape, it pretty much is all of the above. You’ll have data residing in the public cloud (pick any of the hyperscalers), you’ll of course have the on-premise assets, which are not going away anytime soon, and then maybe even a private cloud as well. So, typically the data would be spread across a heterogeneous environment of many, many different locations and potentially different technologies that are being used to manage, persist, and store that information. And as I said, the big thing with the business data fabric is that we are not asking or forcing a customer to move all of that data into the Datasphere as if it was a physical technology layer.

This is a virtual business-fabric layer. So, the data still sits in all of the environments that you just described, but you are actually able to benefit from a rich experience of governed data – trustful data – where you can preserve all the business context without having to physically move the data to yet another technology layer to do the integrated reporting that you may be seeking from the overarching business case.

Why hasn’t this been done before?

It’s a very interesting question to ask. Why is it that this hasn’t been accelerated? In fact, yes, I mean there have been data meshes and data fabrics. These have been around for quite some time. But, unfortunately, they still predicate that you have to somehow move data, or at least localize data in some location, to have much more of an integrated virtual experience.

And by doing so, that business context is lost. So, really, what we’ve been advocating is that this is not SAP flying solo, trying to build a business data fabric exclusively by ourselves. Yes, we have a lot of capabilities. I described some of them under the Business Technology Platform, but this is much more of a data ecosystem that we’re building now. So, imagine some of the providers of the underlying technology – the vendors that we’ve now curated and onboarded into our SAP data ecosystem, under the Datasphere umbrella. Vendors like, say, Databricks.

Databricks is best-in-breed now, and is really driving a lot of the accumulation inside of a cloud-native data lake. If you think about Collibra, which is another vendor that we’ve now introduced into our ecosystem, they are leaders as it relates to data lineage, data privacy and data governance. And if you think about it, they are essentially a catalog of all the catalogs that may be out there. And the list goes on… DataRobot and, of course, Confluent are another couple of examples that we added to our ecosystem.

So, precisely answering your question, why has this not happened before? It’s because SAP data by definition is of high value and high significance in pretty much every customer environment. And the argument from an SAP perspective has always been, ‘Why move the data out of SAP (losing context) when you could essentially connect to SAP?’ And the response from the industry has been, ‘Well, it makes it a little bit difficult for us to always connect to SAP if we’ve already made certain technology decisions.’

And now the kind of aha moment is, through the data ecosystem we allow customers to play to the customer stack – the stack that they’ve invested in over the years. And it makes a lot more sense now by us onboarding all of these best technology providers because it means that there isn’t a re-platforming event.

We are not forcing data to have to now land in the new technologies. We’re bringing the technologies to the existing data and combining them. And that’s essentially the big thing. SAP is now looking at this very optimistically, to be able to help customers and, of course, new prospects out there to have a single business data fabric, and to let the Datasphere preserve their business context.

How much of the metadata is siloed and how much of it is absorbed?

If you think of a persona where a business user wants to have more self-service against business content, the business content itself needs to have semantic integration. So, you need to know all the underlying data.

Imagine that you’ve got the concept of an invoice. An invoice can be represented as a virtual data product. Invoices typically will have order line items. Perhaps there’s gonna be supply chain data, maybe transportation and logistics data. All of that can be concatenated into a data product that you create as, for example, a virtual data product called an invoice. Now, if you look at the historical data, which you may have sitting outside of SAP, you can create the same semantic level of integration around what we call virtual data products and represent them within the Datasphere.
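The invoice example can be sketched in a few lines of Python. This is an illustration only, not how Datasphere models data products: the table names, field names, sample values and the `invoice_product` function are all assumptions. The point it shows is that the "invoice" is assembled on demand from separate datasets rather than stored as a new copy.

```python
from typing import Dict, List

# Illustrative source datasets; in practice these could live in different systems.
headers: List[Dict[str, object]] = [{"invoice_id": 1, "customer": "Acme"}]
line_items: List[Dict[str, object]] = [
    {"invoice_id": 1, "sku": "A-100", "qty": 2},
    {"invoice_id": 1, "sku": "B-200", "qty": 1},
]
shipments: List[Dict[str, object]] = [
    {"invoice_id": 1, "carrier": "ExampleFreight"},
]


def invoice_product(invoice_id: int) -> Dict[str, object]:
    """Build a hypothetical virtual 'invoice' data product at read time."""
    header = next(h for h in headers if h["invoice_id"] == invoice_id)
    return {
        **header,
        "line_items": [li for li in line_items if li["invoice_id"] == invoice_id],
        "shipments": [s for s in shipments if s["invoice_id"] == invoice_id],
    }


invoice = invoice_product(1)
```

Because the product is computed when it is read, a change in any underlying dataset shows up the next time `invoice_product` is called.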

So, the Datasphere is effectively a cataloging and integration technology that allows you to build a semantic level of modeling against all of the data.

The secret sauce in all of this is really that the Datasphere exposes a catalog. You can register all the artifacts, all the different data sets that are represented as either physical or virtual data products. And by doing so, you can open up the foundations of analytics and planning use-cases to operate against that data. So, the idea really is that customers will essentially self-regulate their data products. And where that data is being served up through, say, Collibra, which has already registered it in its own catalog and integrates with the Datasphere, this is almost like a catalog-to-catalog integration. And by virtue of SAP’s data all being nicely managed within Datasphere, and then being able to correlate that data with data coming from the non-SAP world – with best-of-breed vendors like Collibra – it means that you can effectively do self-service against all the data that you want, not just the SAP or the non-SAP data.
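As a rough illustration of the catalog idea described above, the Python sketch below registers physical and virtual data products and links one catalog to another. Every name here (`Catalog`, `DataProduct`, `link`, the `sap` and `external` catalogs) is hypothetical and does not reflect the Datasphere or Collibra APIs.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DataProduct:
    name: str
    kind: str    # "physical" (data stored) or "virtual" (data stays at source)
    origin: str  # name of the catalog that first registered the entry


class Catalog:
    """Hypothetical catalog of data products."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._entries: Dict[str, DataProduct] = {}

    def register(self, name: str, kind: str) -> DataProduct:
        if kind not in ("physical", "virtual"):
            raise ValueError(f"unknown kind: {kind}")
        product = DataProduct(name, kind, origin=self.name)
        self._entries[name] = product
        return product

    def link(self, other: "Catalog") -> None:
        # Catalog-to-catalog integration: expose the other catalog's entries
        # here without overwriting entries this catalog already has.
        for name, product in other._entries.items():
            self._entries.setdefault(name, product)

    def search(self, term: str) -> List[DataProduct]:
        term = term.lower()
        hits = (p for p in self._entries.values() if term in p.name.lower())
        return sorted(hits, key=lambda p: p.name)


sap_catalog = Catalog("sap")
sap_catalog.register("invoice", "virtual")

external_catalog = Catalog("external")
external_catalog.register("invoice_history", "physical")

sap_catalog.link(external_catalog)
matches = sap_catalog.search("invoice")
```

After the link, a single search returns both the entry registered locally and the one registered in the external catalog, each still carrying its original `origin`, which is the self-service behaviour the interview describes.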
