The Gartner iPaaS Magic Quadrant for 2023 is out (https://www.gartner.com/doc/reprints?id=1-2BSDJCBH&ct=221122&st=sb) and while we shred the infographic and debate best of breed, one key observation from someone in the integration domain is that we have become quite good at building and delivering system integration solutions through mature platforms, tools and processes.
The advent of reliable integration platform as a service (iPaaS) has enabled us to deliver systems integration components faster, with improved quality and better observability. I love how simple CLI (command line interface) commands can launch pre-templated environments in minutes vs months! These platforms, however, cannot resolve issues such as poor architecture (spaghetti integrations), dependency on source systems and bad data. The latter two we leave to the data team – right?
The Data Teams
Meanwhile, data teams have been working alongside integration teams to solve data quality, governance, reporting, analytics and other problems, sometimes with the help of the integrations but more often by directly slurping data from source systems periodically (batches) or in near real time (streams). As a result, the data teams have been playing with some interesting concepts such as Data Lakes, Data Hubs and Data Fabric.
A Data Hub is a place where data from systems is collected (in real time, near real time or periodically) and served to consumers. It is a place where data is ingested, curated, managed and served, in contrast to the traditional paradigm of consuming data directly from the source systems.
A data hub would solve some of the challenges of an integration platform and could potentially become the source of data for delayed-consistency use cases (some architects even see it as the place where we mutate some data).
The trough of disillusionment
The Data Hub strategy has moved from being an innovation trigger to the trough of disillusionment (source: Gartner, June 2022). While the Data Hub is still in the priority matrix for clients to implement in 2-5 years, there are challenges when actually implementing one.
(Update: interestingly, this InfoQ article https://www.infoq.com/articles/business-system-integration/ highlights the need for a Data Hub and mentions vendors solving the implementation problem by hosting your data and providing the data hub as a service.)
Building a Data Hub requires careful planning, data integration and management, and ongoing operation and maintenance in order to be successful. So besides being clear on the capabilities of a Data Hub, especially what it does not do, here are five key challenges we have experienced when implementing one.
1. Integrating Data:
Integrating data from various sources is one of a data hub’s major challenges. This may entail handling data in various structures, schemas and formats, as well as ensuring that the data is correctly transformed and cleansed before it is loaded into the hub.
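As a sketch of what that transformation step looks like, the snippet below maps two hypothetical source payloads (a camelCase CRM feed and a snake_case billing feed) onto one canonical schema and rejects records that are incomplete after mapping. All field and function names here are illustrative assumptions, not part of any real product.

```python
from datetime import datetime, timezone

# Hypothetical canonical schema the hub stores; field names are illustrative.
CANONICAL_FIELDS = ("customer_id", "email", "updated_at")

def from_crm(record: dict) -> dict:
    """Map a CRM-style payload (camelCase keys, epoch millis) to the canonical schema."""
    return {
        "customer_id": str(record["customerId"]),
        "email": record["emailAddress"].strip().lower(),
        "updated_at": datetime.fromtimestamp(record["updatedMillis"] / 1000, tz=timezone.utc),
    }

def from_billing(record: dict) -> dict:
    """Map a billing-style payload (snake_case keys, ISO 8601 strings) to the canonical schema."""
    return {
        "customer_id": str(record["cust_id"]),
        "email": record["email"].strip().lower(),
        "updated_at": datetime.fromisoformat(record["last_update"]),
    }

def ingest(record: dict, mapper) -> dict:
    """Transform a source record, then cleanse-check it before it reaches the hub."""
    row = mapper(record)
    missing = [f for f in CANONICAL_FIELDS if row.get(f) in (None, "")]
    if missing:
        raise ValueError(f"rejected record, missing fields: {missing}")
    return row
```

The point is that each source needs only a small mapper; the hub's loading rules stay in one place.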
2. Data integrity:
It is crucial to ensure data quality within the data hub, because bad data can lead to incorrect conclusions and decisions. Maintaining data quality requires processes that make sure data is entered correctly in the first place, as well as ongoing efforts to find and fix errors.
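The "ongoing efforts to find and fix errors" part is often a scheduled sweep of the stored data against a set of quality rules. A minimal sketch, assuming a simple rule table (the rule names and checks below are my own illustrations):

```python
import re

# Illustrative quality rules: each maps a human-readable name to a predicate.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

RULES = {
    "customer_id is present": lambda r: bool(r.get("customer_id")),
    "email looks valid": lambda r: bool(EMAIL_RE.match(r.get("email", ""))),
}

def quality_report(rows: list[dict]) -> dict:
    """Run every rule over every row; return {rule name: [indices of failing rows]}."""
    failures = {name: [] for name in RULES}
    for i, row in enumerate(rows):
        for name, check in RULES.items():
            if not check(row):
                failures[name].append(i)
    return failures
```

A report like this gives the data team a concrete, recurring list of records to fix rather than a vague sense that quality is slipping.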
3. Data governance:
Data from diverse sources is combined in a data hub, which may raise concerns about who owns the data, who has access to it, and how secure it is. Clear data governance policies and procedures must be established to ensure that data is used responsibly and in accordance with existing regulations.
4. Scalability:
It might be difficult for the data hub to scale and satisfy the needs of the company as the volume and complexity of data grow. This can necessitate investing in new technology and infrastructure, in addition to putting in place data management procedures that can accommodate expansion.
5. Maintenance and support:
Maintaining and supporting a data hub can be a major undertaking, as it calls for ongoing work to keep the system running smoothly and to deal with any problems that arise. For organisations with limited IT resources, this can be especially difficult.
Amplifying the impact of Data Hub with Integration Hub
The key to building a successful data hub is letting consumers and providers become autonomous, without needing a lot of support to integrate in and out of the hub. The data hub needs to reliably facilitate ingestion and subscription from new end systems without handholding. For example, allowing a publisher or subscriber to self-register and then publish or subscribe to information would supercharge the growth and value of the data hub.
A lot of this capability does not come with a standard iPaaS; rather, these would be platform services that form part of an Integration Hub. This hub would use existing iPaaS features to provide capabilities for ingestion and subscription of large volumes of data at a periodic or real-time cadence. The integration hub would come with a Registration API for providers and subscribers, APIs for automating publishing and subscribing to batches (a Batch Feed API), event subscription, and so on.
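To make the idea concrete, here is a minimal in-memory sketch of those platform services: providers register the topics they publish, subscribers attach to topics, and the hub fans events out. The class and method names are illustrative assumptions, not a real product API.

```python
from collections import defaultdict

class IntegrationHub:
    """Toy model of the Registration and Event Subscription services
    described above; everything here is an illustrative sketch."""

    def __init__(self):
        self._providers: dict[str, set] = {}
        self._subscribers = defaultdict(list)  # topic -> list of callbacks

    def register_provider(self, name: str, topics: list[str]) -> None:
        # Registration API: a provider self-registers the topics it will publish.
        self._providers[name] = set(topics)

    def subscribe(self, topic: str, callback) -> None:
        # Subscription API: a consumer attaches without bespoke integration work.
        self._subscribers[topic].append(callback)

    def publish(self, provider: str, topic: str, event: dict) -> int:
        # Only registered providers may publish; the hub fans the event
        # out to every subscriber of the topic and reports the count.
        if topic not in self._providers.get(provider, set()):
            raise PermissionError(f"{provider!r} is not registered for {topic!r}")
        for callback in self._subscribers[topic]:
            callback(event)
        return len(self._subscribers[topic])
```

The value is in the self-service shape: neither side needs the platform team in the loop once registration is done, which is exactly the autonomy argument made above.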
So integration is key to a Data Hub’s success: it can streamline inbound and outbound integration from multiple sources on an ongoing basis; it can facilitate autonomy for producers and consumers; it can help with data authorization when serving data to systems or users; and it can facilitate data management through observability, especially for data in motion.
Here are some key benefits of good integration for a data hub
1. Data integration:
An integration hub can help to streamline the process of integrating data from multiple sources into the data hub, handling differences in structures, schemas and formats so that each new source does not require a bespoke build.
2. Data management:
An integration hub can offer data management tools and capabilities, such as data quality and governance features, as well as the capacity to track and monitor data flow. This can help ensure that the data in the data hub is reliable, consistent and compliant with any applicable laws.
3. Scalability:
By offering the capacity to handle greater volumes of data and support a greater number of interface points, an integration hub can help scale the data hub to suit the demands of the company.
4. Reusability:
Reusing integration assets and practices can decrease the complexity and expense of data integration initiatives, and an integration hub can encourage this reuse.
5. Automation:
An integration hub can offer tools and capabilities for automating data integration procedures, decreasing the time and effort needed to load and update data in the data hub.
So is Integration the only thing needed for a good Data Hub?
Well, not quite. We need to acknowledge that it only serves a subset of integration use cases, and that a Data Hub can be a huge security risk. Talk about all the eggs in one basket!
Security is key, especially a zero-trust architecture along with data encryption at rest and data authorization for APIs. Streaming data and events are also key for ingestion based on data change or state change. Domain-Driven Design matters too: if your data is not defined well, then either we serve a system view of the data (i.e. the system data model), which then requires translation and mapping, or we serve a mish-mash of combined data from systems, which is incoherent noise. A good Data Hub architecture needs a clear domain model aligned with its business capabilities.
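The data-authorization point can be sketched as a per-consumer scope check applied on every read, in the zero-trust spirit of never implicitly trusting a caller. The consumer names, scope strings and record fields below are all assumptions for illustration.

```python
# Illustrative grant table: which scopes each registered consumer holds.
GRANTS = {
    "marketing-app": {"customer:read:basic"},
    "billing-app": {"customer:read:basic", "customer:read:financial"},
}

# Which scope is needed to see each field of the served record.
FIELD_SCOPES = {
    "customer_id": "customer:read:basic",
    "email": "customer:read:basic",
    "credit_limit": "customer:read:financial",
}

def serve(consumer: str, record: dict) -> dict:
    """Return only the fields the consumer's granted scopes allow.
    An unknown consumer holds no scopes and gets nothing back."""
    scopes = GRANTS.get(consumer, set())
    return {k: v for k, v in record.items() if FIELD_SCOPES.get(k) in scopes}
```

Because the hub concentrates data from many systems, enforcing this kind of field-level check at serving time is what keeps the "all eggs in one basket" risk manageable.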
Finally, knowing what the Data Hub cannot do is also key to its success. We see implementors and vendors converting the data hub from an operational store into the system of record! This is clearly bonkers, because we would then need the business logic, behaviour and so on from the core systems inside the Data Hub, which is not its intent. Besides data mutation (create, update, delete), there are use cases that require querying the system of record for real-time consistency; these are unsuitable and not recommended for a Data Hub.
In summary, we are seeing integration teams work more closely with data teams, helping them accelerate the value delivered through their data platform, such as a Data Hub, via really good integration platform services built on iPaaS. These integration platform services form the Integration Hub, where platform APIs such as Registration, Batch and Data Authorization are foundational.
We have seen that while Data Hubs can add value to organisations by accelerating new integrations and providing a platform for data management, governance and security, they are headed to the trough of disillusionment in the adoption curve because they are not very usable. We can bring Data Hubs to the slope of enlightenment with good iPaaS for data integration (which is itself on the slope of enlightenment), and in the next few posts I will walk us through some of the key aspects of Data Hub integration iPaaS (the Integration Hub), including Data Authorization, Ingestion, Registration, Events and Subscription.