Quick and efficient data onboarding and sharing may not be the most glamorous features of cloud data platforms—but they might represent the greatest ROI potential. As service providers and market data vendors have started to broadly offer these integration and collaboration capabilities, this functionality can be one of the quickest ways for a cloud data solution to begin delivering benefits to end users. Yet, perhaps even more interesting is the potential it holds for creating entirely new data operating models—eliminating internal data silos, creating new opportunities for asset managers to collaborate with partners, and even providing new services to clients.
As investment management operating models grow increasingly interdependent, establishing data relationships with service providers, market data vendors, and even clients that minimize the friction of onboarding and sharing data becomes an important objective of any data strategy. 'Because everyone else is doing it' is not always a sound premise for action, but when your priority is to find efficient ways to collaborate, choosing a platform with a robust data marketplace and a sizable user base can make a lot of sense.
The Data Onboarding Problem
In the past, bringing in new data across organizational boundaries typically involved a series of activities that had to be coordinated and project-managed. Secure file transfer arrangements or API mechanisms had to be agreed upon and established, staging processes implemented, and ETL code written to bring data into a warehouse and integrate it in a form that consumers could use. Each of these steps required ongoing monitoring, maintenance, and change management.
The difficulty of data onboarding limited what could realistically be brought into an organization’s data platform to the most important data, with rigidly defined structures and processes that rarely changed. This constrained the value that data platforms could deliver, and it also introduced operational risk by encouraging the proliferation of ‘shadow’ data processes to deal with everything else. What might start as a few direct downloads or ad-hoc reports often evolves over time into business-critical activity reliant on a fragile web of manual, ungoverned processes.
Cloud Data Marketplaces
Cloud data platforms provide scalability and performance that legacy data platforms simply could not match. Building on these foundations, cloud data sharing capabilities make it easy for a data provider to share data ‘in place’ with multiple consumers, that is, without needing to copy the data. This makes it possible to share large and complex datasets efficiently and ensures that any changes to the data are immediately and consistently available to all consumers.
Data marketplaces and exchanges such as those offered by Snowflake, AWS, and Databricks use these capabilities to aggregate and offer data from a wide variety of third-party data providers. Organizations are also starting to use this capability to set up their own internal data exchanges, sharing data between teams across the organization as a means of eliminating data silos.
By taking advantage of these data shares, analysts and data science teams can rapidly gain access to datasets for analytics and, in a single query, combine fresh, curated data from multiple sources without building data pipelines and ETL processes or moving source data.
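To make the idea concrete, here is a minimal sketch of a consumer-side query, assuming a Snowflake account where a vendor share has already been mounted as a database; the connection details and all object names (VENDOR_SHARE_DB, MARKET_DATA.PRICES, ANALYTICS.PORTFOLIO.HOLDINGS) are hypothetical placeholders rather than references to any real provider.

```python
# Minimal sketch: query a mounted vendor share alongside internal data.
# Assumes snowflake-connector-python and a share already mounted as
# VENDOR_SHARE_DB in the consumer account; all names are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account",      # hypothetical account identifier
    user="analyst_user",
    password="***",
    warehouse="ANALYTICS_WH",
    role="ANALYST_ROLE",
)

# A single query joins vendor-maintained prices (read in place from the share)
# with an internal holdings table -- no pipeline, and no copy of the vendor
# data lands in the consumer account.
query = """
    SELECT h.portfolio_id,
           h.security_id,
           h.quantity * p.close_price AS market_value
    FROM   ANALYTICS.PORTFOLIO.HOLDINGS h
    JOIN   VENDOR_SHARE_DB.MARKET_DATA.PRICES p
           ON p.security_id = h.security_id
    WHERE  p.price_date = CURRENT_DATE()
"""

cur = conn.cursor()
cur.execute(query)
for portfolio_id, security_id, market_value in cur.fetchall():
    print(portfolio_id, security_id, market_value)

cur.close()
conn.close()
```

The same query could equally be run from any SQL client or BI tool; the point is that the shared tables behave like local ones while remaining maintained by the provider.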
This multiplies the value of your data—beyond making it easier to start working with a new dataset, fluid data integration enables users to probe and derive new results by combining a dataset with others already on your data platform.
The Limitations of Today’s Cloud Data Platforms
The growth of cloud data platforms and marketplaces, however, means that your partners and data vendors are likely using a variety of platforms today. To address this fragmentation, specialist vendors are starting to offer data providers the means to efficiently share their data across multiple vendor platforms and geographic regions. While these nascent solutions mature, each platform offers workarounds to integrate data from other platforms, although these typically involve some of the cost and overhead of data movement.
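One illustration of this direction of travel is the open Delta Sharing protocol (associated with Databricks), which lets a consumer read a provider's shared tables from outside the provider's platform. The sketch below assumes the provider has issued a credentials 'profile' file; the file path and the share, schema, and table names shown are hypothetical.

```python
# Illustrative sketch of cross-platform consumption via the open Delta Sharing
# protocol, using the delta-sharing Python client. The profile file path and
# the share/schema/table names are hypothetical placeholders.
import delta_sharing

# Credentials file issued by the data provider (hypothetical path).
profile = "/secure/config/vendor_profile.share"

# Table URL format: <profile-path>#<share>.<schema>.<table>
table_url = f"{profile}#vendor_share.market_data.prices"

# Load the shared table into a pandas DataFrame without replicating the
# provider's underlying storage into our own platform.
prices = delta_sharing.load_as_pandas(table_url)
print(prices.head())
```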
Data sharing also raises challenges around data security and access controls. Current mechanisms often implement simple ‘all or nothing’ access control across organizational boundaries rather than more granular role-based access control (RBAC). This places the onus on the consumer to ensure that the conditions of access and licensing are enforced and abided by within their organization, making data governance a critical element of any data strategy that seeks to enable data sharing.
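As a concrete, hedged example of what this looks like in practice, the sketch below uses Snowflake's consumer-side model, where access to a shared database is granted wholesale via IMPORTED PRIVILEGES; the internal role and user names are hypothetical, and the mapping of that coarse grant onto licensing terms is governance work the consumer must own.

```python
# Illustrative sketch of consumer-side controls over a mounted share, using
# Snowflake as the example platform. Consumer access to a shared database is
# granted at the database level (IMPORTED PRIVILEGES), so license conditions
# have to be enforced by controlling which internal roles and users receive
# that grant. All names below are hypothetical.
import snowflake.connector

conn = snowflake.connector.connect(
    account="my_account", user="admin_user", password="***", role="SECURITYADMIN"
)
cur = conn.cursor()

# Wrap the vendor share in a dedicated internal role...
cur.execute("CREATE ROLE IF NOT EXISTS MARKET_DATA_CONSUMER")
cur.execute(
    "GRANT IMPORTED PRIVILEGES ON DATABASE VENDOR_SHARE_DB "
    "TO ROLE MARKET_DATA_CONSUMER"
)

# ...and grant that role only to the users covered by the vendor agreement --
# deciding who those users are is the consuming organization's responsibility.
for user in ("ANALYST_A", "ANALYST_B"):
    cur.execute(f"GRANT ROLE MARKET_DATA_CONSUMER TO USER {user}")

cur.close()
conn.close()
```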
Data Governance and Pandora’s Data Marketplace
In the past, data platform governance was often achieved through simple technical controls: data could not easily be brought in, worked with, or extracted without platform administrators and technology teams doing the work to make it happen. With many of these technical obstacles eliminated, governance has a more important role than ever in guarding against risk.
For more on Data Governance, read Citisoft’s Data Management Fundamentals whitepaper
Data governance should apply controls around data sharing by establishing a structured process through which share requests can be submitted, reviewed, and implemented. This requires an agile, outcome-oriented approach to engaging and educating data consumers, but the benefits are twofold. Done well, governance not only defends the organization against the risks of unmanaged data but also enables offense; for example, an effective metadata catalog makes it easier for consumers to quickly find and work with the data already on the platform.
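Purely as an illustration of what such a process might record (every field, status, and name below is hypothetical rather than a reference to any particular governance tool), a share request and its resulting catalog entry could be modeled along these lines:

```python
# Hypothetical sketch of the records a share-request workflow and metadata
# catalog might keep; none of these fields or statuses refer to a specific
# product or standard.
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class RequestStatus(Enum):
    SUBMITTED = "submitted"
    APPROVED = "approved"
    REJECTED = "rejected"
    IMPLEMENTED = "implemented"


@dataclass
class ShareRequest:
    dataset: str              # e.g. a vendor pricing share
    requested_by: str         # consuming team or user
    business_purpose: str     # why access is needed
    license_terms: str        # conditions the consumer must abide by
    status: RequestStatus = RequestStatus.SUBMITTED


@dataclass
class CatalogEntry:
    dataset: str
    owner: str                # accountable data owner
    source: str               # provider or share the data came from
    refresh: str              # e.g. "continuous via share"
    approved_consumers: list[str] = field(default_factory=list)
    registered_on: date = field(default_factory=date.today)


# An approved request becomes a discoverable catalog entry, which is what makes
# it easier for other consumers to find and reuse the data later.
request = ShareRequest(
    dataset="vendor_prices",
    requested_by="quant_research",
    business_purpose="daily portfolio valuation",
    license_terms="internal analytics use only",
)
request.status = RequestStatus.APPROVED

entry = CatalogEntry(
    dataset=request.dataset,
    owner="market_data_team",
    source="vendor share",
    refresh="continuous via share",
    approved_consumers=[request.requested_by],
)
print(entry)
```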
Implementing a Data Sharing Strategy
Data sharing offers a powerful catalyst for easy wins within a broader data strategy—enabling faster delivery of analytics, empowering business users with timely access to information, and decommissioning inefficient, high-risk manual processes. When implemented thoughtfully, accounting for a firm’s current state, culture, and business drivers, a cloud data sharing strategy can drive visible results quickly while laying the clean architectural foundation for longer-term operational and strategic advantages.
To capitalize on these opportunities, organizations should adopt a delivery-focused approach. This involves identifying existing service providers and market data vendors where sharing capabilities can be quickly enabled to maximize the value of current relationships. At the same time, the strategy must embed sound governance practices to manage risks—establishing oversight mechanisms, enforcing access controls, and promoting the consistent and efficient use of data through robust metadata cataloging and data lineage practices.
Reducing the friction of data exchange across silos and organizational boundaries not only enhances analytics capabilities but also unlocks transformational changes to operating models. Batch processes can be reengineered into more dynamic workflows, file exchanges and manual extracts can be eliminated, and business processes across counterparties can become more integrated and real-time. Although limitations in technology interoperability and governance frameworks still exist, the ongoing evolution of cloud data sharing capabilities offers a clear pathway to improved collaboration, faster innovation, and greater organizational agility.
Building a successful data sharing strategy requires attention to your business drivers and a balance of offense and defense: delivering value rapidly while maintaining control and compliance. The journey from legacy complexity to modern efficiency can be winding, and each firm’s position on that path is unique. With an in-depth understanding of asset management technology and operations, Citisoft can help chart the path from where you are today to the cloud-data-enabled destination best suited to supporting your business goals.