Evolving priorities demand data evolution

This blog was originally published as part of Citisoft’s Outlook 2024.

The call for asset managers to modernize their data strategy has never been stronger. Most operational leaders know they must think strategically about their data ecosystem to gain a competitive edge, but the path to data modernization remains uncertain, especially as the data landscape evolves.

With that in mind, we're highlighting a few areas of innovation from our Outlook 2024 paper that merit consideration and exploration for asset managers undertaking a long-term view of their data strategy—particularly as it relates to the proliferation and democratization of data.

Supporting data proliferation

A data lakehouse is particularly well-suited for advanced analytics because it can store data in its raw form and apply schema-on-read (structure and transformation applied at the time of query rather than at the time of loading). A use case for an asset manager might look like the following:
  • A data lake ingests and stores unstructured data like economic indicators alongside structured data like historic stock pricing
  • A data lakehouse is then used as an analytical layer by a data scientist who applies a custom analytical model to this data to explore what might happen to stock pricing next quarter 
  • This data scientist then selectively moves data from the lakehouse to the data warehouse, where it can be easily pulled into a standardized dashboard for a portfolio manager
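
As a rough illustration of schema-on-read, the sketch below keeps records in their raw form and applies a schema only when they are queried. It is a minimal sketch using only the Python standard library, not a depiction of any particular lakehouse product; the sample records, the field names, and the `read_with_schema` helper are all hypothetical.

```python
import json
from datetime import date

# Raw records kept exactly as they arrived (hypothetical sample data).
# Prices and an economic indicator sit side by side, with no schema enforced on write.
RAW_RECORDS = [
    '{"ticker": "ABC", "date": "2023-12-29", "close": "101.50"}',
    '{"ticker": "ABC", "date": "2024-01-02", "close": "103.25", "source": "vendor_x"}',
    '{"indicator": "cpi_yoy", "date": "2023-12-31", "value": "3.4"}',
]

# The schema is chosen by the reader at query time: field name -> cast function.
PRICE_SCHEMA = {"ticker": str, "date": date.fromisoformat, "close": float}


def read_with_schema(raw_records, schema):
    """Parse raw records, keep those that contain every schema field, and cast on read."""
    rows = []
    for line in raw_records:
        record = json.loads(line)
        if not all(name in record for name in schema):
            continue  # the record belongs to a different dataset; skip it for this query
        rows.append({name: cast(record[name]) for name, cast in schema.items()})
    return rows


prices = read_with_schema(RAW_RECORDS, PRICE_SCHEMA)
for row in prices:
    print(row)
```

The same raw records could be read again with a different schema (for example, one covering `indicator` and `value`) without reloading or restructuring the stored data, which is the flexibility the use case above relies on.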

Over the past decade, asset managers have become increasingly sophisticated in their approach to data architectures, shifting from a pure data warehousing strategy toward big data technologies and data lakes capable of storing and analyzing structured and unstructured data. As we entered the 2020s, asset managers accelerated the movement of data architectures to the cloud, which has opened up new possibilities for data storage, integration, and advanced analytics.

As a result of this shift, new paradigms for data discovery, preparation, and delivery are being shaped. One area of particular focus over the last year has been the concept of data lakehouses, a hybrid data architecture that combines the benefits of a data lake and a data warehouse. Like a data lake, a lakehouse can store all types of data in a centralized repository, including structured, semi-structured, and unstructured data. However, like a data warehouse, a lakehouse is also designed for interactive analytics.

A lakehouse solution reduces the maintenance and development costs associated with ETL processes, offers more flexibility in the types of data stored, and is cloud-native, which, when optimized, will reduce data storage costs. However, for many managers who have implemented data lakes, gaps in metadata and data quality, together with a lack of structure, have formed what some refer to as a ‘data swamp.’ Those looking to explore a lakehouse approach will benefit from implementing well-planned data governance frameworks in tandem with or before more advanced data architecture projects.

Driving innovation through democratized data

At the risk of looking too far ahead of the industry, we think it worth exploring a few concepts that have emerged to help address the rise in data users and use cases. As many data organizations will attest, a cultural shift has emerged in recent years in which an increasing number of business users want self-service access to support data-driven decision-making. However, few firms have developed their data environment to support the diversity and scale of new user groups. This has introduced new challenges around governance, toolsets, data types, and data storage. As the imperative to democratize data access grows, these issues will intensify, and many investment managers are considering how to better support this shift.

A few new technologies have emerged in investment management to support better data access—one of which is the concept of a data fabric. A data fabric is an architectural layer that provides a unified view of data across different systems and silos. While a data lakehouse offers underlying data storage, a data fabric acts as a layer capable of integrating data from various sources, incorporating data catalog and metadata management, and transforming data for a variety of uses. A data fabric is not one singular technology, but rather an ecosystem of tools, processes, and frameworks. Examples of tools that may exist as part of a data fabric include data integration platforms, data virtualization solutions, or data catalogs. Among the technologies in this ecosystem is also the emerging concept of data mesh.

In a data mesh, a functional data product team (e.g., a distribution data team or a portfolio management data team) is responsible for managing its own data. This includes owning the data, defining its schema, and ensuring its quality. This decentralizes data ownership and governance and empowers data users with better access and agility.
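
To make the ownership idea concrete, the sketch below models a data product as something a single team owns, with its schema and quality rules declared alongside it. It is a minimal sketch under stated assumptions rather than a reference to any data mesh tooling; the `DataProduct` class, the `portfolio_holdings` product, its columns, and the quality check are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class DataProduct:
    """A dataset published by one domain team, which owns its schema and quality rules."""
    name: str
    owner_team: str
    schema: dict[str, type]  # column name -> expected Python type
    quality_checks: list[Callable[[dict], bool]] = field(default_factory=list)

    def validate(self, row: dict) -> list[str]:
        """Return a list of problems found in a row; an empty list means the row passes."""
        problems = []
        for column, expected_type in self.schema.items():
            if column not in row:
                problems.append(f"missing column: {column}")
            elif not isinstance(row[column], expected_type):
                problems.append(f"wrong type for {column}")
        if not problems:  # only run quality rules on rows that match the schema
            for check in self.quality_checks:
                if not check(row):
                    problems.append(f"failed check: {check.__name__}")
        return problems


def non_negative_market_value(row: dict) -> bool:
    # Hypothetical quality rule defined by the owning team.
    return row["market_value"] >= 0


holdings = DataProduct(
    name="portfolio_holdings",
    owner_team="portfolio management data team",
    schema={"portfolio_id": str, "ticker": str, "market_value": float},
    quality_checks=[non_negative_market_value],
)

good = holdings.validate({"portfolio_id": "P1", "ticker": "ABC", "market_value": 1000.0})
bad = holdings.validate({"portfolio_id": "P1", "ticker": "ABC", "market_value": -5.0})
print(good)
print(bad)
```

In this arrangement a distribution data team would declare its own product with its own schema and checks, so governance rules live with the team closest to the data rather than in one central function.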

Designing a modern data operating model

We’re at an interesting crossroads in our industry: the potential of available technologies is capturing significant mindshare, but most operating models do not yet provide the groundwork to harness them.

In advising our clients on their plans for transformation, we find that the underlying strategy behind people, process, and technology has never been more important. These elements are inextricably linked to data operating models and must be the drivers behind modernizing our approach to data architecture, operations, analytics, and governance. Stay tuned for more thought leadership from Citisoft's data team this year as we dive deeper into how to design, plan for, and achieve modern, scalable data operating models.