Leveraging Scale to Unleash the Power of Unstructured Data


Harnessing the rich and varied insights contained within unstructured data can create a game-changing competitive advantage for asset managers. With the right technology, asset managers can use unstructured data to help predict earnings and stay ahead of the market. But what is unstructured data, and how can asset managers best exploit it?

Unstructured data, which typically accounts for approximately 80% of a company's data assets, includes text, emails, spreadsheets, presentations, images, video, and audio files. Although these sources are unwieldy, expensive to store, and difficult to analyze, each of them contains information that can be mined, analyzed, and leveraged to identify anomalies and generate insights that create proprietary value and improve investment returns.

By way of comparison, structured data represents only about 20% of all data stored by an average company; it is maintained in tightly organized tables, inexpensive to store, and relatively easy to analyze. Historically, structured databases were designed to hold all the data deemed necessary to conduct, analyze, and report on a business. These data can and do support the businesses for which they were designed, but quality issues or gaps often degrade the results.

We are starting to see exponential growth in AI technologies converge with advances in data storage and sophisticated analytics, and this confluence is creating opportunities for active managers to extract value from unstructured data. New ways to filter unstructured data, including robotic process automation, natural language processing, and other artificial intelligence techniques, show real promise in harvesting the roughly 10% of the data that consists of relevant 'nuggets' capable of providing differentiating insights. For many leading firms, the notion that wrangling these sources comes at substantial cost and yields uncertain results is fading into the rear-view mirror.

But how can this be applied to active investment management? The efficient market hypothesis (EMH) holds that all publicly available information relevant to a stock is already reflected in its price, and therefore that it is not possible to consistently beat the market over the long run. Active fund managers do not subscribe to this hypothesis. They believe instead that by making informed buy, hold, and sell decisions they can outperform the benchmark. In the pursuit of better-than-benchmark results, any data advantage can contribute to market outperformance.

Active managers rely on a range of tools, experience, analyses, and research in their efforts to beat benchmark indices. The ability to exploit unstructured data, an increasingly available option, can make the difference. As technology finds new ways to query data that previously could not be mined, analysts can use unstructured data to make better predictions and act on them, often before the market can react. Examples of these unstructured sources include:

•    Transcripts of earnings and other conference calls
•    News articles
•    Market analysis and other frequently published information
•    Regulatory and policy texts such as EDGAR filings and central bank statements
•    ESG data embedded in sustainability reports or product reviews
•    Social media posts and investment blogs

Beyond the traditional benefits of mining unstructured data, such as new product development, operational efficiency, and targeted marketing, unstructured data is becoming a strategic imperative. It can be employed across the investment industry's front, middle, and back offices to give firms insights that help them identify risk, balance portfolios, and find alpha.

For investment management technology and operations teams, the real challenge is retrofitting a data architecture built for structured data so that unstructured data becomes a competitive advantage.

Before unstructured data can be harnessed, it must be stored. Storing both the raw unstructured data and the processed, valuable data presents challenges for technology operations that were designed to store and analyze highly structured, space-efficient databases. Unstructured data storage, filtering, and analytics require large-capacity databases and high-powered processing capabilities. Most organizations have chosen, or will choose, to host these data externally. It makes sense. Cloud technology providers can deliver scalable storage that can be expanded or reduced as needed with little or no lead time. Managers designing future-state architectures are increasingly turning to cloud-based providers.

There are many benefits to leveraging cloud technology. Most importantly, cloud services tend to be easy to use and eliminate the need for companies to maintain their own storage and the internal resources to support it. Another advantage is the ease with which new users can access and use the technology, as well as the simplicity of sharing resources and data. Other benefits include easy synchronization across numerous devices and, of course, simpler backup and disaster recovery.

However, when it comes to storing and analyzing unstructured data, one cloud benefit outshines the others: scalability. The inherent scalability of cloud-based storage makes it ideal for unstructured data management. The sheer volume of text files, photographs, videos, unstructured spreadsheets, and similar content requires enormous storage, often far beyond an operations manager's initial expectations. Building a way to store, analyze, and gain advantage from these voluminous sources requires computing power and storage at a scale that, for most companies, can only be obtained by investing in cloud-based solutions. Indeed, more and more providers are offering single solutions that can ingest raw unstructured data, apply and customize existing algorithms, deliver the 'nuggets' of valuable insight, and provide analysis that can ultimately help increase revenues.

Operations managers designing their future-state architectures and operating models should plan for a future in which unstructured data is mainstream and relied upon to make even the most routine business decisions.