Citisoft Blog

May 21, 2019

Implementing Data Lakes: 3 Steps to Avoid Creating a Swamp

Chris Guild

Kicking off a new data lake implementation and looking for the best place to start? Well, as the adage goes, prior preparation prevents poor performance. There are several factors asset managers should consider early on in an implementation to avoid turning the new lake into a one-way repository of useless data. It is essential to consider lake organization, data lifecycles, and usability to help ensure the lake is a strategic asset.

Before jumping into your data lake, it is important to establish a common definition. I have found the best way to describe a data lake is as a data construct in its most natural state: it accepts data from source systems in an untransformed format. The primary benefit of this approach, compared to a data warehouse or enterprise data management (EDM) tool, is that it helps overcome cost and storage problems that may exist within these other tools. To realize the full benefits of implementing a data lake, below are several points to consider before launching your implementation.

Lake Organization

One of the first misconceptions business users have about a data lake is that it is an unstructured object. With an unstructured approach, the ability to analyze data and derive meaningful insights to help drive your business quickly diminishes.

Instead, consider that the data lake can be organized into smaller components called ponds. An easy way to wrap your head around this structure is to view ponds as spaces within the data lake that help begin organizing data. It is essential to recognize that a firm can organize its data in many ways. For example, the business can structure its lake by data types like analog, application, and textual.

Another core question a firm should answer before creating a lake is how data will move between ponds. As a best practice, a raw data pond can be put in place as a staging area for data that has not yet been classified or conditioned by the lake. Data should transfer to an archive pond only when it has reached the end of its useful life in the other ponds. As a result, a lake should include raw and archival data ponds to support the data lifecycle.
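To make the pond structure concrete, here is a minimal sketch of how movement rules between ponds might be written down. The pond names follow the examples in this article; the `Pond` enum, the `ALLOWED_MOVES` table, and the `can_move` helper are hypothetical names, and the specific rules (raw data feeds the typed ponds, typed ponds feed only the archive) are an assumption for illustration rather than a prescribed design.

```python
from enum import Enum


class Pond(Enum):
    RAW = "raw"                  # staging area for unclassified data
    ANALOG = "analog"
    APPLICATION = "application"
    TEXTUAL = "textual"
    ARCHIVE = "archive"          # data past its useful life


# Assumed movement rules: raw data is classified into a typed pond,
# and typed ponds release data only to the archive.
ALLOWED_MOVES = {
    Pond.RAW: {Pond.ANALOG, Pond.APPLICATION, Pond.TEXTUAL},
    Pond.ANALOG: {Pond.ARCHIVE},
    Pond.APPLICATION: {Pond.ARCHIVE},
    Pond.TEXTUAL: {Pond.ARCHIVE},
    Pond.ARCHIVE: set(),
}


def can_move(source: Pond, target: Pond) -> bool:
    """Return True if the movement rules permit source -> target."""
    return target in ALLOWED_MOVES[source]
```

Under these assumed rules, `can_move(Pond.RAW, Pond.TEXTUAL)` is allowed, while moving data straight from the raw pond to the archive is not. A firm could just as well choose different rules; the point is to decide them before data starts flowing.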

Data Lifecycle

Data, like so many other aspects of the financial services domain, goes through a lifecycle. The firm should consider core lifecycle questions like how data should move from pond to pond, the type of activity allowed in each pond, and the kind of conditioning that can occur in each pond.

In the case of data transformation in the lake, your firm should carefully consider how data will move through the conditioning process. A data lake is not simply a repository for data, but also a system that focuses on transforming raw data into information that is used in analytical processes. Organizing the lake into ponds allows the firm to apply different conditioning tasks in each pond, which is why most firms should start by creating a plan for data ponds. It is also critical to note that data has a shelf life for a business. Up to a point, the data is useful to the firm. Eventually that usefulness runs its course, at which point data can move into an archival pond.
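The shelf-life idea can be expressed as a simple rule. The sketch below assumes each pond has an agreed retention window and that the lake records when a dataset was last used; the `SHELF_LIFE` values and the `is_due_for_archive` function are hypothetical and would be set by the business, not by the tool.

```python
from datetime import date, timedelta

# Hypothetical shelf lives per pond; real values are a business decision.
SHELF_LIFE = {
    "application": timedelta(days=365),
    "textual": timedelta(days=730),
}


def is_due_for_archive(pond: str, last_used: date, today: date) -> bool:
    """Return True once data has sat unused longer than its pond's shelf life."""
    return today - last_used > SHELF_LIFE[pond]
```

A scheduled job could run a check like this across each pond and move whatever qualifies into the archival pond.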

Lake Usability

Creating a data lake without assessing how the data will be used limits its strategic potential. The lake can quickly become a one-way path where data is loaded but cannot be extracted to add value to the firm. There are several steps a firm can take to mitigate this risk:

  • Establish Context: Certain data types can lack context if viewed in a vacuum. It is essential to ensure data, like text, is given proper context to prevent ambiguity.
  • Create Metadata: Generate a map of the data that resides in the lake. This will allow data users to better understand and utilize the conditioned data in the lake.
  • Develop a Metaprocess: Produce and tag data processes in the lake. This step answers questions like when data was created, how much was generated in the lake, and who produced the data.

These steps also allow the team to document how the data in the lake can be integrated, which limits the impact of data viewed in silos and helps ensure data will interact across ponds.
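As an illustration of the three steps above, a catalog entry might carry the context, metadata, and metaprocess details together. This is a minimal sketch: the `CatalogEntry` fields, the `find_by_pond` helper, and the sample datasets are all hypothetical and not drawn from any particular product.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CatalogEntry:
    dataset: str        # Metadata: name of the dataset in the lake
    pond: str           # Metadata: where the dataset resides
    context: str        # Context: what the data means to a user
    created_on: date    # Metaprocess: when the data was created
    record_count: int   # Metaprocess: how much was generated
    produced_by: str    # Metaprocess: who produced the data


def find_by_pond(catalog: list[CatalogEntry], pond: str) -> list[str]:
    """Return the names of datasets catalogued in the given pond."""
    return [entry.dataset for entry in catalog if entry.pond == pond]


# Illustrative entries only.
catalog = [
    CatalogEntry("broker_notes", "textual", "Free-text trade comments",
                 date(2019, 5, 1), 1200, "trade_support"),
    CatalogEntry("positions", "application", "End-of-day holdings",
                 date(2019, 5, 20), 50000, "accounting"),
]
```

With a catalog like this, a data user can ask what lives in a given pond, for example `find_by_pond(catalog, "textual")`, and see who produced it and when before pulling it into an analysis.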

Remember: the first step to preventing your lake from turning into a swamp is to plan the lake’s structure, internal processes, and uses. These key points will also help asset managers ensure that the lake does not become a one-way data repository. With a few planning steps up front, you can build a data lake that is a strategic asset to your firm.

Tags:

  • Data
Chris Guild

Chris is a Citisoft Principal Consultant. He has over 18 years of experience in the financial services industry. He has expertise in order management, trading, trade support, settlement, custody, investment accounting, reconciliations, performance measurement, and data management. Chris has managed a variety of projects across service providers and investment managers. Prior to Citisoft, Chris led the middle office functions at a premier investment management organization.

©2025 Citisoft. All rights reserved.