How Attentive Transformed Reporting with Corpus

Introduction

In this post, we'll be going behind the scenes to see how we created the initial version of Corpus — our business metrics platform — and how that allowed us to power Attentive’s Reports Library. Corpus serves as a single source of truth for reporting data with APIs that can be easily used in the frontend.

We’ll discuss how we went from scattered definitions and chaotic reporting to a centralized platform that allowed us to create a reporting product that customers love to use.

"The reporting and super easy user-friendly dashboards are amazing!"

We’ll start by taking a look at the problems that kicked it all off, our scrappy team’s approach to tackling them head-on, the tools we used, the puzzle pieces that made it all happen, and of course, the final picture: our shiny new platform. At the end, we’ll talk about some lessons we've learned along the way. So grab your favorite beverage, kick back, and let's dive into the nitty-gritty of Corpus.

Understanding the challenge

Attentive is a fast-growing company where change can happen quickly. This can pose challenges when it comes to reporting and the state of data.

Before Corpus, we had a legacy reporting system that served us well for many years. It was a platform where teams could define their reports using raw SQL (and there was some very gnarly SQL!), reports could be activated for a user, and the user could use the reporting UI to request a report. Minutes later, and sometimes much longer, they would receive an email with an Excel workbook export of the given report SQL. The report consisted of many rows, sometimes thousands, that the customer had to parse through and visualize themselves.

The legacy reports aggregated the data every time they ran, and in some cases, they would fail due to large amounts of data or other issues. After they failed, the customer would have to request them again or contact Customer Support for help.

Customers wanted to get their data quickly in an easy-to-digest format, which wasn't possible with the platform as it stood.

The SQL logic powering those reports? It was confined to each individual report and could not be easily reused in other reports or in the UI. If we wanted to reuse any data in other services or the UI, we would have to copy-paste the logic. As the business grew and evolved, the duplicated logic needed to change as well. How do you ensure the logic stays consistent across the board? And what do you do next month when the UI and reports no longer match? Which one is correct? Maintaining accurate, consistent data was becoming very difficult.

We needed a solution that would provide a single source of truth and could expose that data anywhere it's needed.

To satisfy customer needs and decrease the load on Customer Support and other internal teams, we wanted to improve the reporting product. We wanted to build The Reports Library:

A configurable report builder that lets clients create their own in-app and downloadable reports by selecting metrics, timescales, filters, and other options. This feature empowers clients to create and customize the reports they need, without requiring Attentive CS to manually create those reports.

Our designers then came up with initial designs for our new platform. This was a cross-functional effort in which engineers, product managers, and designers collaborated closely on the final designs for our new reporting platform.

Figma designs for the reports library and a report detail page

Unfortunately, the existing reporting platform could not easily support this. That's when we decided that, before we could satisfy the product needs, we needed to build a better metrics platform: Corpus. And we could dogfood it ourselves by using it to build the Reports Library.

Given the above problems and requests, we summarized the requirements for the Corpus platform as follows:

  • Central definition of metrics: Metrics are centrally defined to ensure a single source of truth.
  • Ownership and maintenance by source teams: Metrics are created and maintained by the teams that own the data, leveraging their deep understanding of it and allowing new metric definitions to be built in parallel.
  • Discoverability of existing metrics: We aim to make existing metrics easily discoverable to prevent duplication of data.
  • Simplified data access: To make metric data readily available, we needed a dynamic API that allows ergonomic querying. This API abstracts away the complexities of data sourcing and the processes of aggregating, grouping, and filtering data across dimensions.
  • Observability: Since data integrity can be compromised by various issues, such as delays, inconsistencies, and infrastructure failures, it's crucial to have robust observability measures in place to alert us to any data problems.

Research

There was one problem: We were a team of five engineers, and we had never built something like this before. Where and how do we start? Research!

We spent a lot of time understanding existing data engineering tools and reading blog posts about solutions and experiences. Out of all the research we did, one particular blog series resonated with us: Airbnb's Minerva. Given our own requirements, an excerpt from Part 2 of the Minerva blog series captured exactly what we needed:

  • Standardized: Data is defined unambiguously in a single place. Anyone can look up definitions without confusion.
  • Declarative: Users define the “what” and not the “how”. The processes by which the metrics are calculated, stored, or served are entirely abstracted away from end users.
  • Scalable: Minerva must be both computationally and operationally scalable.
  • Consistent: Data is always consistent. If definition or business logic is changed, backfills occur automatically and data remains up-to-date.
  • Highly available: Existing datasets are replaced by new datasets with zero downtime and minimal interruption to data consumption.
  • Well tested: Users can prototype and validate their changes extensively well before they are merged into production.

Unfortunately, Minerva is not open source, but at least we now had our North Star! We just had to figure out the simplest puzzle pieces to start with.

Building the puzzle pieces

Given our requirements and the North Star, we worked on defining the smallest puzzle pieces we would need to satisfy the platform requirements.

The aggregation piece

We wanted to make the metric queries performant and minimize the sizes of the datasets. To achieve this, we decided to pre-aggregate all the data. The vast majority of data from other systems, be it databases or events, was already available in Snowflake. This meant that Snowflake was the best place to do all the raw data aggregations.

Based on reporting requirements, we needed to support multiple levels of aggregation, from 15 minutes to yearly.

Now that we knew where we would be doing our data aggregations, we had to figure out how.

Our BI teams already utilized dbt to model their datasets and used Airflow to run those models. Perfect!

dbt is a modern data transformation tool that allows analysts to easily materialize data views. By simply writing select statements, tables can be defined, materialized, and referenced in other models. It gave us many benefits we didn't have to build ourselves: we could define models using SQL, and dbt would manage the relationships between all the models, execute them, and automatically handle merging of the data. By using dbt, we could also take advantage of dbt docs to catalog our metrics and track data lineage for anything we defined. A lot of freebies that allowed us to move much quicker!

By creating custom Airflow DAGs, we could run the dbt models in two different ways (a minimal DAG sketch follows the list):

  • Incremental load: run the dbt models incrementally on a configurable schedule, anywhere from every 5 minutes up to hourly. Each run refreshes data within a configurable lookback window. For example, with a 2-day lookback, we would re-aggregate and merge the last two days of data.
  • Full refreshes: run the dbt models and rebuild the entire metric from the earliest available data. This is useful if we come across an issue with the data, or if a metric definition has changed and we need to start fresh.
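
To make those two run modes concrete, here is a minimal, hypothetical sketch of such a DAG. The model name, project path, and schedule are illustrative, not Attentive's actual configuration:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

# Minimal sketch of a DAG that runs one metric's dbt model incrementally.
# The dag_id, schedule, paths, and model name are all illustrative.
with DAG(
    dag_id="corpus_conversion_count",
    start_date=datetime(2024, 1, 1),
    schedule_interval="*/15 * * * *",  # configurable: every 5 minutes up to hourly
    catchup=False,
) as dag:
    incremental_run = BashOperator(
        task_id="dbt_run_incremental",
        # Incremental load: dbt re-aggregates only the lookback window.
        bash_command=(
            "dbt run --project-dir /opt/corpus_dbt "
            "--select conversion_count --vars '{lookback_days: 2}'"
        ),
    )

# A full refresh is the same dbt invocation with the --full-refresh flag,
# which rebuilds the incremental model from the earliest available data.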

dbt gave us a lot, but we still had to ensure that each model handled some specific logic needs (illustrated in the sketch after this list):

  • Timezone conversions: all the data is stored in UTC, but in some instances we need it in the customer's time zone. The dbt models handle this conversion as part of the data aggregation.
  • Configurable lookback windows: metrics can be configured to re-aggregate historical data for a specified number of days, capturing any late-arriving data.
  • Update changed data only: lookback windows recalculate existing data, and by default every row in the window would be rewritten even if only a little late-arriving data came in. To minimize the number of rows impacted, our models merge only the data that actually changed. This significantly reduces row churn and allows for more performant data syncs.
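
To illustrate all three, here is a hedged sketch of the kind of SQL fragment a generated model might contain (Snowflake syntax; the join to a companies table and all names are illustrative, not Attentive's actual models):

# Illustrative sketch only: the SQL fragment a generated dbt model might
# contain. Table/column names and the companies join are assumptions.
LOOKBACK_DAYS = 2

incremental_sql = f"""
select
    -- Timezone conversion: aggregate on the company-local day, not UTC.
    date_trunc('day', convert_timezone('UTC', c.timezone, t.converted)) as date,
    t.company_id,
    count(t.id) as conversion_count,
    current_timestamp() as last_update
from attentive_dev.dw.attributable_conversions t
join companies c on c.id = t.company_id
-- Configurable lookback window: only recent data is re-aggregated,
-- which captures late-arriving rows without a full rebuild.
where t.converted >= dateadd('day', -{LOOKBACK_DAYS}, current_date())
group by 1, 2
"""
# The merge step then compares these rows to the existing table and updates
# only rows whose values changed, so last_update (and the downstream sync)
# touches the minimum number of rows.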

While the data warehouse is great for storing and aggregating a large amount of data, it's not best suited for powering the data in the UI, where we could have hundreds or thousands of customers looking at metrics.

As part of the Airflow DAG that handled aggregation, we also synced the data from Snowflake to a PostgreSQL database. Using PostgreSQL also allowed us to properly index the metric data and improve the overall query performance.

We used another off-the-shelf tool, Meltano, to handle this. Meltano allowed us to define ETL pipelines and automatically sync the data from Snowflake to PostgreSQL based on the last modified date. And since we only update the last modified date when data actually changes, we were able to minimize the amount of data being copied.
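
Conceptually, that incremental sync is bookmark-based: remember the highest last-modified timestamp already copied and pull only rows past it. A simplified sketch of the idea (Meltano manages this state for us; the connection helpers here are hypothetical placeholders):

# Simplified sketch of the bookmark-based incremental sync Meltano performs.
# The snowflake/postgres connection helpers are hypothetical placeholders.
def sync_metric_table(snowflake, postgres, table: str) -> None:
    # The bookmark is the highest last_update already replicated to Postgres.
    bookmark = postgres.fetch_value(
        f"select coalesce(max(last_update), '1970-01-01') from {table}"
    )
    # Only rows modified since the bookmark are pulled. This stays cheap
    # because last_update is only touched when a row's values actually change.
    rows = snowflake.fetch_all(
        f"select * from {table} where last_update > %(bookmark)s",
        {"bookmark": bookmark},
    )
    postgres.upsert(table, rows, key_columns=["id"])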

Full metric refreshes were a different beast, and we discovered early on that Meltano could not perform them efficiently; for some of the larger metrics, a full refresh could take hours. Instead, we handled full refreshes by exporting the data from Snowflake to S3 and then loading it into PostgreSQL from S3. We can now do full refreshes in minutes! Load the CSV into a temporary table, run a table swap, and we have an entire metrics table refreshed without any downtime. Magic!
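
The table swap itself is just a pair of renames inside one transaction, so readers never see a partially loaded table. A minimal sketch with psycopg2 (table names are illustrative, and the S3-to-Postgres load step is elided):

import psycopg2

# Minimal sketch of a zero-downtime full refresh via table swap.
# Table names are illustrative; the bulk load from the S3 export is elided.
conn = psycopg2.connect("dbname=metrics")
with conn, conn.cursor() as cur:
    # 1. Build the replacement table off to the side.
    cur.execute(
        "CREATE TABLE conversion_count_new (LIKE conversion_count INCLUDING ALL)"
    )
    # 2. Bulk-load the CSV exported from Snowflake into conversion_count_new here.
    # 3. Swap: both renames commit atomically with the enclosing transaction.
    cur.execute("ALTER TABLE conversion_count RENAME TO conversion_count_old")
    cur.execute("ALTER TABLE conversion_count_new RENAME TO conversion_count")
    cur.execute("DROP TABLE conversion_count_old")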

The metric definition piece

We can now execute the dbt models via Airflow to aggregate data in our data warehouse. Awesome! What about our platform users?

dbt is a great tool, but we didn't expect every engineer to learn its syntax just to define their metrics, especially given the additional logic required for each dbt model. Plus, if we ever needed to introduce new logic across all models, hard-coded definitions would complicate maintenance. We also needed a way to catalog metrics and their metadata.

These needs were addressed by introducing the Metric Definition Language (MDL). MDL is a YAML file that defines metric metadata and how metrics are aggregated. Below is an example MDL for the Conversions metric:

dataSource: snowflake
table: attentive_dev.dw.attributable_conversions
metrics:
  - id: conversion_count
    field: id
    type: count
    name: 'Conversions'
    description: 'Number of conversions that resulted from a message based on your attribution window settings. A conversion typically means a purchase.'
    format: number
    connotation: positive
    filters: []
    dimensions:
      - id: date
        field: converted
        type: timestamp
        name: 'Conversion Date'
        grains: [daily]
        inCompanyTimezone: true
      - id: company_id
        field: company_id
        type: list
        name: 'Company ID'
preAggregations:
  - id: conversion_count_aggregation
    lookback_days: 2
    type: groupBy
    metric: conversion_count
    dimensions: [date, company_id]

The Airflow DAGs would parse the MDLs, generate the dbt models, and aggregate the data. This allowed us to introduce new logic or bug fixes to all the metrics without impacting any of the definitions.
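
As a rough illustration of that generation step, parsing the MDL above and rendering a basic group-by select might look like the following. This is heavily simplified; the real generator also handles grains, timezone conversion, and the merge logic described earlier:

import yaml

# Heavily simplified sketch of MDL -> SQL generation. The real Airflow code
# also handles grains, timezone conversion, incremental merges, and more.
AGG_SQL = {"count": "count({field})", "sum": "sum({field})"}

def render_model(mdl_text: str) -> str:
    mdl = yaml.safe_load(mdl_text)
    metric = mdl["metrics"][0]
    pre_agg = mdl["preAggregations"][0]
    # Map the dimension ids used by the pre-aggregation to source fields.
    fields = {d["id"]: d["field"] for d in metric["dimensions"]}
    select_dims = ", ".join(f"{fields[d]} as {d}" for d in pre_agg["dimensions"])
    group_dims = ", ".join(fields[d] for d in pre_agg["dimensions"])
    measure = AGG_SQL[metric["type"]].format(field=metric["field"])
    return (
        f"select {select_dims}, {measure} as {metric['id']},\n"
        f"       current_timestamp() as last_update\n"
        f"from {mdl['table']}\n"
        f"group by {group_dims}"
    )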

For the initial implementation, we decided to use denormalized tables instead of dimensional modeling. We knew we would eventually want to transition to dimension and fact tables, but for now, denormalized tables were sufficient. Once aggregated, the above metric would resemble the following table.

id            date        company_id  conversion_count  last_update
<unique id>   2024-03-01  123         6432              2024-03-02 23:13:34.531
<unique id>   2024-03-01  852         313               2024-03-02 23:13:34.531

As part of metric definitions, we also focused on creating self-serve tooling that would allow platform users to easily create and test new reports. We developed a Corpus CLI tool that enabled users to compile any new or existing metrics, aggregate data from any environment, and output the data to their own personal schema in Snowflake, which they could query and further validate.

The Corpus CLI is a Python tool that reuses much of the existing aggregation code, allowing users to work with metrics without needing Airflow. This gave us consistency between processes run locally and processes run in a production environment.

Additionally, it allowed for rerunning the data and automatically comparing the new output against existing data. This made it much easier to validate issues and see the impacts of metric definition changes.
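
For instance, the comparison can be as simple as an outer join on the metric's dimensions. An illustrative sketch using pandas (the join-and-diff idea, not Attentive's exact implementation):

import pandas as pd

# Illustrative sketch of comparing a re-run metric against existing data.
# Columns match the denormalized table shown earlier.
def diff_metric(existing: pd.DataFrame, rerun: pd.DataFrame) -> pd.DataFrame:
    merged = existing.merge(
        rerun,
        on=["date", "company_id"],
        how="outer",
        suffixes=("_old", "_new"),
        indicator=True,  # flags rows present on only one side
    )
    changed = merged["conversion_count_old"] != merged["conversion_count_new"]
    return merged[changed | (merged["_merge"] != "both")]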

The data access piece

While the aggregated data in our data warehouse was usable directly in internal reporting via Looker, legacy reports, and other internal processes, we needed to make the PostgreSQL data available to the UI and other services. Enter our query layer, metrics-api!

We exposed two mechanisms to get the data:

  • gRPC endpoints for other services.
  • GraphQL for the UI.

The APIs support retrieving the metric catalog (such as human-readable metric and dimension names) as well as the actual data in multiple formats.

The API can also be used to further filter, group, and aggregate the metric data as needed. For example, the following query would return a month's worth of daily data for the given company ID and include the monthly total value as aggregateValue.

{
	node(id: $companyId) {
		metrics(
			metricIds: ["conversion_count"],
			groupings: [
				{ dimensionId: "date", granularity: TIME_DIMENSION_GRANULARITY_DAILY }
			],
			filters: [
				{ 
					dimensionId: "date", 
					startDate: "2024-03-01", 
					endDate: "2024-03-31"
				},
				{
					dimensionId: "company_id",
					operator: FILTER_OPERATOR_EQUALS,
					value: $companyId
				}
			]) {
			definition { 
				metricId
			}
			aggregateValue
			groupedValues {
				groupingDimensionValues
				value
			}
		}
	}
}
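
Any internal service or script can issue that query over HTTP. A hypothetical usage example in Python (the endpoint URL, auth header, and response shape are assumptions, not the actual metrics-api contract):

import requests

# Hypothetical example of issuing the GraphQL query above from Python.
# The endpoint URL, auth token, and response shape are assumptions.
QUERY = """..."""  # the GraphQL document shown above

resp = requests.post(
    "https://api.attentive.example/graphql",
    json={"query": QUERY, "variables": {"companyId": "<company node id>"}},
    headers={"Authorization": "Bearer <token>"},
    timeout=30,
)
resp.raise_for_status()
metric = resp.json()["data"]["node"]["metrics"][0]  # assumes one entry per metric
daily_values = metric["groupedValues"]
monthly_total = metric["aggregateValue"]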

The presentation piece

Now that all the data was easily queryable, we needed to display it.

We worked closely with our Product and Design team members to create a suite of React components that worked natively with the metrics-api GraphQL queries. Using Storybook allowed us to build these components in isolation, without needing the APIs to exist yet. From line and bar charts to date pickers and reporting tables, these components were meant to be easily reused within the Reports Library and by other teams across the Attentive UI. A few component examples are below:

Metric tile, horizontal bar chart, and metric line chart (all from Storybook)

The Reports 2.0 piece

The legacy reports could be updated to use the new metrics data in our data warehouse, but to implement the Reports Library we also needed a new way to define reports: one that used Corpus and could be accessed directly in the UI.

In the new world, instead of each report being one large SQL query, reports were now collections of metrics, filters, and dimensions. The new reporting platform takes a report definition, queries the metrics API to get the data, and uses the new presentation components to display a user-friendly report that allows in-UI filtering, grouping, and other advanced features. Below is a sample report definition for one of the more important reports, Attributed Revenue:

{
  "id": "attributed_revenue",
  "name": "Attributed Revenue",
  "domain": "billing",
  "class": "attributed_revenue_class",
  "metrics": [
    "conversion_count",
    "conversion_total_revenue",
    "average_conversion_revenue"
  ],
  "filters": [
    {
      "dimensionId": "date",
      "type": "timestamp",
      "datePeriod": "LAST_30_DAYS"
    }
  ],
  "groupings": [
    {
      "dimensionId": "date",
      "type": "timestamp",
      "granularity": "daily"
    }
  ],
  "summaryMetricIds": [
    "conversion_count",
    "conversion_total_revenue",
    "average_conversion_revenue"
  ],
  "summaryMetrics": {
    "summaryMetricIds": [
      "conversion_count",
      "conversion_total_revenue",
      "average_conversion_revenue"
    ]
  }
}
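
At render time, the reporting platform resolves a definition like this into concrete metrics-api arguments; relative date periods, for example, become absolute ranges. A simplified, illustrative sketch of that resolution step (not Attentive's exact implementation):

from datetime import date, timedelta

# Simplified, illustrative sketch: resolve a report definition's relative
# date period into the absolute range the metrics API expects.
def resolve_date_filter(report_filter: dict) -> dict:
    if report_filter.get("datePeriod") == "LAST_30_DAYS":
        end = date.today()
        start = end - timedelta(days=30)
        return {
            "dimensionId": report_filter["dimensionId"],
            "startDate": start.isoformat(),
            "endDate": end.isoformat(),
        }
    return report_filter  # absolute filters pass through unchanged

resolve_date_filter({"dimensionId": "date", "datePeriod": "LAST_30_DAYS"})
# -> {"dimensionId": "date", "startDate": "...", "endDate": "..."}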

Putting it all together

You might have noticed that we had five unique puzzle pieces and five engineers on the team. This wasn't an accident!

Once the puzzle piece blueprints were ready and we were sure they would all fit together, each of us took on our respective piece and implemented it in parallel:

  1. One engineer focused on building the Airflow DAGs that handle the aggregations.
  2. One engineer led the project and focused on the metrics-api.
  3. One engineer focused on building the new Reports 2.0 architecture.
  4. One engineer focused on building all the new UI components.
  5. The team lead focused on building out the CLI tooling and providing support/glue where needed.

With a lot of ongoing teamwork, communication, and collaboration with our Product and Design teams, we smoothed out any rough edges and put all the pieces together. They fit perfectly! We got our final picture.

The final picture

All the puzzle pieces are now in place! We can define metrics, aggregate them, serve them via APIs, and have components to display them. What does that actually look like?

Corpus and Reports Library architecture

This is the architectural diagram of the initial version of Corpus. Since then, we’ve added many more features and improvements, but this initial version solved the problem we set out to resolve. We had a standardized, declarative, performant way to define and consume our metrics.

Powered by our new metrics platform, Corpus, we could now build the Reports Library! Customers no longer have to parse through thousands of Excel workbook rows and create their own visualizations. They can self-serve their reporting needs right in the UI, getting a quicker view into their marketing performance. This reflects our core value of Championing the Customer: our success is defined by our customers' success, and by understanding their needs, we craft sustainable and extraordinary solutions to help them thrive.

A real Reports Library report

And since we built the platform to be easily usable by other teams, they could use Corpus for their own metrics and fully remove many legacy systems. Our mission was a success; we have built the single source of truth. The data customers see in their reports is the same data they'll see throughout the Attentive UI.

As a summary in numbers, the Corpus platform currently:

  • Hosts over 300 metrics.
  • Handles an average of 56 requests per second with a p95 latency of 609ms.
  • Stores over 3.4 billion rows in its largest metric.
  • Holds 3.2TB of data in PostgreSQL in total.

Lessons learned

This was our initial version of the platform, and we built something we never had before. Since then, we've added many new features to support both our internal platform users and our external customers. However, with that initial version, there were a few important things we learned:

Teamwork

We had to work together and not be afraid to challenge and question each other, to make sure what we were building made sense. Even though we spent a lot of upfront time planning, not every detail could be captured.

It was crucial to cultivate a comfortable environment within the team, where issues could be raised early if things veered off course. Despite being a fully remote team, we maintained constant synchronization and communication to ensure alignment as all the pieces came together. This cohesive effort was vital to the success of our platform.

Effective teamwork extended beyond the Engineering realm; the involvement of Product and Design team members was equally essential. Through collaborative discussions and necessary compromises, we ensured that every element fit together seamlessly, even if it meant adjusting our initial plans.

Research

Since we initially weren't sure what to build, early research was critical to our overall success. Delving into existing tooling and solutions, both internally and externally, was instrumental in charting our course. This approach not only helped us find our North Star but also identified tools within Attentive that could help us build our platform.

Utilizing existing tooling helped us quickly build our initial version and optimize later when needed. Although the extended research phase delayed the start of the platform's development, it proved instrumental in defining component contracts, enabling parallel development, and ensuring a cohesive end product. This early research is what helped all the puzzle pieces fit.

Internal use for validation

We initially used the Corpus platform ourselves to build the Reports Library. This internal usage served as a crucial validation step, allowing us to identify shortcomings and refine the platform before introducing it to other teams. By experiencing the platform firsthand, we gained invaluable insights into its functionality, usability, and areas needing improvement.

Ultimately, using the platform ourselves proved instrumental in enhancing its quality, reliability, and user experience. This approach underscores the significance of iterative development and continuous improvement in ensuring that the final product meets the needs and expectations of its intended users.

User support

Our experience highlights the crucial role of user support in driving platform adoption and success. As we worked with other teams to define the initial metrics for the Reports Library, we had to deal with diverse datasets. This emphasized the need for meticulous data validation and cleanup processes. By providing user support and implementing self-service tooling early on, we not only facilitated smoother data validation but also empowered teams to independently create their own metrics.

Looking back, these lessons reshaped our project management approach, emphasizing the importance of teamwork, research, validation, and user-centric design principles. They served as guiding principles throughout our journey, enabling us to navigate challenges and ultimately achieve our goals.

Ready to help us tackle our next challenge, together? Find your perfect role.
