
Datadog | Core Data Visualization & Monitoring

May 2023 – August 2023

Project Overview

Project Deliverables

    • User Research
    • User Interface
    • Customer Personas
    • Low-Fidelity Mockups
    • Interactive High-Fidelity Prototype
    • Final Presentation


My Role

    • Product Designer
    • User Researcher

Project Context

    • Datadog Internship Project, Core Data Visualization

    • Duration: 12 Weeks
    • Team: Akshay Murthy, Karl Sluis (Manager), Kemper Smith (Mentor), Maxime Matheron (Product Manager)
    • Tools + Skills: Figma, JavaScript (D3), DRUIDS Design System, User Research, Rapid Prototyping, User-Centered Design, Cross-Functional Collaboration


Background

The Problem

The existing monitor creation/configuration flow within Datadog lacks adequate visual feedback and guidance for users. As evidenced by the volume of customer support tickets, users are unsure how their monitors are being evaluated, how to improve their monitors for better accuracy, and how to select optimal threshold values.

Purpose

Monitors are a core offering of the Datadog product, yet the current monitor creation/edit flow within Datadog’s platform confuses some users. This project identified data visualization and clearer affordances as an opportunity to explain how monitors work, show whether a given monitor will evaluate and alert according to the user’s expectations, and ultimately improve the user experience and reduce complaint tickets. The project aims to address ~70% of all user-created monitors, while utilizing the DRUIDS design system and existing design patterns to enhance the overall experience.

What are Monitors?

Monitors help SREs (Site Reliability Engineers) understand when critical performance changes are occurring in their applications. Users can create monitors that actively check metrics, integration availability, network endpoints, and more.
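
For context, a metric monitor boils down to a query over a metric, an evaluation window, and alert thresholds. The sketch below illustrates the shape of such a definition; the field names are modeled on Datadog’s public monitor API, but the metric, scope, and threshold values are hypothetical.

```typescript
// Illustrative sketch of a metric monitor definition. Field names follow
// Datadog's public monitor API, but all values here are hypothetical.
interface MonitorDefinition {
  name: string;
  type: "metric alert";
  // The query encodes the evaluation window (last_5m), the aggregation
  // (avg), the metric, its scope, and the alert condition.
  query: string;
  message: string; // Notification text sent when the monitor alerts.
  options: {
    thresholds: { critical: number; warning?: number };
  };
}

const highCpuMonitor: MonitorDefinition = {
  name: "High CPU on production hosts",
  type: "metric alert",
  query: "avg(last_5m):avg:system.cpu.user{env:prod} by {host} > 80",
  message: "CPU usage is above 80% — notify the on-call SRE.",
  options: {
    thresholds: { critical: 80, warning: 70 },
  },
};
```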

The Design Process

Project Goals

The main goal is to offer SREs a unified solution for configuring and editing monitors with additional visual feedback and guidance. In addition, the project aims to help them freely explore and test monitors without needing to create a new monitor for every edit. While pursuing these goals, it was important to maintain the user flow and core interactions of the existing product to minimize user confusion.

Existing Product Research + Component Analysis

To better understand Datadog’s DRUIDS design system and the design patterns used by other Datadog services, I analyzed three services: Watchdog, APM, and Log Management. These products had similar, yet slightly different, design patterns that served as a source of information for the Monitors page. I also annotated our existing platform, identifying potential areas of improvement and components that appeared confusing.

Competitor Product Research

I analyzed other observability platforms, including Grafana, Splunk, New Relic, and Cisco, which helped inform design decisions about what needed to change to provide a more direct monitor configuration experience. From this research, I identified components such as data tables, tooltips, tiles, and modals, each serving a different purpose in communicating information and enhancing a user’s understanding. I also took note of interaction patterns on hover and click states, as they often correlated with the data point portrayed on the visualization.

User Journey Map

Below is a user journey map that outlines the jobs, touchpoints, pains, and gains an SRE experiences while going through the monitor creation process. The red squares indicate the most pressing opportunity spaces in the user journey, as highlighted by RUM (Real User Monitoring) analysis and a review of customer support tickets.

Existing Design

Current Platform Demo

This video highlights the existing design of the monitor creation/edit page, which was the main focus of the project.

Areas for Improvement

After thoroughly analyzing the existing product and the main pain points SREs experience while creating and configuring monitors, I identified five key areas for this project to address.

Preliminary Development

Wireframes

I began the design phase by creating preliminary wireframes detailing the different overlays and layouts, focusing on information hierarchy and data visibility. Given that this project emphasized the data visualization aspect, I had to design tooltip and overlay concepts with high granularity.

Low-Fidelity Mockups

To find the best layout for the page, which included the timeseries graph, monitor creation stepper, monitor title, and high-level metrics, I created multiple lo-fi layout variations to identify the optimal structure. Based on existing design patterns, a 3-block layout was best for displaying each component clearly in a single view.

High-Fidelity Prototype

Using Datadog’s DRUIDS design system, I developed the high-fidelity prototypes for the project, which consisted of 30+ screens. Below, I have outlined five key aspects of my prototype that address the problems above.

Part 1: Three-Block Layout w/ Sticky Scrolling

  1. The timeseries evaluation graph remains sticky at the top of the page (sketched below).
  2. The monitor configuration stepper follows a normal scrolling pattern and sits below the graph component.
  3. A collapsible side panel with high-level summary metrics shows a breakdown of the top alerting groups and notification recipients.
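
A minimal sketch of how this layout could be structured, assuming a React-style component model; the component names and styles here are hypothetical, not Datadog’s actual implementation:

```tsx
import React from "react";

// Sketch of the three-block layout: a sticky graph header, a normally
// scrolling configuration stepper, and a collapsible side panel.
export function MonitorEditorLayout({ panelOpen }: { panelOpen: boolean }) {
  return (
    <div style={{ display: "flex" }}>
      <main style={{ flex: 1 }}>
        {/* Block 1: the timeseries graph stays pinned while the page scrolls. */}
        <section style={{ position: "sticky", top: 0, zIndex: 1 }}>
          {/* timeseries evaluation graph */}
        </section>
        {/* Block 2: the configuration stepper scrolls normally below. */}
        <section>{/* numbered configuration steps */}</section>
      </main>
      {/* Block 3: collapsible side panel with high-level summary metrics. */}
      {panelOpen && <aside style={{ width: 320 }}>{/* summary metrics */}</aside>}
    </div>
  );
}
```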

Part 2: Event Overlay + Physical Evaluation Window 

  1. The rolling evaluation window dynamically changes according to the user-selected query window (sketched below).
  2. The event overlay bar, in greyscale or full color, displays the volume of notifications in a given time period, indicating transition states.
  3. An enhanced tooltip on hover shows the breakdown of each type of notification at a given time.
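
To illustrate item 1, here is a minimal sketch of how the rolling window’s bounds could be derived from the user-selected query window; the window labels and function names are hypothetical:

```typescript
// Map user-selectable query windows to a duration in milliseconds.
const WINDOW_MS: Record<string, number> = {
  last_5m: 5 * 60_000,
  last_15m: 15 * 60_000,
  last_1h: 60 * 60_000,
};

// The highlighted region on the timeseries graph: it always spans the most
// recent `queryWindow` of data and slides forward as time advances.
function evaluationWindow(queryWindow: string, now = Date.now()) {
  const length = WINDOW_MS[queryWindow];
  if (length === undefined) throw new Error(`Unknown window: ${queryWindow}`);
  return { start: now - length, end: now };
}

// Re-render the overlay whenever the user picks a different query window.
console.log(evaluationWindow("last_5m"));
```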

Part 3: High-Level Metrics Overview w/ Dynamic Comparison

  1. A notification breakdown shows the total volume of ALERT, WARN, RECOVER, and NO DATA states (sketched below).
  2. A stacked Toplist graph displays the breakdown of notification volume by type, according to monitor group or notification recipient.
  3. Dynamic comparison updates metrics and monitor groups according to the user’s real-time changes, allowing users to test and compare different monitor configurations to reach an optimal outcome.
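
A minimal sketch of the rollup behind the notification breakdown, assuming notification events carry a state, a group, and a timestamp; all types and field names here are illustrative:

```typescript
type MonitorState = "ALERT" | "WARN" | "RECOVER" | "NO DATA";

interface NotificationEvent {
  state: MonitorState;
  group: string;     // e.g. "host:web-01"
  timestamp: number; // epoch milliseconds
}

// Total notification volume per state: the numbers shown in the overview.
// Recomputing this over the current event set is what lets the breakdown
// update as the user edits the monitor configuration.
function breakdownByState(events: NotificationEvent[]): Map<MonitorState, number> {
  const counts = new Map<MonitorState, number>();
  for (const e of events) {
    counts.set(e.state, (counts.get(e.state) ?? 0) + 1);
  }
  return counts;
}
```

Grouping the same events per monitor group or recipient, rather than per state, would feed the stacked Toplist graph described in item 2.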

Part 4: Single & Multiple Series Cumulative Windows

  1. A dynamic window visually expands according to the current time during the evaluation period.
  2. A new timeseries visual illustrates the SUM aggregation method, rolling data points up into a single line (sketched below).
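
To clarify the SUM rollup, here is an illustrative reimplementation that buckets points from one or more series by time and sums them into a single line; this is a sketch, not Datadog’s actual aggregation code:

```typescript
interface Point { t: number; v: number } // epoch ms, metric value

// Roll one or more series up into a single summed line: every point falls
// into a fixed-width time bucket, and values within a bucket are summed
// across all series.
function sumRollup(series: Point[][], bucketMs: number): Point[] {
  const buckets = new Map<number, number>();
  for (const s of series) {
    for (const { t, v } of s) {
      const bucket = Math.floor(t / bucketMs) * bucketMs;
      buckets.set(bucket, (buckets.get(bucket) ?? 0) + v);
    }
  }
  // Sort buckets chronologically so the aggregated line can be drawn.
  return [...buckets.entries()]
    .sort(([a], [b]) => a - b)
    .map(([t, v]) => ({ t, v }));
}
```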

Part 5: Improving the Overall Experience

  1. Zooming states, with the addition of zoom in, zoom out, and reset buttons near the timeseries graph (sketched below).
  2. An editable monitor title using the existing edit design pattern.
  3. A dynamic graph subheader that changes according to the user’s configuration and describes how and what the monitor is evaluating.
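
Since D3 was part of the project’s toolchain, the zoom controls could be wired up roughly as below with d3-zoom; the element selectors and scale limits are hypothetical:

```typescript
import * as d3 from "d3";

// Attach zoom behavior to the timeseries graph and wire the three buttons.
const svg = d3.select<SVGSVGElement, unknown>("#timeseries");
const plot = svg.select<SVGGElement>("g.plot");

const zoom = d3
  .zoom<SVGSVGElement, unknown>()
  .scaleExtent([1, 8]) // never zoom out past the full evaluation range
  .on("zoom", (event) => plot.attr("transform", event.transform.toString()));

svg.call(zoom);

// Zoom in, zoom out, and reset buttons next to the graph.
d3.select("#zoom-in").on("click", () => svg.transition().call(zoom.scaleBy, 1.5));
d3.select("#zoom-out").on("click", () => svg.transition().call(zoom.scaleBy, 1 / 1.5));
d3.select("#zoom-reset").on("click", () =>
  svg.transition().call(zoom.transform, d3.zoomIdentity)
);
```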

Design Considerations + Tradeoffs

During my design process, I went through countless iterations of tooltip designs, table layouts, and card structures. Ultimately, I prioritized readability, scale, and spatial awareness to ensure that key pieces of information were communicated without taking up space needed for other components on the page.

Visual Design Language 

I leveraged Datadog’s DRUIDS design system within Figma to execute the high-fidelity prototypes for this project. The design system consists of common design patterns, components, and accessible color schemes used across Datadog’s core product interfaces, so it was essential to maintain this consistency and create a familiar experience for the user base. Additionally, I carefully referenced the DRUIDS documentation to craft functional implementation notes for ease of handoff and to understand when to use different component states.

Design Components

A numbered section stepper was used in the monitor configuration section, which includes the detection method, metric definition, and other customizable parameters for monitor creation. Any changes made here dynamically update the timeseries graph and side panel.

A collapsible side panel component was used to display the monitor overview and stacked Toplist graph information groups. Ample space was needed to accommodate the variable size of the Toplist graph across monitor states and groups.

A data table component was used to display the types of monitor states at any given time, as displayed on the timeseries graph. The data table updates dynamically according to the time range and user edits (pictured is a default table).

Measuring Success

To track whether this project was successful in alleviating the primary pain points of our users, we tracked two main metrics:

Project Reflection

Challenges

  1. Understanding the complexities of Datadog’s product, use cases, and design patterns on a short timeline.
  2. Focusing exclusively on data visualization components such as overlays, tooltips, and indicators to communicate information.
  3. Creating space-aware layouts given the sheer amount of data and information that needed to be communicated.
  4. Reconciling conflicting feedback from engineering and design teams on aspects like table layout and back-testing data.
  5. Catering new features to a technically knowledgeable user base and understanding users’ skill levels.

Next Steps

  • Gather further feedback from engineering teams and partner customers.
  • Get further information about cumulative rollup methods (single and multiple series), and understand the implementation need for the arrow indicator above the evaluation window.
  • Validate the proposed solution with more customer feedback, using support tickets.
  • Analyze the feasibility of a new event overlay tooltip to display notification types for both single and multiple series.
