What is AI Data Enrichment?
Dimension Labs provides data enrichment capabilities that businesses use to enhance their understanding of conversational data and other sources of unstructured text.
Core Concept
Data Labeling: Textual data is rich with information and meaning. The function of enrichment is to extract this information as a label.
Each enrichment task (i.e., prompt) identifies a different aspect of meaning from the raw text of a conversation or document which is output as a label. We refer to these extraction tasks as "dimensions of analysis" or simply dimensions.
Unstructured Data
Raw conversational text between customers and agents or other feedback containing valuable business information.
AI Enrichment
The process of using large language models to extract meaning and add labels to unstructured text data.
Video: Core - What is Data Enrichment [1:30] (09/2025)
Data Enrichment Explained
From Unstructured to Structured Data
We refer to the textual data in a conversation—the messages between a customer and a chatbot or agent—as unstructured data. Dimension Labs allows you to add structure by adding labels.
Example: Raw Conversation Data
| Session ID | Conversation Text |
|---|---|
| 112203 | User: Love your new product Bot: Amazing, thank you! |
| 231107 | User: Track order Bot: Happy to help with that. |
The above table is a dataset with two columns: a unique session ID and conversation text. Within the textual data lies additional information. Using AI we can identify, extract, and add this information to the dataset as a new column. We call this process enrichment.
The Enrichment Process
Each conversation transcript is analyzed by a large language model given a specific analytical task (a prompt).
Example: Sentiment Analysis
- Task: Sentiment Analysis
- Prompt: "Identify the sentiment of this conversation."
- Output Dimension:
sentiment_cluster: {positive, negative, neutral}
The AI applies labels according to the prompt instructions, which are then added as new columns in your dataset.
Result: Enriched Data with Sentiment
| Session ID | Conversation Text | Sentiment |
|---|---|---|
| 112203 | User: Love your new product Bot: Amazing, thank you! | positive |
| 231107 | User: Track order Bot: Happy to help with that. | positive |
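The loop above can be sketched in a few lines of Python. Here `run_prompt` is a hypothetical stand-in for a real LLM call (it is not a Dimension Labs API); it is stubbed with a trivial keyword rule so the example runs end to end.

```python
# Minimal sketch of the enrichment loop, assuming a hypothetical
# `run_prompt` helper. A real implementation would send the prompt and
# conversation text to an LLM; here the call is stubbed with a simple rule.
def run_prompt(prompt: str, text: str) -> str:
    # Stub: stands in for an LLM classifying the conversation.
    positive_cues = ("love", "thank", "great", "happy")
    return "positive" if any(cue in text.lower() for cue in positive_cues) else "neutral"

sessions = [
    {"session_id": "112203", "text": "User: Love your new product Bot: Amazing, thank you!"},
    {"session_id": "231107", "text": "User: Track order Bot: Happy to help with that."},
]

SENTIMENT_PROMPT = "Identify the sentiment of this conversation."

# Enrichment adds the label as a new column on each row.
for row in sessions:
    row["sentiment"] = run_prompt(SENTIMENT_PROMPT, row["text"])
```

The key design point: the dataset keeps its original columns, and each enrichment task only appends a new labeled column.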
Note: We prefer to use the term "data labeling" to describe the enrichment process. While the output is similar to coding text, text mining, or AI classification, these terms offer a less precise description of the underlying technology.
Dynamic Enrichment
The enrichment process is more versatile and easier to customize than many existing approaches for classifying data (e.g., traditional machine-learning models). The AI may run multiple out-of-the-box and/or custom analytical tasks concurrently.
Moreover, these tasks may include both predefined and open-ended or "dynamic" labeling. Here's an example of dynamic reason labeling:
Example: Reason Analysis
- Task: Reason Analysis
- Prompt: "Detail the reason the customer engaged with support."
- Output Dimension:
reason_session: {dynamic, based on conversation content}
Result: Adding Dynamic Label Enrichment
| Session ID | Conversation Text | Sentiment | Contact Reason |
|---|---|---|---|
| 112203 | User: Love your new product Bot: Amazing, thank you! | positive | new product feedback |
| 231107 | User: Track order Bot: Happy to help with that. | positive | order tracking inquiry |
Note: Unlike sentiment analysis with predefined categories, reason analysis generates dynamic labels based on conversation content. These organic labels can be consolidated into themes through data mapping.
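The consolidation step mentioned in the note can be sketched with a simple mapping table. The label and theme names below are illustrative assumptions, not a fixed Dimension Labs taxonomy.

```python
# Hypothetical sketch: consolidating open-ended ("dynamic") labels into
# broader themes with a mapping table. Names are illustrative only.
theme_map = {
    "new product feedback": "product feedback",
    "order tracking inquiry": "order status",
    "shipping delay complaint": "order status",
}

dynamic_labels = ["new product feedback", "order tracking inquiry", "shipping delay complaint"]

# Unmapped labels fall back to "unmapped" so nothing is silently dropped.
themes = [theme_map.get(label, "unmapped") for label in dynamic_labels]
```

Because dynamic labels are generated from the text itself, the map grows over time as new organic labels appear.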
Why Generative AI Enrichment Works
Unstructured Data is Multi-Dimensional
Theory:
Subjectivity in analyzing text is not a problem to solve but rather an insight into the nature of the data itself. A small amount of text can be dense with information, even if we don't recognize it immediately.
Any conversation or document has dimensionality. Depending on your vantage point (your company and your role in it), you may recognize different dimensions of meaning.
Consider the following examples:
"This phone takes the best videos I’ve ever seen—seriously, they look professional! I did more than 3 hours of shooting yesterday and my external batteries kept me going strong.”
Marketing Executive
The camera is a standout feature for video enthusiasts and should be a key focus in campaigns.
Product Manager
Battery life could be optimized to better support high-performance features like HD video.
The Problem: Too Messy
A single conversation is easy for a person to interpret. Thousands are not. As volume grows, it becomes:
- Challenging to detect consistent patterns across all conversations.
- Impossible to quantify why customers engage, what they feel, and how issues are resolved.
Legacy tools like Boolean keyword searches or machine-learning models fall short on both speed and accuracy. They require a massive effort to configure and rely on brittle rules that deliver low-accuracy results.
In contrast, modern language models employ word embeddings to identify mathematical relationships between words and meaning based on context.
Too Many Cooks
Keyword searches cannot accurately segment textual information because word usage and metaphor create endless possibilities for false positives and erroneous exclusions.
A simple search for the term "cook" will return:
- a title (seeking line cook)
- a verb (love to cook)
- a noun adjunct (cook station)
- a product (cook knife)
- slang (let him cook)
- an exclusion (culinary professionals only — no amateur cooks)
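A naive substring search makes the problem concrete: every sense of "cook" listed above matches, including the explicit exclusion. The sample strings are taken from the list above.

```python
# Sketch of why naive keyword matching fails: a substring search for
# "cook" matches every sense, including an explicit exclusion.
posts = [
    "seeking line cook",                                # a title
    "love to cook",                                     # a verb
    "cook station",                                     # a noun adjunct
    "cook knife",                                       # a product
    "let him cook",                                     # slang
    "culinary professionals only - no amateur cooks",   # an exclusion
]

matches = [post for post in posts if "cook" in post.lower()]
# All six entries match, so the search cannot distinguish a job title
# from slang or from a post that explicitly excludes amateur cooks.
```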
The Solution: Adding Structure
AI enrichment uses large language models (LLMs) to read each conversation the way a person would—then labels it to add structure.
Each analytical task (prompt) extracts one aspect of the text, and the resulting labels are added as new columns (dimensions) in your dataset. Identifying and extracting multiple dimensions of meaning from text is a powerful, highly customizable analytic approach.
Dimension Examples:
Task: Determine the emotional tone of conversations
Prompt: "What is the customer's sentiment about [Nike]?"
Output Options: positive, negative, neutral
Use Case: Identify trends in emotional responses.
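One way to think about a dimension definition is as a small record holding the task, the prompt, and the allowed output options. This is an illustrative sketch; the field names are assumptions for this example, not the Dimension Labs schema.

```python
# Illustrative sketch of a dimension definition as a small record.
# Field names are assumptions, not the Dimension Labs schema.
from dataclasses import dataclass


@dataclass
class Dimension:
    task: str
    prompt: str
    # An empty options tuple means the dimension is open-ended ("dynamic").
    options: tuple = ()


sentiment = Dimension(
    task="Determine the emotional tone of conversations",
    prompt="What is the customer's sentiment about [Nike]?",
    options=("positive", "negative", "neutral"),
)

reason = Dimension(
    task="Reason Analysis",
    prompt="Detail the reason the customer engaged with support.",
)  # no options: labels are generated dynamically from the text
```

Modeling predefined and dynamic dimensions with the same structure is what lets both kinds of tasks run concurrently over the same dataset.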
The Benefit: Scaling Human Judgment
The enrichment process mirrors traditional human data coding, but at machine speed.
Human Approach
Researchers apply sentiment and topic codes by hand using detailed guides.
AI Enrichment
LLMs apply the same logic automatically across thousands of sessions.
The result: human-level comprehension at enterprise scale with consistent labeling and faster turnaround.
The Innovation: Dynamic Labeling + Transformation
Discover what you didn't know.
Traditional systems can only apply predefined options. AI enrichment can generate new labels dynamically, based on what appears in the text. Use our data maps to aggregate these dynamic labels into themes.
For example: Instead of choosing from fixed categories, the AI may surface organic themes such as “bonus code confusion” or “crypto withdrawal delay.”
Traditional Approaches
- ❌ Keyword searches miss context
- ❌ Static ML models lack flexibility
- ❌ Manual coding doesn't scale
- ❌ Predefined categories limit discovery
Dynamic Enrichment
- ✅ Understands context and meaning
- ✅ Adapts to new patterns automatically
- ✅ Processes unlimited volume instantly
- ✅ Generates dynamic insights organically
This enables genuine discovery—finding issues and trends you didn’t predefine.
Dynamic labeling and mapping are only possible through advanced generative pre-trained transformer (GPT) models that understand language contextually.
The Validation: Reliability and Precision
LLMs identify patterns: Generative AI enrichment avoids hallucination issues because it leverages the underlying pattern-recognition and reasoning skills of LLMs with minimal generative output.
When used for enrichment, LLMs deliver on:
- Consistency: Eliminates the variation seen between human coders.
- Contextual Understanding: Captures nuance and meaning, not just keywords.
- Reliability: No hallucinations in recognition-based tasks.
Validation studies confirm that hallucination rates in enrichment tasks are extremely low.
Why AI Enrichment is More Reliable Than Generation
Recognition vs. Generation
- Generation Tasks (writing, creativity): Higher risk of hallucination
- Recognition Tasks (labeling, classification): Extremely low hallucination rates
Pattern-Based Analysis Large language models excel at identifying consistent patterns in text structure, making them highly reliable for enrichment tasks.
Contextual Understanding Unlike simple keyword matching, LLMs understand:
- Conversational flow and context
- Implicit meaning and sentiment
- Industry-specific terminology
- Customer intent and emotion
A Key Element of Multi-Channel Analysis
Data Enrichment is just one part of a process.
That process starts with combining heterogeneous sources of data into one universal schema for analysis. The Dimension Labs Conversational Schema (DLCS) combines different sources of messy, textual data into a consistent format designed for unstructured analysis.
This normalization and standardization is meant specifically for analyzing conversational data and harmonizing it with other sources of unstructured text, allowing for high-quality omni-channel data enrichment.
The result: a "single source of truth" and the first step in building data pipelines that draw on the full universe of your customer interaction data. Enrichment is the second step; the third is data transformation (data mapping and knowledge graphs).
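The three-step pipeline described above (normalize, enrich, transform) can be sketched as three small functions. The function bodies are illustrative stand-ins under stated assumptions, not the DLCS implementation.

```python
# Hedged sketch of the three-step pipeline: normalize heterogeneous
# sources into one schema, enrich with labels, then map labels into
# themes. Bodies are illustrative stand-ins, not the DLCS implementation.
def normalize(raw_records):
    """Step 1: coerce mixed sources into one consistent schema."""
    return [{"session_id": r["id"], "text": r.get("body", "")} for r in raw_records]


def enrich(records):
    """Step 2: add label columns (stubbed; a real system calls an LLM)."""
    for r in records:
        r["sentiment"] = "positive" if "love" in r["text"].lower() else "neutral"
    return records


def transform(records, theme_map):
    """Step 3: map labels into consolidated themes."""
    for r in records:
        r["theme"] = theme_map.get(r["sentiment"], "other")
    return records


raw = [{"id": "112203", "body": "User: Love your new product"}]
result = transform(enrich(normalize(raw)), {"positive": "praise"})
```

Each step only adds columns, so downstream consumers can rely on every earlier field still being present.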
In Summary
- Textual data holds valuable, nuanced meaning.
- AI enrichment extracts that meaning through labeled dimensions.
- Dynamic labeling reveals new, evolving patterns in customer feedback.
- LLM precision ensures accuracy and trustworthiness at scale.
Enrichment transforms unstructured conversations into structured intelligence—allowing you to see what customers mean, not just what they say.
