
ObservabilityCON 2026: Grafana Leverages AI to Transform Observability


April 23, 2026

ObservabilityCON is the premier event for the Grafana Cloud ecosystem and modern observability. The 2026 edition marked a clear turning point: artificial intelligence is no longer just an add-on; it has become the core driver of the solution. From cost reduction and tool simplification to incident management automation, here are the key takeaways.

Adaptive Telemetry: Less Noise, More Value

Controlling observability costs

One of the major challenges facing technical teams today is the rising cost of observability. The larger the infrastructure, the more the volume of collected data—metrics, logs, and traces—skyrockets, and with it, the bill.

Grafana Cloud addresses this issue with Adaptive Telemetry, an approach that involves collecting only the data that is actually useful. Rolled out gradually since 2023, it now covers:

  • Metrics (since 2023)
  • Logs (since 2024)
  • Traces and profiles (since 2025)

The idea is simple: rather than storing everything and sorting it later, the system determines in real time whether data is worth keeping, based on relevance criteria (anomalies, delays, errors, etc.).
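The keep-or-discard decision described above can be pictured as a simple filter. The sketch below is illustrative only: the criteria names, fields, and thresholds are invented for the example and are not Grafana's actual rules.

```python
from dataclasses import dataclass

@dataclass
class Sample:
    """A unit of telemetry (metric point, log line, or span summary)."""
    latency_ms: float
    is_error: bool
    anomaly_score: float  # 0.0 = normal, 1.0 = highly anomalous

def should_keep(sample: Sample,
                latency_slo_ms: float = 500.0,
                anomaly_threshold: float = 0.8) -> bool:
    """Keep only data that carries signal: errors, SLO breaches, anomalies."""
    if sample.is_error:
        return True
    if sample.latency_ms > latency_slo_ms:
        return True
    return sample.anomaly_score >= anomaly_threshold
```

Under this model, a fast, error-free, unremarkable request is simply never stored, which is where the cost savings come from.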

In the same vein, Grafana is announcing Grafana Focus, a project developed in collaboration with the FinOps Foundation, aimed at unifying and standardizing the interpretation of observability metrics. This tool is eagerly awaited by FinOps teams and CIOs who struggle to reconcile infrastructure costs with actual usage.

Please note: There is not yet an official Terraform provider for these new tools.

Another new feature aimed at enterprise customers: Grafana Bring Your Own Cloud. Unlike the traditional SaaS model, where data passes through Grafana’s servers, this managed service allows large organizations to store their data directly in their own cloud tenant. This model meets data sovereignty requirements and the strict security policies of certain industries.

Reducing complexity: Grafana becomes more accessible

Grafana is an extremely powerful tool, but historically it has been difficult to get the hang of. As one speaker put it: "Using Grafana is like woodworking—you can do anything, but you have to start from scratch."

To address this issue, several initiatives have been proposed:

  • Ready-to-use dashboards for the most common use cases: Linux monitoring, Kubernetes, web applications... No need to build your own dashboards from scratch.
  • Database Observability: a new feature designed for in-depth database analysis. It allows you to view and analyze SQL queries, identify slow queries, and understand the impact of the database on an application’s overall performance.
  • Knowledge Graph: a horizontal platform that automatically maps services, their dependencies, and their interactions. It helps answer questions such as "Which services are affected if this database goes down?" or "What is the critical path between these two microservices?"
  • Insights: an engine that automatically detects unusual latency, configuration changes, and abnormal behavior without requiring manual setup of alert rules for each scenario.
  • Root Cause Analysis Workbench: a centralized portal that aggregates all detected issues and suggests corrective actions. The goal is to provide teams with a unified view of ongoing incidents, regardless of their source.
  • Fleet Manager & Instrumentation Hub: Using a lightweight agent, you can monitor all compute resources—including containers, EC2 instances, Lambda functions, and managed services—and centralize all data in Grafana Cloud.
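A dependency map like the one Knowledge Graph builds can be modeled as a directed graph. The minimal sketch below, with a made-up topology and service names, shows how a question such as "Which services are affected if this database goes down?" reduces to a reverse-edge traversal.

```python
from collections import deque

# Edges point from a service to what it depends on (hypothetical topology).
DEPENDS_ON = {
    "checkout": ["payments", "orders-db"],
    "payments": ["orders-db"],
    "catalog": ["catalog-db"],
}

def affected_by(failed: str, deps: dict[str, list[str]]) -> set[str]:
    """Return every service that transitively depends on `failed`."""
    # Invert the graph: for each component, who depends on it.
    dependents: dict[str, list[str]] = {}
    for svc, targets in deps.items():
        for t in targets:
            dependents.setdefault(t, []).append(svc)
    # Breadth-first walk over the inverted edges.
    seen: set[str] = set()
    queue = deque([failed])
    while queue:
        node = queue.popleft()
        for dep in dependents.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen
```

Here `affected_by("orders-db", DEPENDS_ON)` would report both `payments` (a direct consumer) and `checkout` (an indirect one), which is exactly the blast-radius answer the platform promises.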

Grafana Assistant: AI for Everyone

This is undoubtedly the most anticipated and talked-about announcement at this year’s ObservabilityCON. Grafana Assistant is an AI assistant built directly into the Grafana Cloud platform. Officially available since January 2026, it is priced at $20 per user per month.

Already widespread adoption

With 17,000 daily users, the enthusiasm is palpable. Feedback from the community has been very positive, and new use cases are emerging every day during POC phases. One particularly striking example was cited: during a POC, an incident occurred in production, and it was the assistant that identified the root cause before the teams even had time to respond.

Who is this for?

The assistant was designed to be useful for everyone, not just experts.

Innovative features

The assistant does more than just answer questions. It’s gaining new capabilities:

  • Trace analysis: In addition to metrics and logs, the assistant can now analyze traces to identify distributed performance issues.
  • SQL connection: Ability to query relational databases directly from the assistant.
  • Rule System: permanent instructions given to the AI to define its behavior, such as "always respond in French," "prioritize critical alerts," or "always follow these naming conventions." A powerful tool for standardizing practices across an organization.
  • MCP (Model Context Protocol): The assistant can connect to external tools such as GitHub or Jira, and more generally to any MCP-compatible tool. This paves the way for fully integrated incident management workflows.
  • Infrastructure Memory: a set of AI agents that continuously scan the infrastructure, store their knowledge in a database, and enable the assistant to provide much more contextualized and accurate responses. The knowledge is cached, which also improves response times.
  • Slack & Teams Integration: The assistant can interact directly within the team's communication channels. It's native ChatOps—no more switching between tools to get a diagnosis.
  • Playbooks: You’ll be able to teach the agent procedures specific to your organization. Specifically, you can describe the steps to follow for a specific type of incident, and the assistant will apply them automatically.
  • Learning from past data: The assistant can draw on past post-mortems and investigations to improve its future analyses. The more your organization uses it, the more accurate it becomes.
  • Generating Mermaid diagrams: The assistant can generate architecture or flow diagrams directly within the interface.
  • RBAC compliance: The assistant operates within the limits of each user's permissions. There is no risk that a user will access data they are not authorized to view via the AI.
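To make the Mermaid capability concrete, here is what turning a service map into a Mermaid flowchart definition might look like. This is a generic sketch of the output format, not the assistant's actual implementation, and the edge list is invented.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render service-to-service calls as a Mermaid flowchart definition."""
    lines = ["flowchart LR"]  # left-to-right layout
    for src, dst in edges:
        lines.append(f"    {src} --> {dst}")
    return "\n".join(lines)

print(to_mermaid([("frontend", "api"), ("api", "orders-db")]))
# flowchart LR
#     frontend --> api
#     api --> orders-db
```

The resulting text block can be pasted into any Mermaid-aware renderer, which is what makes diagram generation a natural fit for a chat-style assistant.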

Automated investigations: the flagship feature

The Investigations feature is one of the most impressive. When a service degradation is detected in one or more services, the user can trigger a full investigation. Within 5 to 10 minutes, the agent:

  1. Collects and correlates available signals (metrics, logs, traces)
  2. Formulates hypotheses and tests them
  3. Identifies the likely root cause
  4. Generates a structured report with a flowchart, visualizations, and recommended actions

It’s not a black box: the assistant explains its reasoning step by step, allowing teams to validate or correct its analysis.
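The four steps can be sketched as a loop that correlates signals, tests hypotheses against them, and emits a structured report. Everything below (signal shapes, hypothesis names, thresholds) is invented for illustration; the real engine is far more sophisticated.

```python
def investigate(signals: dict[str, list[float]]) -> dict:
    """Toy investigation: correlate signals, test hypotheses, report the best."""
    # 1. Collect and correlate: flag signals whose peak far exceeds their mean.
    spikes = {name: max(values) > 2 * (sum(values) / len(values))
              for name, values in signals.items()}
    # 2-3. Formulate hypotheses and test each one against the evidence.
    hypotheses = [
        ("database saturation", spikes.get("db_latency_ms", False)),
        ("bad deploy", spikes.get("error_rate", False)),
    ]
    confirmed = [name for name, holds in hypotheses if holds]
    # 4. Structured report: likely root cause, evidence, recommended action.
    return {
        "root_cause": confirmed[0] if confirmed else "unknown",
        "evidence": [name for name, spiked in spikes.items() if spiked],
        "recommendation": ("roll back" if "bad deploy" in confirmed
                           else "investigate further"),
    }
```

The point of the sketch is the shape of the output: because each hypothesis is tied to explicit evidence, a human can audit and override the conclusion, which matches the "not a black box" claim.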

Grafana IRM: Streamlining Incident Management

Incident management is often spread across multiple tools: an alerting tool, a ticketing system, a communication channel, a post-mortem tool… Grafana IRM (Incident Response & Management) aims to centralize everything into a single platform.

The complete lifecycle of an incident

Here's how incident management works with IRM:

  1. Detection: An alert is triggered based on SLOs or configured alerting rules.
  2. Notification: The appropriate personnel are automatically contacted based on on-call schedules, and escalation proceeds without manual intervention.
  3. Incident creation: An incident is created either through the mobile app or directly in Grafana. A dedicated Slack channel is automatically created.
  4. Investigation: From the incident, you can trigger a Grafana Assistant investigation. The results are sent directly to the Slack channel.
  5. Documentation: An LLM generates a structured incident report, with the option to use custom templates.
  6. 360° view: The team has a comprehensive overview of metrics and logs related to the incident, the SLOs for the affected service, historical average resolution times, service owners, and dependencies on other services.
  7. Post-mortem: The report has been generated and can be fed into Grafana Assistant to improve future investigations.

Integrations and Availability

IRM integrates bidirectionally with existing ITSM tools: ServiceNow, Jira, and others. Incidents can be created and updated on both sides.

IRM is not available as open-source software. AlertManager remains the open-source alternative for alert management.
Some features are available in the Free Tier for the first three users.

Adaptive Telemetry: Intelligence for Data Collection

Adaptive Traces: Keep Only What Matters

Traces are extremely valuable data for understanding the behavior of a distributed application. They allow you to track a request from start to finish, across all the services it passes through, and to pinpoint exactly where and why an error or slowdown occurs.

The problem? Traces are voluminous, verbose, and expensive to store. In a high-traffic system, tracing 100% of requests is neither financially viable nor technically necessary.

Adaptive Traces solves this dilemma with a smart approach:

  • The system waits until the trace is complete before evaluating it
  • A set of criteria is then applied: Does the trace show an anomaly? Abnormal latency? An error?
  • If so, the trace is retained; otherwise, it is discarded
  • AI continuously refines the selection criteria using predictive telemetry rules

The result: less storage, lower costs, but a cleaner and more actionable signal.
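This "wait for the full trace, then decide" pattern is known as tail sampling, and its core decision can be sketched as follows. The span fields and the latency budget are illustrative assumptions, not Adaptive Traces internals.

```python
from dataclasses import dataclass, field

@dataclass
class Span:
    name: str
    duration_ms: float
    error: bool = False

@dataclass
class Trace:
    spans: list[Span] = field(default_factory=list)

def keep_trace(trace: Trace, latency_budget_ms: float = 300.0) -> bool:
    """Decide only once the trace is complete: retain errors and slow requests.

    Deciding per-span (head sampling) could discard the early spans of a
    request that only turns slow or erroneous later; tail sampling avoids that.
    """
    if any(span.error for span in trace.spans):
        return True
    total = sum(span.duration_ms for span in trace.spans)
    return total > latency_budget_ms
```

The design trade-off is buffering: complete traces must be held in memory until the verdict, which is the price paid for never losing the interesting ones.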

Profiles: Dive into the Code

Profiles (via Grafana Pyroscope) are a complementary dimension that is often underutilized. While traces show where a problem occurs in the architecture, profiles reveal why it occurs in the code—which function is consuming too much CPU, which memory allocation is excessive, and which call is causing a bottleneck.

Grafana now offers intelligent profile sampling: profiles are collected intensively only when anomalies are detected, allowing you to focus analysis where it’s truly needed—especially during releases or periods of high traffic.
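One way to implement "profile intensively only under anomaly" is to vary the collection interval and decay back to the baseline after a cool-down. This is a generic sketch of the idea, not Pyroscope's actual mechanism; all intervals and the cool-down length are invented.

```python
class AdaptiveProfiler:
    """Raise the profiling rate under anomaly, decay back after a cool-down."""

    def __init__(self, baseline_s: float = 60.0, intensive_s: float = 5.0,
                 cooldown_cycles: int = 3):
        self.baseline_s = baseline_s        # relaxed interval in normal operation
        self.intensive_s = intensive_s      # tight interval while investigating
        self.cooldown_cycles = cooldown_cycles
        self._remaining = 0                 # intensive cycles left

    def next_interval(self, anomaly_detected: bool) -> float:
        """Return how long to wait before the next profile capture."""
        if anomaly_detected:
            self._remaining = self.cooldown_cycles
        if self._remaining > 0:
            self._remaining -= 1
            return self.intensive_s
        return self.baseline_s
```

The cool-down matters: keeping the intensive rate for a few cycles after the last anomaly captures the recovery phase, which is often as informative as the degradation itself.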

OpenTelemetry on Grafana Cloud: Simplifying Deployment

OpenTelemetry has become the de facto standard for instrumenting modern applications. It provides a unified framework for collecting metrics, logs, and traces, regardless of language or platform. But implementing it remains complex: there are many components to assemble, environment-specific configurations, and pipelines to build…

As one speaker put it: "It's like having a huge box of LEGO in front of you, without the instructions."

Grafana now offers a streamlined process for deploying OpenTelemetry directly from the interface, using a layered approach:

1. The infrastructure layer

Grafana Alloy (Grafana's telemetry collector) unifies the collection of logs and metrics by enriching the data with environment metadata (cluster, namespace, pod, region, etc.). It serves as a unified pipeline for Prometheus and Loki, greatly simplifying configuration.

2. The services layer

This provides a clear overview of the health of each service and the system topology—how services communicate with one another, what their dependencies are, and what the data flows are.

3. The application layer

Instrumentation libraries are available for most common programming languages. The level of maturity varies by language; for example, REST protocols remain more difficult to instrument than gRPC calls.

What to watch out for

  • GrafanaCON Open Source will be held this year in Barcelona, with a focus on the open-source community and contributions to the ecosystem.
  • Grafana Tempo, Grafana’s distributed tracing engine, deserves special attention in the coming months, particularly given the developments surrounding Adaptive Traces.
  • The question of open-source alternatives to Grafana Assistant remains unanswered; no equivalent tool was officially mentioned during the event.
  • Teams using the Enterprise version can request the integration of features that were initially available only in the cloud, including the AI Assistant. Deployment remains straightforward: a license in the form of a hash is all that’s needed, even in a client environment.
  • Following the discontinuation of MinIO, Grafana has not taken a position on a potential technology shift. We will need to monitor official channels in the coming weeks.

In summary

ObservabilityCON 2026 confirms that Grafana is no longer positioning itself merely as a visualization tool, but as an intelligent, integrated observability platform. AI, via Grafana Assistant, is now at the heart of the product and accessible to everyone; it is capable of analyzing all signals and is becoming increasingly autonomous in resolving incidents.

For organizations looking to modernize their approach to observability, the signs are clear: the future lies in adaptive telemetry, automatic signal correlation, and assistants capable not only of diagnosing issues but also of taking action.


Antoine ORRU
DevOps Consultant
