
Open-source alternatives to Datadog in 2026 — Grafana and Prometheus for observability you can predict
TL;DR
Datadog meters infrastructure monitoring at $15 per host per month and APM at $31 per host, so the invoice climbs as you add hosts.
The bill is also hard to predict: after discarding the top 1% of hours, Datadog bills the whole month at the highest remaining hourly host count, so a five-day spike sets the price for all 30 days.
Grafana, Prometheus, and Alertmanager are the standard open-source monitoring stack — together about 147,000 GitHub stars — covering metrics, dashboards, and alerting.
On DANIAN that trio runs for €9 per app, per month — €27 flat, no per-host meter, in any of 21 regions.
Honest scope: this stack does metrics, dashboards, and alerts. It does not do APM tracing or log aggregation. Those are separate problems, covered below.
Why teams are leaving Datadog in 2026
Datadog is a capable platform. The reason teams leave is rarely the product. It is the invoice. Infrastructure monitoring starts at $15 per host per month, APM adds $31 per host, and custom metrics and log ingestion are metered on top. The total moves every time your infrastructure does, and it is hard to forecast.
The billing model is what catches people out. Datadog meters your host count every hour. It then discards the top 1% of hours — about 7 hours in a 720-hour month — and bills the whole month at the next-highest hour. A short traffic spike that doubles your hosts sets the price for all 30 days, even after you scale back down. In Kubernetes the billing unit is the node, and an agent misconfigured as a sidecar can count every pod as a separate host.
A worked example makes it concrete. Say you normally run 40 hosts. A product launch autoscales you to 110 hosts for three days, then you scale back. Three days is 72 hours, far more than the roughly 7 hours that get discarded, so the high-water mark survives and the whole month bills at 110 hosts, not 40. At $15 per host, that is the difference between $600 and $1,650 for infrastructure alone, for a spike that lasted three days. Custom metrics add another lever: each host includes a fixed allotment, and going over bills per hundred custom metrics. Teams that tag heavily often discover the overage only when the invoice lands.
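The billing rule and the worked example can be sketched in a few lines of Python. This is a simplified model, not Datadog's billing code: the function name, the rounding of the top 1% down to whole hours, and the synthetic months are assumptions made for illustration. The $15 price is the annual per-host figure quoted above.

```python
INFRA_PRICE_PER_HOST = 15  # USD per host per month, annual billing (from the article)


def billable_hosts(hourly_counts):
    """Drop the top 1% of hourly readings and return the highest one left.

    Rounding the 1% down to whole hours is an assumption of this sketch.
    """
    ordered = sorted(hourly_counts)
    dropped = len(ordered) // 100  # 7 hours in a 720-hour month
    kept = ordered[: len(ordered) - dropped]
    return kept[-1]


# A 720-hour month: 40 hosts normally, 110 hosts for a three-day (72-hour) launch.
launch_month = [40] * (720 - 72) + [110] * 72
# The same month with a spike short enough to fall inside the discarded hours.
brief_month = [40] * (720 - 5) + [110] * 5

for label, month in [("three-day spike", launch_month), ("five-hour spike", brief_month)]:
    hosts = billable_hosts(month)
    print(f"{label}: billed at {hosts} hosts = ${hosts * INFRA_PRICE_PER_HOST:,}")
```

Under these assumptions only a spike shorter than about seven hours escapes the bill; anything longer sets the month's price.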
The layering compounds the surprise. A team running 10 hosts pays $150 a month for infrastructure on annual billing, or $180 month-to-month. Turn on APM and that becomes $460 a month. Add log ingestion at $0.10 per gigabyte and custom-metric overages, and the number keeps climbing. Each APM host includes one million indexed spans; past that, indexed spans cost $1.70 per million. None of these meters is visible until the invoice arrives.
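To see how the meters stack, here is a rough estimator built only from the rates quoted in this article. The function name and the sample usage figures are hypothetical, and custom-metric overages are left out because no rate for them is given here. Treat it as a sketch for reasoning about the layers, not a reproduction of a real invoice.

```python
# Rates quoted in the article (USD, annual billing).
INFRA_PER_HOST = 15.00
APM_PER_HOST = 31.00
LOG_INGEST_PER_GB = 0.10
SPANS_INCLUDED_PER_APM_HOST = 1_000_000
EXTRA_SPANS_PER_MILLION = 1.70


def estimate(hosts, apm=False, log_gb=0.0, indexed_spans=0):
    """Return a per-meter monthly breakdown in USD (custom metrics excluded)."""
    lines = {"infrastructure": hosts * INFRA_PER_HOST}
    if apm:
        lines["apm"] = hosts * APM_PER_HOST
        included = hosts * SPANS_INCLUDED_PER_APM_HOST
        extra = max(0, indexed_spans - included)
        lines["extra_spans"] = extra / 1_000_000 * EXTRA_SPANS_PER_MILLION
    lines["logs"] = log_gb * LOG_INGEST_PER_GB
    lines["total"] = round(sum(lines.values()), 2)
    return lines


print(estimate(hosts=10))                       # infrastructure only
print(estimate(hosts=10, apm=True))             # infrastructure plus APM
# Hypothetical usage: 500 GB of logs and 25 million indexed spans in the month.
print(estimate(hosts=10, apm=True, log_gb=500, indexed_spans=25_000_000))
```

Each argument you switch on adds a separate line to the total, which is the point: the host count is only the first of several meters.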
At mid-market scale the figures get serious. A 50-host deployment with APM and 100 GB of logs a day runs roughly $2,570 to $3,070 a month, before custom-metric overages. The pattern that pushes teams to look elsewhere is consistent: the bill is large, and worse, it is unpredictable from one month to the next.
None of this means Datadog is the wrong tool for everyone. It puts metrics, traces, logs, real-user monitoring, and security in one pane. Its integration catalogue is large and its onboarding is quick. If you need all of that managed under one roof today, Datadog is a reasonable choice — you are paying for breadth and for someone else running it. The real question is narrower. Do you need the whole platform, or are you paying platform prices for the infrastructure-monitoring slice you actually use?
What "alternative to Datadog" actually means
"Alternative" splits three ways, and each trades cost against effort differently. You can move to a cheaper usage-based SaaS and keep paying per gigabyte or per host. You can self-host the open-source stack and run it yourself. Or you can run the same open-source stack with someone else operating it.
A cheaper usage-based SaaS. Several monitoring vendors undercut Datadog on price and bill on data volume rather than per host. That can lower the number. It keeps the meter, though, and with it the same month-to-month uncertainty. You are swapping one usage bill for a smaller usage bill, not removing the variable.
Self-host the open-source stack. The software is free, and standing it up takes 30 to 90 minutes. Keeping it healthy is the ongoing part. You manage storage growth as metrics accumulate, tune retention and cardinality so the database stays fast, run version upgrades, and carry the on-call when the monitoring itself breaks. High cardinality (too many unique label combinations) is the failure mode that quietly fills disks and slows queries, and managing it is a skill. Running the stack with high availability means duplicate Prometheus servers plus a layer such as Thanos to deduplicate and query across them. For a team with the time and the appetite, this is a genuinely good path. For a team that would rather not, it is a second system to operate.
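Cardinality is easier to respect once you see the arithmetic. In the worst case, one metric produces as many time series as the product of the distinct values of its labels. The metric and the label counts below are hypothetical, chosen only to show how a single unbounded label changes the total.

```python
from math import prod


def series_count(label_values):
    """Worst-case number of time series for one metric: the product of the
    number of distinct values each label can take."""
    return prod(label_values.values())


# Hypothetical labels on a hypothetical http_requests_total metric.
modest = {"method": 5, "status": 8, "endpoint": 50}
risky = {**modest, "user_id": 10_000}  # an unbounded label added later

print(f"modest labels: {series_count(modest):,} series")
print(f"with user_id:  {series_count(risky):,} series")
```

Real series counts are usually lower than the worst case because not every combination occurs, but the multiplication is why one careless label can fill a disk.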
Managed open source. You run the same Grafana, Prometheus, and Alertmanager, and someone else patches, backs up, and monitors them. On DANIAN that is €9 per app per month. The trade is that you do not get root SSH on the underlying box — you get the running apps, your dashboards, your data, and a human on chat when something needs attention.
One thing to settle before going further. The Grafana–Prometheus–Alertmanager stack covers metrics, dashboards, and alerting. That is the infrastructure-monitoring slice of what Datadog sells. It does not do application performance monitoring — the distributed tracing of a request as it moves through your services. It does not aggregate logs. For centralized logs the standard open-source companion is a log backend such as Grafana Loki, which is on our roadmap rather than in the catalogue today. If APM and log search under one managed pane are non-negotiable for you right now, weigh that as you read on.
The shortlist
Three projects do the work, and they are designed to fit together. Prometheus collects and stores the metrics. Grafana turns them into dashboards and alerts you can read. Alertmanager decides who gets paged and how. All three are open source, battle-tested, and run in production from home labs to Fortune 500 fleets.
Prometheus — the metrics engine
Prometheus is the de-facto standard for metrics in cloud-native systems. It was built at SoundCloud in 2012 and was the second project to graduate from the Cloud Native Computing Foundation, after Kubernetes. It scrapes metrics from your services on a schedule, stores them in a time-series database, and evaluates alert rules written in PromQL, its query language. It is Apache-2.0 licensed, with around 64,000 GitHub stars.
The data model is the reason it spread. Prometheus pulls metrics from HTTP endpoints on a configurable interval, commonly every 15 to 30 seconds, across node hardware, container platforms, and your own applications. A large ecosystem of exporters covers common infrastructure out of the box (node_exporter for servers, cAdvisor for containers, blackbox_exporter for probing endpoints), and in dynamic environments Prometheus discovers new targets automatically rather than waiting for a config edit. PromQL lets you turn raw counters into the numbers you care about: request rates, error percentages, and 95th- or 99th-percentile latency. Recording rules pre-compute the heavy queries so dashboards stay fast. We run managed Prometheus hosting for €9 a month. One instance scrapes many targets, so the price does not move as you add hosts. Best for: collecting and storing the metrics everything else reads.
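For a feel of what those PromQL expressions look like, the sketch below lists one query for each of the three numbers named above and builds a URL for Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The metric names follow common naming conventions but are assumptions, as is the placeholder server address; your own exporters and applications may name things differently.

```python
from urllib.parse import urlencode

# Hypothetical metric names following common Prometheus naming conventions.
QUERIES = {
    # Requests per second, averaged over the last five minutes.
    "request_rate": "sum(rate(http_requests_total[5m]))",
    # Share of requests that returned a 5xx status, as a percentage.
    "error_percent": (
        '100 * sum(rate(http_requests_total{status=~"5.."}[5m]))'
        " / sum(rate(http_requests_total[5m]))"
    ),
    # 95th-percentile latency computed from a histogram's buckets.
    "p95_latency_seconds": (
        "histogram_quantile(0.95, "
        "sum by (le) (rate(http_request_duration_seconds_bucket[5m])))"
    ),
}


def instant_query_url(base_url, expr):
    """Build a URL for Prometheus's instant-query HTTP endpoint."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


# The base URL is a placeholder; substitute your own Prometheus address.
for name, expr in QUERIES.items():
    print(name, "->", instant_query_url("http://prometheus.example:9090", expr))
```

The same expressions can be pasted into a Grafana panel or used as the condition of an alert rule.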
An honest limit. Prometheus's local storage is built for recent data, not years of history at very large scale. Long retention or billions of active series is where teams reach for Thanos or Mimir, which is a bigger undertaking. For SMB and mid-size fleets, a single instance handles the load with room to spare.
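The Prometheus storage documentation gives a rough sizing rule: disk space is approximately retention time in seconds, times ingested samples per second, times bytes per sample, with samples averaging about 1 to 2 bytes after compression. The sketch below applies that rule. The fleet size, series per host, and scrape interval are assumptions for illustration, so read the output as an order of magnitude rather than a quote.

```python
BYTES_PER_SAMPLE = 2  # upper end of the 1-2 bytes per sample cited in the Prometheus docs


def disk_bytes(active_series, scrape_interval_s, retention_days):
    """Rough on-disk size: retention seconds * samples per second * bytes per sample."""
    samples_per_second = active_series / scrape_interval_s
    retention_seconds = retention_days * 24 * 60 * 60
    return retention_seconds * samples_per_second * BYTES_PER_SAMPLE


# Assumed fleet: 50 hosts exposing about 1,000 series each, scraped every 15 seconds.
series = 50 * 1_000
for days in (15, 90, 365):
    gb = disk_bytes(series, 15, days) / 1e9
    print(f"{days:>3} days retention: ~{gb:.1f} GB")
```

Storage grows linearly with both retention and series count, which is why long history on a large fleet is the point where Thanos or Mimir enters the conversation.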
Grafana — the dashboards
Grafana is the open-source standard for dashboards. It does not store metrics itself. It queries data sources — Prometheus, and more than 140 others — and renders panels, dashboards, and alerts. It is what most teams picture when they think of the monitoring screen. It is AGPL-3.0 licensed, with around 74,000 GitHub stars.
What makes it stick is flexibility. Template variables let one dashboard serve every host or service without duplication. Dashboards can be defined as code and version-controlled, so a monitoring view is reproducible rather than hand-built. Its unified alerting can raise alerts directly from a panel and route them, which matters for the next section. And because it speaks to more than 140 data sources, the same Grafana that reads Prometheus can also read a SQL database or a cloud metrics API, so one screen covers more than one system. We run managed Grafana hosting for €9 a month. Best for: the single visual pane your team actually looks at, plus dashboard-level alerts.
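Dashboards as code can be as simple as a script that emits dashboard JSON. The sketch below builds a minimal dashboard with one template variable and one panel. It shows only a small subset of the dashboard JSON model, and exact fields can differ between Grafana versions, so treat the structure as an assumption to check against your own exported dashboards. The query uses `node_cpu_seconds_total`, a metric exposed by node_exporter.

```python
import json


def build_dashboard(title, metric_expr):
    """Return a minimal Grafana-style dashboard with one template variable
    and one panel. Only a small subset of the dashboard JSON is shown."""
    return {
        "title": title,
        "templating": {
            "list": [
                {
                    "name": "instance",
                    "type": "query",
                    # label_values() is provided by Grafana's Prometheus data source.
                    "query": "label_values(up, instance)",
                }
            ]
        },
        "panels": [
            {
                "type": "timeseries",
                "title": "CPU busy (%) on $instance",
                "targets": [{"expr": metric_expr}],
            }
        ],
    }


# $instance is substituted by Grafana with the value picked in the dashboard.
expr = '100 * (1 - avg(rate(node_cpu_seconds_total{mode="idle", instance="$instance"}[5m])))'
dashboard = build_dashboard("Host overview", expr)
print(json.dumps(dashboard, indent=2))
```

Because the output is plain JSON, it can live in version control and be reviewed like any other change.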
Prometheus Alertmanager — the routing layer
Alertmanager takes the alerts Prometheus fires and decides what happens next. It groups related alerts so one incident does not page you fifty times, removes duplicates, silences noise during planned maintenance, and routes notifications to Slack, PagerDuty, email, and other channels. Without it, alerts are raw signals. With it, the right person gets the right page. It is Apache-2.0 licensed, with around 8,500 GitHub stars.
The routing tree is the useful part. You can send database alerts to one team and network alerts to another, escalate by severity, and use inhibition rules to suppress downstream alerts when an upstream cause already fired — so a single network outage does not bury you in a hundred pages. Receivers connect to Slack, PagerDuty, email, webhooks, and more, and Alertmanager can run in a clustered mode so the routing itself does not become a single point of failure. We run Alertmanager for alert routing at €9 a month. Best for: turning raw alerts into a sane on-call experience.
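A toy model makes routing, grouping, and inhibition easier to picture. The code below is not how Alertmanager is implemented or configured; it is a simplified Python simulation of the three ideas. Alert names, labels, teams, and channel names are all made up for the example.

```python
from collections import defaultdict

# Hypothetical alerts; label names and values are made up for this sketch.
alerts = [
    {"alertname": "SwitchDown", "team": "network", "severity": "critical",
     "site": "fra1", "instance": "sw-01"},
    {"alertname": "HostUnreachable", "team": "network", "severity": "warning",
     "site": "fra1", "instance": "web-01"},
    {"alertname": "HostUnreachable", "team": "network", "severity": "warning",
     "site": "fra1", "instance": "web-02"},
    {"alertname": "ReplicationLag", "team": "database", "severity": "warning",
     "site": "ams2", "instance": "db-01"},
    {"alertname": "ReplicationLag", "team": "database", "severity": "warning",
     "site": "ams2", "instance": "db-02"},
]

ROUTES = {"network": "#net-oncall", "database": "#db-oncall"}  # team -> receiver
DEFAULT_RECEIVER = "#ops"


def inhibited(alert, firing):
    """Suppress warnings at a site where a critical SwitchDown is already firing."""
    return alert["severity"] == "warning" and any(
        a["alertname"] == "SwitchDown" and a["site"] == alert["site"] for a in firing
    )


def notifications(firing):
    """Group surviving alerts by (receiver, alertname, site): one message per group."""
    groups = defaultdict(int)
    for alert in firing:
        if inhibited(alert, firing):
            continue
        receiver = ROUTES.get(alert["team"], DEFAULT_RECEIVER)
        groups[(receiver, alert["alertname"], alert["site"])] += 1
    return dict(groups)


for (receiver, name, site), count in notifications(alerts).items():
    print(f"{receiver}: {name} at {site} ({count} alert(s))")
```

Five raw alerts become two notifications: the switch outage reaches the network channel once, its downstream host warnings are suppressed, and the two replication alerts arrive as a single grouped message for the database team.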
An honest note on scope. Grafana can send alerts on its own. If your routing needs are simple, Grafana plus Prometheus is enough, and you can add Alertmanager later, when grouping, silencing, and multi-channel routing start to matter. Many teams begin with two apps and grow into the third.
How the three fit together
The stack is a short pipeline, and seeing it end to end makes the per-app pricing obvious. Each piece has one clear job, the data flows in one direction, and the components are loosely coupled. That means you can understand it, debug it, and reason about cost without untangling a single monolith. Here is the flow.
Exporters and your applications expose metrics on an HTTP endpoint.
Prometheus scrapes those endpoints on a schedule and stores the series in its database.
Prometheus evaluates PromQL alert rules against the stored metrics and fires when a threshold breaks.
Alertmanager receives those alerts, groups and deduplicates them, and routes the notification to Slack, email, or PagerDuty.
Grafana queries Prometheus and draws the dashboards your team watches day to day.
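The first step of that flow, exposing metrics on an HTTP endpoint, is simpler than it sounds. Below is a standard-library-only sketch that renders counters in the Prometheus text exposition format and serves them at `/metrics`. The metric name, the counter values, and the port are hypothetical. In practice most services would use an official client library such as `prometheus_client` instead of rendering the format by hand.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical application counters; a real service would update these as it runs.
REQUESTS = {("GET", "200"): 1027, ("GET", "500"): 3, ("POST", "200"): 214}


def render_metrics():
    """Render the counters in the Prometheus text exposition format."""
    lines = [
        "# HELP app_http_requests_total Total HTTP requests handled.",
        "# TYPE app_http_requests_total counter",
    ]
    for (method, status), value in sorted(REQUESTS.items()):
        lines.append(
            f'app_http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only /metrics is served; everything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Serve /metrics until interrupted; the port is an arbitrary choice."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


# Print what a scrape would return; call serve() to expose it over HTTP.
print(render_metrics(), end="")
```

Once Prometheus is pointed at that endpoint as a scrape target, the remaining steps happen inside Prometheus, Alertmanager, and Grafana with no further code in your application.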
One Prometheus instance can scrape dozens or hundreds of targets. That is why monitoring 5 hosts and monitoring 50 hosts cost the same on a flat per-app plan, while a per-host meter charges you for each one. The separation of concerns is also why the stack ages well: each component is independently maintained and replaceable, so you are not locked into one vendor's release cycle.
Datadog and the managed stack, side by side
The clean comparison is between Datadog's infrastructure-monitoring tier and the open-source trio, because that is the slice the trio replaces. The first table below lists the three components, what each one does, and its flat per-app price. The second puts Datadog's per-host meter next to that flat price, with APM shown separately, since the open-source stack does not cover it.
| Component | Its job | DANIAN price | GitHub stars | License |
|---|---|---|---|---|
| Prometheus | Metrics collection and storage (PromQL, time-series database) | €9 / month | ~64,000 | Apache-2.0 |
| Grafana | Dashboards, visualization, and panel-level alerts | €9 / month | ~74,000 | AGPL-3.0 |
| Alertmanager | Alert routing, grouping, deduplication, silencing | €9 / month | ~8,500 | Apache-2.0 |
| Stack total | Metrics, dashboards, and alerts — fully managed | €27 / month flat | ~147,000 | Open source |
The second table shows where the money goes as you grow. Datadog's infrastructure column is the like-for-like number against the trio. The infrastructure-plus-APM column is there for context, because it is where Datadog's total lands once you add tracing — a capability the open-source stack does not replace.
| Hosts monitored | Datadog infrastructure (Pro, annual) | Datadog infrastructure + APM | DANIAN stack (flat) |
|---|---|---|---|
| 5 hosts | $75 / month | $230 / month | €27 / month |
| 10 hosts | $150 / month | $460 / month | €27 / month |
| 25 hosts | $375 / month | $1,150 / month | €27 / month |
Read the infrastructure column as the honest comparison: €27 flat against $150 a month at 10 hosts, for metrics, dashboards, and alerting. The infrastructure-plus-APM column shows Datadog's number with tracing added, but the trio does not replace APM, so treat that column as context rather than like-for-like. Prices are from Datadog's pricing page on annual billing; month-to-month infrastructure is $18 per host.
The DANIAN figure assumes the three base instances. A Prometheus instance holding long retention or very high metric volume may need a resource upgrade, which is a flat add-on. We do not upgrade your resources or bill you for it without your explicit consent — never a silent per-host charge.
What you give up by leaving Datadog
An honest comparison names the trade, not just the saving. Datadog is one platform doing many jobs, and the open-source trio is three tools doing one slice of them well. Moving costs you real things, and they matter for some teams more than the bill does.
You give up the single pane. Datadog correlates metrics, traces, and logs in one place, so a latency spike can link straight to the slow query and the error log behind it. With the trio, metrics live in Prometheus and Grafana; tracing and logs are separate systems you would add and stitch together yourself. You also give up application performance monitoring out of the box — following one request across services to find the slow hop. The trio does not do that.
You give up the breadth around the edges too: real-user monitoring, synthetic checks, security signals, database monitoring, and a very large catalogue of one-click integrations. Datadog's onboarding is fast precisely because so much is pre-built. If your team is small and would spend more engineering time wiring open-source pieces than the Datadog bill costs, staying is the rational call. The open-source stack wins when your need is infrastructure monitoring, your pain is the meter, and you would rather run a focused stack than rent a platform you use a fraction of.
How to pick — three questions
The right answer depends on what you need monitored and who is going to run it. These three questions usually settle it, and they matter more than any feature checklist. Answer them honestly before you compare a single price, because the wrong tool at any price is the expensive one.
Do you need APM and log search, or metrics, dashboards, and alerts? If you need full tracing and centralized logs under one managed roof today, a platform like Datadog, or a cheaper usage-based SaaS, fits better for now. If you need solid infrastructure monitoring, the open-source trio covers it.
Is your problem the size of the bill, or its unpredictability? Flat per-app pricing fixes the unpredictability outright. A spike in your traffic does not change your monitoring bill, because the price is not tied to your host count.
Do you have someone to run the stack, or do you want it run for you? Self-hosting is genuinely fine if you have the time and the on-call appetite. If you would rather not patch, back up, and babysit three services, that is what managed hosting is for.
FAQ
What is the best open-source alternative to Datadog?
There is no single tool. The open-source answer is a stack: Prometheus for metrics, Grafana for dashboards, and Alertmanager for alert routing. Together they cover the infrastructure-monitoring slice of Datadog — metrics, dashboards, and alerts. They do not replace Datadog's APM tracing or log management, which are separate tools.
Can Prometheus and Grafana fully replace Datadog?
For infrastructure monitoring, yes: metrics, dashboards, and alerting are covered. For the rest of Datadog, no. The trio does not do APM tracing, log aggregation, real-user monitoring, or synthetic checks. If you only use Datadog's host-monitoring tier, the swap is clean. If you lean on the wider platform, you would add separate tools.
What's the difference between Prometheus and Grafana?
They do different jobs and are used together. Prometheus collects and stores your metrics in a time-series database and evaluates alert rules. Grafana does not store metrics. It queries Prometheus and renders the dashboards you look at. Prometheus is the engine; Grafana is the screen. Most teams run both.
Are Prometheus and Grafana really free?
Yes. Both are open source — Prometheus under Apache-2.0, Grafana under AGPL-3.0 — and free to download and run. What costs money is operating them: the server, the storage, patching, backups, and on-call. On DANIAN you pay €9 per app per month for the operation, not for the software itself.
Is Grafana free for commercial use?
Yes. Grafana is open source under the AGPL-3.0 licence and free to run for commercial purposes. The licence carries obligations if you modify Grafana and offer the modified version to others as a network service, but running it for your own monitoring is unaffected. Prometheus and Alertmanager are Apache-2.0, which is more permissive still.
Do I need Grafana if I already use Prometheus?
Prometheus has a built-in expression browser for ad-hoc queries, but it is not a dashboarding tool. For readable dashboards, shared views, and panel-level alerts, Grafana is the standard companion. You can run Prometheus alone for collection and alert rules, but most teams add Grafana so the metrics are actually legible.
Do I need all three apps?
Not always. Grafana plus Prometheus is the core: collection and dashboards, with Grafana handling simple alerts. Add Alertmanager when you need grouping, silencing, deduplication, and multi-channel routing. Many teams start with two apps and add the third once on-call gets busy.
What is PromQL?
PromQL is Prometheus's query language. It turns raw metrics into the numbers you act on: request rates, error percentages, and 95th- or 99th-percentile latency. It powers both Grafana dashboards and Prometheus alert rules. It takes a little learning, but recording rules let you pre-compute the heavy queries so dashboards stay fast.
What can I monitor with Prometheus and Grafana?
Servers, containers, Kubernetes clusters, databases, message queues, and your own applications. A large ecosystem of exporters exposes metrics from common infrastructure, and your code can expose its own. Typical signals include CPU, memory, disk, request rate, error rate, and latency — anything you can express as a time series.
Does this work with Kubernetes?
Yes. Prometheus is the de-facto metrics standard for Kubernetes and was the second project to graduate from the Cloud Native Computing Foundation, after Kubernetes itself. It discovers and scrapes nodes, pods, and services automatically as they come and go, so you are not editing config every time the cluster scales.
Can I send alerts to Slack, email, or PagerDuty?
Yes. Alertmanager routes alerts to Slack, email, PagerDuty, webhooks, and other channels, and it groups, deduplicates, and silences them so one incident does not page you fifty times. Grafana can also send alerts directly from a dashboard panel if your routing needs are simple.
What does this stack not do?
It does not do application performance monitoring — the distributed tracing of requests through your services — and it does not aggregate logs. It covers metrics, dashboards, and alerting. For centralized logs you would add a separate log backend, which is on our roadmap rather than in the catalogue today.
What about application performance monitoring (APM)?
The trio does not do APM. APM follows a single request as it moves across your services to find the slow hop, and that is a separate problem. The open-source path is OpenTelemetry instrumentation feeding a tracing backend such as Jaeger or Grafana Tempo — distinct tools, not part of this three-app stack.
Does this stack do uptime or status-page checks?
Not directly. As set up here, the stack watches systems you run rather than checking public endpoints from the outside, and it does not publish a status page. For external uptime checks and a public status page, Uptime Kuma is the open-source tool, and we offer managed Uptime Kuma hosting too. It pairs well with this stack.
What about long-term retention or very large scale?
Prometheus's local storage is built for recent data, not years of history or billions of series. At that scale, teams add Thanos or Mimir, which is a larger project. For SMB and mid-size fleets, a single managed Prometheus instance handles the load comfortably.
Is Grafana plus Prometheus hard to set up?
Standing the stack up takes 30 to 90 minutes. Keeping it healthy is the ongoing part: storage growth, retention and cardinality tuning, version upgrades, and on-call when the monitoring itself breaks. If you want the tooling without that operational load, the managed version removes it for €9 per app per month.
How is managed hosting different from running it myself?
The software is identical. The difference is who carries the work. Self-hosted, you patch, back up, monitor, and answer the 2am page. Managed, we run the server and you use Grafana and your data, with help on chat when you need it. You trade root access on the box for not operating it.
How long does it take to migrate from Datadog?
Less than most people expect. You deploy Prometheus and point exporters at your hosts, rebuild your key dashboards in Grafana, and wire your alerts. The effort is driven mostly by how many dashboards and alert rules you need to recreate. A focused team moves the core in days, not months.
How does €9 per app compare to Datadog at 10 hosts?
Datadog infrastructure monitoring at 10 hosts is $150 a month on annual billing, before APM, logs, or custom-metric overages. The Grafana–Prometheus–Alertmanager stack on DANIAN is €27 a month, flat, however many hosts Prometheus scrapes from a single instance.
Does the price stay flat as I add more servers?
Yes. One Prometheus instance scrapes many targets, so monitoring 5 servers and monitoring 50 cost the same on a flat per-app plan. The stack stays €27 a month regardless of host count. Very high retention or metric volume may need a resource upgrade, which is a flat add-on we apply only with your consent.
Can I take my dashboards and data with me if I leave?
Yes. Grafana dashboards export as JSON. Your Prometheus data and alert rules are yours to download. The software is open source and the configuration is portable, so there is nothing to leave locked behind. The apps are yours; we operate them for you.
Do I need a credit card to start?
No. The trial is 7 days, with no card required. Deploy Prometheus, Grafana, and Alertmanager, point them at something real, and see whether the dashboards and alerts fit how your team works. Billing starts only if you keep the apps past the trial.
What to do this week
If the Datadog invoice is the thing keeping you up, the move is small. Deploy managed Prometheus and point it at your hosts and services. Add managed Grafana and build one dashboard for the metrics you check most. Wire Alertmanager — or Grafana's own alerts — to your Slack or email so a real problem reaches you. You will have metrics, dashboards, and alerts running for €27 a month, flat, with us patching and backing up the stack.
You keep the standard open-source tooling the whole industry uses. You drop the per-host meter and the surprise invoice. And if you outgrow the base instances, the price moves in flat, consented steps — not with every host you add.
Sources: Datadog pricing (datadoghq.com/pricing); Grafana (grafana.com); Prometheus (prometheus.io); Grafana on GitHub (github.com/grafana/grafana); Alertmanager documentation (prometheus.io/docs/alerting). Prices verified June 2026; observability pricing changes — check the source pages before quoting.
