
Application Insights with OpenTelemetry: Distributed Tracing and Adaptive Sampling for .NET

The classic Application Insights SDK is in maintenance mode. New feature work in Azure Monitor goes to the OpenTelemetry-based distro, and if you are still running Microsoft.ApplicationInsights.AspNetCore you are accumulating a migration you will eventually be forced to do under worse conditions. The good news is that the new path is not a rewrite — it is a different package, a different startup call, and a model where your telemetry is System.Diagnostics.Activity objects exported to the same backend you already query with KQL. This is a working tour of that migration, the sampling behavior that actually controls your bill, and the correlation fields that make the application map and transaction diagnostics work.

1. Migrating from the classic SDK to the Azure Monitor OTel distro

The classic SDK exposed TelemetryClient, TelemetryConfiguration, and ITelemetryInitializer. The distro replaces those with the OpenTelemetry APIs (ActivitySource, Meter, ILogger) and a single registration call that wires the TracerProvider, MeterProvider, and log exporter to Azure Monitor in one shot.

Remove the old package and add the distro:

dotnet remove package Microsoft.ApplicationInsights.AspNetCore
dotnet add package Azure.Monitor.OpenTelemetry.AspNetCore

Azure.Monitor.OpenTelemetry.AspNetCore is the distro. It transitively pulls the OpenTelemetry SDK plus the ASP.NET Core, HttpClient, and SQL client instrumentation libraries, so a default web app gets request, dependency, and exception telemetry with one call. (For non-web workloads — workers, console apps — use Azure.Monitor.OpenTelemetry.Exporter and build the providers yourself.)

// Program.cs
using Azure.Monitor.OpenTelemetry.AspNetCore;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry().UseAzureMonitor();

var app = builder.Build();

Set the connection string out of band — never hardcode it. The distro reads APPLICATIONINSIGHTS_CONNECTION_STRING from the environment automatically:

export APPLICATIONINSIGHTS_CONNECTION_STRING="InstrumentationKey=00000000-0000-0000-0000-000000000000;IngestionEndpoint=https://<region>.in.applicationinsights.azure.com/;LiveEndpoint=https://<region>.livediagnostics.monitor.azure.com/"

The connection string carries a regional ingestion endpoint. The bare instrumentation key (classic style) is deprecated for ingestion routing — always use the full connection string, and on Azure App Service / Functions set it as an app setting so the platform injects it.

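Two classes of breaking change will bite during migration. First, TelemetryClient.TrackEvent and TrackMetric have no drop-in equivalent. OpenTelemetry has no custom-event signal, so events have to be re-expressed as log records or span events (recent exporter versions map a log record carrying a microsoft.custom_event.name attribute to the customEvents table; check the version you ship), and custom metrics move to a Meter. Second, any ITelemetryInitializer or ITelemetryProcessor you wrote does not load; the equivalent is an OpenTelemetry BaseProcessor<Activity>, covered in section 5.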
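As a rough sketch of the Meter replacement for TrackMetric, the snippet below uses only System.Diagnostics.Metrics. The meter name, instrument names, and the order.channel tag are hypothetical, chosen for illustration. For the values to reach Azure Monitor the meter would also need to be registered on the metrics pipeline (for example with WithMetrics(m => m.AddMeter("Orders.Api"))), which is assumed here rather than shown.

```csharp
using System.Collections.Generic;
using System.Diagnostics.Metrics;

// Hypothetical meter and instrument names, for illustration only.
var meter = new Meter("Orders.Api", "1.0.0");
var ordersPlaced = meter.CreateCounter<long>(
    "orders.placed", unit: "{order}", description: "Orders accepted by the API");
var reserveDuration = meter.CreateHistogram<double>(
    "inventory.reserve.duration", unit: "ms");

// Where classic code called TelemetryClient.TrackMetric for a running count:
ordersPlaced.Add(1, new KeyValuePair<string, object?>("order.channel", "web"));

// Where classic code tracked a duration as a metric value:
reserveDuration.Record(42.5);
```

Instruments are cheap to call when nothing is listening, so they can stay in hot paths; create the Meter once and reuse it rather than constructing one per request.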

2. Dependency, request, and exception telemetry with the activity model

Everything is an Activity. An incoming HTTP request becomes a server Activity (ActivityKind.Server), an outbound HttpClient call or SQL query becomes a client Activity (ActivityKind.Client), and the distro maps these to the requests and dependencies tables on export. You do not write that mapping; the instrumentation libraries emit the activities and the Azure Monitor exporter translates OTel semantic conventions into the Application Insights schema.

For your own code, define an ActivitySource and create spans explicitly. The source name must be registered or its activities are dropped:

using System.Diagnostics;

public static class Telemetry
{
    public static readonly ActivitySource Source = new("Orders.Api", "1.0.0");
}

// In a handler
using var activity = Telemetry.Source.StartActivity("ReserveInventory");
activity?.SetTag("order.id", orderId);
activity?.SetTag("order.line_count", lineCount);

Register the source so the SDK samples and exports it:

builder.Services.AddOpenTelemetry()
    .UseAzureMonitor()
    .WithTracing(t => t.AddSource("Orders.Api"));
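The reason unregistered sources are dropped can be seen without the SDK at all. The sketch below uses a bare ActivityListener as a stand-in for what AddSource arranges internally (that equivalence is a simplification, not a description of the SDK's code): with no listener subscribed, StartActivity returns null, which is also why the handler code uses the null-conditional operator.

```csharp
using System;
using System.Diagnostics;

var source = new ActivitySource("Orders.Api", "1.0.0");

// Nothing is listening to "Orders.Api" yet, so no Activity object is created.
using var before = source.StartActivity("ReserveInventory");
Console.WriteLine($"before listener: {(before is null ? "null" : "created")}");

// Simplified stand-in for the subscription the SDK sets up for a registered source.
using var listener = new ActivityListener
{
    ShouldListenTo = s => s.Name == "Orders.Api",
    Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
        ActivitySamplingResult.AllDataAndRecorded,
};
ActivitySource.AddActivityListener(listener);

// With a listener subscribed, the same call produces a real Activity.
using var after = source.StartActivity("ReserveInventory");
Console.WriteLine($"after listener: {(after is null ? "null" : "created")}");
```

A typo in the source name fails silently in exactly this way, so it is worth keeping the name in one shared constant.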

Exceptions are recorded on the active activity, not as a separate top-level call. The ASP.NET Core instrumentation sets the activity status to Error on an unhandled exception and records the exception event; that surfaces in the exceptions table joined to its parent request. To record a handled exception explicitly:

try
{
    await reservationClient.ReserveAsync(order);
}
catch (ReservationConflictException ex)
{
    activity?.AddException(ex);            // .NET 9+ first-class API
    activity?.SetStatus(ActivityStatusCode.Error, ex.Message);
    // ... compensate
}

The exporter populates operation_Name on a request from the route template (for example POST Orders/{id}), not the raw URL. If your operation_Name shows full URLs with IDs baked in, you have unparameterized routes and your app map will fragment into thousands of distinct operations.

3. Adaptive vs fixed-rate ingestion sampling and how itemCount is preserved

This is the section that controls your bill, and it is the one most teams get wrong. There are two distinct sampling models, and the distro does not use the classic one.

Adaptive sampling was a classic-SDK feature: the SDK watched the telemetry rate and dynamically dialed the sampling percentage up or down to hit a target items-per-second. It is implemented by AdaptiveSamplingTelemetryProcessor and is not available in the Azure Monitor OpenTelemetry distro. If a migration doc or an old runbook tells you to configure adaptive sampling on the new distro, it is stale.

What the distro gives you is fixed-rate sampling via a single property:

builder.Services.AddOpenTelemetry().UseAzureMonitor(o =>
{
    // Keep 25% of traces. Range 0.0 - 1.0. 1.0 = keep everything.
    o.SamplingRatio = 0.25F;
});

SamplingRatio drives an ApplicationInsightsSampler that is trace-ID-based and consistent: the keep/drop decision is a hash of the trace ID, so every span of a given trace makes the same decision across every service that uses the same ratio. That consistency is the whole point — it keeps traces intact rather than slicing them, the same property tail sampling in a Collector has to engineer with a load-balancing exporter. The cost is that the decision is blind: at 0.25 you keep 25% of errors and 25% of slow requests along with 25% of the boring ones.
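To make the consistency property concrete, here is a minimal sketch of a trace-ID-based decision. It is deliberately not the hash that ApplicationInsightsSampler uses; the ShouldKeep function and its scoring scheme are assumptions for illustration. The point is only that a decision derived purely from the trace ID and the ratio comes out the same in every service.

```csharp
using System;
using System.Diagnostics;
using System.Globalization;

// Illustrative only: NOT the distro's actual hashing algorithm.
// Maps the first 8 hex characters of the trace ID to a score in [0, 1)
// and keeps the trace when that score falls below the ratio.
static bool ShouldKeep(ActivityTraceId traceId, double ratio)
{
    string hex = traceId.ToHexString();
    uint bucket = uint.Parse(hex.AsSpan(0, 8), NumberStyles.HexNumber, CultureInfo.InvariantCulture);
    double score = bucket / 4294967296.0; // divide by 2^32
    return score < ratio;
}

var traceId = ActivityTraceId.CreateFromString("0af7651916cd43dd8448eb211c80319c".AsSpan());

// Two services that see the same trace ID and use the same ratio agree,
// without exchanging any state.
bool serviceA = ShouldKeep(traceId, 0.25);
bool serviceB = ShouldKeep(traceId, 0.25);
Console.WriteLine($"A keeps: {serviceA}, B keeps: {serviceB}");
```

Nothing in the function looks at status, duration, or tags, which is the "blind" part: an error trace and a healthy trace with nearby IDs get the same treatment.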

The critical concept is itemCount. When the sampler keeps one trace out of four, it does not pretend the other three never happened. It stamps the surviving telemetry with an itemCount of 4 (the inverse of the sampling rate). Azure Monitor’s aggregations and the portal’s metrics multiply by itemCount to reconstruct the true population. So requests/count and dependencies/duration percentiles stay statistically correct even though you ingested a quarter of the rows. If you write raw KQL that does count() without weighting, you will undercount by 4x — section 6 shows the fix.
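The arithmetic is small enough to work through by hand. The sketch below assumes four request rows survived sampling at a ratio of 0.25; the rows themselves are made-up illustration data, not output from a real workspace.

```csharp
using System;
using System.Linq;

// Assumed scenario: four request rows were ingested at SamplingRatio = 0.25.
const double samplingRatio = 0.25;
int itemCount = (int)Math.Round(1 / samplingRatio); // each kept row stands for 4 requests

var ingested = new[]
{
    (Name: "POST Orders/{id}", Success: true,  ItemCount: itemCount),
    (Name: "POST Orders/{id}", Success: true,  ItemCount: itemCount),
    (Name: "POST Orders/{id}", Success: false, ItemCount: itemCount),
    (Name: "GET Orders/{id}",  Success: true,  ItemCount: itemCount),
};

int rawRows = ingested.Length;                        // what an unweighted count() reports
int estimatedTotal = ingested.Sum(r => r.ItemCount);  // what sum(ItemCount) reports
int estimatedFailed = ingested.Where(r => !r.Success).Sum(r => r.ItemCount);

Console.WriteLine($"{rawRows} rows ingested, ~{estimatedTotal} requests served, ~{estimatedFailed} failed");
```

The ratio of failed to total is the same weighted or unweighted when every row carries the same itemCount; the weighting starts to matter for absolute counts and whenever rows sampled at different ratios end up in one query.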

Property | Classic adaptive | Distro fixed-rate
Mechanism | dynamic, targets items/sec | static ratio, trace-ID hash
Consistency across services | per-node, not guaranteed | guaranteed (same ratio + trace ID)
Available in OTel distro | no | yes (SamplingRatio)
itemCount stamped | yes | yes

Set the same SamplingRatio in every service of a transaction. Mismatched ratios re-introduce the broken-trace problem: a downstream service at 1.0 keeping spans whose root was dropped at 0.1.

4. Cross-component correlation: operation_Id, parentId, and the app map

Distributed correlation rides on W3C Trace Context. The traceparent HTTP header carries a 16-byte trace ID and an 8-byte span ID; the HttpClient instrumentation injects it on outbound calls and the ASP.NET Core instrumentation extracts it on inbound. You do not configure this — it is on by default in the distro, and it is the reason a request landing on service A and fanning out to B and C stitches into one transaction.
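For debugging a broken chain it can help to decode a traceparent value by hand. The sketch below uses ActivityContext.TryParse from System.Diagnostics on an example header value; in a running service the instrumentation does this for you, so treat it as a diagnostic aid rather than something to add to request handling.

```csharp
using System;
using System.Diagnostics;

// Example header value in the W3C format: version-traceid-spanid-flags.
const string traceparent = "00-0af7651916cd43dd8448eb211c80319c-b7ad6b7169203331-01";

bool parsed = ActivityContext.TryParse(traceparent, null, out ActivityContext context);

Console.WriteLine($"parsed:   {parsed}");
Console.WriteLine($"trace ID: {context.TraceId.ToHexString()}"); // 32 hex chars = 16 bytes
Console.WriteLine($"span ID:  {context.SpanId.ToHexString()}");  // 16 hex chars = 8 bytes
Console.WriteLine($"flags:    {context.TraceFlags}");            // Recorded means the caller sampled it
```

The trace ID printed here is the value to search for when checking whether a downstream service received the header intact.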

The OTel trace ID becomes operation_Id on export. Every span sharing a trace ID shares operation_Id, and operation_ParentId (the parent span ID) chains them into a tree. The fields map like this:

OpenTelemetry | App Insights column | Meaning
TraceId | operation_Id | one value per end-to-end transaction
SpanId | id | this span
ParentSpanId | operation_ParentId | the calling span
service.name | cloud_RoleName | the node on the app map
service.instance.id | cloud_RoleInstance | the specific replica

cloud_RoleName is what the application map draws as a node, so set it deliberately. The distro derives it from the service.name resource attribute, combined with service.namespace when that is set (it falls back to service.name alone otherwise). Override it through resource configuration rather than letting it default to the process name:

using OpenTelemetry.Resources;

builder.Services.AddOpenTelemetry()
    .UseAzureMonitor()
    .ConfigureResource(r => r.AddService(
        serviceName: "orders-api",
        serviceNamespace: "commerce",
        serviceInstanceId: Environment.MachineName));

If two deployments report the same service.name, they collapse into one node on the map and you lose the ability to see traffic between them. Conversely, per-pod role names explode the map. Name by logical service, and let cloud_RoleInstance carry the replica identity.

5. Custom dimensions, telemetry processors, and PII scrubbing

Tags you set on an activity (activity.SetTag(...)) land in the customDimensions property bag, queryable as customDimensions.key (the same bag is the Properties column in the workspace-based tables used in section 6). That is the right place for business context such as tenant ID, SKU, or feature flag that you want to slice by in KQL.

When you need to mutate or drop telemetry before export — the classic ITelemetryProcessor job — implement an OpenTelemetry BaseProcessor<Activity> and register it on the tracer provider. The canonical use is scrubbing PII out of URLs and tags so it never reaches the ingestion endpoint:

using System.Diagnostics;
using OpenTelemetry;

public sealed class PiiRedactingProcessor : BaseProcessor<Activity>
{
    public override void OnEnd(Activity activity)
    {
        // Strip a query string that may carry tokens or emails.
        var url = activity.GetTagItem("url.full") as string;
        if (url is not null && url.Contains('?'))
        {
            activity.SetTag("url.full", url[..url.IndexOf('?')]);
        }

        // Drop a tag entirely by overwriting with null.
        if (activity.GetTagItem("enduser.email") is not null)
        {
            activity.SetTag("enduser.email", null);
        }
    }
}

Register it on the tracer provider. The scrubbing runs in OnEnd rather than OnStart so that it sees the final, fully tagged activity:

builder.Services.AddOpenTelemetry()
    .UseAzureMonitor()
    .WithTracing(t => t.AddProcessor<PiiRedactingProcessor>());

Two rules a principal review should enforce. Redaction belongs in a processor, not scattered through handlers, so there is one auditable place that data leaves the boundary clean. And redaction must run client-side, before export — Azure Monitor’s workspace-level data collection transformations can also drop columns at ingestion, but by then the PII has already crossed the network to the regional endpoint, which most compliance regimes treat as a disclosure.

6. Querying requests/dependencies tables and building transaction diagnostics in KQL

In a workspace-based Application Insights resource, the tables are AppRequests, AppDependencies, AppExceptions, AppTraces, and AppPerformanceCounters. (The portal’s transaction view uses the same data; KQL gives you control the UI does not.)

Reconstruct a full transaction from an operation_Id. Union the three tables, normalize timing, and order by start to read the call tree top to bottom:

let opId = "0123456789abcdef0123456789abcdef";
union AppRequests, AppDependencies, AppExceptions
| where OperationId == opId
| project TimeGenerated, ItemType = Type, Name, Target,
          DurationMs = DurationMs, Success, ParentId, Id
| order by TimeGenerated asc

Find the slowest dependency type across the fleet — and weight by ItemCount so sampling does not skew the percentile:

AppDependencies
| where TimeGenerated > ago(1h)
| summarize p95 = percentilew(DurationMs, tolong(ItemCount), 95),  // weighted percentile
            calls = sum(ItemCount)          // ItemCount, not count()
          by DependencyType, Target         // Type would only yield the table name
| order by p95 desc

That sum(ItemCount) is the rule for any sampled environment: raw count() counts ingested rows, sum(ItemCount) reconstructs the real population. The same applies to failure rates:

AppRequests
| where TimeGenerated > ago(30m)
| summarize total = sum(ItemCount),
            failed = sumif(ItemCount, Success == false)
          by OperationName = Name
| extend failureRate = round(100.0 * failed / total, 2)
| where total > 100
| order by failureRate desc

To stitch failures back to their root request — the diagnostic the transaction view is built for — join exceptions to their parent request on OperationId:

AppExceptions
| where TimeGenerated > ago(1h)
| project OperationId, ProblemId, ExceptionType,
          OuterMessage, ExcParentId = ParentId, ItemCount
| join kind=inner (
    AppRequests
    | project OperationId, RequestName = Name, ResultCode, Url
  ) on OperationId
| summarize occurrences = sum(ItemCount) by ProblemId, ExceptionType, RequestName  // weighted, per the rule above
| order by occurrences desc

Enterprise scenario

A payments platform team running about forty .NET microservices on AKS migrated off the classic SDK and immediately tripped two wires at once. First, half the services migrated in sprint N and the rest in sprint N+1; during the overlap the migrated services ran the distro at SamplingRatio = 0.1 while the un-migrated ones still ran classic adaptive sampling targeting five items/second. The two algorithms made independent keep/drop decisions on the same traces, so the application map showed dependencies that dead-ended — a request kept by service A calling a service B span that had been dropped. The transaction view was unusable for exactly the cross-service failures the team cared about.

The constraint was that they could not pause the migration and could not run unsampled at their volume (the workspace was already near a meaningful daily ingestion spend). The fix had two parts. They standardized on distro fixed-rate sampling everywhere and set an identical ratio via a shared environment variable injected by a Helm chart, so trace-ID-consistent sampling held across the whole mesh:

# values.yaml fragment applied to every service's Deployment
env:
  - name: APPLICATIONINSIGHTS_CONNECTION_STRING
    valueFrom:
      secretKeyRef: { name: appinsights, key: connectionString }
  - name: OTEL_SAMPLING_RATIO        # read in Program.cs, applied to SamplingRatio
    value: "0.10"
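The YAML comment says the variable is read in Program.cs. A minimal sketch of that read might look like the following. OTEL_SAMPLING_RATIO is the team's own variable name, not one the distro picks up by itself, and the ReadSamplingRatio helper and its fall-back-to-1.0 behavior are assumptions for illustration.

```csharp
using System;
using System.Globalization;

// Hypothetical helper: parse the shared ratio defensively so a bad value
// in one Deployment cannot silently disable or distort sampling.
static float ReadSamplingRatio(string? raw, float fallback = 1.0F)
{
    // Invariant culture so "0.10" means the same thing in every pod locale.
    if (!float.TryParse(raw, NumberStyles.Float, CultureInfo.InvariantCulture, out float ratio)
        || float.IsNaN(ratio))
    {
        return fallback; // missing or unparseable: keep everything rather than guess
    }

    return Math.Clamp(ratio, 0.0F, 1.0F);
}

float samplingRatio = ReadSamplingRatio(Environment.GetEnvironmentVariable("OTEL_SAMPLING_RATIO"));
Console.WriteLine($"effective sampling ratio: {samplingRatio}");

// In Program.cs the value would then feed the option shown in section 3:
// builder.Services.AddOpenTelemetry().UseAzureMonitor(o => o.SamplingRatio = samplingRatio);
```

Falling back to 1.0 trades cost for visibility when the variable is missing; a team under a tight ingestion budget might prefer to fail startup instead so that a misconfigured service is noticed immediately.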

Second, for the four services on the regulated payment-authorization path, blind 10% sampling was unacceptable because a dropped error was a lost audit trail. They could not afford full retention fleet-wide, so they kept the cheap consistent head sampling for the bulk and added a second-tier OpenTelemetry Collector doing tail-based sampling only for those services — keeping 100% of error and high-latency traces and a small probabilistic floor of the rest — exporting to the same Azure Monitor resource. The result: the app map stitched cleanly again, the regulated path retained every error, and ingestion dropped roughly 80% versus their pre-migration unsampled baseline. The lesson the team wrote into their migration runbook: a sampling ratio is a fleet-wide contract, not a per-service setting.

Verify

Confirm the pipeline end to end before you call the migration done.

Controlling data volume: daily cap, daily cap alerts, and ingestion cost

Sampling reduces volume probabilistically; the daily cap is the hard backstop that stops ingestion when a workspace exceeds a configured GB/day, protecting against a runaway loop that floods telemetry. Set it on the Log Analytics workspace and alert before you hit it:

az monitor log-analytics workspace update \
  --resource-group rg-observability \
  --workspace-name law-prod \
  --quota 50

The cap logs a _LogOperation event when it is reached, and from then on data is dropped until the workspace's daily reset time (an hour assigned per workspace, not necessarily midnight UTC). That makes the cap a circuit breaker, not a budgeting tool, because hitting it blinds you exactly when something is going wrong. Alert at roughly 80% of the cap so an engineer reacts before data loss. Track ingestion by table to find what to sample harder:

Usage
| where TimeGenerated > ago(7d)
| where IsBillable == true
| summarize BillableGB = sum(Quantity) / 1000 by DataType
| order by BillableGB desc

If AppDependencies or AppTraces dominate, lower SamplingRatio, drop noisy dependency types in a processor, or move verbose logs below the exported log level before you touch the cap.

Checklist
