# Span Metrics Connector

<!-- status autogenerated section -->
| Status        |           |
| ------------- |-----------|
| Distributions | [contrib] |
| Issues        | [![Open issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aopen%20label%3Aconnector%2Fspanmetrics%20&label=open&color=orange&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aopen+is%3Aissue+label%3Aconnector%2Fspanmetrics) [![Closed issues](https://img.shields.io/github/issues-search/open-telemetry/opentelemetry-collector-contrib?query=is%3Aissue%20is%3Aclosed%20label%3Aconnector%2Fspanmetrics%20&label=closed&color=blue&logo=opentelemetry)](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues?q=is%3Aclosed+is%3Aissue+label%3Aconnector%2Fspanmetrics) |
| Code coverage | [![codecov](https://codecov.io/github/open-telemetry/opentelemetry-collector-contrib/graph/main/badge.svg?component=connector_spanmetrics)](https://app.codecov.io/gh/open-telemetry/opentelemetry-collector-contrib/tree/main/?components%5B0%5D=connector_spanmetrics&displayType=list) |
| [Code Owners](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/main/CONTRIBUTING.md#becoming-a-code-owner)    | [@portertech](https://www.github.com/portertech), [@Frapschen](https://www.github.com/Frapschen), [@iblancasa](https://www.github.com/iblancasa) \| Seeking more code owners! |
| Emeritus      | [@albertteoh](https://www.github.com/albertteoh) |

[alpha]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#alpha
[contrib]: https://github.com/open-telemetry/opentelemetry-collector-releases/tree/main/distributions/otelcol-contrib

## Supported Pipeline Types

| [Exporter Pipeline Type] | [Receiver Pipeline Type] | [Stability Level] |
| ------------------------ | ------------------------ | ----------------- |
| traces | metrics | [alpha] |

[Exporter Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#exporter-pipeline-type
[Receiver Pipeline Type]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md#receiver-pipeline-type
[Stability Level]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/docs/component-stability.md#stability-levels
<!-- end autogenerated section -->

⚠️ Breaking Change Warning:
The default unit for duration metrics will change from `ms` to `s` to adhere to the OpenTelemetry semantic conventions. The feature gate `connector.spanmetrics.useSecondAsDefaultMetricsUnit` has been added to control this change.

Currently, the feature gate is disabled by default, so the unit remains `ms`. After one release cycle, the feature gate will be enabled by default and the unit will switch to `s`.
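
To avoid depending on the default while this change rolls out, the unit can be set explicitly through the `histogram.unit` setting described under Configurations. A minimal sketch, assuming that an explicitly configured unit takes precedence over the gate-controlled default:

```yaml
connectors:
  spanmetrics:
    histogram:
      # Pin the duration unit so a change of the default does not affect dashboards.
      # Valid values are `ms` and `s`.
      unit: ms
```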

## Overview

Aggregates Request, Error and Duration (R.E.D) OpenTelemetry metrics from span data.

**Request** counts are computed as the number of spans seen per unique set of
dimensions, including Errors. Multiple metrics can be aggregated if, for instance,
a user wishes to view call counts just on `service.name` and `span.name`.

```
traces.span.metrics.calls{service.name="shipping",span.name="get_shipping/{shippingId}",span.kind="SERVER",status.code="Ok"}
```

**Error** counts are computed from the Request counts which have an `Error` Status Code metric dimension.

```
traces.span.metrics.calls{service.name="shipping",span.name="get_shipping/{shippingId}",span.kind="SERVER",status.code="Error"}
```

**Duration** is computed from the difference between the span start and end times and inserted into the
relevant duration histogram time bucket for each unique set of dimensions.

```
traces.span.metrics.duration{service.name="shipping",span.name="get_shipping/{shippingId}",span.kind="SERVER",status.code="Ok"}
```

Each metric will have _at least_ the following dimensions because they are common
across all spans:

- `service.name`
- `span.name`
- `span.kind`
- `status.code`
- `collector.instance.id`

The `collector.instance.id` dimension is intended to add a unique UUID to all metrics, ensuring that the spanmetrics connector
does not violate the **Single Writer Principle** when spanmetrics is used in a multi-deployment model.
Currently, `collector.instance.id` must be manually enabled via the feature gate: `connector.spanmetrics.includeCollectorInstanceID`.
For more detail, see [Known Limitation: the Single Writer Principle](#known-limitation-the-single-writer-principle).
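
Feature gates are enabled on the Collector command line with the `--feature-gates` flag. The following is a minimal sketch of doing so in a Docker Compose file. The service name `otelcol`, the `latest` image tag, and the config file paths are assumptions made for illustration and should be adapted to your deployment:

```yaml
services:
  otelcol:  # hypothetical service name
    image: otel/opentelemetry-collector-contrib:latest
    command:
      - "--config=/etc/otelcol-contrib/config.yaml"
      # Adds the collector.instance.id dimension to all generated metrics.
      - "--feature-gates=connector.spanmetrics.includeCollectorInstanceID"
    volumes:
      # Assumed local config file, mounted read-only into the container.
      - ./config.yaml:/etc/otelcol-contrib/config.yaml:ro
```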

## Span to Metrics processor to Span to metrics connector

The `spanmetrics` connector replaces the [spanmetrics](https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/processor/spanmetricsprocessor/v0.95.0/processor/spanmetricsprocessor/README.md) processor with multiple improvements
and breaking changes. The change brings the `spanmetrics` connector closer to the OpenTelemetry
specification and makes the component agnostic to exporter logic. The `spanmetrics` processor
essentially mixed OTel and Prometheus conventions by using the OTel data model with
the Prometheus naming conventions for metrics and attributes.

The following changes were made in the connector component.

Breaking changes:
- The `operation` metric attribute was renamed to `span.name`.
- The `latency` histogram metric name was changed to `duration`.
- The `_total` metric suffix was dropped from generated metrics names.
- The Prometheus-specific metrics labels sanitization was dropped.

Improvements:
- Added support for OTel exponential histograms for recording span duration measurements.
- Added support for the milliseconds and seconds histogram units.
- Added support for generating metrics resource scope attributes. The `spanmetrics` connector
generates one metrics resource scope for each span resource scope, so more metrics are
generated now. Previously, `spanmetrics` generated a single metrics resource scope.

## Configurations

If you are not already familiar with connectors, you may find it helpful to first
visit the [Connectors README].

The following settings can be optionally configured:

- `histogram` (default: `explicit`): Use to configure the type of histogram used to record
  span duration measurements. Must be either `explicit` or `exponential`.
  - `disable` (default: `false`): Disable all histogram metrics.
  - `unit` (default: `ms`): The time unit for recording duration measurements. One of either: `ms` or `s`.
  - `dimensions`: additional attributes to add as dimensions to the `traces.span.metrics.duration` metric, 
  which will be included _on top of_ the common and configured `dimensions` for span attributes and resource attributes.
  - `explicit`:
    - `buckets`: the list of durations defining the duration histogram time buckets. Default
      buckets: `[2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]`
  - `exponential`:
    - `max_size` (default: `160`): the maximum number of buckets per positive or negative number range.
- `dimensions`: the list of dimensions to add to the `traces.span.metrics.calls`, `traces.span.metrics.duration` and `traces.span.metrics.events` metrics, in addition to the default dimensions defined above.
  Each additional dimension is defined with a `name` which is looked up in the span's collection of attributes or
  resource attributes (AKA process tags) such as `ip`, `host.name` or `region`.
  
  If the `name`d attribute is missing in the span, the optional provided `default` is used.
  
  If no `default` is provided, this dimension will be **omitted** from the metric.
- `calls_dimensions`: additional attributes to add as dimensions to the `traces.span.metrics.calls` metric, 
  which will be included _on top of_ the common and configured `dimensions` for span attributes and resource attributes.
- `exclude_dimensions`: the list of dimensions to be excluded from the default set of dimensions. Use to exclude unneeded data from metrics. 
- `dimensions_cache_size`: this setting is deprecated; use `aggregation_cardinality_limit` instead.
- `include_instrumentation_scope`: a list of instrumentation scope names to include from the traces.
- `resource_metrics_cache_size` (default: `1000`): the size of the cache holding metrics for a service. This is mostly relevant for
   cumulative temporality to avoid memory leaks and correct metric timestamp resets.
- `aggregation_temporality` (default: `AGGREGATION_TEMPORALITY_CUMULATIVE`): Defines the aggregation temporality of the generated metrics. 
  One of either `AGGREGATION_TEMPORALITY_CUMULATIVE` or `AGGREGATION_TEMPORALITY_DELTA`.
- `namespace` (default: `traces.span.metrics`): Defines the namespace of the generated metrics. If `namespace` is provided, generated metric names are prefixed with `namespace.`.
- `metrics_flush_interval` (default: `60s`): Defines the flush interval of the generated metrics.
- `metrics_expiration` (default: `0`): Defines the expiration time as `time.Duration`, after which, if no new spans are received, metrics will no longer be exported. Setting to `0` means the metrics will never expire (default behavior).
- `metric_timestamp_cache_size` (default: `1000`): Only relevant for delta temporality span metrics. Controls the size of the cache used to keep track of a metric's TimestampUnixNano the last time it was flushed. When a metric is evicted from the cache, its next data point will indicate a "reset" in the series. Downstream components converting from delta to cumulative, like `prometheusexporter`, may handle these resets by setting cumulative counters back to 0.
- `exemplars`: Use to configure how to attach exemplars to metrics.
  - `enabled` (default: `false`): enabling will add spans as Exemplars to all metrics. Exemplars are only kept for one flush interval.
  - `max_per_data_point` (default: `5`): The maximum number of exemplars to attach to a single metric data point.
- `events`: Use to configure the events metric.
  - `enabled` (default: `false`): enabling will add the events metric.
  - `dimensions` (mandatory if `enabled`): the list of the span's event attributes to add as dimensions to the `traces.span.metrics.events` metric, which will be included _on top of_ the common and configured `dimensions` for span attributes and resource attributes.
- `resource_metrics_key_attributes`: Filter the resource attributes used to produce the resource metrics key map hash. The filter is only used to build the hash key; it does not copy the attributes to the metrics resource attributes.
   Use this when changing resource attributes (e.g. process id) break counter metrics.
- `aggregation_cardinality_limit` (default: `0`): Defines the maximum number of unique combinations of dimensions that will be tracked for metrics aggregation. When the limit is reached, additional unique combinations will be dropped but registered under a new entry with `otel.metric.overflow="true"`. A value of `0` means no limit is applied.
- `add_resource_attributes` (default: `false`): Add the resource attributes to the resulting metrics. This option enables the old behavior before the `connector.spanmetrics.excludeResourceMetrics` feature gate was introduced. When set to `true`, resource attributes will be included in the metrics even if the feature gate is enabled. See [GitHub issue #42103](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/42103) for more context.
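
As an illustration of how several of the settings above combine, the following fragment sketches a connector that records durations as exponential histograms in seconds, emits delta temporality, and caps cardinality. The specific values (`1000`, `5m`) are arbitrary choices for the example, not recommendations:

```yaml
connectors:
  spanmetrics:
    histogram:
      unit: s
      # Use exponential histograms instead of the default explicit buckets.
      exponential:
        max_size: 160
    aggregation_temporality: "AGGREGATION_TEMPORALITY_DELTA"
    # Only relevant for delta temporality; tracks the last flush timestamp per metric.
    metric_timestamp_cache_size: 1000
    # Stop exporting a series if no matching spans arrive for 5 minutes.
    metrics_expiration: 5m
    # Combinations beyond this limit are registered under otel.metric.overflow="true".
    aggregation_cardinality_limit: 1000
```

This is only the `connectors` section; it still needs to be wired into a traces pipeline as an exporter and into a metrics pipeline as a receiver.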

The feature gate `connector.spanmetrics.legacyMetricNames` (disabled by default) controls whether the connector uses legacy metric names.

## Examples

The following is a simple example usage of the `spanmetrics` connector.

For configuration examples on other use cases, please refer to [More Examples](#more-examples).

The full list of settings exposed for this connector is documented in [spanmetricsconnector/config.go](../../connector/spanmetricsconnector/config.go).

```yaml
receivers:
  nop:

exporters:
  nop:

connectors:
  spanmetrics:
    histogram:
      dimensions:
        - name: url.scheme
          default: https
      explicit:
        buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]  
    dimensions:
      - name: http.method
        default: GET
      - name: http.status_code
    calls_dimensions:
      - name: http.url
        default: /ping
    exemplars:
      enabled: true
    exclude_dimensions: ['status.code']
    aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"    
    metrics_flush_interval: 15s
    metrics_expiration: 5m
    events:
      enabled: true
      dimensions:
        - name: exception.type
        - name: exception.message
    resource_metrics_key_attributes:
      - service.name
      - telemetry.sdk.language
      - telemetry.sdk.name
    include_instrumentation_scope:
      - express

service:
  pipelines:
    traces:
      receivers: [nop]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [nop]
```

### Using `spanmetrics` with Prometheus components

The `spanmetrics` connector can be used with Prometheus exporter components.

Some exporter functionality, such as generation of the `target_info` metric, requires that the
incoming spans' resource scope attributes contain the `service.name` and `service.instance.id`
attributes.

Let's look at the example of using the `spanmetrics` connector with the `prometheusremotewrite` exporter:

```yaml
receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  prometheusremotewrite:
    endpoint: http://localhost:9090/api/v1/write
    target_info:
      enabled: true

connectors:
  spanmetrics:
    namespace: span.metrics

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheusremotewrite]
```

This configures the `spanmetrics` connector to generate metrics from received spans and export the
metrics to the Prometheus Remote Write exporter. The `target_info` metric will be generated for each
resource scope, while OpenTelemetry metric names and attributes will be [normalized](../../exporter/prometheusremotewriteexporter/README.md)
to be compliant with Prometheus naming rules. The generated `calls` OTel sum metric can
result in multiple Prometheus `calls_total` (counter type) time series alongside the `target_info` time series.
For example:

```
target_info{job="shippingservice", instance="...", ...} 1
calls_total{span_name="/Address", service_name="shippingservice", span_kind="SPAN_KIND_SERVER", status_code="STATUS_CODE_UNSET", ...} 142
```
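
The same pipeline shape should also work with the pull-based `prometheus` exporter. The following is a minimal sketch; the listen address `0.0.0.0:8889` is an arbitrary choice for the example:

```yaml
receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  prometheus:
    # Address on which the Collector exposes metrics for Prometheus to scrape.
    endpoint: "0.0.0.0:8889"

connectors:
  spanmetrics:

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [spanmetrics]
    metrics:
      receivers: [spanmetrics]
      exporters: [prometheus]
```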

### More Examples

For more example configurations covering various other use cases, please visit the [testdata directory](../../connector/spanmetricsconnector/testdata).

[Connectors README]: https://github.com/open-telemetry/opentelemetry-collector/blob/main/connector/README.md

## Known Limitation: the Single Writer Principle

Proper configuration of the `spanmetricsconnector` ensures compliance with the [Single Writer Principle](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#single-writer),
which is a core requirement in the OpenTelemetry metrics data model. Misconfiguration, however, 
may allow multiple components to write to the same metric stream, resulting in data inconsistency, 
metric conflicts, or the dropping of time series by metric backends.

### Why this happens

This issue typically arises when:

* Multiple pipelines use the same instance of the `spanmetricsconnector`
* The connector is instantiated more than once without ensuring the resulting metric streams are distinct
* The `resource_metrics_key_attributes` field is not configured correctly or includes common/shared attributes across all instances

### Recommendations

To reduce the risk of conflicting writes:

* Add `resource_metrics_key_attributes` to your configuration.
```yaml
connectors:
  spanmetrics:
    resource_metrics_key_attributes:
      - service.name
      - telemetry.sdk.language
      - telemetry.sdk.name
```
* Manually enable the feature gate: `connector.spanmetrics.includeCollectorInstanceID` to produce uniquely identified metrics.
* For exporters like Prometheus, which rely on the single writer assumption, use a dedicated pipeline with a single `spanmetricsconnector` instance.

More context is available in [GitHub issue #21101](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/21101).

### About `resource_metrics_key_attributes`

The `resource_metrics_key_attributes` setting is used to build the key map that determines how metrics are grouped.

If this field is left empty, the connector will use **all** available attributes to compute the resource metric hash.

To avoid problems, be cautious when choosing which attributes to include.

Avoid attributes that:

* **Change frequently** – such as `request_id`, `timestamp`, or `trace_id`. These increase cardinality and create excessive metric streams.
* **Are shared across all sources** – values like `true`, `default`, or `team:backend` offer no uniqueness and can lead to multiple writers sharing the same stream.
* **Are optional or inconsistently applied** – if an attribute is only present in some spans, this can fragment metric streams (e.g., one stream with the attribute and one without).

Instead, use attributes that are stable, present in all spans, and meaningfully distinguish each stream. Good examples include `cluster_id`, `region`, or `deployment_environment`.
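
A minimal sketch of a configuration that follows this guidance is shown below. The attribute names `region` and `deployment_environment` are taken from the examples above and are illustrative only; use whichever stable resource attributes your spans actually carry:

```yaml
connectors:
  spanmetrics:
    resource_metrics_key_attributes:
      # Stable, always-present attributes that distinguish each stream.
      - service.name
      - region
      - deployment_environment
```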
