The PM Counter Categories Group is similar to the Utilization Group, except that the following metrics are applied:
These counters reflect possible errors that indicate traffic congestion in the fabric.
When congestion, or a packet that has experienced congestion, is detected, one of these counters is incremented. Depending on the issue reported, the packet either has to wait or, in an extreme case, is dropped.
These counters reflect congestion in the fabric specific to communication between the Subnet Manager and Subnet Manager Agents using the management VL (VL 15).
The category is calculated exactly as the Congestion category, using the same weights and the corresponding VL15 utilization counters.
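As a rough illustration of "same weights, different counters", the sketch below applies one weighted sum to two sets of inputs. The counter names (`wait`, `inefficiency`, `discard`), the weight values, and the function `category_score` are all hypothetical placeholders chosen for this example; they are not the weights or counter names used by the PM.

```python
from typing import Mapping

# Hypothetical weights and counter names, for illustration only.
# The real weights and counters are defined by the PM, not by this sketch.
WEIGHTS = {"wait": 10, "inefficiency": 30, "discard": 100}


def category_score(counters: Mapping[str, float],
                   weights: Mapping[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of counter values; missing counters count as 0."""
    return sum(weight * counters.get(name, 0) for name, weight in weights.items())


# Same weights, different inputs: counters across data VLs vs. VL15-only counters.
congestion = category_score({"wait": 5, "inefficiency": 2, "discard": 0})
sma_congestion = category_score({"wait": 1, "inefficiency": 0, "discard": 0})
```

The point of sharing one function is that only the input counters change between the two categories; the weighting itself is not duplicated.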
These counters reflect errors in the Physical (PHY) and Link Layers, as well as errors in Firmware. The typical cause is a hardware problem such as a poor connection, marginal cable, incorrect length/model cable for signal rate, or damaged/broken hardware (for example, bad connectors).
When a bad packet is detected, one of these counters is incremented and the Link Layer discards the packet.
During the link training sequence, assorted errors may be observed. This is a normal part of the link training and clock synchronization process. Hence, errors observed as part of rebooting nodes or moving cables should not be considered a problem.
These counters are incremented when an unexpected idle flit is transmitted or received. The term flit refers to a Flow Control Digit, the smallest unit of information on which flow control may be performed. Intel® Omni-Path Fabric packets are divided into flits of 64 bits for transmission across a link. The flit size excludes any headers; the 64 bits are the payload size.
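The division of a packet into 64-bit flits can be sketched as simple arithmetic. This is a minimal sketch, not the fabric's actual framing logic: the function name `flits_for_packet` is hypothetical, and it assumes that a partially filled final flit still occupies a whole flit on the link.

```python
FLIT_PAYLOAD_BITS = 64  # payload size of one flit, as stated in the text


def flits_for_packet(packet_bytes: int) -> int:
    """Return how many 64-bit flits are needed to carry packet_bytes.

    Assumption: a partially filled last flit still counts as a full flit.
    """
    if packet_bytes < 0:
        raise ValueError("packet_bytes must be non-negative")
    bits = packet_bytes * 8
    # Ceiling division without floating point.
    return -(-bits // FLIT_PAYLOAD_BITS)
```

For example, under these assumptions an 8-byte packet fits in a single flit, while a 9-byte packet needs two.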
The transmit port will send idle flits until it can continue sending the rest of the packet. The category is calculated as follows: