Class PoissonLikelihoodCostFunction
- Namespace: SignalSharp.CostFunctions.Cost
- Assembly: SignalSharp.dll
Represents a cost function based on the Poisson negative log-likelihood for the PELT algorithm. This cost function is sensitive to changes in the rate (mean) of events in count data.
public class PoissonLikelihoodCostFunction : CostFunctionBase, ILikelihoodCostFunction, IPELTCostFunction
- Inheritance
- CostFunctionBase
- PoissonLikelihoodCostFunction
- Implements
- ILikelihoodCostFunction
- IPELTCostFunction
Remarks
This cost function assumes that the data within each segment represents counts following a Poisson distribution independently for each dimension, with a constant rate parameter (λ) for that segment and dimension.
It calculates the cost from the negative log-likelihood of the segment data, evaluated at the Maximum Likelihood Estimate (MLE) of the rate.
The MLE for the rate λ in a segment [start, end) of length n = end - start for a given dimension is the sample mean:
λ_hat = (Sum_{i=start}^{end-1} signal[dim, i]) / n = S / n
where S is the sum of counts for that dimension.
The likelihood metric used for BIC/AIC calculations equals -2 * log-likelihood at the MLE, up to an additive constant, and is calculated as:
Metric(start, end) = Sum_dimensions [ 2 * ( S - S * log(S) + S * log(n) ) ]
where S = Sum_{i=start}^{end-1} signal[dim, i] is the sum of counts in the segment for that dimension, and n = end - start is the segment length.
This formula uses the convention 0 * log(0) = 0, which is handled by setting the metric contribution to 0 when S = 0 for a dimension.
The term Sum log(signal[dim, i]!) from the full likelihood is omitted: it depends only on the data points themselves, so its total over the whole signal is the same for every segmentation and does not affect which segmentation is selected.
The ComputeCost(int?, int?) method returns this same metric value.
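The step from the Poisson log-likelihood to the metric above can be written out for a single dimension as follows. This is the standard textbook derivation, not taken from the library source; x_i is shorthand introduced here for signal[dim, i], and log is assumed to be the natural logarithm.

```latex
\begin{align*}
\log L(\lambda)
  &= \sum_{i=\mathrm{start}}^{\mathrm{end}-1} \bigl( x_i \log\lambda - \lambda - \log(x_i!) \bigr)
   = S \log\lambda - n\lambda - \sum_{i} \log(x_i!) \\
\frac{d}{d\lambda} \log L(\lambda)
  &= \frac{S}{\lambda} - n = 0
  \quad\Longrightarrow\quad \hat{\lambda} = \frac{S}{n} \\
-2 \log L(\hat{\lambda})
  &= -2 \Bigl( S \log\frac{S}{n} - S \Bigr) + 2 \sum_{i} \log(x_i!) \\
  &= 2 \bigl( S - S \log S + S \log n \bigr) + 2 \sum_{i} \log(x_i!)
\end{align*}
```

Dropping the final factorial term gives the per-dimension contribution to Metric(start, end).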
This cost/metric is calculated efficiently using precomputed prefix sums of the signal, allowing O(D) calculation per segment after an O(N*D) precomputation step during Fit(double[,]), where D is the number of dimensions and N is the number of time points.
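The prefix-sum idea described above can be sketched as below. This is a minimal standalone illustration, not SignalSharp's actual implementation: the class name PoissonPrefixSumSketch and both method names are hypothetical, and log is assumed to be the natural logarithm.

```csharp
using System;

// Illustrative sketch only; names are hypothetical and not part of SignalSharp.
public static class PoissonPrefixSumSketch
{
    // Builds per-dimension prefix sums: prefix[d, i] = sum of signal[d, 0..i-1].
    // Costs O(N*D) for D dimensions (rows) and N time points (columns).
    public static double[,] BuildPrefixSums(double[,] signal)
    {
        int dims = signal.GetLength(0);
        int n = signal.GetLength(1);
        var prefix = new double[dims, n + 1];
        for (int d = 0; d < dims; d++)
        {
            for (int i = 0; i < n; i++)
            {
                prefix[d, i + 1] = prefix[d, i] + signal[d, i];
            }
        }
        return prefix;
    }

    // Evaluates Sum_dimensions [ 2 * (S - S*log(S) + S*log(n)) ] for [start, end)
    // in O(D): each segment sum S is the difference of two prefix sums.
    public static double SegmentMetric(double[,] prefix, int start, int end)
    {
        int dims = prefix.GetLength(0);
        double n = end - start;
        double total = 0.0;
        for (int d = 0; d < dims; d++)
        {
            double s = prefix[d, end] - prefix[d, start];
            if (s <= 0.0)
            {
                continue; // convention 0 * log(0) = 0: no contribution
            }
            total += 2.0 * (s - s * Math.Log(s) + s * Math.Log(n));
        }
        return total;
    }
}
```

Because every segment query reads only two prefix entries per dimension, the per-segment cost does not depend on the segment length, which matters when a changepoint search evaluates many candidate segments.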
Consider using the Poisson Likelihood cost function when:
- Your data represents counts of events per interval (e.g., website hits per day, defects per batch, calls per hour) for one or more dimensions.
- You expect changes in the average rate of these events.
- The data within segments can be reasonably approximated by a Poisson distribution (variance roughly equals mean).
- The input data contains non-negative values (counts cannot be negative).
Note: While the function accepts double inputs, Poisson counts are theoretically non-negative integers. This implementation requires input data to be effectively non-negative (values >= -Epsilon). Values slightly below zero but within tolerance will be clamped to zero. Significantly negative values will cause an exception during Fit(double[,]).
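The tolerance rule in the note above might look roughly like the following check. This is a sketch under assumptions: the class and method names are hypothetical, and the value 1e-9 is a placeholder, since the library's actual Epsilon is not stated here.

```csharp
using System;

// Illustrative sketch only; not SignalSharp's actual validation code.
public static class NonNegativeInputSketch
{
    // Hypothetical tolerance; the library's real Epsilon value may differ.
    private const double AssumedEpsilon = 1e-9;

    // Returns the value to accumulate: values in [-AssumedEpsilon, 0) are clamped
    // to zero, and values below -AssumedEpsilon are rejected.
    public static double ValidateAndClamp(double value)
    {
        if (value < -AssumedEpsilon)
        {
            throw new ArgumentException(
                $"Value {value} is negative beyond the tolerance {AssumedEpsilon}.");
        }
        return Math.Max(value, 0.0);
    }
}
```

Under this assumed tolerance, an input such as -1e-11 would be treated as 0, while -1.0 would be rejected.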
Constructors
PoissonLikelihoodCostFunction()
Initializes a new instance of the PoissonLikelihoodCostFunction class.
public PoissonLikelihoodCostFunction()
Properties
SupportsInformationCriteria
Indicates that this cost function provides likelihood metrics suitable for BIC/AIC.
public bool SupportsInformationCriteria { get; }
Property Value
- bool
Methods
ComputeCost(int?, int?)
Computes the cost for a segment [start, end) based on the Poisson negative log-likelihood.
The cost is Sum_dimensions [ 2 * ( S - S * log(S) + S * log(n) ) ], where S is the sum of counts and n is the length.
public override double ComputeCost(int? start = null, int? end = null)
Parameters
start
int? The start index of the segment (inclusive). If null, defaults to 0.
end
int? The end index of the segment (exclusive). If null, defaults to the length of the data.
Returns
- double
The computed cost for the segment.
Remarks
Calculates the cost in O(D) time using precomputed prefix sums, where D is the number of dimensions.
Handles the segmentSum = 0 case correctly based on the limit x*log(x) -> 0 as x -> 0, resulting in zero cost contribution for dimensions with zero total count.
Must be called after Fit(double[,]). This method returns the same value as ComputeLikelihoodMetric(int, int).
using SignalSharp.CostFunctions.Cost;

// Same hourly website-hit counts as in the Fit example
double[,] counts = { { 5, 8, 6, 7, 25, 30, 28, 10, 9, 12 } };
var poissonCost = new PoissonLikelihoodCostFunction().Fit(counts);
double costSegment1 = poissonCost.ComputeCost(0, 4); // Cost for the segment with lower counts (5, 8, 6, 7)
double costSegment2 = poissonCost.ComputeCost(4, 7); // Cost for the segment with higher counts (25, 30, 28)
// Example with a zero-sum segment
double[,] zeroCounts = { { 0, 0, 0, 5, 5 } };
var zeroCost = new PoissonLikelihoodCostFunction().Fit(zeroCounts);
double costZeroSeg = zeroCost.ComputeCost(0, 3); // Should be 0.0
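As a rough hand check of what the calls above should return, the cost formula can be evaluated for the single-dimension counts array from the Fit example. The figures are rounded and assume log is the natural logarithm; they are computed from the documented formula, not from running the library.

```latex
\begin{align*}
\text{ComputeCost}(0, 4):\quad & S = 5 + 8 + 6 + 7 = 26,\quad n = 4 \\
  & 2\,(26 - 26 \ln 26 + 26 \ln 4) = 52\,(1 - \ln 6.5) \approx -45.33 \\
\text{ComputeCost}(4, 7):\quad & S = 25 + 30 + 28 = 83,\quad n = 3 \\
  & 2\,(83 - 83 \ln 83 + 83 \ln 3) = 166\,\bigl(1 - \ln \tfrac{83}{3}\bigr) \approx -385.16
\end{align*}
```

The values are negative because the factorial term of the full likelihood is omitted; only differences in cost between candidate segmentations are meaningful.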
Exceptions
- UninitializedDataException
Thrown when prefix sums are not initialized (Fit(double[,]) not called).
- ArgumentOutOfRangeException
Thrown when the segment indices (start, end) are out of bounds.
- SegmentLengthException
Thrown when the segment length (end - start) is less than 1.
ComputeLikelihoodMetric(int, int)
Computes the likelihood metric for a segment [start, end) based on the Poisson negative log-likelihood.
The metric is Sum_dimensions [ 2 * ( S - S * log(S) + S * log(n) ) ], where S is the sum of counts and n is the length.
public double ComputeLikelihoodMetric(int start, int end)
Parameters
start
int The start index of the segment (inclusive).
end
int The end index of the segment (exclusive).
Returns
- double
The computed likelihood metric for the segment.
Remarks
Calculates the metric in O(D) time using precomputed prefix sums.
Handles the segmentSum = 0 case correctly.
Must be called after Fit(double[,]). This method returns the same value as ComputeCost(int?, int?).
Exceptions
- UninitializedDataException
Thrown when prefix sums are not initialized (Fit(double[,]) not called).
- ArgumentOutOfRangeException
Thrown when the segment indices (start, end) are out of bounds.
- SegmentLengthException
Thrown when the segment length (end - start) is less than 1.
Fit(double[,])
Fits the cost function to the provided count data by precomputing prefix sums.
public override IPELTCostFunction Fit(double[,] signalMatrix)
Parameters
signalMatrix
double[,] The count data array to fit (rows=dimensions, columns=time points). Values must be effectively non-negative (>= -Epsilon).
Returns
- IPELTCostFunction
The fitted PoissonLikelihoodCostFunction instance.
Remarks
This method performs O(N*D) computation to calculate prefix sums, enabling O(D) cost/metric calculation per segment later.
It must be called before cost/metric computation methods.
It validates that all input data points are non-negative within a small tolerance (Epsilon). Values slightly below zero but within tolerance will be clamped to zero for the sum.
// Example: Number of website hits per hour
double[,] counts = { { 5, 8, 6, 7, 25, 30, 28, 10, 9, 12 } };
var poissonCost = new PoissonLikelihoodCostFunction();
poissonCost.Fit(counts);
// Example with near-zero value
double[,] countsNearZero = { { 5, 8, 1e-10, 7, 25, 30, -1e-11, 10, 9, 12 } };
poissonCost.Fit(countsNearZero); // Should work
Exceptions
- ArgumentNullException
Thrown if signalMatrix is null.
- ArgumentException
Thrown if any data point in signalMatrix is less than -Epsilon.
GetSegmentParameterCount(int)
Gets the number of parameters estimated for a Poisson model segment. This is 1 parameter (the rate 'λ') per dimension.
public int GetSegmentParameterCount(int segmentLength)
Parameters
segmentLength
int The length of the segment (unused).
Returns
- int
Number of parameters: Number of dimensions * 1.
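To illustrate how a per-segment parameter count typically enters an information criterion, the sketch below combines segment metrics with a penalty using the generic textbook form BIC = -2 * log-likelihood + k * ln(N). The class and method names are hypothetical, and how SignalSharp itself combines these quantities is not stated here; some formulations also count changepoint positions as parameters.

```csharp
using System;

// Illustrative sketch only; not SignalSharp's model-selection code.
public static class BicSketch
{
    // segmentMetrics: one likelihood metric (about -2 * log-likelihood) per segment.
    // dimensions:     number of signal dimensions (one Poisson rate per dimension per segment).
    // totalPoints:    total number of time points N in the signal.
    public static double Bic(double[] segmentMetrics, int dimensions, int totalPoints)
    {
        double fit = 0.0;
        foreach (double metric in segmentMetrics)
        {
            fit += metric;
        }
        // Assumed parameter count: segments * dimensions * 1 rate parameter.
        int parameterCount = segmentMetrics.Length * dimensions;
        return fit + parameterCount * Math.Log(totalPoints);
    }
}
```

Adding a segment lowers the fit term but raises the penalty by dimensions * ln(N), which is the trade-off an information criterion is meant to balance.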