npstat is hosted by Hepforge, IPPP Durham
NPStat  5.10.0
npsi Namespace Reference

Classes

class  MinuitDensityFitFcn1D
 
class  JohnsonFit
 
class  MinuitLocalLogisticFcn
 
class  MinuitLOrPEBgCVFcn1D
 
class  MinuitNeymanOSDE1DFcn
 
class  MinuitPLogliOSDE1DFcn
 
class  MinuitQuantileRegression1DFcn
 
class  MinuitQuantileRegressionNDFcn
 
class  MinuitSemiparametricFitFcn1D
 
class  MinuitUnbinnedFitFcn1D
 
class  ScalableDensityConstructor1D
 

Functions

template<typename InputData , typename OutputData >
void fitCompositeJohnson (const InputData *input, unsigned long nInput, unsigned nBins, double xmin, double xmax, double qmin, double qmax, double minlog, const npstat::LocalPolyFilter1D *const *filters, unsigned nFilters, OutputData *smoothedCurve, unsigned lenCurve, bool *intitialFitConverged, unsigned *filterUsed)
 
template<class Numeric1 , class Numeric2 >
void minuitLocalQuantileRegression1D (std::vector< std::pair< Numeric1, Numeric1 > > inputPoints, double symbetaPower, double bandwidthInCDFSpace, unsigned polyDegree, double cdfValue, double xmin, double xmax, Numeric2 *result, unsigned nResultPoints, bool verbose=false)
 
template<class Point , class Numeric , class BooleanFunctor , typename Num2 , unsigned StackLen2, unsigned StackDim2>
void minuitUnbinnedLogisticRegression (npstat::LogisticRegressionOnKDTree< Point, Numeric, BooleanFunctor > &reg, const unsigned maxdeg, npstat::ArrayND< Num2, StackLen2, StackDim2 > *result, const npstat::BoxND< Numeric > &resultBox, unsigned reportProgressEvery=0)
 
template<typename Numeric , unsigned StackLen, unsigned StackDim, typename Num2 , unsigned StackLen2, unsigned StackDim2>
void minuitLogisticRegressionOnGrid (npstat::LogisticRegressionOnGrid< Numeric, StackLen, StackDim > &reg, const unsigned maxdeg, npstat::ArrayND< Num2, StackLen2, StackDim2 > *result, unsigned reportProgressEvery=0)
 
template<typename Numeric , typename Num2 , unsigned StackLen2, unsigned StackDim2>
void minuitQuantileRegression (npstat::QuantileRegressionBase< Numeric > &qrb, unsigned polyDegree, npstat::ArrayND< Num2, StackLen2, StackDim2 > *result, const npstat::BoxND< Numeric > &resultBox, unsigned reportProgressEvery=0, double upFactor=1.0)
 
template<typename Numeric , typename Num2 , unsigned StackLen2, unsigned StackDim2, typename NumHisto >
void minuitQuantileRegressionIncrBW (npstat::QuantileRegressionBase< Numeric > &qrb, unsigned polyDegree, npstat::ArrayND< Num2, StackLen2, StackDim2 > *result, const npstat::BoxND< Numeric > &resultBox, const npstat::HistoND< NumHisto > &predictorHisto, double minimalSampleFraction, unsigned reportProgressEvery=0, double upFactor=1.0)
 
template<typename Numeric >
double boundaryBandwidth1D (const npstat::HistoND< Numeric > &histo, const double filterDeg, const int m)
 
double featureBandwidth1D (const double featureSize, const double filterDeg, const double effectiveNBg, const int m)
 
template<typename Numeric >
double minHistoBandwidth1D (const npstat::HistoND< Numeric > &histo, const double featureSize, const double filterDeg, const double nbg, const int m)
 
template<class Numeric1 , class Numeric2 >
void weightedLocalQuantileRegression1D (std::vector< npstat::Triple< Numeric1, Numeric1, double > > inputPoints, double symbetaPower, double bandwidthInCDFSpace, unsigned polyDegree, double cdfValue, double xmin, double xmax, Numeric2 *result, unsigned nResultPoints, bool verbose=false)
 

Detailed Description

Namespace "npsi" (nonparametric statistics interface) is used for classes and functions in the NPStat package which rely on the Minuit function minimization package. See http://www.cern.ch/minuit/

Function Documentation

◆ fitCompositeJohnson()

template<typename InputData , typename OutputData >
void npsi::fitCompositeJohnson ( const InputData *  input,
unsigned long  nInput,
unsigned  nBins,
double  xmin,
double  xmax,
double  qmin,
double  qmax,
double  minlog,
const npstat::LocalPolyFilter1D *const *  filters,
unsigned  nFilters,
OutputData *  smoothedCurve,
unsigned  lenCurve,
bool *  intitialFitConverged,
unsigned *  filterUsed 
)

Density estimation by the transformation method using the following sequence of steps:

  1. Johnson system is fitted to the input sample between quantiles that correspond to parameters "qmin" and "qmax". Typical values of these parameters are 0.05 and 0.95.
  2. The sample is transformed according to the cumulative distribution of the fitted Johnson system.
  3. The transformed sample is smoothed with a set of filters that have different bandwidth values. The best filter (bandwidth) is then chosen using pseudo-likelihood cross-validation.
  4. BinnedCompositeJohnson density is made using the results of these fits. This density is scanned into the "smoothedCurve" array.
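
Step 3 can be illustrated with a minimal, self-contained sketch of leave-one-out pseudo-likelihood cross-validation on a histogram. It does not use NPStat classes: the triangular filter, the function names, and the floor "tiny" are assumptions made for this illustration, and the actual NPStat implementation may differ (for example, in boundary handling or regularization).

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <limits>
#include <vector>

// Triangular smoothing weights for bin offsets -m..m, normalized to sum to 1.
// (Illustrative filter only; not an npstat::LocalPolyFilter1D.)
std::vector<double> triangularWeights(const unsigned m)
{
    std::vector<double> w(2U * m + 1U);
    double sum = 0.0;
    for (unsigned k = 0; k < w.size(); ++k)
    {
        w[k] = m + 1.0 - std::abs(static_cast<double>(k) - static_cast<double>(m));
        sum += w[k];
    }
    for (double& v : w)
        v /= sum;
    return w;
}

// Leave-one-out pseudo log-likelihood of the histogram "counts" (bins of
// width "binWidth") for the triangular filter with half-width "m" bins.
// Bins outside the histogram are simply skipped (no boundary correction).
double looPseudoLogLikelihood(const std::vector<double>& counts,
                              const double binWidth, const unsigned m)
{
    const std::vector<double> w = triangularWeights(m);
    const int nBins = static_cast<int>(counts.size());
    const int half = static_cast<int>(m);
    double total = 0.0;
    for (const double c : counts)
        total += c;
    assert(total > 1.0 && binWidth > 0.0);

    const double tiny = 1.0e-300; // assumed floor, keeps the logarithm finite
    double logli = 0.0;
    for (int i = 0; i < nBins; ++i)
    {
        if (counts[i] <= 0.0)
            continue;
        double smoothed = 0.0;
        for (int k = -half; k <= half; ++k)
        {
            const int j = i + k;
            if (j >= 0 && j < nBins)
                smoothed += w[k + half] * counts[j];
        }
        // Remove the contribution of one point in bin i to its own estimate
        const double loo = (smoothed - w[half]) / ((total - 1.0) * binWidth);
        logli += counts[i] * std::log(std::max(loo, tiny));
    }
    return logli;
}

// Index of the half-width with the largest pseudo log-likelihood
std::size_t bestFilter(const std::vector<double>& counts, const double binWidth,
                       const std::vector<unsigned>& halfWidths)
{
    assert(!halfWidths.empty());
    std::size_t best = 0;
    double bestLogli = -std::numeric_limits<double>::infinity();
    for (std::size_t f = 0; f < halfWidths.size(); ++f)
    {
        const double logli = looPseudoLogLikelihood(counts, binWidth, halfWidths[f]);
        if (logli > bestLogli)
        {
            bestLogli = logli;
            best = f;
        }
    }
    return best;
}
```

With half-width 0 every singly populated bin gets a zero leave-one-out density, so the criterion penalizes undersmoothing heavily; this is what makes it usable for choosing among bandwidths.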

Function arguments are as follows:

input, nInput – Array of input data points (typically floats or doubles) and the number of points in this array.

nBins – Number of bins for the histogram which will be used for fitting parameters of the Johnson system.

xmin, xmax – Range (support) of the estimated density.

qmin, qmax, minlog – Parameters passed to the JohnsonFit class.

filters, nFilters – A collection of smoothers to try on the transformed density. All of them will be used and the smoother with the best cross-validation pseudo-likelihood will be chosen to build the final result.

smoothedCurve, lenCurve – The array in which the smoothed values will be stored. The coordinates correspond to the bin centers of a histogram with "lenCurve" bins between "xmin" and "xmax".

intitialFitConverged – Can be used to find out whether the initial Johnson system fit converged successfully. This parameter can also be NULL.

filterUsed – On output, will contain the number of the best filter from "filters" (or can be NULL).
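
The coordinates associated with "smoothedCurve" follow from the bin-center convention described above. A minimal sketch (the helper names "binCenter" and "curveCoordinates" are hypothetical and not part of NPStat):

```cpp
#include <cassert>
#include <vector>

// Center of bin "i" for a histogram with "nBins" equal-width bins
// between "xmin" and "xmax"
double binCenter(const double xmin, const double xmax,
                 const unsigned nBins, const unsigned i)
{
    assert(nBins > 0U && i < nBins);
    const double binWidth = (xmax - xmin) / nBins;
    return xmin + (i + 0.5) * binWidth;
}

// Coordinates at which a curve with "lenCurve" elements is evaluated
std::vector<double> curveCoordinates(const double xmin, const double xmax,
                                     const unsigned lenCurve)
{
    std::vector<double> x(lenCurve);
    for (unsigned i = 0; i < lenCurve; ++i)
        x[i] = binCenter(xmin, xmax, lenCurve, i);
    return x;
}
```

For example, with xmin = 0, xmax = 1 and lenCurve = 4, the curve values correspond to the coordinates 0.125, 0.375, 0.625 and 0.875.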

◆ minuitLocalQuantileRegression1D()

template<class Numeric1 , class Numeric2 >
void npsi::minuitLocalQuantileRegression1D ( std::vector< std::pair< Numeric1, Numeric1 > >  inputPoints,
double  symbetaPower,
double  bandwidthInCDFSpace,
unsigned  polyDegree,
double  cdfValue,
double  xmin,
double  xmax,
Numeric2 *  result,
unsigned  nResultPoints,
bool  verbose = false 
)

High-level driver function for performing local 1-d quantile regression fits using Minuit2 as a minimization engine.

The arguments are as follows:

inputPoints – the points for which the regression should be performed. The predictor is the first member of the pair and the response is the second. As a side effect of this function, the input points will be sorted in increasing order. This is why the vector of input points is non-const.

symbetaPower – the power parameter for "SymmetricBeta1D". 3 and 4 are good values to try.

bandwidthInCDFSpace – Approximate fraction of sample points which will participate in each fit. Due to robustness requirements (obtaining limited bandwidth in coordinate space), the bandwidth in the CDF space must be less than 0.5 (and, of course, positive).

polyDegree – this defines the degree of the polynomial that will be fitted to the quantile curve. It does not make much sense to go beyond 3 here.

cdfValue – which quantile to use in the regression

xmin, xmax – the result will be calculated between xmin and xmax in equidistant steps

result – array where the result will be stored

nResultPoints – number of coordinate points to use to build the result. The interval (xmin, xmax) will be split into "nResultPoints" bins. The coordinates at which the fits are performed are taken from the middle of those bins (as in a histogram). Naturally, array "result" must have at least "nResultPoints" elements.

verbose – this switch can be turned on for debugging purposes
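
For orientation, quantile regression at level "cdfValue" is conventionally defined as the minimization of the check ("pinball") loss. The sketch below shows that standard loss for a constant prediction; it is an assumption that this is the form relevant here, it is not taken from the NPStat sources, and the function names are hypothetical. The kernel weighting and the local polynomial are omitted.

```cpp
#include <cassert>
#include <vector>

// Check ("pinball") loss for the residual (response - prediction)
// at quantile level q, with 0 < q < 1
double checkLoss(const double residual, const double q)
{
    assert(q > 0.0 && q < 1.0);
    return residual >= 0.0 ? q * residual : (q - 1.0) * residual;
}

// Average check loss of a constant prediction over a set of responses.
// It is minimized when "prediction" equals the q-th sample quantile.
double meanCheckLoss(const std::vector<double>& responses,
                     const double prediction, const double q)
{
    assert(!responses.empty());
    double sum = 0.0;
    for (const double y : responses)
        sum += checkLoss(y - prediction, q);
    return sum / responses.size();
}
```

For the responses 1, 2, 3, 4, 5 and q = 0.5, the mean loss is smaller at the median (prediction 3) than at prediction 2.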

◆ minuitQuantileRegression()

template<typename Numeric , typename Num2 , unsigned StackLen2, unsigned StackDim2>
void npsi::minuitQuantileRegression ( npstat::QuantileRegressionBase< Numeric > &  qrb,
unsigned  polyDegree,
npstat::ArrayND< Num2, StackLen2, StackDim2 > *  result,
const npstat::BoxND< Numeric > &  resultBox,
unsigned  reportProgressEvery = 0,
double  upFactor = 1.0 
)

High-level driver function for performing local quantile regression fits using Minuit2 as a minimization engine. The weight function is assumed to be symmetric in each dimension.

Function arguments are as follows:

qrb – An instance of the npstat::QuantileRegressionBase template. Carries the information about the dataset, the kernel, the bandwidth, and the quantile to fit for. For more details, look at the LocalQuantileRegression.hh header.

polyDegree – Degree of the local polynomial to fit. Can be 0, 1 (local linear regression), or 2 (local quadratic regression).

result – Grid which will hold the results on exit. It defines the number of points in each dimension and provides the storage space.

resultBox – Coordinates of the grid boundaries. The points for which the regression is performed will be positioned inside this box just like histogram bin centers.

reportProgressEvery – Print out a message about the number of grid points processed to the standard output every "reportProgressEvery" points. The default value of 0 means that such printouts are disabled.

upFactor – A factor for the Minuit UP parameter, to multiply by the value estimated internally. Don't change the default unless you really understand what you are doing.

For this function, it is assumed that the constant bandwidth is set up already, with the weight function which was used to create the orthogonal polynomials.

◆ minuitQuantileRegressionIncrBW()

template<typename Numeric , typename Num2 , unsigned StackLen2, unsigned StackDim2, typename NumHisto >
void npsi::minuitQuantileRegressionIncrBW ( npstat::QuantileRegressionBase< Numeric > &  qrb,
unsigned  polyDegree,
npstat::ArrayND< Num2, StackLen2, StackDim2 > *  result,
const npstat::BoxND< Numeric > &  resultBox,
const npstat::HistoND< NumHisto > &  predictorHisto,
double  minimalSampleFraction,
unsigned  reportProgressEvery = 0,
double  upFactor = 1.0 
)

High-level driver function for performing local quantile regression fits using Minuit2 as a minimization engine. The weight function is assumed to be symmetric in each dimension.

This function is similar to minuitQuantileRegression. However, it automatically increases the bandwidth where necessary to ensure that the regression box contains at least the minimal fraction of points specified by the "minimalSampleFraction" parameter. The fraction is calculated from the "predictorHisto" histogram whose dimensionality and axis order should coincide with those of the regression predictors. It is expected that this histogram will contain the predictor variables for the sample actually used in the regression.

"minimalSampleFraction" must not exceed 1.0. Zero or negative values result in the use of a constant bandwidth, just like in the minuitQuantileRegression function.
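
The idea of enlarging the regression box can be sketched in one dimension as follows. This is only an illustration under simplifying assumptions (a 1-d histogram given as a vector of counts, a window measured in whole bins, growth by one bin at a time); the function names are hypothetical and the procedure actually used by NPStat may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Fraction of the histogram contents inside the window of
// "halfWidth" bins on each side of the bin "center"
double windowFraction(const std::vector<double>& counts,
                      const std::size_t center, const std::size_t halfWidth)
{
    assert(center < counts.size());
    const std::size_t lo = center >= halfWidth ? center - halfWidth : 0;
    const std::size_t hi = std::min(center + halfWidth, counts.size() - 1);
    double total = 0.0, inside = 0.0;
    for (std::size_t i = 0; i < counts.size(); ++i)
    {
        total += counts[i];
        if (i >= lo && i <= hi)
            inside += counts[i];
    }
    return total > 0.0 ? inside / total : 0.0;
}

// Smallest half-width, not less than "initialHalfWidth", for which the
// window holds at least "minimalSampleFraction" of the sample. Zero or
// negative fractions leave the initial (constant) half-width unchanged.
std::size_t enlargedHalfWidth(const std::vector<double>& counts,
                              const std::size_t center,
                              const std::size_t initialHalfWidth,
                              const double minimalSampleFraction)
{
    assert(minimalSampleFraction <= 1.0);
    std::size_t hw = initialHalfWidth;
    if (minimalSampleFraction <= 0.0)
        return hw;
    while (hw < counts.size() &&
           windowFraction(counts, center, hw) < minimalSampleFraction)
        ++hw;
    return hw;
}
```

For the counts 10, 0, 0, 0, 10 and the central bin, a requested fraction of 0.5 forces the window to grow to a half-width of 2 bins, while a fraction of 0 keeps the initial half-width.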

◆ minuitUnbinnedLogisticRegression()

template<class Point , class Numeric , class BooleanFunctor , typename Num2 , unsigned StackLen2, unsigned StackDim2>
void npsi::minuitUnbinnedLogisticRegression ( npstat::LogisticRegressionOnKDTree< Point, Numeric, BooleanFunctor > &  reg,
const unsigned  maxdeg,
npstat::ArrayND< Num2, StackLen2, StackDim2 > *  result,
const npstat::BoxND< Numeric > &  resultBox,
unsigned  reportProgressEvery = 0 
)

High-level driver function for performing local logistic regression fits using Minuit2 as a minimization engine. It is assumed that the constant bandwidth is set up already, with the weight function which was used to create the orthogonal polynomials. The weight function is assumed to be symmetric in each dimension.
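
As a reminder of the quantity being minimized in a logistic regression fit, the sketch below evaluates the weighted negative log-likelihood for the simplest case of a constant (degree 0) logit. It uses the textbook definition and hypothetical function names; it is an assumption that the NPStat fit function has this general form, and the kernel weights and orthogonal polynomials are not modeled here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Logistic function: maps the logit t to a probability
double logistic(const double t)
{
    return 1.0 / (1.0 + std::exp(-t));
}

// Weighted negative log-likelihood for a constant logit "t".
// passed[i] tells whether point i satisfies the boolean criterion,
// weights[i] is its (for example, kernel) weight.
double weightedLogisticNLL(const std::vector<bool>& passed,
                           const std::vector<double>& weights,
                           const double t)
{
    assert(passed.size() == weights.size());
    double nll = 0.0;
    for (std::size_t i = 0; i < passed.size(); ++i)
    {
        // -log(p) = log(1 + exp(-t)),  -log(1 - p) = log(1 + exp(t))
        nll += weights[i] * std::log1p(std::exp(passed[i] ? -t : t));
    }
    return nll;
}
```

With unit weights and three of four points passing, the minimum lies at t = log(3), which corresponds to the probability 0.75.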

◆ weightedLocalQuantileRegression1D()

template<class Numeric1 , class Numeric2 >
void npsi::weightedLocalQuantileRegression1D ( std::vector< npstat::Triple< Numeric1, Numeric1, double > >  inputPoints,
double  symbetaPower,
double  bandwidthInCDFSpace,
unsigned  polyDegree,
double  cdfValue,
double  xmin,
double  xmax,
Numeric2 *  result,
unsigned  nResultPoints,
bool  verbose = false 
)

High-level driver function for performing local 1-d quantile regression fits for weighted points using Minuit2 as a minimization engine.

The arguments are as follows:

inputPoints – the points for which the regression should be performed. The predictor is the first member of the triple, the response is the second, and the weight is the third. As a side effect of this function, the input points will be sorted in increasing order of the predictor. This is why the vector of input points is non-const.

symbetaPower – the power parameter for "SymmetricBeta1D". 3 and 4 are good values to try.

bandwidthInCDFSpace – Approximate fraction of sample points which will participate in each fit. Due to robustness requirements (obtaining limited bandwidth in coordinate space), the bandwidth in the CDF space must be less than 0.5 (and, of course, positive).

polyDegree – this defines the degree of the polynomial that will be fitted to the quantile curve. It does not make much sense to go beyond 3 here.

cdfValue – which quantile to use in the regression

xmin, xmax – the result will be calculated between xmin and xmax in equidistant steps

result – array where the result will be stored

nResultPoints – number of coordinate points to use to build the result. The interval (xmin, xmax) will be split into "nResultPoints" bins. The coordinates at which the fits are performed are taken from the middle of those bins (as in a histogram). Naturally, array "result" must have at least "nResultPoints" elements.

verbose – this switch can be turned on for debugging purposes
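
For weighted points, the conventional generalization of the quantile regression criterion multiplies the check loss of each point by its weight. The sketch below shows this for a constant prediction. The struct "WeightedPoint" is a hypothetical stand-in for npstat::Triple, the function name is made up for this example, and it is an assumption that the NPStat criterion takes this form.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a (predictor, response, weight) triple
struct WeightedPoint
{
    double predictor;
    double response;
    double weight;
};

// Weighted average of the check ("pinball") loss at quantile level q
// for a constant prediction
double weightedMeanCheckLoss(const std::vector<WeightedPoint>& points,
                             const double prediction, const double q)
{
    assert(q > 0.0 && q < 1.0);
    double lossSum = 0.0, weightSum = 0.0;
    for (const WeightedPoint& p : points)
    {
        const double r = p.response - prediction;
        lossSum += p.weight * (r >= 0.0 ? q * r : (q - 1.0) * r);
        weightSum += p.weight;
    }
    assert(weightSum > 0.0);
    return lossSum / weightSum;
}
```

Under this definition a point with weight 2 contributes exactly as much as two identical points with weight 1.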