npstat Namespace Reference
Detailed Description
Namespace "npstat" (nonparametric statistics) is used for all classes and functions in the stand-alone part of the NPStat package.
Typedef Documentation
◆ ArrayShape
This type will be used to specify array length in each dimension ◆ LinearMapper1d
Typedefs for typical template parameters ◆ StaticDiscreteDistribution1DReader
Factory for deserializing one-dimensional discrete distributions ◆ StaticDistribution1DReader
Factory for deserializing one-dimensional distribution functions ◆ StaticDistributionNDReader
Factory for deserializing multivariate distribution functions ◆ StaticDistributionTransform1DReader
Factory for deserializing one-dimensional distribution functions ◆ StaticLocalPolyFilter1DReader
Factory for deserializing one-dimensional distribution functions ◆ StaticStorableMultivariateFunctorReader
The reader factory for descendants of StorableMultivariateFunctor ◆ StaticUnfoldingFilterNDReader
Factory for deserializing multivariate distribution functions
Enumeration Type Documentation
◆ anonymous enum
Possible modes for cross validation calculations ◆ EigenMethod
EIGEN_SIMPLE: use simple LAPACK driver for calculating eigenvalues and eigenvectors (such as DSYEV)
EIGEN_D_AND_C: use divide and conquer LAPACK driver (such as DSYEVD)
EIGEN_RRR: use "Relatively Robust Representations" driver (e.g., DSYEVR)
◆ OrthoPolyMethod
Method to generate the recurrence coefficients ◆ SvdMethod
SVD_SIMPLE: use simple LAPACK driver for calculating SVD (such as DGESVD)
SVD_D_AND_C: use divide and conquer LAPACK driver (such as DGESDD)
Function Documentation
◆ absDifference()
template<typename T >
Absolute value of the difference between two numbers. Works for all standard numeric types, including unsigned and complex. ◆ absValue()
template<typename T >
Absolute value of a number. Works for all standard numeric types, including unsigned and complex. ◆ AD()
The code for the distribution of Anderson-Darling test statistic comes from "Evaluating the Anderson-Darling Distribution" by G. Marsaglia and J. Marsaglia, Journal of Statistical Software, vol. 9, issue 2, pp. 1-5 (2004). ◆ aicc()
Akaike information criterion corrected for the sample size ◆ amiseOptimalBwGauss()
template<typename Real >
Estimate optimal bandwidth for filters generated by the Gaussian distribution used as the weight function. The arguments are as follows:
filterDegree – Degree of the filter (in the LOrPE sense). This degree is 2 less than the order of the kernel in Turlach's paper. For example, a filter of degree 0 corresponds to a kernel of order 2 (original Gaussian), a filter of degree 2 corresponds to a kernel of order 4, etc. Here, the degree must be even. Maximum possible filter degree is 42.
npoints – Number of points in the data sample.
fvalues – Array of scanned values of the reference density. Note that the contents of this array are NOT preserved.
arrLen – Number of elements in the array "fvalues".
h – Step with which the scan of the reference density was performed.
expectedAmise – If this argument is provided, it will be filled with the expected AMISE value.
The formulae used involve numerical evaluations of the derivatives of the scanned reference density. For a given filterDegree, the code needs to calculate the derivative of order filterDegree + 2. Make sure the scan step is chosen appropriately for this kind of calculation (see, for example, the "Numerical Recipes" book in order to understand the issues involved). ◆ amiseOptimalBwSymbeta()
template<typename Real >
Estimate optimal bandwidth for filters generated by the symmetric beta distribution used as the weight function. The "power" argument is the power of the symmetric beta, limited here to unsigned integers. Maximum possible power value is 10. The meaning of all other arguments is the same as in the function npstat::amiseOptimalBwGauss. ◆ amisePluginBwGauss()
Plug-in bandwidth for filters generated by the Gaussian distribution used as the weight function ◆ amisePluginBwSymbeta()
Plug-in bandwidth for filters generated by the symmetric beta distribution used as the weight function. The "power" argument is the power of the symmetric beta, limited here to unsigned integers. Maximum possible power value is 10. ◆ amisePluginDegreeGauss()
AMISE-optimal filter degree for the Gaussian kernel ◆ amisePluginDegreeSymbeta()
AMISE-optimal filter degree for symmetric beta kernels ◆ approxAmisePluginBwGauss()
Approximate version of "amisePluginBwGauss" for use with non-integer filter degrees ◆ approxSymbetaBandwidthRatio()
Approximate continuous version of "symbetaBandwidthRatio" for use with non-integer filter degrees (uses a polynomial fit to the original). Can be used in combination with "approxAmisePluginBwGauss" to derive a continuous version of symmetric beta bandwidth. ◆ arrayArgmax()
template<class Arr >
Index of the maximum array element ◆ arrayArgmin()
template<class Arr >
Index of the minimum array element ◆ arrayCentralMoments()
template<typename Numeric >
Function for calculating multiple sample central moments and estimating their uncertainties (if requested). The array "moments" should have at least "maxOrder+1" elements. If provided, the array "momentUncertainties" should also have at least "maxOrder+1" elements. Upon exit, moments[0] will be set to 1 and moments[1] will be set to the sample mean. For k > 1, moments[k] will be set to the central moment of order k. The corresponding uncertainties will be estimated approximately, to O(1/n), by substituting the sample moments for the population moments in the asymptotic formulae. The relevant formulae can be found, for example, in the monograph "Mathematical Methods of Statistics" by Harald Cramer. ◆ arrayCoordCovariance()
template<class Array , unsigned Len>
Calculate the array covariance matrix. The "limits" argument has the same meaning as in the arrayCoordMean function. ◆ arrayCoordMean()
template<class Array >
Calculate the mean along each coordinate using array values as weights. The coordinate ranges are defined by the given box, and the array points are assumed to be in the centers of the bins, like in a histogram. The results are stored in the "mean" array whose length (given by lengthMean) must not be smaller than the rank of the array. For this function, the array dimensionality can not exceed CHAR_BIT*sizeof(unsigned long) which is normally 64 on 64-bit machines. ◆ arrayCumulants()
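For arrayCoordMean, the bin-center convention can be illustrated in one dimension. The sketch below is a stand-alone illustration and not the npstat interface: the function name, the std::vector argument, and the explicit [xmin, xmax] range are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical 1-d helper (not the npstat signature): mean coordinate of
// an array whose values are treated as weights located at bin centers.
double binCenterMeanSketch(const std::vector<double>& weights,
                           double xmin, double xmax)
{
    assert(!weights.empty());
    assert(xmin < xmax);
    const double binWidth = (xmax - xmin) / weights.size();
    double sumW = 0.0, sumWX = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i)
    {
        // Array point i sits in the middle of bin i, as in a histogram
        const double center = xmin + (i + 0.5) * binWidth;
        sumW += weights[i];
        sumWX += weights[i] * center;
    }
    assert(sumW > 0.0);
    return sumWX / sumW;
}
```

With two equal weights on [0, 2] the bin centers are 0.5 and 1.5, so the mean is 1.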
template<typename Numeric >
Function for calculating multiple sample cumulants (k-statistics). The array "cumulants" should have at least "maxOrder+1" elements. Upon exit, cumulants[k] will be set to the sample cumulant of order k. Currently, "maxOrder" argument can not exceed 6. Formulae for these can be found, for example, in "Kendall's Advanced Theory of Statistics" by Stuart and Ord. ◆ arrayEntropy()
template<class Numeric >
This function returns negative sum of p_i ln(p_i) over array elements p_i, divided by the total number of array elements. All elements must be non-negative, and there must be at least one positive element present. Useful for calculating entropies of distributions represented on a grid. For example, mutual information of two variables is just the entropy of their copula. ◆ arrayIsDensity()
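For arrayEntropy, the quantity described is the negative sum of p_i ln(p_i) divided by the number of elements. A minimal stand-alone sketch follows; the function name and the std::vector argument are assumptions, and the real function operates on npstat arrays.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the npstat signature): entropy of non-negative
// grid values p_i, computed as -sum(p_i * ln(p_i)) / N.
double gridEntropySketch(const std::vector<double>& p)
{
    assert(!p.empty());
    double sum = 0.0;
    bool havePositive = false;
    for (std::size_t i = 0; i < p.size(); ++i)
    {
        assert(p[i] >= 0.0);
        if (p[i] > 0.0)
        {
            havePositive = true;
            sum += p[i] * std::log(p[i]);
        }
        // Zero elements contribute nothing (limit of p ln p as p -> 0)
    }
    assert(havePositive);
    return -sum / static_cast<double>(p.size());
}
```

A uniform density on the unit cube (all grid values equal to 1) gives zero entropy, which matches the statement that independent variables have zero mutual information.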
template<class Arr >
Check whether all array elements are non-negative and, in addition, there is at least one positive element ◆ arrayIsNonNegative()
template<class Arr >
Check if all array elements are non-negative ◆ arrayLengthFromShape()
The number of elements in the array with the given shape ◆ arrayMax()
template<class Arr >
Maximum array element ◆ arrayMin()
template<class Arr >
Minimum array element ◆ arrayMinMax()
template<class Arr >
Minimum and maximum array element ◆ arrayMoment()
template<typename Numeric >
Function for calculating sample moments w.r.t. some point ◆ arrayMoments()
template<typename Numeric >
Function for calculating multiple sample moments w.r.t. some point. The array "moments" should have at least "maxOrder+1" elements. Upon exit, moments[k] will be set to moment of order k. ◆ arrayQuantiles1D()
template<class Numeric >
Quantiles for 1-d arrays in which array values are used as weights. Essentially, the values are treated as histogram bin contents. All input "qvalues" should be between 0 and 1. The results are returned in the corresponding elements of the "quantiles" array. The function will work faster if "qvalues" are sorted in the increasing order. If you expect to call this function more than once using the same data, it is likely that you can obtain the same information more efficiently by creating the "BinnedDensity1D" object with its interpolationDegree argument set to 0 and then using its "quantile" method. ◆ arrayShape1D()
template<class Array >
One-dimensional mean, variance, skewness, and kurtosis using array values as weights. Any of the argument pointers can be NULL in which case the corresponding quantity is not calculated. The code estimates population shape quantities rather than sample quantities (so that it can be best used with equidistantly binned histograms and such). ◆ arrayStats()
template<typename Numeric >
This function estimates population mean, standard deviation, skewness, and kurtosis. The array must have at least one element. A numerically sound two-pass algorithm is used. Any of the output pointers can be specified as NULL if the corresponding quantity is not needed (this might speed the code up). ◆ assignResamplingWeights()
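For arrayStats, the two-pass idea can be sketched as follows: the first pass finds the mean, the second accumulates deviations from it. This is an illustration only. The function name is hypothetical, skewness and kurtosis are omitted, and the n-1 divisor and the rounding-correction term are assumptions about what a "numerically sound" estimate might use, not statements about the npstat implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the npstat signature). Either output pointer
// may be null, in which case that quantity is not calculated.
void meanAndStdevSketch(const std::vector<double>& data,
                        double* mean, double* stdev)
{
    assert(!data.empty());
    const std::size_t n = data.size();

    // Pass 1: the mean
    long double sum = 0.0L;
    for (std::size_t i = 0; i < n; ++i)
        sum += data[i];
    const long double mu = sum / n;
    if (mean)
        *mean = static_cast<double>(mu);

    if (stdev)
    {
        // Pass 2: squared deviations from the mean. The sum of plain
        // deviations would be exactly 0 without rounding; it is used
        // here as a correction term.
        long double sumsq = 0.0L, sumdev = 0.0L;
        for (std::size_t i = 0; i < n; ++i)
        {
            const long double d = data[i] - mu;
            sumsq += d * d;
            sumdev += d;
        }
        long double var = 0.0L;
        if (n > 1U)
            var = (sumsq - sumdev * sumdev / n) / (n - 1U);  // assumed n-1 divisor
        if (var < 0.0L)
            var = 0.0L;
        *stdev = std::sqrt(static_cast<double>(var));
    }
}
```

Skipping the second pass when the standard deviation is not requested is one way in which passing a null pointer could speed the code up.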
Imagine that you have a sample of size "nPoints". If you resample it with replacement in the usual manner, some points will be chosen several times, and some not at all. On output of this function, the vector "weightsToFill" will be filled with counts of how many times each point of the sample was chosen. Naturally, the size of the "weightsToFill" vector will be set to "nPoints". This is an alternative representation of the resampling procedure which can be useful in certain scenarios. ◆ besselK()
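For assignResamplingWeights, the count representation of resampling with replacement can be sketched with the standard library. The function name and the use of std::mt19937 are assumptions for this example; the npstat function works with its own random generator interface.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper (not the npstat signature): draw nPoints indices
// uniformly with replacement and count how often each index is drawn.
std::vector<unsigned> resamplingWeightsSketch(unsigned nPoints,
                                              std::mt19937& gen)
{
    assert(nPoints > 0U);
    std::vector<unsigned> weights(nPoints, 0U);
    std::uniform_int_distribution<unsigned> pick(0U, nPoints - 1U);
    for (unsigned i = 0; i < nPoints; ++i)
        ++weights[pick(gen)];
    return weights;
}
```

The counts always add up to nPoints, so the weighted sample has the same total weight as the original one.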
Modified Bessel function of the second kind ◆ betaKernelsBandwidth()
template<typename Real >
AMISE optimal bandwidth for density estimation by beta kernels. The arguments are as follows:
npoints – Number of points in the data sample.
fvalues – Array of scanned values of the reference density. It is assumed that the density is scanned at the bin centers on the [0, 1] interval.
nValues – Number of elements in the array "fvalues".
returnB2Star – If "true", the function will return b_2* from Chen's paper (and corresponding AMISE), otherwise it will return b_1* (using corrected algebra).
expectedAmise – If this argument is provided, it will be filled with the expected AMISE value.
The generalized Bernstein polynomial degree is simply the inverse of the bandwidth. ◆ bilinearSection()
This function finds the contours of the intersection of a bilinear interpolation cell (specified by values at the corners of the unit square) with a given constant level. To be used as a low-level building block of contouring procedures which work with interpolated surfaces. The function assumes that the parameters z00, z10, z11, and z01 are the values at the corners (x,y) = (0,0), (1,0), (1,1), and (0,1), respectively (i.e., in the counterclockwise order starting from the origin). "level" is the crossing level for which the contours are found. "nPointsToSample" is the number of points to have in each contour. If this is 1, a fast and crude check will be made and only one point produced (at the center of the cell) if the level crosses the cell anywhere. More reasonable curve representations typically need at least 5 points. The return value of the function is the number of contours found (could be 0, 1, or 2). "section1" is filled with "nPointsToSample" points if this value is at least 1. "section2" is filled if this value is 2. All coordinates will be between 0 and 1. Appropriate shifting and scaling is left up to the user of this function. ◆ binomialCoefficient()
This code calculates the binomial coefficient C(N, M) trying to avoid overflows. Throws either std::overflow_error or std::invalid_argument if things go wrong. ◆ bivariateNormalIntegral()
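For binomialCoefficient, one common way to avoid intermediate overflow is to cancel common factors before each multiplication, so that every intermediate value is itself a binomial coefficient. The sketch below shows this approach under assumed names. Whether npstat uses the same technique is not stated in this documentation.

```cpp
#include <cassert>
#include <climits>
#include <stdexcept>

// Euclid's algorithm, written out to stay compatible with C++11
static unsigned long gcdSketch(unsigned long a, unsigned long b)
{
    while (b)
    {
        const unsigned long r = a % b;
        a = b;
        b = r;
    }
    return a;
}

// Hypothetical helper (not the npstat signature): C(n, m) built up as
// C(n-m+i, i) for i = 1..m, dividing out common factors first.
unsigned long binomialSketch(unsigned n, unsigned m)
{
    if (m > n)
        throw std::invalid_argument("binomialSketch: m > n");
    if (m > n - m)
        m = n - m;  // C(n, m) == C(n, n - m)
    unsigned long result = 1UL;
    for (unsigned i = 1; i <= m; ++i)
    {
        unsigned long num = n - m + i;
        unsigned long den = i;
        unsigned long g = gcdSketch(num, den);
        num /= g;
        den /= g;
        // The update is an exact integer, so what is left of the
        // denominator must divide the current result
        g = gcdSketch(result, den);
        result /= g;
        den /= g;
        assert(den == 1UL);
        if (result > ULONG_MAX / num)
            throw std::overflow_error("binomialSketch: overflow");
        result *= num;
    }
    return result;
}
```

Because the intermediate values grow monotonically towards the final answer, the overflow exception is raised only when C(n, m) itself does not fit into an unsigned long.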
Bivariate normal cumulative probability ◆ calculateEmpiricalCopula()
template<class Point , class Array >
In the "calculateEmpiricalCopula" function the assumption is that the result index 0 corresponds to copula argument 0.0 and that the maximum array index corresponds to copula argument 1.0. Class Point should be subscriptable. ◆ canFillArrayCentersPreservingAreas()
template<typename Num1 , unsigned StackLen1, unsigned StackDim1, typename Num2 , unsigned StackLen2, unsigned StackDim2>
Check whether the shapes of two arrays are compatible for running the "fillArrayCentersPreservingAreas" function on them ◆ cdfUncertainty()
Uncertainty of a cdf value (binomial) ◆ cdKernelSensitivityMatrix() [1/2]
template<class Fcn2D >
Template class Fcn2D should provide a method "Real operator()(Real x, Real y) const", where "Real" is one of floating point types (long double works best). In the returned matrix, row numbers correspond to the "output" polynomial degrees (y space) and column numbers to the "input" polynomial degrees (x space). "chebyshevDegree" is the degree to use for modeling the convolved density with Chebyshev polynomials (this is needed for fast calculation of its cumulative distribution function). Set "normalizeKernel" parameter to "false" if the kernel is already known to be normalized, that is, Int_ymin^ymax K(x, y) dy = 1 for every x. ◆ cdKernelSensitivityMatrix() [2/2]
template<class Fcn2D , class InPoly >
Template class Fcn2D should provide a method "Real operator()(Real x, Real y) const", where "Real" is one of floating point types (long double works best). It is assumed that the interface of class "InPoly" is similar to that of npstat classes AbsClassicalOrthoPoly1D or ScalableClassicalOrthoPoly1D. In the returned matrix, row numbers correspond to the "output" polynomial degrees (y space) and column numbers to the "input" polynomial degrees (x space). "chebyshevDegree" is the degree to use for modeling the convolved density with Chebyshev polynomials (this is needed for fast calculation of its cumulative distribution function). Set "normalizeKernel" parameter to "false" if the kernel is already known to be normalized, that is, Int_ymin^ymax K(x, y) dy = 1 for every x. ◆ chebyshevDerivativeCoeffs()
template<typename Numeric >
Converts Chebyshev series coefficients into coefficients for the function derivative. The array of coefficients must be at least degree+1 long, and the buffer for the derivative coefficients must be at least degree long. ◆ chebyshevIntegralCoeffs()
template<typename Numeric >
Converts Chebyshev series coefficients into coefficients for the function integral. The degree 0 coefficient will be chosen in such a way that the integral is 0 at xmin. The array of coefficients must be at least degree+1 long, and the buffer for the integral coefficients must be at least degree+2 long. ◆ chebyshevMonomialCoeffs()
template<typename Numeric1 , typename Numeric2 >
Convert series for the Chebyshev polynomials of the first kind into the series for monomials. The arrays of coefficients must be at least degree+1 long. ◆ chebyshevSeriesCoeffs()
template<typename Functor , typename Numeric >
Generate Chebyshev series coefficients for a given functor on the interval [xmin, xmax]. The array of coefficients must be at least degree+1 long. The functor will be given a long double argument. ◆ chebyshevSeriesSum() [1/2]
template<typename Numeric >
Series for the Chebyshev polynomials of the first kind. The array of coefficients must be at least degree+1 long. ◆ chebyshevSeriesSum() [2/2]
template<typename Numeric >
Series for the Chebyshev polynomials of the first kind on the given interval [xmin, xmax] ◆ clearBuffer()
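For the two chebyshevSeriesSum overloads, a series of Chebyshev polynomials of the first kind is commonly summed with the Clenshaw recurrence after mapping [xmin, xmax] onto [-1, 1]. The sketch below assumes the plain convention sum_k coeffs[k]*T_k(t), without halving the degree 0 coefficient. The function name and this convention are assumptions, not taken from the npstat sources.

```cpp
#include <cassert>

// Hypothetical helper (not the npstat signature). "coeffs" must hold
// at least degree+1 elements.
double chebyshevSumSketch(const double* coeffs, unsigned degree,
                          double xmin, double xmax, double x)
{
    assert(coeffs);
    assert(xmin < xmax);
    // Map x from [xmin, xmax] to t in [-1, 1]
    const double t = (2.0 * x - xmin - xmax) / (xmax - xmin);
    // Clenshaw recurrence: b_k = c_k + 2 t b_{k+1} - b_{k+2}
    double bkp1 = 0.0, bkp2 = 0.0;
    for (unsigned k = degree; k > 0U; --k)
    {
        const double bk = coeffs[k] + 2.0 * t * bkp1 - bkp2;
        bkp2 = bkp1;
        bkp1 = bk;
    }
    // The sum is c_0 + t b_1 - b_2
    return coeffs[0] + t * bkp1 - bkp2;
}
```

For coefficients {1, 2, 3} and t = 0.5 the direct evaluation is 1*T_0 + 2*T_1 + 3*T_2 = 1 + 1 - 1.5 = 0.5, which the recurrence reproduces.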
template<typename T >
Clear a buffer (set all elements to the value produced by the default constructor) ◆ closeWithinTolerance()
Check if two doubles are within certain relative tolerance from each other. The "tol" argument which specifies the tolerance must be non-negative. ◆ ComboFunctor1()
template<typename Result1 , typename Arg1 , typename Result2 , typename Arg2 , template< typename > class Operator>
Create a functor which operates on the results produced by two other functors. Note that only references to f1 and f2 are stored. Lifetimes of f1 and f2 must be longer than the lifetime of this functor. The combo functor will have the result type and the argument type of f1. ◆ constrained_least_squares()
template<typename Numeric >
Constrained least squares problem (DGGLSE is used for doubles). "true" is returned on success and "false" on failure. ◆ continuousDegreeTaper()
The input argument "degree" must be non-negative (an exception will be thrown if a negative value is given). If "degree" is an exact integer, the resulting vector "vtaper" will have size degree + 1, with all elements set to 1.0. If "degree" is not an exact integer, the resulting vector "vtaper" will have size ceil(degree) + 1, with all elements except the last one set to 1.0. The last element will be set to sqrt(degree - floor(degree)). For a certain definition of the number of effective degrees of freedom, this results in a direct correspondence between the poly degree and the number of degrees of freedom at large bandwidth values. The resulting vector can be used as an argument of the LocalPolyFilter1D constructor, with constructor "taper" argument set to &vtaper[0] and "maxDegree" argument set to (vtaper.size() - 1). ◆ convertCentralMomentsToCumulants()
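For continuousDegreeTaper, the rule described above (all ones, plus a final sqrt of the fractional part for non-integer degrees) can be written down directly. The sketch uses an assumed name and an assumed exception type; only the sizes and values follow the description.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not the npstat signature)
void continuousDegreeTaperSketch(double degree, std::vector<double>* vtaper)
{
    assert(vtaper);
    if (degree < 0.0)
        throw std::invalid_argument("degree must be non-negative");
    const double fl = std::floor(degree);
    const double frac = degree - fl;
    // floor(degree) + 1 leading elements are set to 1.0
    vtaper->assign(static_cast<std::size_t>(fl) + 1U, 1.0);
    // Non-integer degree: one more element, sqrt(degree - floor(degree)),
    // for a total size of ceil(degree) + 1
    if (frac > 0.0)
        vtaper->push_back(std::sqrt(frac));
}
```

For degree 2.25 this produces {1, 1, 1, 0.5}. Following the description, such a vector would be passed to the LocalPolyFilter1D constructor as &vtaper[0] with maxDegree equal to vtaper.size() - 1.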
template<typename Real >
Arrays "cumulants" and "centralMoments" should have at least maxOrder+1 elements, with centralMoments[2] set to the variance. On exit, cumulants[0] will be set to 0, and cumulants[1] to centralMoments[1]. Currently, maxOrder can not exceed 20. ◆ convertCumulantsToCentralMoments()
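For convertCentralMomentsToCumulants, the lowest orders follow the standard textbook relations: the second and third cumulants equal the corresponding central moments, and the fourth is mu_4 - 3*mu_2^2. The sketch below covers only orders up to 4 and uses an assumed name and argument order; the library function supports orders up to 20.

```cpp
#include <cassert>

// Hypothetical helper (not the npstat signature). Both arrays must hold
// at least maxOrder+1 elements; mu[1] carries the mean.
void centralMomentsToCumulantsSketch(const double* mu, unsigned maxOrder,
                                     double* kappa)
{
    assert(mu && kappa);
    assert(maxOrder <= 4U);  // this sketch stops at order 4
    kappa[0] = 0.0;
    if (maxOrder >= 1U) kappa[1] = mu[1];
    if (maxOrder >= 2U) kappa[2] = mu[2];
    if (maxOrder >= 3U) kappa[3] = mu[3];
    if (maxOrder >= 4U) kappa[4] = mu[4] - 3.0 * mu[2] * mu[2];
}
```

For a Gaussian with mean 5 and variance 4, the fourth central moment is 3*4^2 = 48, so all cumulants above the second come out as zero.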
template<typename Real >
Arrays "cumulants" and "centralMoments" should have at least maxOrder+1 elements, with cumulants[2] set to the variance. On exit, centralMoments[0] will be set to 1, and centralMoments[1] to cumulants[1]. Currently, maxOrder can not exceed 20. ◆ convertHistoToDensity()
template<typename Histo >
Reset negative histogram bins to zero and then divide histogram bin contents by the histogram integral. If the "knownNonNegative" argument is true, it will be assumed that there are no negative bins, and their explicit reset is unnecessary. This function will throw std::runtime_error in case the histogram is empty after all negative bins are reset. This function is not a member of the HistoND class itself because these operations do not necessarily make sense for all bin types. Making such operation a member would make creation of HistoND scripting API (e.g., for python) more difficult. ◆ convertToGridAxis() [1/3]
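For convertHistoToDensity, the two steps (reset negative bins, divide by the integral) can be illustrated on a plain vector of bin contents with a uniform bin width. The function name, the vector representation, and the explicit binWidth argument are assumptions; the library function works on HistoND objects.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical 1-d helper (not the npstat signature)
void histoToDensitySketch(std::vector<double>& bins, double binWidth,
                          bool knownNonNegative = false)
{
    assert(binWidth > 0.0);
    double sum = 0.0;
    for (std::size_t i = 0; i < bins.size(); ++i)
    {
        // The explicit reset is skipped if the caller vouches for the data
        if (!knownNonNegative && bins[i] < 0.0)
            bins[i] = 0.0;
        sum += bins[i];
    }
    const double integral = sum * binWidth;
    if (!(integral > 0.0))
        throw std::runtime_error("histogram is empty after reset");
    for (std::size_t i = 0; i < bins.size(); ++i)
        bins[i] /= integral;
}
```

Bins {-1, 1, 3} with width 0.5 become {0, 0.5, 1.5}, which integrate to 1.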
Convert dual histogram axis into dual grid axis ◆ convertToGridAxis() [2/3]
Convert uniform histogram axis to uniform grid axis ◆ convertToGridAxis() [3/3]
Note that conversion from a non-uniform histogram axis into a grid axis is always possible, but it loses information (only the positions of the bin centers are written, not the edges) ◆ convertToHistoAxis() [1/2]
The conversion from non-uniform grid axis to non-uniform histogram axis is only unambiguous when some additional info is available. Here, in particular, we are asking for the position of the axis minimum. This function will throw std::invalid_argument in case the conversion is impossible. ◆ convertToHistoAxis() [2/2]
Convert uniform grid axis to uniform histogram axis ◆ convertToSphericalRandom()
This function converts an N-dimensional random number set from a unit hypercube into a random direction in the N-dim space and a random number between 0 and 1 which can be later used to generate the distance from the origin. The method implementation depends crucially on the numerical accuracies of the "inverseGaussCdf" and "incompleteGamma" special functions (it is likely that both can be improved, especially the latter). The function returns the "remaining" random number. The "direction" array which must have at least "dim" elements is filled with the random direction vector of unit length. If "getRadialRandom" argument is set to "false", the radial random number is not generated and -1 is returned. Use this to increase the code speed if only the random direction itself is needed. ◆ convolutionHistoMap()
template<typename Histo >
Generate a density scanning map for subsequent use with the "DensityScanND" template when a density is to be convolved with the histogram data. Only histograms with uniform binning can be used here. The "doubleDataRange" should be set "true" in case the data will be mirrored (or just empty range added) to avoid circular spilling after convolution. ◆ copyBuffer()
template<typename T1 , typename T2 >
Copy a buffer (with possible type conversion on the fly) ◆ correctDensityEstimateGHU()
Normalize and correct a discretized density estimate which might not be non-negative everywhere. The method comes from the paper by Ingrid K. Glad, Nils Lid Hjort and Nikolai G. Ushakov, Scandinavian Journal of Statistics, Vol. 30, No. 2 (2003), pp. 415-427. ◆ covmatFromHessian()
Make a guess about the covariance matrix. This will be just the inverse of the Hessian in case the Hessian is positive-definite. ◆ cumulantUncertainties()
template<typename Real >
Array "cumulants" should have at least maxCumOrder+1 elements. Array "uncertainties" should have at least maxUncertOrder+1 elements. maxCumOrder must be larger than or equal to 2*maxUncertOrder. On exit, uncertainties[1] will contain the expected uncertainty of the first cumulant (i.e., the mean) for the sample of size "sampleSize", uncertainties[2] will contain the expected uncertainty of the second cumulant, etc. The formulae used can be found in Chapter 12 of Stuart and Ord. ◆ dawsonIntegral()
Dawson's integral exp(-x^2) Int_0^x exp(t^2) dt ◆ definiteIntegral_1()
Integrate[(a x + b)/Sqrt[x (1-x)], {x, xmin, xmax}] for the case both xmin and xmax are inside the [0, 1] interval ◆ densityFitLogLikelihood1D()
template<class Histo >
This function calculates log-likelihood of a 1-d histogram fit by a 1-d density (the value can then be maximized – see, for example, MinuitDensityFitFcn1D.hh header in the "interfaces" section). The length of the "binMask" array should be the same as the number of bins in the histogram. Only those bins will be used for which corresponding mask values are not 0. The "workBuffer" array should be at least as long as the number of bins in the histogram. "minlog" is the minimal value of log-likelihood which can be contributed by one data point (that is, by a part of the bin height which corresponds to 1 point). This should normally be a negative number with reasonably large magnitude. "nQuad" is the number of quadrature points to use for calculating the density integral inside each bin (should be supported by GaussLegendreQuadrature class). If this parameter is set to 0, cumulative density function will be used. "densityArea" and "enabledBinCount" can be used to obtain the area of the density and the bin count for masked bins. The code essentially assumes Poisson distribution of counts for each bin, constrained by the requirement that the sum of expected events should be equal to the sum of events observed (all of this is relevant for unmasked bins only). Note that, although histogram scaling will not affect the central value of the fit, it will affect the fit error determination. Therefore, it is important to use histograms with actual event counts rather than any kind of a scaled version. ◆ densityScanHistoMap()
template<typename Histo >
Generate a density scanning map for subsequent use with the "DensityScanND" template. Naturally, only histograms with uniform binning can be used here. ◆ destroyBuffer()
template<typename T >
Function for freeing memory buffers allocated by "makeBuffer" ◆ deviationsRestrictedSize()
template<typename Numeric , unsigned Len>
This function returns the size of the leading principal submatrix whose absolute deviations from the unit matrix do not exceed the requested tolerance. ◆ diag() [1/2]
template<typename Numeric >
Utility for making square diagonal matrices from the given array ◆ diag() [2/2]
template<typename Numeric >
Utility for making rectangular diagonal matrices from the given array. The length of the array must be at least min(nrows, ncols). ◆ distributionReadError()
Simple utility function which throws an appropriate IOException when "read" function fails for some I/O-capable statistical distribution class ◆ divideTransforms()
Divide two arrays of FFTW complex numbers ◆ doubleShape()
Multiply the size in each dimension by 2 ◆ dumpNtupleAsText()
template<typename T >
Function for dumping ntuples into text files, one row per line. By default, column values will be separated by a single white space. If "insertCommasBetweenValues" is "true" then column values will be separated by ", ". Only the data is dumped, not the info about the ntuple structure. This function will only work with T objects that have default constructors. "true" is returned on success, "false" on failure. ◆ dumpVectorAsText()
template<typename T >
Function for dumping vectors into text files, one element per line. Note that, while this function will work with T objects that do not have default constructors, it will not be possible to read such objects back. "true" is returned on success, "false" on failure. Dumping fewer than "nElementsToDump" elements is considered a success. ◆ edgeworthSeriesMethodName()
Method names corresponding to enums ◆ eigenMethodName()
Method names corresponding to enums ◆ empiricalCdf()
template<typename Data >
The inverse to the npstat::empiricalQuantile (if there are no duplicate entries in the data). The data vector must be sorted. ◆ empiricalCopulaDensity()
template<class Point , class Array >
In the "empiricalCopulaDensity" function the assumption is that the result indices correspond to points equidistant inside the unit hypercube. That is, array index 0 corresponds to copula argument 0.5/N, where N is the length of the relevant dimension. Array index N-1 corresponds to 1.0 - 0.5/N. Class Point should be subscriptable. ◆ empiricalCopulaHisto()
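The index conventions of empiricalCopulaDensity and calculateEmpiricalCopula differ, and a small sketch makes the difference explicit. Both function names below are hypothetical. The density mapping follows the description directly, while the equidistant spacing between the end points of the copula grid is an assumption (the documentation only fixes index 0 at 0.0 and the maximum index at 1.0).

```cpp
#include <cassert>

// Hypothetical helper: copula argument for index i of a density array
// with n cells per dimension (cell centers: 0.5/n, ..., 1 - 0.5/n).
double copulaDensityCoordSketch(unsigned i, unsigned n)
{
    assert(n > 0U && i < n);
    return (i + 0.5) / n;
}

// Hypothetical helper: copula argument for index i of a copula array
// with n points per dimension (end points included: 0.0, ..., 1.0).
// Equidistant interior points are assumed here.
double copulaCoordSketch(unsigned i, unsigned n)
{
    assert(n > 1U && i < n);
    return static_cast<double>(i) / (n - 1U);
}
```

With n = 4 the density grid covers 0.125, 0.375, 0.625, and 0.875, while a 5-point copula grid covers 0, 0.25, 0.5, 0.75, and 1.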
template<class Point , typename Histo >
Function for building copula density out of sets of points by ordering the points and remembering the order in each dimension. The histogram is filled with point counts. All histogram axes should normally have minimum at 0.0 and maximum at 1.0. The input "data" vector will be reordered by this function. It is assumed that the "data" items have correct coordinates but that their order has not been established yet. A typical way to fill such a vector is to use the "fillOrderedPoints" function declared in the "OrderedPointND.hh" header. See also functions declared in the empiricalCopula.hh header which can perform similar tasks. This function assumes (but does not check) that the argument histogram is defined on the unit multivariate cube. The histogram will be reset before it is filled from the provided data. The histogram contents will not be normalized after the histogram is filled. If you want normalized result, call the "convertHistoToDensity" function (declared in the HistoND header). The dimensionality of the points can be larger than the dimensionality of the histogram. In this case only the leading dimensions are used. ◆ empiricalQuantile()
template<typename Data >
This function calculates an empirical quantile from a sorted vector of data points. If the "increaseRange" parameter is "true" then the value returned for x = 0.0 will be smaller than data[0] and the value returned for x = 1.0 will be larger than the largest data value. ◆ externalMemArrayND()
template<typename Numeric >
This function allows you to manage external data memory using ArrayND objects. It can be used, for example, to create arrays which live completely on the stack and still manage sizeable chunks of already allocated memory. It is up to the user of this function to ensure that the size of the memory buffer is consistent with the shape of the array. Note that the responsibility to manage external data passes to another ArrayND through a move constructor or through a move assignment operator but not through a normal copy constructor or assignment operator. Because of this feature, this function works only with compilers supporting C++11. ◆ factorial()
Simple factorial. Will generate a run-time error if n! is larger than the largest unsigned long. ◆ fill1DHistoWithCDFWeights()
template<class Axis , class Collection , class CoordExtractor >
Fill histogram with weighted values. Weights are calculated on the basis of position of an entry in a sorted collection (typically, std::vector). If the collection has length N, the entry with number m is assigned the "coordinate" x = m/(N - 1). Then the argument density is used to calculate a weight from this "coordinate". The coordinate with which the histogram is filled is calculated from each entry by a functor. This functor (represented by the CoordExtractor template parameter) should take (a const reference to) a collection element as its argument and return a double. It is assumed that the density has a finite bandwidth. The collection scan starts near the "coordinate" obtained from the density "location" function and proceeds up and down only until the first point with 0 weight is encountered. Typical densities useful with this function are TruncatedGauss1D and SymmetricBeta1D. The function returns the Kish's effective sample size for the histogram fills performed. ◆ fillArrayCentersPreservingAreas()
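The value returned by fill1DHistoWithCDFWeights is Kish's effective sample size, which for weights w_i is defined as (sum of w_i)^2 divided by the sum of w_i^2. A minimal sketch of that formula follows; the function name and the convention of returning 0 for all-zero weights are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of npstat): Kish's effective sample size
double kishEffectiveSizeSketch(const std::vector<double>& w)
{
    double sum = 0.0, sumsq = 0.0;
    for (std::size_t i = 0; i < w.size(); ++i)
    {
        assert(w[i] >= 0.0);
        sum += w[i];
        sumsq += w[i] * w[i];
    }
    // Assumed convention: no fills with non-zero weight means size 0
    return sumsq > 0.0 ? sum * sum / sumsq : 0.0;
}
```

Equal weights reproduce the plain number of fills, while a single dominant weight pushes the effective size towards 1.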
template<typename Num1 , unsigned StackLen1, unsigned StackDim1, typename Num2 , unsigned StackLen2, unsigned StackDim2>
Fill the second array from the first one in a special way. It is assumed that, in each dimension, the number of elements in the second array is a multiple of the number of elements in the first one. Only the "central" element in the target array will be filled, with the value increased in the proportion of the number of elements factor. Other elements of the target array will be filled with zeros. This function is useful in certain rebinning and smoothing scenarios. ◆ fillFlatFromFlat()
template<typename Num1 , typename Num2 >
Fill a flat vector from another flat vector ◆ fillHistoFromText()
template<class Histo >
Fill a histogram with data from a text file. The text file must contain columns of numbers. The array "columnsToUse" defines which columns to histogram. The dimensionality of this array is assumed to be equal to the dimensionality of the histogram. Empty lines, lines which consist of pure white space, and lines which start with an arbitrary amount of white space (including none) followed by '#' are ignored (considered comments). "true" is returned on success, "false" on failure. ◆ fillNtupleFromText()
template<typename T >
Function for filling ntuples from text files, one row per line. Will work with T objects that have default constructors. "true" is returned on success, "false" on failure. There may be more columns (but not fewer) in the file than in the ntuple. In this case extra columns are ignored. Empty lines, lines which consist of pure white space, and lines which start with an arbitrary amount of white space (including none) followed by '#' are ignored (considered comments). ◆ fillOrderedFromFlat()
template<typename Num2 , class Point >
Fill a vector of ordered points using flat data. Flat data is divided into chunks "stride" elements long. Then numbers are picked up from these chunks according to "dimsToUse" and "nDimsToUse" arguments. ◆ fillOrderedPoints()
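For fillOrderedFromFlat (and the related fillPointsFromFlat), the chunking by "stride" and the selection by "dimsToUse" can be sketched with plain vectors. The function name, the use of std::vector<double> as the point type, and the choice to ignore an incomplete trailing chunk are assumptions made for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not the npstat signature): split "flat" into
// chunks of "stride" numbers and build one point per chunk from the
// chunk elements listed in dimsToUse.
std::vector<std::vector<double> > pointsFromFlatSketch(
    const std::vector<double>& flat, std::size_t stride,
    const unsigned* dimsToUse, unsigned nDimsToUse)
{
    assert(stride > 0U);
    assert(dimsToUse && nDimsToUse > 0U);
    for (unsigned d = 0; d < nDimsToUse; ++d)
        assert(dimsToUse[d] < stride);

    // An incomplete trailing chunk is ignored (assumption)
    const std::size_t nPoints = flat.size() / stride;
    std::vector<std::vector<double> > points;
    points.reserve(nPoints);
    for (std::size_t ipt = 0; ipt < nPoints; ++ipt)
    {
        const double* chunk = &flat[ipt * stride];
        std::vector<double> pt(nDimsToUse);
        for (unsigned d = 0; d < nDimsToUse; ++d)
            pt[d] = chunk[dimsToUse[d]];
        points.push_back(pt);
    }
    return points;
}
```

With flat data {1, 2, 3, 4, 5, 6}, stride 3, and dimsToUse {2, 0}, the resulting points are (3, 1) and (6, 4).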
template<class Point1 , class Point2 >
Fill a vector of ordered points using unordered points (but do not order them yet). Note that the output vector is not cleared before filling. Class Point1 should be subscriptable. Parameters "dimsToUse" and "nDimsToUse" specify how dimensions of Point2 should be constructed out of those in Point1. ◆ fillPointsFromFlat()
template<typename Num2 , class Point >
Fill a vector of points using flat data ◆ fillVectorFromText()
template<typename T >
Function for filling vectors from text files, one element per line. Will only work with T objects that have default constructors. "true" is returned on success, "false" on failure. Empty lines, lines which consist of pure white space, and lines which start with an arbitrary amount of white space (including none) followed by '#' are ignored (considered comments). ◆ findPeak3by3()
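Returning to fillVectorFromText(): the line-skipping rules it describes can be sketched as follows. This is a hypothetical illustration, not the NPStat code. The name "fillVectorSketch", the use of a std::istream in place of a file, and the choice to append to the vector without clearing it are all assumptions made for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical sketch: read one element per line, skipping empty lines,
// white space only lines, and lines whose first non-blank character is '#'.
// Returns false as soon as a data line cannot be parsed as T.
template <typename T>
bool fillVectorSketch(std::istream& in, std::vector<T>* vec)
{
    std::string line;
    while (std::getline(in, line))
    {
        const std::string::size_type pos =
            line.find_first_not_of(" \t\r\n\f\v");
        if (pos == std::string::npos || line[pos] == '#')
            continue;               // empty, white space only, or a comment
        std::istringstream is(line);
        T value = T();              // this is why T needs a default constructor
        if (!(is >> value))
            return false;
        vec->push_back(value);      // assumption: append, do not clear
    }
    return true;
}
```

With the input "1.5", an empty line, "  # comment", and "2.5", this sketch stores two elements and returns "true".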
The "findPeak3by3" function assumes that the grid locations are -1, 0, and 1 in both directions. Correct shifting and scaling is up to the user of this function (use the "rescaleCoords" method of the Peak2D class). The function returns "true" if a valid peak is found inside the 3x3 mask and "false" otherwise. ◆ findPeak5by5()
The "findPeak5by5" function assumes that the grid locations are -2, -1, 0, 1, and 2 in both directions. Correct shifting and scaling is up to the user of this function (use the "rescaleCoords" method of the Peak2D class). The function returns "true" if a valid peak is found inside the 5x5 mask and "false" otherwise. ◆ findRootInLogSpace()
template<typename Result , typename Arg1 >
Numerical equation solving for 1-d functions using interval division. Input arguments are as follows: f – The functor making up the equation to solve: f(x) == rhs. The comparison operator "<" must be defined for the Result type. rhs – The "right hand side" of the equation. x0 – The starting point for the search. For Arg1, operation of multiplication by a double must be defined. The space searched for solution will be x0*c, where c is a positive constant (for example, Arg1 can be a vector). tol – Tolerance parameter. Typically, the found solution will be within a factor of 1 +- tol of the real one. x – Location where the solution will be stored. logstep – Initial step in the log space. The code will first try the points x0*exp(logstep) and x0/exp(logstep) to bound the root, and will descend along the slope from there. The function returns "true" if it finds the root, "false" otherwise. ◆ findRootNewtonRaphson()
template<typename Numeric >
Numerical equation solving for 1-d functions using the Newton–Raphson method. Input arguments are as follows: f – The functor making up the equation to solve, returning a pair. The first element of the pair is the function value (for which "rhs" is the desired value) and the second element is the derivative. rhs – The "right hand side" of the equation. x0 – The starting point for the search. tol – Tolerance parameter. Typically, the found solution will be within a factor of 1 +- tol of the real one. x – Location where the solution will be stored. deriv – Location to store the derivative at x (if desired). The function returns "true" if it finds the root, "false" otherwise. Typically, "Numeric" should be either float or double. ◆ findRootUsingBisections()
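Returning to findRootNewtonRaphson(): a minimal sketch of the Newton–Raphson iteration with a functor that returns a (value, derivative) pair might look like this. The name "newtonSketch", the iteration cap, and the relative-step convergence test are assumptions of this example and are not taken from NPStat.

```cpp
#include <cmath>
#include <utility>

// Hypothetical sketch: solve f(x).first == rhs starting from x0.
// f(x) must return std::pair<double,double> = (value, derivative).
template <class Functor>
bool newtonSketch(const Functor& f, const double rhs, const double x0,
                  const double tol, double* x, double* deriv = nullptr)
{
    const unsigned maxIterations = 100;   // arbitrary cap for this sketch
    double xcur = x0;
    for (unsigned i = 0; i < maxIterations; ++i)
    {
        const std::pair<double, double> vd = f(xcur);
        if (vd.second == 0.0)
            return false;                 // flat slope: step is undefined
        const double xnext = xcur - (vd.first - rhs) / vd.second;
        if (!std::isfinite(xnext))
            return false;
        // Assumed stopping rule: the step is small relative to the estimate
        const bool converged =
            std::abs(xnext - xcur) <= tol * std::abs(xnext);
        xcur = xnext;
        if (converged)
        {
            *x = xcur;
            if (deriv) *deriv = f(xcur).second;
            return true;
        }
    }
    return false;                         // no convergence within the cap
}
```

For f(x) = (x*x, 2*x), rhs = 2, and x0 = 1, the iteration converges to the square root of 2 within a handful of steps.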
template<typename Result , typename Arg1 >
Numerical equation solving for 1-d functions using interval division. Input arguments are as follows: f – The functor making up the equation to solve: f(x) == rhs. The comparison operator "<" must be defined for the Result type. rhs – The "right hand side" of the equation. x0, x1 – The starting interval for the search. tol – Tolerance parameter. Typically, the found solution will be within a factor of 1 +- tol of the real one. root – Location where the root will be written. This could also be a discontinuity point of f(x) or a singularity. The function returns "false" in case the initial interval does not bracket the root. In this case *root is not modified. ◆ findScanMinimum1D()
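Returning to findRootUsingBisections(): the interval division it describes can be sketched as below. This is a hypothetical illustration restricted to double arguments, not the NPStat implementation. The name "bisectSketch" and the exact stopping rule are assumptions. As in the description, only the "<" comparison is applied to function values inside the loop, and *root is left untouched when the initial interval does not bracket the solution.

```cpp
#include <cmath>

// Hypothetical sketch: solve f(x) == rhs on [x0, x1] by interval halving.
template <class Functor>
bool bisectSketch(const Functor& f, const double rhs,
                  double x0, double x1, const double tol, double* root)
{
    const double f0 = f(x0);
    const double f1 = f(x1);
    if (f0 == rhs) { *root = x0; return true; }
    if (f1 == rhs) { *root = x1; return true; }
    const bool increasing = f0 < rhs;
    // Bracketing requires endpoint values on opposite sides of rhs
    if (increasing == (f1 < rhs))
        return false;
    // Halve until the interval is narrow relative to its endpoints
    while (std::abs(x1 - x0) > tol * (std::abs(x0) + std::abs(x1)) / 2.0)
    {
        const double xmid = (x0 + x1) / 2.0;
        if (xmid == x0 || xmid == x1)
            break;                  // floating point resolution exhausted
        if ((f(xmid) < rhs) == increasing)
            x0 = xmid;              // xmid is on the same side as x0
        else
            x1 = xmid;
    }
    *root = (x0 + x1) / 2.0;
    return true;
}
```

Because the loop only tracks on which side of rhs the function value lies, it converges to a sign change even if that point is a discontinuity or a singularity rather than a true root, which matches the caveat in the description.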
The input arrays must have at least two elements. Coordinates must be monotonically increasing. ◆ Functor0Ref()
template<typename Result >
Convenience function for creating Functor0RefHelper instances ◆ Functor1Ref()
template<typename Result , typename Arg1 >
Convenience function for creating Functor1RefHelper instances ◆ Functor2Ref()
template<typename Result , typename Arg1 , typename Arg2 >
Convenience function for creating Functor2RefHelper instances ◆ Functor3Ref()
template<typename Result , typename Arg1 , typename Arg2 , typename Arg3 >
Convenience function for creating Functor3RefHelper instances ◆ FunctorPowerFcn()
template<class Functor >
Utility function for making FunctorPowerFcnHelper objects ◆ FunctorPowerFcnCp()
template<class Functor >
Utility function for making FunctorPowerFcnCpHelper objects ◆ Gamma()
The gamma function for positive real arguments ◆ gaussianKurtosisUncertainty()
Kurtosis uncertainty for the Gaussian distribution ◆ gaussianMedianUncertainty()
Uncertainty of the median estimate for the Gaussian distribution ◆ gaussianResponseMatrix() [1/2]
Function arguments are as follows: unfoldedAxis – Discretization of the "unfolded" (that is, physical) space. Binning is expected to be non-uniform. observedAxis – Discretization of the "observed" space. Binning is expected to be non-uniform. gaussMeanFunction – this functor calculates the mean of the observed Gaussian for the given value in the physical space. Use "npstat::Same<double>()" as an argument if you do not want to add an extra shift to the point location. gaussWidthFunction – this functor calculates the width of the observed Gaussian for the given value in the physical space. Use "npstat::ConstValue1<double,double>(width)" as an argument if you want a constant width. nQuadraturePoints – the number of points to use for integration in each dimension. This should be one of the numbers of points supported by the GaussLegendreQuadrature class. This function will integrate the Gaussian density in the observed space and average it in the unfolded space using Gauss-Legendre quadratures with the given number of points in each space. ◆ gaussianResponseMatrix() [2/2]
Function arguments are as follows: unfoldedMin, unfoldedMax – boundaries for the "unfolded" (that is, physical) space. nUnfolded – number of subdivisions for the unfolded space. Uniform binning is used. observedMin, observedMax – boundaries for the "observed" space. nObserved – number of subdivisions for the observed space. Uniform binning is used. gaussMeanFunction – this functor calculates the mean of the observed Gaussian for the given value in the physical space. Use "npstat::Same<double>()" as an argument if you do not want to add an extra shift to the point location. gaussWidthFunction – this functor calculates the width of the observed Gaussian for the given value in the physical space. Use "npstat::ConstValue1<double,double>(width)" as an argument if you want a constant width. nQuadraturePoints – the number of points to use for integration in each dimension. This should be one of the numbers of points supported by the GaussLegendreQuadrature class. This function will integrate the Gaussian density in the observed space and average it in the unfolded space using Gauss-Legendre quadratures with the given number of points in each space. ◆ gaussianSkewnessUncertainty()
Skewness uncertainty for the Gaussian distribution ◆ gaussianStdevUncertainty()
Uncertainty of the standard deviation estimate for the Gaussian distribution ◆ gaussianVarianceUncertainty()
Uncertainty of the variance (not standard deviation!) estimate for the Gaussian distribution ◆ gegenbauerSeriesSum()
template<typename Numeric >
Series for Gegenbauer polynomials. "lambda" is the parameter. ◆ gen_matrix_eigensystem()
template<typename Numeric >
Determine eigenvalues and eigenvectors of a general matrix (LAPACK DGEEV routine is used for doubles) ◆ gen_matrix_eigenvalues()
template<typename Numeric >
Determine eigenvalues of a general matrix (LAPACK DGEEV routine is used for doubles) ◆ gen_matrix_svd()
template<typename Numeric >
Singular value decomposition (by DGESVD, etc) of M x N matrix. This function returns "true" on success or "false" on failure. ◆ gen_matrix_svd_dc()
template<typename Numeric >
Divide and conquer SVD (by DGESDD, etc) of M x N matrix. This function returns "true" on success or "false" on failure. ◆ getBoundaryFilter1DBuilder()
A factory method for creating AbsBoundaryFilter1DBuilder objects. Parameters are as follows: boundaryHandling – Definition of the boundary handling method. distro – Function to use as the weight for generating LOrPE polynomials. This weight is expected to be even and to peak at 0. Note that the filter builder will not own this pointer. It is the responsibility of the user of this code to make sure that the function exists at all times when the builder is used. stepSize – Step size (bin width) for the data grid on which density estimation will be performed. exclusionMask – Set values of "exclusionMask" != 0 if corresponding data points have to be excluded when weights are generated. If no exclusions are necessary, just leave this array as NULL. exclusionMaskLen – Length of the "exclusionMask" array. If it is not 0 then it must coincide with the "datalen" argument given to all future invocations of the "makeFilter" method. excludeCentralPoint – If "true", the central point of the weight will be set to zero. This can be useful for certain types of cross validation calculations. ◆ goldenSectionSearchInLogSpace()
template<typename Result , typename Arg1 >
Search for 1-d function minimum using the golden section method. Input arguments are as follows: f – The functor whose value we are minimizing. The comparison operator "<" must be defined for the Result type. x0 – The starting point for the search. For Arg1, operation of multiplication by a double must be defined. The space searched for minimum will be x0*c, where c is a positive constant (for example, Arg1 can be a vector). tol – Tolerance parameter. It will often be reasonable to set tol = sqrt(DBL_EPSILON). Typically, the found minimum will be within a factor of 1 +- tol of the real one. minimum – Location where the found minimum will be stored. fmin – Provide this if you want the function value at the minimum. logstep – Initial step in the log space. The code will first try the points x0*exp(logstep) and x0/exp(logstep) to bound the minimum, and will descend along the slope from there. "false" will be returned if the initial interval bounds a maximum instead. The function returns "true" if it finds the minimum, "false" otherwise. ◆ goldenSectionSearchOnAGrid()
Search for 1-d function minimum using the golden section method on a discrete grid. The purpose is to find a 3-cell interval whose middle cell corresponds to the lowest function value. This search can be useful for functions that are expensive to evaluate: it memoizes the results internally and will never call the function twice with the same argument. The function returns the search status. Input arguments are as follows: f – The functor whose value we are minimizing. axis – The axis which defines locations of the grid points. i0 – The starting grid cell for the search. initialStep – The initial step, in units of cells. imin – Location where the found minimum will be stored in case the returned status is not MIN_SEARCH_FAILED. fMinusOne, fmin, fPlusOne – In case the status is MIN_SEARCH_OK, these will contain the function values at the grid cell which precedes the minimum, at the minimum cell, and at the next cell, respectively. In case the status is MIN_ON_LEFT_EDGE, *fMinusOne will be set to *fmin. In case the status is MIN_ON_RIGHT_EDGE, *fPlusOne will be set to *fmin. In case the status is MIN_SEARCH_FAILED, the results are undefined. ◆ griddedRobustRegression()
template<unsigned MaxDim, unsigned MaxReplace>
A generic framework for local robust regression on regular grids. It is designed, in particular, in order to implement local least trimmed squares (LLTS) but can be used with other loss types as well. Works by replacing (i.e., adjusting) grid values with the highest local loss in the whole grid by values fitted from local surroundings using, for example, local orthogonal polynomials. A two-run use is envisioned: during the first run, the user accumulates the "history" of grid replacements. Then the history is analyzed (typically, the local loss time series will exhibit a characteristic "knee" when all outliers have been detected and adjusted). During the second run, the regression is performed up to the desired history point. Automatic history analyzers might be added in the future, but for now it is best to just eyeball the history with the help of some convenient plotting package. Compared to other statistical methods in this package, this function is expected to be very slow (which is not atypical for robust techniques). Function template arguments are as follows. These parameters should be the same as those in the provided loss calculator. MaxDim – Maximum dimensionality of the data grid. The actual dimensionality must not exceed this parameter. MaxReplace – Maximum number of grid point adjustments performed in a single local adjustment cycle. Function arguments are as follows: in – Input data for regression. slidingWindowSize – Size of the local window for use by the loss calculator. The size in each dimension must be odd and larger than 1. slidingWindowDim – Number of elements in the slidingWindowSize array (and dimensionality of the predictor space). Naturally, the code will check that the input data dimensionality is compatible with this parameter. lossCalc – Local loss calculator. Will be called on data in all possible local windows. Should calculate local least trimmed squares or some other such quantity. 
See "WeightedLTSLoss.hh" and "TwoPointsLTSLoss.hh" for examples of such calculators. stopCallback – The function stops when this returns "true". The arguments will be the largest remaining loss and the number of replacements made so far. See the "GriddedRobustRegressionStop.hh" header for a simple example of such a callback. observationCallback – Callback to call with the current values of data (to monitor progress). Can be NULL. observationFrequency – How often to call the above callback. Set this argument to 0 to disable these calls completely. out – Replaced array (on exit). Can point to the same area as "in" (naturally, "in" will be destroyed in this case). replacementHistory – History of replacements. This argument can be NULL in which case the history is not generated. This function will not clear a previously filled history but will append to it. verbose – If "true", will print the location of the local window in which the replacements are made to the standard output. This function returns the total number of replacements made. ◆ halfShape()
Divide the size in each dimension by 2 (generate dynamic fault if odd) ◆ hermiteSeriesSumPhys()
template<typename Numeric >
Series for Hermite polynomials orthogonal with weight exp(-x*x). These are sometimes called the "physicists' Hermite polynomials". The weight exp(-x*x) is also used in the "GaussHermiteQuadrature" class. ◆ hermiteSeriesSumProb()
template<typename Numeric >
Series for Hermite polynomials orthogonal with weight exp(-x*x/2). These are sometimes called the "probabilists' Hermite polynomials". ◆ histoCovariance()
template<class Histo , unsigned Len>
Calculate covariance matrix for the histogrammed data. Upon exit, the size of the matrix will be N x N, where N is the histogram dimensionality. ◆ histoDensity1D()
template<class Histo >
Construct a "BinnedDensity1D" object out of a uniformly binned 1-d histogram. All histogram bins must be non-negative. ◆ histoDensityND()
template<class Histo >
Construct a "BinnedDensityND" object out of a uniformly binned multivariate histogram. All histogram bins must be non-negative. ◆ histoMean()
template<class Histo >
Calculate the mean of histogrammed data. The length of the "mean" array should be equal to or exceed the dimensionality of the histogram. ◆ histoQuantiles()
template<class Histo >
Calculate quantiles for the given histogram axis assuming that the histogram contents are non-negative. All other variables are marginalized. Overflow bins are ignored. ◆ incompleteBeta()
Regularized incomplete beta function ◆ incompleteGamma()
Incomplete gamma ratio ◆ incompleteGammaC()
Incomplete gamma ratio complement ◆ integralOfSymmetricBetaSquared() [1/2]
Integral of kernel squared for Gaussian (power < 0) and symmetric Beta kernels ◆ integralOfSymmetricBetaSquared() [2/2]
Integral of kernel squared for Gaussian (power < 0) and symmetric Beta kernels within some definite limits ◆ interpolate_cubic()
template<typename T >
Cubic interpolation. Assumes that the function values are given at 0, 1, 2, and 3. ◆ interpolate_linear()
template<typename T >
Linear interpolation. Assumes that the function values are given at 0 and 1. ◆ interpolate_quadratic()
template<typename T >
Quadratic interpolation. Assumes that the function values are given at 0, 1, and 2. ◆ interpolateHistoND() [1/2]
template<typename Float , class Axis >
The interpolation degree in this method can be set to 0, 1, or 3 which results, respectively, in closest bin lookup, multilinear interpolation, or multicubic interpolation. Value of the closest bin inside the histogram range is used if some coordinate is outside of the corresponding axis limits. ◆ interpolateHistoND() [2/2]
template<typename Float , class Axis >
Convenience function for interpolating histograms, with an explicit coordinate argument for each histogram dimension ◆ interpolation_coefficients()
template<typename T >
Get the coefficients of the interpolating polynomial. The interpolated function values are provided at 0, 1, ... The return value of the function is the number of coefficients (i.e., the polynomial degree plus one). On exit, the coefficients are placed into the "buffer" array in the order of increasing monomial degree. The length of the provided buffer must be sufficient to hold all these coefficients. ◆ intPower()
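Returning to interpolation_coefficients(): for the special case of three values given at 0, 1, and 2, the coefficients follow directly from forward differences. The sketch below is a hypothetical illustration of that case only. The name "quadraticCoeffsSketch" and its signature are assumptions, and returning 0 for a short buffer is a choice made for this example rather than documented NPStat behavior.

```cpp
#include <cstddef>

// Hypothetical sketch: coefficients of c0 + c1*x + c2*x^2 passing through
// (0, f0), (1, f1), (2, f2), written in order of increasing monomial degree.
// Returns the number of coefficients written (0 if the buffer is too short).
inline unsigned quadraticCoeffsSketch(const double f0, const double f1,
                                      const double f2, double* buffer,
                                      const std::size_t bufLen)
{
    if (bufLen < 3)
        return 0;
    const double c2 = (f2 - 2.0 * f1 + f0) / 2.0;  // half the 2nd difference
    buffer[0] = f0;                                // value at x = 0
    buffer[1] = (f1 - f0) - c2;                    // 1st difference minus c2
    buffer[2] = c2;
    return 3;
}
```

For the values 1, 6, and 17 (that is, 1 + 2x + 3x^2 sampled at 0, 1, and 2), the buffer receives 1, 2, and 3.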
Precise integer function for the power. Will generate a run-time error if a^n exceeds the largest unsigned long. ◆ inverseExpsqIntegral()
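Returning to intPower(): one way to obtain an exact integer power with an overflow check is to test each multiplication before performing it. The sketch below is hypothetical. The name "intPowerSketch" and the use of std::overflow_error are assumptions, since the description says only that a run-time error is generated.

```cpp
#include <limits>
#include <stdexcept>

// Hypothetical sketch: exact a^n in unsigned long arithmetic.
// Throws if the result would exceed the largest unsigned long.
inline unsigned long intPowerSketch(const unsigned long a, unsigned n)
{
    unsigned long result = 1UL;
    for (; n > 0; --n)
    {
        // result * a overflows exactly when result > max / a
        if (a != 0UL &&
            result > std::numeric_limits<unsigned long>::max() / a)
            throw std::overflow_error("intPowerSketch: result too large");
        result *= a;
    }
    return result;
}
```

The loop takes n multiplications, which keeps the overflow check simple. Exponentiation by squaring would need fewer steps but a slightly more careful check.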
Inverse of the integral Int_0^x exp(t^2) dt ◆ inverseGaussCdf()
Inverse cumulative distribution function for 1-d Gaussian ◆ inverseIncompleteBeta()
Inverse regularized incomplete beta function ◆ inverseIncompleteGamma()
Inverse incomplete gamma ratio ◆ inverseIncompleteGammaC()
Inverse incomplete gamma ratio complement ◆ invert_general_matrix()
template<typename Numeric >
Invert a general matrix (LAPACK DGETRF/DGETRI routines are used for doubles) ◆ invert_posdef_sym_matrix()
template<typename Numeric >
Invert a positive definite symmetric matrix (LAPACK DPOTRF/DPOTRI routines are used for doubles) ◆ invert_sym_matrix()
template<typename Numeric >
Invert a symmetric matrix (LAPACK DSYTRF/DSYTRI routines are used for doubles) ◆ invert_td_matrix()
template<typename Numeric >
Invert a tridiagonal matrix (LAPACK routines DGTSV/SGTSV are used for doubles/floats) ◆ isMonotonous()
template<class Iter >
Check if the sequence of values is either non-increasing or non-decreasing ◆ isNonDecreasing()
template<class Iter >
Check if the sequence of values is not decreasing ◆ isNonIncreasing()
template<class Iter >
Check if the sequence of values is not increasing ◆ isStrictlyDecreasing()
template<class Iter >
Check if the sequence of values is strictly decreasing ◆ isStrictlyIncreasing()
template<class Iter >
Check if the sequence of values is strictly increasing ◆ isStrictlyMonotonous()
template<class Iter >
Check if the sequence of values is strictly increasing or decreasing ◆ isSubShape()
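Returning to the sequence checks above (isNonDecreasing(), isStrictlyIncreasing(), and their counterparts): each amounts to verifying that no adjacent pair violates the ordering. The hypothetical sketch below shows two of them using std::adjacent_find. The function names are inventions of this example, and the result for empty or single-element sequences ("true" here) is an assumption that may differ from NPStat.

```cpp
#include <algorithm>
#include <functional>
#include <iterator>

// Hypothetical sketch: true if no adjacent pair (a, b) has a > b
template <class Iter>
bool nonDecreasingSketch(Iter begin, Iter end)
{
    typedef typename std::iterator_traits<Iter>::value_type V;
    return std::adjacent_find(begin, end, std::greater<V>()) == end;
}

// Hypothetical sketch: true if no adjacent pair (a, b) has a >= b
template <class Iter>
bool strictlyIncreasingSketch(Iter begin, Iter end)
{
    typedef typename std::iterator_traits<Iter>::value_type V;
    return std::adjacent_find(begin, end, std::greater_equal<V>()) == end;
}
```

The sequence 1, 2, 2, 3 passes the first check but fails the second because of the repeated 2. The decreasing variants follow by swapping in std::less and std::less_equal.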
This function returns true if the number of elements is the same in both vectors and every element of the first vector does not exceed corresponding element of the second ◆ KDE1DFunctor()
template<typename Numeric >
A convenience function for making lightweight KDE functors ◆ KDE1DLSCVFunctor()
template<typename Numeric >
A convenience function for creating KDE1DLSCVFunctorHelper objects ◆ KDE1DRLCVFunctor()
template<typename Numeric >
A convenience function for creating KDE1DRLCVFunctorHelper objects ◆ kendallsTauFromCopula()
template<class Array >
Estimate Kendall's tau from an empirical copula. In this function, the array should represent an empirical 2-d copula (constructed, for example, by the calculateEmpiricalCopula function – see header file empiricalCopula.hh). Do not confuse it with the copula density. ◆ kernelSensitivityMatrix() [1/2]
template<class Fcn2D , class OutPoly >
Template class Fcn2D should provide a method "Real operator()(Real x, Real y) const", where "Real" is one of floating point types (long double works best). It is assumed that the interface of class "OutPoly" is similar to that of npstat classes AbsClassicalOrthoPoly1D or ScalableClassicalOrthoPoly1D. In the returned matrix, row numbers correspond to the "output" polynomial degrees (y space) and column numbers to the "input" polynomial degrees (x space). Set "normalizeKernel" parameter to "false" if the kernel is already known to be normalized, that is, Int_ymin^ymax K(x, y) dy = 1 for every x. ◆ kernelSensitivityMatrix() [2/2]
template<class Fcn2D , class InPoly , class OutPoly >
Template class Fcn2D should provide a method "Real operator()(Real x, Real y) const", where "Real" is one of floating point types (long double works best). It is assumed that the interfaces of classes "InPoly" and "OutPoly" are similar to those of npstat classes AbsClassicalOrthoPoly1D or ScalableClassicalOrthoPoly1D. In the returned matrix, row numbers correspond to the "output" polynomial degrees (y space) and column numbers to the "input" polynomial degrees (x space). Set "normalizeKernel" parameter to "false" if the kernel is already known to be normalized, that is, Int_ymin^ymax K(x, y) dy = 1 for every x. ◆ ldBinomialCoefficient()
Binomial coefficient as a long double. Has much larger dynamic range than the version which returns unsigned long. ◆ ldfactorial()
Factorial as a long double. Although imprecise, this has much larger dynamic range than "factorial". ◆ legendreSeriesSum()
template<typename Numeric >
Series using Legendre polynomials. 0th degree coefficient comes first. Although any value of x can be specified, the result is not going to be terribly meaningful in case |x| > 1. ◆ likelihoodStatisticName()
Statistic names corresponding to enums ◆ linear_least_squares()
template<typename Numeric >
Solve an overdetermined linear system in the least squares sense (DGELSD is used for doubles). This function returns "true" on success or "false" on failure. ◆ logfactorial()
Natural log of a factorial (using Stirling's series for large n) ◆ logLikelihoodPeak()
Function which summarizes properties of one-dimensional log-likelihoods. The number of points in the curve must be at least 3. Typical value of "down" is 0.5 for 1 sigma, 2.0 for 2 sigmas, 4.5 for 3 sigmas, etc. leftPointCoordinate and rightPointCoordinate are the leftmost and rightmost coordinates which correspond to the first and the last value of the "curve" array, respectively. It is assumed that all other point coordinates are equidistantly spaced in between. ◆ LOrPE1DFunctor()
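Returning to logLikelihoodPeak(): the typical "down" values quoted there follow from the Gaussian (parabolic) approximation of the log-likelihood near its maximum, assuming that "down" denotes the drop of ln L from its peak value. In that approximation, an interval of n sigmas corresponds to

```latex
\ln L(\theta) = \ln L_{\max} - \frac{(\theta - \hat\theta)^2}{2\sigma^2}
\quad\Longrightarrow\quad
\mathrm{down} = \ln L_{\max} - \ln L(\hat\theta \pm n\sigma) = \frac{n^2}{2}
```

which gives 0.5, 2.0, and 4.5 for n = 1, 2, and 3.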
template<typename Numeric , class Fcn >
A convenience function for making lightweight LOrPE functors ◆ LOrPE1DGlobLSCVFunctor()
template<typename Numeric , class BwFcn >
A convenience function for creating LOrPE1DLSCVFunctorHelper objects that perform global cross-validation (i.e., without CV localization function). ◆ LOrPE1DGlobRLCVFunctor()
template<typename Numeric , class BwFcn >
A convenience function for creating LOrPE1DRLCVFunctorHelper objects that perform global cross-validation (i.e., without CV localization function). ◆ LOrPE1DLSCVFunctor()
template<typename Numeric , class BwFcn , class WFcn >
A convenience function for creating LOrPE1DLSCVFunctorHelper objects ◆ LOrPE1DRLCVFunctor()
template<typename Numeric , class BwFcn , class WFcn >
A convenience function for creating LOrPE1DRLCVFunctorHelper objects ◆ LOrPE1DSimpleLSCVFunctor()
template<typename Numeric >
A convenience function for creating LOrPE1DLSCVFunctorHelper objects that perform global cross-validation (i.e., without CV localization function) and use constant bandwidth. ◆ LOrPE1DSimpleRLCVFunctor()
template<typename Numeric >
A convenience function for creating LOrPE1DRLCVFunctorHelper objects that perform global cross-validation (i.e., without CV localization function) and use constant bandwidth. ◆ lorpeBackground1D()
template<typename Numeric , typename NumIn , typename NumOut >
A driver function for density estimation from histograms using LOrPE in a composite signal plus background model in which signal is represented by a parametric distribution. The function arguments are: histo – Naturally, the histogram to fit. It is assumed that the histogram bins are not scaled and contain the actual unweighted event counts. fbuilder – This object will generate local polynomial filters using densities from the symmetric beta family as weights. This argument is not a LocalPolyFilter1D already so that some memoization is allowed (this might be useful for speeding up the calculation). bm – Boundary handling method. signal – Distribution to use for modeling the signal. Will be internally renormalized so that it integrates to 1 on the histogram support interval. signalFraction – Fraction of the signal in the sample. Must belong to the (-1, 1) interval. nIntegrationPoints – How many points to use in order to integrate the parametric signal density across each bin. If this is specified as 0 then the difference of cumulative densities at the bin edges will be used, otherwise Gauss-Legendre quadrature will be employed (so that the number of points must be either 1 or one of the numbers supported by the "GaussLegendreQuadrature" class). For properly implemented densities, 0 should be the best option. initialApproximation – Initial approximation for the background density (one array element per input histogram bin). Can be specified as NULL in which case the uniform density is used as the initial approximation. lenApproximation – Length of the initial approximation array. Must be equal to the number of histogram bins. m – Choose the kernel from the symmetric beta family proportional to (1 - x^2)^m. If m is negative, Gaussian kernel will be used, truncated at +- 12 sigma. bandwidth – The bandwidth for the kernel used to generate the LOrPE polynomials. maxDegree – Degree of the LOrPE polynomial. 
Interpretation of non-integer arguments is by the "continuousDegreeTaper" function. convergenceEpsilon – We will postulate that the iterations have converged if the L1 distance between the background distributions obtained in two successive iterations is less than this number. Must be non-negative. maxIterations – Maximum number of iterations allowed. signalDensity – Buffer for storing the signal density integrated over histogram bins. This result will be normalized to 1 on the histogram support interval. Can be specified as NULL if not needed. lenSignalDensity – Length of the "signalDensity" buffer. bgDensity – Buffer for storing the background density estimate. This result will be normalized to 1 on the histogram support interval. Can be specified as NULL if not needed. lenBgDensity – Length of the "bgDensity" buffer. workspace – If this function is called many times on the same histogram, it is recommended to reuse the same workspace buffer. This will save a few memory allocation calls for various internal needs. densityMinusOne – If provided, a buffer for storing the results in which the background density is estimated for every bin after removing one event from that bin (assuming that at least one event is present in that bin) or, depending on "cvmode", after removing the whole bin. Intended for subsequent use in cross validation. Note that requesting this calculation will slow the code down considerably. Strictly speaking, the returned numbers are calculated for non-empty bins only, they are not really a density, and they are intended purely for cross validation (there should be no attempt to normalize them). lenDensityMinusOne – Length of the "densityMinusOne" buffer. cvmode – If "densityMinusOne" array is provided, this parameter affects calculation of this density. For binned densities, this calculation can be performed in a number of ways which differ in their treatment of discretization effects. 
Possible modes are: CV_MODE_FAST – Remove the bin and use the same global background approximation to construct the EDF weights for each such bin. CV_MODE_MINUSBIN – Remove the bin and recalculate the background approximation without this bin by iterations. CV_MODE_MINUSONE – Reduce the bin value by 1 and recalculate the background approximation by iterations. CV_MODE_LINEARIZED – Faster, linearized version of CV_MODE_MINUSONE calculation (but might be less reliable). regularizationParameter – If this parameter is non-negative, the code will attempt to figure out the minimum reasonable value of "densityMinusOne" if that density is estimated to be zero at some point which has data present. This minimum density will be inversely proportional to pow(N, regularizationParameter). This feature can be useful in pseudo-likelihood cross validation scenarios. lastDivergence – If provided, *lastDivergence will be filled by the actual divergence between the two most recent iterations used to calculate the background density. The absolute value tells the difference. If the value is negative, this means that there were negative entries that were truncated to zero (in this case one should not expect very good convergence anyway). The function returns the iteration number for which convergence was established. If it is equal to "maxIterations" then the convergence target was not reached. ◆ lorpeBgCVLeastSquares1D()
template<typename Numeric , typename NumOut >
Function that can calculate the least squares cross validation quantity using the output generated by "lorpeBackground1D". This function is using a subset of "lorpeBackground1D" arguments. It is assumed that "lorpeBackground1D" calculations have converged and that "densityMinusOne" was calculated. ◆ lorpeBgCVPseudoLogli1D()
template<typename Numeric , typename NumOut >
Function that can calculate pseudo log-likelihood for cross validation using the output generated by "lorpeBackground1D". This function is using a subset of "lorpeBackground1D" arguments. It is assumed that "lorpeBackground1D" calculations have converged and that "densityMinusOne" was calculated. The "minlog" parameter limits the contribution into the log-likelihood from any single bin. ◆ lorpeBgLogli1D()
template<typename Numeric , typename NumOut >
Function that can calculate log-likelihood (for maximizing likelihood) using the output generated by "lorpeBackground1D" (possibly, after processing by "lorpeRegularizeBgDensity"). This function is using a subset of "lorpeBackground1D" arguments. It is assumed that "lorpeBackground1D" calculations have converged. The "minlog" parameter limits the contribution into the log-likelihood from any single bin. ◆ lorpeMise1D()
Function arguments are as follows: m – Choose the kernel from the symmetric beta family proportional to (1 - x^2)^m. If m is negative, Gaussian kernel will be used, truncated at +- 12 sigma. lorpeDegree – Degree of the LOrPE polynomial. Interpretation of non-integer arguments is by the "continuousDegreeTaper" function (see header file continuousDegreeTaper.hh). bandwidth – Kernel bandwidth. sampleSize – Number of data points in the sample. nintervals – Number of discretization intervals. The CPU time of the algorithm is O(nintervals^3). xmin, xmax – The support of the distribution. Can be arbitrary as long as the distribution is not 0 somewhere on it. distro – The distribution for which the MISE will be determined. bm – Method used to handle LOrPE weight at the boundary. oversample – The number of points to use for calculating the density integral on each discretization interval. Can be 0 (use cdf method for the density), 1 (use density value in the middle of the interval) or any number of integration points supported by the GaussLegendreQuadrature class. ISB – If provided, the location to which this pointer refers will be filled with the integrated squared bias. variance – If provided, the location to which this pointer refers will be filled with the variance component of the MISE. This function returns the estimated LOrPE MISE. ◆ lorpeRegularizeBgDensity1D()
template<typename Numeric , typename NumOut >
Function that can "regularize" the background density estimate generated by "lorpeBackground1D" for subsequent log-likelihood estimation. This procedure may be necessary in case there are data points at the locations in which the density was estimated to be 0 (or just too low). Parameter "minBgDensity1" specifies the minimum background-only density which is expected for any bin which has at least one background entry. The return value of the function tells in how many bins the background density had to be adjusted. ◆ make_ModulatedDistribution1D()
template<class Functor >
Convenience function for creating ModulatedDistribution1D objects ◆ make_NtNtupleFill()
template<typename Ntuple >
Helper utility function for making NtNtupleFill objects ◆ make_Triple()
template<class First , class Second , class Third >
Utility function for triples similar to std::make_pair ◆ makeBuffer()
template<typename T >
Function for allocating memory buffers if their size exceeds the size of the buffer available on the stack ◆ makeShape()
This convenience function will construct an array shape using an explicit list of indices ◆ maxBgPointsInWindow1D()
template<typename Numeric , typename NumIn >
Estimate the maximum number of background points contained inside the sliding window with the width "windowWidth". All other parameters have the same meaning as in the "lorpeBackground1D" function. ◆ maxFilterDegreeSupported()
Maximum filter degree supported by AMISE calculations ◆ meanUncertainty()
Uncertainty of the mean estimate ◆ mergeTwoHistos()
template<class H1 , class H2 , class H3 >
This function merges two histograms using a variable weight which usually changes smoothly from 0 to 1 (or from 1 to 0) along a certain direction in the space spanned by histogram axes. Here, the histograms are basically treated as uniform data grids. The function arguments are as follows: h1, h2 – The histograms to merge. Their axes do not have to be the same. w1 – Functor which calculates the weight for histogram 1 (histogram 2 will be assigned weight equal to 1.0 - w1). result – Points to the result histogram (its bin contents will be modified). Dimensionalities of h1, h2, and the result histogram must be the same. truncateWeight – If true, weights calculated by w1 will be truncated so that they are between 0 and 1. interpolationDegree – This argument will be passed to the "interpolateHistoND" functions which will be employed to determine bin contents of h1 and h2 at the bin locations of the result histo. ◆ mixtureModelCumulants()
The meaning of the "order" argument is the same as in the constructor of the EdgeworthSeries1D class. "N" is the sample size. The default values of the shape parameters play essentially the same role as "None" does in Python. A sufficient number of shape parameters should be provided in order to generate the cumulants up to the order requested. The returned vector of cumulants can be used as the corresponding constructor argument of the EdgeworthSeries1D class. ◆ modifiedGramSchmidt()
template<typename Real , class SprodFunctor >
Numerically stable modified Gram-Schmidt procedure. Builds the orthogonal vectors in-place. All rows will be orthonormal on exit. The input data should represent a collection of input vectors, row-by-row. If this is treated as a matrix, input parameter "nVectors" is the number of rows, and "dim" is the number of columns. Normally, we must have nVectors <= dim. SprodFunctor is a functor which should have a method with a signature that looks something like "Real operator()(const Real*, const Real*, unsigned long) const". The parameter "vector0AlreadyNormalized" should normally be set to "true" if we are creating an orthogonal polynomial system. For OPSs, the first vector should consist of 1s, while the normalization is performed instead on the weight function. ◆ multiFill1DHistoWithCDFWeights()
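To illustrate the modifiedGramSchmidt() description above, here is a minimal standalone sketch of the modified Gram-Schmidt procedure on row-by-row data. It is not the NPStat implementation: the names mgsSketch and EuclideanSprod are hypothetical, the type is fixed to double, and the "vector0AlreadyNormalized" option is left out.

```cpp
#include <cassert>
#include <cmath>

// Plain Euclidean scalar product; a stand-in for the SprodFunctor argument.
struct EuclideanSprod
{
    double operator()(const double* a, const double* b, unsigned long n) const
    {
        double s = 0.0;
        for (unsigned long i = 0; i < n; ++i)
            s += a[i]*b[i];
        return s;
    }
};

// Orthonormalize, in place, nVectors rows of length dim stored row-by-row.
// Returns false if a row is linearly dependent on the rows before it.
template <class Sprod>
bool mgsSketch(double* data, unsigned long nVectors, unsigned long dim,
               const Sprod& sprod)
{
    assert(data && nVectors <= dim);
    for (unsigned long i = 0; i < nVectors; ++i)
    {
        double* vi = data + i*dim;
        // "Modified" variant: each projection uses the already updated vi
        for (unsigned long j = 0; j < i; ++j)
        {
            const double* vj = data + j*dim;
            const double proj = sprod(vi, vj, dim);
            for (unsigned long k = 0; k < dim; ++k)
                vi[k] -= proj*vj[k];
        }
        const double norm2 = sprod(vi, vi, dim);
        if (!(norm2 > 0.0))
            return false;
        const double norm = std::sqrt(norm2);
        for (unsigned long k = 0; k < dim; ++k)
            vi[k] /= norm;
    }
    return true;
}
```

Calling mgsSketch(v, 2, 3, EuclideanSprod()) on a buffer holding two rows of length 3 leaves two orthonormal rows in v.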
template<class Axis , class Collection , class CoordWeightCalc >
Similar function which fills the histogram with a product of weights calculated by "weightCalc" and "coordWeightCalc". "weightCalc" works just like in the previous function. "coordWeightCalc" is a functor which is given two arguments: the collection element and the coordinate of the bin center. For each collection element, each bin of the histogram is filled with the product of the weights returned by "weightCalc" and "coordWeightCalc". The function returns the sum of the weights calculated by "weightCalc" as the first element of the pair and the sum of these weights squared as the second element. ◆ multinomialCovariance1D()
Input arguments are: fcn – the distribution to discretize. sampleSize – the number of points in the sample. nbins – number of discretization intervals. xmin, xmax – discretization region. The input density will be normalized on this region. nIntegrationPoints – determines how many points per bin will be used to calculate the bin average. Can be 0 (use cdf difference at the bin edges), 1 (use density value at the center of the bin), or one of the numbers of points supported by the "GaussLegendreQuadrature" class. The returned matrix will have dimensions nbins x nbins. ◆ MultiplyByConst()
template<typename Result , typename Arg1 , typename Numeric >
Utility function for making MultiplyByConstHelper objects ◆ multiplyTransforms()
Multiply two arrays of FFTW complex numbers ◆ ndUnitSphereArea()
Area of the sphere of unit radius embedded in the n-dimensional space. Should be multiplied by R^(n-1) to get the area of the sphere with arbitrary radius. ◆ ndUnitSphereVolume()
Volume of the n-dimensional sphere of unit radius. Should be multiplied by R^n to get the volume of the sphere with arbitrary radius. ◆ neymanPearsonWindow1D()
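For the ndUnitSphereArea() and ndUnitSphereVolume() entries above, the standard closed-form expressions are 2*pi^(n/2)/Gamma(n/2) for the area and pi^(n/2)/Gamma(n/2 + 1) for the volume. The sketch below evaluates them with std::tgamma. The function names are hypothetical, and this is not necessarily how NPStat computes these quantities.

```cpp
#include <cassert>
#include <cmath>

// Surface area of the unit sphere embedded in n-dimensional space
double unitSphereAreaSketch(unsigned n)
{
    assert(n >= 1);
    const double pi = 3.14159265358979323846;
    return 2.0*std::pow(pi, n/2.0)/std::tgamma(n/2.0);
}

// Volume of the n-dimensional ball of unit radius
double unitSphereVolumeSketch(unsigned n)
{
    assert(n >= 1);
    const double pi = 3.14159265358979323846;
    return std::pow(pi, n/2.0)/std::tgamma(n/2.0 + 1.0);
}

// Scaling to an arbitrary radius R, as described in the text
double sphereAreaSketch(unsigned n, double R)
{
    return unitSphereAreaSketch(n)*std::pow(R, n - 1.0);
}
```

The two quantities are related by area(n) = n*volume(n), which gives a convenient consistency check.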
neymanPearsonWindow1D function assumes that the signal density is unimodal and that the background density does not vary too quickly so that the S/B density ratio crosses the threshold only twice (once on each side of the signal peak). Function input arguments are as follows: signal – the signal distribution. background – the background distribution. searchStartCoordinate – starting point for the search. At this point the signal/background density ratio should be above the threshold. initialStepSize – initial step size for the search (in either direction). threshold – ratio of signal/background densities to search for. Must be positive. On exit, this function fills out the contents of leftBound, leftBoundStatus, rightBound, and rightBoundStatus. Checking the status of the window boundary calculation should be considered essential. The meaning of status values is as follows: OK – Found a good solution. The boundary corresponds to the given signal/background density ratio. SUPPORT_BOUNDARY – Support boundary of either signal or background density was reached before solution could be found. INDETERMINATE – The signal/background density ratio became indeterminate (for example, 0/0) before solution could be found. INVALID – The function was called with invalid input values. For example, the density ratio was below the threshold at the starting point. The function returns 0 in case the input arguments pass all sanity checks and an error code otherwise. For precise meaning of different error codes see comments to the return statements inside neymanPearsonWindow1D.cc file. Before calling this function, it may be useful to call "signalToBgMaximum1D". Then the "searchStartCoordinate" could be set to the position of S/B maximum determined by "signalToBgMaximum1D" while the threshold must be set to something smaller than the S/B density ratio at the maximum. ◆ normalDensityDerivative()
Order n derivative of Gaussian density with mean 0 and sigma 1 ◆ normalizeArrayAsDensity()
template<typename Real >
This function sets all negative elements of the input array to zero and normalizes it so that the sum of the elements times the "binwidth" argument becomes 1. If the input array is nowhere positive, std::runtime_error is thrown. "true" is returned in case any negative array elements are found, otherwise the function returns "false". Upon exit (and if the "normfactor" pointer is not NULL), value of *normfactor is set to the factor by which array elements are multiplied so that they become normalized. ◆ ntupleColumns()
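The behavior described for normalizeArrayAsDensity() above can be sketched as follows. This is an illustration under assumptions, not the NPStat source: the name normalizeAsDensitySketch and the argument order are hypothetical, and the element type is fixed to double.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>

// Clip negative entries to zero and scale so that sum(arr)*binwidth == 1.
// Returns true if any negative entry was found.
bool normalizeAsDensitySketch(double* arr, std::size_t len, double binwidth,
                              double* normfactor = nullptr)
{
    assert(arr && len > 0 && binwidth > 0.0);
    bool hadNegative = false;
    double sum = 0.0;
    for (std::size_t i = 0; i < len; ++i)
    {
        if (arr[i] < 0.0)
        {
            arr[i] = 0.0;
            hadNegative = true;
        }
        sum += arr[i];
    }
    if (!(sum > 0.0))
        throw std::runtime_error("input array is nowhere positive");
    const double factor = 1.0/(sum*binwidth);
    for (std::size_t i = 0; i < len; ++i)
        arr[i] *= factor;
    if (normfactor)
        *normfactor = factor;   // factor by which the elements were multiplied
    return hadNegative;
}
```

For the input {1, -2, 3} with bin width 0.5, the array becomes {0.5, 0, 1.5}, the reported factor is 0.5, and the return value is "true".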
Convenience function for creating vectors of std::string using variable number of arguments (from 1 to 10 here) ◆ opsRootsFromJacobiMatrix()
template<class OPS >
The roots are returned in the increasing order ◆ orderedPermutation()
On output, array "permutation" will be filled by the permutation of numbers from 0 to permLen-1 which corresponds to the given permutation number. The input permutation number must be less than factorial(permLen). ◆ orthoPolyMethodName()
Method names corresponding to enums ◆ PairFunctorRef()
template<typename Numeric >
Convenience function for creating PairFunctorRefHelper instances ◆ parabolicExtremum()
Find an extremum of a parabola passing through the three given points. The returned value is "true" for minimum and "false" for maximum. std::invalid_argument will be thrown in case some of the x values coincide or if the coordinates describe a straight line. ◆ parseEdgeworthSeriesMethod()
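The parabolicExtremum() description above can be illustrated with Newton divided differences. The sketch below is a hypothetical standalone version (the name parabolicExtremumSketch and the argument list are assumptions). It follows the documented conventions: "true" is returned for a minimum, and std::invalid_argument is thrown for coinciding x values or collinear points.

```cpp
#include <cassert>
#include <stdexcept>

// Extremum of the parabola through (x0, y0), (x1, y1), (x2, y2).
// Returns true for a minimum and false for a maximum.
bool parabolicExtremumSketch(double x0, double y0, double x1, double y1,
                             double x2, double y2,
                             double* xExtremum, double* yExtremum)
{
    assert(xExtremum && yExtremum);
    if (x0 == x1 || x1 == x2 || x0 == x2)
        throw std::invalid_argument("some of the x values coincide");
    // Newton form: y = y0 + d01*(x - x0) + a*(x - x0)*(x - x1)
    const double d01 = (y1 - y0)/(x1 - x0);
    const double d12 = (y2 - y1)/(x2 - x1);
    const double a = (d12 - d01)/(x2 - x0);
    if (a == 0.0)
        throw std::invalid_argument("the points lie on a straight line");
    // dy/dx = d01 + a*(2*x - x0 - x1) = 0 at the extremum
    const double x = 0.5*(x0 + x1 - d01/a);
    *xExtremum = x;
    *yExtremum = y0 + d01*(x - x0) + a*(x - x0)*(x - x1);
    return a > 0.0;
}
```

For the points (0, 3), (1, 2), (3, 6), which lie on y = (x - 1)^2 + 2, the sketch reports a minimum at x = 1 with y = 2.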
Enums corresponding to method names ◆ parseEigenMethod()
Enums corresponding to method names ◆ parseLikelihoodStatisticType()
Enums corresponding to simplified statistic names ◆ parseOrthoPolyMethod()
Enums corresponding to method names ◆ parseSvdMethod()
Enums corresponding to method names ◆ permutationNumber()
A mapping from a permuted set into a linear sequence. factorial(permLen) should be less than the largest unsigned long. ◆ poissonLogLikelihood()
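One common way to build the two mappings described for orderedPermutation() and permutationNumber() above is the factorial number system (Lehmer code). The sketch below assumes lexicographic ordering of permutations. The ordering actually used by NPStat is not stated in this reference, so both the convention and the function names here are assumptions.

```cpp
#include <cassert>
#include <vector>

// Fill perm[0..n-1] with the permutation of 0..n-1 whose lexicographic
// rank is "number". Requires number < n!.
void permutationFromNumberSketch(unsigned long number, unsigned* perm, unsigned n)
{
    assert(perm && n > 0);
    std::vector<unsigned> pool(n);
    for (unsigned i = 0; i < n; ++i)
        pool[i] = i;
    unsigned long fact = 1;                 // becomes (n-1)!
    for (unsigned i = 2; i < n; ++i)
        fact *= i;
    assert(number/fact < n);                // same as number < n!
    for (unsigned i = 0; i < n; ++i)
    {
        const unsigned long idx = number/fact;   // digit in factorial base
        number %= fact;
        perm[i] = pool[idx];
        pool.erase(pool.begin() + idx);
        if (i + 1 < n)
            fact /= (n - 1 - i);
    }
}

// Inverse mapping: lexicographic rank of a permutation of 0..n-1
unsigned long numberFromPermutationSketch(const unsigned* perm, unsigned n)
{
    assert(perm);
    unsigned long number = 0;
    for (unsigned i = 0; i < n; ++i)
    {
        unsigned smaller = 0;               // later elements smaller than perm[i]
        for (unsigned j = i + 1; j < n; ++j)
            if (perm[j] < perm[i])
                ++smaller;
        number = number*(n - i) + smaller;
    }
    return number;
}
```

With this convention, number 3 for permLen = 3 corresponds to the permutation (1, 2, 0), and the two functions are inverses of each other.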
template<typename Real , typename Numeric >
This function assumes that Poisson distribution parameters are given in the "means" array while the array "counts" contains corresponding observations. "len" is the length of both "means" and "counts" arrays. ◆ poissonProcessCumulants()
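As an illustration of the poissonLogLikelihood() entry above, the log-likelihood of independent Poisson observations is the sum over i of counts[i]*log(means[i]) - means[i] - log(counts[i]!). The sketch below evaluates this sum directly. Whether the NPStat function includes the factorial term, and how it treats zero means, is not stated here, so those details (and the name poissonLogLikelihoodSketch) are assumptions.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Sum of Poisson log-probabilities for "len" independent observations
double poissonLogLikelihoodSketch(const double* means, const unsigned* counts,
                                  std::size_t len)
{
    assert(means && counts);
    double logli = 0.0;
    for (std::size_t i = 0; i < len; ++i)
    {
        const double mu = means[i];
        const double n = counts[i];
        assert(mu > 0.0);   // this sketch does not handle zero means
        // lgamma(n + 1) equals log(n!)
        logli += n*std::log(mu) - mu - std::lgamma(n + 1.0);
    }
    return logli;
}
```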
The meaning of the "order" argument is the same as in the constructor of the EdgeworthSeries1D class. "mu" is the background rate. The default values of the shape parameters play essentially the same role as "None" does in Python. A sufficient number of shape parameters should be provided in order to generate the cumulants up to the order requested. The returned vector of cumulants can be used as the corresponding constructor argument of the EdgeworthSeries1D class. ◆ polyAndDeriv()
template<typename Numeric >
Sum and derivative of polynomial series. The length of the array of coefficients should be at least degree+1. ◆ polyIntegralCoeffs()
template<typename Numeric >
Coefficients for the integral of polynomial series. The integration constant is set to 0. The length of the array of coefficients should be at least degree+1, and the length of the buffer for the integral coefficients should be at least degree+2. ◆ polySeriesSum()
template<typename Numeric >
Sum of polynomial series. The length of the array of coefficients should be at least degree+1. The highest degree coefficient is assumed to be the last one in the "coeffs" array (0th degree coefficient comes first). ◆ pooledDiscreteTabulated1D()
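The coefficient layout described for polySeriesSum() and polyAndDeriv() above (0th degree coefficient first, degree+1 coefficients in total) fits Horner's scheme naturally. The following sketch uses hypothetical names and fixes the type to double. It is meant as an illustration rather than the library code.

```cpp
#include <cassert>

// Value of coeffs[0] + coeffs[1]*x + ... + coeffs[degree]*x^degree
double polySumSketch(const double* coeffs, unsigned degree, double x)
{
    assert(coeffs);
    double sum = coeffs[degree];
    for (unsigned k = degree; k > 0; --k)
        sum = sum*x + coeffs[k - 1];
    return sum;
}

// Value and first derivative of the same series in a single pass
void polyAndDerivSketch(const double* coeffs, unsigned degree, double x,
                        double* value, double* derivative)
{
    assert(coeffs && value && derivative);
    double sum = coeffs[degree];
    double deriv = 0.0;
    for (unsigned k = degree; k > 0; --k)
    {
        deriv = deriv*x + sum;      // derivative recurrence uses the old sum
        sum = sum*x + coeffs[k - 1];
    }
    *value = sum;
    *derivative = deriv;
}
```

For coefficients {1, 2, 3}, that is 1 + 2x + 3x^2, the value at x = 2 is 17 and the derivative is 14.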
Function which pools together two discrete tabulated distributions. Arguments "sampleSize1" and "sampleSize2" define the proportions with which the input distributions are combined. Arguments "first" and "oneAfterLast" define the support of the combined distribution. "First" is the first argument for which the combined density can be positive (whether it will actually be positive also depends on the supports of the combined distributions). "OneAfterLast" will be larger by one than the last value of the support. ◆ productResponseMatrix()
template<class Triplet >
Function arguments are as follows: unfoldedBox – boundaries for the "unfolded" (that is, physical) space. unfoldedShape – number of subdivisions for each dimension of the unfolded space. Uniform binning is used. observedBox – boundaries for the "observed" space. observedShape – number of subdivisions for each dimension of the observed space. Uniform binning is used. distro – distribution to use in order to build the response matrix. The method "isScalable" of this distribution must return "true". Locations and scales of "distro" components will be modified while this function runs but will be restored to their initial values upon exit. shifts – these functors should calculate shifts in the physical space, as a function of point location in that space. If the function pointer is NULL, the shift is set to 0. In order to calculate the response in the observed space, locations of the marginals will be set to the coordinate of the physical space point plus this shift. widthFactors – these functors should calculate width factors used to multiply the original scales, as a function of point location in the physical space. If the function pointer is NULL, the factor is set to 1. It is expected that some or all of the "distro" marginals will have finite support, otherwise the returned collection of triplets will not be sparse. The first index of each triplet will correspond to the cell number in the observed space, and the second index will correspond to the cell number in the physical space. ◆ quantileBinFromCdf()
template<typename Data >
Find the bin number corresponding to the given cdf value in an array which represents a cumulative distribution function (the numbers in the array must increase). It is expected that the "cdfValue" input is between cdf[0] and cdf[arrLen-1]. ◆ quantileDeltaUncertainty()
Uncertainty of a difference between two quantile values (asymptotic) ◆ quantileUncertainty()
Uncertainty of a quantile value (asymptotic) ◆ randomPermutation()
On output, array "permutation" will be filled by a random permutation of numbers from 0 to permLen-1. "permLen" can be as large as the largest unsigned integer (but you will probably run into memory limitations of your computer first). ◆ randomTukeyDepth1()
template<typename Real >
Random Tukey depth calculated within the sample. Function arguments are as follows: sample – should be dimensioned n x d, where d is the point dimensionality and n is the sample size. rng – random number generator in 1 or d dimensions. covmat – should be dimensioned d x d. The random directions will be generated by a multivariate normal distribution with this covariance matrix. nRandom – the number of random directions to generate. In addition to random directions, the ordering will also be performed according to the marginals. depth – this array will be filled with depth values on exit (in the same order of points as in the "sample" argument). nDepth – the length of the "depth" array. Should be at least as large as the sample size. usePointDirections – if this argument is set to "true", directions from the center of the cloud to each sample point will also be utilized in the depth calculation. The CPU time used by this function will scale as O(nRandom*n*log(n)) if "usePointDirections" is false. If "usePointDirections" is true, another step will be added that scales as O(n^2*log(n)). ◆ randomTukeyDepth2()
template<typename Real1 , typename Real2 >
Random Tukey depth for each point in a sample calculated using another (reference) sample. Function arguments are as follows: sample – should be dimensioned n x d, where d is the point dimensionality and n is the sample size. refSample – the depth of each point in the "sample" argument will be defined with respect to this "reference" sample. The size of this sample should normally be substantially larger than n. rng – random number generator in 1 or d dimensions. covmat – should be dimensioned d x d. The random directions will be generated by a multivariate normal distribution with this covariance matrix. nRandom – the number of random directions to generate. In addition to random directions, the ordering will also be performed according to the marginals. depth – this array will be filled with depth values on exit (in the same order of points as in the "sample" argument). nDepth – the length of the "depth" array. Should be at least as large as the "sample" size. usePointDirections – if this argument is set to "true", directions from the center of the "refSample" cloud to each point of the "sample" will also be utilized in the depth calculation. Assuming that the size of the reference sample is m, the CPU time used by this function will scale as O(nRandom*(m+n)*log(m)) if "usePointDirections" is false. If "usePointDirections" is true, another step will be added that scales as O(n*(m+n)*log(m)). ◆ rectangleIntegralCenterAndSize()
The "integrationPoints" parameter must be one of the numbers of points supported by the GaussLegendreQuadrature class. See the corresponding header file for the list of allowed values. The "rectangleCenter" and "rectangleSize" arrays should each have at least "dim" elements. ◆ resampleWithReplacement()
template<typename T >
On output, vector "to" will be filled by a random sample of elements of the vector "from", chosen with replacement, according to the sample size requested. Note that, due to the simplicity of the algorithm and for very large "from" vectors, discreteness of random doubles on [0, 1) can affect resampling uniformity. ◆ rescanArray()
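The remark about discreteness in the resampleWithReplacement() description above becomes clearer with a sketch of the simple algorithm it alludes to: a random double on [0, 1) is scaled by the size of "from" and truncated to an index. The code below is an assumption-laden illustration. It uses std::mt19937_64 as a stand-in for the NPStat random number generator, and the function name and argument order are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Fill "to" with sampleSize elements drawn from "from" with replacement
template <typename T>
void resampleSketch(const std::vector<T>& from, std::size_t sampleSize,
                    std::mt19937_64& rng, std::vector<T>* to)
{
    assert(to && !from.empty());
    std::uniform_real_distribution<double> uniform(0.0, 1.0);
    const std::size_t n = from.size();
    to->clear();
    to->reserve(sampleSize);
    for (std::size_t i = 0; i < sampleSize; ++i)
    {
        // A double has a finite number of distinct values on [0, 1), so
        // for very large n some indices are reached slightly more often
        // than others. The std::min guards against rounding up to n.
        const std::size_t idx =
            std::min(static_cast<std::size_t>(uniform(rng)*n), n - 1);
        to->push_back(from[idx]);
    }
}
```

When uniformity matters for very large inputs, drawing the index with std::uniform_int_distribution avoids the issue. That is a design alternative, not something this reference says NPStat does.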
template<typename Num1 , unsigned Len1, unsigned Dim1, typename Num2 , unsigned Len2, unsigned Dim2>
A utility for filling one array using values of another. The array shapes do not have to be the same but the ranks have to be. Roughly, the arrays are treated as values of histogram bins inside the unit box. The array "to" is filled either with the closest bin value of the array "from" or with an interpolated value (if "interpolationDegree" parameter is not 0). interpolationDegree parameter must be one of 0, 1, or 3. ◆ sampleDistro1DWithWeight()
template<class AcceptanceFunction >
AcceptanceFunction is a functor which takes a double as an argument and returns a double on the [0, 1] interval. The elements of the returned triple are: first – sum of weights. second – Kish's effective sample size of the generated sample. third – the efficiency of the generator (the ratio between the number of points generated and the number of points attempted). In each generated point (the vector "sample" is filled with them), the first element of the pair is the point coordinate and the second element is the point weight. ◆ sampleKendallsTau()
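Kish's effective sample size, mentioned in the sampleDistro1DWithWeight() entry above, is defined as the squared sum of the weights divided by the sum of the squared weights. A minimal sketch follows (the function name is hypothetical):

```cpp
#include <cassert>
#include <vector>

// Kish's effective sample size: (sum of w)^2 / (sum of w^2)
double kishEffectiveSizeSketch(const std::vector<double>& weights)
{
    double sum = 0.0, sumsq = 0.0;
    for (double w : weights)
    {
        assert(w >= 0.0);
        sum += w;
        sumsq += w*w;
    }
    assert(sumsq > 0.0);   // at least one weight must be positive
    return sum*sum/sumsq;
}
```

For equal weights the result is the number of points. It decreases toward 1 as a single weight starts to dominate.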
template<class Point >
Estimate Kendall's tau from a sample of multivariate points. Class Point should be subscriptable. ◆ sampleSpearmansRho()
template<class Point >
Estimate Spearman's rho from a sample of multivariate points. Class Point should be subscriptable. ◆ scalesFromHessian()
Make a guess about typical scales in various directions by looking at the Hessian diagonal elements. The Hessian matrix must be square, and the size of the "scales" array must be at least as large as the number of hessian rows. The diagonal elements of the Hessian are expected to be mostly positive. If they are negative, the "negativeScaleLimit" limit (on the maximum scale) will be imposed on them. ◆ scaleTransform()
Multiply by a scalar. "l" and "result" can be the same. ◆ scanMultivariateDensityAsWeight()
Utility function for scanning multivariate kernels and returning ArrayND of minimal size which encloses the complete density support region. Arguments are as follows: kernel – the density to scan. maxDim – maximum number of steps to make in each dimension. All values must be odd. bandwidthSet – density function bandwidth values in each dimension. stepSize – scan step size in each dimension. arrayLength – number of elements in each of the arrays maxDim, bandwidthSet, and stepSize. ◆ scannedKSDistance()
The x values at which the cumulative distribution functions are compared are determined by splitting the [0, 1] interval into "nScanPoints" subintervals (bins) and then calculating the quantile function of the reference distribution at the center of each subinterval. ◆ scanSymmetricDensityAsWeight()
Utility function for scanning multivariate kernels and returning ArrayND of minimal size which encloses the complete density support region. It is assumed that the density is symmetric under all possible mirror reflections (i.e., whenever the sign of any "x" component changes, the density does not change). Arguments are as follows: kernel – the density to scan. Assumed to be even in any coordinate. maxOctantDim – maximum number of steps to make in each dimension while scanning the hyperoctant. bandwidthSet – density function bandwidth values in each dimension. stepSize – scan step size in each dimension. arrayLength – number of elements in each of the arrays maxOctantDim, bandwidthSet, and stepSize. fillOneOctantOnly – set "true" to scan one octant only (including central grid points), "false" to scan the complete support region of the density. In the latter case, array size in the n-th dimension can become as large as 2*maxOctantDim[n] - 1. ◆ signalToBgMaximum1D()
This function attempts to locate the position of the maximum of S/B density ratio. It returns 0 in case the input arguments pass all sanity checks and an error code otherwise. Note that, if a point is found for which background density is 0 and signal density is not, this point will be returned as a result. In this case *maximumSignalToBgRatio will be set to DBL_MAX. ◆ simpleColumnNames()
Generate column names "c0", "c1", ..., "cM", where M = ncols - 1 ◆ simpleEmpiricalCdf()
template<typename Data >
Returns the number of points in the data with values below or equal to x divided by the data size. The data vector must be sorted. ◆ simpleVariableBandwidthSmooth1D()
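For sorted data, the quantity described in the simpleEmpiricalCdf() entry above can be obtained with a binary search. The sketch below uses std::upper_bound. The function name is hypothetical and the data type is fixed to double.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Fraction of points with values <= x. The input must be sorted
// in increasing order.
double empiricalCdfSketch(const std::vector<double>& sorted, double x)
{
    assert(!sorted.empty());
    // upper_bound returns the first element strictly greater than x
    const std::vector<double>::const_iterator it =
        std::upper_bound(sorted.begin(), sorted.end(), x);
    return static_cast<double>(it - sorted.begin())/sorted.size();
}
```

With the data {1, 2, 2, 5}, the sketch returns 0.75 at x = 2 and 0 for any x below 1.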
template<typename Numeric , typename NumOut >
High-level driver routine for "variableBandwidthSmooth1D". It is assumed that one of the symmetric beta family kernels (including the Gaussian for which "symbetaPower" parameter can be set to any negative number) is used to build both the pilot and the final density estimates. The "alpha" parameter is set to 0.5 and the bandwidth is not increased at the boundary. The pilot estimate is generated using the AMISE plugin bandwidth multiplied by the "bandwidthFactor". It is assumed that the correct sample size can be obtained by summing the histogram bin contents, so the histogram should not be scaled. The function returns the pilot bandwidth used. ◆ sineTransformMatrix()
nTerms is the number of sine terms in the transform. The basis functions are sqrt(2)*sin(k*Pi*x), k = 1, 2, ..., nTerms. nDiscrete is the number of points to use for discretizing the [0, 1] interval. Should normally be substantially larger than "nTerms". The returned matrix will have nTerms rows and nDiscrete columns. ◆ solve_linear_system()
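The layout described for sineTransformMatrix() above can be sketched with a plain row-major std::vector in place of the NPStat matrix class. The choice of discretization points is not stated in this reference. The sketch assumes cell centers, x_i = (i + 0.5)/nDiscrete, and all names in it are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Row-major matrix with nTerms rows and nDiscrete columns. Row k-1 holds
// sqrt(2)*sin(k*pi*x_i) evaluated at the assumed cell centers x_i.
std::vector<double> sineMatrixSketch(unsigned nTerms, unsigned nDiscrete)
{
    assert(nTerms > 0 && nDiscrete > 0);
    const double pi = 3.14159265358979323846;
    std::vector<double> m(static_cast<std::size_t>(nTerms)*nDiscrete);
    for (unsigned k = 1; k <= nTerms; ++k)
        for (unsigned i = 0; i < nDiscrete; ++i)
        {
            const double x = (i + 0.5)/nDiscrete;
            m[static_cast<std::size_t>(k - 1)*nDiscrete + i] =
                std::sqrt(2.0)*std::sin(k*pi*x);
        }
    return m;
}

// Discrete approximation of the integral over [0, 1] of the product of
// the basis functions stored in rows r1 and r2
double rowProductSketch(const std::vector<double>& m, unsigned nDiscrete,
                        unsigned r1, unsigned r2)
{
    double s = 0.0;
    for (unsigned i = 0; i < nDiscrete; ++i)
        s += m[static_cast<std::size_t>(r1)*nDiscrete + i]*
             m[static_cast<std::size_t>(r2)*nDiscrete + i];
    return s/nDiscrete;
}
```

With cell-center points and nTerms smaller than nDiscrete, the rows are orthonormal under this discrete product, which mirrors the orthonormality of the basis functions on [0, 1].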
template<typename Numeric >
Solve a linear system (LAPACK DGETRF/DGETRS routines are used for doubles). This function returns "true" on success or "false" in case the matrix is degenerate. ◆ solve_linear_systems()
template<typename Numeric >
Solve multiple linear systems (LAPACK DGETRF/DGETRS routines are used for doubles). This function returns "true" on success or "false" in case the matrix is degenerate. ◆ solveCubic()
Find the real roots of the cubic x**3 + p*x**2 + q*x + r = 0. The number of real roots is returned, and the roots are placed into the "v3" array. Original code by Don Herbison-Evans (see his article "Solving Quartics and Cubics for Graphics" in the book "Graphics Gems V", page 3), with minimal adaptation for this package by igv. ◆ solveQuadratic()
Solve the quadratic equation x*x + b*x + c == 0 in a numerically sound manner. Return the number of roots. ◆ spearmansRhoFromCopula()
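"Numerically sound" in the solveQuadratic() entry above usually means avoiding the cancellation that occurs when -b and the square root of the discriminant nearly cancel. A common remedy is sketched below: compute the larger-magnitude root first and obtain the other one from the product of roots, which equals c. The name, the output convention, and the treatment of a double root as a single root are assumptions, not the documented NPStat behavior.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Real roots of x*x + b*x + c = 0, stored in increasing order.
// Returns the number of distinct real roots (0, 1, or 2).
unsigned solveQuadraticSketch(double b, double c, double roots[2])
{
    assert(roots);
    const double disc = b*b - 4.0*c;
    if (disc < 0.0)
        return 0;
    if (disc == 0.0)
    {
        roots[0] = roots[1] = -0.5*b;
        return 1;
    }
    // b and copysign(sqrt(disc), b) have the same sign: no cancellation
    const double q = -0.5*(b + std::copysign(std::sqrt(disc), b));
    roots[0] = q;       // root with the larger magnitude
    roots[1] = c/q;     // the product of the two roots equals c
    if (roots[0] > roots[1])
        std::swap(roots[0], roots[1]);
    return 2;
}
```

For b = -1e8 and c = 1, the textbook formula loses most digits of the small root near 1e-8, while this form keeps them.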
template<class Array >
Estimate Spearman's rho from an empirical copula. In this function, the array should represent an empirical 2-d copula (constructed, for example, by the calculateEmpiricalCopula function – see header file empiricalCopula.hh). Do not confuse it with the copula density. ◆ spearmansRhoFromCopulaDensity()
template<class Array >
Estimate Spearman's rho from an empirical copula density. In this function, the array should represent an empirical 2-d copula density (constructed, for example, by the empiricalCopulaDensity function – see header file empiricalCopula.hh). Do not confuse it with the copula. ◆ squaredDerivativeIntegral()
template<typename Real >
This function returns the mathematical functional R(d^n f(x)/d x^n), where function f(x) is given by its tabulated values on a grid with constant distance h between points (it is assumed that each value is given in the middle of a cell, like in a histogram). The functional R(y(x)) is, by definition, the integral of y(x) squared. d^n f(x)/d x^n is the derivative of order n. Note that the table of function values is NOT preserved. ◆ svdMethodName()
Method names corresponding to enums ◆ sym_matrix_eigensystem()
template<typename Numeric >
Determine eigenvalues and eigenvectors of a symmetric matrix (LAPACK DSYEV routine is used for doubles) ◆ sym_matrix_eigensystem_dc()
template<typename Numeric >
Determine eigenvalues and eigenvectors of a symmetric matrix using the divide and conquer LAPACK driver (DSYEVD routine is used for doubles) ◆ sym_matrix_eigensystem_rrr()
template<typename Numeric >
Determine eigenvalues and eigenvectors of a symmetric matrix using the "Relatively Robust Representations" (RRR) LAPACK driver (DSYEVR routine is used for doubles) ◆ sym_matrix_eigenvalues()
template<typename Numeric >
Determine eigenvalues of a symmetric matrix (LAPACK DSYEV routine is used for doubles) ◆ sym_matrix_eigenvalues_dc()
template<typename Numeric >
Determine eigenvalues of a symmetric matrix using the divide and conquer LAPACK driver (DSYEVD routine is used for doubles) ◆ sym_matrix_eigenvalues_rrr()
template<typename Numeric >
Determine eigenvalues of a symmetric matrix using the "Relatively Robust Representations" (RRR) LAPACK driver (DSYEVR routine is used for doubles) ◆ symbetaBandwidthRatio()
Ratio of the symmetric beta kernel AMISE bandwidth to the Gaussian kernel AMISE bandwidth (as in the concept of "canonical bandwidth"). 1.0 will be returned in case the "power" argument is negative. ◆ symbetaLOrPEFilter1D()
The utility function that generates the most common filters using kernels from the symmetric beta family. The arguments are as follows: m – Choose the kernel from the symmetric beta family proportional to (1 - x^2)^m. If m is negative, Gaussian kernel will be used, truncated at +- 12 sigma. bandwidth – Kernel bandwidth. maxDegree – Degree of the LOrPE polynomial. Interpretation of non-integer arguments is by the "continuousDegreeTaper" function. numberOfGridPoints – Length of the data array to be used with this filter (typically, number of histogram bins and such). gridCellSize – Cell size or histogram bin width. boundaryMethod – Method for handling the weight function at the boundary of the density support region. exclusionMask – If provided, array with numberOfGridPoints elements. If an element of this array is not 0, corresponding data point will be excluded from the filtering process. excludeCentralPoint – If "true", the weight will be set to 0 for the central bin of the filter. This can be useful in some cross validation scenarios. ◆ symbetaMultiFilter1D()
The utility function that generates the most common filters using kernels from the symmetric beta family. The arguments are as follows: m – Choose the kernel from the symmetric beta family proportional to (1 - x^2)^m. If m is negative, Gaussian kernel will be used, truncated at +- 12 sigma. bandwidth – Kernel bandwidth. maxDegree – Maximum degree of the LOrPE polynomial. numberOfGridPoints – Length of the data array to be used with this filter (typically, number of histogram bins and such). xmin, xmax – Data grid limits (as in a histogram). boundaryMethod – Method for handling the weight function at the boundary of the density support region. exclusionMask – If provided, array with numberOfGridPoints elements. If an element of this array is not 0, corresponding data point will be excluded from the filtering process. excludeCentralPoint – If "true", the weight will be set to 0 for the central bin of the filter. This can be useful in some cross validation scenarios. ◆ td_sym_matrix_eigenvalues_rrr()
template<typename Numeric >
Determine eigenvalues of a symmetric tridiagonal matrix using the "Relatively Robust Representations" (RRR) LAPACK driver (DSTEVR routine is used for doubles) ◆ transposeBuffer() [1/2]
template<typename T1 , typename T2 >
Copy a buffer (with possible type conversion on the fly) transposing it in the process (treating as a square matrix) ◆ transposeBuffer() [2/2]
template<typename T1 , typename T2 >
Copy a buffer (with possible type conversion on the fly) transposing it in the process (treating as an M x N matrix) ◆ truncatedInverseSqrt()
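The M x N variant of transposeBuffer() described above amounts to an index swap with a type conversion on each element. The sketch below assumes row-major storage and a (destination, source, M, N) argument order. Both are assumptions, as is the name transposeSketch.

```cpp
#include <cassert>
#include <cstddef>

// Copy the M x N row-major matrix "source" into "dest" as its N x M
// transpose, converting each element from T2 to T1 on the fly
template <typename T1, typename T2>
void transposeSketch(T1* dest, const T2* source, std::size_t M, std::size_t N)
{
    assert(dest && source);
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < N; ++j)
            dest[j*M + i] = static_cast<T1>(source[i*N + j]);
}
```

The source and destination buffers must not overlap in this sketch.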
The following function constructs an inverse square root of a symmetric positive semidefinite matrix (given by the "covmat" argument) by keeping only a certain number of eigenvectors corresponding to the largest eigenvalues. The number of eigenvectors to keep is given by the "nEigenvectorsToKeep" argument. The "result" matrix will have each row set to a kept eigenvector multiplied by the inverse square root of the corresponding eigenvalue. If the eigenvalue to keep is 0 or negative, it will be converted into the product of the largest eigenvalue times "eigenvaluePrecision". "eigenvaluePrecision" argument must not be negative. The function returns the ratio of the sum of the rejected eigenvalues of the "covmat" matrix to the total sum of its eigenvalues. In this calculation, all negative eigenvalues are assumed to be due to round-off, so they are converted to 0. std::invalid_argument will be thrown in case something is wrong with the arguments (e.g., the input matrix is not square) or if the largest eigenvalue is not positive. Intended for use in linear least squares problems with degenerate covariance matrices when the "proper" number of degrees of freedom is known in advance. If desired, the function can return the number of eigenvalues affected by the "eigenvaluePrecision" cutoff. ◆ unbinnedLogLikelihood1D()
template<typename Numeric >
Simple unbinned log-likelihood for a 1-d sample of points ◆ unitMatrixDeviations()
template<typename Numeric , unsigned Len, typename T >
The following function returns the size of the maximum absolute deviation as a function of leading principal submatrix size. result[0] will contain the deviation for 1 x 1 submatrix, result[1] will contain the deviation for 2 x 2 submatrix, etc. ◆ validEdgeworthSeriesMethodNames()
All valid method names for use in error messages, etc ◆ validEigenMethodNames()
All valid method names for use in error messages, etc ◆ validLikelihoodStatisticNames()
All valid statistic names for use in error messages, etc ◆ validOrthoPolyMethodNames()
All valid method names for use in error messages, etc ◆ validSvdMethodNames()
Valid method names for use in error messages, etc ◆ variableBandwidthSmooth1D()
template<typename Numeric , typename Num2 , typename NumOut >
This function performs kernel density estimation with variable bandwidth which changes for each bin of the data histogram in the inverse proportion to the pilot density estimate at that bin to some power given by the parameter "alpha" (it seems 0.5 works well as "alpha" argument in many contexts). The pilot estimate must be positive for any bin that has any data in it. This pilot can be created, for example, using fixed bandwidth polynomial filter of degree 0 (which is equivalent to KDE with boundary kernels). It is assumed that the kernel has its "center" at 0. It should normally be symmetric around 0. Boundary kernel adjustment is performed automatically. The variable bandwidth values will be adjusted in such a way that their geometric mean will be equal to the argument "bandwidth". The "increaseBandwidthAtBoundary" argument allows the user to use wider kernels at the boundaries of density support region (to compensate for kernel leakage outside the support region). This adjustment comes after the geometric mean normalization, so that geometric mean normalization will not hold strictly in this case. Number of bins in the input histogram, length of the pilot density estimate array, and length of the result array must all be the same. ◆ varianceUncertainty()
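The bandwidth rule in the variableBandwidthSmooth1D() description above (bandwidth inversely proportional to the pilot density raised to the power "alpha", rescaled so that the geometric mean equals "bandwidth") can be sketched in a few lines. This illustration simplifies matters by requiring a positive pilot in every bin and by giving each bin equal weight in the geometric mean. How NPStat treats empty bins or weights the mean is not stated here, and the function name is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Per-bin bandwidths h_i proportional to pilot_i^(-alpha), normalized so
// that their geometric mean equals "bandwidth"
std::vector<double> variableBandwidthsSketch(const std::vector<double>& pilot,
                                             double bandwidth, double alpha)
{
    assert(!pilot.empty() && bandwidth > 0.0);
    double logSum = 0.0;
    for (std::size_t i = 0; i < pilot.size(); ++i)
    {
        assert(pilot[i] > 0.0);   // simplification made by this sketch
        logSum += std::log(pilot[i]);
    }
    const double geomMean = std::exp(logSum/pilot.size());
    std::vector<double> h(pilot.size());
    for (std::size_t i = 0; i < pilot.size(); ++i)
        h[i] = bandwidth*std::pow(pilot[i]/geomMean, -alpha);
    return h;
}
```

Bins where the pilot density is low receive wider kernels. With alpha = 0.5, a bin whose pilot density is four times lower gets twice the bandwidth.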
Uncertainty of the variance (not standard deviation!) estimate. Parameters "sigma" and "kurtosis" are population sigma and kurtosis, not sample estimates. ◆ volumeDensityFromBinnedRadial()
The function arguments are as follows:
dim – Dimensionality of the space.
binWidth – The width of the bins originally used to construct the density estimate. Typically, the value of r would be histogrammed and then smoothed. This argument must be non-negative.
r – The distance from the origin. This argument must be non-negative. In addition, "binWidth" and "r" cannot both be 0.
radialDensity – The value of the density at r, i.e., (d Prob)/(d r). If the default value of 1.0 is used, the function returns the density conversion factor.
◆ weightedCopulaHisto()
template<class Point , class Histo >
Function for building a copula density out of sets of points by ordering the points and remembering the order in each dimension. The histogram is filled with point weights. All histogram axes should normally have their minimum at 0.0 and maximum at 1.0. This function assumes (but does not check) that the argument histogram is defined on the unit multivariate cube. The histogram will be reset before it is filled from the provided data. The histogram contents will not be normalized after the histogram is filled. If you want a normalized result, call the "convertHistoToDensity" function (declared in the HistoND header). Note that ties are not resolved by this function (i.e., their mutual order is arbitrary).
The function arguments are as follows:
data – Input data. The pointers (the first element of each pair) are assumed to point to objects which have a subscripting operator. The second element of the pair is the weight.
dimsToUse – Point dimensions to use for building the copula. This array should have at least "nDimsToUse" elements.
nDimsToUse – Number of copula dimensions.
result – The histogram to fill.
useFillC – Specifies whether the "fillC" histogram method should be used instead of the "fill" method.
The function returns Kish's effective sample size.
◆ weightedCopulaHisto_2()
template<class Point , class Histo >
A version of "weightedCopulaHisto" with a slightly different interface. Here, point weights are given in a separate vector. It is assumed that the element weights[i] corresponds to the point data[i]. ◆ weightedLorpeSmooth1D()
template<typename Numeric , typename NumOut >
Simple driver function for density estimation from histograms using LOrPE. It is assumed that the histogram fill counts correspond to the actual number of points in the sample.
The function arguments are:
m – Choose the kernel from the symmetric beta family proportional to (1 - x^2)^m. If m is negative, a Gaussian kernel will be used, truncated at ±12 sigma.
maxDegree – Degree of the LOrPE polynomial. Non-integer values are interpreted by the "continuousDegreeTaper" function.
result – Buffer for storing the results. This can coincide with the input histogram bin contents (so that they will be changed in place; this makes sense only if the histogram contents are real).
lenResult – Length of the "result" buffer.
bm – Method used to handle the LOrPE weight at the boundary.
bandwidthFactor – The plugin bandwidth estimate will be multiplied by this factor to make the actual kernel bandwidth.
The function returns the actual bandwidth used.
◆ weightedVariableBandwidthSmooth1D()
template<typename Numeric , typename NumOut >
High-level driver routine for "variableBandwidthSmooth1D". It is assumed that one of the symmetric beta family kernels (including the Gaussian for which "symbetaPower" parameter can be set to any negative number) is used to build both the pilot and the final density estimates. The "alpha" parameter is set to 0.5 and the bandwidth is not increased at the boundary. The pilot estimate is generated using the AMISE plugin bandwidth multiplied by the "bandwidthFactor". This code can be used with histograms that are scaled or filled with weighted points as long as the correct effective sample size is provided. The function returns the pilot bandwidth used.
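The per-bin bandwidth rule used by "variableBandwidthSmooth1D" (bandwidth inversely proportional to the pilot density raised to the power "alpha", then rescaled so that the geometric mean equals the requested bandwidth) can be illustrated with a minimal sketch. The helper name "sketchBinBandwidths" is hypothetical and is not part of NPStat. The sketch assumes that the geometric mean is taken with equal weight over all bins and that the pilot estimate is positive in every bin; the library may define the mean differently and only requires a positive pilot in bins that contain data. Boundary kernel adjustment and the "increaseBandwidthAtBoundary" option are not modeled.

```cpp
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper (not an NPStat function): computes one bandwidth
// per histogram bin from a pilot density estimate.
//
// Assumptions made for this sketch only:
//   - every pilot value is strictly positive;
//   - the geometric mean is unweighted and taken over all bins.
std::vector<double> sketchBinBandwidths(const std::vector<double>& pilot,
                                        const double bandwidth,
                                        const double alpha)
{
    if (pilot.empty())
        throw std::invalid_argument("pilot estimate must not be empty");
    if (bandwidth <= 0.0)
        throw std::invalid_argument("bandwidth must be positive");

    std::vector<double> h(pilot.size());
    double logSum = 0.0;
    for (std::size_t i = 0; i < pilot.size(); ++i)
    {
        if (pilot[i] <= 0.0)
            throw std::invalid_argument("pilot estimate must be positive");
        // Unnormalized bandwidth: inverse proportion to pilot^alpha
        h[i] = std::pow(pilot[i], -alpha);
        logSum += std::log(h[i]);
    }

    // Rescale so that the geometric mean of h equals "bandwidth"
    const double geomMean = std::exp(logSum / static_cast<double>(pilot.size()));
    const double scale = bandwidth / geomMean;
    for (double& x : h)
        x *= scale;
    return h;
}
```

For example, with a pilot estimate of {1.0, 4.0}, a bandwidth of 2.0, and alpha set to 0.5 (the value fixed by "weightedVariableBandwidthSmooth1D"), the bin with the four times larger pilot density receives half the bandwidth of the other bin, and the two bandwidths have a geometric mean of 2.0.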