2026-05-14
A recurring trap in kernel methods is that a computable geometric quantity can look like an uncertainty estimate.
The power function is such a quantity. It is mathematically meaningful: it measures how well the design points constrain evaluation at a new location, relative to a chosen kernel. But the usual pointwise error bound has a hidden constant:
|f(x) - s_X f(x)| \leq P_X(x)\,\|f\|_{\mathcal H}.
The computable part is P_X(x). The difficult part is \|f\|_{\mathcal H}.
In classical approximation theory, that is natural: one assumes the target belongs to the native space with a controlled norm. In machine learning, the target is unknown, observations are noisy, the kernel is a modeling choice, and the RKHS norm may be enormous, infinite, or simply unconnected to the prediction problem.
The point of this note is narrow. The power function is useful as a geometry-of-information diagnostic, but it should not be read as a pointwise ML error bar unless the missing norm factor has a meaningful and calibrated scale. The four diagnostics below show why.
Let X = \{x_1, \ldots, x_n\} be interpolation sites and let k be a positive definite kernel with RKHS \mathcal H. The kernel interpolant is
s_X f(x) = \sum_{i=1}^n \alpha_i k(x, x_i), \qquad K_X \alpha = f_X,
where (K_X)_{ij} = k(x_i, x_j) and f_X = (f(x_1), \ldots, f(x_n))^\top. The power function is
P_X^2(x) = k(x,x) - k_X(x)^\top K_X^{-1} k_X(x),
where k_X(x) = (k(x,x_1), \ldots, k(x,x_n))^\top.
It depends on the kernel, the design, and the evaluation point. It does not depend on the observed target values.
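The formula above can be sketched numerically. The following is a minimal illustration, not the note's actual experiment: the Gaussian kernel, the lengthscale of 0.1, the 15 equispaced sites on [0, 1], and the small diagonal jitter are all assumptions chosen for the example, and the function names are hypothetical.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=0.1):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def power_function(X, x_eval, lengthscale=0.1, jitter=1e-10):
    """P_X(x) = sqrt(k(x,x) - k_X(x)^T K_X^{-1} k_X(x)) for each x in x_eval."""
    # The jitter is a numerical safeguard, not part of the definition.
    K = gaussian_kernel(X, X, lengthscale) + jitter * np.eye(len(X))
    k_eval = gaussian_kernel(X, x_eval, lengthscale)           # shape (n, m)
    quad = np.sum(k_eval * np.linalg.solve(K, k_eval), axis=0)
    # k(x, x) = 1 for this kernel; clip tiny negatives caused by rounding.
    return np.sqrt(np.clip(1.0 - quad, 0.0, None))

X = np.linspace(0.0, 1.0, 15)          # design sites (assumed)
x_eval = np.linspace(0.0, 1.0, 401)    # evaluation grid (assumed)

P = power_function(X, x_eval)
P_sites = power_function(X, X)
P_far = power_function(X, np.array([5.0]))

print(f"max P on [0, 1]:       {P.max():.3e}")
print(f"max P at design sites: {P_sites.max():.3e}")
print(f"P far from the design: {P_far[0]:.3f}")
```

No target values appear anywhere in the computation: the inputs are the sites, the kernel, and the evaluation points only.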
The standard deterministic estimate is
|f(x) - s_X f(x)| \leq P_X(x)\,\|f\|_{\mathcal H}.
That is a useful theorem. But the theorem is not the same as the applied claim that P_X(x) alone is an error bar. The cleaner separation is
P_X(x) = \text{design geometry under the kernel}, \qquad \|f\|_{\mathcal H} = \text{unknown target complexity under the kernel}.
The second factor is usually where the statistical difficulty lives.
Fix the training sites and the kernel. The power function is now fixed. Change only the target.
The design has not changed. The kernel has not changed. The power function has not changed. Only the target has changed, and that is enough to rule out interpreting P_X(x) alone as a pointwise error estimate.
At most, P_X(x) is the computable geometric factor in a bound whose missing factor is the target norm.
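The fixed-design argument can be reproduced in a few lines. This is a hedged sketch under assumed settings (Gaussian kernel with lengthscale 0.1, 15 equispaced sites, a small jitter, and two illustrative targets), so the numbers it prints will not match the note's table.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=0.1):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

X = np.linspace(0.0, 1.0, 15)          # fixed design (assumed)
x_eval = np.linspace(0.0, 1.0, 401)    # evaluation grid (assumed)
K = gaussian_kernel(X, X) + 1e-10 * np.eye(len(X))
k_eval = gaussian_kernel(X, x_eval)

# Power function: computed once, with no reference to any target values.
quad = np.sum(k_eval * np.linalg.solve(K, k_eval), axis=0)
P = np.sqrt(np.clip(1.0 - quad, 0.0, None))

# Two illustrative targets on the same design.
targets = {
    "smooth": lambda x: np.sin(2 * np.pi * x),
    "jump": lambda x: (x > 0.5).astype(float),
}

max_errors = {}
for name, f in targets.items():
    alpha = np.linalg.solve(K, f(X))   # solves K_X alpha = f_X
    s = k_eval.T @ alpha               # interpolant on the evaluation grid
    max_errors[name] = float(np.max(np.abs(f(x_eval) - s)))
    print(f"{name:>6}: max P = {P.max():.3e}, max error = {max_errors[name]:.3e}")
```

The max P column is the same number on every row by construction, while the max error column changes with the target.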
Table 1 gives the numerical version. Since the design and kernel are fixed, max power is identical for every target. The fit norm column is the computable interpolation norm \|s_X\|_{\mathcal H}, and the bound scale is \max_x P_X(x)\,\|s_X\|_{\mathcal H}. The vacuity ratio is the bound scale divided by the output range of the target. The bound scale is a diagnostic, not the theorem’s true bound with the unknown \|f\|_{\mathcal H}, so it can be smaller than the actual error, as it is in the kink and jump rows.
| target | max power | max error | fit norm | bound scale | vacuity ratio |
|---|---|---|---|---|---|
| smooth | 0.000301 | 4.82e-05 | 1.8 | 0.000541 | 0.00027 |
| oscillatory | 0.000301 | 3.78 | 4.95e+04 | 14.9 | 7.46 |
| kink | 0.000301 | 0.032 | 1.57 | 0.000474 | 0.000948 |
| jump | 0.000301 | 0.494 | 152 | 0.0458 | 0.0458 |
Suppose we accept the chosen kernel and ask how expensive different targets are under it. A finite-dimensional proxy for the RKHS norm is
\|f\|_{\mathcal H,m}^2 = f(Z)^\top K_Z^{-1} f(Z),
where Z is a fine grid of m points. This is the squared RKHS norm of the minimum-norm interpolant through the values of f on Z.
It is not the true continuous RKHS norm of f, but it is a useful diagnostic. If this proxy grows rapidly as the grid is refined, the native-space assumption is not giving a stable scale for the target.
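A rough sketch of the grid proxy follows. The kernel, lengthscale, grid sizes, targets, and jitter are assumptions made for illustration. Gaussian kernel matrices on fine grids are severely ill-conditioned, so the sketch adds a diagonal jitter. That jitter caps how large the proxy can get, which means the absolute values depend on it and should not be compared with the table below. As a sanity check, the kernel section k(., 0.5) lies in the RKHS with squared norm k(0.5, 0.5) = 1, and the proxy should recover that value.

```python
import numpy as np

LENGTHSCALE = 0.1  # assumed kernel lengthscale

def gaussian_kernel(a, b, lengthscale=LENGTHSCALE):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

def grid_norm_proxy_sq(f, m, jitter=1e-8):
    """f(Z)^T (K_Z + jitter I)^{-1} f(Z) on m equispaced points in [0, 1]."""
    Z = np.linspace(0.0, 1.0, m)
    K = gaussian_kernel(Z, Z) + jitter * np.eye(m)
    fZ = f(Z)
    return float(fZ @ np.linalg.solve(K, fZ))

targets = {
    # k(., 0.5): a function known to lie in the RKHS, squared norm 1.
    "kernel section": lambda x: np.exp(-0.5 * ((x - 0.5) / LENGTHSCALE) ** 2),
    "smooth": lambda x: np.sin(2 * np.pi * x),
    "jump": lambda x: (x > 0.5).astype(float),
}

proxies = {}
for m in (31, 61, 121):  # odd sizes so that 0.5 is a grid point
    for name, f in targets.items():
        proxies[(name, m)] = grid_norm_proxy_sq(f, m)
    row = ", ".join(f"{name} = {proxies[(name, m)]:.4g}" for name in targets)
    print(f"m = {m:>3}: {row}")
```

The point of the comparison is qualitative: the proxy stays at a stable scale for a target the kernel can represent cheaply and becomes very large for a discontinuous one.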
| grid size | jump | kink | oscillatory | smooth |
|---|---|---|---|---|
| 30 | 3452.72 | 51.67 | 25828.36 | 1.78 |
| 50 | 5057.70 | 96.28 | 31171.90 | 1.81 |
| 80 | 6609.07 | 133.37 | 35970.28 | 1.82 |
| 120 | 8166.49 | 166.97 | 40459.34 | 1.83 |
| 180 | 10020.30 | 204.79 | 45304.34 | 1.83 |
For compatible smooth targets, the scale stays moderate. For rough, oscillatory, or discontinuous targets, the native-space cost can become very large under the same Gaussian kernel. The theorem is not failing. It is asking for a target-complexity constant that the machine-learning problem usually does not provide.
In practice, we do not observe a noiseless function. We observe data. A tempting substitution is to replace the unknown \|f\|_{\mathcal H} with the fitted interpolation norm
\|s_X\|_{\mathcal H}^2 = y^\top K_X^{-1} y.
This quantity is computable, but it is not the unknown target norm. It is a data-dependent interpolation norm, and with noisy observations it can mostly measure the cost of fitting noise.
If ridge regularization is introduced, the fitted norm can be stabilized. But then the original interpolation theorem is no longer being used as stated. We have moved from a deterministic interpolation bound to a different regularized statistical procedure. That may be entirely reasonable; it is just not the same certificate.
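The effect of noise on the fitted norm, and of ridge regularization on it, can be illustrated as follows. This is a sketch under assumed settings: a Gaussian kernel with lengthscale 0.15, 15 equispaced sites, Gaussian noise with standard deviation 0.1, and a ridge parameter of 0.01. The ridge norm here is the RKHS norm of the regularized fit, \alpha^\top K_X \alpha with \alpha = (K_X + \lambda I)^{-1} y, which is one reasonable reading of "fitted norm" after regularization and not necessarily the note's.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=0.15):
    """Gaussian kernel matrix between 1-D point sets a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 15)                 # design sites (assumed)
K = gaussian_kernel(X, X)
n = len(X)

f_X = np.sin(2 * np.pi * X)                   # noiseless target values
y = f_X + 0.1 * rng.standard_normal(n)        # noisy observations (assumed noise level)

def interp_norm_sq(values, jitter=1e-10):
    """values^T K_X^{-1} values: squared RKHS norm of the interpolant."""
    return float(values @ np.linalg.solve(K + jitter * np.eye(n), values))

def ridge_norm_sq(values, lam):
    """alpha^T K_X alpha for the ridge fit alpha = (K_X + lam I)^{-1} values."""
    alpha = np.linalg.solve(K + lam * np.eye(n), values)
    return float(alpha @ K @ alpha)

clean = interp_norm_sq(f_X)
noisy = interp_norm_sq(y)
ridge = ridge_norm_sq(y, lam=1e-2)

print(f"interpolation norm^2, noiseless values: {clean:.4g}")
print(f"interpolation norm^2, noisy values:     {noisy:.4g}")
print(f"ridge fit norm^2, noisy values:         {ridge:.4g}")
```

The inflated middle number is mostly the cost of interpolating the noise. The ridge number is stable, but it belongs to a different procedure than the interpolation theorem.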
A bound is useful only if its scale is meaningful relative to the prediction problem. Define
V = \frac{\max_x P_X(x) \cdot C}{\operatorname{range}(y)},
where C is a candidate norm scale, here either the fit norm or the grid norm proxy (the 120-point values from the previous table). When V \gg 1, the bound may be formally true but is no longer informative at the scale of the prediction problem.
| target | max power | output range | fit norm | grid norm proxy | V, fit norm | V, grid norm |
|---|---|---|---|---|---|---|
| smooth | 0.000301 | 2 | 1.8 | 1.83 | 0.00027 | 0.000276 |
| oscillatory | 0.000301 | 2 | 4.95e+04 | 4.05e+04 | 7.46 | 6.09 |
| kink | 0.000301 | 0.499 | 1.57 | 167 | 0.000948 | 0.101 |
| jump | 0.000301 | 1 | 152 | 8.17e+03 | 0.0458 | 2.46 |
The norm term does not make the bound wrong. It makes it vacuous when the hidden complexity factor is too large, too unstable, or unavailable.
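The ratios in the table above can be recomputed directly from its other columns. The snippet below only redoes that arithmetic. Its inputs are the rounded table entries, so the results agree with the printed V columns to about two or three significant figures; the function name is hypothetical.

```python
# Values copied from the table above:
# (max power, output range, fit norm, grid norm proxy)
rows = {
    "smooth": (0.000301, 2.0, 1.8, 1.83),
    "oscillatory": (0.000301, 2.0, 4.95e4, 4.05e4),
    "kink": (0.000301, 0.499, 1.57, 167.0),
    "jump": (0.000301, 1.0, 152.0, 8.17e3),
}

def vacuity_ratio(max_power, norm_scale, output_range):
    """V = max_x P_X(x) * C / range(y) for a candidate norm scale C."""
    return max_power * norm_scale / output_range

ratios = {}
for name, (max_power, output_range, fit_norm, grid_norm) in rows.items():
    v_fit = vacuity_ratio(max_power, fit_norm, output_range)
    v_grid = vacuity_ratio(max_power, grid_norm, output_range)
    ratios[name] = (v_fit, v_grid)
    print(f"{name:>11}: V(fit norm) = {v_fit:.3g}, V(grid norm) = {v_grid:.3g}")
```

For the kink target the two candidate scales give ratios that differ by roughly two orders of magnitude, which shows how much the conclusion depends on the choice of C.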
The power function sees the kernel, the design points, geometric coverage under the kernel metric, and where interpolation is weakly constrained by the data locations.
It does not see the unknown target complexity, kernel misspecification, observation noise, distribution shift, model-selection error, hyperparameter-selection error, whether the fitted norm is signal or noise, or whether the target belongs to the native space at all.
That is why the power function is useful as a design diagnostic but fragile as a pointwise uncertainty statement.
The tempting applied move is
|f(x) - s_X f(x)| \leq P_X(x)\,\|f\|_{\mathcal H} \qquad\leadsto\qquad \text{error at } x \approx P_X(x).
This drops the factor that depends on the target.
A slightly less naive version replaces \|f\|_{\mathcal H} with \|s_X\|_{\mathcal H}. But that is not a theorem either. It is a heuristic that can be dominated by noise, conditioning, and hyperparameter choices.
The formal theorem controls approximation error for functions in the RKHS. The ML problem is to learn an unknown target from finite, noisy data under model uncertainty. These are not the same problem, and the power function carries the geometry of the first without the statistics of the second.
The power function is a valid geometric quantity. It tells us something real about the design and the kernel.
But the pointwise error bound is a product of design geometry and unknown target complexity:
\text{pointwise error} \leq \text{design geometry} \times \text{unknown target complexity}.
In applications, the first factor is computable and visually appealing. The second factor is usually unknown, unstable, or incompatible with the modeling assumptions.
The useful interpretation is therefore simple: the power function is a geometry-of-information diagnostic. Without meaningful and calibrated control of \|f\|_{\mathcal H}, it is not a pointwise prediction-error estimate.
@misc{miryusupov2026powerfunction,
author = {Miryusupov, Shohruh},
title = {When the Power Function Is Not an Error Bar},
year = {2026},
howpublished = {Research note},
url = {https://www.miryusupov.com/blog/posts/power_function_not_error_bar/index.html}
}