Student-t Fisher geometry and the Poincaré half-plane

We usually index a Student-t distribution by a location \mu, a scale \sigma, and degrees of freedom \nu. Those coordinates are convenient, but they don’t really measure anything on their own. A shift of 0.5 does very different work for a narrow distribution than for a wide one, and adding a unit of scale is not the same operation at \sigma=1 as it is at \sigma=10.

What we’d like instead is a ruler that comes from the distributions themselves, one that calls two parameter values close when their densities are hard to tell apart and far when they’re easy to tell apart. The Fisher–Rao metric is exactly that ruler. For fixed \nu it hands the Student-t location–scale family a clean answer — a rescaled Poincaré half-plane — and most of this note lives in that case. The clean picture stops being the whole story the moment \nu is allowed to move, and that is where the note ends up.

Why geometry enters

Coordinate distance measures the wrong thing. It tells you how far apart two parameter values sit in the chart (\mu,\sigma,\nu), which is not the same as how far apart the corresponding distributions are.

The Fisher information metric g_{ij}(\theta)=\mathbb{E}_\theta[\partial_i\log f\,\partial_j\log f] fixes this, because it is the quadratic term in the local KL divergence: D_{\mathrm{KL}}(p_\theta\|p_{\theta+d\theta}) =\frac12 g_{ij}(\theta)d\theta^i d\theta^j+O(\|d\theta\|^3). Read that way, the metric records which small parameter moves are actually visible at the level of the probability laws and quietly discounts the ones that aren’t. For fixed degrees of freedom it takes a pleasantly simple form — it turns out to be hyperbolic — and the next few sections work that out before I circle back to what breaks if you ignore it.

Location–scale families

Let g be a symmetric, smooth probability density on \mathbb{R}, and form the location–scale family f(x;\mu,\sigma) = \frac{1}{\sigma}\,g\!\left(\frac{x-\mu}{\sigma}\right), \qquad \sigma > 0.

Writing \ell = \log f and z = (x-\mu)/\sigma, one has \partial_\mu \ell = -\sigma^{-1} (g'/g)(z) and \partial_\sigma \ell = -\sigma^{-1}\,(1 + z\,(g'/g)(z)).

Symmetry of g kills the cross-term, so the Fisher matrix comes out diagonal: I_{\mu\mu}(\mu,\sigma) = \frac{A}{\sigma^2},\qquad I_{\sigma\sigma}(\mu,\sigma) = \frac{B}{\sigma^2}, with constants A = \int (g'/g)^2 \, g\,dz,\qquad B = \int (1 + z\,g'/g)^2 \, g\,dz.

Since A and B depend only on the shape g and not on \mu or \sigma, the Fisher–Rao line element on the upper half-plane is \boxed{\;ds^2 = \frac{A\,d\mu^2 + B\,d\sigma^2}{\sigma^2}\;}

That is a Poincaré-type metric for any reasonably regular symmetric location–scale family whose Fisher constants A and B are finite. The Gaussian gives A=1, B=2; next we work out the constants for the Student-t.

The Student-t metric

For the Student-t with \nu degrees of freedom, g(z) \propto \left(1 + \tfrac{z^2}{\nu}\right)^{-(\nu+1)/2}, \qquad \frac{g'(z)}{g(z)} = -\frac{(\nu+1)\,z}{\nu + z^2}.

Both integrals are standard — substitute z = \sqrt{\nu}\,\tan\theta and lean on the beta-function identities — and they give A = \frac{\nu+1}{\nu+3},\qquad B = \frac{2\nu}{\nu+3}.

So the Fisher–Rao line element is ds^2 \;=\; \frac{1}{\nu+3}\cdot\frac{(\nu+1)\,d\mu^2 + 2\nu\,d\sigma^2}{\sigma^2}.

As \nu\to\infty this recovers the Gaussian metric (d\mu^2 + 2\,d\sigma^2)/\sigma^2, and \nu=1 is the Cauchy case.

The Poincaré half-plane

Substitute \tilde\mu = (a/b)\,\mu with a=\sqrt{A}, b=\sqrt{B}. Then ds^2 = b^2 \cdot \frac{d\tilde\mu^2 + d\sigma^2}{\sigma^2}, which is b^2 times the standard Poincaré metric on the upper half-plane. That metric has Gaussian curvature -1, and multiplying a metric by b^2 scales its curvature by 1/b^2, so \boxed{\;K_\nu = -\frac{1}{b^2} = -\frac{\nu+3}{2\nu}\;}

Geodesics and distance

In the rescaled coordinate (\tilde\mu, \sigma) we are back to the standard Poincaré metric, whose geodesics everyone knows:

Pulling these back through \mu = (b/a)\,\tilde\mu stretches them horizontally by b/a = \sqrt{2\nu/(\nu+1)}. In the original (\mu,\sigma) coordinates, then, a non-vertical geodesic is half of an ellipse centred on the line \sigma=0, with horizontal semi-axis b/a times the vertical one. The aspect ratio depends only on \nu:

Closed-form distance

\nu	b/a	shape
1 (Cauchy)	1	circle
3	\sqrt{3/2} \approx 1.22	slightly stretched
\infty (Gaussian)	\sqrt{2} \approx 1.41	ellipse

The standard Poincaré distance on (\tilde\mu, \sigma) is d_P(P_1,P_2) = \operatorname{arccosh}(1 + \Delta) with \Delta = \tfrac{(\tilde\mu_1-\tilde\mu_2)^2 + (\sigma_1-\sigma_2)^2}{2\sigma_1\sigma_2}. Scaling the metric by b^2 scales distances by b, and undoing \tilde\mu = (a/b)\mu gives (\tilde\mu_1-\tilde\mu_2)^2 = (\nu+1)/(2\nu)\cdot (\mu_1-\mu_2)^2. Put together, \boxed{\; d_{\mathrm{FR}}\bigl(t_\nu(\mu_1,\sigma_1),\,t_\nu(\mu_2,\sigma_2)\bigr) =\sqrt{\tfrac{2\nu}{\nu+3}}\; \operatorname{arccosh}\!\left(1 + \frac{\tfrac{\nu+1}{2\nu}(\mu_1-\mu_2)^2 + (\sigma_1-\sigma_2)^2}{2\sigma_1\sigma_2}\right). \;}

In the Gaussian limit (\nu\to\infty) this collapses to the classical Atkinson–Mitchell formula d_{\mathrm{FR}} = \sqrt{2}\,\operatorname{arccosh}(1 + (\Delta\mu^2/2 + \Delta\sigma^2)/(2\sigma_1\sigma_2)).

Reading the geometry

The closed form earns its keep by telling us how to read the pictures that follow. A straight line in the chart is usually not the shortest statistical path; an equal-distance set is not a Euclidean circle; and any change in \mu gets divided by \sigma before it counts for anything.

A single geodesic

Take two distributions, P_1=t_3(-1.2,0.7) and P_2=t_3(1.3,1.5). The shortest Fisher–Rao path between them is not the straight segment joining their coordinates — it bows upward, because travelling close to small \sigma is expensive.

Figure 1: A Fisher–Rao geodesic bends away from small scale.

How \nu changes the shape

Hold the endpoints fixed and vary \nu, and what changes is the horizontal stretch of the hyperbolic half-plane. Heavy Cauchy-like tails give one shape; the Gaussian limit gives another.

Figure 2: Changing ν changes the shape of the same endpoint-to-endpoint path.

Balls are not Euclidean balls

A Fisher–Rao ball gathers the distributions at equal information distance from the centre. Drawn in the (\mu,\sigma) chart, these balls sit higher than you’d expect and curl away from \sigma=0. That boundary is infinitely far off, which is why a tiny Euclidean step near zero scale can amount to a large statistical move.

Figure 3: Fisher–Rao distance level sets from (0,1) at \nu=3.

Back to densities

The points in the half-plane are only a compact shorthand for densities. Here are the two distributions behind the points in the geodesic picture above.

Figure 4: The two densities represented by the points P_1 and P_2.

Curvature across degrees of freedom

Figure 5: Gaussian curvature K_\nu = -(\nu+3)/(2\nu) of the Student-t Fisher–Rao half-plane.

Where Euclidean geometry gives wrong answers

None of this matters unless the geometry actually flips an answer somewhere, so here are a handful of cases where Euclidean distance in (\mu,\sigma) ranks distributional changes the wrong way round.

To keep the comparison honest I’ll also report total variation as a check at the level of probabilities: \operatorname{TV}(P,Q)=\frac12\int |p(x)-q(x)|\,dx. With equal class priors, the best possible binary classifier telling P from Q has accuracy \operatorname{Acc}^*(P,Q)=\frac{1+\operatorname{TV}(P,Q)}{2}. So a small TV means even the optimal classifier is barely better than guessing, and that gives the numbers in the tables below something tangible to mean.

Same coordinate move, different statistical change

Shift the location by the same raw amount, \Delta\mu=0.5, once for a narrow t_3 distribution and once for a wide one:

Comparison	Euclidean	Fisher–Rao	TV	Bayes accuracy
t_3(0,0.2) vs t_3(0.5,0.2)	0.5000	1.7918	0.7001	0.8500
t_3(0,2) vs t_3(0.5,2)	0.5000	0.2038	0.0916	0.5458

By the Euclidean reckoning these are the same move. As distributions they are nothing alike: the narrow pair is easy to separate, while the same shift barely disturbs the wide pair. The d\mu^2/\sigma^2 term in the metric is precisely this effect — a location error only means something relative to the scale it sits on.

Figure 6: The same raw location shift is statistically large at small scale and small at large scale.

A ranking reversal

A starker failure. Pair A moves only 0.1 in raw location; pair B moves a full 5. By Euclidean distance B wins easily. But A is extremely narrow and B is very wide, so once you look at the distributions the verdict flips.

Comparison	Euclidean	Fisher–Rao	TV	Bayes accuracy
A: t_3(0,0.05) vs t_3(0.1,0.05)	0.1000	1.4910	0.6090	0.8045
B: t_3(0,10) vs t_3(5,10)	5.0000	0.4055	0.1813	0.5906

Euclidean distance is answering the coordinate question correctly and the statistical question wrongly: it ranks the raw 5-unit move as the larger one, whereas Fisher–Rao distance, TV distance, and optimal classification accuracy all agree that the tiny 0.1 move at small scale is the bigger change in distribution.

Scale changes are multiplicative

Euclidean geometry misreads scale too. In Fisher–Rao geometry, stretching \sigma to 2\sigma costs the same information distance wherever you start, since along a vertical geodesic d_{\mathrm{FR}}\bigl(t_\nu(\mu,\sigma),t_\nu(\mu,c\sigma)\bigr) = \sqrt{\frac{2\nu}{\nu+3}}\,|\log c|. At \nu=3 the prefactor is one, so the distance is just |\log c|.

Comparison	Euclidean	Fisher–Rao	TV	Bayes accuracy
doubling: t_3(0,1) vs t_3(0,2)	1.0000	0.6931	0.2783	0.6392
doubling: t_3(0,10) vs t_3(0,20)	10.0000	0.6931	0.2783	0.6392
add one: t_3(0,10) vs t_3(0,11)	1.0000	0.0953	0.0394	0.5197

Euclidean geometry puts the two doublings at distances 1 and 10; the statistical geometry calls them the very same change. Going the other way, adding one unit of scale at \sigma=10 is almost invisible next to doubling from 1 to 2, even though both have a raw scale increment of 1.

Wrong nearest neighbor

For a last example, take a query distribution Q=t_3(0,1) and two candidate prototypes A=t_3(0.2,0.05),\qquad B=t_3(0.1,2). Which one is closer to Q?

Comparison	Euclidean	Fisher–Rao	TV	Bayes accuracy
Q vs A=t_3(0.2,0.05)	0.9708	3.0221	0.8368	0.9184
Q vs B=t_3(0.1,2)	1.0050	0.6954	0.2792	0.6396

A nearest-neighbour rule on Euclidean distance picks A, since (0.2,0.05) is marginally closer to (0,1) in raw coordinates than (0.1,2) is. As distributions, though, A is the far one — a narrow spike — while B is just a moderate change of scale. Fisher–Rao, TV, and Bayes accuracy all prefer B.

That, more than the tidy formula, is why one bothers with the Poincaré metric. It keeps nearest-neighbour search, clustering, interpolation, and model comparison from mistaking proximity in coordinates for proximity in distribution.

Checks for the fixed-\nu formulas

Before promoting \nu to a coordinate, it’s worth checking the fixed-slice formulas against direct numerical integration — mostly to catch the sign and scaling slips that are easy to make here.

The direct numerical checks agree with the closed forms to numerical precision.

ν	A, numerical	A, closed form	B, numerical	B, closed form
1	0.500000	0.500000	0.500000	0.500000
3	0.666667	0.666667	1.000000	1.000000
10	0.846154	0.846154	1.538462	1.538462
50	0.962264	0.962264	1.886792	1.886792

ν	geodesic arc length	closed-form distance	absolute difference
1	1.508537	1.508537	4.6e-10
3	1.859740	1.859740	3.4e-10
10	2.160554	2.160554	3.3e-10
50	2.332169	2.332169	3.4e-10

Density normalization check over [-100,100]: 0.999999.

What changes when \nu is free

The Poincaré result is a fixed-\nu statement. Let \nu vary and the model becomes the three-dimensional manifold \mathcal M=\{t_\nu(\mu,\sigma):\mu\in\mathbb R,\ \sigma>0,\ \nu>0\}. Now the Fisher matrix carries a genuine shape direction. The location coordinate still decouples by symmetry, but \sigma and \nu no longer do, and the fixed-slice constant-curvature half-plane stops being the whole geometry.

The full Fisher matrix

Symmetry of the standardized density keeps \mu decoupled: I_{\mu\sigma}=I_{\mu\nu}=0. The fixed-\nu block is still sitting in the upper-left corner. What’s new is the signed coupling between scale and tail thickness: \boxed{\; I(\mu,\sigma,\nu)= \begin{pmatrix} \dfrac{\nu+1}{(\nu+3)\,\sigma^2} & 0 & 0 \\[1.1em] 0 & \dfrac{2\nu}{(\nu+3)\,\sigma^2} & -\dfrac{2}{\sigma(\nu+1)(\nu+3)} \\[1.1em] 0 & -\dfrac{2}{\sigma(\nu+1)(\nu+3)} & I_{\nu\nu}(\nu) \end{pmatrix} \;} I_{\nu\nu}(\nu)=\tfrac14\!\left[\psi'\!\left(\tfrac{\nu}{2}\right)-\psi'\!\left(\tfrac{\nu+1}{2}\right)\right] -\frac{\nu+5}{2\nu(\nu+1)(\nu+3)}.

That trigamma term is exactly what stops the metric from staying a nice rational form. The sign of the cross term also matters: with the usual scale coordinate \sigma we have \partial_\sigma\log f=-(1+z\,\partial_z\log g_\nu(z))/\sigma, which makes the (\sigma,\nu) entry negative. Both new entries are confirmed below by direct integration of the signed score products.

ν	I_σν, numerical	I_σν, closed form	I_νν, numerical	I_νν, closed form
3	-0.083333	-0.083333	0.016911	0.016911
10	-0.013986	-0.013986	0.000250	0.000250
30	-0.001955	-0.001955	0.000004	0.000004

Figure 7: The two new metric components versus ν (at σ=1). I_{\nu\nu} shown on a log scale.

Both new entries die off as \nu\to\infty (I_{\sigma\nu}\to0, and in fact I_{\nu\nu}=O(\nu^{-4})): the shape direction decouples and the Gaussian half-plane re-emerges as the limiting location-scale geometry.

Numerical geodesics in the full model

Below I assemble the metric, its analytic derivatives (the \nu-derivative of I_{\nu\nu} drags in the tetragamma \psi''), and the Christoffel symbols \Gamma^i_{jk}=\tfrac12 g^{im}(\partial_j g_{ml}+\partial_l g_{mj}-\partial_m g_{jl}). Geodesics solve \ddot\theta^{\,i}+\Gamma^i_{jk}\dot\theta^{\,j}\dot\theta^{\,k}=0, and to find the one between two prescribed distributions I solve the boundary-value problem and then measure its metric arc length.

Treat what follows as a transparent numerical check rather than a production geodesic solver. The boundary-value problem is solved directly in (\mu,\sigma,\nu), so the examples keep an eye on whether the path stays inside the domain \sigma>0,\nu>0; anything more robust would work in (\mu,\log\sigma,\log\nu) or impose the domain constraints explicitly.

Leaving the fixed-\nu slice

Take two distributions sharing the same degrees of freedom, P_1 = t_3(-1.2, 0.7) and P_2 = t_3(1.3, 1.5). Pinned to the \nu=3 slice, the geodesic between them has length about 1.860. In the full three-parameter manifold that same curve is still a legal path — it just need not be the shortest one. With the cross-term sign handled correctly the effect turns out mild, far milder than the earlier unsigned calculation suggested: the optimal path dips out of the slice only a little, \nu wandering between roughly 2.76 and 3.55, and the length drops only slightly, to about 1.852.

That is the honest reading. Freeing \nu can shorten the path, but how far it moves and in which direction depends on the endpoints. There is no generic plunge toward Cauchy tails.

At the first endpoint, Γ^ν_μμ = 5.688 and Γ^ν_σσ = -17.063. These nonzero normal components show that the fixed-ν slice is not totally geodesic.

quantity	value
full three-parameter geodesic length, L_3D	1.85176
fixed-ν in-slice distance, d_ν=3	1.85974
ν range along the full geodesic	[2.7618, 3.5517]

Figure 8: The 3D geodesic between two ν=3 distributions: its (μ,σ) projection and the mild off-slice variation in ν.

(a) Three Fisher–Rao geodesics in the full (μ,σ,ν) manifold.

case	geodesic length	ν range
ν: 3 → 3	1.85176	[2.76, 3.55]
ν: 2.5 → 20	1.54804	[2.50, 20.00]
ν: 2 → 30	0.59907	[2.00, 30.00]

When the two endpoints sit in the same fixed-\nu slice, the fixed-\nu distance is a constrained distance, and so it can only be an upper bound on the true three-parameter Fisher–Rao distance. Here the bound is tight: freeing \nu buys a small shortcut, not a dramatic heavy-tail detour. The two distances coincide only when \nu is genuinely pinned down as a modelling constraint.

Quantity	Fixed \nu	Free \nu
Manifold dimension	2	3
Curvature	constant -(\nu+3)/(2\nu) on each fixed slice	not captured by one fixed-slice constant-curvature model
Metric form	rational in \nu	trigamma \psi' in I_{\nu\nu}
Geodesics	half-ellipses (closed form)	numerical (BVP)
Distance	closed form (§6)	numerical; §6 is an upper bound when endpoints lie in one fixed-\nu slice
Same-\nu shortest path	constrained to stay in slice	may leave the slice; in the example only mildly