Interoceptive Attention: a Finite-Bandwidth Mechanism for Homeostatic Prioritisation

active-inference, attention, homeostasis, interoception

This is the long-form companion to Paper 1 (SAB 2026 submission), about an active-inference agent that has to keep itself alive in a small four-need gridworld under finite perceptual bandwidth. The headline finding is short: replacing uniform observation precision with a need-aligned, fixed-budget selector, which we call κ-attention, more than doubles learning-phase survival; reversing the selector’s direction drops survival below the uniform baseline. The point of the post is to do three things the 12-page paper cannot: (i) state the mechanism in plain English with side-by-side animations, (ii) be candid about where the result is strong and where it leaves genuine uncertainty, and (iii) provide an interactive demo that lets you turn the relevant knobs yourself.


The problem in one paragraph

Any embodied system regulating competing physiological needs runs on a finite perceptual budget. Sharpening any one channel — say, the report from your gut about how hungry you are — costs energy, and the total budget is bounded. Behaviour therefore depends not only on which actions are available but on which channels of evidence are sharpened first. Behavioural ecology has known this for decades (McNamara & Houston, 1986); biological neural systems implement perceptual selection through gain modulation (Bastos et al., 2012). The question here is whether, inside an active-inference agent, reallocating a fixed observation-precision budget toward whichever channel the agent’s own posterior flags as most needed is sufficient as a homeostatic-prioritisation mechanism. Where in the agent’s processing chain that signal enters turns out to matter as much as whether it is present at all.

AffectWorld in 90 seconds

The environment is a 6×6 gridworld with two food and two water tiles per layout. The agent has three active bodily needs (hunger, thirst, and suffocation), each tied to a world resource and to a noisy interoceptive channel, plus a fourth, inert control channel that carries no need signal. Hunger and thirst decay by one unit per step and are lethal at zero; suffocation depletes only on water tiles, recovers elsewhere, and is not lethal on its own. Episodes run up to 60 steps and terminate early on death.
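The body dynamics described above can be sketched in a few lines. This is an illustrative toy, not the paper's environment code: the 0..5 level scale, the refill-to-full rule on a matching tile, the one-unit suffocation rates, and the names `Body` and `step_body` are all assumptions made for the sketch.

```python
from dataclasses import dataclass

S_MAX = 5  # assumed top level, matching a 0..5 body-state scale


@dataclass
class Body:
    hunger: int = S_MAX
    thirst: int = S_MAX
    air: int = S_MAX  # the suffocation channel; higher means more air


def step_body(body, tile):
    """Advance the body one step on `tile` ('food', 'water' or 'empty').

    Returns (new_body, alive).
    """
    # Refilling to S_MAX on a matching tile is an assumption of this sketch.
    hunger = S_MAX if tile == "food" else body.hunger - 1
    thirst = S_MAX if tile == "water" else body.thirst - 1
    # Suffocation depletes only on water tiles and recovers elsewhere.
    air = body.air - 1 if tile == "water" else min(S_MAX, body.air + 1)
    alive = hunger > 0 and thirst > 0  # suffocation alone is never lethal
    return Body(max(0, hunger), max(0, thirst), max(0, air)), alive


body, alive, steps = Body(), True, 0
while alive and steps < 60:  # episodes are capped at 60 steps
    body, alive = step_body(body, "empty")
    steps += 1
print(steps, alive)
```

Under these assumed numbers an agent that never reaches a resource dies within a handful of steps, which gives a feel for why reading the right channel early matters.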

Representative AffectWorld layouts across easy, medium, and far tiers, with start-to-resource distances annotated

Twelve layouts across easy, medium, and far tiers. The headline results pool eleven of them (L01 is treated as a separate stress-test in the appendix).

The agent itself is an active-inference POMDP solver: it maintains beliefs over body and world states, plans by minimising expected free energy over a horizon of three steps, and updates its observation likelihood online via Dirichlet pseudo-counts. The single architectural intervention is how precision is split across the four interoceptive channels.

Precision as a scalar budget

Each interoceptive channel reports a noisy categorical observation $o^{(m)} \in \{0, \ldots, 5\}$ of the underlying body-state level $s$. The likelihood is parameterised by a single scalar $\kappa_m \in [0, 1]$, the probability that the per-step observation equals the true level on channel $m$:

$$A^{(m)}_{o,\,s} = \begin{cases} \kappa_m & o = s \\[2pt] (1 - \kappa_m)/5 & \text{otherwise} \end{cases}$$
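As a concrete reading of this likelihood, here is a minimal pure-Python sketch that builds the 6×6 matrix for one channel and draws noisy readings from it. The function names `likelihood_matrix` and `sample_observation` are made up for illustration and are not taken from the paper's code.

```python
import random

N_LEVELS = 6  # observation and body-state levels 0..5


def likelihood_matrix(kappa, n_levels=N_LEVELS):
    """Return A[o][s]: kappa on the diagonal, (1 - kappa)/(n_levels - 1) elsewhere."""
    off = (1.0 - kappa) / (n_levels - 1)
    return [[kappa if o == s else off for s in range(n_levels)]
            for o in range(n_levels)]


def sample_observation(true_level, kappa, rng, n_levels=N_LEVELS):
    """Draw one reading: the true level with probability kappa,
    otherwise one of the other levels chosen uniformly."""
    if rng.random() < kappa:
        return true_level
    others = [o for o in range(n_levels) if o != true_level]
    return rng.choice(others)


A = likelihood_matrix(0.90)
# Each column is a distribution over observations, so it sums to one.
column_sums = [sum(A[o][s] for o in range(N_LEVELS)) for s in range(N_LEVELS)]
rng = random.Random(0)
readings = [sample_observation(3, 0.90, rng) for _ in range(10)]
print(column_sums)
print(readings)
```

With six levels, every entry of the matrix equals 1/6 when `kappa` is 1/6, which is the point at which a reading carries no information about the true level.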

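A precision of $\kappa_m = 1$ means the agent perfectly observes that channel; $\kappa_m = 1/6$ means it reads a uniformly random level, and below that the reading is biased away from the true level (at $\kappa_m = 0$ it is never correct). The constraint is a fixed soft budget across channels: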

$$\sum_{m=1}^{4} \kappa_m \leq K, \qquad K = 2.60, \qquad \kappa_m \in [0.05, 1]$$

Two operating points fall out of this. Uniform allocation gives every channel $\kappa_m = K/4 = 0.65$: every channel reports correctly 65% of the time. Selective allocation gives one attended channel $\kappa_\text{att} = 0.90$ and the other three $\kappa_\text{un} \approx 0.567$: one channel is now 90% reliable, the other three are 57% reliable, and the total is preserved. The budget K and the observation channels are the same; only the split changes.
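The two operating points can be reproduced with a small helper. The sketch below assumes the remainder of the budget is shared equally among the unattended channels (which matches the 0.567 figure) and clips each value to the stated range; `split_budget` is a hypothetical name.

```python
def split_budget(attended, K=2.60, kappa_att=0.90, n_channels=4,
                 lo=0.05, hi=1.0):
    """Give `attended` precision kappa_att and share the rest of K equally.

    Each value is clipped to [lo, hi]; clipping can leave budget unused,
    which is why the constraint is sum(kappa) <= K rather than equality.
    """
    raw_un = (K - kappa_att) / (n_channels - 1)
    kappa_un = min(hi, max(lo, raw_un))
    att = min(hi, max(lo, kappa_att))
    return [att if m == attended else kappa_un for m in range(n_channels)]


uniform = [2.60 / 4] * 4
selective = split_budget(attended=0)
print(uniform)    # four equal shares of K
print(selective)  # one sharpened channel, three slightly blunted ones
```

Calling `split_budget(attended=2)` moves the sharpened slot to the third channel without changing the total, which is all the selector needs to do at each step.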

The selector itself is just an arg-max: each step, the agent picks the channel for which its own posterior indicates the largest unmet need ($\text{need}_m(s) = (s_\text{max} - s)/s_\text{max}$ on the three active channels, zero on the inert one), and reallocates the budget toward it. There is no ground-truth oracle; the selector reads only what the agent itself currently believes.
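A minimal sketch of such a selector follows. It assumes the need is scored as its expectation under each channel's categorical belief; the post only says the posterior "indicates" the need, so the exact statistic is an assumption, as are the names `expected_need` and `select_channel` and the example beliefs.

```python
S_MAX = 5           # highest body-state level
ACTIVE = (0, 1, 2)  # hunger, thirst, suffocation; channel 3 is inert


def expected_need(belief, s_max=S_MAX):
    """Expectation of (s_max - s) / s_max under a categorical belief over s."""
    return sum(q * (s_max - s) / s_max for s, q in enumerate(belief))


def select_channel(beliefs):
    """Index of the channel whose believed unmet need is largest."""
    needs = [expected_need(b) if m in ACTIVE else 0.0
             for m, b in enumerate(beliefs)]
    return max(range(len(needs)), key=needs.__getitem__)


beliefs = [
    [0.0, 0.1, 0.2, 0.4, 0.2, 0.1],  # hunger: probably around level 3
    [0.5, 0.3, 0.1, 0.1, 0.0, 0.0],  # thirst: probably near empty
    [0.0, 0.0, 0.0, 0.1, 0.3, 0.6],  # suffocation: probably near full
    [1.0, 0.0, 0.0, 0.0, 0.0, 0.0],  # inert channel: its need is fixed at zero
]
attended = select_channel(beliefs)
print(attended)
```

The inert channel is deliberately given a belief concentrated on level 0 here; it is still never selected, because its need is zero by construction.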

κ-attention architecture: one precision-shaped likelihood matrix feeds both belief update and the EFE planner

The intervention is small: a single precision-shaped likelihood matrix $A^{(m)}$ is read by two downstream consumers, the per-step belief update and the EFE planner. Everything else is held constant across agents.

Try it: three agents, same body, different κ-split

The demo below has three tabs. Mechanism rollout runs three agents side-by-side and lets each one pick from 11 selector mechanisms (need-aligned default, hysteresis, explorative, action-aware, anti-aligned, oracle, random, uniform, and three fixed-channel variants). For each agent you see, in real time: the true body state (coloured bar fill); the agent’s belief about each channel (the thin horizontal line inside each bar; the lag between line and fill is the entire mechanism story made visible); the κ allocation; a rolling 30-step trace of the attended channel; and per-channel Dirichlet sparklines showing how the posterior diagonal mass $P(o = s)$ on each interoceptive likelihood matrix $A^{(m)}$ concentrates over time. The Parameter explorer tab shows the real κ × α₀ grid from the paper (toggle between survival rate and cumulative need per step). The Real outcomes tab shows the real per-channel mean Δ in posterior diagonal mass across 24 paired clusters and the convergence fractions across 352 paired clusters.

Interactive AffectWorld: three agents, one body
All three agents share an identical body, the same noisy categorical observations, and the same fixed precision budget K. They differ only in their selector — the rule that decides where the budget is allocated each step. The thin horizontal line inside each bar shows the agent's belief about that channel; the coloured fill is the channel's true level. The trace strip below each agent shows the attended channel for the last 30 steps. The rollout is a stylised body-and-selector simulator: it shares the paper's depletion, noisy-categorical observation, and belief-driven selection logic, but does not run the full expected-free-energy planner — the parameter-explorer tab below shows the real grid from the paper.

Trial-level mean Δ in posterior diagonal mass

Per-channel cluster-bootstrap means with 95% CIs from per_channel_delta_ci.json (24 paired clusters; layout × seed). Δ is the change over a trial in the agent's Dirichlet posterior diagonal mass on each channel — exactly what the live sparklines on the rollout tab estimate from the stylised simulator. Bars are mean Δ; thin brackets are the 95% CI.
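For readers unfamiliar with cluster bootstrapping, the idea is to resample whole (layout, seed) clusters rather than individual trials. The sketch below uses a percentile interval, which is an assumption, since the post does not say which bootstrap variant the paper uses. The input data are synthetic placeholders, not values from per_channel_delta_ci.json.

```python
import random
import statistics


def cluster_bootstrap_ci(cluster_deltas, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for the mean, resampling whole clusters.

    `cluster_deltas` holds one paired difference per (layout, seed) cluster.
    """
    rng = random.Random(seed)
    n = len(cluster_deltas)
    means = sorted(
        statistics.fmean(rng.choices(cluster_deltas, k=n))
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lo = means[int(tail * n_boot)]
    hi = means[min(n_boot - 1, int((1.0 - tail) * n_boot))]
    return statistics.fmean(cluster_deltas), lo, hi


# Synthetic placeholder data: 24 clusters, NOT the paper's measurements.
demo_rng = random.Random(1)
fake_deltas = [demo_rng.gauss(0.2, 0.05) for _ in range(24)]
mean, lo, hi = cluster_bootstrap_ci(fake_deltas)
print(round(mean, 3), round(lo, 3), round(hi, 3))
```

Resampling at the cluster level keeps trials from the same layout and seed together, so the interval reflects between-cluster variability rather than the much larger number of individual trials.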

Convergence: fraction of trials crossing 50% diagonal mass

From convergence_time.json (352 paired clusters across the full n=32 broad panel). Threshold = 0.50; window = 5 trials. Higher fraction is better — it means the per-channel A learning is reliably crossing the "meaningfully concentrated" line. Median crossing-trial is 5 across all agents, so the relevant signal is the fraction that reach it at all.
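One plausible reading of "threshold = 0.50; window = 5 trials" is a trailing five-trial mean that has to reach 0.50. That interpretation is an assumption (it would explain why the earliest possible crossing trial is 5), and the curves below are invented for illustration rather than taken from convergence_time.json.

```python
def crossing_trial(curve, threshold=0.50, window=5):
    """First 1-based trial at which the trailing `window`-trial mean of
    `curve` reaches `threshold`; None if it never does."""
    for end in range(window, len(curve) + 1):
        if sum(curve[end - window:end]) / window >= threshold:
            return end
    return None


def convergence_fraction(curves, **kwargs):
    """Share of curves that cross the threshold at any point."""
    crossed = [crossing_trial(c, **kwargs) is not None for c in curves]
    return sum(crossed) / len(crossed)


# Made-up diagonal-mass curves, one value per trial.
curves = [
    [0.55, 0.60, 0.62, 0.65, 0.70, 0.72],  # above threshold from the start
    [0.20, 0.30, 0.45, 0.55, 0.60, 0.70],  # crosses one trial later
    [0.20, 0.22, 0.25, 0.24, 0.26, 0.27],  # never crosses
]
trials = [crossing_trial(c) for c in curves]
fraction = convergence_fraction(curves)
print(trials, fraction)
```

Reporting the fraction alongside the crossing trial separates "how often does learning concentrate at all" from "how quickly", which is the distinction the caption draws.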

Per-step real-trial replay would need the trial-curve CSVs from runs/actinf_learning_gpu/test_c_plus_v2_kmp_fix/, which live on the HPC and aren't shipped with the snapshot to keep it under size limits. The Dirichlet story is faithfully reproducible from the simulator on the rollout tab; the aggregate numbers above are the real measurements those simulated trajectories are estimating.

[Heatmap: $\kappa_\text{att}$ (attended channel) against $\alpha_0$ (Dirichlet prior concentration); colour scale runs from low to high survival]

Real grid from kappa_alpha_grid.json in the paper repo, pooled across 3 easy-tier layouts × 8 seeds per cell. The two axes: $\kappa_\text{att}$, the precision the selector concentrates on the attended channel; $\alpha_0$, the Dirichlet prior on the body-state observation likelihood (lower → looser, higher → more rigid). The paper's canonical operating point is $\kappa_\text{att} = 0.90$, $\alpha_0 = 0.1$. The benefit grows fastest as $\kappa_\text{att}$ climbs above the uniform allocation $K/4 = 0.65$ and as the prior tightens.

A few things to watch for. With the default settings, κ-attention will keep all three active bars away from zero for most of the run, switching the attended channel as need shifts; the Dirichlet sparkline of the attended channel climbs visibly faster than the others. The uniform agent typically dies a little earlier — it never gets a clear read on any one channel and acts on noisy beliefs. The anti-aligned agent dies fastest: it spends its high-precision budget on the channel that needs attention least, and the sparkline shows the wrong channel concentrating fastest while hunger and thirst stay near uniform.

What’s real and what’s stylised. The mechanism rollout is a stylised body-and-selector simulator: it shares the paper’s depletion, noisy-categorical observation, fixed-budget allocation, belief-driven selection, and the κ-weighted Dirichlet update, but it does not run the full expected-free-energy planner. The Dirichlet sparklines you see are the real mechanism applied to the stylised body — the same per-observation acceleration the paper measures, just from a simpler body. The parameter explorer and real outcomes tabs are the real paper data, baked in from kappa_alpha_grid.json, per_channel_delta_ci.json, and convergence_time.json. Per-step real-trial replay would need the trial-curve CSVs from the HPC; those aren’t shipped with the snapshot to keep it under size limits.

The headline number

Across 11 layouts and 32 batched seeds per cell, learning-phase survival is:

| Agent | Survival rate | 95% CI | p vs uniform |
|---|---|---|---|
| Uniform | 0.199 | [0.158, 0.240] | n/a |
| κ-attention | 0.414 | [0.365, 0.463] | ≤ 10⁻⁴ |
| Anti-aligned | 0.144 | [0.108, 0.181] | 0.004 |

Selective allocation more than doubles survival at the same total budget — ≈ 2.08×. The anti-aligned control performs significantly worse than uniform: non-uniform allocation alone is not enough; the direction has to match the agent’s belief about where the need is.

Headline bar chart: κ-attention more than doubles uniform survival; anti-aligned falls below uniform

Headline gain. The dashed line is a Gumbel-AlphaZero oracle given the true environment transition function — about 0.66 survival on this benchmark.

The task is hard in this setting: even oracle-dynamics planners reach only about 0.66 survival.

Where the signal does its work

The same shaped likelihood A(m)A^{(m)} is read by two parts of the agent: the per-step belief update (does the agent know it is hungry?) and the EFE planner (does the agent plan a route to food?). The natural question is whether one of these sites carries most of the benefit.

We test this with an inference-only variant that presents the planner with an unshaped likelihood — selective precision still enters the belief update, but the planner sees uniform K/4 across all channels. If the agent’s beliefs alone were doing the work, this should not hurt. It does:

  • At the default prior concentration ($\alpha_0 = 0.1$), the inference-only variant loses about 20 pp relative to the full κ-attention agent.
  • Pooled across six prior concentrations $\alpha_0 \in \{10^{-3}, \ldots, 10^{2}\}$, the loss is 47 pp.
  • At rigid priors ($\alpha_0 = 10$), the loss reaches 88 pp: the inference-only agent collapses to the level of the anti-aligned control.

The mirror ablation — keep the planner’s shaped likelihood, disable the inference-stage shaping — actually matches or exceeds the full agent at loose priors. Read together, the dominant pathway is the planner: the inference-stage shaping is at most neutral when the prior is loose, and only becomes load-bearing when the prior is rigid enough that the planner needs help overriding it.
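The two ablations differ only in which precision vector each consumer reads. A compact way to picture that wiring is sketched below; the variant labels and the `precisions` helper are names invented for this sketch, and the equal split of the remaining budget is the same assumption as elsewhere in this post's arithmetic.

```python
UNIFORM = "uniform"
SHAPED = "shaped"

# Which precision vector each consumer reads, per variant.
# The variant names are labels chosen for this sketch.
VARIANTS = {
    "full":           {"inference": SHAPED,  "planner": SHAPED},
    "inference_only": {"inference": SHAPED,  "planner": UNIFORM},
    "planner_only":   {"inference": UNIFORM, "planner": SHAPED},
    "uniform":        {"inference": UNIFORM, "planner": UNIFORM},
}


def precisions(variant, attended, K=2.60, kappa_att=0.90, n_channels=4):
    """Return (inference_kappas, planner_kappas) for one time step."""
    flat = [K / n_channels] * n_channels
    kappa_un = (K - kappa_att) / (n_channels - 1)
    shaped = [kappa_att if m == attended else kappa_un
              for m in range(n_channels)]
    wiring = VARIANTS[variant]
    pick = {UNIFORM: flat, SHAPED: shaped}
    return pick[wiring["inference"]], pick[wiring["planner"]]


inf_k, plan_k = precisions("inference_only", attended=1)
print(inf_k)   # belief update still sees the sharpened channel
print(plan_k)  # planner sees K/4 everywhere
```

Here `"planner_only"` corresponds to the mirror ablation: the planner keeps the shaped vector while the belief update falls back to the uniform one.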

Mechanism propagation: the inference-only variant loses about half the benefit at the default prior, collapsing further as the prior tightens

Propagation matters. Suppressing the planner’s access to the shaped likelihood costs about half the benefit at the default prior, growing to almost all of it as the prior tightens.

This is the finding the paper’s reviewers pushed hardest on. The original framing — “the mechanism localises in inference, planning-stage reapplication adds nothing” — turned out to be backwards. The honest read is that the planner-stage shaping carries the gain, with inference-stage shaping serving as insurance against rigid priors.

Direction is essential

A natural worry: maybe any non-uniform allocation helps, simply by perturbing the agent’s beliefs out of a degenerate uniform-precision regime? The direction test answers this cleanly.

Six selector rules compared: every need-aligned rule beats baseline; only the anti-aligned rule fails

Four need-aligned selector rules (default, action-aware, hysteresis, explorative) plus an oracle ceiling all beat baseline. Only the direction-reversed rule fails.

Sweeping $\kappa_\text{att}$ at fixed K confirms it: the need-aligned rule climbs monotonically with asymmetry, reaching a +44 pp gap over uniform at the default $\kappa_\text{att} = 0.90$. The anti-aligned rule stays flat near 0.36 across the whole sweep. Both coincide with uniform at $\kappa_\text{att} = 0.65$, where the split is the uniform allocation, and diverge thereafter. The asymmetry does work because of its direction, not despite it.
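Why the two rules must coincide at the left end of the sweep follows directly from the budget arithmetic, assuming the remainder is shared equally among the three unattended channels:

```latex
\kappa_\text{un} = \frac{K - \kappa_\text{att}}{3},
\qquad
\kappa_\text{att} = \frac{K}{4}
\;\Longrightarrow\;
\kappa_\text{un} = \frac{K - K/4}{3} = \frac{K}{4} = 0.65
\quad \text{at } K = 2.60 .
```

At that point every channel has the same precision, so which channel the selector nominates makes no difference to the likelihood; any gap between the rules can only open up once $\kappa_\text{att}$ exceeds $K/4$.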

A per-observation signature

The attended channel does not only support better action selection in the moment — it also learns its own dynamics faster, per observation, than channels left at uniform precision. Plotted against cumulative observation count (rather than trial index, which mixes in the agent’s survival), κ-attention’s hunger likelihood sits about 0.30 above uniform at every matched data volume. The anti-aligned agent sits about 0.22 above uniform; κ-attention beats the anti-aligned variant by a further ≈ 0.10.

Per-channel Dirichlet learning: the attended hunger channel converges about 2.4× faster than uniform

Relevant learning. The attended channel’s Dirichlet posterior concentrates faster than the unattended ones, at matched observation count. This is a per-observation effect, not a survival-mediated one.
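To make "posterior diagonal mass" concrete, here is a toy version of Dirichlet pseudo-count learning for one channel. Two parts are assumptions of the sketch rather than statements about the paper's implementation: the update is scaled by a scalar `weight` (standing in for the κ-weighting the post mentions), and diagonal mass is taken as the mean, over states, of the expected probability of observing the true level.

```python
N_LEVELS = 6


def init_counts(alpha0=0.1, n=N_LEVELS):
    """Symmetric Dirichlet prior: every pseudo-count starts at alpha0."""
    return [[alpha0] * n for _ in range(n)]


def update_counts(counts, obs, belief, weight=1.0):
    """Add weight * q(s) to row `obs` of the pseudo-count table, in place."""
    for s, q in enumerate(belief):
        counts[obs][s] += weight * q


def diagonal_mass(counts):
    """Mean over states s of the expected likelihood of observing o = s."""
    n = len(counts)
    total = 0.0
    for s in range(n):
        column = sum(counts[o][s] for o in range(n))
        total += counts[s][s] / column
    return total / n


counts = init_counts()
before = diagonal_mass(counts)  # flat prior, so 1/6
for level in range(N_LEVELS):   # one confident, correct reading per level
    belief = [1.0 if s == level else 0.0 for s in range(N_LEVELS)]
    update_counts(counts, obs=level, belief=belief, weight=0.90)
after = diagonal_mass(counts)
print(before, after)
```

With a loose prior such as `alpha0=0.1`, a single well-attributed reading per level already moves the diagonal mass a long way; a larger `alpha0` makes the same evidence count for less, which is the sense in which the prior is "rigid".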

This matters because it is a falsifiable mechanistic prediction. Preference-reweighting alternatives — agents that route attention by changing which channel the planner optimises for, rather than by sharpening the likelihood — should not reproduce per-observation Dirichlet acceleration of this shape. A matched-selector dynamic-C comparator is the natural follow-up experiment, and is in the queue for the next paper.

Robustness, briefly

The advantage holds across the three parameters that most plausibly drive performance.

  • Prior concentration $\alpha_0 \in \{10^{-3}, \ldots, 10^{2}\}$: κ-attention maintains around 0.85 plateau survival across the range. Uniform collapses at $\alpha_0 \geq 10$.
  • Attended-channel precision $\kappa_\text{att} \in \{0.65, \ldots, 0.99\}$ at fixed K: monotone, saturating around $\kappa_\text{att} \geq 0.90$.
  • Budget $K \in \{1.5, \ldots, 4.0\}$: κ-attention beats uniform by 32–56 pp at every tested K. Both peak near the canonical $K = 2.60$; uniform declines sharply at high K (the planner’s policy-posterior collapses to a single greedy policy), while κ-attention preserves cross-channel asymmetry and avoids overcommitment.

Robustness sweep across α₀, κ_att, and K

Three orthogonal robustness slices. The gain is not the artefact of a particular operating point.

The non-stationarity sweep tells a similar story: the advantage holds in sign across tile-mutation rates of 0.02–0.10, though the magnitude collapses alongside absolute survival at the higher rates.

What this paper does not settle

A few things worth saying without the page-budget pressure of LNCS.

The result is selector-fixed, not actuation-site-identified. All experiments share a single need-aligned selector and vary where it routes. The claim is therefore that selective interoceptive precision is one sufficient implementation of homeostatic prioritisation, not that it is the only one. A matched-selector preference-reweighting comparator is the natural next experiment — and is the planned follow-up paper.

The K = 4 budget regime is a special case. At $K = 4$, the soft-budget cap binds ($\tilde\kappa_\text{un} \approx 1.03$ clips to 1, leaving residual budget unused), so the direction-of-asymmetry claim is cleanly identified only at the canonical $K = 2.60$ where no channel saturates. The K = 4 crossover is a regime where the test loses identifying power, not where the mechanism reverses.
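The clipping arithmetic behind this caveat is short. Assuming the attended channel stays at $\kappa_\text{att} = 0.90$ and the remainder is shared equally:

```latex
\tilde\kappa_\text{un} = \frac{K - \kappa_\text{att}}{3}
                       = \frac{4 - 0.90}{3} \approx 1.033 > 1
\;\Longrightarrow\;
\kappa_\text{un} = 1,
\qquad
\sum_{m=1}^{4} \kappa_m = 0.90 + 3 \times 1 = 3.90 < K = 4 .
```

Under those assumptions about 0.10 of the budget goes unused, and the three unattended channels end up slightly sharper than the attended one, so the intended asymmetry no longer holds in this regime.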

The anti-aligned-vs-baseline contrast is the weakest test. The κ-attention-vs-baseline result passes at $\alpha = 0.01$ with comfortable margin; the anti-aligned-vs-baseline result ($p_\text{adj} = 0.027$) clears $\alpha = 0.05$ only narrowly and warrants higher-N replication. The qualitative claim, that anti-aligned falls below uniform, is robust across selector variants and parameter sweeps, but the specific point estimate should be read with that caveat.

The κ router is hand-specified. Whether a learned router converges to the same need-aligned policy is the obvious follow-up. There is no theoretical reason it should not — the need-belief signal is in the agent’s posterior, freely available to any parametric router — but it is not what this paper tests.

Biology

The κ-allocation maps cleanly onto dynamic gain modulation of interoceptive cortex (Seth & Friston, 2016; Fermin et al., 2022), where interoceptive prediction errors gate cortical responsiveness in proportion to metabolic urgency. The cleanest biological correlate is the AgRP-neuron literature (Livneh et al., 2017): AgRP activity gates cue–resource association in rodent insular cortex on a timescale of 1–2 seconds, and the per-observation Dirichlet acceleration of the previous section maps onto exactly that gating window. Disruption of this gating — anterior insular lesion or interoceptive agnosia — predicts impaired homeostatic prioritisation, most severely during the early novel-environment phase where the κ-attention vs uniform gap is largest in this paper.

The prediction is sharp enough to be testable: per-observation interoceptive learning rates should be highest on the currently-most-needed channel, and that asymmetry should disappear under conditions that disrupt insular gating.

Reproducibility

Code, layout banks, configs, and analysis pipelines are at the anonymised snapshot anonymous.4open.science/r/attention-aif-sab2026-snapshot-C0E3 during review; the canonical (de-anonymised) repo will go live on acceptance. The snapshot is built from a single source commit via an idempotent leak-scanned manifest; the audit log of every build sits alongside it in the source repo. The full paper PDF and supplementary material are linked from the publications page.

A one-command smoke test reproduces a single-layout headline contrast in under ten minutes on a single GPU; the full 11-layout × 32-seed panel takes about three hours on the same hardware. The HPC sweep scripts are included.

What comes next

Three immediate follow-ups, in priority order:

  1. Matched-selector dynamic-C comparator. The cleanest test of the per-observation Dirichlet acceleration prediction. A properly-powered version of this experiment is the planned next paper, aimed at IWAI 2026 or AAAI 2027 depending on timing.
  2. Learned κ router. Replace the hand-specified arg-max selector with a small parametric router and ask whether it converges to need-alignment from the gradient signal alone.
  3. Forced-multi-need tier. The dynamic-vs-fixed-channel gap is largest in regimes where which channel dominates changes within a trial. The existing layout bank has only modest dominance shifts; a targeted layout tier where food and water pressure alternate within an episode would isolate the dynamic-selection benefit sharply.

If the matched-selector comparator falsifies the per-observation prediction (that is, if preference-reweighting reproduces the Dirichlet acceleration), the paper’s positioning changes substantially but the headline survival result stands. Either outcome is publishable; which framing survives is for the next paper to settle.


The paper

Interoceptive Attention as Dynamic Homeostatic Prioritization in a Foraging Agent. Grimbly, S. J. (2026). Submitted to SAB 2026 (Simulation of Adaptive Behaviour; Springer LNCS). Preprint and code at the anonymised snapshot.

Comments, corrections, and reproduction attempts welcome at askingwhy@stjohngrimbly.com.

St John
