ria-toolkit-oss/docs/agent_tx_implementation_plan.md

# Agent TX Streaming — Implementation Plan (`ria-toolkit-oss`)

**Scope:** Part A of [agent_tx_plan.md](./agent_tx_plan.md). This repo only.
**Goal:** Make the agent accept hub-originated TX control + binary IQ, stream it to the SDR in full duplex with RX, and enforce agent-local safety caps.
**Acceptance:** `pytest tests/agent/` green; `ria-agent stream --allow-tx` accepts a `tx_start` against MockSDR and round-trips binary frames to `_stream_tx`.

Each phase below lands independently. After every phase the existing agent tests must still pass (no regressions), and the phase's own new tests must be green.

---

## Preconditions

- `--allow-tx` is opt-in at CLI level. Default config has `tx_enabled=False`; the agent will reject all TX control frames from the hub.
- Pluto FDD: one `adi.Pluto` instance serves both RX and TX. We share the SDR between sessions keyed by `(device, identifier)`.
- Known pre-existing bug: [`sdr/pluto.py:151`](../src/ria_toolkit_oss/sdr/pluto.py#L151) sets `_rx_initialized = False` inside `init_tx`. Our streamer's RX path (`sdr.rx(n)`) does not read this flag, so FDD still works. Leave the bug for a separate follow-up; do not refactor Pluto in this plan.

## Glossary

- **Session** = a `(app_id, direction)` pair held by the agent: one `RxSession` or one `TxSession`.
- **Direction** = `"rx"` (agent → hub binary) or `"tx"` (hub → agent binary).
- **Shared SDR** = when the same `(device, identifier)` is referenced by an RX and TX session concurrently; both sessions hold the same driver instance.

---

## Phase 1 — WS binary ingress

**Why first:** protocol-plumbing only. No behavior change for existing RX, but unblocks every later phase.

### Touches

- `src/ria_toolkit_oss/agent/ws_client.py` — add optional `on_binary` callback to `WsClient.run`.
- `tests/agent/test_ws_client.py` — add a "server sends binary, handler receives" case.

### Shape

```python
# ws_client.py
BinaryHandler = Callable[[bytes], Awaitable[None]]

async def run(
    self,
    on_message: MessageHandler,
    heartbeat: HeartbeatBuilder,
    on_binary: BinaryHandler | None = None,   # NEW, default preserves old behavior
) -> None:
    ...
    async for raw in self._ws:
        if isinstance(raw, bytes):
            if on_binary is not None:
                try:
                    await on_binary(raw)
                except Exception:
                    logger.exception("on_binary handler raised; dropping frame")
            else:
                logger.debug("Discarding unexpected %d-byte binary frame", len(raw))
            continue
        # ... existing JSON dispatch unchanged
```

### Acceptance

- New test: local `websockets` server pushes a binary frame after JSON handshake → handler sees exact bytes.
- Existing `test_ws_client.py` cases still pass with `on_binary=None`.

---

## Phase 2 — Config + CLI TX opt-in

**Why second:** small, isolated, and gives the rest of the phases a real `AgentConfig.tx_enabled` / caps to read.

### Touches

- `src/ria_toolkit_oss/agent/config.py`
- `src/ria_toolkit_oss/agent/cli.py`
- `tests/agent/test_config.py`
- new `tests/agent/test_cli_tx.py`

### config.py

Extend the dataclass and preserve backward-compat for old JSON files (the existing `extra` trick already handles unknown keys, but we want these fields promoted to first-class):

```python
@dataclass
class AgentConfig:
    hub_url: str = ""
    agent_id: str = ""
    token: str = ""
    name: str = ""
    insecure: bool = False
    api_key: str = ""
    # NEW — TX interlocks
    tx_enabled: bool = False
    tx_max_gain_db: float | None = None
    tx_max_duration_s: float | None = None
    tx_allowed_freq_ranges: list[list[float]] | None = None   # JSON-friendly list-of-lists
    extra: dict = field(default_factory=dict)
```

Update `load()` to pull the new fields and `save()` to emit them. Preserve the `0o600` chmod behavior.

### cli.py

Two entry points need flags:

```
ria-agent register --hub ... --api-key ...
    [--allow-tx]
    [--tx-max-gain-db VALUE]
    [--tx-max-duration-s VALUE]
    [--tx-freq-range LO HI]      # repeatable: --tx-freq-range 2.4e9 2.5e9 --tx-freq-range 5.7e9 5.8e9
```

```
ria-agent stream
    [--allow-tx]                  # runtime override: sets cfg.tx_enabled for this process only
```

In `_cmd_register`: after successful server registration, populate `cfg.tx_enabled=bool(args.allow_tx)` and caps from argparse before `_config.save(cfg)`.

In `_cmd_stream`: `if args.allow_tx: cfg.tx_enabled = True` (before passing `cfg` to the streamer — which requires plumbing `cfg` in, see Phase 3).

### Acceptance

- `test_config.py` round-trip: new fields serialize → deserialize cleanly; missing fields in old JSON default correctly.
- `test_cli_tx.py`: `register --allow-tx --tx-max-gain-db -10` writes expected JSON; `stream --allow-tx` sets runtime flag without touching disk.

---

## Phase 3 — Streamer refactor: session model (RX behavior preserved)

**Why third:** the TX work needs a session-based state machine. Doing the refactor before wiring TX keeps the diff reviewable and keeps the RX regression surface contained.

**Goal:** replace the flat state (`self._sdr`, `self._app_id`, `self._capture_task`, `self._pending_config`, `self._status`) with explicit session objects and an SDR registry, without changing any observable RX behavior.

### Touches

- `src/ria_toolkit_oss/agent/streamer.py` (bulk of work)
- `src/ria_toolkit_oss/agent/hardware.py` (heartbeat grows `capabilities` + optional `sessions` snapshot)
- `src/ria_toolkit_oss/agent/cli.py` (plumb `cfg` into the streamer)
- `tests/agent/test_streamer.py` + `test_hardware.py` — update for new heartbeat shape; keep all RX assertions.

### Data model

```python
from dataclasses import dataclass, field

@dataclass
class RxSession:
    app_id: str
    sdr: Any
    device_key: tuple[str, str | None]       # (device, identifier)
    buffer_size: int
    task: asyncio.Task
    pending_config: dict = field(default_factory=dict)

@dataclass
class TxSession:
    app_id: str
    sdr: Any
    device_key: tuple[str, str | None]
    buffer_size: int
    queue: queue.Queue                        # thread-safe; bytes -> np.complex64 buffers
    stop_event: threading.Event
    task: asyncio.Task                        # wraps run_in_executor(sdr._stream_tx, ...)
    underrun_policy: str = "pause"
    pending_config: dict = field(default_factory=dict)
    last_buffer: np.ndarray | None = None     # for "repeat" policy
    started_at: float = 0.0
    max_duration_s: float | None = None
```

### SDR registry (ref-counted)

```python
class _SdrRegistry:
    def __init__(self, factory):
        self._factory = factory        # (device, identifier) -> SDR
        self._instances: dict[tuple[str, str|None], tuple[Any, int]] = {}
        self._lock = threading.Lock()

    def acquire(self, device: str, identifier: str | None):
        key = (device, identifier)
        with self._lock:
            if key in self._instances:
                sdr, rc = self._instances[key]
                self._instances[key] = (sdr, rc + 1)
                return sdr, key
            sdr = self._factory(device, identifier)
            self._instances[key] = (sdr, 1)
            return sdr, key

    def release(self, key: tuple[str, str|None]) -> bool:
        with self._lock:
            sdr, rc = self._instances[key]
            if rc <= 1:
                del self._instances[key]
                return True      # caller should close()
            self._instances[key] = (sdr, rc - 1)
            return False
```

### Streamer state

```python
class Streamer:
    def __init__(self, ws, cfg: AgentConfig, sdr_factory=None):
        self.ws = ws
        self._cfg = cfg
        self._registry = _SdrRegistry(sdr_factory or _default_sdr_factory)
        self._rx: RxSession | None = None
        self._tx: TxSession | None = None
```

### Message dispatch

```python
async def on_message(self, msg: dict) -> None:
    t = msg.get("type")
    handlers = {
        "start":        self._handle_rx_start,
        "stop":         self._handle_rx_stop,
        "configure":    self._handle_rx_configure,
        # TX handlers stubbed here in Phase 3, implemented in Phase 4
        "tx_start":     self._handle_tx_start,
        "tx_stop":      self._handle_tx_stop,
        "tx_configure": self._handle_tx_configure,
    }
    handler = handlers.get(t)
    if handler is None:
        logger.warning("Unknown server message type: %r", t)
        return
    await handler(msg)
```

Rename internals: `_handle_start → _handle_rx_start`, `_handle_stop → _handle_rx_stop`, etc. Behavior unchanged — just reading/writing `self._rx` in place of the old flat attributes, and going through the registry for acquire/release.

### Heartbeat

```python
# streamer.py
def build_heartbeat(self) -> dict:
    status = "streaming" if (self._rx or self._tx) else "idle"
    sessions: dict = {}
    if self._rx: sessions["rx"] = {"app_id": self._rx.app_id, "state": "streaming"}
    if self._tx: sessions["tx"] = {"app_id": self._tx.app_id, "state": self._tx_state()}
    return heartbeat_payload(
        status=status,
        app_id=(self._rx or self._tx).app_id if (self._rx or self._tx) else None,
        cfg=self._cfg,
        sessions=sessions or None,
    )
```

Update `hardware.heartbeat_payload` to take `cfg` (for `capabilities`/`tx_enabled`) and optional `sessions`. Keep unknown-arg compatibility — existing tests can pass `cfg=AgentConfig()` to get the old shape minus the new fields.

### Phase 3 acceptance

- All existing `test_streamer.py` / `test_integration.py` / `test_hardware.py` cases pass, with the heartbeat additions asserted in `test_hardware.py` (capabilities = `["rx"]` when `tx_enabled=False`).
- New test: two `start` messages in sequence with same `(device, identifier)` both succeed without recreating the SDR (registry hit). (This is a Phase 3 bonus — confirms the registry works before TX consumes it.)
- New test: `tx_start` with `tx_enabled=False` returns `tx_status: error` (handler stubs can do just this much in Phase 3, full implementation lands in Phase 4).

---

## Phase 4 — TX implementation

**Why fourth:** now that binary arrives, config exists, and sessions exist, wire up real TX.

### Touches

- `src/ria_toolkit_oss/agent/streamer.py`
- Potentially a small helper module `src/ria_toolkit_oss/agent/_tx_loop.py` if streamer.py gets unwieldy.
- `tests/agent/test_streamer_tx.py`, `test_tx_safety.py`, `test_tx_underrun.py`, `test_full_duplex.py`

### Binary ingress

```python
async def on_binary(self, data: bytes) -> None:
    if self._tx is None:
        logger.debug("Dropping %d-byte binary frame: no TX session", len(data))
        return
    try:
        self._tx.queue.put(data, timeout=2.0)   # backpressure: block if full
    except queue.Full:
        logger.warning("TX queue stalled; dropping frame (agent side)")
```

Wire this in via `ws.run(..., on_binary=self.on_binary)` — change `run_streamer()`'s `ws.run` call accordingly.

### `_handle_tx_start`

```python
async def _handle_tx_start(self, msg: dict) -> None:
    app_id = msg.get("app_id") or ""
    cfg_radio = dict(msg.get("radio_config") or {})

    # 1) interlocks
    if not self._cfg.tx_enabled:
        return await self._send_tx_error(app_id, "tx disabled on this agent")
    gain = cfg_radio.get("tx_gain")
    if self._cfg.tx_max_gain_db is not None and gain is not None and float(gain) > self._cfg.tx_max_gain_db:
        return await self._send_tx_error(app_id, f"tx_gain {gain} exceeds cap {self._cfg.tx_max_gain_db}")
    freq = cfg_radio.get("tx_center_frequency")
    if self._cfg.tx_allowed_freq_ranges and freq is not None:
        if not any(lo <= float(freq) <= hi for lo, hi in self._cfg.tx_allowed_freq_ranges):
            return await self._send_tx_error(app_id, f"tx_center_frequency {freq} outside allowed ranges")

    if self._tx is not None:
        return await self._send_tx_error(app_id, "tx already active on this agent")

    # 2) device
    device = cfg_radio.pop("device", None)
    identifier = cfg_radio.pop("identifier", None)
    buffer_size = int(cfg_radio.pop("buffer_size", 1024))
    underrun_policy = cfg_radio.pop("underrun_policy", "pause")
    if not device:
        return await self._send_tx_error(app_id, "tx_start missing radio_config.device")

    try:
        sdr, device_key = self._registry.acquire(device, identifier)
        _apply_sdr_config(sdr, cfg_radio)          # sets tx_* attributes via alias map
        # explicit init_tx if the driver supports it
        if hasattr(sdr, "init_tx"):
            sdr.init_tx(
                sample_rate=cfg_radio.get("tx_sample_rate"),
                center_frequency=cfg_radio.get("tx_center_frequency"),
                gain=cfg_radio.get("tx_gain"),
                channel=cfg_radio.get("tx_channel", 0),
                gain_mode=cfg_radio.get("tx_gain_mode", "manual"),
            )
    except Exception as exc:
        self._registry.release(device_key)
        logger.exception("Failed to init TX on %r", device)
        return await self._send_tx_error(app_id, f"tx init failed: {exc}")

    # 3) build session + launch loop
    self._tx = TxSession(
        app_id=app_id,
        sdr=sdr,
        device_key=device_key,
        buffer_size=buffer_size,
        queue=queue.Queue(maxsize=8),
        stop_event=threading.Event(),
        task=None,   # filled below
        underrun_policy=underrun_policy,
        max_duration_s=self._cfg.tx_max_duration_s,
        started_at=time.monotonic(),
    )
    loop = asyncio.get_running_loop()
    self._tx.task = loop.run_in_executor(None, self._tx_executor_body)

    await self._send_tx_status(app_id, "armed")
    # streamer transitions to "transmitting" on the first buffer consumed in the thread;
    # schedule a tiny watchdog that emits that status when queue count rises.
```

### TX executor body

Runs in a worker thread. Blocks in the SDR's `_stream_tx` driven by our callback that pulls from the queue.

```python
def _tx_executor_body(self) -> None:
    sdr = self._tx.sdr
    try:
        sdr._stream_tx(self._tx_callback)
    except Exception:
        logger.exception("TX stream crashed")
        # surface via asyncio side
        asyncio.run_coroutine_threadsafe(
            self._send_tx_status(self._tx.app_id, "error", "stream crashed"),
            asyncio.get_event_loop(),
        )

def _tx_callback(self, num_samples):
    tx = self._tx
    if tx is None or tx.stop_event.is_set():
        sdr = tx.sdr if tx else None
        if sdr is not None:
            sdr.pause_tx()
        return _silence(num_samples)

    # duration watchdog
    if tx.max_duration_s is not None and (time.monotonic() - tx.started_at) > tx.max_duration_s:
        tx.stop_event.set()
        tx.sdr.pause_tx()
        _schedule(self._send_tx_status(tx.app_id, "done", "max duration reached"))
        return _silence(num_samples)

    try:
        raw = tx.queue.get(timeout=0.1)
    except queue.Empty:
        return self._underrun_fill(tx, num_samples)

    samples = np.frombuffer(raw, dtype=np.float32)
    # interleaved float32 -> complex64
    if samples.size % 2 != 0 or samples.size // 2 != num_samples:
        # malformed / wrong-sized frame; underrun this cycle
        logger.warning("TX frame size mismatch: got %d floats, expected %d", samples.size, num_samples * 2)
        return self._underrun_fill(tx, num_samples)
    complex_samples = samples.reshape(-1, 2).view(np.complex64).reshape(-1)
    tx.last_buffer = complex_samples
    return complex_samples
```

Helper `_underrun_fill`:

```python
def _underrun_fill(self, tx: TxSession, num_samples: int):
    if tx.underrun_policy == "zero":
        return np.zeros(num_samples, dtype=np.complex64)
    if tx.underrun_policy == "repeat" and tx.last_buffer is not None:
        return tx.last_buffer[:num_samples] if tx.last_buffer.size >= num_samples \
            else np.concatenate([tx.last_buffer,
                                 np.zeros(num_samples - tx.last_buffer.size, dtype=np.complex64)])
    # "pause" (default)
    tx.stop_event.set()
    tx.sdr.pause_tx()
    _schedule(self._send_tx_status(tx.app_id, "underrun"))
    return np.zeros(num_samples, dtype=np.complex64)
```

`_schedule()` is a tiny wrapper around `asyncio.run_coroutine_threadsafe` that resolves the loop once at streamer construction.

### `_handle_tx_stop`

```python
async def _handle_tx_stop(self, msg: dict) -> None:
    tx = self._tx
    if tx is None:
        return
    tx.stop_event.set()
    tx.sdr.pause_tx()
    # drain the queue so the executor thread wakes
    try:
        while True:
            tx.queue.get_nowait()
    except queue.Empty:
        pass
    # wait up to ~1s for the executor thread to finish
    if tx.task is not None:
        try:
            await asyncio.wait_for(asyncio.wrap_future(tx.task), timeout=1.0)
        except asyncio.TimeoutError:
            logger.warning("TX executor did not exit within 1s after stop")

    # release SDR reference
    should_close = self._registry.release(tx.device_key)
    if should_close:
        try:
            tx.sdr.close()
        except Exception:
            logger.exception("Error closing SDR on tx_stop")

    self._tx = None
    await self._send_tx_status(msg.get("app_id") or "", "done")
```

### `_handle_tx_configure`

```python
async def _handle_tx_configure(self, msg: dict) -> None:
    if self._tx is None:
        return
    self._tx.pending_config.update(msg.get("radio_config") or {})
```

Consume `pending_config` at the top of `_tx_callback` before pulling from the queue (same pattern as RX's `_capture_loop`), using `_apply_sdr_config` with tx aliases.

### `_apply_sdr_config` — extend alias map

```python
_CONFIG_ATTR_MAP = {
    # existing RX aliases...
    "sample_rate":      ("sample_rate", "rx_sample_rate"),
    "center_frequency": ("center_freq", "rx_center_frequency"),
    "gain":             ("gain", "rx_gain"),
    "bandwidth":        ("bandwidth", "rx_bandwidth"),
    # NEW TX aliases
    "tx_sample_rate":      ("tx_sample_rate",),
    "tx_center_frequency": ("tx_center_frequency", "tx_lo"),
    "tx_gain":             ("tx_gain",),
    "tx_bandwidth":        ("tx_bandwidth",),
}
```

Pluto has `set_tx_sample_rate`, `set_tx_center_frequency`, `set_tx_gain` — those are called by `init_tx` using the attribute values, so setting attributes via `_apply_sdr_config` + calling `init_tx` is sufficient.

### Phase 4 acceptance

- `test_streamer_tx.py`: full happy path — `tx_start` against MockSDR → 3 binary frames → verify `_stream_tx` callback received them in order → `tx_stop` → session cleared, SDR closed.
- `test_tx_safety.py`: one test per cap — `tx_enabled=False`, gain cap, freq range, duplicate session. Each produces a `tx_status: error` JSON; registry shows zero outstanding acquires.
- `test_tx_underrun.py`: three tests — `pause` (session ends, `underrun` emitted), `zero` (callback returns zeros, no status change), `repeat` (callback returns last buffer).
- `test_full_duplex.py`: against MockSDR, send `start` + `tx_start` with same `(device=mock, identifier=None)` → registry ref-count = 2 → both sessions stream independently → stop one, other still runs → stop second, SDR closed.

---

## Phase 5 — Integration + docs

**Touches:**

- `tests/agent/test_integration_tx.py` — end-to-end with a real local `websockets` server + MockSDR. Mirror `test_integration.py`'s shape: register → heartbeat with `tx_enabled=True` → tx_start → 3 binary frames → tx_stop.
- `docs/agent_tx_protocol.md` — short, user-facing: message types, binary format, heartbeat additions, interlock config, CLI examples. Link from [screens_agent_handoff.md](./screens_agent_handoff.md).
- `README.md` (if it mentions agent subcommands) — add `--allow-tx` usage.

**Real-Pluto smoke test** (manual, not in CI):

1. `ria-agent register --hub http://hub:3005 --api-key KEY --allow-tx --tx-max-gain-db -10 --tx-freq-range 2.4e9 2.5e9`
2. `ria-agent stream`
3. From a Python REPL with `websockets`, open the hub WS on the agent's behalf (bypass hub during dev), send a `tx_start` + binary frames of a 1kHz tone → confirm carrier on a spectrum analyzer at the configured frequency.

---

## File-by-file summary

| File | Phase | Change |
|---|---|---|
| `src/ria_toolkit_oss/agent/ws_client.py` | 1 | Add `on_binary` callback. |
| `src/ria_toolkit_oss/agent/config.py` | 2 | Add `tx_enabled`, `tx_max_gain_db`, `tx_max_duration_s`, `tx_allowed_freq_ranges`. |
| `src/ria_toolkit_oss/agent/cli.py` | 2, 4 | `--allow-tx` + cap flags on `register`; `--allow-tx` on `stream`; plumb `cfg` into `Streamer`. |
| `src/ria_toolkit_oss/agent/hardware.py` | 3 | `heartbeat_payload(cfg, sessions)` with `capabilities`, `tx_enabled`. |
| `src/ria_toolkit_oss/agent/streamer.py` | 3, 4 | Session refactor, SDR registry, TX dispatch, TX loop, underrun fills, `_apply_sdr_config` TX aliases. |
| `src/ria_toolkit_oss/agent/_tx_loop.py` | 4 (opt) | Extracted TX callback helpers if streamer.py > ~400 lines. |
| `tests/agent/test_ws_client.py` | 1 | Binary-frame case. |
| `tests/agent/test_config.py` | 2 | Round-trip new fields. |
| `tests/agent/test_cli_tx.py` | 2 | New — `--allow-tx` flag handling. |
| `tests/agent/test_hardware.py` | 3 | Heartbeat `capabilities` + `sessions`. |
| `tests/agent/test_streamer.py` | 3 | Refactor for session model; RX assertions unchanged. |
| `tests/agent/test_streamer_tx.py` | 4 | New — TX happy path. |
| `tests/agent/test_tx_safety.py` | 4 | New — cap enforcement. |
| `tests/agent/test_tx_underrun.py` | 4 | New — pause/zero/repeat policies. |
| `tests/agent/test_full_duplex.py` | 4 | New — shared SDR ref count. |
| `tests/agent/test_integration_tx.py` | 5 | New — real `websockets` server E2E. |
| `docs/agent_tx_protocol.md` | 5 | New — operator-facing protocol doc. |

---

## Implementation gotchas (do not skip)

1. **Asyncio ↔ thread bridge.** The SDR's `_stream_tx` is synchronous and runs in an executor thread. Its callback must not `await`. Use `queue.Queue` (thread-safe) for inbound buffers and `asyncio.run_coroutine_threadsafe(coro, loop)` to emit `tx_status` from inside the thread. Resolve `loop` once at streamer construction; don't call `get_event_loop()` from the thread.

2. **`sdr.pause_tx()` from inside the callback.** Pluto's `_stream_tx` loop condition is `while self._enable_tx is True`. Calling `pause_tx()` inside the callback sets `_enable_tx = False` so the NEXT iteration exits. That's fine — it may emit one trailing zero-filled buffer. Document this; don't try to exit mid-callback.

3. **Queue drain on stop.** When `_handle_tx_stop` sets `stop_event` and pauses TX, the executor thread may still be blocked in `queue.get(timeout=0.1)`. Draining the queue does not unblock a timed get. Rely on the 100ms timeout; the thread exits on the next iteration. Don't try to clever-inject a poison pill.

4. **Interleaved float32 → complex64 conversion.** `np.frombuffer(buf, dtype=np.float32).view(np.complex64)` is zero-copy and correct when `buf.size` is a multiple of 8 bytes. Validate size first; mismatched size = underrun for that cycle, don't crash the thread.

5. **MockSDR's `_stream_tx`** ([sdr/mock.py:96-100](../src/ria_toolkit_oss/sdr/mock.py#L96-L100)) calls `callback(self.rx_buffer_size)` — it passes a *size*, not samples. The TX callback contract is "I am given `num_samples`, I return that many complex64 samples." `test_streamer_tx` must respect this: the test's `sdr.tx_buffer_size` (if used) doesn't affect what the callback receives from mock. Simplest path: set `MockSDR.rx_buffer_size = buffer_size` in the test harness before `_stream_tx` is invoked, so the TX callback receives the right size.

6. **`init_tx` on MockSDR vs Pluto.** MockSDR's `init_tx` [sets attributes and flips `_tx_initialized = True`](../src/ria_toolkit_oss/sdr/mock.py#L70-L81). Pluto's does the same plus `_rx_initialized = False` (the FDD bug). For full-duplex tests we currently target MockSDR only — Pluto FDD will work because our RX path ignores `_rx_initialized`, but the real-Pluto smoke test is the only validation. Call that out in the PR description.

7. **Don't block the event loop.** `asyncio.wait_for(asyncio.wrap_future(tx.task), timeout=1.0)` in `_handle_tx_stop` is non-blocking from the loop's perspective — the 1s cap prevents a misbehaving driver from stalling heartbeat/RX.

8. **Heartbeat during TX.** The existing heartbeat loop runs on a 30s timer. Sessions snapshot is cheap; no locking needed if we read `self._rx`/`self._tx` references atomically (Python ref swap is GIL-safe for single field reads).

---

## Rollout

1. Open a single PR per phase (1 → 2 → 3 → 4 → 5), each green on its own.
2. Phase 3 is the riskiest diff (RX refactor). Get a second reviewer if possible; the regression surface is all of current RX behavior.
3. After Phase 4 merges, `ria-agent stream --allow-tx` is a usable toy — you can hand-drive it from a Python REPL with `websockets` to validate against real hardware before the hub side is ready.
4. Phase 5 closes the loop and ships the user-facing docs.

## Out of scope (explicit)

- **Multi-app-per-agent** — one RX + one TX per agent in v1. Adding session IDs to binary frames is a v2 protocol bump.
- **Other TX drivers** (HackRF, USRP, bladeRF) — wiring `_CONFIG_ATTR_MAP` entries and verifying `_stream_tx` behavior per-driver. Tackle when the hub has an operator that targets them.
- **Resampling / clock drift** — agent treats the hub-supplied samples as authoritative. Drift manifests as underruns; the underrun policy is the only mitigation.
- **Fixing Pluto's `init_tx` `_rx_initialized = False` reset** — pre-existing, not triggered by our RX path, left for a separate cleanup.