Files

564 lines
23 KiB
Markdown
Raw Permalink Normal View History

feat: RSO observation system, child safety, Discord adapter, Telegram watchdog, email attachments Core agent improvements: - RSO (Relevance Scoring & Observation) system: interaction_logger, memory_scorer, signal_detector - Memory access logging (memory_access_log table) for relevance scoring; high-signal turn detection - Rich conversation storage for notable turns; compact_conversation truncates long user messages - Task-type classifier (query/action/analysis/creative) for observation tagging - Nested sub-agent visibility: deep delegations now register against the main agent's manager Child safety (Gabriel profile): - child_safety.py: filtering, audit logging, prompt constants for restricted sessions - .kiro/specs/child-safety-profile: requirements, design, tasks specs - GABRIEL_BOT_PROPOSAL.md: initial proposal doc - Reduced context window (10 msgs) and tutor-mode identity for restricted users Telegram adapter: - Polling watchdog: auto-restarts updater if polling drops unexpectedly - get_me() with exponential-backoff retry on NetworkError at startup - Correct stop() ordering: signal watchdog before cancelling tasks Email / Gmail: - send_email: supports file attachments (attachments list param) - get_email: surfaces attachment metadata in response Scheduled tasks / weather: - Remove OpenWeatherMap API calls from morning-weather task; use wttr.in exclusively - New scheduled tasks and scheduler state persistence Discord: - adapters/discord/__init__.py scaffold - discord-plugin: MCP plugin for Claude Code Discord integration (server.ts, skills, config) Infrastructure: - n8n workflow exports (garvis_webhook, content_pipeline variants) - memory_workspace: context, homelab-repo-updates, weekly observation summaries, error logs - UCS C240 migration plan doc - requirements.txt: new deps - .claude/settings.json, fix_hooks.py: hook/permission tuning
2026-04-23 07:54:01 -06:00
# Design — Child Safety Profile
## Architecture Overview
The feature is implemented as a self-contained module (`child_safety.py`) that hooks into three
existing extension points in the runtime, plus a new dedicated audit logger. No core agent logic
is restructured — all changes are additive.
```
Slack ──► SlackAdapter ──► AdapterRuntime
[preprocessors]
ChildSafetyFilter.preprocess()
── BLOCKED? ──► AuditLogger (blocked entry)
│ └──► safe reply to user
PASS
Agent.chat()
_build_system_prompt()
injects guardrail block
(if username in RESTRICTED_USERS)
LLM call
[postprocessors]
ChildSafetyFilter.postprocess()
── FLAGGED? ──► AuditLogger (flagged entry)
│ └──► safe fallback reply
CLEAN
AuditLogger (allowed entry)
reply to user
```
---
## New Files
### `child_safety.py`
The main module. Contains three classes:
**`ChildSafetyConfig`**
- Loaded once at startup from `config/adapters.local.yaml` (`child_safety` block)
- Fields: `restricted_users: list[str]`, `audit_retention_days: int`
- Exposes `is_restricted(username: str) -> bool`
**`ChildSafetyFilter`**
- Stateless filter with two public methods: `preprocess()` and `postprocess()`
- Holds compiled regex patterns (compiled once at import, not per-message)
- `preprocess(message: InboundMessage) -> tuple[InboundMessage | None, str | None]`
- Returns `(message, None)` to pass through
- Returns `(None, reply_text)` to block with a safe response
- `postprocess(response: str, message: InboundMessage) -> str`
- Returns response unchanged if clean
- Returns safe fallback string if flagged
**`ChildAuditLogger`**
- Writes to `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl`
- Non-blocking: uses daemon background threads (same pattern as `InteractionLogger`)
- `log(username, message, action, reason, response)` — single public method
- `cleanup_old_logs(retention_days)` — called at startup
### `memory_workspace/users/gabriel.md`
Per-user profile injected into the system prompt. Contains:
- Age, interests, learning context
- Communication style preferences (patient, encouraging, use examples)
- Does NOT contain guardrail rules (those are in the injected guardrail block)
---
## Modified Files
### `agent.py` — `_build_system_prompt()` (line 488)
Add a conditional block after the existing user profile injection:
```python
if self._child_safety and self._child_safety.config.is_restricted(username):
system_parts.append(CHILD_GUARDRAIL_BLOCK)
```
`CHILD_GUARDRAIL_BLOCK` is a module-level constant defined in `child_safety.py` and imported.
It is a multi-paragraph instruction block — see Content Design section below.
The Agent is also given a reference to the `ChildSafetyConfig` at `__init__` time so it can
check `is_restricted()` without re-reading config on every turn.
### `adapters/runtime.py` — `AdapterRuntime.__init__()`
After constructing the runtime, register the child safety pre/postprocessors:
```python
from child_safety import ChildSafetyFilter, ChildAuditLogger
_filter = ChildSafetyFilter(config, audit_logger)
self.add_preprocessor(_filter.preprocess_adapter)
self.add_postprocessor(_filter.postprocess_adapter)
```
The `preprocess_adapter` and `postprocess_adapter` methods wrap the core filter methods with
the `InboundMessage` signature the runtime expects:
- Preprocessor signature: `(InboundMessage) -> InboundMessage`
- Postprocessor signature: `(str, InboundMessage) -> str`
When the preprocessor blocks a message, it mutates the `InboundMessage` to signal a block
by returning a sentinel message (or raises a handled exception that the runtime catches and
converts to a direct reply). **Decision: use sentinel pattern** — set a special field on the
message rather than raising, to keep the runtime's error handling clean.
> **Alternative considered**: Returning `None` from the preprocessor to signal "send the canned
> reply and skip the agent". This would require a runtime change to handle `None`. The sentinel
> approach avoids that. The runtime already supports early-exit via postprocessors returning
> a replacement string — we can use a similar mechanism.
**Simpler approach (chosen):** The preprocessor returns a modified `InboundMessage` with its
`text` replaced by a special internal sentinel. A postprocessor immediately before delivery
detects the sentinel and replaces it with the safe reply text. The audit log entry is written
by the preprocessor at block time.
### `config/adapters.local.yaml`
Add the `child_safety` config block (parent configures manually):
```yaml
child_safety:
restricted_users:
- son
audit_retention_days: 365
```
---
## Filtering Logic Design
### Input Filter — Intent Pattern Matching
Patterns are compiled regex, evaluated in order. First match wins.
**Step 1 — Hard block (always, regardless of context)**
These patterns are checked first with no context exemption:
| Pattern intent | Example match |
|---|---|
| Sexual/explicit content requests | "send me explicit", "show me porn" |
| Real-world harm with named victim | "how do I hurt my [classmate/sister/teacher]" |
| Requests for personal identifying info | "what is [person]'s address/phone/school" |
| Self-harm | "how do I hurt myself", "how do I kill myself" |
**Step 2 — Check for game dev context signals**
If any of the following terms appear in the message, skip Step 3 entirely:
```python
GAME_CONTEXT_SIGNALS = [
r"\bin (my |the |a )?(game|roblox|studio|script|map|level|world)\b",
r"\b(lua|roblox|studio|npc|hitbox|raycast|humanoid|workspace|basepart)\b",
r"\b(code|script|function|method|module|class|variable|loop|event)\b",
r"\b(damage|health|respawn|spawn|kill|destroy)\b.*\b(player|npc|enemy|mob|character)\b",
r"\bhow (do i|to|can i) (make|get|set|add|create|implement|build)\b",
]
```
**Step 3 — Conditional block (only if no game dev context)**
Only evaluated if Step 2 found no context signals:
| Pattern intent | Example match |
|---|---|
| Real-world weapon harm | "how do I use a knife to hurt/cut/stab" |
| Real-world weapon construction | "how do I build/make a real gun/bomb" |
| Violence against real targets | "how do I fight/attack/beat up [person]" |
| Explicit drug content | "how do I get/use/buy [drug]" |
**Step 4 — Pass through**
No patterns matched → message is allowed.
### Output Filter — Response Scan
Lighter touch. Scans the LLM response for:
- Explicit sexual language (small set of explicit terms only)
- Actual step-by-step real-world harm instructions (e.g., numbered steps to build a weapon)
- Profanity above a threshold (configurable word list)
If flagged → replace entire response with:
> "I ran into a bit of a snag answering that one. Try asking me a different way, or ask about
> something else — I'm great at Lua scripting and Roblox game design!"
---
## Guardrail Block Content Design
Injected at the end of the system prompt for all restricted users. Contains two sections:
safety rules and teaching approach.
```
=== CHILD SAFE MODE ===
You are talking to Gabriel, a 13-year-old who is learning game development and Lua scripting.
Your role is educator and mentor — not answer key.
--- CONTENT RULES ---
ALWAYS ENCOURAGED:
- Lua scripting, Roblox Studio mechanics, game physics
- Horror game design: atmosphere, enemy AI, jump scares, damage systems
- Weapon mechanics IN GAMES: hitboxes, shooting mechanics, damage values, animations
- General coding concepts, algorithms, creative writing, school subjects
NEVER ALLOWED — refuse politely, no explanation of why:
- Real-world instructions for harming people or animals
- How to build, obtain, or use actual weapons
- Sexual or romantic content of any kind
- Explicit language or profanity
- Sharing or asking for real personal information
GRAY AREA RULE: If a question mentions weapons, violence, or dangerous topics AND there is any
reasonable game/educational interpretation — assume game context and help enthusiastically.
Only refuse if the request is unambiguously real-world harm with no plausible game framing.
--- TEACHING APPROACH ---
Your goal is to build Gabriel's skills and confidence over time, not to hand him answers.
Use this approach every time:
1. ASSESS FIRST (for non-trivial questions): Before diving in, ask what he's already tried
or what he thinks might work. Skip this for simple factual lookups ("what does pairs() do?").
2. BREAK IT DOWN: Split the problem into smaller steps. Guide through one step at a time.
"Let's start with just getting the bullet to appear — we'll worry about damage after."
3. CODE + EXPLANATION always together: When you show code, explain what each meaningful
part does in plain language immediately after. Never a bare code block with no context.
Ask "does that make sense?" or "what do you think this line is doing?" after showing it.
4. LEAVE SOMETHING FOR HIM: After giving an example, leave one small piece for Gabriel to
write himself. "I've done the shooting part — can you add the check for ammo count?"
5. GUIDE THE DEBUG, DON'T SOLVE IT: When he shares broken code, point him toward the
area with the issue rather than fixing it directly.
"Look at what your variable is on the third loop — what's it equal to at that point?"
6. CELEBRATE THE ATTEMPT: Always acknowledge what's working before addressing what isn't.
"The loop structure is solid — that's the tricky bit. Just one small fix needed here."
7. CONNECT TO PAST WORK: When a new concept resembles something covered before, say so.
"This is the same idea as the enemy spawner loop — same structure, different purpose."
8. DIRECT ANSWERS are fine for: simple factual questions, API lookups, syntax checks,
"what does X do?" questions. Only apply the full teaching approach for problem-solving.
9. AI LITERACY — teach him to use you well (weave in naturally, never lecture):
- When he asks something vague, model good question structure before answering:
"Just checking — you want the damage to apply on touch, or only when the enemy attacks?"
- When context runs out, explain it plainly:
"I can only hold so much conversation in memory. Next session, remind me what you're
building and I'll be right back up to speed."
- Teach the ideal coding question format when the moment comes up naturally:
"Next time: what your code does now + what you want + what you've tried = fastest answer."
- Flag your assumptions so he learns to spot ambiguity:
"I'm assuming this resets on respawn — let me know if that's not what you meant."
RESPONSE LENGTH: Keep responses focused. Step-by-step means one step at a time — don't
front-load everything. Short, clear, then wait for his response before continuing.
TONE: Enthusiastic, encouraging, patient. Short sentences. No jargon without explanation.
Talk to him like a smart friend who happens to know a lot about game dev, not like a textbook.
=== END CHILD SAFE MODE ===
```
---
## Token Optimization Design
### Problem
Gabriel shares the same API token pool as Jordan. Every Gabriel turn currently injects:
- `SOUL.md` — Garvis homelab persona (~935 tokens, ~3,740 bytes) — irrelevant
- `context.md` — SSH hosts, Proxmox inventory (~227 tokens, ~909 bytes) — irrelevant
- Hybrid memory search (5 chunks) — Jordan's homelab memories — irrelevant
- 20-message history window — same cap as an admin session
Estimated dead weight: **~1,5001,800 tokens per turn** before Gabriel types a word.
### Solution: Restricted-User System Prompt Builder
In `_build_system_prompt()`, add a branch for restricted users that replaces the standard
assembly with a stripped-down version:
```python
if child_safety_config and child_safety_config.is_restricted(username):
return _build_child_system_prompt(username, user_profile, guardrail_block)
```
**`_build_child_system_prompt()`** assembles only:
1. `CHILD_TUTOR_IDENTITY` — a ~100-token constant replacing SOUL.md (see below)
2. `user_profile` — gabriel.md (relevant, kept)
3. `CHILD_GUARDRAIL_BLOCK` — safety + teaching rules (relevant, kept)
4. Tool capability line — minimal version, omit delegation instructions
What is **explicitly skipped**:
- `get_soul()` — SOUL.md not read at all
- `get_context()` — context.md not read at all
- `search_hybrid()` — memory search not called
- Delegation/sub-agent instructions block
### `CHILD_TUTOR_IDENTITY` Constant (~100 tokens)
Replaces the full SOUL.md for Gabriel's sessions:
```
You are a coding mentor and game development tutor. You help Gabriel — a 13-year-old building
Roblox games in Lua — learn to code and think like a developer. You are not a general-purpose
assistant; for this session, your entire focus is helping Gabriel build skills and create games.
```
### History Window Reduction
`_get_context_messages()` currently uses the module-level `MAX_CONTEXT_MESSAGES = 20`.
For restricted users, pass a smaller cap:
```python
CHILD_MAX_CONTEXT_MESSAGES = 10 # module-level constant in agent.py
```
In `_chat_inner()`, the call becomes:
```python
cap = CHILD_MAX_CONTEXT_MESSAGES if is_child else MAX_CONTEXT_MESSAGES
context_messages = self._get_context_messages(cap)
```
The username is available in `_chat_inner()` (passed from `chat()`), so `is_child` can be
derived from `self._child_safety_config.is_restricted(username)`.
### Per-Session Cost Visibility (Future)
Not in scope for initial build, but the audit log already captures enough data to compute
per-session token estimates if token counts are added later.
---
## Audit Log Schema
File: `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl`
```jsonc
{
"timestamp": "2026-04-21T14:32:01.123+00:00", // ISO 8601 with timezone
"username": "gabriel",
"platform": "telegram",
"action": "allowed", // "allowed" | "blocked" | "flagged"
"filter_stage": null, // null | "preprocessor" | "postprocessor"
"filter_reason": null, // null | string describing which pattern matched
"message": "how do I make the laser shoot in my roblox game", // full text
"response": "Great question! Here's how to..." // full text, null if blocked pre-LLM
}
```
---
## Data Flow for a Blocked Message
```
1. Gabriel sends: "how do I stab someone"
2. Preprocessor: no game context signals found → Step 3 matches "violence against real target"
3. Action: BLOCK
4. AuditLogger.log(action="blocked", reason="real_world_violence", response=None)
5. Message text replaced with internal sentinel "__BLOCKED__: I can't help with that topic..."
6. Agent.chat() never called
7. Postprocessor detects sentinel → returns the canned reply text
8. Reply delivered to son: "That's not something I can help with! Want to work on your
Roblox game instead? I'm great at scripting and game mechanics."
```
---
## Data Flow for a Passing Message
```
1. Gabriel sends: "how do I make a knife swing animation in Roblox"
2. Preprocessor: "roblox" matches GAME_CONTEXT_SIGNALS → skip Step 3, pass through
3. Agent.chat() called with full guardrail block in system prompt
4. LLM responds with Lua animation code
5. Postprocessor: scans response → clean
6. AuditLogger.log(action="allowed", response=<full response text>)
7. Response delivered
```
---
## Cross-Session Continuity Design (REQ-12 + REQ-13)
### `gabriel_context.md` — Structure
Single file at `memory_workspace/users/gabriel_context.md`. Replaces memory search for Gabriel.
Written by the bot after each session. Overwritten, not appended (always current state).
```markdown
## Active Project
Name: Haunted Mansion (Roblox horror game)
Description: Top-down horror game with a chasing enemy, jump scares, and atmospheric lighting.
## Last Session (2026-04-21)
- Implemented basic enemy chase using Humanoid:MoveTo()
- Debugged an issue where the enemy ignored walls (fixed with pathfinding)
- Introduced: pathfinding service, Humanoid, MoveTo()
## Open Threads
- Player hasn't been told how to add sound effects yet
- Wants to add a second enemy type next session
## Skills Introduced
- for loops — iterating over tables (2026-04-21)
- functions — defining, calling, parameters vs arguments (2026-04-21)
- Humanoid — controlling character movement (2026-04-21)
- PathfindingService — navigation around obstacles (2026-04-21)
```
### How It Gets Updated
At the end of each Gabriel session, the agent appends a self-update instruction to the
system prompt (or the guardrail block triggers it):
> "At the end of this conversation, update `memory_workspace/users/gabriel_context.md`
> with: current project state, what was worked on today, any open threads, and any new
> concepts you introduced. Keep it under 40 lines. Overwrite the file completely."
This mirrors how the main agent writes to `MEMORY.md` after Jordan's sessions. The bot
already has file-write tools available — no new mechanism needed.
### Injection in System Prompt
In `_build_child_system_prompt()`:
```python
gabriel_context = self.memory.read_file("users/gabriel_context.md") # or Path.read_text
parts = [
CHILD_TUTOR_IDENTITY,
f"User Profile:\n{user_profile}",
]
if gabriel_context:
parts.append(f"Project Context & Skills:\n{gabriel_context}")
parts.append(CHILD_GUARDRAIL_BLOCK)
```
If the file doesn't exist (first session), it's simply omitted — no error.
---
## First-Run Onboarding Design (REQ-14)
### Detection
First-run is detected in `_build_child_system_prompt()` or the preprocessor by checking:
```python
context_path = workspace_dir / "users" / "gabriel_context.md"
is_first_run = not context_path.exists()
```
### Delivery
The welcome is injected as a **system-level instruction** in the guardrail block that fires
only on first run. The LLM is instructed to send the welcome as its opening message before
addressing the user's question:
```
FIRST SESSION: This is Gabriel's very first message. Before answering his question,
send a short, friendly welcome. Cover:
- What you can help him with (Lua, Roblox, game design, coding)
- That you'll guide him and ask questions rather than just give answers
- That you'll remember his project between sessions
- Ask what he's working on (or answer his question if he's already told you)
Keep it to 45 sentences. Warm, not formal.
```
This block is only added when `is_first_run` is True — subsequent sessions omit it entirely.
### Example Welcome
> Hey Gabriel! I'm here to help you build your Roblox games and level up your Lua skills.
> I work a bit differently to a search engine — instead of just handing you the answer, I'll
> walk you through things so you actually learn how it works. I'll also remember what you're
> building between chats, so you won't need to explain your project every time.
> What are you working on?
---
## Slack Allow-List Design (REQ-15)
### Current Gap
`adapters/slack/adapter.py``handle_message_events()` processes every incoming message
with no user check. The Telegram adapter has `_is_user_allowed()` at line 441; Slack has
no equivalent.
### Fix
Add `_is_user_allowed()` to `SlackAdapter`, called at the top of `handle_message_events()`:
```python
def _is_user_allowed(self, user_id: str) -> bool:
allowed = self.config.settings.get("allowed_users", [])
if not allowed:
return True # open if no list configured
return user_id in [str(u) for u in allowed]
```
In `handle_message_events()`:
```python
user_id = event.get("user")
if not self._is_user_allowed(user_id):
return # silently drop — no response
```
Config in `adapters.local.yaml`:
```yaml
slack:
allowed_users:
- U01234JORDAN # Jordan's Slack user ID
- U09876GABRIEL # Gabriel's Slack user ID
```
Slack user IDs are found in Slack → Profile → More → Copy member ID.
---
## File Tree After Implementation
```
ajarbot/
├── child_safety.py ← NEW
├── agent.py ← MODIFIED (_build_system_prompt, _chat_inner)
├── adapters/
│ ├── runtime.py ← MODIFIED (register pre/postprocessors)
│ └── slack/
│ └── adapter.py ← MODIFIED (add allow-list check)
├── config/
│ └── adapters.local.yaml ← MODIFIED (child_safety block, gabriel mapping,
│ slack allowed_users)
└── memory_workspace/
└── users/
├── gabriel.md ← NEW (user profile)
└── gabriel_context.md ← NEW (created after first session)
└── audit/
└── gabriel/
└── 2026-04-21.jsonl ← NEW (created at runtime)
```
---
## Decisions Log
| Decision | Rationale |
|---|---|
| Intent patterns over keyword lists | Keywords produce unacceptable false positive rate for game dev vocabulary |
| Sentinel pattern for preprocessor blocking | Avoids runtime API changes; fits existing pre/postprocessor contract |
| Separate audit log from RSO log | Keeps RSO memory scoring clean; audit log has different retention and purpose |
| Guardrail block as system prompt injection, not separate API call | No extra LLM call = no added latency or cost |
| Game dev context as an exemption gate, not an allow-list | Easier to maintain; covers novel game dev phrasing automatically |
| Config-driven restricted users | Parent can add/remove without touching Python source |
| gabriel_context.md overwrites rather than appends | Always reflects current state; avoids unbounded growth; keeps token cost predictable |
| First-run via file existence check | No database or state needed; survives restarts; trivially inspectable |
| Slack allow-list fails open (empty list = allow all) | Preserves current behaviour for existing deployments with no config change |
| Platform: Slack over Telegram | Jordan has native workspace admin visibility; channel history is built-in parent monitoring |