Jordan Ramos 916f86725d feat: RSO observation system, child safety, Discord adapter, Telegram watchdog, email attachments
Core agent improvements:
- RSO (Relevance Scoring & Observation) system: interaction_logger, memory_scorer, signal_detector
- Memory access logging (memory_access_log table) for relevance scoring; high-signal turn detection
- Rich conversation storage for notable turns; compact_conversation truncates long user messages
- Task-type classifier (query/action/analysis/creative) for observation tagging
- Nested sub-agent visibility: deep delegations now register against the main agent's manager

Child safety (Gabriel profile):
- child_safety.py: filtering, audit logging, prompt constants for restricted sessions
- .kiro/specs/child-safety-profile: requirements, design, tasks specs
- GABRIEL_BOT_PROPOSAL.md: initial proposal doc
- Reduced context window (10 msgs) and tutor-mode identity for restricted users

Telegram adapter:
- Polling watchdog: auto-restarts updater if polling drops unexpectedly
- get_me() with exponential-backoff retry on NetworkError at startup
- Correct stop() ordering: signal watchdog before cancelling tasks

Email / Gmail:
- send_email: supports file attachments (attachments list param)
- get_email: surfaces attachment metadata in response

Scheduled tasks / weather:
- Remove OpenWeatherMap API calls from morning-weather task; use wttr.in exclusively
- New scheduled tasks and scheduler state persistence

Discord:
- adapters/discord/__init__.py scaffold
- discord-plugin: MCP plugin for Claude Code Discord integration (server.ts, skills, config)

Infrastructure:
- n8n workflow exports (garvis_webhook, content_pipeline variants)
- memory_workspace: context, homelab-repo-updates, weekly observation summaries, error logs
- UCS C240 migration plan doc
- requirements.txt: new deps
- .claude/settings.json, fix_hooks.py: hook/permission tuning
2026-04-23 07:54:01 -06:00

# Design — Child Safety Profile
## Architecture Overview
The feature is implemented as a self-contained module (`child_safety.py`) that hooks into three
existing extension points in the runtime, plus a new dedicated audit logger. No core agent logic
is restructured — all changes are additive.
```
Slack ──► SlackAdapter ──► AdapterRuntime
                                 │
                          [preprocessors]
                                 │
                  ChildSafetyFilter.preprocess()
                                 │
                                 ├── BLOCKED? ──► AuditLogger (blocked entry)
                                 │                └──► safe reply to user
                               PASS
                                 │
                           Agent.chat()
                                 │
                      _build_system_prompt()
                      injects guardrail block
                 (if username in RESTRICTED_USERS)
                                 │
                             LLM call
                                 │
                         [postprocessors]
                                 │
                  ChildSafetyFilter.postprocess()
                                 │
                                 ├── FLAGGED? ──► AuditLogger (flagged entry)
                                 │                └──► safe fallback reply
                               CLEAN
                                 │
                    AuditLogger (allowed entry)
                                 │
                           reply to user
```
---
## New Files
### `child_safety.py`
The main module. Contains three classes:
**`ChildSafetyConfig`**
- Loaded once at startup from `config/adapters.local.yaml` (`child_safety` block)
- Fields: `restricted_users: list[str]`, `audit_retention_days: int`
- Exposes `is_restricted(username: str) -> bool`
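
A minimal sketch of what `ChildSafetyConfig` could look like, assuming the YAML file has already been parsed into a plain dict by the caller. The `from_mapping` name, the tuple storage, the case-insensitive comparison and the 365-day default are illustrative assumptions, not details taken from the actual module.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChildSafetyConfig:
    """Settings from the `child_safety` block, built once at startup."""

    restricted_users: tuple[str, ...] = ()
    audit_retention_days: int = 365  # assumed default, mirrors the example config

    @classmethod
    def from_mapping(cls, raw: dict) -> "ChildSafetyConfig":
        # `raw` is the already-parsed YAML document (hypothetical call site).
        block = raw.get("child_safety") or {}
        users = tuple(str(u).strip().lower() for u in block.get("restricted_users", []))
        return cls(users, int(block.get("audit_retention_days", 365)))

    def is_restricted(self, username: str) -> bool:
        # Normalising case and whitespace is an assumption of this sketch.
        return username.strip().lower() in self.restricted_users
```

Freezing the dataclass fits the "loaded once" requirement: nothing can change the restricted list after startup without building a new config object.
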
**`ChildSafetyFilter`**
- Stateless filter with two public methods: `preprocess()` and `postprocess()`
- Holds compiled regex patterns (compiled once at import, not per-message)
- `preprocess(message: InboundMessage) -> tuple[InboundMessage | None, str | None]`
  - Returns `(message, None)` to pass through
  - Returns `(None, reply_text)` to block with a safe response
- `postprocess(response: str, message: InboundMessage) -> str`
  - Returns response unchanged if clean
  - Returns safe fallback string if flagged
**`ChildAuditLogger`**
- Writes to `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl`
- Non-blocking: uses daemon background threads (same pattern as `InteractionLogger`)
- `log(username, message, action, reason, response)` — single public method
- `cleanup_old_logs(retention_days)` — called at startup
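
The logger described above can be sketched roughly as follows. This is an illustration under assumptions, not the real `ChildAuditLogger`: the constructor argument, the lock, the returned thread handle and the reduced set of entry fields are all choices made for this example.

```python
import json
import threading
from datetime import datetime, timedelta, timezone
from pathlib import Path


class ChildAuditLogger:
    def __init__(self, root: Path):
        self.root = Path(root)         # e.g. memory_workspace/audit
        self._lock = threading.Lock()  # serialises appends from worker threads

    def log(self, username, message, action, reason=None, response=None):
        now = datetime.now(timezone.utc)
        entry = {
            "timestamp": now.isoformat(timespec="milliseconds"),
            "username": username,
            "action": action,
            "filter_reason": reason,
            "message": message,
            "response": response,
        }
        # Daemon thread: the caller does not wait for the disk write.
        worker = threading.Thread(target=self._write, args=(entry, now), daemon=True)
        worker.start()
        return worker  # returned only so callers can join(); an assumption

    def _write(self, entry, now):
        path = self.root / entry["username"] / f"{now:%Y-%m-%d}.jsonl"
        with self._lock:
            path.parent.mkdir(parents=True, exist_ok=True)
            with path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry, ensure_ascii=False) + "\n")

    def cleanup_old_logs(self, retention_days: int) -> int:
        """Delete dated log files older than the retention window."""
        cutoff = (datetime.now(timezone.utc) - timedelta(days=retention_days)).date()
        removed = 0
        for path in self.root.glob("*/*.jsonl"):
            try:
                day = datetime.strptime(path.stem, "%Y-%m-%d").date()
            except ValueError:
                continue  # not a dated log file; leave it alone
            if day < cutoff:
                path.unlink()
                removed += 1
        return removed
```

Because the workers are daemon threads, an entry still in flight when the process exits can be lost. That trade-off comes with the non-blocking design.
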
### `memory_workspace/users/gabriel.md`
Per-user profile injected into the system prompt. Contains:
- Age, interests, learning context
- Communication style preferences (patient, encouraging, use examples)
- Does NOT contain guardrail rules (those are in the injected guardrail block)
---
## Modified Files
### `agent.py` — `_build_system_prompt()` (line 488)
Add a conditional block after the existing user profile injection:
```python
if self._child_safety and self._child_safety.config.is_restricted(username):
    system_parts.append(CHILD_GUARDRAIL_BLOCK)
```
`CHILD_GUARDRAIL_BLOCK` is a module-level constant defined in `child_safety.py` and imported.
It is a multi-paragraph instruction block — see Content Design section below.
The Agent is also given a reference to the `ChildSafetyConfig` at `__init__` time so it can
check `is_restricted()` without re-reading config on every turn.
### `adapters/runtime.py` — `AdapterRuntime.__init__()`
After constructing the runtime, register the child safety pre/postprocessors:
```python
from child_safety import ChildSafetyFilter, ChildAuditLogger
_filter = ChildSafetyFilter(config, audit_logger)
self.add_preprocessor(_filter.preprocess_adapter)
self.add_postprocessor(_filter.postprocess_adapter)
```
The `preprocess_adapter` and `postprocess_adapter` methods wrap the core filter methods with
the `InboundMessage` signature the runtime expects:
- Preprocessor signature: `(InboundMessage) -> InboundMessage`
- Postprocessor signature: `(str, InboundMessage) -> str`
When the preprocessor blocks a message, it has two ways to tell the runtime: return a
sentinel message, or raise a handled exception that the runtime catches and converts to a
direct reply. **Decision: use the sentinel pattern** and mark the block on the message itself
rather than raising, which keeps the runtime's error handling clean.
> **Alternative considered**: Returning `None` from the preprocessor to signal "send the canned
> reply and skip the agent". This would require a runtime change to handle `None`. The sentinel
> approach avoids that. The runtime already supports early-exit via postprocessors returning
> a replacement string — we can use a similar mechanism.
**Simpler approach (chosen):** The preprocessor returns a modified `InboundMessage` with its
`text` replaced by a special internal sentinel. A postprocessor immediately before delivery
detects the sentinel and replaces it with the safe reply text. The audit log entry is written
by the preprocessor at block time.
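
The sentinel round trip can be sketched as below. `InboundMessage` here is a two-field stand-in for the runtime's real type, and the `is_blocked` callable is a hypothetical hook added so the example runs on its own; the actual adapter methods take only the arguments listed in the signatures above. How the runtime avoids calling the agent for a marked message is not covered by this sketch.

```python
from dataclasses import dataclass, replace

SENTINEL_PREFIX = "__BLOCKED__: "  # internal marker, never shown to the user


@dataclass(frozen=True)
class InboundMessage:  # hypothetical stand-in for the runtime's message type
    username: str
    text: str


def preprocess_adapter(message: InboundMessage, is_blocked) -> InboundMessage:
    """Return the message unchanged, or a copy whose text carries the sentinel."""
    reply = is_blocked(message.text)  # safe reply text, or None to pass through
    if reply is None:
        return message
    return replace(message, text=SENTINEL_PREFIX + reply)


def postprocess_adapter(response: str, message: InboundMessage) -> str:
    """Swap in the canned reply when the inbound message was marked as blocked."""
    if message.text.startswith(SENTINEL_PREFIX):
        return message.text[len(SENTINEL_PREFIX):]
    return response
```

Carrying the reply text inside the sentinel means the postprocessor needs no shared state with the preprocessor, which matches the "stateless filter" description.
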
### `config/adapters.local.yaml`
Add the `child_safety` config block (parent configures manually):
```yaml
child_safety:
  restricted_users:
    - son
  audit_retention_days: 365
```
---
## Filtering Logic Design
### Input Filter — Intent Pattern Matching
Patterns are compiled regex, evaluated in order. First match wins.
**Step 1 — Hard block (always, regardless of context)**
These patterns are checked first with no context exemption:
| Pattern intent | Example match |
|---|---|
| Sexual/explicit content requests | "send me explicit", "show me porn" |
| Real-world harm with named victim | "how do I hurt my [classmate/sister/teacher]" |
| Requests for personal identifying info | "what is [person]'s address/phone/school" |
| Self-harm | "how do I hurt myself", "how do I kill myself" |
**Step 2 — Check for game dev context signals**
If any of the following patterns match the message, skip Step 3 entirely:
```python
GAME_CONTEXT_SIGNALS = [
    r"\bin (my |the |a )?(game|roblox|studio|script|map|level|world)\b",
    r"\b(lua|roblox|studio|npc|hitbox|raycast|humanoid|workspace|basepart)\b",
    r"\b(code|script|function|method|module|class|variable|loop|event)\b",
    r"\b(damage|health|respawn|spawn|kill|destroy)\b.*\b(player|npc|enemy|mob|character)\b",
    r"\bhow (do i|to|can i) (make|get|set|add|create|implement|build)\b",
]
```
**Step 3 — Conditional block (only if no game dev context)**
Only evaluated if Step 2 found no context signals:
| Pattern intent | Example match |
|---|---|
| Real-world weapon harm | "how do I use a knife to hurt/cut/stab" |
| Real-world weapon construction | "how do I build/make a real gun/bomb" |
| Violence against real targets | "how do I fight/attack/beat up [person]" |
| Explicit drug content | "how do I get/use/buy [drug]" |
**Step 4 — Pass through**
No patterns matched → message is allowed.
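
The four steps can be read as one short function. The sketch below uses a handful of illustrative stand-in patterns (the full hard-block and conditional-block sets are not listed in this document) and a hypothetical `classify` name; only the ordering of the checks is taken from the design.

```python
import re
from typing import Optional

FLAGS = re.IGNORECASE
# Illustrative stand-ins; the real pattern sets are not listed in this document.
HARD_BLOCK = [("self_harm", re.compile(r"\bhow (do|can) i (hurt|kill) myself\b", FLAGS))]
GAME_CONTEXT = [
    re.compile(r"\bin (my |the |a )?(game|roblox|studio|script|map|level|world)\b", FLAGS),
    re.compile(r"\b(lua|roblox|studio|npc|hitbox|raycast|humanoid)\b", FLAGS),
]
CONDITIONAL_BLOCK = [
    ("real_world_violence", re.compile(r"\bhow (do|can) i (stab|attack|beat up)\b", FLAGS)),
]


def classify(text: str) -> tuple[str, Optional[str]]:
    """Return ("blocked", reason) or ("allowed", None), following Steps 1 to 4."""
    for reason, pattern in HARD_BLOCK:             # Step 1: no context exemption
        if pattern.search(text):
            return "blocked", reason
    if any(p.search(text) for p in GAME_CONTEXT):  # Step 2: game dev context found
        return "allowed", None                     # Step 3 is skipped entirely
    for reason, pattern in CONDITIONAL_BLOCK:      # Step 3: only without context
        if pattern.search(text):
            return "blocked", reason
    return "allowed", None                         # Step 4: pass through
```

One interaction worth checking against the full lists: the broad `how (do i|to|can i) (make|get|...)` entry in `GAME_CONTEXT_SIGNALS` also matches phrasing such as "how do I make ..." or "how do I get ..." with no game reference at all, so under the Step 2 rule those messages would never reach the Step 3 weapon-construction and drug patterns. That entry is left out of the sketch for this reason.
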
### Output Filter — Response Scan
Lighter touch. Scans the LLM response for:
- Explicit sexual language (small set of explicit terms only)
- Actual step-by-step real-world harm instructions (e.g., numbered steps to build a weapon)
- Profanity above a threshold (configurable word list)
If flagged → replace entire response with:
> "I ran into a bit of a snag answering that one. Try asking me a different way, or ask about
> something else — I'm great at Lua scripting and Roblox game design!"
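
A rough sketch of the response scan, covering only the explicit-term and profanity checks. The word list, the threshold value, the placeholder term pattern and the two-argument signature are all assumptions made for this example; the harm-instruction check is omitted because the document does not say how it is detected.

```python
import re

PROFANITY = {"darn", "heck"}  # placeholder word list; the real one is configurable
PROFANITY_THRESHOLD = 1       # hypothetical: flag when hits exceed this count
EXPLICIT = re.compile(r"\b(explicit_term_a|explicit_term_b)\b", re.IGNORECASE)  # placeholders


def postprocess(response: str, fallback: str) -> str:
    """Return the response unchanged if clean, else the safe fallback string."""
    words = re.findall(r"[a-z']+", response.lower())
    hits = sum(1 for word in words if word in PROFANITY)
    if EXPLICIT.search(response) or hits > PROFANITY_THRESHOLD:
        return fallback  # the whole response is replaced, never partially edited
    return response
```
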
---
## Guardrail Block Content Design
Injected at the end of the system prompt for all restricted users. Contains two sections:
safety rules and teaching approach.
```
=== CHILD SAFE MODE ===
You are talking to Gabriel, a 13-year-old who is learning game development and Lua scripting.
Your role is educator and mentor — not answer key.
--- CONTENT RULES ---
ALWAYS ENCOURAGED:
- Lua scripting, Roblox Studio mechanics, game physics
- Horror game design: atmosphere, enemy AI, jump scares, damage systems
- Weapon mechanics IN GAMES: hitboxes, shooting mechanics, damage values, animations
- General coding concepts, algorithms, creative writing, school subjects
NEVER ALLOWED — refuse politely, no explanation of why:
- Real-world instructions for harming people or animals
- How to build, obtain, or use actual weapons
- Sexual or romantic content of any kind
- Explicit language or profanity
- Sharing or asking for real personal information
GRAY AREA RULE: If a question mentions weapons, violence, or dangerous topics AND there is any
reasonable game/educational interpretation — assume game context and help enthusiastically.
Only refuse if the request is unambiguously real-world harm with no plausible game framing.
--- TEACHING APPROACH ---
Your goal is to build Gabriel's skills and confidence over time, not to hand him answers.
Use this approach every time:
1. ASSESS FIRST (for non-trivial questions): Before diving in, ask what he's already tried
or what he thinks might work. Skip this for simple factual lookups ("what does pairs() do?").
2. BREAK IT DOWN: Split the problem into smaller steps. Guide through one step at a time.
"Let's start with just getting the bullet to appear — we'll worry about damage after."
3. CODE + EXPLANATION always together: When you show code, explain what each meaningful
part does in plain language immediately after. Never a bare code block with no context.
Ask "does that make sense?" or "what do you think this line is doing?" after showing it.
4. LEAVE SOMETHING FOR HIM: After giving an example, leave one small piece for Gabriel to
write himself. "I've done the shooting part — can you add the check for ammo count?"
5. GUIDE THE DEBUG, DON'T SOLVE IT: When he shares broken code, point him toward the
area with the issue rather than fixing it directly.
"Look at what your variable is on the third loop — what's it equal to at that point?"
6. CELEBRATE THE ATTEMPT: Always acknowledge what's working before addressing what isn't.
"The loop structure is solid — that's the tricky bit. Just one small fix needed here."
7. CONNECT TO PAST WORK: When a new concept resembles something covered before, say so.
"This is the same idea as the enemy spawner loop — same structure, different purpose."
8. DIRECT ANSWERS are fine for: simple factual questions, API lookups, syntax checks,
"what does X do?" questions. Only apply the full teaching approach for problem-solving.
9. AI LITERACY — teach him to use you well (weave in naturally, never lecture):
- When he asks something vague, model good question structure before answering:
"Just checking — you want the damage to apply on touch, or only when the enemy attacks?"
- When context runs out, explain it plainly:
"I can only hold so much conversation in memory. Next session, remind me what you're
building and I'll be right back up to speed."
- Teach the ideal coding question format when the moment comes up naturally:
"Next time: what your code does now + what you want + what you've tried = fastest answer."
- Flag your assumptions so he learns to spot ambiguity:
"I'm assuming this resets on respawn — let me know if that's not what you meant."
RESPONSE LENGTH: Keep responses focused. Step-by-step means one step at a time — don't
front-load everything. Short, clear, then wait for his response before continuing.
TONE: Enthusiastic, encouraging, patient. Short sentences. No jargon without explanation.
Talk to him like a smart friend who happens to know a lot about game dev, not like a textbook.
=== END CHILD SAFE MODE ===
```
---
## Token Optimization Design
### Problem
Gabriel shares the same API token pool as Jordan. Every Gabriel turn currently injects:
- `SOUL.md` — Garvis homelab persona (~935 tokens, ~3,740 bytes) — irrelevant
- `context.md` — SSH hosts, Proxmox inventory (~227 tokens, ~909 bytes) — irrelevant
- Hybrid memory search (5 chunks) — Jordan's homelab memories — irrelevant
- 20-message history window — same cap as an admin session
Estimated dead weight: **~1,500 to 1,800 tokens per turn** before Gabriel types a word.
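
The per-file figures above line up with a rule of thumb of about four bytes per token. Whether that heuristic is how they were produced is an assumption; the sketch below only shows that the arithmetic is consistent and what it implies for the rest of the estimate.

```python
BYTES_PER_TOKEN = 4  # rough heuristic; an assumption, not a tokenizer measurement


def estimate_tokens(num_bytes: int) -> int:
    return num_bytes // BYTES_PER_TOKEN


soul = estimate_tokens(3_740)   # SOUL.md    -> 935
context = estimate_tokens(909)  # context.md -> 227
fixed = soul + context          # 1162 tokens from the two files alone

# Share of the quoted 1,500 to 1,800 total not explained by the two files,
# presumably mostly the five memory-search chunks:
memory_low, memory_high = 1_500 - fixed, 1_800 - fixed  # 338 and 638
```
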
### Solution: Restricted-User System Prompt Builder
In `_build_system_prompt()`, add a branch for restricted users that replaces the standard
assembly with a stripped-down version:
```python
if child_safety_config and child_safety_config.is_restricted(username):
    return _build_child_system_prompt(username, user_profile, guardrail_block)
```
**`_build_child_system_prompt()`** assembles only:
1. `CHILD_TUTOR_IDENTITY` — a ~100-token constant replacing SOUL.md (see below)
2. `user_profile` — gabriel.md (relevant, kept)
3. `CHILD_GUARDRAIL_BLOCK` — safety + teaching rules (relevant, kept)
4. Tool capability line — minimal version, omit delegation instructions
What is **explicitly skipped**:
- `get_soul()` — SOUL.md not read at all
- `get_context()` — context.md not read at all
- `search_hybrid()` — memory search not called
- Delegation/sub-agent instructions block
### `CHILD_TUTOR_IDENTITY` Constant (~100 tokens)
Replaces the full SOUL.md for Gabriel's sessions:
```
You are a coding mentor and game development tutor. You help Gabriel — a 13-year-old building
Roblox games in Lua — learn to code and think like a developer. You are not a general-purpose
assistant; for this session, your entire focus is helping Gabriel build skills and create games.
```
### History Window Reduction
`_get_context_messages()` currently uses the module-level `MAX_CONTEXT_MESSAGES = 20`.
For restricted users, pass a smaller cap:
```python
CHILD_MAX_CONTEXT_MESSAGES = 10 # module-level constant in agent.py
```
In `_chat_inner()`, the call becomes:
```python
cap = CHILD_MAX_CONTEXT_MESSAGES if is_child else MAX_CONTEXT_MESSAGES
context_messages = self._get_context_messages(cap)
```
The username is available in `_chat_inner()` (passed from `chat()`), so `is_child` can be
derived from `self._child_safety_config.is_restricted(username)`.
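
As a standalone illustration of the cap selection, assuming the history is a plain list ordered oldest to newest (the real `_get_context_messages()` is a method and may store history differently):

```python
MAX_CONTEXT_MESSAGES = 20
CHILD_MAX_CONTEXT_MESSAGES = 10


def get_context_messages(history: list[dict], cap: int = MAX_CONTEXT_MESSAGES) -> list[dict]:
    """Return the most recent `cap` messages, oldest first."""
    return history[-cap:] if cap > 0 else []


def context_for(history: list[dict], is_child: bool) -> list[dict]:
    # Hypothetical helper mirroring the two lines shown for _chat_inner().
    cap = CHILD_MAX_CONTEXT_MESSAGES if is_child else MAX_CONTEXT_MESSAGES
    return get_context_messages(history, cap)
```

With a 30-message history, a restricted session sees only the last 10 messages and an admin session the last 20; shorter histories are returned whole.
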
### Per-Session Cost Visibility (Future)
Not in scope for initial build, but the audit log already captures enough data to compute
per-session token estimates if token counts are added later.
---
## Audit Log Schema
File: `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl`
```jsonc
{
  "timestamp": "2026-04-21T14:32:01.123+00:00", // ISO 8601 with timezone
  "username": "gabriel",
  "platform": "telegram",
  "action": "allowed",        // "allowed" | "blocked" | "flagged"
  "filter_stage": null,       // null | "preprocessor" | "postprocessor"
  "filter_reason": null,      // null | string describing which pattern matched
  "message": "how do I make the laser shoot in my roblox game", // full text
  "response": "Great question! Here's how to..."  // full text, null if blocked pre-LLM
}
```
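
Because each line of a daily file is one JSON object, a parent-facing summary needs very little code. The helper below is a hypothetical example (the name `summarise_day` is not from the codebase) and assumes the stored lines are plain JSON without the explanatory comments shown in the schema.

```python
import json
from collections import Counter
from pathlib import Path


def summarise_day(path: Path) -> Counter:
    """Count entries per action ("allowed" / "blocked" / "flagged") in one JSONL file."""
    counts = Counter()
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            counts[json.loads(line)["action"]] += 1
    return counts
```
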
---
## Data Flow for a Blocked Message
```
1. Gabriel sends: "how do I stab someone"
2. Preprocessor: no game context signals found → Step 3 matches "violence against real target"
3. Action: BLOCK
4. AuditLogger.log(action="blocked", reason="real_world_violence", response=None)
5. Message text replaced with internal sentinel "__BLOCKED__: I can't help with that topic..."
6. Agent.chat() never called
7. Postprocessor detects sentinel → returns the canned reply text
8. Reply delivered to Gabriel: "That's not something I can help with! Want to work on your
   Roblox game instead? I'm great at scripting and game mechanics."
```
---
## Data Flow for a Passing Message
```
1. Gabriel sends: "how do I make a knife swing animation in Roblox"
2. Preprocessor: "roblox" matches GAME_CONTEXT_SIGNALS → skip Step 3, pass through
3. Agent.chat() called with full guardrail block in system prompt
4. LLM responds with Lua animation code
5. Postprocessor: scans response → clean
6. AuditLogger.log(action="allowed", response=<full response text>)
7. Response delivered
```
---
## Cross-Session Continuity Design (REQ-12 + REQ-13)
### `gabriel_context.md` — Structure
Single file at `memory_workspace/users/gabriel_context.md`. Replaces memory search for Gabriel.
Written by the bot after each session. Overwritten, not appended (always current state).
```markdown
## Active Project
Name: Haunted Mansion (Roblox horror game)
Description: Top-down horror game with a chasing enemy, jump scares, and atmospheric lighting.
## Last Session (2026-04-21)
- Implemented basic enemy chase using Humanoid:MoveTo()
- Debugged an issue where the enemy ignored walls (fixed with pathfinding)
- Introduced: pathfinding service, Humanoid, MoveTo()
## Open Threads
- Player hasn't been told how to add sound effects yet
- Wants to add a second enemy type next session
## Skills Introduced
- for loops — iterating over tables (2026-04-21)
- functions — defining, calling, parameters vs arguments (2026-04-21)
- Humanoid — controlling character movement (2026-04-21)
- PathfindingService — navigation around obstacles (2026-04-21)
```
### How It Gets Updated
At the end of each Gabriel session, the agent appends a self-update instruction to the
system prompt (or the guardrail block triggers it):
> "At the end of this conversation, update `memory_workspace/users/gabriel_context.md`
> with: current project state, what was worked on today, any open threads, and any new
> concepts you introduced. Keep it under 40 lines. Overwrite the file completely."
This mirrors how the main agent writes to `MEMORY.md` after Jordan's sessions. The bot
already has file-write tools available — no new mechanism needed.
### Injection in System Prompt
In `_build_child_system_prompt()`:
```python
gabriel_context = self.memory.read_file("users/gabriel_context.md") # or Path.read_text
parts = [
    CHILD_TUTOR_IDENTITY,
    f"User Profile:\n{user_profile}",
]
if gabriel_context:
    parts.append(f"Project Context & Skills:\n{gabriel_context}")
parts.append(CHILD_GUARDRAIL_BLOCK)
```
If the file doesn't exist (first session), it's simply omitted — no error.
---
## First-Run Onboarding Design (REQ-14)
### Detection
First-run is detected in `_build_child_system_prompt()` or the preprocessor by checking:
```python
context_path = workspace_dir / "users" / "gabriel_context.md"
is_first_run = not context_path.exists()
```
### Delivery
The welcome is injected as a **system-level instruction** in the guardrail block that fires
only on first run. The LLM is instructed to send the welcome as its opening message before
addressing the user's question:
```
FIRST SESSION: This is Gabriel's very first message. Before answering his question,
send a short, friendly welcome. Cover:
- What you can help him with (Lua, Roblox, game design, coding)
- That you'll guide him and ask questions rather than just give answers
- That you'll remember his project between sessions
- Ask what he's working on (or answer his question if he's already told you)
Keep it to 4 to 5 sentences. Warm, not formal.
```
This block is only added when `is_first_run` is True — subsequent sessions omit it entirely.
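
Put together with the context injection, the first-run check could look like the sketch below. It is a free function with every prompt fragment passed in so that it runs on its own; the real `_build_child_system_prompt()` is a method with a different signature, and placing the first-session block after the guardrails is an assumption.

```python
from pathlib import Path


def build_child_system_prompt(
    workspace_dir: Path,
    identity: str,
    user_profile: str,
    guardrail_block: str,
    first_run_block: str,
) -> str:
    context_path = Path(workspace_dir) / "users" / "gabriel_context.md"
    is_first_run = not context_path.exists()

    parts = [identity, f"User Profile:\n{user_profile}"]
    if not is_first_run:
        context = context_path.read_text(encoding="utf-8").strip()
        if context:  # an empty file adds nothing to the prompt
            parts.append(f"Project Context & Skills:\n{context}")
    parts.append(guardrail_block)
    if is_first_run:
        parts.append(first_run_block)  # placement after the guardrails is an assumption
    return "\n\n".join(parts)
```

Since the check depends only on whether the file exists, the welcome keeps firing until the first context file is actually written.
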
### Example Welcome
> Hey Gabriel! I'm here to help you build your Roblox games and level up your Lua skills.
> I work a bit differently to a search engine — instead of just handing you the answer, I'll
> walk you through things so you actually learn how it works. I'll also remember what you're
> building between chats, so you won't need to explain your project every time.
> What are you working on?
---
## Slack Allow-List Design (REQ-15)
### Current Gap
In `adapters/slack/adapter.py`, `handle_message_events()` processes every incoming message
with no user check. The Telegram adapter has `_is_user_allowed()` at line 441; Slack has
no equivalent.
### Fix
Add `_is_user_allowed()` to `SlackAdapter`, called at the top of `handle_message_events()`:
```python
def _is_user_allowed(self, user_id: str) -> bool:
    allowed = self.config.settings.get("allowed_users", [])
    if not allowed:
        return True  # open if no list configured
    return user_id in [str(u) for u in allowed]
```
In `handle_message_events()`:
```python
user_id = event.get("user")
if not self._is_user_allowed(user_id):
    return  # silently drop, no response
```
Config in `adapters.local.yaml`:
```yaml
slack:
  allowed_users:
    - U01234JORDAN   # Jordan's Slack user ID
    - U09876GABRIEL  # Gabriel's Slack user ID
```
Slack user IDs are found in Slack → Profile → More → Copy member ID.
---
## File Tree After Implementation
```
ajarbot/
├── child_safety.py              ← NEW
├── agent.py                     ← MODIFIED (_build_system_prompt, _chat_inner)
├── adapters/
│   ├── runtime.py               ← MODIFIED (register pre/postprocessors)
│   └── slack/
│       └── adapter.py           ← MODIFIED (add allow-list check)
├── config/
│   └── adapters.local.yaml      ← MODIFIED (child_safety block, gabriel mapping,
│                                   slack allowed_users)
└── memory_workspace/
    ├── users/
    │   ├── gabriel.md           ← NEW (user profile)
    │   └── gabriel_context.md   ← NEW (created after first session)
    └── audit/
        └── gabriel/
            └── 2026-04-21.jsonl ← NEW (created at runtime)
```
---
## Decisions Log
| Decision | Rationale |
|---|---|
| Intent patterns over keyword lists | Keywords produce unacceptable false positive rate for game dev vocabulary |
| Sentinel pattern for preprocessor blocking | Avoids runtime API changes; fits existing pre/postprocessor contract |
| Separate audit log from RSO log | Keeps RSO memory scoring clean; audit log has different retention and purpose |
| Guardrail block as system prompt injection, not separate API call | No extra LLM call = no added latency or cost |
| Game dev context as an exemption gate, not an allow-list | Easier to maintain; covers novel game dev phrasing automatically |
| Config-driven restricted users | Parent can add/remove without touching Python source |
| gabriel_context.md overwrites rather than appends | Always reflects current state; avoids unbounded growth; keeps token cost predictable |
| First-run via file existence check | No database or state needed; survives restarts; trivially inspectable |
| Slack allow-list fails open (empty list = allow all) | Preserves current behaviour for existing deployments with no config change |
| Platform: Slack over Telegram | Jordan has native workspace admin visibility; channel history is built-in parent monitoring |