# Design — Child Safety Profile ## Architecture Overview The feature is implemented as a self-contained module (`child_safety.py`) that hooks into three existing extension points in the runtime, plus a new dedicated audit logger. No core agent logic is restructured — all changes are additive. ``` Slack ──► SlackAdapter ──► AdapterRuntime │ [preprocessors] │ ChildSafetyFilter.preprocess() │ ── BLOCKED? ──► AuditLogger (blocked entry) │ └──► safe reply to user PASS │ Agent.chat() │ _build_system_prompt() │ injects guardrail block (if username in RESTRICTED_USERS) │ LLM call │ [postprocessors] │ ChildSafetyFilter.postprocess() │ ── FLAGGED? ──► AuditLogger (flagged entry) │ └──► safe fallback reply CLEAN │ AuditLogger (allowed entry) │ reply to user ``` --- ## New Files ### `child_safety.py` The main module. Contains three classes: **`ChildSafetyConfig`** - Loaded once at startup from `config/adapters.local.yaml` (`child_safety` block) - Fields: `restricted_users: list[str]`, `audit_retention_days: int` - Exposes `is_restricted(username: str) -> bool` **`ChildSafetyFilter`** - Stateless filter with two public methods: `preprocess()` and `postprocess()` - Holds compiled regex patterns (compiled once at import, not per-message) - `preprocess(message: InboundMessage) -> tuple[InboundMessage | None, str | None]` - Returns `(message, None)` to pass through - Returns `(None, reply_text)` to block with a safe response - `postprocess(response: str, message: InboundMessage) -> str` - Returns response unchanged if clean - Returns safe fallback string if flagged **`ChildAuditLogger`** - Writes to `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl` - Non-blocking: uses daemon background threads (same pattern as `InteractionLogger`) - `log(username, message, action, reason, response)` — single public method - `cleanup_old_logs(retention_days)` — called at startup ### `memory_workspace/users/gabriel.md` Per-user profile injected into the system prompt. Contains: - Age, interests, learning context - Communication style preferences (patient, encouraging, use examples) - Does NOT contain guardrail rules (those are in the injected guardrail block) --- ## Modified Files ### `agent.py` — `_build_system_prompt()` (line 488) Add a conditional block after the existing user profile injection: ```python if self._child_safety and self._child_safety.config.is_restricted(username): system_parts.append(CHILD_GUARDRAIL_BLOCK) ``` `CHILD_GUARDRAIL_BLOCK` is a module-level constant defined in `child_safety.py` and imported. It is a multi-paragraph instruction block — see Content Design section below. The Agent is also given a reference to the `ChildSafetyConfig` at `__init__` time so it can check `is_restricted()` without re-reading config on every turn. ### `adapters/runtime.py` — `AdapterRuntime.__init__()` After constructing the runtime, register the child safety pre/postprocessors: ```python from child_safety import ChildSafetyFilter, ChildAuditLogger _filter = ChildSafetyFilter(config, audit_logger) self.add_preprocessor(_filter.preprocess_adapter) self.add_postprocessor(_filter.postprocess_adapter) ``` The `preprocess_adapter` and `postprocess_adapter` methods wrap the core filter methods with the `InboundMessage` signature the runtime expects: - Preprocessor signature: `(InboundMessage) -> InboundMessage` - Postprocessor signature: `(str, InboundMessage) -> str` When the preprocessor blocks a message, it mutates the `InboundMessage` to signal a block by returning a sentinel message (or raises a handled exception that the runtime catches and converts to a direct reply). **Decision: use sentinel pattern** — set a special field on the message rather than raising, to keep the runtime's error handling clean. > **Alternative considered**: Returning `None` from the preprocessor to signal "send the canned > reply and skip the agent". This would require a runtime change to handle `None`. The sentinel > approach avoids that. The runtime already supports early-exit via postprocessors returning > a replacement string — we can use a similar mechanism. **Simpler approach (chosen):** The preprocessor returns a modified `InboundMessage` with its `text` replaced by a special internal sentinel. A postprocessor immediately before delivery detects the sentinel and replaces it with the safe reply text. The audit log entry is written by the preprocessor at block time. ### `config/adapters.local.yaml` Add the `child_safety` config block (parent configures manually): ```yaml child_safety: restricted_users: - son audit_retention_days: 365 ``` --- ## Filtering Logic Design ### Input Filter — Intent Pattern Matching Patterns are compiled regex, evaluated in order. First match wins. **Step 1 — Hard block (always, regardless of context)** These patterns are checked first with no context exemption: | Pattern intent | Example match | |---|---| | Sexual/explicit content requests | "send me explicit", "show me porn" | | Real-world harm with named victim | "how do I hurt my [classmate/sister/teacher]" | | Requests for personal identifying info | "what is [person]'s address/phone/school" | | Self-harm | "how do I hurt myself", "how do I kill myself" | **Step 2 — Check for game dev context signals** If any of the following terms appear in the message, skip Step 3 entirely: ```python GAME_CONTEXT_SIGNALS = [ r"\bin (my |the |a )?(game|roblox|studio|script|map|level|world)\b", r"\b(lua|roblox|studio|npc|hitbox|raycast|humanoid|workspace|basepart)\b", r"\b(code|script|function|method|module|class|variable|loop|event)\b", r"\b(damage|health|respawn|spawn|kill|destroy)\b.*\b(player|npc|enemy|mob|character)\b", r"\bhow (do i|to|can i) (make|get|set|add|create|implement|build)\b", ] ``` **Step 3 — Conditional block (only if no game dev context)** Only evaluated if Step 2 found no context signals: | Pattern intent | Example match | |---|---| | Real-world weapon harm | "how do I use a knife to hurt/cut/stab" | | Real-world weapon construction | "how do I build/make a real gun/bomb" | | Violence against real targets | "how do I fight/attack/beat up [person]" | | Explicit drug content | "how do I get/use/buy [drug]" | **Step 4 — Pass through** No patterns matched → message is allowed. ### Output Filter — Response Scan Lighter touch. Scans the LLM response for: - Explicit sexual language (small set of explicit terms only) - Actual step-by-step real-world harm instructions (e.g., numbered steps to build a weapon) - Profanity above a threshold (configurable word list) If flagged → replace entire response with: > "I ran into a bit of a snag answering that one. Try asking me a different way, or ask about > something else — I'm great at Lua scripting and Roblox game design!" --- ## Guardrail Block Content Design Injected at the end of the system prompt for all restricted users. Contains two sections: safety rules and teaching approach. ``` === CHILD SAFE MODE === You are talking to Gabriel, a 13-year-old who is learning game development and Lua scripting. Your role is educator and mentor — not answer key. --- CONTENT RULES --- ALWAYS ENCOURAGED: - Lua scripting, Roblox Studio mechanics, game physics - Horror game design: atmosphere, enemy AI, jump scares, damage systems - Weapon mechanics IN GAMES: hitboxes, shooting mechanics, damage values, animations - General coding concepts, algorithms, creative writing, school subjects NEVER ALLOWED — refuse politely, no explanation of why: - Real-world instructions for harming people or animals - How to build, obtain, or use actual weapons - Sexual or romantic content of any kind - Explicit language or profanity - Sharing or asking for real personal information GRAY AREA RULE: If a question mentions weapons, violence, or dangerous topics AND there is any reasonable game/educational interpretation — assume game context and help enthusiastically. Only refuse if the request is unambiguously real-world harm with no plausible game framing. --- TEACHING APPROACH --- Your goal is to build Gabriel's skills and confidence over time, not to hand him answers. Use this approach every time: 1. ASSESS FIRST (for non-trivial questions): Before diving in, ask what he's already tried or what he thinks might work. Skip this for simple factual lookups ("what does pairs() do?"). 2. BREAK IT DOWN: Split the problem into smaller steps. Guide through one step at a time. "Let's start with just getting the bullet to appear — we'll worry about damage after." 3. CODE + EXPLANATION always together: When you show code, explain what each meaningful part does in plain language immediately after. Never a bare code block with no context. Ask "does that make sense?" or "what do you think this line is doing?" after showing it. 4. LEAVE SOMETHING FOR HIM: After giving an example, leave one small piece for Gabriel to write himself. "I've done the shooting part — can you add the check for ammo count?" 5. GUIDE THE DEBUG, DON'T SOLVE IT: When he shares broken code, point him toward the area with the issue rather than fixing it directly. "Look at what your variable is on the third loop — what's it equal to at that point?" 6. CELEBRATE THE ATTEMPT: Always acknowledge what's working before addressing what isn't. "The loop structure is solid — that's the tricky bit. Just one small fix needed here." 7. CONNECT TO PAST WORK: When a new concept resembles something covered before, say so. "This is the same idea as the enemy spawner loop — same structure, different purpose." 8. DIRECT ANSWERS are fine for: simple factual questions, API lookups, syntax checks, "what does X do?" questions. Only apply the full teaching approach for problem-solving. 9. AI LITERACY — teach him to use you well (weave in naturally, never lecture): - When he asks something vague, model good question structure before answering: "Just checking — you want the damage to apply on touch, or only when the enemy attacks?" - When context runs out, explain it plainly: "I can only hold so much conversation in memory. Next session, remind me what you're building and I'll be right back up to speed." - Teach the ideal coding question format when the moment comes up naturally: "Next time: what your code does now + what you want + what you've tried = fastest answer." - Flag your assumptions so he learns to spot ambiguity: "I'm assuming this resets on respawn — let me know if that's not what you meant." RESPONSE LENGTH: Keep responses focused. Step-by-step means one step at a time — don't front-load everything. Short, clear, then wait for his response before continuing. TONE: Enthusiastic, encouraging, patient. Short sentences. No jargon without explanation. Talk to him like a smart friend who happens to know a lot about game dev, not like a textbook. === END CHILD SAFE MODE === ``` --- ## Token Optimization Design ### Problem Gabriel shares the same API token pool as Jordan. Every Gabriel turn currently injects: - `SOUL.md` — Garvis homelab persona (~935 tokens, ~3,740 bytes) — irrelevant - `context.md` — SSH hosts, Proxmox inventory (~227 tokens, ~909 bytes) — irrelevant - Hybrid memory search (5 chunks) — Jordan's homelab memories — irrelevant - 20-message history window — same cap as an admin session Estimated dead weight: **~1,500–1,800 tokens per turn** before Gabriel types a word. ### Solution: Restricted-User System Prompt Builder In `_build_system_prompt()`, add a branch for restricted users that replaces the standard assembly with a stripped-down version: ```python if child_safety_config and child_safety_config.is_restricted(username): return _build_child_system_prompt(username, user_profile, guardrail_block) ``` **`_build_child_system_prompt()`** assembles only: 1. `CHILD_TUTOR_IDENTITY` — a ~100-token constant replacing SOUL.md (see below) 2. `user_profile` — gabriel.md (relevant, kept) 3. `CHILD_GUARDRAIL_BLOCK` — safety + teaching rules (relevant, kept) 4. Tool capability line — minimal version, omit delegation instructions What is **explicitly skipped**: - `get_soul()` — SOUL.md not read at all - `get_context()` — context.md not read at all - `search_hybrid()` — memory search not called - Delegation/sub-agent instructions block ### `CHILD_TUTOR_IDENTITY` Constant (~100 tokens) Replaces the full SOUL.md for Gabriel's sessions: ``` You are a coding mentor and game development tutor. You help Gabriel — a 13-year-old building Roblox games in Lua — learn to code and think like a developer. You are not a general-purpose assistant; for this session, your entire focus is helping Gabriel build skills and create games. ``` ### History Window Reduction `_get_context_messages()` currently uses the module-level `MAX_CONTEXT_MESSAGES = 20`. For restricted users, pass a smaller cap: ```python CHILD_MAX_CONTEXT_MESSAGES = 10 # module-level constant in agent.py ``` In `_chat_inner()`, the call becomes: ```python cap = CHILD_MAX_CONTEXT_MESSAGES if is_child else MAX_CONTEXT_MESSAGES context_messages = self._get_context_messages(cap) ``` The username is available in `_chat_inner()` (passed from `chat()`), so `is_child` can be derived from `self._child_safety_config.is_restricted(username)`. ### Per-Session Cost Visibility (Future) Not in scope for initial build, but the audit log already captures enough data to compute per-session token estimates if token counts are added later. --- ## Audit Log Schema File: `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl` ```jsonc { "timestamp": "2026-04-21T14:32:01.123+00:00", // ISO 8601 with timezone "username": "gabriel", "platform": "telegram", "action": "allowed", // "allowed" | "blocked" | "flagged" "filter_stage": null, // null | "preprocessor" | "postprocessor" "filter_reason": null, // null | string describing which pattern matched "message": "how do I make the laser shoot in my roblox game", // full text "response": "Great question! Here's how to..." // full text, null if blocked pre-LLM } ``` --- ## Data Flow for a Blocked Message ``` 1. Gabriel sends: "how do I stab someone" 2. Preprocessor: no game context signals found → Step 3 matches "violence against real target" 3. Action: BLOCK 4. AuditLogger.log(action="blocked", reason="real_world_violence", response=None) 5. Message text replaced with internal sentinel "__BLOCKED__: I can't help with that topic..." 6. Agent.chat() never called 7. Postprocessor detects sentinel → returns the canned reply text 8. Reply delivered to son: "That's not something I can help with! Want to work on your Roblox game instead? I'm great at scripting and game mechanics." ``` --- ## Data Flow for a Passing Message ``` 1. Gabriel sends: "how do I make a knife swing animation in Roblox" 2. Preprocessor: "roblox" matches GAME_CONTEXT_SIGNALS → skip Step 3, pass through 3. Agent.chat() called with full guardrail block in system prompt 4. LLM responds with Lua animation code 5. Postprocessor: scans response → clean 6. AuditLogger.log(action="allowed", response=) 7. Response delivered ``` --- ## Cross-Session Continuity Design (REQ-12 + REQ-13) ### `gabriel_context.md` — Structure Single file at `memory_workspace/users/gabriel_context.md`. Replaces memory search for Gabriel. Written by the bot after each session. Overwritten, not appended (always current state). ```markdown ## Active Project Name: Haunted Mansion (Roblox horror game) Description: Top-down horror game with a chasing enemy, jump scares, and atmospheric lighting. ## Last Session (2026-04-21) - Implemented basic enemy chase using Humanoid:MoveTo() - Debugged an issue where the enemy ignored walls (fixed with pathfinding) - Introduced: pathfinding service, Humanoid, MoveTo() ## Open Threads - Player hasn't been told how to add sound effects yet - Wants to add a second enemy type next session ## Skills Introduced - for loops — iterating over tables (2026-04-21) - functions — defining, calling, parameters vs arguments (2026-04-21) - Humanoid — controlling character movement (2026-04-21) - PathfindingService — navigation around obstacles (2026-04-21) ``` ### How It Gets Updated At the end of each Gabriel session, the agent appends a self-update instruction to the system prompt (or the guardrail block triggers it): > "At the end of this conversation, update `memory_workspace/users/gabriel_context.md` > with: current project state, what was worked on today, any open threads, and any new > concepts you introduced. Keep it under 40 lines. Overwrite the file completely." This mirrors how the main agent writes to `MEMORY.md` after Jordan's sessions. The bot already has file-write tools available — no new mechanism needed. ### Injection in System Prompt In `_build_child_system_prompt()`: ```python gabriel_context = self.memory.read_file("users/gabriel_context.md") # or Path.read_text parts = [ CHILD_TUTOR_IDENTITY, f"User Profile:\n{user_profile}", ] if gabriel_context: parts.append(f"Project Context & Skills:\n{gabriel_context}") parts.append(CHILD_GUARDRAIL_BLOCK) ``` If the file doesn't exist (first session), it's simply omitted — no error. --- ## First-Run Onboarding Design (REQ-14) ### Detection First-run is detected in `_build_child_system_prompt()` or the preprocessor by checking: ```python context_path = workspace_dir / "users" / "gabriel_context.md" is_first_run = not context_path.exists() ``` ### Delivery The welcome is injected as a **system-level instruction** in the guardrail block that fires only on first run. The LLM is instructed to send the welcome as its opening message before addressing the user's question: ``` FIRST SESSION: This is Gabriel's very first message. Before answering his question, send a short, friendly welcome. Cover: - What you can help him with (Lua, Roblox, game design, coding) - That you'll guide him and ask questions rather than just give answers - That you'll remember his project between sessions - Ask what he's working on (or answer his question if he's already told you) Keep it to 4–5 sentences. Warm, not formal. ``` This block is only added when `is_first_run` is True — subsequent sessions omit it entirely. ### Example Welcome > Hey Gabriel! I'm here to help you build your Roblox games and level up your Lua skills. > I work a bit differently to a search engine — instead of just handing you the answer, I'll > walk you through things so you actually learn how it works. I'll also remember what you're > building between chats, so you won't need to explain your project every time. > What are you working on? --- ## Slack Allow-List Design (REQ-15) ### Current Gap `adapters/slack/adapter.py` — `handle_message_events()` processes every incoming message with no user check. The Telegram adapter has `_is_user_allowed()` at line 441; Slack has no equivalent. ### Fix Add `_is_user_allowed()` to `SlackAdapter`, called at the top of `handle_message_events()`: ```python def _is_user_allowed(self, user_id: str) -> bool: allowed = self.config.settings.get("allowed_users", []) if not allowed: return True # open if no list configured return user_id in [str(u) for u in allowed] ``` In `handle_message_events()`: ```python user_id = event.get("user") if not self._is_user_allowed(user_id): return # silently drop — no response ``` Config in `adapters.local.yaml`: ```yaml slack: allowed_users: - U01234JORDAN # Jordan's Slack user ID - U09876GABRIEL # Gabriel's Slack user ID ``` Slack user IDs are found in Slack → Profile → More → Copy member ID. --- ## File Tree After Implementation ``` ajarbot/ ├── child_safety.py ← NEW ├── agent.py ← MODIFIED (_build_system_prompt, _chat_inner) ├── adapters/ │ ├── runtime.py ← MODIFIED (register pre/postprocessors) │ └── slack/ │ └── adapter.py ← MODIFIED (add allow-list check) ├── config/ │ └── adapters.local.yaml ← MODIFIED (child_safety block, gabriel mapping, │ slack allowed_users) └── memory_workspace/ └── users/ ├── gabriel.md ← NEW (user profile) └── gabriel_context.md ← NEW (created after first session) └── audit/ └── gabriel/ └── 2026-04-21.jsonl ← NEW (created at runtime) ``` --- ## Decisions Log | Decision | Rationale | |---|---| | Intent patterns over keyword lists | Keywords produce unacceptable false positive rate for game dev vocabulary | | Sentinel pattern for preprocessor blocking | Avoids runtime API changes; fits existing pre/postprocessor contract | | Separate audit log from RSO log | Keeps RSO memory scoring clean; audit log has different retention and purpose | | Guardrail block as system prompt injection, not separate API call | No extra LLM call = no added latency or cost | | Game dev context as an exemption gate, not an allow-list | Easier to maintain; covers novel game dev phrasing automatically | | Config-driven restricted users | Parent can add/remove without touching Python source | | gabriel_context.md overwrites rather than appends | Always reflects current state; avoids unbounded growth; keeps token cost predictable | | First-run via file existence check | No database or state needed; survives restarts; trivially inspectable | | Slack allow-list fails open (empty list = allow all) | Preserves current behaviour for existing deployments with no config change | | Platform: Slack over Telegram | Jordan has native workspace admin visibility; channel history is built-in parent monitoring |