# Design — Child Safety Profile

## Architecture Overview

The feature is implemented as a self-contained module (`child_safety.py`) that hooks into three
existing extension points in the runtime, plus a new dedicated audit logger. No core agent logic
is restructured — all changes are additive.

```
Slack ──► SlackAdapter ──► AdapterRuntime
                                      │
                               [preprocessors]
                                      │
                              ChildSafetyFilter.preprocess()
                                      │
                               ── BLOCKED? ──► AuditLogger (blocked entry)
                                      │                └──► safe reply to user
                                   PASS
                                      │
                                 Agent.chat()
                                      │
                            _build_system_prompt()
                                      │
                           injects guardrail block
                           (if username in RESTRICTED_USERS)
                                      │
                                  LLM call
                                      │
                               [postprocessors]
                                      │
                             ChildSafetyFilter.postprocess()
                                      │
                              ── FLAGGED? ──► AuditLogger (flagged entry)
                                      │              └──► safe fallback reply
                                   CLEAN
                                      │
                              AuditLogger (allowed entry)
                                      │
                                reply to user
```

---

## New Files

### `child_safety.py`
The main module. Contains three classes:

**`ChildSafetyConfig`**
- Loaded once at startup from `config/adapters.local.yaml` (`child_safety` block)
- Fields: `restricted_users: list[str]`, `audit_retention_days: int`
- Exposes `is_restricted(username: str) -> bool`
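
A minimal sketch of what `ChildSafetyConfig` could look like, assuming the `child_safety` YAML block has already been parsed into a plain dict. The `from_mapping` constructor, the case-insensitive comparison, and the fallback retention of 365 days are assumptions made for illustration and are not part of this spec.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Any, Mapping


@dataclass(frozen=True)
class ChildSafetyConfig:
    """Illustrative config holder; field names follow the design above."""

    restricted_users: list[str] = field(default_factory=list)
    audit_retention_days: int = 365  # assumed default, mirrors the sample YAML

    @classmethod
    def from_mapping(cls, raw: Mapping[str, Any] | None) -> "ChildSafetyConfig":
        # Hypothetical helper: `raw` is the already-parsed `child_safety` block.
        raw = raw or {}
        return cls(
            restricted_users=[str(u) for u in raw.get("restricted_users", [])],
            audit_retention_days=int(raw.get("audit_retention_days", 365)),
        )

    def is_restricted(self, username: str) -> bool:
        # Case-insensitive matching is an assumption; the spec does not say.
        return username.lower() in {u.lower() for u in self.restricted_users}
```

A missing `child_safety` block yields an empty `restricted_users` list in this sketch, so nobody is treated as restricted unless the parent has configured it.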

**`ChildSafetyFilter`**
- Stateless filter with two public methods: `preprocess()` and `postprocess()`
- Holds compiled regex patterns (compiled once at import, not per-message)
- `preprocess(message: InboundMessage) -> tuple[InboundMessage | None, str | None]`
  - Returns `(message, None)` to pass through
  - Returns `(None, reply_text)` to block with a safe response
- `postprocess(response: str, message: InboundMessage) -> str`
  - Returns response unchanged if clean
  - Returns safe fallback string if flagged

**`ChildAuditLogger`**
- Writes to `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl`
- Non-blocking: uses daemon background threads (same pattern as `InteractionLogger`)
- `log(username, message, action, reason, response)` — single public method
- `cleanup_old_logs(retention_days)` — called at startup
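
The logger's two responsibilities (non-blocking append, date-based cleanup) can be sketched roughly as follows. This is an illustration under assumptions: the constructor argument `root`, the internal `_write` method, the lock, and returning the worker thread from `log()` are all hypothetical, and the real `InteractionLogger` pattern may differ.

```python
from __future__ import annotations

import json
import threading
from datetime import date, datetime, timedelta, timezone
from pathlib import Path


class ChildAuditLogger:
    """Illustrative logger: one JSONL file per user per day."""

    def __init__(self, root: Path) -> None:
        self.root = root  # e.g. memory_workspace/audit (assumed constructor argument)
        self._lock = threading.Lock()  # serialises appends from worker threads

    def log(self, username, message, action, reason=None, response=None) -> threading.Thread:
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(timespec="milliseconds"),
            "username": username,
            "action": action,
            "filter_reason": reason,
            "message": message,
            "response": response,
        }
        # Daemon thread so a slow disk never delays the chat reply.
        worker = threading.Thread(target=self._write, args=(entry,), daemon=True)
        worker.start()
        return worker  # returned only so callers and tests can join(); an assumption

    def _write(self, entry: dict) -> None:
        day = entry["timestamp"][:10]  # YYYY-MM-DD prefix of the ISO timestamp
        path = self.root / entry["username"] / f"{day}.jsonl"
        with self._lock:
            path.parent.mkdir(parents=True, exist_ok=True)
            with path.open("a", encoding="utf-8") as fh:
                fh.write(json.dumps(entry, ensure_ascii=False) + "\n")

    def cleanup_old_logs(self, retention_days: int) -> int:
        """Delete files whose date stem is older than the retention window."""
        cutoff = date.today() - timedelta(days=retention_days)
        removed = 0
        for path in self.root.glob("*/*.jsonl"):
            try:
                file_day = date.fromisoformat(path.stem)
            except ValueError:
                continue  # leave files that do not follow the naming scheme
            if file_day < cutoff:
                path.unlink()
                removed += 1
        return removed
```

Deriving the age from the file name rather than the file's modification time keeps cleanup predictable if logs are ever copied or restored from backup.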

### `memory_workspace/users/gabriel.md`
Per-user profile injected into the system prompt. Contains:
- Age, interests, learning context
- Communication style preferences (patient, encouraging, use examples)
- Does NOT contain guardrail rules (those are in the injected guardrail block)

---

## Modified Files

### `agent.py` — `_build_system_prompt()` (line 488)
Add a conditional block after the existing user profile injection:

```python
if self._child_safety_config and self._child_safety_config.is_restricted(username):
    system_parts.append(CHILD_GUARDRAIL_BLOCK)
```

`CHILD_GUARDRAIL_BLOCK` is a module-level constant defined in `child_safety.py` and imported into
`agent.py`. It is a multi-paragraph instruction block; see the Guardrail Block Content Design
section below.

The Agent is also given a reference to the `ChildSafetyConfig` at `__init__` time so it can
check `is_restricted()` without re-reading config on every turn.

### `adapters/runtime.py` — `AdapterRuntime.__init__()`
After constructing the runtime, register the child safety pre/postprocessors:

```python
from child_safety import ChildSafetyFilter, ChildAuditLogger

# config: the ChildSafetyConfig loaded at startup; audit_logger: a ChildAuditLogger instance
_filter = ChildSafetyFilter(config, audit_logger)
self.add_preprocessor(_filter.preprocess_adapter)
self.add_postprocessor(_filter.postprocess_adapter)
```

The `preprocess_adapter` and `postprocess_adapter` methods wrap the core filter methods with
the `InboundMessage` signature the runtime expects:
- Preprocessor signature: `(InboundMessage) -> InboundMessage`
- Postprocessor signature: `(str, InboundMessage) -> str`

When the preprocessor blocks a message, it still has to return an `InboundMessage`, so the block
must be signalled in-band. Two options were considered: return a sentinel message, or raise a
handled exception that the runtime catches and converts to a direct reply. **Decision: use
sentinel pattern.** Marking the message rather than raising keeps the runtime's error handling clean.

> **Alternative considered**: Returning `None` from the preprocessor to signal "send the canned
> reply and skip the agent". This would require a runtime change to handle `None`. The sentinel
> approach avoids that. The runtime already supports early-exit via postprocessors returning
> a replacement string — we can use a similar mechanism.

**Simpler approach (chosen):** The preprocessor returns a modified `InboundMessage` with its
`text` replaced by a special internal sentinel. A postprocessor immediately before delivery
detects the sentinel and replaces it with the safe reply text. The audit log entry is written
by the preprocessor at block time.
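
The sentinel round trip can be sketched as below. Everything here is illustrative: `InboundMessage` is a two-field stand-in for the runtime's real type, `SentinelAdapters` and `should_block` are hypothetical names, and the sentinel prefix and reply text are taken from the blocked-message data flow later in this document.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable

BLOCK_SENTINEL = "__BLOCKED__: "  # internal marker, never shown to the user
SAFE_REPLY = (
    "That's not something I can help with! Want to work on your Roblox game instead? "
    "I'm great at scripting and game mechanics."
)


@dataclass(frozen=True)
class InboundMessage:
    """Stand-in for the runtime's message type; the real fields may differ."""

    username: str
    text: str


class SentinelAdapters:
    """Wraps a block decision in the pre/postprocessor signatures the runtime expects."""

    def __init__(self, should_block: Callable[[str], bool]) -> None:
        self._should_block = should_block  # hypothetical hook for the input filter

    def preprocess_adapter(self, message: InboundMessage) -> InboundMessage:
        if self._should_block(message.text):
            # Replace the text with sentinel + canned reply instead of raising.
            return replace(message, text=BLOCK_SENTINEL + SAFE_REPLY)
        return message

    def postprocess_adapter(self, response: str, message: InboundMessage) -> str:
        if message.text.startswith(BLOCK_SENTINEL):
            # Discard whatever came back and deliver the canned reply.
            return message.text[len(BLOCK_SENTINEL):]
        return response
```

This sketch covers only the two adapter methods. For the blocked-message flow to skip `Agent.chat()` entirely, the runtime or agent would also need to short-circuit when it sees the sentinel; that part is not shown here.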

### `config/adapters.local.yaml`
Add the `child_safety` config block (parent configures manually):

```yaml
child_safety:
  restricted_users:
    - gabriel
  audit_retention_days: 365
```

---

## Filtering Logic Design

### Input Filter — Intent Pattern Matching

Patterns are compiled regex, evaluated in order. First match wins.

**Step 1 — Hard block (always, regardless of context)**
These patterns are checked first with no context exemption:

| Pattern intent | Example match |
|---|---|
| Sexual/explicit content requests | "send me explicit", "show me porn" |
| Real-world harm with named victim | "how do I hurt my [classmate/sister/teacher]" |
| Requests for personal identifying info | "what is [person]'s address/phone/school" |
| Self-harm | "how do I hurt myself", "how do I kill myself" |

**Step 2 — Check for game dev context signals**
If any of the following patterns match the message, skip Step 3 entirely:

```python
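GAME_CONTEXT_SIGNALS = [
    # Location phrasing: "in my game", "in the script", "in roblox"
    r"\bin (my |the |a )?(game|roblox|studio|script|map|level|world)\b",
    # Roblox/Lua vocabulary
    r"\b(lua|roblox|studio|npc|hitbox|raycast|humanoid|workspace|basepart)\b",
    # General programming vocabulary
    r"\b(code|script|function|method|module|class|variable|loop|event)\b",
    # Combat verb followed by a game entity: "damage the player", "kill the npc"
    r"\b(damage|health|respawn|spawn|kill|destroy)\b.*\b(player|npc|enemy|mob|character)\b",
    # Builder phrasing: "how do I make/create/build ..."
    # Caveat: also matches "how do I make a real bomb", which then skips Step 3.
    r"\bhow (do i|to|can i) (make|get|set|add|create|implement|build)\b",
]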
```

**Step 3 — Conditional block (only if no game dev context)**
Only evaluated if Step 2 found no context signals:

| Pattern intent | Example match |
|---|---|
| Real-world weapon harm | "how do I use a knife to hurt/cut/stab" |
| Real-world weapon construction | "how do I build/make a real gun/bomb" |
| Violence against real targets | "how do I fight/attack/beat up [person]" |
| Explicit drug content | "how do I get/use/buy [drug]" |

**Step 4 — Pass through**
No patterns matched → message is allowed.
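
The four steps above can be sketched as a single function. The hard-block and conditional patterns here are short placeholders invented for illustration (the real lists are not part of this document), only two of the game context signals are reproduced, and case-insensitive matching is an assumption.

```python
from __future__ import annotations

import re

FLAGS = re.IGNORECASE  # assumption: matching ignores case

# Placeholder patterns for illustration only.
HARD_BLOCK = {
    "self_harm": re.compile(r"\bhow (do i|to|can i) (hurt|kill) myself\b", FLAGS),
}
CONDITIONAL_BLOCK = {
    "real_world_violence": re.compile(r"\bhow (do i|to|can i) (stab|attack|beat up)\b", FLAGS),
}
GAME_CONTEXT = [
    re.compile(r"\bin (my |the |a )?(game|roblox|studio|script|map|level|world)\b", FLAGS),
    re.compile(r"\b(lua|roblox|studio|npc|hitbox|raycast|humanoid|workspace|basepart)\b", FLAGS),
]


def classify(text: str) -> tuple[str, str | None]:
    """Return ("blocked", reason) or ("allowed", None)."""
    # Step 1: hard blocks apply regardless of context.
    for reason, pattern in HARD_BLOCK.items():
        if pattern.search(text):
            return "blocked", reason
    # Step 2: any game dev signal skips Step 3 entirely.
    if any(pattern.search(text) for pattern in GAME_CONTEXT):
        return "allowed", None
    # Step 3: conditional blocks, reached only without game context.
    for reason, pattern in CONDITIONAL_BLOCK.items():
        if pattern.search(text):
            return "blocked", reason
    # Step 4: nothing matched.
    return "allowed", None
```

Because Step 1 runs before the context check, a message such as "how do I hurt myself in my game" is still blocked, while "how do I stab an enemy in my game" passes through Step 2.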

### Output Filter — Response Scan

Lighter touch. Scans the LLM response for:
- Explicit sexual language (small set of explicit terms only)
- Actual step-by-step real-world harm instructions (e.g., numbered steps to build a weapon)
- Profanity above a threshold (configurable word list)

If flagged → replace entire response with:
> "I ran into a bit of a snag answering that one. Try asking me a different way, or ask about
> something else — I'm great at Lua scripting and Roblox game design!"
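
A rough sketch of the response scan, assuming the profanity threshold is a simple count of listed words and that explicit terms trigger on a single hit. The function names and parameters are hypothetical, and detection of step-by-step harm instructions is deliberately left out because the document does not say how it would be recognised.

```python
from __future__ import annotations

import re


def count_hits(response: str, words: set[str]) -> int:
    """Count how many word tokens in the response appear in the given list."""
    tokens = re.findall(r"[a-z']+", response.lower())
    return sum(1 for token in tokens if token in words)


def postprocess(
    response: str,
    explicit_terms: set[str],
    profanity: set[str],
    profanity_threshold: int,
    fallback: str,
) -> str:
    """Return the response unchanged if clean, otherwise the safe fallback."""
    # Any explicit term flags the response outright.
    if count_hits(response, explicit_terms) > 0:
        return fallback
    # Profanity is tolerated up to the threshold (assumed to be a hit count).
    if count_hits(response, profanity) > profanity_threshold:
        return fallback
    return response
```

Replacing the whole response, rather than redacting individual words, matches the behaviour described above and avoids delivering a partially censored answer.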

---

## Guardrail Block Content Design

Injected at the end of the system prompt for all restricted users. Contains two sections:
safety rules and teaching approach.

```
=== CHILD SAFE MODE ===
You are talking to Gabriel, a 13-year-old who is learning game development and Lua scripting.
Your role is educator and mentor — not answer key.

--- CONTENT RULES ---

ALWAYS ENCOURAGED:
- Lua scripting, Roblox Studio mechanics, game physics
- Horror game design: atmosphere, enemy AI, jump scares, damage systems
- Weapon mechanics IN GAMES: hitboxes, shooting mechanics, damage values, animations
- General coding concepts, algorithms, creative writing, school subjects

NEVER ALLOWED — refuse politely, no explanation of why:
- Real-world instructions for harming people or animals
- How to build, obtain, or use actual weapons
- Sexual or romantic content of any kind
- Explicit language or profanity
- Sharing or asking for real personal information

GRAY AREA RULE: If a question mentions weapons, violence, or dangerous topics AND there is any
reasonable game/educational interpretation — assume game context and help enthusiastically.
Only refuse if the request is unambiguously real-world harm with no plausible game framing.

--- TEACHING APPROACH ---

Your goal is to build Gabriel's skills and confidence over time, not to hand him answers.
Use this approach every time:

1. ASSESS FIRST (for non-trivial questions): Before diving in, ask what he's already tried
   or what he thinks might work. Skip this for simple factual lookups ("what does pairs() do?").

2. BREAK IT DOWN: Split the problem into smaller steps. Guide through one step at a time.
   "Let's start with just getting the bullet to appear — we'll worry about damage after."

3. CODE + EXPLANATION always together: When you show code, explain what each meaningful
   part does in plain language immediately after. Never a bare code block with no context.
   Ask "does that make sense?" or "what do you think this line is doing?" after showing it.

4. LEAVE SOMETHING FOR HIM: After giving an example, leave one small piece for Gabriel to
   write himself. "I've done the shooting part — can you add the check for ammo count?"

5. GUIDE THE DEBUG, DON'T SOLVE IT: When he shares broken code, point him toward the
   area with the issue rather than fixing it directly.
   "Look at what your variable is on the third loop — what's it equal to at that point?"

6. CELEBRATE THE ATTEMPT: Always acknowledge what's working before addressing what isn't.
   "The loop structure is solid — that's the tricky bit. Just one small fix needed here."

7. CONNECT TO PAST WORK: When a new concept resembles something covered before, say so.
   "This is the same idea as the enemy spawner loop — same structure, different purpose."

8. DIRECT ANSWERS are fine for: simple factual questions, API lookups, syntax checks,
   "what does X do?" questions. Only apply the full teaching approach for problem-solving.

9. AI LITERACY — teach him to use you well (weave in naturally, never lecture):
   - When he asks something vague, model good question structure before answering:
     "Just checking — you want the damage to apply on touch, or only when the enemy attacks?"
   - When context runs out, explain it plainly:
     "I can only hold so much conversation in memory. Next session, remind me what you're
     building and I'll be right back up to speed."
   - Teach the ideal coding question format when the moment comes up naturally:
     "Next time: what your code does now + what you want + what you've tried = fastest answer."
   - Flag your assumptions so he learns to spot ambiguity:
     "I'm assuming this resets on respawn — let me know if that's not what you meant."

RESPONSE LENGTH: Keep responses focused. Step-by-step means one step at a time — don't
front-load everything. Short, clear, then wait for his response before continuing.

TONE: Enthusiastic, encouraging, patient. Short sentences. No jargon without explanation.
Talk to him like a smart friend who happens to know a lot about game dev, not like a textbook.
=== END CHILD SAFE MODE ===
```

---

## Token Optimization Design

### Problem
Gabriel shares the same API token pool as Jordan. Every Gabriel turn currently injects:
- `SOUL.md` — Garvis homelab persona (~935 tokens, ~3,740 bytes) — irrelevant
- `context.md` — SSH hosts, Proxmox inventory (~227 tokens, ~909 bytes) — irrelevant
- Hybrid memory search (5 chunks) — Jordan's homelab memories — irrelevant
- 20-message history window — same cap as an admin session

Estimated dead weight: **~1,500–1,800 tokens per turn** before Gabriel types a word.
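
A quick arithmetic check of that estimate, using only the figures listed above. The per-chunk range is derived from the estimate rather than measured, so treat it as a sanity check and not as data.

```python
SOUL_TOKENS, SOUL_BYTES = 935, 3740
CONTEXT_TOKENS, CONTEXT_BYTES = 227, 909
ESTIMATE_LOW, ESTIMATE_HIGH = 1500, 1800
MEMORY_CHUNKS = 5

# Tokens injected by the two fixed files on every turn.
fixed_tokens = SOUL_TOKENS + CONTEXT_TOKENS

# What the estimate implicitly attributes to the memory search results.
memory_low = ESTIMATE_LOW - fixed_tokens
memory_high = ESTIMATE_HIGH - fixed_tokens

# Implied size of a single memory chunk, in tokens.
per_chunk_low = memory_low / MEMORY_CHUNKS
per_chunk_high = memory_high / MEMORY_CHUNKS

# Byte-to-token ratio implied by the file measurements.
bytes_per_token = (SOUL_BYTES + CONTEXT_BYTES) / fixed_tokens

print(fixed_tokens, memory_low, memory_high, round(bytes_per_token, 2))
```

The two files alone account for 1,162 tokens, so the estimate implies roughly 340 to 640 tokens for the five memory chunks (about 70 to 130 tokens each), at close to 4 bytes per token.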

### Solution: Restricted-User System Prompt Builder

In `_build_system_prompt()`, add a branch for restricted users that replaces the standard
assembly with a stripped-down version:

```python
if self._child_safety_config and self._child_safety_config.is_restricted(username):
    return self._build_child_system_prompt(username, user_profile, guardrail_block)
```

**`_build_child_system_prompt()`** assembles only:
1. `CHILD_TUTOR_IDENTITY` — a ~100-token constant replacing SOUL.md (see below)
2. `user_profile` — gabriel.md (relevant, kept)
3. `CHILD_GUARDRAIL_BLOCK` — safety + teaching rules (relevant, kept)
4. Tool capability line — minimal version, omit delegation instructions

What is **explicitly skipped**:
- `get_soul()` — SOUL.md not read at all
- `get_context()` — context.md not read at all
- `search_hybrid()` — memory search not called
- Delegation/sub-agent instructions block

### `CHILD_TUTOR_IDENTITY` Constant (~100 tokens)

Replaces the full SOUL.md for Gabriel's sessions:

```
You are a coding mentor and game development tutor. You help Gabriel — a 13-year-old building
Roblox games in Lua — learn to code and think like a developer. You are not a general-purpose
assistant; for this session, your entire focus is helping Gabriel build skills and create games.
```

### History Window Reduction

`_get_context_messages()` currently uses the module-level `MAX_CONTEXT_MESSAGES = 20`.

For restricted users, pass a smaller cap:

```python
CHILD_MAX_CONTEXT_MESSAGES = 10  # module-level constant in agent.py
```

In `_chat_inner()`, the call becomes:
```python
is_child = self._child_safety_config.is_restricted(username)
cap = CHILD_MAX_CONTEXT_MESSAGES if is_child else MAX_CONTEXT_MESSAGES
context_messages = self._get_context_messages(cap)
```

The username is available in `_chat_inner()` (passed from `chat()`), so `is_child` can be
derived from `self._child_safety_config.is_restricted(username)`.
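
As a sketch of the effect, assuming history is a simple list and the cap keeps the most recent messages (the real `_get_context_messages()` may select differently), with `context_cap` and `select_context_messages` as hypothetical helper names:

```python
MAX_CONTEXT_MESSAGES = 20
CHILD_MAX_CONTEXT_MESSAGES = 10


def context_cap(is_child: bool) -> int:
    """Pick the history cap for this session type."""
    return CHILD_MAX_CONTEXT_MESSAGES if is_child else MAX_CONTEXT_MESSAGES


def select_context_messages(history: list, cap: int) -> list:
    """Keep only the most recent `cap` messages, preserving order."""
    if cap <= 0:
        return []  # guard: history[-0:] would return the whole list
    return history[-cap:]
```

With 25 messages of history, a restricted session would send messages 16 to 25 while an admin session would send messages 6 to 25.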

### Per-Session Cost Visibility (Future)
Not in scope for the initial build. The audit log already stores the full message and response
text, so per-session token estimates can be derived from it later, or exact token counts can be
added as an extra field.

---

## Audit Log Schema

File: `memory_workspace/audit/{username}/YYYY-MM-DD.jsonl`

```jsonc
{
  "timestamp": "2026-04-21T14:32:01.123+00:00",  // ISO 8601 with timezone
  "username": "gabriel",
  "platform": "slack",
  "action": "allowed",          // "allowed" | "blocked" | "flagged"
  "filter_stage": null,         // null | "preprocessor" | "postprocessor"
  "filter_reason": null,        // null | string describing which pattern matched
  "message": "how do I make the laser shoot in my roblox game",  // full text
  "response": "Great question! Here's how to..."  // full text, null if blocked pre-LLM
}
```
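
One way to keep entries consistent with this schema is a small typed record that validates the enumerated fields before serialising. The `AuditEntry` class and `to_jsonl` method are hypothetical names used for illustration. Note that the comments in the schema above are documentation only: each line written to the `.jsonl` file has to be plain JSON.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone

ACTIONS = {"allowed", "blocked", "flagged"}
STAGES = {None, "preprocessor", "postprocessor"}


@dataclass(frozen=True)
class AuditEntry:
    """Illustrative record mirroring the schema fields."""

    username: str
    platform: str
    action: str
    message: str
    response: str | None = None       # None when blocked before the LLM call
    filter_stage: str | None = None
    filter_reason: str | None = None
    timestamp: str = ""               # filled in automatically when empty

    def __post_init__(self) -> None:
        if self.action not in ACTIONS:
            raise ValueError(f"unknown action: {self.action!r}")
        if self.filter_stage not in STAGES:
            raise ValueError(f"unknown filter_stage: {self.filter_stage!r}")
        if not self.timestamp:
            now = datetime.now(timezone.utc).isoformat(timespec="milliseconds")
            object.__setattr__(self, "timestamp", now)  # frozen dataclass workaround

    def to_jsonl(self) -> str:
        """Serialise to a single JSON line (no comments, no trailing newline)."""
        return json.dumps(asdict(self), ensure_ascii=False)
```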

---

## Data Flow for a Blocked Message

```
1. Gabriel sends: "how do I stab someone"
2. Preprocessor: no game context signals found → Step 3 matches "violence against real target"
3. Action: BLOCK
4. AuditLogger.log(action="blocked", reason="real_world_violence", response=None)
5. Message text replaced with internal sentinel "__BLOCKED__: I can't help with that topic..."
6. Agent.chat() never called
7. Postprocessor detects sentinel → returns the canned reply text
8. Reply delivered to Gabriel: "That's not something I can help with! Want to work on your
   Roblox game instead? I'm great at scripting and game mechanics."
```

---

## Data Flow for a Passing Message

```
1. Gabriel sends: "how do I make a knife swing animation in Roblox"
2. Preprocessor: "roblox" matches GAME_CONTEXT_SIGNALS → skip Step 3, pass through
3. Agent.chat() called with full guardrail block in system prompt
4. LLM responds with Lua animation code
5. Postprocessor: scans response → clean
6. AuditLogger.log(action="allowed", response=<full response text>)
7. Response delivered
```

---

## Cross-Session Continuity Design (REQ-12 + REQ-13)

### `gabriel_context.md` — Structure

Single file at `memory_workspace/users/gabriel_context.md`. Replaces memory search for Gabriel.
Written by the bot after each session. Overwritten, not appended (always current state).

```markdown
## Active Project
Name: Haunted Mansion (Roblox horror game)
Description: Top-down horror game with a chasing enemy, jump scares, and atmospheric lighting.

## Last Session (2026-04-21)
- Implemented basic enemy chase using Humanoid:MoveTo()
- Debugged an issue where the enemy ignored walls (fixed with pathfinding)
- Introduced: pathfinding service, Humanoid, MoveTo()

## Open Threads
- Player hasn't been told how to add sound effects yet
- Wants to add a second enemy type next session

## Skills Introduced
- for loops — iterating over tables (2026-04-21)
- functions — defining, calling, parameters vs arguments (2026-04-21)
- Humanoid — controlling character movement (2026-04-21)
- PathfindingService — navigation around obstacles (2026-04-21)
```

### How It Gets Updated

For each Gabriel session, the agent includes a self-update instruction in the system prompt
(either appended separately or carried in the guardrail block):

> "At the end of this conversation, update `memory_workspace/users/gabriel_context.md`
> with: current project state, what was worked on today, any open threads, and any new
> concepts you introduced. Keep it under 40 lines. Overwrite the file completely."

This mirrors how the main agent writes to `MEMORY.md` after Jordan's sessions. The bot
already has file-write tools available — no new mechanism needed.
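
If the 40-line limit and the overwrite semantics ever need enforcing in code rather than by prompt alone, a guard along these lines could wrap the write. This is a sketch under assumptions: `overwrite_context` is a hypothetical helper, rejecting oversized content (instead of truncating it) is a design choice made here, and the existing file-write tool may already behave differently.

```python
import os
import tempfile
from pathlib import Path

MAX_CONTEXT_LINES = 40  # "Keep it under 40 lines"


def overwrite_context(path: Path, content: str) -> None:
    """Replace the context file in one step, rejecting oversized content."""
    line_count = len(content.splitlines())
    if line_count >= MAX_CONTEXT_LINES:
        raise ValueError(f"context has {line_count} lines; must be under {MAX_CONTEXT_LINES}")
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then swap it into place,
    # so a crash mid-write never leaves a half-written context file.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        fh.write(content)
    os.replace(tmp_name, path)
```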

### Injection in System Prompt

In `_build_child_system_prompt()`:

```python
gabriel_context = self.memory.read_file("users/gabriel_context.md")  # or Path.read_text
parts = [
    CHILD_TUTOR_IDENTITY,
    f"User Profile:\n{user_profile}",
]
if gabriel_context:
    parts.append(f"Project Context & Skills:\n{gabriel_context}")
parts.append(CHILD_GUARDRAIL_BLOCK)
```

If the file doesn't exist (first session), it's simply omitted — no error.

---

## First-Run Onboarding Design (REQ-14)

### Detection

First-run is detected in `_build_child_system_prompt()` or the preprocessor by checking:

```python
context_path = workspace_dir / "users" / "gabriel_context.md"
is_first_run = not context_path.exists()
```

### Delivery

The welcome is injected as a **system-level instruction** in the guardrail block that fires
only on first run. The LLM is instructed to send the welcome as its opening message before
addressing the user's question:

```
FIRST SESSION: This is Gabriel's very first message. Before answering his question,
send a short, friendly welcome. Cover:
- What you can help him with (Lua, Roblox, game design, coding)
- That you'll guide him and ask questions rather than just give answers
- That you'll remember his project between sessions
- Ask what he's working on (or answer his question if he's already told you)
Keep it to 4–5 sentences. Warm, not formal.
```

This block is only added when `is_first_run` is True — subsequent sessions omit it entirely.
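
Putting detection and delivery together, the prompt assembly might look like the sketch below. `build_child_prompt_parts` is a hypothetical free function standing in for the body of `_build_child_system_prompt()`, and placing the first-session block after the guardrail block is an assumption.

```python
from pathlib import Path


def build_child_prompt_parts(
    workspace_dir: Path,
    identity: str,
    user_profile: str,
    guardrail_block: str,
    first_session_block: str,
) -> list:
    """Assemble system prompt parts, adding the welcome instruction on first run."""
    context_path = workspace_dir / "users" / "gabriel_context.md"
    is_first_run = not context_path.exists()

    parts = [identity, f"User Profile:\n{user_profile}"]
    if not is_first_run:
        # Returning user: include the saved project context.
        context = context_path.read_text(encoding="utf-8")
        parts.append(f"Project Context & Skills:\n{context}")
    parts.append(guardrail_block)
    if is_first_run:
        # No context file yet: ask the LLM to open with the welcome.
        parts.append(first_session_block)
    return parts
```

Since the context file is only written at the end of a session, every message in the first session would see `is_first_run` as true in this sketch, so the instruction's wording ("very first message") may need to account for that.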

### Example Welcome

> Hey Gabriel! I'm here to help you build your Roblox games and level up your Lua skills.
> I work a bit differently to a search engine — instead of just handing you the answer, I'll
> walk you through things so you actually learn how it works. I'll also remember what you're
> building between chats, so you won't need to explain your project every time.
> What are you working on?

---

## Slack Allow-List Design (REQ-15)

### Current Gap

`adapters/slack/adapter.py` — `handle_message_events()` processes every incoming message
with no user check. The Telegram adapter has `_is_user_allowed()` at line 441; Slack has
no equivalent.

### Fix

Add `_is_user_allowed()` to `SlackAdapter`, called at the top of `handle_message_events()`:

```python
def _is_user_allowed(self, user_id: str) -> bool:
    allowed = self.config.settings.get("allowed_users", [])
    if not allowed:
        return True  # open if no list configured
    return user_id in [str(u) for u in allowed]
```

In `handle_message_events()`:
```python
user_id = event.get("user")
if not self._is_user_allowed(user_id):
    return  # silently drop — no response
```

Config in `adapters.local.yaml`:
```yaml
slack:
  allowed_users:
    - U01234JORDAN   # Jordan's Slack user ID
    - U09876GABRIEL  # Gabriel's Slack user ID
```

Slack user IDs are found in Slack → Profile → More → Copy member ID.
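
The allow-list behaviour can be checked in isolation with a standalone version of the same logic. `is_user_allowed` here is a free-function stand-in for the method above, and the explicit `None` check (for events that carry no `user` field) is an addition for illustration.

```python
from __future__ import annotations


def is_user_allowed(user_id: str | None, allowed: list | None) -> bool:
    """Mirror of the adapter check: empty list fails open, otherwise exact ID match."""
    if not allowed:
        return True  # open if no list configured
    if user_id is None:
        return False  # no sender ID to match against a configured list
    return str(user_id) in {str(u) for u in allowed}


ALLOWED = ["U01234JORDAN", "U09876GABRIEL"]  # sample IDs from the config above

for candidate in ("U09876GABRIEL", "U55555STRANGER", None):
    print(candidate, is_user_allowed(candidate, ALLOWED))
```

With the sample list, Gabriel's ID is accepted, an unknown ID is dropped, and an event without a sender is dropped; with no list configured, all three are accepted.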

---

## File Tree After Implementation

```
ajarbot/
├── child_safety.py                          ← NEW
├── agent.py                                 ← MODIFIED (_build_system_prompt, _chat_inner)
├── adapters/
│   ├── runtime.py                           ← MODIFIED (register pre/postprocessors)
│   └── slack/
│       └── adapter.py                       ← MODIFIED (add allow-list check)
├── config/
│   └── adapters.local.yaml                  ← MODIFIED (child_safety block, gabriel mapping,
│                                                         slack allowed_users)
└── memory_workspace/
    ├── users/
    │   ├── gabriel.md                       ← NEW (user profile)
    │   └── gabriel_context.md               ← NEW (created after first session)
    └── audit/
        └── gabriel/
            └── 2026-04-21.jsonl             ← NEW (created at runtime)
```

---

## Decisions Log

| Decision | Rationale |
|---|---|
| Intent patterns over keyword lists | Keyword lists produce an unacceptable false-positive rate on game dev vocabulary |
| Sentinel pattern for preprocessor blocking | Avoids runtime API changes; fits existing pre/postprocessor contract |
| Separate audit log from RSO log | Keeps RSO memory scoring clean; audit log has different retention and purpose |
| Guardrail block as system prompt injection, not separate API call | No extra LLM call = no added latency or cost |
| Game dev context as an exemption gate, not an allow-list | Easier to maintain; covers novel game dev phrasing automatically |
| Config-driven restricted users | Parent can add/remove without touching Python source |
| gabriel_context.md overwrites rather than appends | Always reflects current state; avoids unbounded growth; keeps token cost predictable |
| First-run via file existence check | No database or state needed; survives restarts; trivially inspectable |
| Slack allow-list fails open (empty list = allow all) | Preserves current behaviour for existing deployments with no config change |
| Platform: Slack over Telegram | Jordan has native workspace admin visibility; channel history is built-in parent monitoring |