Proxmox Migration Plan: Dell R620 → Cisco UCS C240 M5

Created: 2026-03-14
Updated: 2026-03-14
Status: Pre-Migration (backups running, awaiting C240 M5 power-on)
Strategy: Option C (wipe R620 drives → install in C240 → restore from PBS)


1. Current Environment Summary

Source Server: Dell PowerEdge R620

| Component | Details |
|---|---|
| Proxmox VE | Latest (verify version on next SSH) |
| RAID Controller | LSI SAS1068E (Fusion MPT SAS), NOT a Dell PERC |
| Boot Drive | /dev/sda: 146 GB SAS (Seagate ST914603SSUN146G), Proxmox OS on LVM |
| Data Pool | ZFS "Vault": 4.36 TB on /dev/sdb (RAID 0 virtual disk, 4x 1.2TB NETAPP drives) |
| Pool Usage | 108 GB used / 4.25 TB free; HEALTHY, 0 errors |
| Last Scrub | Mar 8, 2026 (clean) |

⚠️ RAID 0 Warning

The "Vault" ZFS pool sits on a RAID 0 stripe (4 drives, no redundancy). If any single drive fails, all data on the pool is lost. This is a strong reason to get fresh backups before touching anything.

Physical Drive Inventory — R620 (6 Drives)

| Slot | Vendor | Model | Capacity | RPM | Interface | Serial | Current Use |
|---|---|---|---|---|---|---|---|
| 0 | SEAGATE | ST914602SSUN146G | 146 GB | 10,025 | 2.5" SAS | 2896MNAS | Unused (no block device assigned) |
| 1 | SEAGATE | ST914603SSUN146G | 146 GB | 10,000 | 2.5" SAS | 00110282EXXH | sda: Proxmox boot (LVM) |
| 2 | NETAPP | X425_SIRMN1T2A10 | 1.20 TB | 10,500 | 2.5" SAS | S3L1GAHC | sdb: RAID 0 member → ZFS "Vault" |
| 3 | NETAPP | X425_SIRMN1T2A10 | 1.20 TB | 10,500 | 2.5" SAS | S3L1TPXN | sdb: RAID 0 member → ZFS "Vault" |
| 4 | NETAPP | X425_SIRMN1T2A10 | 1.20 TB | 10,500 | 2.5" SAS | S3L1YV7T | sdb: RAID 0 member → ZFS "Vault" |
| 5 | NETAPP | X425_SIRMN1T2A10 | 1.20 TB | 10,500 | 2.5" SAS | S3L1TTA2 | sdb: RAID 0 member → ZFS "Vault" |

Note: NETAPP X425 drives are Seagate-manufactured 1.2TB 10K SAS drives (rebranded for NetApp storage shelves).

Workloads (12 total: 7 running, 5 stopped)

| VMID | Name | Type | Status | RAM | Disk | Priority |
|---|---|---|---|---|---|---|
| 100 | docker-hub | VM | 🟢 Running | 8.2 GB | 100 GB | HIGH |
| 101 | monitoring-docker | VM | 🟢 Running | 8 GB | 50 GB | HIGH |
| 102 | CML | VM | 🟢 Running | 32 GB | 200 GB | HIGH |
| 105 | pfSense-Firewall | VM | 🟢 Running | 2 GB | 16 GB | CRITICAL |
| 114 | haos | VM | 🟢 Running | 4 GB | 50 GB | HIGH |
| 109 | caddy | LXC | 🟢 Running | | | HIGH |
| 112 | twingate-connector | LXC | 🟢 Running | | | HIGH |
| 104 | ubuntu-dev | VM | Stopped | 5 GB | 32 GB | LOW |
| 106 | Ansible-Control | VM | Stopped | 4 GB | 32 GB | LOW |
| 107 | ubuntu-docker | VM | Stopped | 4 GB | 50 GB | LOW |
| 113 | n8n | LXC | Stopped | | | LOW |
| 117 | test-cve-database | LXC | Stopped | | | LOW |

Backup Server

| Component | Details |
|---|---|
| PBS Host | 192.168.2.151 (container on TrueNAS 192.168.2.150) |
| Storage | PBS-Backups: 292 GB used / 962 GB total |
| Status | Online (restored 2026-03-14, fixed macvtap collision) |
| Fresh Backups | 🔄 Running as of 2026-03-14 |

2. Target Server: Cisco UCS C240 M5

Known Specs

| Component | Details |
|---|---|
| Chassis | Cisco UCS C240 M5 (2U rack) |
| New Drives | 2x 960 GB (SSD; likely SATA or SAS, verify on power-on) |
| Reused Drives | 6x drives from R620 (2x 146GB SAS + 4x 1.2TB SAS) |
| Total Drive Count | 8 drives (2 new + 6 from R620) |
| CPUs | TBD, power on to check (C240 M5 supports 2x Xeon Scalable) |
| RAM | TBD, power on to check (C240 M5 supports up to 3 TB) |
| Drive Bays | C240 M5 has 24x 2.5" SFF or 12x 3.5" LFF depending on config |
| CIMC | Cisco Integrated Management Controller (equivalent to iDRAC/iLO) |

⚠️ Items to Verify on Power-On

  1. CPU model & count — Need to confirm sufficient cores/threads
  2. Total RAM installed — Current R620 workloads need ~62 GB minimum (CML alone uses 32 GB)
  3. Drive bay form factor — Should be 2.5" SFF to accept the R620 SAS drives
  4. RAID controller or HBA — Need HBA/IT mode for ZFS (NOT hardware RAID)
  5. NIC configuration — How many ports, speed, VLAN capability
  6. CIMC IP/access — For remote management
  7. Firmware version — May need BIOS/CIMC update
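
The RAM figure in item 2 can be sanity-checked against the workload table. The sketch below sums the RAM assigned to the five running VMs; the table lists no RAM for the two running containers, so the result is a lower bound, and the function names `sum_gb` and `ram_ok` are made up for this illustration.

```shell
# RAM (GB) assigned to the running VMs, copied from the workload table:
# 100 docker-hub, 101 monitoring-docker, 102 CML, 105 pfSense, 114 haos.
RUNNING_VM_RAM_GB="8.2 8 32 2 4"

# Sum a whitespace-separated list of GB values (awk handles the decimals).
sum_gb() {
  echo "$1" | awk '{ total = 0; for (i = 1; i <= NF; i++) total += $i; printf "%.1f\n", total }'
}

# Succeed if installed RAM (GB, first argument) meets the minimum (GB, second argument).
ram_ok() {
  awk -v have="$1" -v need="$2" 'BEGIN { exit !(have + 0 >= need + 0) }'
}

echo "Running VM RAM: $(sum_gb "$RUNNING_VM_RAM_GB") GB"
```

Once CIMC reports the installed RAM, something like `ram_ok 128 64 && echo "enough RAM"` compares it against the plan's minimum; containers and the Proxmox host itself need headroom on top of the VM total.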

3. Migration Strategy — Option C: Wipe & Restore

Why This Approach

The R620's "Vault" pool sits on a RAID 0 virtual disk behind an LSI SAS1068E controller. The RAID metadata is tied to that controller — the drives aren't directly portable as a ZFS pool. Rather than fighting controller compatibility, we'll:

  1. Back everything up to PBS (running now)
  2. Wipe the R620 drives (the RAID 0 virtual disk is not usable once the drives leave that controller anyway)
  3. Install drives in C240 with a proper HBA/IT mode controller
  4. Create a fresh ZFS pool on the clean drives
  5. Restore all VMs/CTs from PBS

Benefits

| Benefit | Details |
|---|---|
| More storage | 2x 960GB SSDs (boot mirror) + 4x 1.2TB drives = separate OS and data pools |
| Clean ZFS | No RAID controller metadata; native ZFS from the start |
| Better redundancy | Can use RAIDZ1 instead of RAID 0 (lose 1 drive's worth of capacity, gain fault tolerance) |
| Full rollback | R620 untouched until drives are pulled; PBS has all backups |
| No wasted drives | Reusing all existing hardware |

Target Drive Layout

┌───────────────────────────────────────────────────────────────┐
│                      UCS C240 M5                               │
├─────────────────────┬─────────────────────────────────────────┤
│  Boot Pool          │  Data Pool ("Vault")                     │
│  2x 960GB SSD       │  4x 1.2TB NETAPP SAS (from R620)        │
│  ZFS Mirror (RAID1) │  ZFS RAIDZ1 = ~3.6TB usable             │
│  Proxmox OS +       │  OR ZFS Stripe = ~4.8TB (no redundancy) │
│  local templates    │  VM/CT storage                           │
├─────────────────────┴─────────────────────────────────────────┤
│  Spare: 2x 146GB Seagate SAS (from R620)                      │
│  Options: ZIL/SLOG, L2ARC, small utility pool, or don't use   │
└───────────────────────────────────────────────────────────────┘

ZFS Pool Decision

| Option | Usable Space | Fault Tolerance | Recommendation |
|---|---|---|---|
| 4x RAIDZ1 | ~3.6 TB | Survives 1 drive failure | RECOMMENDED |
| 2x Mirror pairs | ~2.4 TB | Survives 1 per pair, better IOPS | Good if space isn't tight |
| 4x Stripe (RAID0) | ~4.8 TB | NO redundancy (current R620 setup) | Don't repeat this mistake |

RAIDZ1 is the way to go. You only have ~108 GB of data currently, so 3.6 TB is more than enough. And you gain drive failure protection you don't have today.
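
The usable-space figures in the table follow from simple arithmetic on four 1.2 TB drives. A minimal sketch, assuming equal-sized drives and ignoring ZFS metadata overhead and the TB-versus-TiB difference (`usable_gb` is a name invented here):

```shell
# Approximate usable capacity in GB for a pool of equal-sized drives.
# Arguments: layout (stripe|raidz1|mirror), drive count, drive size in GB.
usable_gb() {
  local layout="$1" drives="$2" size_gb="$3"
  case "$layout" in
    stripe) echo $(( drives * size_gb )) ;;        # every drive holds data
    raidz1) echo $(( (drives - 1) * size_gb )) ;;  # one drive's worth of parity
    mirror) echo $(( drives / 2 * size_gb )) ;;    # two-way mirror pairs
    *)      echo "unknown layout: $layout" >&2; return 1 ;;
  esac
}

for layout in raidz1 mirror stripe; do
  printf '%-7s %s GB\n' "$layout" "$(usable_gb "$layout" 4 1200)"
done
```

Real pools report somewhat less than these numbers once ZFS overhead is accounted for.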

What About the 2x 146GB Seagate Drives?

These are small and old but still functional. Options:

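  • ZFS SLOG (write log): marginal benefit for a home lab, and these are 10K spinning disks like the pool drives; skip unless doing heavy sync writes
  • L2ARC (read cache): 146GB of cache on SAS disks that are no faster than the pool; minor benefit with only 108GB of data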
  • Leave them out — simplest option, fewer failure points
  • Small utility pool — ISOs, templates, scratch space

Recommendation: Leave them out for now. Keep them as spares. You can always add them later.


4. Detailed Phase Breakdown

Phase 1: Prepare (Before Migration Day)

1.1 — Power On C240 M5 & Inventory

Action: Power on, access CIMC (default IP via console or DHCP)
Check:  CPUs, RAM, drive bays, RAID controller model, NIC ports
Goal:   Confirm hardware meets requirements (64+ GB RAM, 2.5" SFF bays, HBA capable)

1.2 — RAID Controller Configuration

CRITICAL: ZFS needs raw disk access — NOT behind a hardware RAID controller

If C240 M5 has Cisco 12G SAS Modular RAID Controller:
  → Flash to IT mode (HBA passthrough) OR
  → Configure JBOD mode in BIOS/CIMC OR
  → Last resort: create an individual RAID-0 per disk (JBOD workaround; ZFS then sees virtual disks, not raw drives)

If C240 M5 has a simple HBA:
  → No action needed, ZFS will see raw disks

1.3 — Firmware Updates

Action: Check CIMC firmware version, update if below 4.x
Tool:   Cisco Host Upgrade Utility (HUU) — bootable ISO
Note:   Do this BEFORE installing Proxmox

1.4 — Verify Backups

Action: Confirm all 7 running workloads backed up successfully
Check:  tail -f /tmp/backup_all.log (running now)
Verify: pvesm list PBS-Backups (from Proxmox shell)
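
The verify step can be scripted so that a missing backup is hard to overlook. The sketch below assumes that PBS volume IDs in the `pvesm list` output contain `backup/vm/<vmid>/` or `backup/ct/<vmid>/`; the function name `missing_backups` is hypothetical. It only checks that at least one backup exists per guest, not how recent it is.

```shell
# The seven running workloads from the workload table.
RUNNING_IDS="100 101 102 105 109 112 114"

# Read `pvesm list <storage>` output on stdin and print every ID from the
# first argument that has no backup volume in the listing.
missing_backups() {
  local listing id
  listing="$(cat)"
  for id in $1; do
    if ! printf '%s\n' "$listing" | grep -Eq "backup/(vm|ct)/${id}/"; then
      echo "$id"
    fi
  done
}

# Only runs on a Proxmox host; prints nothing when every guest has a backup.
if command -v pvesm >/dev/null 2>&1; then
  pvesm list PBS-Backups | missing_backups "$RUNNING_IDS"
fi
```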

Phase 2: Install Proxmox on C240 M5

2.1 — Proxmox Boot Drive Setup

Config:    ZFS Mirror (RAID-1) on the 2x 960GB SSDs
Why:       Boot drive redundancy — if one SSD dies, system keeps running
Installer: Select "zfs (RAID1)" during Proxmox install
Bonus:     ~900GB usable for OS + local storage (ISOs, templates, etc.)

2.2 — Network Configuration During Install

Management IP:   Pick a new IP (e.g., 192.168.2.141) — keep R620 at .140 as fallback
Gateway:         192.168.2.1 (or whatever pfSense assigns)
DNS:             Match current R620 config
Hostname:        pve-c240 (or whatever you prefer)
Bridge:          vmbr0 on primary NIC

2.3 — Post-Install Configuration

# Add PBS storage
# <pbs-user> must include the realm, e.g. root@pam
pvesm add pbs PBS-Backups \
  --server 192.168.2.151 \
  --datastore <datastore-name> \
  --username <pbs-user> \
  --password <pbs-password> \
  --fingerprint <pbs-fingerprint> \
  --content backup

# Verify connectivity
pvesm status

# Add any needed repos (no-subscription, etc.)
# Match /etc/apt/sources.list from R620

Phase 3: Migrate Data (The Big Move)

3.1 — Pre-Migration Checklist

□ All backups verified on PBS (all 7 running workloads)
□ pfSense config exported as XML (Diagnostics → Backup & Restore)
□ Proxmox configs backed up (tar czf /tmp/pve-configs.tar.gz /etc/pve/)
□ C240 M5 Proxmox installed and accessible
□ PBS storage connected on C240
□ RAID controller in HBA/IT mode on C240
□ Drive bays confirmed compatible (2.5" SFF SAS)
□ Maintenance window planned (Home Assistant, pfSense will be down)

3.2 — Shutdown Sequence (R620)

# Stop VMs/CTs in reverse dependency order
# pfSense LAST (everything depends on it for networking)

qm shutdown 102   # CML (resource heavy, shut down first)
qm shutdown 114   # haos
qm shutdown 100   # docker-hub
qm shutdown 101   # monitoring-docker
pct shutdown 109  # caddy
pct shutdown 112  # twingate-connector
qm shutdown 105   # pfSense — LAST

# Wait for all to stop
qm list && pct list

# Power off R620
shutdown -h now
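
The "wait for all to stop" step above is easy to rush. A small polling helper, sketched here with invented function names, waits until neither `qm list` nor `pct list` shows a guest in the `running` state before the host is powered off. It assumes the status appears as its own whitespace-separated field in both listings.

```shell
# Succeed if any line of `qm list` / `pct list` output (on stdin) has a
# field that is exactly "running".
any_running() {
  awk '{ for (i = 1; i <= NF; i++) if ($i == "running") found = 1 } END { exit !found }'
}

# Poll every 10 seconds until nothing is running; fail after the timeout
# (seconds, default 600).
wait_all_stopped() {
  local timeout="${1:-600}" waited=0
  while { qm list; pct list; } | any_running; do
    [ "$waited" -ge "$timeout" ] && return 1
    sleep 10
    waited=$(( waited + 10 ))
  done
}

# Usage on the R620, in place of a manual check:
#   wait_all_stopped 600 && shutdown -h now
```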

3.3 — Physical Drive Migration

1. Power off R620 completely (already done in 3.2)
2. Pull the 4x NETAPP 1.2TB SAS drives (slots 2-5)
3. Optionally pull 2x Seagate 146GB SAS drives (slots 0-1); slot 1 is the R620's Proxmox boot drive, so leaving it in place keeps the R620 bootable
4. Insert drives into C240 M5 drive bays
5. Power on C240 M5
6. Verify drives visible in CIMC/Proxmox: lsblk -d -o NAME,SIZE,MODEL,SERIAL
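
Step 6 can be checked against the drive inventory rather than by eye. This sketch takes the four NETAPP serials from the inventory table and reports any that do not appear in the `lsblk` output; `missing_serials` is a made-up helper name, and the match is a plain substring search in case the controller reports the serial with extra characters.

```shell
# Serials of the four NETAPP drives, from the R620 drive inventory.
VAULT_SERIALS="S3L1GAHC S3L1TPXN S3L1YV7T S3L1TTA2"

# Read `lsblk -d -n -o NAME,SERIAL` output on stdin and print every serial
# from the first argument that is not present.
missing_serials() {
  local listing serial
  listing="$(cat)"
  for serial in $1; do
    printf '%s\n' "$listing" | grep -q "$serial" || echo "$serial"
  done
}

# Only meaningful on the C240 after the drives are inserted.
if command -v lsblk >/dev/null 2>&1; then
  lsblk -d -n -o NAME,SERIAL | missing_serials "$VAULT_SERIALS"
fi
```

An empty result means all four drives were detected.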

3.4 — Create Fresh ZFS Pool on C240

# Identify the 4x 1.2TB NETAPP drives (will have new device names)
lsblk -d -o NAME,SIZE,MODEL,SERIAL

# Wipe any leftover RAID metadata
wipefs -a /dev/sdX /dev/sdY /dev/sdZ /dev/sdW  # replace with actual device names

# Create RAIDZ1 pool (RECOMMENDED — 1 drive fault tolerance)
zpool create -f \
  -o ashift=12 \
  -O atime=off \
  -O compression=lz4 \
  -O recordsize=64k \
  Vault raidz1 /dev/disk/by-id/<drive1> /dev/disk/by-id/<drive2> /dev/disk/by-id/<drive3> /dev/disk/by-id/<drive4>

# Always use /dev/disk/by-id/ paths — they're stable across reboots

# Verify pool
zpool status Vault
zpool list Vault

# Add to Proxmox as storage
pvesm add zfspool Vault-data -pool Vault -content images,rootdir
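
To fill in the `/dev/disk/by-id/<driveN>` placeholders, it helps to map each kernel device name from `lsblk` to its stable links. A minimal sketch, assuming GNU `readlink -f` is available; `by_id_links` is a hypothetical helper, and the optional second argument exists only so the function can be tried against a scratch directory.

```shell
# Print every symlink in a by-id directory that resolves to the given device.
# Arguments: device path, optional directory (defaults to /dev/disk/by-id).
# Partition links (ending in -partN) are skipped.
by_id_links() {
  local dev dir link
  dev="$(readlink -f "$1")"
  dir="${2:-/dev/disk/by-id}"
  for link in "$dir"/*; do
    [ -L "$link" ] || continue
    case "$link" in *-part[0-9]*) continue ;; esac
    if [ "$(readlink -f "$link")" = "$dev" ]; then
      echo "$link"
    fi
  done
}

# Example on the C240 (device name is a placeholder):
#   by_id_links /dev/sdX
```

A disk usually has more than one link (for example a `wwn-` and a `scsi-` name); any of them can be passed to `zpool create`.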

Phase 4: Restore & Verify

4.1 — Restore from PBS

# Restore each VM/CT from PBS backup
# Easiest via Proxmox Web UI: Storage → PBS-Backups → Select backup → Restore

# CLI examples if preferred:

# VM 105 (pfSense): RESTORE FIRST
# PBS volume IDs look like backup/vm/<vmid>/<timestamp>; copy the exact ID from: pvesm list PBS-Backups
qmrestore PBS-Backups:backup/vm/105/<timestamp> 105 \
  --storage Vault-data

# LXC 109 (caddy)
pct restore 109 PBS-Backups:backup/ct/109/<timestamp> \
  --storage Vault-data

# Repeat for: 100, 101, 102, 112, 114
# Also restore stopped VMs if needed: 104, 106, 107, 113, 117
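
For the remaining guests, the restore commands can be generated from the backup listing instead of typed by hand. The sketch below assumes PBS volume IDs of the form `<storage>:backup/vm/<vmid>/<timestamp>` (or `ct` for containers) in the first column of `pvesm list`, with ISO timestamps that sort lexicographically. Both function names are invented here, and the commands are printed for review, not executed.

```shell
# Read `pvesm list` output on stdin; print the newest backup volume ID for
# the guest ID given as the first argument.
latest_volid() {
  awk '{ print $1 }' | grep -E "backup/(vm|ct)/$1/" | sort | tail -n 1
}

# Print the restore command for a volume ID onto a target storage.
restore_cmd() {
  local volid="$1" storage="$2" id
  id="$(printf '%s\n' "$volid" | awk -F/ '{ print $3 }')"
  case "$volid" in
    *backup/vm/*) echo "qmrestore $volid $id --storage $storage" ;;
    *backup/ct/*) echo "pct restore $id $volid --storage $storage" ;;
    *) return 1 ;;
  esac
}

# Usage on the C240:
#   listing="$(pvesm list PBS-Backups)"
#   for id in 100 101 102 112 114; do
#     restore_cmd "$(printf '%s\n' "$listing" | latest_volid "$id")" Vault-data
#   done
```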

4.2 — Startup Sequence (CRITICAL ORDER)

1. pfSense (105)           — FIRST — everything needs networking
2. caddy (109)             — reverse proxy for services
3. twingate-connector (112) — remote access
4. docker-hub (100)        — core services
5. monitoring-docker (101) — observability
6. haos (114)              — Home Assistant
7. CML (102)               — Cisco Modeling Labs (resource heavy, LAST)
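
The startup order above can be captured in one place so it is not retyped under pressure. This sketch only prints the `qm start` / `pct start` commands in order; the `START_ORDER` encoding and the `start_cmds` name are assumptions made for the illustration.

```shell
# Startup order from the list above, encoded as "<id>:<type>" (vm or ct).
START_ORDER="105:vm 109:ct 112:ct 100:vm 101:vm 114:vm 102:vm"

# Print the start command for each guest, in order.
start_cmds() {
  local entry id type
  for entry in $1; do
    id="${entry%%:*}"
    type="${entry##*:}"
    case "$type" in
      vm) echo "qm start $id" ;;
      ct) echo "pct start $id" ;;
    esac
  done
}

start_cmds "$START_ORDER"
```

Run the printed commands one at a time and confirm each service before moving on, pfSense in particular. Proxmox also has a per-guest startup order setting that could hold this sequence permanently once the migration is stable.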

4.3 — Post-Migration Verification Checklist

□ All VMs/CTs start successfully
□ pfSense routing/firewall rules intact
□ pfSense WAN/LAN interfaces mapped correctly to new NIC names
□ Home Assistant devices reconnected
□ Docker containers running (check docker-hub VM)
□ Monitoring/Grafana dashboards loading
□ Caddy reverse proxy serving sites
□ Twingate remote access working
□ PBS backup jobs reconfigured on new Proxmox host
□ ZFS pool healthy (zpool status Vault)
□ No disk errors in dmesg
□ SMART health on all drives (smartctl -a /dev/sdX)
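
The last three checklist items lend themselves to a quick script. A minimal sketch, assuming `zpool status -x` prints a line containing "is healthy" (or "all pools are healthy") when nothing is wrong; `pool_healthy` is a name invented here, and the `/dev/sd?` glob is a simplification that covers only single-letter device names.

```shell
# Succeed when `zpool status -x` output (on stdin) reports a healthy pool.
pool_healthy() {
  grep -Eq "is healthy|all pools are healthy"
}

# Pool check; skipped on machines without ZFS tools.
if command -v zpool >/dev/null 2>&1; then
  if zpool status -x Vault 2>&1 | pool_healthy; then
    echo "Vault: healthy"
  else
    echo "Vault: needs attention (run: zpool status Vault)"
  fi
fi

# Overall SMART health verdict per disk; needs root and smartmontools.
if command -v smartctl >/dev/null 2>&1; then
  for dev in /dev/sd?; do
    [ -b "$dev" ] || continue
    echo "== $dev =="
    smartctl -H "$dev" || true
  done
fi
```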

5. Rollback Plan

UNTIL you pull drives from R620, rollback is trivial:
  1. Power off C240 M5
  2. Power on R620
  3. Everything is exactly as it was

AFTER drives are pulled and wiped:
  1. You cannot restore the R620 to original state
  2. BUT: PBS has full backups of everything
  3. If C240 fails: re-insert drives in R620, install fresh Proxmox, restore from PBS
  4. OR: put drives back in C240 and troubleshoot

KEY SAFETY NET: PBS on TrueNAS (192.168.2.150/151) is independent of both servers.
As long as TrueNAS stays up, your backups are safe regardless of what happens.

6. Estimated Timeline

| Phase | Duration | Notes |
|---|---|---|
| Phase 1: Prepare | 1-2 hours | CIMC setup, firmware, verify hardware, HBA config |
| Phase 2: Install Proxmox | 30-45 min | Proxmox install on SSD mirror + basic config |
| Phase 3: Migrate drives + ZFS pool | 30-60 min | Physical drive swap + create RAIDZ1 pool |
| Phase 4: Restore from PBS | 1-3 hours | Depends on data size (~108 GB across all VMs) |
| Phase 4: Verify | 1-2 hours | Start everything, test services |
| Total | ~4-9 hours | Phases 1-2 can be done ahead of time; plan at least a half-day window for Phases 3-4 |
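
Summing the per-phase ranges in the table gives the overall bounds. A small sketch of that arithmetic, with the ranges converted to minutes (`total_minutes` is a made-up helper):

```shell
# Per-phase ranges from the table, in minutes, as "min:max":
# Prepare, Install Proxmox, Migrate drives, Restore, Verify.
PHASES="60:120 30:45 30:60 60:180 60:120"

# Sum either the lower bounds (min) or the upper bounds (max).
total_minutes() {
  local entry total=0 v
  for entry in $1; do
    case "$2" in
      min) v="${entry%%:*}" ;;
      max) v="${entry##*:}" ;;
      *)   return 1 ;;
    esac
    total=$(( total + v ))
  done
  echo "$total"
}

echo "best case:  $(total_minutes "$PHASES" min) minutes"
echo "worst case: $(total_minutes "$PHASES" max) minutes"
```

That works out to 240 minutes (4 hours) at best and 525 minutes (8.75 hours) at worst if every phase runs long.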

7. Risk Matrix

| Risk | Impact | Likelihood | Mitigation |
|---|---|---|---|
| C240 RAM insufficient (<64 GB) | HIGH | MEDIUM | Check CIMC before starting; need 62+ GB |
| RAID controller doesn't support HBA/IT mode | HIGH | LOW | Most C240 M5 configs have this; JBOD workaround available |
| Drive bay incompatible (3.5" LFF chassis) | HIGH | LOW | C240 M5 SFF variant uses 2.5"; verify on power-on |
| PBS goes down during migration | HIGH | LOW | Fixed macvtap issue today; verify before starting |
| pfSense NIC mapping changes | MEDIUM | MEDIUM | NICs will have different names on C240; remap in pfSense console |
| Drive failure during migration | HIGH | LOW | RAID 0 has zero redundancy today; fresh backups are the safety net |
| Firmware incompatibility | LOW | LOW | Update CIMC/BIOS first via HUU |

8. Pre-Migration Bonus Tasks (Do Before Migration Day)

# 1. Export pfSense config (CRITICAL — do from pfSense Web UI)
#    Diagnostics → Backup & Restore → Download configuration as XML
#    Save to local machine AND to TrueNAS

# 2. Document current network config (run on R620)
ip addr show
cat /etc/network/interfaces
cat /etc/hosts
cat /etc/resolv.conf

# 3. Save Proxmox configs
tar czf /tmp/proxmox-configs-backup.tar.gz /etc/pve/

# 4. Copy to TrueNAS for safekeeping
scp /tmp/proxmox-configs-backup.tar.gz truenas_admin@192.168.2.150:/mnt/data/backups/

# 5. Note down PBS connection details for re-adding on new Proxmox
grep -A 10 PBS /etc/pve/storage.cfg

# 6. Record current VM disk locations
for vmid in 100 101 102 104 105 106 107 114; do
  echo "=== VM $vmid ==="; qm config "$vmid" | grep -E "scsi|sata|virtio|ide|efidisk"
done
for ctid in 109 112 113 117; do
  echo "=== CT $ctid ==="; pct config "$ctid" | grep rootfs
done

9. Open Questions (Resolve on Power-On)

  1. C240 M5 drive bay form factor? — Need 2.5" SFF for the R620 SAS drives
  2. RAID controller model? — Determines HBA/IT mode procedure
  3. Total RAM? — Minimum 64 GB needed (CML = 32 GB alone)
  4. CPU specs? — Should be fine, but confirm core count
  5. Individual R620 drive sizes? — Jordan to double-check (currently showing 2x 146GB + 4x 1.2TB)
  6. ZFS pool layout preference? — RAIDZ1 recommended (~3.6TB), stripe (~4.8TB) if you need space
  7. Keep the 2x 146GB Seagates? — Recommend leaving out; they're small and old
  8. Same IP (.140) or new IP for C240?
  9. Hostname preference? pve, pve-c240, or something else?

Plan authored by Garvis, 2026-03-14.
Updated: Option C strategy (wipe drives, restore from PBS), added full drive inventory. Will be updated once C240 M5 hardware inventory is complete.