NanoClaw Architecture

NanoClaw isn't one agent — it's a fleet of short-lived Claude Code containers driven by a host-side scheduler, with three persistence layers stitched together: a task queue (what to run), a memory store (what was learned), and a context layer (what the agent sees at boot). Each container is ephemeral. The persistence is not.

Why NanoClaw exists

NanoClaw is BitSafe's attempt to answer a specific question: what does a company look like when an AI agent has continuous organizational memory and the authority to act on it? Most AI tools today fall into one of two categories — Q&A systems (Notion AI, ChatGPT, Claude.ai) that wait for a human to ask, search a fixed corpus, and return text; or coding agents (Claude Code, Cursor) that operate on a single repo per session and lose context between runs. Neither category remembers the business across days, initiates work on its own schedule, or takes consequential actions on behalf of the team.

NanoClaw is built around three commitments those categories don't make:

1. Continuous business context, shared across the team

Every agent run boots with awareness of (a) what the company is currently working on (WORK_CONTEXT.md, refreshed hourly), (b) recent thread history per channel/DM, (c) long-term memory per user and per group, and (d) 24 indexed knowledge caches covering Slack, Notion, email, GitHub, Canton/Splice/CIP docs, and source code. A new employee onboarded today gets the same domain expertise as a 2-year veteran; a question about a deal from 8 months ago is answered with the same fidelity as one from yesterday. This is what lets NanoClaw cover a chief-of-staff function — it doesn't need to be re-briefed on what the company does, who customers are, or what's blocked. It already knows.

2. Proactive execution, not reactive answering

NanoClaw runs ~80 scheduled tasks at any time — daily BD digests, hourly knowledge-compiler crons, heartbeat monitors, dev-pipeline auto-promotes, DR drills, customer-channel watchers. The system initiates work on its own schedule and surfaces results. A Q&A system can tell you the answer if you ask; NanoClaw notices the question is worth asking and brings the answer to you. This shifts the company from human-driven triage (someone has to remember to check) to agent-driven monitoring (the system surfaces only what changed). Cost-per-watch drops to near-zero, so the company can afford to monitor things it previously couldn't justify watching.

3. Authority to act, with structured human-in-the-loop

NanoClaw doesn't just answer — it ships code through CI to dev to prod, posts in Slack with named identities, drafts and sends email, files research items in the ARQ, creates Notion pages, runs database queries, manages the market-maker bot. It has graduated permission tiers: routine actions flow freely; high-risk actions (mass cross-channel posting, prod deploys with schema changes, financial moves) gate on explicit human approval through the admin-bot RPC pattern or 3-of-3 review. Every action it takes correctly is one less thing a human had to do. Mistakes surface via audit logs and severity-tagged admin pings; corrections become memory entries that prevent the same class of error in the future.

What this gets us that Notion AI / Claude Code can't

vs. Q&A systems (Notion AI, ChatGPT, Claude.ai)

Multi-turn conversations persist across sessions and across days, not just within a single chat window.
Personalized per-user (own DM context, own preferences) AND shared org context, simultaneously.
Acts on the answer rather than handing it back as text — sends the message, files the doc, ships the code.
Operates across systems (Slack ↔ Notion ↔ GitHub ↔ Gmail ↔ Heroku ↔ Fly) instead of within a single product silo.
Initiates work on a schedule and on triggers, not just on prompt.

vs. coding agents (Claude Code, Cursor)

Aware of the business context around the code, not just the repo — knows who customers are, what deals are open, what the strategy is.
Persists context across sessions instead of restarting fresh each time.
Coordinates many agent runs in parallel (~80 scheduled tasks + per-Slack-trigger spawns) instead of one at a time.
Operates as a team member with a stable identity (Slack handle, persistent memory, role) rather than a per-session tool the human picks up and puts down.

How NanoClaw multiplies every person we hire

BitSafe's plan is to hire as much top talent as is ROI-positive and as much as we can afford. NanoClaw isn't a substitute for that — it's a force multiplier on it. Every hire we make operates at materially higher leverage because the system handles the part of the job that scales poorly with headcount.

24/7 backup for every person on the team. No one is the single point of failure on a customer thread, a deal, or an alert. Sleep, vacation, and focus time stop being risks to the business.
Perfect institutional recall on demand. Each employee can ask "what did we tell this customer last quarter" or "what was the rationale on that decision" and get a sourced answer in seconds — across millions of Slack messages, tens of thousands of Notion pages, every commit, every email.
New workflows compound across the whole company. Adding a skill is adding a file; the moment it ships, every person benefits — no rollout, no training, no lossy handoff.
Onboarding accelerates dramatically. New hires inherit the system's memory on day one — context that previously took months of pattern-matching to absorb is queryable from the first week.

Business value (part evidence, part bet)

Multiplies every hire. We want to hire as much top talent as is ROI-positive and as much as we can afford — and we want each of those people operating at the highest possible leverage. NanoClaw is the substrate that makes that possible: a chief-of-staff layer, sales-ops layer, and dev-ops layer that every employee gets to lean on, not a replacement for any of them.
Compounds with use. Every correction becomes a memory entry; every recurring task becomes a skill; every new data source becomes another cache the system queries instinctively. An organization using NanoClaw for 12 months has meaningfully more capability than one using it for 1 month — the marginal capability cost decreases over time.
Faster deal velocity. BD digests, automatic LinkedIn signal collection, customer-channel monitoring, and the marketing-ABM workflow shorten the cycle from "we should reach out to X" to "we did, here's the thread." This compresses the sales cycle without adding sales headcount.
Reduces single points of knowledge. When senior employees hold critical context in their heads, the company is fragile to their absence. NanoClaw mirrors that context as queryable memory, so the company becomes more resilient to vacations, transitions, and turnover.
Defensible operating advantage. Most competitors will use off-the-shelf AI tooling (Notion AI, ChatGPT-for-business). The companies that build agent-native operations layers — where the AI is part of the team, not a tool the team uses — will move materially faster on every recurring process. This advantage compounds annually and is hard to replicate by buying a SaaS subscription.
Lower-bound estimate. If NanoClaw saves each member of the executive team ~1 hour/day on digests, drafting, search, and triage, then at typical Series-A executive compensation that time alone is worth over $100k annually. That figure does not count the miscommunications and bad data the system helps avoid. The fully-loaded infra cost is in the low five figures. The return scales linearly with team size and superlinearly with how deeply the system is wired into recurring workflows.
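The lower-bound arithmetic is simple enough to write out. This is a sketch with hypothetical inputs. The team size, loaded hourly cost, 250 working days, and infra figure below are placeholders chosen for illustration, not BitSafe numbers.

```python
def annual_time_value(people: int, hours_saved_per_day: float,
                      loaded_hourly_cost: float, working_days: int = 250) -> float:
    """Dollar value of executive time saved per year (all inputs are assumptions)."""
    return people * hours_saved_per_day * working_days * loaded_hourly_cost


def net_annual_return(time_value: float, infra_cost: float) -> float:
    """Time value minus the fully-loaded infrastructure cost."""
    return time_value - infra_cost


# Hypothetical inputs: 4 executives, 1 hour/day each, $120/hour loaded cost.
value = annual_time_value(people=4, hours_saved_per_day=1.0, loaded_hourly_cost=120.0)
print(value)                                         # 120000.0
print(net_annual_return(value, infra_cost=30_000))   # 90000.0
```

Because `people` is a plain multiplier, doubling the team doubles the time value. That is the linear scaling described above. The superlinear effect of deeper workflow integration is not modeled here.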

The deeper bet is that "company with persistent agent memory + autonomous execution" is a different category of company than "company that uses AI tools." NanoClaw is the attempt to build the former at BitSafe — and to learn, in production, what the constraints and economics of that category actually are.

1. Task Queue

Every recurring or future-dated job lives in a single SQLite table on the host VM (store/messages.db), replicated to GCS via Litestream every ~1 second. Three schedule types: cron (e.g. 0 9 * * * for 9am daily), interval (milliseconds between runs), and once (one-shot, auto-deleted after firing).
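A minimal sketch of how a scheduler might compute the next run time for each of the three schedule types follows. It rests on several assumptions:

- The function name and the string encoding of each schedule are hypothetical.
- Times are UTC.
- Cron support is deliberately restricted to the daily "M H * * *" form.

A production scheduler would use a full cron parser and the configured timezone.

```python
from datetime import datetime, timedelta, timezone


def next_run(schedule_type: str, value: str, now: datetime) -> datetime:
    """Next fire time for a task (hypothetical helper; times are UTC-aware datetimes)."""
    if schedule_type == "interval":
        # value: milliseconds between runs
        return now + timedelta(milliseconds=int(value))
    if schedule_type == "once":
        # value: ISO-8601 timestamp; the row is deleted after it fires
        return datetime.fromisoformat(value)
    if schedule_type == "cron":
        # Only the daily "MINUTE HOUR * * *" form is handled in this sketch.
        minute, hour, dom, month, dow = value.split()
        if (dom, month, dow) != ("*", "*", "*"):
            raise ValueError("sketch only supports daily 'M H * * *' expressions")
        candidate = now.replace(hour=int(hour), minute=int(minute), second=0, microsecond=0)
        if candidate <= now:
            candidate += timedelta(days=1)  # already passed today: fire tomorrow
        return candidate
    raise ValueError(f"unknown schedule type: {schedule_type}")


now = datetime(2026, 5, 1, 10, 30, tzinfo=timezone.utc)
print(next_run("cron", "0 9 * * *", now))  # 2026-05-02 09:00:00+00:00
print(next_run("interval", "60000", now))  # 2026-05-01 10:31:00+00:00
```

The sketch works in UTC. A schedule meant to fire at 9am local time has to be evaluated in the local timezone, and that is where daylight-saving drift creeps in.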

When a task fires, the host spawns a fresh Claude Code container, mounts the workspace, runs the prompt, captures output, and kills the container. A crontab inside a container is useless because it is lost whenever the container is killed or restarted. Every job must be registered via the schedule_task MCP tool, which writes to the host DB and survives container churn, host reboots, and VM rebuilds.
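The host-side dispatch loop can be sketched as below. This is an illustration, not NanoClaw's code. The following are all assumptions:

- the scheduled_tasks schema;
- the epoch-millisecond next_run column;
- the spawn_container callback, which stands in for launching a fresh Claude Code container and capturing its output.

Cron tasks are omitted to keep the sketch short.

```python
import sqlite3
from typing import Callable


def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical schema; next_run is stored as epoch milliseconds.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS scheduled_tasks (
               id INTEGER PRIMARY KEY,
               prompt TEXT NOT NULL,
               schedule_type TEXT NOT NULL CHECK (schedule_type IN ('interval', 'once')),
               schedule_value TEXT NOT NULL,
               next_run INTEGER NOT NULL)"""
    )


def tick(conn: sqlite3.Connection, now_ms: int,
         spawn_container: Callable[[str], str]) -> list[str]:
    """Run every due task in a fresh container, then reschedule or delete it."""
    due = conn.execute(
        "SELECT id, prompt, schedule_type, schedule_value FROM scheduled_tasks "
        "WHERE next_run <= ? ORDER BY next_run, id",
        (now_ms,),
    ).fetchall()
    outputs = []
    for task_id, prompt, schedule_type, schedule_value in due:
        # The container is ephemeral; only what it writes to durable storage survives.
        outputs.append(spawn_container(prompt))
        if schedule_type == "once":
            # One-shot tasks are deleted after firing.
            conn.execute("DELETE FROM scheduled_tasks WHERE id = ?", (task_id,))
        else:
            # Interval tasks: schedule_value is milliseconds between runs.
            conn.execute(
                "UPDATE scheduled_tasks SET next_run = ? WHERE id = ?",
                (now_ms + int(schedule_value), task_id),
            )
    conn.commit()
    return outputs


conn = sqlite3.connect(":memory:")
init_db(conn)
conn.executemany(
    "INSERT INTO scheduled_tasks (prompt, schedule_type, schedule_value, next_run) "
    "VALUES (?, ?, ?, ?)",
    [("daily digest", "interval", "86400000", 0), ("one-off reminder", "once", "", 0)],
)
print(tick(conn, now_ms=1_000, spawn_container=lambda p: f"ran: {p}"))
# ['ran: daily digest', 'ran: one-off reminder']
```

The schedule lives in the host DB rather than in any container, so recycling containers never loses a job.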

Pre-flight scripts. Each task can include a bash script that runs first (30s timeout) and emits {wakeAgent: true/false}. If nothing changed, the agent never wakes — saves API credits. ~40% of recurring tasks use this.
Isolated vs. group context. Tasks declare isolated (fresh session, no history) or group (joins the Slack thread). Default is isolated — stale history pollutes scheduled execution.
Auto-repair. A nightly cron walks the table fixing drifted next_run values (timezone bugs, daylight saving time transitions). Shipped after one job ran 12 hours late.
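A pre-flight gate might look like the sketch below. It makes two assumptions beyond the text:

- The script prints its wakeAgent verdict as JSON on the last line of stdout.
- On timeout, non-zero exit, or unparsable output, the sketch fails open and wakes the agent, so a broken check cannot silently suppress work.

The real runner may choose differently.

```python
import json
import subprocess


def should_wake_agent(script: str, timeout_s: float = 30.0) -> bool:
    """Run a pre-flight bash script and return its wakeAgent verdict (fails open)."""
    try:
        result = subprocess.run(
            ["bash", "-c", script], capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return True  # a hung check should not hide work
    if result.returncode != 0:
        return True
    lines = result.stdout.strip().splitlines()
    if not lines:
        return True
    try:
        verdict = json.loads(lines[-1])  # assumed: JSON verdict on the last line
    except json.JSONDecodeError:
        return True
    if not isinstance(verdict, dict):
        return True
    return bool(verdict.get("wakeAgent", True))


print(should_wake_agent("""echo '{"wakeAgent": false}'"""))  # False
```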

~80 active scheduled tasks as of May 2026: BD digests, knowledge compilers, doc cache syncs, DR drills, agent-credit watchdogs, design pipelines.

2. Goals — Notion as the Control Plane

NanoClaw's "what should I work on" is not in code — it's in Notion. Three databases drive behavior:

Open Decisions — things waiting on Aki. Agents check this before proposing work that overlaps a pending call. Currently 13 open.
Admin Research Queue (ARQ) — capability gaps and skill proposals. When an agent hits something it can't do, it files an ARQ entry instead of failing silently. Aki triages weekly.
Active Projects — current sprint work. An hourly cron mirrors the top entries into WORK_CONTEXT.md so every container boot sees them.
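The hourly mirror into WORK_CONTEXT.md could be sketched as follows. The Notion fetch is left out. The Project fields, the priority ordering, top_n=5, and the Markdown layout are all hypothetical. Writing through a temp file and os.replace means a container booting mid-refresh sees either the old file or the new one, never a partial write.

```python
import os
import tempfile
from dataclasses import dataclass


@dataclass
class Project:
    # Hypothetical fields mirrored from the Active Projects database.
    name: str
    status: str
    priority: int  # assumption: lower number = more important


def render_work_context(projects: list[Project], generated_at: str, top_n: int = 5) -> str:
    """Render the top-N active projects as the body of WORK_CONTEXT.md."""
    top = sorted(projects, key=lambda p: (p.priority, p.name))[:top_n]
    lines = ["# Work context", f"_Generated {generated_at}_", "", "## Active projects"]
    lines += [f"- {p.name} ({p.status})" for p in top]
    return "\n".join(lines) + "\n"


def write_atomically(path: str, text: str) -> None:
    """Write via a temp file in the same directory, then rename over the target."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(os.path.abspath(path)))
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600 files; let other users read it
    os.replace(tmp, path)
```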

The Skills database follows the same pattern: 74 skills defined as Notion pages, synced hourly to /workspace/skills/<name>/SKILL.md. The on-disk copy is a read-only cache. Edit in Notion; the next sync ships the change globally to every future agent invocation without a deploy.
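A sketch of the skill sync follows. It assumes the Notion pages have already been fetched into a {name: markdown} dict; the fetch and the exact page format are not shown. The sketch writes each skill to <root>/<name>/SKILL.md and marks the file read-only (0o444). That is one way to enforce "edit in Notion, not on disk". Deleting skills that were removed from Notion is omitted.

```python
import os
import stat
import tempfile

READ_ONLY = stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH  # 0o444


def sync_skills(skills: dict[str, str], root: str) -> None:
    """Mirror {skill_name: markdown} into <root>/<name>/SKILL.md as read-only files."""
    for name, body in skills.items():
        if not name or "/" in name or name in (".", ".."):
            raise ValueError(f"unsafe skill name: {name!r}")
        skill_dir = os.path.join(root, name)
        os.makedirs(skill_dir, exist_ok=True)
        fd, tmp = tempfile.mkstemp(dir=skill_dir)
        with os.fdopen(fd, "w") as f:
            f.write(body)
        os.chmod(tmp, READ_ONLY)
        # On POSIX, renaming over an existing read-only SKILL.md is allowed (it depends
        # on the directory's permissions, not the file's), and readers never see a
        # partially written file.
        os.replace(tmp, os.path.join(skill_dir, "SKILL.md"))
```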

3. Memory Management

Each agent has a memory directory at /home/node/.claude/projects/-workspace-group/memory/ with one file per memory and a flat MEMORY.md index. Four typed memory categories:

user — role, expertise, communication style
feedback — corrections AND validated approaches. Only saving corrections produces drift toward over-caution; saving both keeps the agent calibrated.
project — what's shipping, why, by when. Each entry includes a Why: line so future-you can judge whether the fact is still load-bearing. These decay fast.
reference — pointers to external systems (Linear projects, Grafana dashboards, Notion DB IDs)

The core discipline: if you didn't write it down, it doesn't exist. "Noted" without a file write = not remembered. MEMORY.md is loaded into every conversation context. Details are paged in on demand via memory-search (FTS5 full-text search across all memory files).

The index is intentionally capped at 200 lines. The model has a finite context window — keeping the loaded surface small leaves room for actual work.
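One way to enforce "if you didn't write it down, it doesn't exist" is to make saving a memory a single call that both writes the file and registers it in the index. The sketch below rests on three assumptions:

- files are named <category>_<slug>.md;
- each memory gets one index line in a fixed format;
- at the 200-line cap, new entries are refused rather than truncated.

```python
import os
import re
import tempfile

CATEGORIES = {"user", "feedback", "project", "reference"}
INDEX_CAP = 200  # MEMORY.md stays small so it can be loaded into every context


def save_memory(memory_dir: str, category: str, title: str, body: str) -> str:
    """Write one memory file and register it in MEMORY.md; return the file name."""
    if category not in CATEGORIES:
        raise ValueError(f"unknown memory category: {category}")
    slug = re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")
    filename = f"{category}_{slug}.md"
    entry = f"- [{category}] {title} -> {filename}"
    index_path = os.path.join(memory_dir, "MEMORY.md")
    index_lines: list[str] = []
    if os.path.exists(index_path):
        with open(index_path) as f:
            index_lines = f.read().splitlines()
    is_new = entry not in index_lines
    if is_new and len(index_lines) >= INDEX_CAP:
        # Refuse loudly rather than silently "remembering" nothing.
        raise RuntimeError("MEMORY.md is at its line cap; consolidate entries first")
    os.makedirs(memory_dir, exist_ok=True)
    with open(os.path.join(memory_dir, filename), "w") as f:
        f.write(f"# {title}\n\ntype: {category}\n\n{body}\n")
    if is_new:
        with open(index_path, "a") as f:
            f.write(entry + "\n")
    return filename


with tempfile.TemporaryDirectory() as d:
    print(save_memory(d, "feedback", "Prefer short digests", "Why: example rationale."))
    # feedback_prefer-short-digests.md
```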

4. Context Management

Every container boot reads, in order:

THREAD_CONTEXT.md — last N messages of the current Slack thread (per-thread file, written by the host before spawn)
WORK_CONTEXT.md — global "what NanoClaw is doing right now" (refreshed hourly, pulls from Notion + memory + recent commits)
MEMORY.md — long-term curated memory index
Daily logs — today's and yesterday's session logs
User CLAUDE.md — per-user preferences (Aki vs. Mayank vs. Anna get different defaults)
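The boot sequence amounts to concatenating files in a fixed order. In the sketch below the directory layout (threads/<id>/, memory/, logs/<YYYY-MM-DD>.md, users/<name>/CLAUDE.md) is hypothetical. The ordering follows the list above, and missing files are skipped rather than treated as errors.

```python
import os
import tempfile
from datetime import date, timedelta


def boot_context(workspace: str, thread_id: str, user: str, today: date) -> str:
    """Concatenate boot-time context files in order, skipping any that are missing."""
    yesterday = today - timedelta(days=1)
    ordered = [
        os.path.join(workspace, "threads", thread_id, "THREAD_CONTEXT.md"),
        os.path.join(workspace, "WORK_CONTEXT.md"),
        os.path.join(workspace, "memory", "MEMORY.md"),
        os.path.join(workspace, "logs", f"{today.isoformat()}.md"),
        os.path.join(workspace, "logs", f"{yesterday.isoformat()}.md"),
        os.path.join(workspace, "users", user, "CLAUDE.md"),
    ]
    sections = []
    for path in ordered:
        if os.path.exists(path):
            with open(path) as f:
                # Label each section with its source so the agent can re-read it by path.
                sections.append(f"<!-- {os.path.relpath(path, workspace)} -->\n{f.read()}")
    return "\n\n".join(sections)


with tempfile.TemporaryDirectory() as ws:
    with open(os.path.join(ws, "WORK_CONTEXT.md"), "w") as f:
        f.write("Example work context")
    print(boot_context(ws, "T123", "aki", date(2026, 5, 28)))
```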

The anti-pattern we ripped out: stuffing the entire conversation into every prompt. Long sessions degrade as context fills — we call this "context rot." The fix: write intermediate results to files, reference them by path, let the agent re-read what it actually needs. Sub-agents get fresh context windows for research-heavy work and return condensed results.

5. Agent Swarm + Parallelism

Build tasks dispatch sub-agents in isolated git worktrees so they can write to the same repo without colliding. Two coordination primitives keep parallel work safe:

File-claim locks (claim_file / release_file / list_locks) — for shared external state: Notion rows, package.json, state JSON files. Default 5-min TTL with heartbeat refresh. Another agent finding a conflict gets the owner's ID back and can coordinate before overwriting.
Commit-message trailers — instead of editing CHANGELOG.md directly (parallel agents collide on merge), agents emit CHANGELOG-Features: trailers in their commit body. The orchestrator runs consolidate-changelog.py --apply once after merge, serially. This eliminated the merge-conflict-on-CHANGELOG class entirely.
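A sketch of TTL-based file claims backed by SQLite follows. The function names mirror claim_file / release_file / list_locks, but their real signatures and storage are not documented here, so everything below is an assumption. The key property is that the claim is a single atomic upsert. It only overwrites an existing row if the caller already owns it (a heartbeat refresh) or the previous claim has expired. The sketch requires SQLite 3.24+ for ON CONFLICT ... DO UPDATE.

```python
import sqlite3
from typing import Optional

DEFAULT_TTL_MS = 5 * 60 * 1000  # 5-minute TTL; holders refresh it by re-claiming


def init_locks(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS locks ("
        "resource TEXT PRIMARY KEY, owner TEXT NOT NULL, expires_at INTEGER NOT NULL)"
    )


def claim_file(conn: sqlite3.Connection, resource: str, owner: str,
               now_ms: int, ttl_ms: int = DEFAULT_TTL_MS) -> Optional[str]:
    """Return None if `owner` now holds the lock, else the current holder's ID."""
    with conn:  # commits on success
        # Atomic upsert: take the row only if it is ours already or has expired.
        conn.execute(
            "INSERT INTO locks (resource, owner, expires_at) VALUES (?, ?, ?) "
            "ON CONFLICT(resource) DO UPDATE SET "
            "owner = excluded.owner, expires_at = excluded.expires_at "
            "WHERE locks.owner = excluded.owner OR locks.expires_at <= ?",
            (resource, owner, now_ms + ttl_ms, now_ms),
        )
        (holder,) = conn.execute(
            "SELECT owner FROM locks WHERE resource = ?", (resource,)
        ).fetchone()
    return None if holder == owner else holder


def release_file(conn: sqlite3.Connection, resource: str, owner: str) -> bool:
    """Release a lock; only the holder can release it."""
    with conn:
        cur = conn.execute(
            "DELETE FROM locks WHERE resource = ? AND owner = ?", (resource, owner)
        )
    return cur.rowcount > 0


def list_locks(conn: sqlite3.Connection, now_ms: int) -> list[tuple[str, str, int]]:
    """List unexpired locks as (resource, owner, expires_at)."""
    return conn.execute(
        "SELECT resource, owner, expires_at FROM locks "
        "WHERE expires_at > ? ORDER BY resource",
        (now_ms,),
    ).fetchall()
```

Returning the holder's ID on conflict is what lets a blocked agent coordinate with the owner rather than overwrite its work.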

Agent swarms use named sender identities (Researcher, Coder, Reviewer) that appear as distinct bot identities in Slack, making multi-step workflows readable. Sub-agents NEVER call send_message — only the main agent sends output to the user.
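The CHANGELOG trailer approach works because each agent only writes to its own commit message. The single serial consolidation step is then the only writer of CHANGELOG.md. A sketch of that step is below. The actual behavior and output format of consolidate-changelog.py are not documented here, so the following are assumptions:

- the parsing, with one "CHANGELOG-Features: <text>" line per entry, anywhere in the message;
- the de-duplication;
- the section layout.

Commit messages are passed in rather than read from git.

```python
import re

# One trailer per line, e.g. "CHANGELOG-Features: Parallel search-all". [ \t] (not \s)
# keeps a match from running onto the next line.
TRAILER = re.compile(r"^CHANGELOG-Features:[ \t]*(.+?)[ \t]*$", re.MULTILINE)


def collect_features(commit_messages: list[str]) -> list[str]:
    """Gather trailer entries across commits in first-seen order, dropping duplicates."""
    seen: dict[str, None] = {}
    for message in commit_messages:
        for entry in TRAILER.findall(message):
            seen.setdefault(entry, None)
    return list(seen)


def render_section(version: str, features: list[str]) -> str:
    """Render one CHANGELOG section; run once, serially, after all merges land."""
    lines = [f"## {version}", "", "### Features"] + [f"- {f}" for f in features]
    return "\n".join(lines) + "\n"
```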

6. Knowledge Layer — 24 SQLite Caches

NanoClaw doesn't call APIs at agent runtime if it can help it. 24 data sources are mirrored to local SQLite with FTS5 indexes, updated by background crons:

Slack history, Notion content, Fathom transcripts, Google Calendar
Salesforce, QuickBooks Online, Cryptio (financials and accounting)
GitHub repos — DLC-Link source, Canton Foundation repos
Splice docs, Canton CIPs, DA docs, Brale docs, Temple docs, Nightly docs
n8n workflows, Ninety.io KPIs, Telegram community

search-all queries all 24 caches in parallel in ~400ms. Three caches (Slack, Fathom, Calendar) are now SQLCipher-encrypted at rest — Phase 2 shipped May 2026, Phase 3 (Notion, Salesforce) in queue.
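A sketch of one FTS5-indexed cache per source, plus a parallel search-all fan-out, follows. The file-per-source layout, the docs(title, body) table, and the merge-by-rank strategy are assumptions. SQLCipher encryption is not shown. The code needs a SQLite build with FTS5 enabled, which most Python distributions include.

```python
import os
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor


def build_cache(path: str, docs: list[tuple[str, str]]) -> None:
    """Create a per-source cache with an FTS5 index over (title, body)."""
    conn = sqlite3.connect(path)
    with conn:
        conn.execute("CREATE VIRTUAL TABLE IF NOT EXISTS docs USING fts5(title, body)")
        conn.executemany("INSERT INTO docs (title, body) VALUES (?, ?)", docs)
    conn.close()


def search_one(source: str, path: str, query: str, limit: int) -> list[tuple[str, str, float]]:
    # Each worker opens its own connection instead of sharing one across threads.
    conn = sqlite3.connect(path)
    try:
        rows = conn.execute(
            "SELECT title, rank FROM docs WHERE docs MATCH ? ORDER BY rank LIMIT ?",
            (query, limit),
        ).fetchall()
    finally:
        conn.close()
    return [(source, title, score) for title, score in rows]


def search_all(caches: dict[str, str], query: str, limit: int = 5) -> list[tuple[str, str, float]]:
    """Query every cache in parallel and merge hits by FTS5 rank (lower is better)."""
    with ThreadPoolExecutor(max_workers=len(caches) or 1) as pool:
        futures = [pool.submit(search_one, s, p, query, limit) for s, p in caches.items()]
        hits = [hit for f in futures for hit in f.result()]
    return sorted(hits, key=lambda h: h[2])[:limit]


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "slack.db")
    build_cache(path, [("standup", "litestream drill passed")])
    print(search_all({"slack": path}, "litestream"))
```

Merging by rank has a caveat. FTS5's default rank is bm25, and its scores depend on each index's own corpus statistics, so scores from different caches are only roughly comparable. A real implementation might normalize per source or interleave results instead.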

Agents never say "I don't have access" without searching local caches first. The rule: search-all before any external API call. This dramatically reduces latency and API cost on information retrieval.
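The rule can be made mechanical with a small wrapper that only reaches an external API when every local cache comes back empty. local_search and remote_fetch are hypothetical callables. In NanoClaw the rule is described as agent behavior, and this is just one way to encode it.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def local_first(query: str,
                local_search: Callable[[str], list[T]],
                remote_fetch: Callable[[str], list[T]]) -> tuple[str, list[T]]:
    """Consult the local caches before any external API; report which path answered."""
    hits = local_search(query)
    if hits:
        return "local", hits
    # Only reached when the local caches returned nothing.
    return "remote", remote_fetch(query)
```

Returning the path that answered keeps the fallback observable, so the team can see how often the agent actually leaves the local caches.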

7. Ship Pipeline

NanoClaw runs a three-environment topology: prod (nanoclaw-01, us-central1-c), dev (nanoclaw-staging, us-central1-a), and test (Litestream replica + Sunday DR drill).

The standard flow for a functional change: push a branch → CI runs (lint, typecheck, Vitest) → if green, staging-deploy job rebuilds the dev VM → auto-merge-after-staging-smoke job sleeps 30 min watching journalctl on nanoclaw-staging → if clean, promotes to prod main → prod cron picks up the restart within 5 min.

Hard exceptions that always require manual review: container/Dockerfile, src/db.ts schema migrations, scripts/setup-egress-firewall.sh, package.json major version bumps. The auto job refuses these; a human runs promote-to-prod.sh manually after review.
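The hard-exception check is easy to express as a pure function over the changed files and the old and new package.json contents. This sketch is an assumption about how such a gate could work, with two deliberate readings:

- It flags any change to the listed paths, not only schema changes in src/db.ts. That is the conservative reading.
- It treats a "major version bump" as either a dependency or the package's own version moving to a higher major.

```python
import re
from typing import Optional

# Paths that always require manual review before prod (from the rule above).
MANUAL_REVIEW_PATHS = {
    "container/Dockerfile",
    "src/db.ts",
    "scripts/setup-egress-firewall.sh",
}


def major(spec: str) -> Optional[int]:
    """Leading major number of a version or simple range like '^5.2.0'; None if absent."""
    m = re.match(r"^[\^~>=<v ]*(\d+)", spec)
    return int(m.group(1)) if m else None


def is_major_bump(old: Optional[str], new: Optional[str]) -> bool:
    if not old or not new:
        return False
    old_major, new_major = major(old), major(new)
    return old_major is not None and new_major is not None and new_major > old_major


def auto_promote_blockers(changed_files: list[str], old_pkg: dict, new_pkg: dict) -> list[str]:
    """Reasons the auto-promote job must refuse; an empty list means it may proceed."""
    reasons = [f"manual review required: {p}" for p in changed_files
               if p in MANUAL_REVIEW_PATHS]
    if is_major_bump(old_pkg.get("version"), new_pkg.get("version")):
        reasons.append(f"major bump: package version {old_pkg['version']} -> {new_pkg['version']}")
    for section in ("dependencies", "devDependencies"):
        old_deps = old_pkg.get(section, {})
        for name, new_spec in new_pkg.get(section, {}).items():
            if is_major_bump(old_deps.get(name), new_spec):
                reasons.append(f"major bump: {name} {old_deps[name]} -> {new_spec}")
    return reasons
```

Returning reasons rather than a bare boolean gives the human running promote-to-prod.sh a ready-made review checklist.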

Litestream replicates store/messages.db to GCS continuously (~1s RPO). The Sunday DR drill (run-litestream-drill.sh) is the standing health check for the test environment.

The Unifying Principle

Six layers, plus the pipeline that ships changes to them. One principle: write everything to durable storage; treat each agent invocation as fresh. Containers die; memory, tasks, caches, and skills persist. This is how the system gets smarter over time without any individual agent needing to "remember" between sessions.

<aside> 📖

This is Part 2 of a two-part series. Read Part 1: Building a Company-Wide AI Assistant — Architecture, Security, and Self-Improvement

</aside>

Build-vs-Buy Audit — 2026-05-28