v0.4.0 · now with Mamba SSM working memory

Your terminal has a memory. Finally.

MemDev observes your terminal sessions, distils them through a state-space model and a fine-tuned CodeT5-Large, then injects the right context into your AI assistant — so you never have to repeat yourself.

Open core · MIT

Local-first · your data stays yours

bash · zsh · fish

~/projects/auth-svc

recording

inject

Claude Code

context loaded

YOU · 09:02

help me fix the login bug from yesterday

<mempress-context> · 4 memories injected

CLAUDE · just now

I see — you were working on JWT validation in src/auth.py on feature/oauth2. The 401 was caused by a clock-skew check you added at line 84. Want me to widen the tolerance to 60 seconds?

01 · the problem

Groundhog day for developers.

Context windows are finite. Development sessions are not. Every time the window fills up — or you start a new shell — your AI assistant loses the plot.

monday 09:14 · 3rd time this week

YOU

fix the auth bug

Sure — which auth system are you using? Can you share the file?

YOU

JWT, in src/auth.py, like we discussed yesterday

I don't have access to yesterday's conversation. Could you re-share the file and explain the bug?

YOU

…

You spend twenty minutes re-explaining what you already explained yesterday. Or you accept worse help. Or you give up.

Context windows fill up

A long session blows through 200k tokens. Whatever scrolls out of the window is gone.

Sessions don't carry over

New shell, new tab, new chat — your AI starts from zero again.

Manual notes don't happen

In theory you'd document everything. In practice — nobody does.

02 · how it works

Observe. Remember. Compress. Recall.

Four quiet processes run in the background: an observer that watches your shell, a working memory that never resets, an episodic compressor that distils, and a retriever that injects context exactly when you need it.

01 / OBSERVE

Watches your shell.

A lightweight hook captures every command, exit code, file change, and git state transition. Non-blocking JSONL log, 2-second debounce on file events.

$ git checkout -b feature/oauth2

→ event 1841 · branch · 0ms

$ vim src/auth.py

→ event 1842 · edit · +45 -3

$ pytest tests/auth/

→ event 1843 · test · pass
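
A rough idea of what the observer's log could look like, as a minimal Python sketch. The class name, the event fields (`type`, `path`, `t`) and the leading-edge behaviour (record the first event, suppress repeats for the same path inside the 2-second window) are assumptions made for illustration, not MemDev's actual implementation.

```python
import json
import time
from pathlib import Path


class DebouncedEventLog:
    """Append events to a JSONL file, collapsing rapid repeats per path.

    Hypothetical helper: names and event fields are illustrative only.
    """

    def __init__(self, log_path, window_s=2.0, clock=time.monotonic):
        self.log_path = Path(log_path)
        self.window_s = window_s      # debounce window in seconds
        self.clock = clock            # injectable so tests can fake time
        self._last_seen = {}          # path -> time of last recorded event

    def record_file_event(self, path, kind="edit"):
        """Write one JSON line unless this path was recorded within the window.

        Returns True if the event was written, False if it was suppressed.
        """
        now = self.clock()
        last = self._last_seen.get(path)
        if last is not None and now - last < self.window_s:
            return False
        self._last_seen[path] = now
        event = {"type": kind, "path": path, "t": now}
        # Append mode keeps the log a plain, line-oriented JSONL file.
        with self.log_path.open("a", encoding="utf-8") as fh:
            fh.write(json.dumps(event) + "\n")
        return True
```

An editor that saves the same file several times in a second would then produce a single log line rather than a burst.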

02 / REMEMBER

Mamba state vector.

A frozen Mamba-130M state-space model ingests session text incrementally. Its hidden state is a fixed ~2–10MB vector that persists across sessions — capturing flow, salience, and unresolved threads.

tokens streamed → 14,821

state dim · 768 · O(n)

▌ surprise spikes:

jwt skew check · t=18m

401 retry loop · t=22m

→ ssm_state.pt · 4.2 MB
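
Why the state stays the same size however long the session runs can be shown with a toy linear recurrence in Python. This is not Mamba (which uses learned, input-dependent parameters); the decay values and function names below are assumptions chosen only to illustrate the fixed-size, incrementally updated state.

```python
def ssm_step(state, x, decay):
    """One toy update per channel: h_t = a * h_{t-1} + (1 - a) * x_t."""
    return [a * h + (1.0 - a) * x for h, a in zip(state, decay)]


def ingest(state, inputs, decay):
    """Fold a stream of scalar inputs into the state, one step at a time."""
    for x in inputs:
        state = ssm_step(state, x, decay)
    return state


# Four channels with different memory horizons (illustrative values).
decay = [0.5, 0.9, 0.99, 0.999]
state = [0.0] * len(decay)

# Feeding the stream in two sittings gives the same state as one pass,
# which is what lets a saved state be picked up again in a later session.
state = ingest(state, [1.0, 2.0, 3.0], decay)
state = ingest(state, [4.0, 5.0], decay)
```

The cost grows linearly with the number of inputs, while `state` always holds exactly `len(decay)` numbers.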

03 / COMPRESS

Distil to intent.

CodeT5-Large (770M) — fine-tuned on 75K cleaned CommitBench examples — turns raw events plus the SSM salience signal into a 1–2 sentence intent summary.

events 1841–1858 + state vec

▌ Implemented JWT validation in

auth module with 60s clock-skew

tolerance. Tests pass.

→ memory #1247 · stored

04 / RECALL

Inject on demand.

Ask a natural-language question. Get ranked memories with causal chains. Format as XML. Inject into your next AI prompt — preserving cache hits.

$ mempress context --for "fix login"

<mempress-context>

memory #1247 · jwt validation

memory #1244 · login flow refactor

memory #1239 · oauth2 branch start

</mempress-context>
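
To make the XML step concrete, here is a small Python sketch that wraps ranked memories in a `<mempress-context>` element. Only the wrapper tag comes from the example above; the `memory` element, its `id` attribute and the dictionary keys are assumptions, since the real schema is not shown here.

```python
import xml.etree.ElementTree as ET


def format_context(memories):
    """Serialise ranked memories into one XML block, in the order given.

    Hypothetical schema: <memory id="..."> children are illustrative only.
    """
    root = ET.Element("mempress-context")
    for mem in memories:
        node = ET.SubElement(root, "memory", id=str(mem["id"]))
        node.text = mem["summary"]  # ElementTree escapes <, > and & for us
    return ET.tostring(root, encoding="unicode")


block = format_context([
    {"id": 1247, "summary": "jwt validation"},
    {"id": 1244, "summary": "login flow refactor"},
    {"id": 1239, "summary": "oauth2 branch start"},
])
```

Building the block with an XML library rather than string concatenation keeps summaries containing `<` or `&` from breaking the wrapper.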

03 · features

Everything a memory layer should be.

Built specifically for the way developers actually work — terminal-first, semantic, private, fast.

Terminal-native

Works with bash, zsh, and fish. Drops into your existing shell with a single hook line. No IDE lock-in, no proprietary terminal required.

eval "$(mempress hooks --shell zsh)"

State-space working memory

A frozen Mamba SSM ingests session text incrementally. Fixed ~2–10MB state vector persists across sessions, capturing flow, salience and unresolved threads — without context-window limits.

$ mempress state --inspect

Semantic search

"Why did we switch to JWT?" — not just keyword grep. Hybrid ranking combines embedding similarity, recency, and full-text matching, biased by SSM state.

$ mempress query "why jwt?"
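
One way to picture hybrid ranking is a weighted sum of the three signals named above. The Python sketch below is an assumption-heavy illustration: the weights, the 72-hour half-life and the word-overlap stand-in for full-text matching are invented for the example, and the SSM-state bias is left out entirely.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def recency(age_hours, half_life_hours=72.0):
    """Exponential decay: a memory loses half its recency score per half-life."""
    return 0.5 ** (age_hours / half_life_hours)


def keyword_overlap(query, text):
    """Fraction of query words found in the text; a crude full-text stand-in."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    return len(q & t) / len(q) if q else 0.0


def hybrid_score(query, query_vec, memory, weights=(0.6, 0.2, 0.2)):
    """Weighted sum of semantic, recency and keyword signals (weights assumed)."""
    w_sem, w_rec, w_fts = weights
    return (w_sem * cosine(query_vec, memory["vec"])
            + w_rec * recency(memory["age_hours"])
            + w_fts * keyword_overlap(query, memory["text"]))


def rank(query, query_vec, memories):
    """Return memories sorted from most to least relevant."""
    return sorted(memories,
                  key=lambda m: hybrid_score(query, query_vec, m),
                  reverse=True)
```

In a sketch like this, a semantically close but older memory can still outrank a fresh, unrelated one because the embedding term carries the largest weight.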

Causal chains

Every memory links to its predecessor. Trace decision history backwards through time — and inject the chain alongside the memory, cache-safe.

$ mempress chain --from #1247
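
Since every memory links to its predecessor, tracing a chain amounts to walking a linked list backwards. The Python sketch below assumes a hypothetical `prev` field and an in-memory dictionary keyed by memory id; the links between the example ids are made up for illustration.

```python
def trace_chain(memories, start_id, max_depth=50):
    """Follow predecessor links from start_id until the chain ends.

    Stops on a missing memory, a repeated id (cycle guard) or max_depth.
    """
    chain = []
    seen = set()
    current = start_id
    while current is not None and current not in seen and len(chain) < max_depth:
        mem = memories.get(current)
        if mem is None:
            break
        chain.append(mem)
        seen.add(current)
        current = mem.get("prev")  # hypothetical field name
    return chain


# Illustrative store: which memory precedes which is an assumption here.
store = {
    1247: {"id": 1247, "summary": "jwt validation", "prev": 1244},
    1244: {"id": 1244, "summary": "login flow refactor", "prev": 1239},
    1239: {"id": 1239, "summary": "oauth2 branch start", "prev": None},
}
history = trace_chain(store, 1247)
```

The result lists the newest memory first, so reversing it gives the decisions in the order they were made.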

Privacy-first

Local SQLite + your own object storage. Nothing leaves your control. Self-host the optional cloud sync on your own S3 bucket.

$ mempress config storage=hetzner

Web dashboard

FastAPI + HTMX, zero JavaScript build. Browse your memory graph, explore causal chains, manage tags. Starts on localhost in about 2 seconds.

$ mempress serve --port 8787

04 · architecture

Boring tech, applied carefully.

Six components, one pipeline, everything testable. A frozen state-space model holds context; a fine-tuned encoder-decoder turns it into prose; the rest is plumbing.

Observer

watchdog · JSONL · shell hooks

daemon

Working memory · SSM

Mamba-130M · frozen · O(n)

state

Compressor

CodeT5-Large 770M · CommitBench

model

Memory store

SQLite + ChromaDB · hybrid

storage

Retriever

semantic + temporal + FTS

query

Injector

XML · cache-safe prefix

output

Hierarchical memory: SSM + episodic.

Two models, one purpose. A frozen state-spaces/mamba-130m ingests every token of the session incrementally. Its hidden state — a fixed ~2–10MB vector — is the working memory: it captures conversational flow, topic drift, and which threads were left unresolved. State persists to disk between sessions and never grows with session length.

Above it sits the episodic compressor: Salesforce/codet5-large (770M params), fine-tuned on 75K cleaned examples from maxscha/commitbench. The SSM state acts as a salience prefix, telling the compressor which moments mattered.

770M

CodeT5-Large params

75K

cleaned commit pairs

~2–10MB

SSM state · fixed size

90%

token cost saved · cache

05 · faq

Questions developers actually ask.

Is my code sent to your servers?

No. The default mode is local-only — SQLite database in ~/.local/state/mempress/, no network calls. The Pro tier adds optional cloud sync, but you bring your own S3-compatible bucket. Even there, your code never touches our infrastructure.

Which shells are supported?

bash, zsh, and fish. The shell hook captures preexec and precmd events. PowerShell support is on the roadmap.

Which AI assistants does it work with?

Any assistant that accepts text context — Claude Code, Cursor, GitHub Copilot Chat, Windsurf, ChatGPT, Aider, plain CLI claude -p. The output is just a structured XML block; you decide where to paste it.

How is this different from Mem0 or Zep?

Those are general-purpose memory layers. MemDev is shell-native, code-aware, and hierarchical: a Mamba SSM holds the working memory of an entire session as a fixed-size state vector, while a CodeT5-Large compressor — fine-tuned on 75K cleaned commit-message pairs — turns it into prose. It also captures how you got somewhere: the failed commands, the debug detours, the unresolved threads — not just the outcome.

Can I self-host?

Yes. The core is open source under MIT. The Docker Compose file spins up the FastAPI server, ChromaDB, and the model worker. Bring your own object storage.

What's the pricing?

TBC. The local-only tier will always be free and open source. Cloud and team tiers will be announced at general availability.

Stop repeating yourself to your AI.

Early access opens to invitees first. Drop your email to get the next slot.