MemDev observes your terminal sessions, distils them through a state-space model and a fine-tuned CodeT5-Large, then injects the right context into your AI assistant — so you never have to repeat yourself.
src/auth.py on feature/oauth2. The 401 was caused by a clock-skew check you added at line 84. Want me to widen the tolerance to 60 seconds?
Context windows are finite. Development sessions are not. Every time the window fills up — or you start a new shell — your AI assistant loses the plot.
You spend twenty minutes re-explaining what you already explained yesterday. Or you accept worse help. Or you give up.
Context windows fill up
A long session blows through a 200k-token window. Anything that has scrolled out of it is gone.
Sessions don't carry over
New shell, new tab, new chat — your AI starts from zero again.
Manual notes don't happen
In theory you'd document everything. In practice, nobody does.
Four quiet processes run in the background: a shell hook that captures events, a working memory that never resets, an episodic compressor that distils, and a retriever that injects context exactly when you need it.
A lightweight hook captures every command, exit code, file change, and git state transition. Non-blocking JSONL log, 2-second debounce on file events.
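As a rough illustration of the capture step, here is a minimal sketch of a JSONL event log with a 2-second debounce on file events. The class and function names, the event fields, and the leading-edge debounce policy (the first event in a burst passes, repeats within the window are dropped) are assumptions for illustration, not MemDev's actual implementation. The write here is also a plain blocking append; a real hook would likely hand it off to a background process.

```python
import json
import time
from pathlib import Path
from typing import Optional


class FileEventDebouncer:
    """Leading-edge debounce: emit a file event only after a quiet window.

    Hypothetical helper; the 2-second default mirrors the figure on this page.
    """

    def __init__(self, window_seconds: float = 2.0) -> None:
        self.window_seconds = window_seconds
        self._last_seen: dict[str, float] = {}

    def should_emit(self, path: str, now: Optional[float] = None) -> bool:
        now = time.monotonic() if now is None else now
        last = self._last_seen.get(path)
        # Every event, emitted or not, restarts the quiet window for its path.
        self._last_seen[path] = now
        return last is None or now - last >= self.window_seconds


def append_event(log_path: Path, event: dict) -> None:
    """Append one event to the log as a single JSON line."""
    with log_path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(event, sort_keys=True) + "\n")
```

With this policy an editor that saves the same file several times in quick succession produces one log entry rather than a burst, while events for different paths are tracked independently.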
A frozen Mamba-130M state-space model ingests session text incrementally. Its hidden state is a fixed ~2–10MB vector that persists across sessions — capturing flow, salience, and unresolved threads.
CodeT5-Large (770M) — fine-tuned on 75K cleaned CommitBench examples — turns raw events plus the SSM salience signal into a 1–2 sentence intent summary.
Ask a natural-language question. Get ranked memories with causal chains. Format as XML. Inject into your next AI prompt — preserving cache hits.
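To make "ranked memories" concrete, here is a minimal sketch of a hybrid score that blends the three signals this page names for ranking: embedding similarity, recency, and full-text matching. The weights, the 72-hour half-life, the memory field names, and the word-overlap stand-in for full-text search are all assumptions; the SSM-state bias is left out because the page does not say how it is applied.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def recency_score(age_hours: float, half_life_hours: float = 72.0) -> float:
    """Exponential decay: the score halves every `half_life_hours`."""
    return 0.5 ** (age_hours / half_life_hours)


def keyword_score(query: str, text: str) -> float:
    """Fraction of query words that appear in the memory text."""
    terms = set(query.lower().split())
    if not terms:
        return 0.0
    return len(terms & set(text.lower().split())) / len(terms)


def hybrid_score(query: str, query_vec: list[float], memory: dict,
                 weights: tuple[float, float, float] = (0.6, 0.2, 0.2)) -> float:
    """Weighted sum of semantic, recency, and keyword signals (weights are illustrative)."""
    w_sem, w_rec, w_kw = weights
    return (w_sem * cosine_similarity(query_vec, memory["embedding"])
            + w_rec * recency_score(memory["age_hours"])
            + w_kw * keyword_score(query, memory["text"]))


def rank(query: str, query_vec: list[float], memories: list[dict]) -> list[dict]:
    """Return memories sorted from most to least relevant."""
    return sorted(memories, key=lambda m: hybrid_score(query, query_vec, m), reverse=True)
```

A weighted sum is the simplest way to combine the signals; a production ranker might normalise each component or use rank fusion instead.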
Built specifically for the way developers actually work — terminal-first, semantic, private, fast.
Works with bash, zsh, and fish. Drops into your existing shell with a single hook line. No IDE lock-in, no proprietary terminal required.
A frozen Mamba SSM ingests session text incrementally. Its fixed ~2–10MB state vector persists across sessions, capturing flow, salience, and unresolved threads without context-window limits.
"Why did we switch to JWT?" — not just keyword grep. Hybrid ranking combines embedding similarity, recency, and full-text matching, biased by SSM state.
Every memory links to its predecessor. Trace decision history backwards through time — and inject the chain alongside the memory, cache-safe.
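The predecessor links can be pictured as a self-referencing table walked backwards with a recursive query. The sketch below uses SQLite, which the page names as the local store, but the table name, column names, and the `max_depth` cap are hypothetical.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a hypothetical memories table with a link to the previous memory."""
    conn.execute(
        "CREATE TABLE memories ("
        " id INTEGER PRIMARY KEY,"
        " summary TEXT NOT NULL,"
        " predecessor_id INTEGER REFERENCES memories(id))"
    )


def causal_chain(conn: sqlite3.Connection, memory_id: int,
                 max_depth: int = 10) -> list[tuple[int, str]]:
    """Return (id, summary) rows from the given memory back through its
    predecessors, newest first, with at most `max_depth` entries."""
    return conn.execute(
        """
        WITH RECURSIVE chain(id, summary, predecessor_id, depth) AS (
            SELECT id, summary, predecessor_id, 0
            FROM memories WHERE id = ?
            UNION ALL
            SELECT m.id, m.summary, m.predecessor_id, chain.depth + 1
            FROM memories AS m
            JOIN chain ON m.id = chain.predecessor_id
            WHERE chain.depth + 1 < ?
        )
        SELECT id, summary FROM chain ORDER BY depth
        """,
        (memory_id, max_depth),
    ).fetchall()
```

Capping the depth keeps the injected chain short, which matters when the result is pasted into a prompt with a limited token budget.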
Local SQLite + your own object storage. Nothing leaves your control. Self-host the optional cloud sync on your own S3 bucket.
FastAPI + HTMX, zero JavaScript build. Browse your memory graph, explore causal chains, manage tags. Starts on localhost in 2 seconds.
Six components, one pipeline, everything testable. A frozen state-space model holds context; a fine-tuned encoder-decoder turns it into prose; the rest is plumbing.
Two models, one purpose. A frozen state-spaces/mamba-130m ingests every token of the session incrementally. Its hidden state — a fixed ~2–10MB vector — is the working memory: it captures conversational flow, topic drift, and which threads were left unresolved. State persists to disk between sessions and never grows with session length.
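The property that matters here is that the state is updated in place and stays the same size however much text flows through it. The toy sketch below shows that property with a decayed counter in place of the real model: it is not Mamba, the 8-slot state and the update rule are stand-ins, and the file layout is an assumption.

```python
from array import array
from pathlib import Path
from typing import Iterable

STATE_SIZE = 8  # toy size; the real state is described as roughly 2 to 10 MB


def load_state(path: Path) -> array:
    """Load the saved state, or start from zeros on the first session."""
    state = array("d")
    if path.exists():
        with path.open("rb") as handle:
            state.fromfile(handle, STATE_SIZE)
    else:
        state.extend([0.0] * STATE_SIZE)
    return state


def update_state(state: array, token_ids: Iterable[int], decay: float = 0.9) -> None:
    """Fold tokens into the state one at a time (stand-in recurrence, not an SSM).

    Older contributions fade by `decay` per token, so recent input dominates.
    """
    for token_id in token_ids:
        for i in range(STATE_SIZE):
            state[i] *= decay
        state[token_id % STATE_SIZE] += 1.0


def save_state(path: Path, state: array) -> None:
    """Overwrite the state file; its size depends only on STATE_SIZE."""
    with path.open("wb") as handle:
        state.tofile(handle)
```

Loading, updating, and saving at the start and end of each shell session gives continuity across sessions at constant disk cost, which is the behaviour the paragraph above describes for the SSM state.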
Above it sits the episodic compressor: Salesforce/codet5-large (770M params), fine-tuned on 75K cleaned examples from maxscha/commitbench. The SSM state acts as a salience prefix, telling the compressor which moments mattered.
No. The default mode is local-only — SQLite database in ~/.local/state/mempress/, no network calls. The Pro tier adds optional cloud sync, but you bring your own S3-compatible bucket. Even there, your code never touches our infrastructure.
bash, zsh, and fish. The shell hook captures preexec and precmd events. PowerShell support is on the roadmap.
Any assistant that accepts text context — Claude Code, Cursor, GitHub Copilot Chat, Windsurf, ChatGPT, Aider, plain CLI claude -p. The output is just a structured XML block; you decide where to paste it.
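A structured XML block of this kind could look like the output of the sketch below. The element names, attributes, and memory fields are hypothetical. Sorting by id so that the same memories always serialise to the same bytes is one plausible way to keep a prompt prefix stable for caching; that reading is an assumption.

```python
import xml.etree.ElementTree as ET


def format_memories_xml(memories: list[dict]) -> str:
    """Serialise memories to a compact XML string with a stable order."""
    root = ET.Element("memories")
    # Sort by id so input order never changes the output bytes.
    for memory in sorted(memories, key=lambda m: m["id"]):
        node = ET.SubElement(
            root,
            "memory",
            {"id": str(memory["id"]), "timestamp": memory["timestamp"]},
        )
        # ElementTree escapes &, <, and > in text when serialising.
        node.text = memory["summary"]
    return ET.tostring(root, encoding="unicode")
```

The returned string can be pasted into any assistant that accepts text context, or piped into a CLI prompt.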
Those are general-purpose memory layers. MemDev is shell-native, code-aware, and hierarchical: a Mamba SSM holds the working memory of an entire session as a fixed-size state vector, while a CodeT5-Large compressor — fine-tuned on 75K cleaned commit-message pairs — turns it into prose. It also captures how you got somewhere: the failed commands, the debug detours, the unresolved threads — not just the outcome.
Yes. The core is open source under MIT. The Docker compose file spins up the FastAPI server, ChromaDB, and the model worker. Bring your own object storage.
TBC. The local-only tier will always be free and open source. Cloud and team tiers will be announced at general availability.
Early access opens to invitees first. Drop your email to get the next slot.