Geoffrey Litt on Malleable Software and Notion-Based Task Management for Vibe Coding
Key Insights
- CLI as bridge technology: The command line is “a mediocre GUI and a mediocre API, but the fact that it’s both is what’s great” (00:17:03). CLIs became popular for AI agents because they’re easy to make agent-native, but GUIs are making a comeback as apps build true parity between human and AI capabilities.
- Understanding tax matters more than speed: Don’t send AI-generated PRs unless you can pass a quiz about what changed. The key constraint isn’t how fast you can ship, but how fast you can ship while maintaining understanding. Auto-generated quizzes provide an “automated understanding check” to prevent fooling yourself about comprehension.
- Stable task maps beat context-switching chaos: Managing AI agents through a Kanban board creates a “stable map” of what’s in progress versus endless tab-switching. The AI marks cards red when it needs human input, reversing the mental model from “human checks on AI” to “AI raises hand when blocked.”
- Documents that become apps: The future of malleable software starts with document editors, not app builders. People should start by writing things down, then gradually add structure and capabilities until documents evolve into custom tools—like how businesses accidentally end up running on spreadsheets.
- Presidential brief standard for AI output: When reviewing AI work, demand explanations that start zoomed out (concepts and architecture) before diving into code diffs. AI should prepare “briefing books” with visualizations, quizzes, and overproduced artifacts optimized for human learning, not raw markdown files.
Summary
Geoffrey Litt is a design engineer at Notion focused on malleable software—democratizing creative control over the digital tools that shape our lives. Previously at the Ink & Switch research lab, Litt explores how AI can enable anyone to build their own tools without traditional coding bottlenecks. His core belief is that software should feel like owning a house versus renting an apartment: users should have local agency to modify and control their tools rather than accepting centralized, one-size-fits-all designs.
In this conversation, Litt demonstrates a workflow he built using Notion as a task management interface for Claude Code agents. The system emerged from a personal project—an evolution simulator inspired by Richard Dawkins’ The Selfish Gene—where he struggled to track multiple AI coding tasks across dozens of browser tabs. His solution: a Notion Kanban board where tasks flow from planning to in-progress to done, with agents updating status and marking cards red when they need human input. Two years after building apps live on stream with ChatGPT, Litt is now running experiments in vibe coding while maintaining production-grade code quality standards.
The discussion explores agent-native architecture principles, the tension between speed and understanding in AI-assisted development, and emerging conventions for human vouching of AI work. Litt emphasizes the importance of “shared state” between humans and AI, using high-bandwidth communication tools like voice input (for conveying intent) and rich explanatory documents with quizzes (for verifying understanding).
Main Topics
Malleable Software Philosophy
Litt’s core mission is enabling more people to “play with the material of software” and have creative control over digital tools. Before AI, the bottleneck was that “coding is hard”—teaching people was the limiting factor. The past few years have brought an “insane change” where AI can potentially democratize tool-building.
The house-versus-apartment metaphor captures Litt’s vision: when renting an apartment, you can’t move walls; owning a house provides more options for modification. He wants software to embody “having more control and having more local agency and less like far away, centralized, someone designs the whole thing for you energy” (00:02:44).
“My core interest in my life is malleable software, which means how do we get more people playing with the material of software and democratizing this creative control over like the digital medium that drives so much of our lives.” (00:01:02)
Why Notion for Malleable Software
Litt believes Notion has “one of the most promising and interesting approaches to this problem” (00:03:13). Instead of starting with app-building, users start with documents—writing things down. Gradually, the document becomes richer and evolves from information storage into custom tools.
The hypothesis: “the future starts out looking like a document editor” with a really low floor (just type to get started) and high ceiling (build custom tools with your team). Spreadsheets are the key inspiration—people don’t open Excel to make an app, they just start writing numbers, and “before you know it, your whole business is running on spreadsheets” (00:04:18). It’s an “accidental snowball” virtuous cycle.
“Really, really low floor, just type in to get started, and then really high ceiling of letting you build custom tools with your team.” (00:03:45)
Notion-Based Task Management Workflow
While working on his evolution simulator side project, Litt faced the challenge of managing “so many Claude Code tabs”—20+ tabs with tasks scattered everywhere. He built a Notion Kanban board to serve as the “stable map” of what’s coming next and what’s in progress.
The workflow works as follows:
1. Create tasks in a Notion board (via voice notes, manual typing, or AI generation)
2. Use the command notion tasks plan [task URL] to kick off planning
3. Tasks move through columns: To Do → Planning → In Progress → Done
4. When AI needs input, it marks the card red and posts questions in Notion comments
5. Human responds in comments, unblocking the agent
6. AI updates the task with detailed explanations, code snippets, and quizzes
The system allows parallel planning (5 things at once, since they don’t touch each other) and parallel building (2-3 things on a good codebase). The key shift: instead of humans checking on multiple AI agents, agents “raise their hand when they need me” (00:11:47).
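The red-card convention in the workflow above can be pictured as a tiny state machine. This is an illustrative sketch only—the class, field names, and statuses model the described behavior, not Notion’s actual data model:

```python
from dataclasses import dataclass, field

STATUSES = ["To Do", "Planning", "In Progress", "Done"]

@dataclass
class TaskCard:
    title: str
    status: str = "To Do"
    blocked: bool = False          # "red card": agent needs human input
    comments: list = field(default_factory=list)

    def advance(self):
        """Move the card one column to the right, unless it is blocked."""
        if self.blocked:
            raise RuntimeError(f"{self.title!r} is waiting on human input")
        i = STATUSES.index(self.status)
        if i < len(STATUSES) - 1:
            self.status = STATUSES[i + 1]

    def raise_hand(self, question: str):
        """Agent marks the card red and posts a question in the comments."""
        self.blocked = True
        self.comments.append(("agent", question))

    def answer(self, reply: str):
        """Human replies in the comments, unblocking the agent."""
        self.comments.append(("human", reply))
        self.blocked = False

card = TaskCard("Tune gravity in lunar lander")
card.advance()                      # To Do -> Planning
card.raise_hand("Target 1.62 m/s^2 or something floatier?")
card.answer("Floatier: try 1.0")
card.advance()                      # Planning -> In Progress
```

The point of the model is the inversion Litt describes: `advance` refuses to run while the card is red, so progress resumes only after a human answer.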
“I have like 20 tabs, I’m tabbing between them trying to figure out like, wait, like, where did that task go? And where did that task go?” (00:06:46)
Agent-Native Architecture and Parity Principle
Litt strongly endorses the “parity principle”—anything a user can do in the app, the agent should be able to do. This architectural choice is why the Notion integration works: Notion built granular, composable tools that work equally well for humans and AI. The tools are “like Lego bricks” that combine into unpredictable forms.
He demonstrates by showing how he created multiple tasks just by recording a voice note in Notion, then asking Notion AI to “fill in that task board with tasks” based on the brainstorm. The voice-to-transcript-to-structured-tasks pipeline shows the “emergence of being able to take unstructured stuff” and process it automatically into structured data.
The discussion touches on whether agents should be embedded in apps or external. Litt’s answer: both. Integrated agents get special UI experiences (like streaming edits), but bridges to external agents enable division of labor—Claude Code for coding, Notion for task management.
“Notion has one of the most promising and interesting approaches to this problem, which is that… instead of starting from people building apps, what if people start from building documents.” (00:03:13)
Understanding Over Speed: The Quiz System
Litt has developed a practice around AI-generated explanations to maintain understanding while moving fast. His explanations “never start with a code diff” but begin zoomed out—explaining concepts before diving into changes (00:24:24).
For a simple task (reducing gravity in a lunar lander game), the explanation started with: “this is what lunar landers are, and here’s how the physics works.” For larger codebases, these explanations become invaluable for learning “here’s how all the existing stuff works and here’s what we changed.”
The innovation: auto-generated quizzes at the end of explanations. “What happens if no thrust is applied?” → “Increases by the gravity value.” → “Correct.” He has a strict policy: “I refuse to send [a PR] unless I can pass a quiz of understanding what’s in the PR” (00:25:10).
This is his “AI-enhanced damper or brake on the speed” that prevents getting lazy: “I want to move as fast as possible while maintaining my understanding in an automated way” (00:25:46).
“It’s really easy to convince yourself that you understand something, even if you don’t. Everyone who is smart about learning has encountered this before. Like you read a book and then you try to explain it to a friend and you realize like one question and that you have no idea what’s going on.” (00:25:21)
Slop and the Spectrum of Code Quality
When asked to define “slop,” Litt describes his range of practices. For production code maintained for years, he’s “pretty in the weeds” doing code reviews, using small model steps, and really dialing in each logical step—“I’m tab completing, I’m rarely typing” but very involved in architecting (00:22:03).
For side projects, experiments, and prototypes, he goes “full vibe coding”—letting AI generate more freely. The key distinction isn’t whether you read the code, but “the number of decisions you’re injecting and the amount of taste you’re injecting” (00:22:52). For his evolution simulator, he hasn’t seen the code but has done “dozens of studies and sketches with Claude” prototyping design decisions in extreme detail.
Litt rejects the idea that English code can’t be well-crafted, drawing an analogy to JavaScript: “We don’t spend a lot of time reading machine code… and [JavaScript] was kind of a controversial take like 15 years ago” (00:23:17). English might be a higher-level abstraction, but you can still have taste and craft at that level.
The key is “understanding what’s going on”—not just to check the AI, but because “knowing about how it works can help you have better ideas for the next thing to do” (00:23:53).
High-Bandwidth Human-AI Communication
The bottleneck and promise of AI is “how do you make sure that the state between the AI and the human is shared?” (00:27:50). There are two directions to optimize:
Human to AI (conveying intent):
- Voice input for high-bandwidth communication
- Walking around talking into headphones to brainstorm
- Voice notes that get automatically processed into structured tasks
AI to Human (conveying what was done):
- Rich visualizations instead of markdown files
- Interactive presentations with slide decks showing simulation results
- Explanatory documents that feel like “presidential briefing books”
- Auto-generated quizzes to verify understanding
Litt describes his ideal: “Whenever I’m sitting down to see what the AI did, I want it to feel like I’m the president, like a staff spent a day preparing this briefing book for me. And it is like the most insanely overproduced, digested, like beautiful artifact, just waiting for me to have like an optimal learning experience” (00:28:54).
“We have so many powerful tools [for high-bandwidth communication] going both directions on those arrows.” (00:28:12)
Conventions for Vouching and Verification
The conversation touches on emerging needs for conventions around humans vouching for AI work. What prevents someone from just highlighting the whole page and saying “yep, I stamped this”? Litt jokes: “social shame, we just have to eject you from society” (00:27:03).
More seriously, the challenge is that AI is “designed to make things that sound plausible” yet sometimes subtly wrong—not factually incorrect, but misaligned with what you actually wanted. Traditional code review is difficult because everything looks reasonable.
The solution: “automate and mechanize the process of making sure humans are staying in the loop” (00:27:36). Quizzes are one mechanism. Other possibilities include requiring humans to mark specific sections as reviewed, creating paper trails of understanding checks, or building tools that quantify “how much effort went into” any token being read.
Litt references the need to know “does Austin really believe in this sentence?” when reading AI-generated planning documents from his colleague (00:26:26). The team needs ways to signal genuine human endorsement versus AI generation with perfunctory review.
Actionable Details
Tools and Technologies
- Claude Code: AI coding assistant used for the primary development work
- Notion: Document editor and task management system; serves as the UI layer for agent workflow
- Notion MCP Server: Model Context Protocol server that lets Claude Code read and write Notion pages
- Notion AI: Built-in AI agent with parity to user capabilities
- Claude Opus 4: Model used for the evolution simulator project (mentioned as “the new Opus model” during holiday break)
- Conductor: Tool Litt used previously, though he still found tab management challenging
- Kora (at Every): AI email product that sends twice-daily briefs (mentioned as inspiration for “brief” terminology)
- GitHub plugin: Claude Code plugin for Notion available on GitHub
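For a sense of what the agent-to-Notion bridge does under the hood, here is a rough sketch against Notion’s public REST API. In the demo this traffic goes through the Notion MCP server instead; the property name “Status” is an assumption about the board’s schema, and the token is a placeholder:

```python
import json
import urllib.request

NOTION_TOKEN = "secret_..."  # placeholder; use your integration token
API = "https://api.notion.com/v1"
HEADERS = {
    "Authorization": f"Bearer {NOTION_TOKEN}",
    "Notion-Version": "2022-06-28",
    "Content-Type": "application/json",
}

def _request(method: str, path: str, payload: dict) -> urllib.request.Request:
    """Build (but do not send) an authenticated Notion API request."""
    return urllib.request.Request(
        f"{API}{path}",
        data=json.dumps(payload).encode(),
        headers=HEADERS,
        method=method,
    )

def set_status(page_id: str, status: str) -> urllib.request.Request:
    # Assumes the board database has a status-type property named "Status".
    return _request("PATCH", f"/pages/{page_id}",
                    {"properties": {"Status": {"status": {"name": status}}}})

def post_comment(page_id: str, text: str) -> urllib.request.Request:
    # One way an agent could "raise its hand" with a question on a card.
    return _request("POST", "/comments",
                    {"parent": {"page_id": page_id},
                     "rich_text": [{"text": {"content": text}}]})

# urllib.request.urlopen(set_status("<page-id>", "In Progress")) would send it.
```

The sketch only constructs requests so it can be inspected without network access; sending them requires a real integration token with access to the board.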
Commands and Workflows
Starting a planning task:

```shell
notion tasks plan [task URL]
```
Alternatively, you can ask Claude to find the task for you.
Voice-to-task workflow:
1. Record voice note in Notion
2. Tell Notion AI: “Based on this brainstorm, can you fill in that task board with tasks?”
3. AI creates structured task cards from unstructured audio
Task progression:
- To Do → Planning → In Progress → Done
- Red cards indicate the agent needs human input
- Comments section for human-AI back-and-forth
Explanation Structure Template
When AI completes a task, explanations should follow this structure:
1. Context: High-level explanation of concepts (“this is what lunar landers are”)
2. Architecture: How the existing system works
3. Changes: What was modified and why
4. Code snippets: Specific implementation details
5. Quiz: Test questions to verify understanding
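The five-part structure could be encoded in a reusable prompt. The section names come from the template above; the exact instruction wording is invented for illustration:

```python
# Illustrative prompt builder for the five-part explanation template.
SECTIONS = [
    ("Context", "Explain the relevant concepts at a high level, zoomed out."),
    ("Architecture", "Describe how the existing system works."),
    ("Changes", "Summarize what was modified and why."),
    ("Code snippets", "Show the specific implementation details."),
    ("Quiz", "End with questions that test understanding of the change."),
]

def briefing_prompt(task_title: str) -> str:
    """Render the template as a single instruction block for the agent."""
    lines = [
        f"You just finished the task: {task_title}.",
        "Write an explanation that never starts with a code diff.",
        "Follow this structure, in order:",
    ]
    for i, (name, instruction) in enumerate(SECTIONS, start=1):
        lines.append(f"{i}. {name}: {instruction}")
    return "\n".join(lines)

prompt = briefing_prompt("Reduce gravity in the lunar lander")
```

Keeping the template as data means a team can tune the sections (say, adding a visualization step) without touching the rendering logic.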
Parallelization Strategy
- Planning: Can do 5 things in parallel (they don’t touch each other)
- Building: 2-3 things in parallel on a good codebase
- Check the board regularly to see what’s blocked versus what can proceed
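The planning and building caps can be pictured as two semaphores. This is a toy illustration with stubbed-out agent work, not the actual workflow code—the stage runner is hypothetical and the sleep stands in for a real Claude Code session:

```python
import asyncio

async def run_stage(slots: asyncio.Semaphore, task: str, stage: str) -> str:
    """Run one stage of one task, waiting for a free concurrency slot."""
    async with slots:
        await asyncio.sleep(0)  # stand-in for actual agent work
        return f"{task}: {stage} done"

async def main() -> list[str]:
    plan_slots = asyncio.Semaphore(5)   # planning tasks don't touch each other
    build_slots = asyncio.Semaphore(3)  # builds: 2-3 on a good codebase
    tasks = [f"task-{i}" for i in range(6)]
    plans = await asyncio.gather(
        *(run_stage(plan_slots, t, "planning") for t in tasks))
    builds = await asyncio.gather(
        *(run_stage(build_slots, t, "build") for t in tasks))
    return plans + builds

results = asyncio.run(main())
```

With six tasks, all planning can proceed almost at once while builds queue behind the three build slots—mirroring the “check the board for what’s blocked” rhythm.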
Production vs. Prototype Standards
Production code (maintained for years):
- In-the-weeds code reviews
- Small model steps with tight control
- Fully understand each logical step
- Heavy architectural involvement
Prototypes and experiments:
- Full vibe coding
- Dozens of studies/sketches
- Focus on design decisions and taste injection
- Don’t necessarily read the code, but inject decisions
The Quiz Practice
Before sending any AI-generated PR to colleagues:
1. AI generates an auto quiz about the changes
2. Take the quiz yourself
3. Must pass before sending to others
4. Ensures you actually understand what changed
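A minimal sketch of that gate, with a hypothetical function name and a made-up quiz format (question/expected-answer pairs; real quizzes would come from the AI):

```python
def passed_quiz(quiz: list[tuple[str, str]],
                answers: list[str],
                threshold: float = 1.0) -> bool:
    """Return True only if enough answers match the expected ones."""
    correct = sum(given.strip().lower() == expected.strip().lower()
                  for (_, expected), given in zip(quiz, answers))
    return correct >= threshold * len(quiz)

quiz = [
    ("What happens to vertical velocity if no thrust is applied?",
     "increases by the gravity value"),
    ("Which constant was changed in this PR?", "gravity"),
]

ok = passed_quiz(quiz, ["Increases by the gravity value", "gravity"])
# Only open the PR for review if ok is True.
```

Exact string matching is obviously too brittle for free-form answers; in practice the grading step would itself be delegated to a model, but the gate logic—no pass, no PR—stays the same.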
Quotes Worth Saving
“My core interest in my life is malleable software, which means how do we get more people playing with the material of software and democratizing this creative control over like the digital medium that drives so much of our lives.” (00:01:02)
“It’s this idea of like, oh, like, how could this be better? How could I change it? And you know, when you’re renting an apartment, there’s only so much you can do, you can’t move the walls. But then if you own a house, like there’s more options available to you.” (00:02:20)
“A CLI is a mediocre GUI and a mediocre API, but the fact that it’s both is what’s great… You can do stuff manually in a CLI, but then you can also up level to automations and compositions in a CLI. And the fact that you can do both is both what makes it really powerful for super nerds and what makes it powerful for AI, which is, you know, like the ultimate super nerd, I guess.” (00:17:01)
“I think a lot about… the number of decisions you’re injecting and the amount of taste you’re injecting is sort of the key thing. It’s not what you’re reading the code. It’s like… I’ve done like dozens of studies and sketches with code, you know, and like, we’re like prototyping my new design decisions in extreme detail. So like I’m in it.” (00:22:52)
“Whenever I’m sitting down to see what the AI did, I want it to feel like I’m the president, like a staff spent a day preparing this briefing book for me. And it is like the most insanely overproduced, digested, like beautiful artifact, just waiting for me to have like an optimal learning experience.” (00:28:54)
“I refuse to send [a PR] unless I can pass a quiz of understanding what’s in the PR… It’s really easy to convince yourself that you understand something, even if you don’t.” (00:25:10)
“The bottleneck and promise of AI right now is just, how do you make sure that the state between the AI and the human is shared?” (00:27:50)