Naveen Naidu - Building Monologue: Voice-to-Text for the AI Age
Key Insights
- Competition validates markets: Having competitors who’ve raised $10-80M is actually an advantage for a solo builder - they’re spending millions educating the market while you build a better product with minimal investment.
- Implementation speed matters more than code quality: With AI coding tools, you can vibe code a full iOS app prototype in one hour, get user feedback immediately, and iterate. The bottleneck is no longer “can we build it?” but “should we build it?”
- Codex and Claude Code have different strengths: Use Codex for precise edits in large existing codebases (it’s the “senior engineer who knows everything”), and use Claude Code for creative vibe coding and rapid prototyping of new features.
- Users talk way more than expected: The median recording is 48 words (P50), but at P90 users are recording 400-1000 words at a time. Two users have crossed 1 million words total. The product is processing 1.5 million words per day.
- Auto-learning modes are the future of voice dictation: Instead of static templates, the next evolution is having modes automatically update based on how you edit the transcribed text - learning your writing style as you use it.
Summary
Naveen Naidu, the solo builder behind Monologue at Every, presented a deep dive into building a competitive voice-to-text product in a crowded market. Naveen joined Every about 15 months ago as an engineer-in-residence, ran multiple experiments, and built Monologue - which immediately saw internal users recording 100+ times per day. The product has since grown to process 1.5 million words daily.
The talk covered Monologue’s key differentiating features (modes, auto-enter, per-app activation) and provided a preview of the upcoming iOS app launching February 9th, 2026. Naveen shared his workflow for building with AI coding tools, using Codex for working with large Swift/iOS codebases and Claude Code for rapid prototyping. He emphasized that in the current era, the constraint isn’t implementation ability but knowing what to build - you can vibe code a complete feature in an hour and start getting real user feedback immediately.
Main Topics
Introduction to Monologue
Monologue is a “smart voice to text Mac app” with iOS launching in early February 2026. The core workflow is simple: set a keyboard shortcut (Naveen uses right-side option key), hold to record short clips (5-10 seconds) or double-tap for longer recordings, and text gets pasted wherever your cursor is.
Key differentiators from competitors:
- Per-app modes that auto-activate
- Custom instructions for personalization
- Auto-enter feature for hands-free operation
- Paste last transcript with keyboard shortcut (Ctrl+V for Naveen)
“If you want to live in the future, you cannot be typing anything. You have to be using your voice and you should use Monologue from Every, built by the one and only Naveen.” [00:00:34 - 00:00:46]
Modes: The Power Feature
Modes allow different transcription behavior based on context. You can create per-app modes - for example, a “Claude Code mode” that automatically activates when you’re in your terminal or IDE.
How to set up:
1. Go to settings and create a new mode
2. Add the apps where it should auto-activate (e.g., Ghostty terminal, Warp, Claude app)
3. Optionally add custom instructions for that mode
4. Enable “auto enter” to send text immediately without confirmation
Demonstration: Naveen opened his terminal, triggered Monologue, and said “Hey, can you record, uh, go through my code base and see if there are any kind of bugs.” The transcription auto-entered directly into the terminal. [00:09:01 - 00:09:18]
“Tap, talk, tap, it’s, uh, send, paste the text, it auto sends. I think that’s the, one of the best ways for you to work with Codex or Claude Code.” [00:09:49 - 00:09:53]
Pro tip: You can toggle between modes mid-recording using the UI or keyboard shortcuts. One user created a mode that adds clapping emojis between every word for emphasis.
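The per-app activation described in this section can be pictured as a lookup from the frontmost app to a mode. The following is a minimal sketch under stated assumptions, not Monologue's actual implementation (which the talk does not show): the `Mode` fields, `resolve_mode`, and `deliver` are hypothetical names, and "auto enter" is simulated by appending a newline.

```python
from dataclasses import dataclass, field


@dataclass
class Mode:
    """Hypothetical model of a mode; field names are illustrative only."""
    name: str
    apps: set[str] = field(default_factory=set)  # apps where the mode auto-activates
    instructions: str = ""                       # optional custom instructions
    auto_enter: bool = False                     # send text without confirmation


def resolve_mode(frontmost_app: str, modes: list[Mode], default: Mode) -> Mode:
    """Return the first mode registered for the frontmost app, else the default."""
    for mode in modes:
        if frontmost_app in mode.apps:
            return mode
    return default


def deliver(text: str, mode: Mode) -> str:
    """Simulate pasting: append a newline (Enter) only when auto-enter is on."""
    return text + "\n" if mode.auto_enter else text


default_mode = Mode(name="Default")
coding_mode = Mode(
    name="Claude Code",
    apps={"Ghostty", "Warp"},
    instructions="Keep technical terms and identifiers verbatim.",
    auto_enter=True,
)
modes = [coding_mode]

active = resolve_mode("Ghostty", modes, default_mode)
print(active.name)
print(repr(deliver("go through my code base and look for bugs", active)))
```

Keeping the app list on the mode itself mirrors the setup steps (create a mode, then add apps to it); a real app would presumably match on something more stable than a display name, such as a bundle identifier.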
Custom Instructions for Better Accuracy
In settings, you can add custom instructions telling Monologue who you are, what you do, and any specific terminology or formatting preferences.
What to include:
- Your name and role
- Calendar links or phone numbers (for proper transcription)
- British English vs American English preference
- Your specific speaking style
“What happens is monologue understands you much better. So the output that you get from monologue is much better. Uh, so that’s one quick tip I recommend everyone, uh, to just like do a brain dump here.” [00:07:45 - 00:07:59]
Usage Statistics and Patterns
Naveen shared surprising data about how people actually use Monologue:
- P50 (median): 48 words per recording
- P90: 400-1000 words per recording
- Top users: 300 recordings per day
- Milestones: Two users have crossed 1 million words total
- Current volume: 1.5 million words processed per day
- Growth trajectory: Went from 1 million words per month at launch to 1 million+ per day
“People are talking a lot, uh, to their Claude Code or Codex. And that’s a good thing because you’re giving as much context and you get better.” [00:07:11 - 00:07:13]
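To make the P50/P90 figures and the growth claim concrete, here is a small sketch. The word counts are made-up sample data (not Monologue's), `percentile` is a hypothetical helper using the nearest-rank definition, and the growth calculation assumes a 30-day month.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest value with at least p% of the data at or below it."""
    if not values:
        raise ValueError("values must be non-empty")
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Made-up words-per-recording sample, chosen only to illustrate a long tail.
word_counts = [12, 20, 35, 48, 48, 60, 75, 150, 420, 900]

p50 = percentile(word_counts, 50)
p90 = percentile(word_counts, 90)
print(p50, p90)

# Growth implied by the talk's figures, assuming a 30-day month.
words_per_month_at_launch = 1_000_000
words_per_day_now = 1_500_000
growth = words_per_day_now * 30 / words_per_month_at_launch
print(growth)
```

With a long-tailed distribution like this, the median stays small while P90 is several times larger, which is the pattern the reported 48-word median and 400-1000-word P90 describe. On the stated 30-day assumption, 1.5 million words per day is roughly 45 times the launch volume of 1 million words per month.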
iOS App Preview (Launching February 9th, 2026)
The iOS app brings all Mac features to mobile, with full sync between devices:
- All modes sync automatically
- Widget support for one-tap recording without opening the app
- Background recording with timer visible in status bar
- Settings sync across devices
Notes feature (iOS and Mac): A new feature for longer-form recordings that you want to save and reference later. Different from quick dictation - designed for capturing stream-of-consciousness thoughts on the go, like during a walk or hike. Plans to integrate with Spiral (Every’s writing product) to convert voice notes directly into blog posts.
“You can just start immediately recording and then stop it. It will be on your laptop. It will get synced. It will be on your phone as well.” [00:21:11 - 00:21:19]
Building with AI Coding Tools: Codex vs Claude Code
Naveen shared his workflow for deciding which AI coding assistant to use:
Use Codex when:
- Fixing bugs in large, complex codebases
- Working with Swift, iOS, or Mac codebases specifically
- You need precise edits across many files
- You need the tool to understand existing patterns and context
“Codex is really good at, uh, understanding Swift, uh, iOS and Mac code base right now, Mac, it’s a huge code base… Codex is that one senior engineer where, uh, it understands all the code, everything. So, and it does that precise edits” [00:15:16 - 00:15:26]
Use Claude Code when:
- Vibe coding new features from scratch
- You want creative solutions and rapid prototyping
- Starting from a blank slate
“When I’m vibe coding, I don’t usually do Codex because Codex is like, it’s not that creative, right? Personal fear. So that’s when I go to Claude Code.” [00:15:26 - 00:15:30]
The Notes feature workflow: The entire Notes feature was built by doing a “brain dump” to Claude Code using Monologue, describing what he wanted, and having it vibe code everything from scratch. First prototype took one hour to implement and share internally for feedback.
“When right now it looks everything well polished, but when we initially try a prototype, the, uh, the big thing that we able to do it as I’m able to vibe code it in one hour and start sharing it internally.” [00:14:02 - 00:14:06]
Competition as Market Education
A counterintuitive insight about competing in a crowded space:
Monologue competes with well-funded voice dictation apps (competitors have raised between $10M-$80M). But Naveen and the team realized this is actually advantageous - those companies are spending millions educating the market about AI voice dictation, while Naveen (as a solo builder supported by Every’s ~$700K total funding) can focus on building a better product.
“It’s actually amazing in this day and age to build a product and have competitors that have raised a ton of money because getting people to use new AI to use products that are, that, that use AI is really hard. There’s a lot of education and educating, educating a market is so expensive and we have competitors that are spending millions of dollars educating a market and you building a product that is just as good, if not better.” [00:24:31 - 00:24:58]
Future Roadmap
Auto-learning modes (next major feature): Instead of static mode templates, Monologue will learn from your edits. When you paste transcribed text and then edit it, Monologue will detect what you changed and automatically update the underlying mode instructions. The system learns your writing style and preferences as you use it.
“You edited out, Monologue actually learns from it and then goes and updates the mode underlying. So what happens is it learns from you as you use Monologue more” [00:19:23 - 00:19:29]
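One way to picture the edit-detection step is a word-level diff between the pasted transcript and the user's edited version. The sketch below is only an illustration under stated assumptions: the talk does not describe how Monologue detects edits or rewrites mode instructions, and `word_replacements` and `updated_instructions` are hypothetical helpers built on Python's standard `difflib`.

```python
import difflib


def word_replacements(transcript: str, edited: str) -> list[tuple[str, str]]:
    """Return (original, replacement) phrase pairs where the user rewrote words."""
    before = transcript.split()
    after = edited.split()
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":  # ignore pure insertions and deletions in this sketch
            changes.append((" ".join(before[i1:i2]), " ".join(after[j1:j2])))
    return changes


def updated_instructions(instructions: str, changes: list[tuple[str, str]]) -> str:
    """Append one plain-language rule per detected replacement (hypothetical format)."""
    lines = [instructions] if instructions else []
    for old, new in changes:
        lines.append(f'Write "{new}" instead of "{old}".')
    return "\n".join(lines)


transcript = "please ask cloud code to review the pull request"
edited = "please ask Claude Code to review the PR"

changes = word_replacements(transcript, edited)
print(changes)
print(updated_instructions("Use a concise tone.", changes))
```

A production system would likely need to separate stylistic preferences from one-off content edits before touching a mode's instructions; this sketch treats every replacement as a preference.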
Other requested features:
- Mode templates / public library of modes
- Custom skins (mentioned: see-through Nintendo DS aesthetic)
- Windows/PC version
- Integration with Spiral for voice-note-to-blog-post workflow
- Hardware product (Naveen’s “crazy idea” - on hold for now)
Notes vs Granola positioning: Notes is not meant to replace Granola (a third-party meeting notes product). Granola is for people who live in meetings all day. Monologue Notes is for people with fewer meetings (<5/week) and for capturing stream-of-consciousness thoughts on the go.
Philosophy: Implementation vs. Knowing What to Build
“In the age that right now we are living in implementing features is not really that, uh, important. It’s knowing what to implement and what actually gets, uh, people excited. That’s the most important part for us.” [00:14:28 - 00:14:35]
Naveen emphasized building quickly, sharing with real users immediately, and iterating based on feedback. The Every Discord community provides constant feedback that shapes the roadmap.
Actionable Details
Getting Started with Monologue
- Download: Go to monologue.to
- Set keyboard shortcut: In settings, configure your trigger key (right-side option recommended)
- Choose recording mode:
  - Hold to record: Quick 5-10 second clips
  - Double-tap: Longer recordings with manual stop
- Add custom instructions: Go to settings and describe who you are, what you do, preferred terminology
- Set up per-app modes:
  - Create mode for coding (terminal, IDE)
  - Create mode for messaging (Slack, Discord, iMessage)
  - Create mode for writing (text editors)
  - Enable auto-enter for hands-free operation
Pro Tips
- Access previous transcripts: Use Ctrl+V (or your configured shortcut) to paste the last transcript
- View all recent transcripts: Click the Mac menu bar icon
- Switch modes mid-recording: Click “change mode” in the recording UI
- iOS launch: Watch for February 9th, 2026 launch date (may shift by a few days)
Tools and Products Mentioned
- Monologue: monologue.to - Voice-to-text for Mac (iOS coming Feb 9)
- Codex: AI coding assistant (good for large codebases, precise edits)
- Claude Code: AI coding assistant (good for vibe coding, rapid prototyping)
- Spiral: Every’s writing product (planned integration with Monologue Notes)
- Granola: Third-party meeting notes product
- Every Discord: Access at every.to (requires paid subscription)
- Superwhisper: Competitor voice dictation app
- Wispr Flow: Competitor voice dictation app
Technical Stack
- Swift for Mac and iOS apps
- Large codebase that Codex excels at navigating
- Cross-platform sync for modes and settings
- Widget support on iOS
Quotes Worth Saving
“I just love building products, that’s it.” [00:01:24 - 00:01:28]
“The average is around P50 averages on 48 words, but P90 ton of people who put in the words, it’s around 400 to 1000 words. So actually people are talking a lot, uh, to their Claude Code or Codex. And that’s a good thing because you’re giving as much context and you get better.” [00:06:57 - 00:07:11]
“Tap, talk, tap, it’s, uh, send, paste the text, it auto sends. I think that’s the, one of the best ways for you to work with Codex or Claude Code. And I believe yeah, the apps don’t do it.” [00:09:49 - 00:09:57]
“Even though there are a ton of crowded, uh, things, the way I work, no app supports me. And that’s why we added modes. That’s why we had this auto enter feature.” [00:22:39 - 00:22:45]
“In the age that right now we are living in implementing features is not really that, uh, important. It’s knowing what to implement and what actually gets, uh, people excited. That’s the most important part for us.” [00:14:28 - 00:14:35]
“Having users, uh, in our discord, I think that’s a great, great feeling, uh, because ton of people just give feedback and we have a rich roadmap now.” [00:26:51 - 00:26:55]