OpenAI Whisper

The Whisper skill runs OpenAI's speech-to-text model locally on your machine. It gives you fast, accurate transcription in dozens of languages, and your audio never leaves your device.

Install:

npx clawhub@latest install openai-whisper

What it does

Transcribes audio files to text using the Whisper model running locally. Supports .mp3, .mp4, .m4a, .wav, .webm, and most other common audio formats.
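For readers curious what local transcription looks like in code, here is a minimal sketch built on the open-source openai-whisper Python package. It assumes that package and ffmpeg are installed. How the skill itself is implemented is not documented here and may differ. The helper names is_supported and transcribe_file, and the "base" model choice, are illustrative only.

```python
from pathlib import Path

# Extensions named in the text above; the skill may accept more.
SUPPORTED_EXTENSIONS = {".mp3", ".mp4", ".m4a", ".wav", ".webm"}


def is_supported(path: str) -> bool:
    """Return True if the file extension is one of the formats listed above."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def transcribe_file(path: str, model_name: str = "base") -> str:
    """Transcribe one audio file locally and return the plain text."""
    if not is_supported(path):
        raise ValueError(f"Unsupported audio format: {path}")

    # Imported lazily so the helpers above work without the package installed.
    import whisper  # assumed dependency: pip install openai-whisper

    model = whisper.load_model(model_name)  # fetches weights on first use
    # Expand "~" ourselves; the path is handed to ffmpeg as-is.
    result = model.transcribe(str(Path(path).expanduser()))
    return result["text"].strip()
```

Under these assumptions, calling transcribe_file("~/recordings/meeting-2026-05-25.m4a") would return the transcript as a single string.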

Use case	Example
Meeting notes	Transcribe a Zoom/Meet recording to a structured notes file
Voice memos	Convert voice recordings to searchable text in MEMORY.md
Podcast clips	Pull quotes from an episode without manual listening
Interviews	Transcribe a recorded interview for a written piece
Dictation	Speak your thoughts, get a draft document

Basic usage

Transcribe the audio file at ~/recordings/meeting-2026-05-25.m4a

Transcribe ~/voice-memo.m4a and save the result to ~/notes/2026-05-25-memo.md

Transcribe this meeting recording and format the output as structured notes with: attendees, key decisions, and action items.
~/recordings/team-standup.mp3
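The prompts above ask for output saved to a file. As a rough illustration of that step, the sketch below turns transcript segments into a timestamped Markdown file. It assumes segments shaped like those returned by the openai-whisper package (dicts with "start", "end", and "text" keys). The function names are hypothetical, and the structured-notes step (attendees, decisions, action items) is left to the agent and not shown.

```python
from pathlib import Path


def format_timestamp(seconds: float) -> str:
    """Format seconds as MM:SS, or H:MM:SS for an hour or more."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def segments_to_markdown(title: str, segments: list[dict]) -> str:
    """Render Whisper-style segments as a timestamped Markdown transcript."""
    lines = [f"# {title}", ""]
    for seg in segments:
        stamp = format_timestamp(seg["start"])
        lines.append(f"- [{stamp}] {seg['text'].strip()}")
    return "\n".join(lines) + "\n"


def save_notes(markdown: str, out_path: str) -> Path:
    """Write the transcript to out_path, creating parent folders as needed."""
    target = Path(out_path).expanduser()
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(markdown, encoding="utf-8")
    return target
```

For example, save_notes(segments_to_markdown("Voice memo", segments), "~/notes/2026-05-25-memo.md") would write one bullet per segment, each prefixed with its start time.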

Local processing

Whisper runs entirely on your machine: no audio is sent to OpenAI's servers or any other external service. This matters for:

Confidential meetings and calls
Legal or medical recordings
Any audio you wouldn't want uploaded to a third party

The first run downloads the Whisper model weights to your machine (roughly 1–3 GB depending on model size). Subsequent runs are fast and fully offline.
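To check whether the weights are already on disk, and so whether the next run can work offline, something like the following sketch may help. It assumes the skill stores weights where the openai-whisper package does by default (a whisper folder under ~/.cache, or under XDG_CACHE_HOME when that is set). The skill may use a different location, so treat the path as an assumption.

```python
import os
from pathlib import Path


def whisper_cache_dir() -> Path:
    """Default weight cache of the openai-whisper package (assumed here)."""
    default_cache = Path.home() / ".cache"
    return Path(os.getenv("XDG_CACHE_HOME", str(default_cache))) / "whisper"


def cached_models(cache_dir: Path) -> dict[str, float]:
    """Map each downloaded model name to its size on disk in GB."""
    if not cache_dir.is_dir():
        return {}
    return {
        weights.stem: round(weights.stat().st_size / 1e9, 2)
        for weights in sorted(cache_dir.glob("*.pt"))
    }
```

Running print(cached_models(whisper_cache_dir())) lists whatever models have been downloaded. An empty result suggests the next transcription will need a network connection to fetch weights.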

Pair with other skills

Morning brief: transcribe your morning voice note and include the summary in the brief
Memory: save transcriptions to dated files in ~/memory/ for searchable recall
Humanizer: transcriptions often sound natural already, but spoken phrasing rarely reads cleanly on the page; use Humanizer to clean up transcribed text before publishing
Nano PDF: transcribe a recorded presentation, then edit the accompanying PDF to match
