Now Available: Audio Transcription & Video Generation

Your team’s best context does not always live in text.
Sometimes it is buried in a podcast, a customer call, a meeting recording, or a voice note. And sometimes your team wants to produce more than text, such as a visual clip or video.
Tasklet can now help with both.
We just rolled out two new capabilities: audio transcription and video generation.
Audio Transcription
You can now drop an audio file into Tasklet, generate a transcript, and ask your agent to pull out the parts that matter: summaries, actionable insights, quotes, and more.
In the demo, we used a 90-minute episode of The Cognitive Revolution featuring Tasklet founder Andrew Lee. Tasklet transcribed it and pulled out insights on positioning and differentiation, along with campaign ideas and strong one-liners.
The transcript is not the final output. It becomes the raw material and context your agent can use to get work done.
Try audio transcription:
- “Subscribe to this podcast RSS feed. When a new episode is published, transcribe it, monitor for topics related to my business, summarize the most relevant ideas, and send me the takeaways.”
- “Monitor this Google Drive folder. Whenever I upload a new voice note, transcribe it and turn it into the right next step: tasks, drafts, briefs, research prompts, or actions for you to take.”
- “Monitor this folder for new customer call recordings. When one is added, transcribe it, pull out pain points, objections, action items, and next steps, then draft the follow-up email.”
- “Monitor my podcast and webinar feeds to repurpose content. When a new episode is published, transcribe it, extract key themes, quotes, and clip-worthy moments, then draft social posts, a blog outline, and newsletter copy in our brand voice.”
Video Generation
With native video generation, you describe the video you want in plain text and Tasklet generates it.
Try video generation:
- “Generate a video of a drippy popsicle melting in sunny San Francisco.”
- “Generate a short video of a robot barista carefully making a latte in a coffee shop.”
Regardless of which model you choose in Tasklet, we may use specialized capabilities offered by other models to more effectively complete specific tasks. For example, image generation uses GPT Image 2.0, audio transcription uses Gemini 3.5 Flash, and video generation uses Veo 3.1. Tasklet can route different pieces of work to the model best suited for the job.
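For readers who think in code, the routing idea can be pictured as a simple lookup. This is only an illustrative sketch, not Tasklet's actual implementation: the table name, the task labels, and the `pick_model` function are all hypothetical, and only the three model names come from the examples above.

```python
# Hypothetical task-to-model table, built only from the examples named above.
# The keys are made-up labels for illustration, not real Tasklet identifiers.
SPECIALIZED_MODELS = {
    "image_generation": "GPT Image 2.0",
    "audio_transcription": "Gemini 3.5 Flash",
    "video_generation": "Veo 3.1",
}


def pick_model(task_type: str, selected_model: str) -> str:
    """Return the specialized model for a task, or the user's chosen model otherwise."""
    return SPECIALIZED_MODELS.get(task_type, selected_model)


if __name__ == "__main__":
    # "my-chosen-model" is a placeholder for whatever model the user selected.
    for task in ("audio_transcription", "video_generation", "draft_email"):
        print(task, "->", pick_model(task, "my-chosen-model"))
```

In this sketch, a task with no specialized entry (such as the made-up `draft_email` label) simply falls back to the model you selected.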
These features help your agents do more with the context your team already has. Audio files, transcripts, videos, images, documents, apps, tools, workflows — more of your team’s work can now happen in one place, no matter what format it’s in.
Have feedback? Drop me a note at tyler@tasklet.ai