Lumiscript
Industry
SaaS / AI Tool
Year
2025 - Current
Client
Manus
Services
Full-Stack Development, AI Engineering, Product Design
Description
An AI-Powered Transcription Engine for Short-Form Creators
Lumiscript is a full-stack SaaS platform that transforms TikTok videos and Instagram Reels into timestamped, multi-language scripts with visual storyboards and creative direction analysis. Designed for content creators and agencies who reverse-engineer viral content, it automates the tedious process of manually transcribing, translating, and breaking down short-form videos into actionable production blueprints.
The Challenge: A Fragmented Creator Workflow
Content creators studying viral short-form videos face a painfully manual process: transcribing audio by ear, screenshotting key frames one by one, and guessing at the creative strategy behind a video's success. Existing transcription tools output only raw text with no visual context, no scene analysis, and no multi-language support, forcing creators to juggle multiple disconnected tools. Additionally, videos without direct speech, such as skits set to background music, are mishandled entirely by standard speech-to-text engines, producing hallucinated transcripts instead of useful visual breakdowns.
The Solution: An Intelligent, End-to-End Analysis Pipeline
I architected a multi-stage AI pipeline that goes far beyond basic transcription. The system downloads videos via yt-dlp and Apify, runs Whisper for speech-to-text, then passes the output through a custom LLM-based audio classifier (Gemini 2.5 Flash) that distinguishes direct speech from background music or singing and routes each video to the correct analysis path. Talking videos receive timestamped transcripts with synchronized storyboard screenshots, scene breakdowns, and a four-pillar Creative Direction analysis (Attention Hooks, Emotional Triggers, Pacing & Rhythm, Special Effects). Non-talking skit videos are automatically detected and processed through a visual analysis pipeline with scene-change detection and on-screen text extraction. The entire stack (React 19, tRPC, Drizzle ORM, TiDB, and S3) was built for production from day one.
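The routing step described above can be sketched in TypeScript roughly as follows. This is a minimal illustration, not the production code: the type names, the label strings, the `Classifier` signature, and the choice to fall back to the visual path on an unrecognized label are all assumptions made for the example. The LLM call is injected as a function so the sketch does not depend on any particular SDK.

```typescript
// Hypothetical names throughout; none of these come from the Lumiscript codebase.
type AudioClass = "direct_speech" | "background_music" | "singing";
type AnalysisPath = "transcript" | "visual";

interface WhisperSegment {
  start: number; // seconds
  end: number;   // seconds
  text: string;
}

// Assumed shape of the LLM classifier: takes Whisper segments, returns a raw label.
type Classifier = (segments: WhisperSegment[]) => Promise<string>;

// Normalize the raw model reply into a known label, or null if unrecognized.
function parseAudioClass(raw: string): AudioClass | null {
  const label = raw.trim().toLowerCase();
  switch (label) {
    case "direct_speech":
    case "background_music":
    case "singing":
      return label as AudioClass;
    default:
      return null;
  }
}

// Only direct speech goes to the transcript path. Sending everything else,
// including unrecognized labels, to the visual path is an assumption of this sketch.
function pathFor(cls: AudioClass | null): AnalysisPath {
  return cls === "direct_speech" ? "transcript" : "visual";
}

async function routeVideo(
  segments: WhisperSegment[],
  classify: Classifier
): Promise<AnalysisPath> {
  // With no Whisper output there is nothing to transcribe, so skip the classifier.
  if (segments.length === 0) return "visual";
  return pathFor(parseAudioClass(await classify(segments)));
}
```

Keeping label parsing separate from the routing decision means a malformed model reply degrades to a defined path rather than throwing mid-pipeline.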
Intelligent Audio Classification: Engineered a custom LLM classifier that analyzes Whisper output to distinguish direct speech from background music and singing, correctly routing 100% of skit-style videos to the visual analysis pipeline instead of producing hallucinated transcripts.
Multi-Language Reach: Built real-time translation into 4 languages (English, German, Japanese, Korean) with one-click copy and export (.txt, .srt, .zip), enabling creators across regions to study content in their native language.
Rapid Creator Adoption: Achieved 1.23K page views and 278 unique visitors within the first 55 days of launch, with 50+ creators across different regions using the platform daily to reverse-engineer viral content strategies.
Full AI Pipeline Orchestration: Integrated Whisper, Gemini 2.5 Flash Vision, ffmpeg, yt-dlp, and Apify into a single automated pipeline that processes a video from URL to full creative breakdown in under 60 seconds.
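As an illustration of the .srt export mentioned above, timestamped segments can be serialized to the SubRip format along these lines. This is a sketch under stated assumptions: the `TranscriptSegment` shape (start and end in seconds) and the function names are hypothetical, and the actual exporter may differ.

```typescript
// Hypothetical segment shape: times in seconds, as speech-to-text output usually provides.
interface TranscriptSegment {
  start: number;
  end: number;
  text: string;
}

// Left-pad a non-negative integer with zeros to the given width.
function pad(n: number, width: number): string {
  let out = String(n);
  while (out.length < width) out = "0" + out;
  return out;
}

// Convert seconds to the SRT timestamp form HH:MM:SS,mmm.
function toSrtTimestamp(seconds: number): string {
  const totalMs = Math.max(0, Math.round(seconds * 1000));
  const ms = totalMs % 1000;
  const totalSec = Math.floor(totalMs / 1000);
  const s = totalSec % 60;
  const m = Math.floor(totalSec / 60) % 60;
  const h = Math.floor(totalSec / 3600);
  return `${pad(h, 2)}:${pad(m, 2)}:${pad(s, 2)},${pad(ms, 3)}`;
}

// Each cue is: 1-based index, time range, text. Cues are separated by a blank line.
function toSrt(segments: TranscriptSegment[]): string {
  return segments
    .map(
      (seg, i) =>
        `${i + 1}\n${toSrtTimestamp(seg.start)} --> ${toSrtTimestamp(seg.end)}\n${seg.text.trim()}\n`
    )
    .join("\n");
}

// Example usage with made-up segments.
console.log(
  toSrt([
    { start: 0, end: 2.5, text: "Hello" },
    { start: 2.5, end: 5, text: "World" },
  ])
);
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids cues like `00:00:59,1000` that can appear when each field is rounded independently.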


