15 KiB
Video Recording
Record browser automation sessions as MP4 video files. Uses CDP Page.startScreencast to capture JPEG frames and pipes them to ffmpeg for encoding.
Prerequisites
ffmpeg must be installed. Choose один из вариантов:
Вариант 1: в проект (рекомендуется)
Скачать essentials build с https://www.gyan.dev/ffmpeg/builds/, распаковать в tools/ffmpeg/ проекта:
tools/ffmpeg/
├── bin/
│ ├── ffmpeg.exe ← этот файл ищет startRecording()
│ ├── ffplay.exe
│ └── ffprobe.exe
└── ...
Код автоматически найдёт tools/ffmpeg/bin/ffmpeg.exe — ничего больше настраивать не нужно.
Вариант 2: глобально (один раз на машину)
Скачать, распаковать в любой каталог (напр. C:\tools\ffmpeg), добавить bin/ в системный PATH.
После этого ffmpeg доступен во всех проектах.
Вариант 3: через .v8-project.json (общий путь)
Чтобы не копировать ffmpeg в каждый проект, указать путь в конфиге:
{
"ffmpegPath": "C:\\tools\\ffmpeg\\bin\\ffmpeg.exe"
}
Модель прочитает это поле и передаст в startRecording({ ffmpegPath }).
Порядок поиска ffmpeg
opts.ffmpegPath— явный путь (из.v8-project.jsonили параметра)FFMPEG_PATH— переменная окруженияffmpeg— в системном PATHtools/ffmpeg/bin/ffmpeg.exe— относительно корня проекта
API
startRecording(outputPath, opts?)
Start recording the browser viewport to an MP4 file.
| Parameter | Type | Default | Description |
|---|---|---|---|
outputPath |
string | required | Output .mp4 file path |
opts.fps |
number | 25 | Target framerate |
opts.quality |
number | 80 | JPEG quality (1-100) |
opts.ffmpegPath |
string | auto | Explicit path to ffmpeg binary |
opts.speechRate |
number | 70 | Ms per character for smart TTS wait. Increase for slower TTS providers (e.g. 85 for ElevenLabs) |
- Output directory is created automatically if it doesn't exist
- Throws if already recording or browser not connected
- Recording auto-stops when
disconnect()is called
stopRecording() → { file, duration, size, captions }
Stop recording and finalize the MP4 file. Saves .captions.json next to the video if captions were collected.
| Return field | Type | Description |
|---|---|---|
file |
string | Absolute path to the MP4 file |
duration |
number | Recording duration in seconds |
size |
number | File size in bytes |
captions |
number | Number of captions collected during recording |
isRecording() → boolean
Check if recording is active.
showCaption(text, opts?)
Display a text overlay on the page (visible in recording). Calling again updates the text.
| Parameter | Type | Default | Description |
|---|---|---|---|
text |
string | required | Caption text |
opts.position |
'top' | 'bottom' |
'bottom' |
Vertical position |
opts.fontSize |
number | 24 | Font size in px |
opts.background |
string | 'rgba(0,0,0,0.7)' |
Background color |
opts.color |
string | '#fff' |
Text color |
opts.speech |
string | false | - | TTS narration text. Omit = use displayed text, string = custom narration, false = skip narration |
opts.voice |
string | - | Per-caption voice override (provider-specific voice name/ID). Used by addNarration instead of the global voice |
When text is empty but speech is a string, the caption is still recorded for TTS (no visible overlay). Useful for narration-only captions (e.g. podcast mode).
The overlay uses pointer-events: none — does not interfere with clicking.
Smart TTS wait (during recording): showCaption automatically pauses for the estimated TTS speech duration (default ~70ms per character, min 2s; configurable via startRecording({ speechRate })). The next wait() call accounts for this — if the explicit pause is shorter than the TTS wait already done, no extra delay is added. If longer, only the remaining difference is waited. This means script authors don't need to calculate TTS timing manually.
hideCaption()
Remove the caption overlay.
showTitleSlide(text, opts?)
Display a full-screen title slide overlay (gradient background, centered text). Useful for intro/outro frames in video recordings. Calling again updates the content.
| Parameter | Type | Default | Description |
|---|---|---|---|
text |
string | required | Title text (\n → line break) |
opts.subtitle |
string | '' |
Smaller text below the title |
opts.background |
string | dark gradient | CSS background |
opts.color |
string | '#fff' |
Text color |
opts.fontSize |
number | 36 | Title font size in px |
opts.speech |
string | false | - | TTS narration text. String = custom text, true = use title text, omit/false = no narration |
opts.voice |
string | - | Per-caption voice override for addNarration |
The overlay covers the entire viewport with z-index: 999999 and pointer-events: none.
hideTitleSlide()
Remove the title slide overlay.
showImage(imagePath, opts?)
Display a full-screen image overlay (e.g. presentation slide screenshot). Reads the file, base64-encodes it, and renders as <img> in a fixed overlay — captured by CDP screencast automatically.
| Parameter | Type | Default | Description |
|---|---|---|---|
imagePath |
string | required | Path to image file (PNG, JPG, GIF, WebP, SVG) |
opts.style |
'blur' | 'dark' | 'light' | 'full' |
'blur' |
Display style preset |
opts.background |
string | - | Custom background (overrides preset) |
opts.shadow |
boolean | preset | Show drop shadow on image |
opts.speech |
string | false | - | TTS narration text while image is shown |
opts.voice |
string | - | Per-caption voice override for addNarration |
Style presets:
blur— blurred+dimmed copy of the image as background, centered image with shadowdark— dark background (#2a2a2a) with shadowlight— white background with shadowfull— image fills entire screen (contain, no crop), black background, no shadow
Images are auto-scaled: small images scale up (min 50% of viewport), large images scale down (max 92%).
hideImage()
Remove the image overlay.
setHighlight(on)
Enable or disable auto-highlight mode. When enabled, action functions (navigateSection, openCommand, clickElement, selectValue, fillFields) automatically highlight the target element for 500ms before performing the action.
| Parameter | Type | Description |
|---|---|---|
on |
boolean | true to enable, false to disable |
How it works: each action highlights the element → waits 500ms (viewer reads) → removes highlight → performs the action. This prevents the highlight overlay from interfering with modals, dropdowns, or focus changes caused by the action.
Search priority: form elements (buttons, links, fields, grid rows) are searched first. Sections and commands are used as fallback only if the element is not found in the current form. This avoids false matches (e.g., "ОК" matching section "Покупки" via substring).
isHighlightMode() → boolean
Check if auto-highlight mode is active.
highlight(text)
Manually highlight a UI element by name (fuzzy match). Places a semi-transparent blue overlay (rgba(0,100,255,0.25)) with a border on the element. The overlay tracks element position via requestAnimationFrame.
| Parameter | Type | Description |
|---|---|---|
text |
string | Element name — button, link, field, group/panel, section, or command |
- Fuzzy match order: exact → startsWith → includes
- Search priority: popup items → commands → form groups/panels → form elements (buttons, fields) → sections
- Groups are matched by visible title or internal name (e.g.,
highlight('Оргструктура')finds the group panel) pointer-events: none— does not block clicks
unhighlight()
Remove the highlight overlay.
Example: Record a workflow with highlight, title slide, and captions
await startRecording('recordings/create-order.mp4');
// Title slide with narration
await showTitleSlide('Создание заказа клиента', {
subtitle: 'Демонстрация',
speech: 'Создание заказа клиента. Демонстрация.'
});
await wait(1);
await hideTitleSlide();
// Presentation slide (optional)
await showImage('slides/overview.png', {
speech: 'На этом слайде показана общая схема процесса'
});
await wait(1);
await hideImage();
setHighlight(true); // enable auto-highlight for all actions
// Steps: caption → pause → action (highlight is automatic)
await showCaption('Шаг 1. Переходим в раздел «Продажи»');
await wait(1.5);
await navigateSection('Продажи');
await showCaption('Шаг 2. Открываем заказы клиентов');
await wait(1.5);
await openCommand('Заказы клиентов');
await showCaption('Шаг 3. Создаём новый заказ');
await wait(1.5);
await clickElement('Создать');
await wait(2); // wait for form to load
await showCaption('Шаг 4. Заполняем шапку');
await wait(1.5);
await fillFields({ 'Организация': 'Конфетпром', 'Контрагент': 'Альфа' });
await wait(1);
await hideCaption();
setHighlight(false);
const result = await stopRecording();
console.log(`Recorded ${result.duration}s, ${(result.size / 1024 / 1024).toFixed(1)} MB`);
Caption timing: show the caption before the action — showCaption auto-waits for estimated TTS duration during recording. The subsequent wait() is absorbed by the credit system (no double-waiting). Add wait() after the action only when the next step needs the result to load (e.g., form opening).
Highlight timing: setHighlight(true) enables auto-mode — each action function highlights the target for 500ms, then removes the highlight before performing the action. No manual highlight()/unhighlight() calls needed. Enable after title slide, disable before stopRecording().
TTS Narration
Add voiceover to recorded videos. Captions shown via showCaption() are automatically collected during recording and can be synthesized into speech.
Prerequisites
- ffmpeg — same as for video recording (ffprobe must be next to ffmpeg)
- node-edge-tts —
npm install --prefix tools/tts node-edge-tts(for Edge TTS provider, free, no API key). Also works if installed globally or at project level — the resolver tries multiple locations automatically
Configuration in .v8-project.json
{
"tts": {
"provider": "edge",
"voice": "ru-RU-DmitryNeural"
}
}
For OpenAI-compatible provider:
{
"tts": {
"provider": "openai",
"apiKey": "sk-...",
"voice": "alloy"
}
}
For ElevenLabs:
{
"tts": {
"provider": "elevenlabs",
"apiKey": "sk_...",
"voice": "JBFqnCBsd6RMkjVDRZzb"
}
}
Note: voice is the ElevenLabs voice ID (not a name). Default model: eleven_multilingual_v2 (supports Russian and other languages).
showCaption() speech parameter
The speech option controls what text is narrated (vs displayed):
await showCaption('Дт 60.02 — Кт 51'); // narrates the displayed text
await showCaption('Дт 60.02 — Кт 51', { speech: 'Проводка: дебет шестьдесят ноль два, кредит пятьдесят один' }); // custom narration
await showCaption('Техническая информация', { speech: false }); // no narration for this caption
addNarration(videoPath, opts?)
Generate TTS and merge audio with video. Call after stopRecording().
Parameters:
| Parameter | Type | Description |
|---|---|---|
videoPath |
string |
Path to the recorded MP4 file |
opts.captions |
Array |
Explicit captions (default: from last recording or .captions.json). Each caption may include a voice field to override the global voice for that segment |
opts.provider |
string |
'edge' (default), 'openai', or 'elevenlabs' |
opts.voice |
string |
Voice name (provider-specific) |
opts.apiKey |
string |
API key (for openai) |
opts.apiUrl |
string |
Endpoint (for openai) |
opts.model |
string |
Model (for openai, default: tts-1) |
opts.ffmpegPath |
string |
Path to ffmpeg binary |
opts.outputPath |
string |
Output file (default: video-narrated.mp4) |
Returns: { file, duration, size, captions, warnings? }
getCaptions()
Returns captions from the current or last recording: Array<{ text, speech, time, voice? }>.
Example: Record and narrate
await startRecording('recordings/demo.mp4');
await showCaption('Переходим в раздел Банк и касса');
await wait(1.5);
await navigateSection('Банк и касса');
await showCaption('Открываем банковские выписки');
await wait(1.5);
await openCommand('Банковские выписки');
await hideCaption();
const video = await stopRecording();
// Add narration (reads tts config from .v8-project.json)
const narrated = await addNarration(video.file, { voice: 'ru-RU-DmitryNeural' });
console.log(`Narrated: ${narrated.file}, ${narrated.duration}s`);
Re-narration
After recording, a .captions.json file is saved next to the video. You can re-narrate with a different voice without re-recording:
const result = await addNarration('recordings/demo.mp4', { voice: 'ru-RU-SvetlanaNeural' });
Troubleshooting
| Problem | Solution |
|---|---|
| "ffmpeg not found" | Install ffmpeg and ensure it's discoverable (see Prerequisites) |
| Recording file is 0 bytes | Check that output path is writable. ffmpeg may have crashed |
| Video is choppy | Add wait() between steps. Reduce quality for faster capture |
| "Already recording" | Call stopRecording() before starting a new recording |
| Recording stops on disconnect | Expected — auto-stop prevents orphaned ffmpeg processes |
| "No captions available" | Use showCaption() during recording, or pass opts.captions |
| TTS timeout | Check internet connection. Edge TTS requires network access |
| Audio cuts off between captions | Smart TTS wait should handle this automatically. If warnings appear, add longer wait() after showCaption |