AI CLI Agent Editing Tools
Complete reference guide for AI agents to use FFmpeg, ImageMagick, yt-dlp, mkvtoolnix, MoviePy & VapourSynth via CLI commands.
FFmpeg
The universal video engine. Merges images and videos, adds transitions (xfade, fade), resizes, re-encodes, extracts frames, creates slideshows, adds audio, ramps speed, and overlays text or images.
Best for: turning a folder of images into a video, concatenating clips, and format conversion.
# macOS
brew install ffmpeg
# Ubuntu/Debian
sudo apt update && sudo apt install ffmpeg
# Windows (via winget or chocolatey)
winget install Gyan.FFmpeg
# or
choco install ffmpeg
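The AI writes shell commands or Python subprocess calls. The `ffmpeg` program is driven entirely through command-line arguments, so the agent only has to generate the right argument list.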
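As a rough illustration of the "Python subprocess" route, the sketch below builds an FFmpeg argument list and runs it without going through a shell. The function names `build_convert_cmd` and `run`, the file names, and the default CRF of 23 are assumptions made for this example, not part of FFmpeg.

```python
import shlex
import subprocess


def build_convert_cmd(src: str, dst: str, crf: int = 23) -> list[str]:
    """Return an ffmpeg argument list that re-encodes src to H.264/AAC at dst."""
    return [
        "ffmpeg",
        "-y",                  # overwrite dst without prompting
        "-i", src,
        "-c:v", "libx264",
        "-crf", str(crf),      # lower CRF = higher quality, larger file
        "-pix_fmt", "yuv420p",
        "-c:a", "aac",
        dst,
    ]


def run(cmd: list[str]) -> None:
    """Run the command; raises CalledProcessError (with stderr captured) on failure."""
    subprocess.run(cmd, check=True, capture_output=True, text=True)


if __name__ == "__main__":
    cmd = build_convert_cmd("input.mov", "output.mp4")  # hypothetical file names
    print(shlex.join(cmd))  # shell-quoted form, handy for saving into a script
    # run(cmd)  # uncomment once input.mov exists and ffmpeg is on PATH
```

Passing a list instead of a single string avoids shell quoting problems with file names that contain spaces.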
# Slideshow from image folder (3s per image, hard cuts; crossfades need the xfade filter)
ffmpeg -framerate 1/3 -pattern_type glob -i "folder/*.jpg" \
-vf "fps=30,scale=1920:1080:force_original_aspect_ratio=decrease,pad=1920:1080:(ow-iw)/2:(oh-ih)/2,format=yuv420p" \
-c:v libx264 -pix_fmt yuv420p output.mp4
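# Merge two videos with a 1s slide transition (offset=4 assumes video1.mp4 is 5s long;
# both inputs need the same resolution and frame rate, and both need an audio stream)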
ffmpeg -i video1.mp4 -i video2.mp4 -filter_complex \
"[0:v][1:v]xfade=transition=slideleft:duration=1:offset=4,format=yuv420p[video]; \
[0:a][1:a]acrossfade=d=1[audio]" \
-map "[video]" -map "[audio]" output.mp4
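The `offset` value in the xfade command above is the time at which the transition starts, which is the length of the video so far minus the transition duration. When more than two clips are chained, each offset depends on all earlier clips. The sketch below works that arithmetic out and assembles a video-only filter graph; `xfade_offsets` and `build_xfade_graph` are hypothetical helper names, and the clip durations are assumed to be known already (for example from ffprobe).

```python
def xfade_offsets(durations, transition):
    """Start time of each transition when clips are chained with xfade.

    After k clips have been joined, the running length is
    sum(durations[:k]) - (k - 1) * transition, and the next transition
    starts one transition length before that point.
    """
    if any(d <= transition for d in durations):
        raise ValueError("every clip must be longer than the transition")
    offsets = []
    elapsed = 0.0
    for index, duration in enumerate(durations[:-1], start=1):
        elapsed += duration
        offsets.append(elapsed - index * transition)
    return offsets


def build_xfade_graph(durations, transition=1.0, effect="fade"):
    """Return (filter_complex string, final output label) for the video streams."""
    parts = []
    previous = "[0:v]"
    for i, offset in enumerate(xfade_offsets(durations, transition), start=1):
        label = f"[v{i}]"
        parts.append(
            f"{previous}[{i}:v]xfade=transition={effect}"
            f":duration={transition:.3f}:offset={offset:.3f}{label}"
        )
        previous = label
    return ";".join(parts), previous


if __name__ == "__main__":
    graph, last = build_xfade_graph([5, 5, 5], transition=1, effect="slideleft")
    print(graph)
    print("map this label:", last)
```

The returned label is what would be passed to `-map`; audio would need a matching chain of acrossfade filters, which this sketch leaves out.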
ImageMagick
Image manipulation: batch resize, format conversion, composite overlays, GIF creation from image sequences, annotation and watermarking.
Best for: preprocessing images before video assembly, generating thumbnails, creating animated GIFs.
# macOS
brew install imagemagick
# Ubuntu/Debian (may install ImageMagick 6, which provides convert/mogrify/composite but no magick command)
sudo apt install imagemagick
# Windows
winget install ImageMagick.ImageMagick
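# Batch resize and center-crop all images to 1920x1080 (mogrify overwrites the originals, so work on a copy)
magick mogrify -resize 1920x1080^ -gravity center -extent 1920x1080 folder/*.jpg
# Create GIF animation from folder (-delay is in hundredths of a second, so 20 = 5 fps)
magick -delay 20 -loop 0 folder/*.jpg output.gif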
# Composite watermark
magick composite -gravity SouthEast watermark.png input.jpg output.jpg
yt-dlp
Downloads video and audio from 1000+ sites (YouTube, TikTok, Twitter, etc.). Also extracts metadata, thumbnails, and subtitles.
Best for: gathering source media for AI editing pipelines.
# macOS / Linux
brew install yt-dlp
# or
pip install yt-dlp
# Ubuntu
sudo apt install yt-dlp # (may be outdated; pip is better)
# Windows
pip install yt-dlp
# Download best quality video
yt-dlp "https://youtube.com/watch?v=XXXX" -o "downloads/%(title)s.%(ext)s"
# Download as audio only (for voiceover/music); %(ext)s lets yt-dlp set the extension after conversion
yt-dlp -x --audio-format mp3 "URL" -o "audio/%(title)s.%(ext)s"
# Download with metadata JSON (AI can parse this)
yt-dlp --write-info-json --skip-download "URL"
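The `.info.json` file written by the command above is plain JSON, so an agent can read it with the standard library. This is a minimal sketch: `summarize_info` is a hypothetical helper, and the keys it reads (`title`, `duration`, `width`, `height`, `uploader`) are common yt-dlp fields that may be missing for some sites, hence the `.get` calls.

```python
import json
from pathlib import Path


def summarize_info(info: dict) -> dict:
    """Pick the fields an editing pipeline usually needs from a yt-dlp info dict."""
    return {
        "title": info.get("title"),
        "duration_s": info.get("duration"),          # seconds, may be None
        "resolution": (info.get("width"), info.get("height")),
        "uploader": info.get("uploader"),
    }


if __name__ == "__main__":
    # Summarize every info file in the current directory (none found = no output).
    for path in Path(".").glob("*.info.json"):
        info = json.loads(path.read_text(encoding="utf-8"))
        print(path.name, summarize_info(info))
```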
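MKVToolNix
Container manipulation. Merges video, audio, and subtitles without re-encoding, splits and joins files, and extracts tracks. It can read MP4 and other formats as input, but it writes Matroska (MKV) output only.
Best for: fast assembly when you don't want to re-encode (saves CPU).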
# macOS
brew install mkvtoolnix
# Ubuntu
sudo apt install mkvtoolnix
# Windows
winget install MoritzBunkus.MKVToolNix
# Merge video + audio + subtitle without re-encoding
mkvmerge -o output.mkv video.mp4 audio.mp3 subtitles.srt
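# Extract an audio track (1 is the track ID; list IDs first with: mkvmerge -i input.mkv)
# The output extension should match the track's codec.
mkvextract tracks input.mkv 1:audio.aac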
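Track IDs differ from file to file, so an agent should look the ID up rather than hard-code it. The sketch below reads the JSON that `mkvmerge -J` prints and returns the first track of a given type. `identify` and `first_track_id` are hypothetical helper names, and only the `tracks`, `id`, and `type` keys of that JSON are relied on.

```python
import json
import subprocess


def identify(path: str) -> str:
    """Return mkvmerge's JSON identification output for a file."""
    result = subprocess.run(
        ["mkvmerge", "-J", path], check=True, capture_output=True, text=True
    )
    return result.stdout


def first_track_id(identify_json: str, track_type: str = "audio"):
    """Return the ID of the first track of track_type, or None if there is none."""
    data = json.loads(identify_json)
    for track in data.get("tracks", []):
        if track.get("type") == track_type:
            return track["id"]
    return None


if __name__ == "__main__":
    # Inline sample so the sketch runs without mkvmerge installed.
    sample = '{"tracks": [{"id": 0, "type": "video"}, {"id": 1, "type": "audio"}]}'
    print(first_track_id(sample))  # for a real file: first_track_id(identify("input.mkv"))
```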
MoviePy
Pythonic video editing: cuts, concatenation, transitions, text overlays, audio mixing, and compositing of video clips.
Best for: AI agents that write Python code instead of shell scripts. More readable than FFmpeg filter_complex.
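pip install "moviepy<2"
# The code below uses the MoviePy 1.x API; 2.x removed moviepy.editor and renamed methods (subclip became subclipped).
# TextClip also requires ImageMagick in 1.x.
from moviepy.editor import VideoFileClip, TextClip, CompositeVideoClip, concatenate_videoclips
# Load the first 5 seconds of each clip
clip1 = VideoFileClip("video1.mp4").subclip(0, 5)
clip2 = VideoFileClip("video2.mp4").subclip(0, 5)
# 1s crossfade: fade clip2 in and overlap it with the end of clip1
final = concatenate_videoclips([clip1, clip2.crossfadein(1)], padding=-1, method="compose")
# Overlay centered text for the first 5 seconds
txt = TextClip("Hello", fontsize=70, color='white').set_duration(5).set_position('center')
result = CompositeVideoClip([final, txt])
result.write_videofile("output.mp4", fps=30)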
VapourSynth
Frame-accurate video processing framework: Python-scripted filtering, AI upscaling integration, frame interpolation, denoising.
Best for: frame-level precision where FFmpeg filters are too blunt.
# Ubuntu
sudo apt install vapoursynth
# macOS
brew install vapoursynth
# Windows
pip install vapoursynth
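The AI writes a .vpy script (Python), then pipes the frames to FFmpeg:
# "-" sends output to stdout; y4m adds the headers FFmpeg needs (older releases use --y4m instead of -c y4m)
vspipe -c y4m script.vpy - | ffmpeg -i - output.mp4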
Quick Reference: Tool by Use Case
| Use Case | Primary Tool | Helper Tool | Weight |
|---|---|---|---|
| Image folder → video slideshow | FFmpeg | ImageMagick (pre-crop) | Light |
| Images + videos merged with transitions | FFmpeg | — | Light |
| Python-scripted editing | MoviePy | FFmpeg (backend) | Medium |
| Download source media | yt-dlp | — | Ultra-light |
| Batch image resize/watermark/GIF | ImageMagick | — | Light |
Per-Tool Guide Summary
Resource Management Cheat Sheet
| Tool | RAM | CPU | GPU | Max Concurrent |
|---|---|---|---|---|
| yt-dlp | 50MB | Low | No | 10+ |
| ImageMagick | 300MB | Medium | No | 5–10 |
| FFmpeg (CPU) | 1GB | High (all cores) | No | 1–2 |
| FFmpeg (NVENC) | 1GB | Low | Yes | 2–3 |
| MoviePy | 800MB | Medium | No | 2–3 |
| ComfyUI / RIFE / Blender | 2–12GB | Varies | Required | 1 |
Rule: Never run two heavy GPU tools (RIFE, ComfyUI) at the same time. One heavy tool can safely run alongside several light tools.
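One way to enforce this rule in an orchestration script is an admission check before each job starts. The sketch below is only an illustration: `can_start` is a hypothetical function, and the limits take the lower bound of each "Max Concurrent" range from the table as a conservative assumption.

```python
# Tools that need the GPU to themselves (from the last row of the table).
HEAVY = {"ComfyUI", "RIFE", "Blender"}

# Conservative per-tool limits: the lower bound of each range in the table.
MAX_CONCURRENT = {
    "yt-dlp": 10,
    "ImageMagick": 5,
    "FFmpeg (CPU)": 1,
    "FFmpeg (NVENC)": 2,
    "MoviePy": 2,
}


def can_start(tool: str, running: list[str]) -> bool:
    """Return True if `tool` may start given the list of currently running tools."""
    if tool in HEAVY:
        # Only one heavy GPU tool at a time, regardless of which one.
        return not any(name in HEAVY for name in running)
    limit = MAX_CONCURRENT.get(tool, 1)  # unknown tools default to one instance
    return running.count(tool) < limit


if __name__ == "__main__":
    print(can_start("RIFE", ["ComfyUI"]))           # second heavy tool is refused
    print(can_start("yt-dlp", ["RIFE", "yt-dlp"]))  # light tool next to a heavy one
```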
AI Agent Prompt Templates
Look at `/mnt/agents/images/`. Create a 1920x1080 30fps MP4. Each image displays 3 seconds with 0.5s crossfade. Add audio from `/mnt/agents/audio/music.mp3`. Save to `/mnt/agents/output/slideshow.mp4`. Save the script you used as `/mnt/agents/output/render.sh`.
Analyze `/mnt/agents/assets/` containing images and videos. Normalize everything to 1080x1920 30fps. Hold images for 4 seconds, play videos at full duration with original audio. Apply 0.5s crossfade between all items. Output to `/mnt/agents/output/mixed.mp4`.
Start ComfyUI in headless mode on port 8188. Load the AnimateDiff workflow to animate `/mnt/agents/input/photo.png` into a 2-second video. Execute via API and save output to `/mnt/agents/output/animated.mp4`.
Output & Next Steps
| Step | Action | Command |
|---|---|---|
| Output | File lands in your specified path | ls /mnt/agents/output/ |
| Validate | Check duration, resolution, audio | ffprobe file.mp4 |
| Compress | Optimize for web upload | ffmpeg -i file.mp4 -c:v libx264 -crf 28 compressed.mp4 |
| Iterate | If wrong, edit the saved script | bash render.sh |
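The Validate step can be automated by asking ffprobe for JSON and checking it against the values from the prompt. This is a sketch under stated assumptions: `probe` and `check_output` are hypothetical helper names, and the expected resolution and minimum duration are whatever the prompt asked for.

```python
import json
import subprocess


def probe(path: str) -> dict:
    """Return ffprobe's format and stream information as a dict."""
    cmd = [
        "ffprobe", "-v", "error",
        "-print_format", "json",
        "-show_format", "-show_streams",
        path,
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return json.loads(result.stdout)


def check_output(info: dict, width: int, height: int,
                 min_duration: float, need_audio: bool = True) -> list[str]:
    """Return a list of human-readable problems; an empty list means the file passes."""
    problems = []
    streams = info.get("streams", [])
    video = next((s for s in streams if s.get("codec_type") == "video"), None)
    if video is None:
        problems.append("no video stream")
    elif (video.get("width"), video.get("height")) != (width, height):
        problems.append(
            f"resolution is {video.get('width')}x{video.get('height')}, "
            f"expected {width}x{height}"
        )
    if need_audio and not any(s.get("codec_type") == "audio" for s in streams):
        problems.append("no audio stream")
    # ffprobe reports the container duration as a string of seconds.
    duration = float(info.get("format", {}).get("duration", 0))
    if duration < min_duration:
        problems.append(f"duration is {duration:.2f}s, expected at least {min_duration}s")
    return problems


if __name__ == "__main__":
    # Inline sample so the sketch runs without ffprobe; use probe("file.mp4") for a real file.
    sample = {
        "streams": [{"codec_type": "video", "width": 1920, "height": 1080}],
        "format": {"duration": "12.000000"},
    }
    print(check_output(sample, 1920, 1080, min_duration=10))
```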
Pro tip: Always tell the AI to save its script. This lets you re-run, debug, or modify without burning API tokens.