diff --git a/.codex/INSTALL.md b/.codex/INSTALL.md index 620e6af..3e76199 100644 --- a/.codex/INSTALL.md +++ b/.codex/INSTALL.md @@ -34,6 +34,7 @@ Enable MiniMax skills in Codex via native skill discovery. Just clone and symlin - **android-native-dev** — Android native application development with Material Design 3 - **ios-application-dev** — iOS application development with UIKit, SnapKit, and SwiftUI - **shader-dev** — GLSL shader techniques for stunning visual effects (ShaderToy-compatible) +- **gif-sticker-maker** — Convert photos into animated GIF stickers (Funko Pop / Pop Mart style) ## Verify diff --git a/.cursor-plugin/plugin.json b/.cursor-plugin/plugin.json index 4e27a02..3293520 100644 --- a/.cursor-plugin/plugin.json +++ b/.cursor-plugin/plugin.json @@ -1,7 +1,7 @@ { "name": "minimax-skills", "displayName": "MiniMax Skills", - "description": "MiniMax AI skills library: frontend development, fullstack development, Android native development, iOS application development, and shader development", + "description": "MiniMax AI skills library: frontend development, fullstack development, Android native development, iOS application development, shader development, and GIF sticker maker", "version": "1.0.0", "author": { "name": "MiniMax AI" @@ -9,7 +9,7 @@ "homepage": "https://github.com/MiniMax-AI/skills", "repository": "https://github.com/MiniMax-AI/skills", "license": "MIT", - "keywords": ["skills", "frontend", "fullstack", "android", "ios", "shader", "minimax"], + "keywords": ["skills", "frontend", "fullstack", "android", "ios", "shader", "gif", "sticker", "minimax"], "logo": "assets/logo.png", "skills": "./skills/" } diff --git a/.opencode/INSTALL.md b/.opencode/INSTALL.md index 6981d57..21b96fc 100644 --- a/.opencode/INSTALL.md +++ b/.opencode/INSTALL.md @@ -39,6 +39,7 @@ Verify by asking: "List available skills" - **android-native-dev** — Android native application development with Material Design 3 - **ios-application-dev** — iOS application development 
with UIKit, SnapKit, and SwiftUI - **shader-dev** — GLSL shader techniques for stunning visual effects (ShaderToy-compatible) +- **gif-sticker-maker** — Convert photos into animated GIF stickers (Funko Pop / Pop Mart style) ## Updating diff --git a/README.md b/README.md index 7a73e72..6dfeb48 100644 --- a/README.md +++ b/README.md @@ -15,6 +15,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool | `android-native-dev` | Android native application development with Material Design 3. Kotlin / Jetpack Compose, adaptive layouts, Gradle configuration, accessibility (WCAG), build troubleshooting, performance optimization, and motion system. | | `ios-application-dev` | iOS application development guide covering UIKit, SnapKit, and SwiftUI. Touch targets, safe areas, navigation patterns, Dynamic Type, Dark Mode, accessibility, collection views, and Apple HIG compliance. | | `shader-dev` | Comprehensive GLSL shader techniques for creating stunning visual effects — ray marching, SDF modeling, fluid simulation, particle systems, procedural generation, lighting, post-processing, and more. ShaderToy-compatible. | +| `gif-sticker-maker` | Convert photos (people, pets, objects, logos) into 4 animated GIF stickers with captions. Funko Pop / Pop Mart style, powered by MiniMax Image & Video Generation API. 
| ## Installation diff --git a/README_zh.md b/README_zh.md index 6cfc7f6..eb5f64c 100644 --- a/README_zh.md +++ b/README_zh.md @@ -15,6 +15,7 @@ | `android-native-dev` | 基于 Material Design 3 的 Android 原生应用开发。Kotlin / Jetpack Compose、自适应布局、Gradle 配置、无障碍(WCAG)、构建问题排查、性能优化与动效系统。 | | `ios-application-dev` | iOS 应用开发指南,涵盖 UIKit、SnapKit 和 SwiftUI。触控目标、安全区域、导航模式、Dynamic Type、深色模式、无障碍、集合视图,符合 Apple HIG 规范。 | | `shader-dev` | 全面的 GLSL 着色器技术,用于创建惊艳的视觉效果 — 光线行进、SDF 建模、流体模拟、粒子系统、程序化生成、光照、后处理等。兼容 ShaderToy。 | +| `gif-sticker-maker` | 将照片(人物、宠物、物品、Logo)转换为 4 张带字幕的动画 GIF 贴纸。Funko Pop / Pop Mart 盲盒风格,基于 MiniMax 图片与视频生成 API。 | ## 安装 diff --git a/skills/gif-sticker-maker/SKILL.md b/skills/gif-sticker-maker/SKILL.md new file mode 100644 index 0000000..48bbeea --- /dev/null +++ b/skills/gif-sticker-maker/SKILL.md @@ -0,0 +1,127 @@ +--- +name: gif-sticker-maker +description: | + Convert photos (people, pets, objects, logos) into 4 animated GIF stickers with captions. + Use when: user wants to create cartoon stickers, GIF expressions, emoji packs, animated avatars, + or convert photos to Funko Pop / Pop Mart blind box style animations. + Triggers: sticker, GIF, cartoon, emoji, expression pack, avatar animation. +license: MIT +metadata: + version: "1.2" + category: creative-tools + style: Funko Pop / Pop Mart + output_format: GIF + output_count: 4 + sources: + - MiniMax Image Generation API + - MiniMax Video Generation API +--- + +# GIF Sticker Maker + +Convert user photos into 4 animated GIF stickers (Funko Pop / Pop Mart style). + +## Style Spec + +- Funko Pop / Pop Mart blind box 3D figurine +- C4D / Octane rendering quality +- White background, soft studio lighting +- Caption: black text + white outline, bottom of image + +## Prerequisites + +Before starting any generation step, ensure: + +1. **Python venv** is activated with dependencies from [requirements.txt](references/requirements.txt) installed +2. **`MINIMAX_API_KEY`** is exported (e.g. `export MINIMAX_API_KEY='your-key'`) +3. 
**`ffmpeg`** is available on PATH (for Step 3 GIF conversion) + +If any prerequisite is missing, set it up first. Do NOT proceed to generation without all three. + +## Workflow + +### Step 0: Collect Captions + +Ask user (in their language): +> "Would you like to customize the captions for your stickers, or use the defaults?" + +- **Custom**: Collect 4 short captions (1–3 words). Actions auto-match caption meaning. +- **Default**: Look up [captions table](references/captions.md) by **detected user language**. **Never mix languages.** + +### Step 1: Generate 4 Static Sticker Images + +**Tool**: `scripts/minimax_image.py` + +1. Analyze the user's photo — identify subject type (person / animal / object / logo). +2. For each of the 4 stickers, build a prompt from [image-prompt-template.txt](assets/image-prompt-template.txt) by filling `{action}` and `{caption}`. +3. **If subject is a person**: pass `--subject-ref <photo_path>` so the generated figurine preserves the person's actual facial likeness. +4. Generate (all 4 are independent — **run concurrently**): + +```bash +python3 scripts/minimax_image.py "<prompt>" -o output/sticker_hi.png --ratio 1:1 --subject-ref <photo_path> +python3 scripts/minimax_image.py "<prompt>" -o output/sticker_laugh.png --ratio 1:1 --subject-ref <photo_path> +python3 scripts/minimax_image.py "<prompt>" -o output/sticker_cry.png --ratio 1:1 --subject-ref <photo_path> +python3 scripts/minimax_image.py "<prompt>" -o output/sticker_love.png --ratio 1:1 --subject-ref <photo_path> +``` + +> `--subject-ref` only works for person subjects (API limitation: type=character). +> For animals/objects/logos, omit the flag and rely on text description. 
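A minimal sketch of the concurrent run in Step 1, assuming the four prompts are already built and the commands are launched from the skill directory. `build_image_commands` and `run_concurrently` are hypothetical helper names, not part of the skill's scripts; only the `minimax_image.py` flags come from the commands above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Optional

# Filename IDs used by the skill's output files (sticker_<id>.png).
STICKER_IDS = ["hi", "laugh", "cry", "love"]


def build_image_commands(
    prompts: Dict[str, str],
    subject_ref: Optional[str] = None,
    out_dir: str = "output",
) -> List[List[str]]:
    """Build one minimax_image.py argv list per sticker ID (hypothetical helper)."""
    commands = []
    for sticker_id in STICKER_IDS:
        cmd = [
            "python3", "scripts/minimax_image.py", prompts[sticker_id],
            "-o", f"{out_dir}/sticker_{sticker_id}.png",
            "--ratio", "1:1",
        ]
        if subject_ref:
            # Only pass the flag for person subjects; omit it otherwise.
            cmd += ["--subject-ref", subject_ref]
        commands.append(cmd)
    return commands


def run_concurrently(
    commands: List[List[str]],
    runner: Callable = subprocess.run,
) -> List[int]:
    """Run every command in its own thread; return exit codes in input order."""
    with ThreadPoolExecutor(max_workers=max(1, len(commands))) as pool:
        results = pool.map(lambda cmd: runner(cmd), commands)
        return [result.returncode for result in results]
```

Threads are enough here because each worker only waits on a child process. A non-zero entry in the returned list marks the sticker whose generation should be retried before moving on to Step 2.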
+ +### Step 2: Animate Each Image → Video + +**Tool**: `scripts/minimax_video.py` with `--image` flag (image-to-video mode) + +For each sticker image, build a prompt from [video-prompt-template.txt](assets/video-prompt-template.txt), then: + +```bash +python3 scripts/minimax_video.py "<prompt>" --image output/sticker_hi.png -o output/sticker_hi.mp4 +python3 scripts/minimax_video.py "<prompt>" --image output/sticker_laugh.png -o output/sticker_laugh.mp4 +python3 scripts/minimax_video.py "<prompt>" --image output/sticker_cry.png -o output/sticker_cry.mp4 +python3 scripts/minimax_video.py "<prompt>" --image output/sticker_love.png -o output/sticker_love.mp4 +``` + +All 4 calls are independent — **run concurrently**. + +### Step 3: Convert Videos → GIF + +**Tool**: `scripts/convert_mp4_to_gif.py` + +```bash +python3 scripts/convert_mp4_to_gif.py output/sticker_hi.mp4 output/sticker_laugh.mp4 output/sticker_cry.mp4 output/sticker_love.mp4 +``` + +Outputs GIF files alongside each MP4 (e.g. `sticker_hi.gif`). + +### Step 4: Deliver + +Output format (strict order): +1. Brief status line (e.g. "4 stickers created:") +2. `<deliver_assets>` block with all GIF files +3. **NO text after deliver_assets** + +```xml + +output/sticker_hi.gif +output/sticker_laugh.gif +output/sticker_cry.gif +output/sticker_love.gif + +``` + +## Default Actions + +| # | Action | Filename ID | Animation | +|---|--------|-------------|-----------| +| 1 | Happy waving | hi | Wave hand, slight head tilt | +| 2 | Laughing hard | laugh | Shake with laughter, eyes squint | +| 3 | Crying tears | cry | Tears stream, body trembles | +| 4 | Heart gesture | love | Heart hands, eyes sparkle | + +See [references/captions.md](references/captions.md) for multilingual caption defaults. 
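One way to picture the default lookup: the sketch below pairs each filename ID with its default action and the caption for a single language, then fills the `{action}` and `{caption}` slots of a prompt template. The action and caption values are copied from the Default Actions table and `references/captions.md`. The names `sticker_specs` and `fill_template`, and the choice to raise on an unsupported language, are assumptions for illustration.

```python
from typing import Dict, List

# Filename ID -> default action, from the Default Actions table.
DEFAULT_ACTIONS: Dict[str, str] = {
    "hi": "Happy waving",
    "laugh": "Laughing hard",
    "cry": "Crying tears",
    "love": "Heart gesture",
}

# Language -> captions in (hi, laugh, cry, love) order, from references/captions.md.
DEFAULT_CAPTIONS: Dict[str, List[str]] = {
    "English": ["Hi~", "LOL", "Boo-hoo", "Love ya"],
    "Spanish": ["¡Hola!", "Jajaja", "Buaaa", "Te quiero"],
    "French": ["Salut~", "MDR", "Snif", "Je t'aime"],
    "German": ["Hallo~", "Haha", "Heul", "Liebe"],
    "Chinese": ["嗨~", "哈哈哈", "呜呜呜", "爱你哦"],
    "Japanese": ["やあ~", "笑", "えーん", "大好き"],
    "Korean": ["안녕~", "ㅋㅋㅋ", "흑흑", "사랑해"],
}


def sticker_specs(language: str) -> List[Dict[str, str]]:
    """Return id/action/caption for all four stickers, all in one language."""
    if language not in DEFAULT_CAPTIONS:
        # Assumption: fail loudly instead of silently mixing in another language.
        raise ValueError(f"No default captions for language: {language}")
    captions = DEFAULT_CAPTIONS[language]
    return [
        {"id": sticker_id, "action": action, "caption": caption}
        for (sticker_id, action), caption in zip(DEFAULT_ACTIONS.items(), captions)
    ]


def fill_template(template: str, action: str, caption: str) -> str:
    """Substitute the two placeholders; any other braces are left untouched."""
    return template.replace("{action}", action).replace("{caption}", caption)
```

Taking every caption from a single row of `DEFAULT_CAPTIONS` is what keeps the "never mix languages" rule true by construction.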
+ +## Rules + +- Detect user's language, all outputs follow it +- Captions MUST come from [captions.md](references/captions.md) matching user's language column — never mix languages +- All image prompts must be in **English** regardless of user language (only caption text is localized) +- `<deliver_assets>` must be LAST in response, no text after diff --git a/skills/gif-sticker-maker/assets/image-prompt-template.txt b/skills/gif-sticker-maker/assets/image-prompt-template.txt new file mode 100644 index 0000000..62ff644 --- /dev/null +++ b/skills/gif-sticker-maker/assets/image-prompt-template.txt @@ -0,0 +1,23 @@ +Transform the subject into a Funko Pop / Pop Mart blind box style 3D figurine. + +Style: +- Cute cartoon proportions (large head, small body) +- 3D rendered (C4D/Octane quality), premium plastic/vinyl finish +- Clean white background, soft studio lighting + +Subject handling: +- Person: preserve facial features, hairstyle, clothing +- Animal/Pet: preserve species, fur color, markings +- Object: stylize into cute mascot figurine +- Logo/Icon: transform to 3D toy, preserve original colors and shape + +Action: {action} +Caption: "{caption}" + +Caption rendering (CRITICAL — follow exactly): +- Black bold text with thick white outline stroke +- Large, clear sans-serif font (e.g. 
Impact, Helvetica Bold) +- MUST be placed at the absolute bottom center of the image as a standalone text banner +- MUST NOT appear on the character's body, clothing, or any accessory +- Leave visible gap between the character's feet and the caption text +- Text must have sharp anti-aliased edges — it must survive video animation without warping diff --git a/skills/gif-sticker-maker/assets/video-prompt-template.txt b/skills/gif-sticker-maker/assets/video-prompt-template.txt new file mode 100644 index 0000000..2c5cfc5 --- /dev/null +++ b/skills/gif-sticker-maker/assets/video-prompt-template.txt @@ -0,0 +1,14 @@ +Animate this cute 3D cartoon figurine performing: {action} + +Requirements: +- Smooth loopable motion, keep action within 6 seconds +- Character stays centered, white background remains static +- Text at bottom must stay sharp and stable — no warping, no blur + +Action reference: +- hi: wave hand cheerfully, slight head tilt +- laugh: shake with laughter, eyes squint shut +- cry: tears stream down, body trembles gently +- love: make heart gesture with both hands, eyes sparkle + +CRITICAL: The caption text must remain perfectly readable throughout the entire animation. Zero text distortion. diff --git a/skills/gif-sticker-maker/references/captions.md b/skills/gif-sticker-maker/references/captions.md new file mode 100644 index 0000000..0396ad4 --- /dev/null +++ b/skills/gif-sticker-maker/references/captions.md @@ -0,0 +1,25 @@ +# Default Captions by Language + +Select captions based on user's conversation language. + +| Action | English | Spanish | French | German | Chinese | Japanese | Korean | +|--------|---------|---------|--------|--------|---------|----------|--------| +| Waving | Hi~ | ¡Hola! 
| Salut~ | Hallo~ | 嗨~ | やあ~ | 안녕~ | +| Laughing | LOL | Jajaja | MDR | Haha | 哈哈哈 | 笑 | ㅋㅋㅋ | +| Crying | Boo-hoo | Buaaa | Snif | Heul | 呜呜呜 | えーん | 흑흑 | +| Heart | Love ya | Te quiero | Je t'aime | Liebe | 爱你哦 | 大好き | 사랑해 | + +## Filename Convention + +| Action | Filename ID | +|--------|-------------| +| Happy waving | hi | +| Laughing hard | laugh | +| Crying tears | cry | +| Heart gesture | love | + +## Custom Caption Guidelines + +- Keep captions short: 1-3 words work best +- Actions auto-match caption meaning (e.g., "Sleepy" → yawning action) +- Users can provide captions in any language diff --git a/skills/gif-sticker-maker/references/requirements.txt b/skills/gif-sticker-maker/references/requirements.txt new file mode 100644 index 0000000..26b28de --- /dev/null +++ b/skills/gif-sticker-maker/references/requirements.txt @@ -0,0 +1,5 @@ +# Python dependencies +requests>=2.28 + +# System dependency (install separately): +# ffmpeg — brew install ffmpeg (macOS) / apt install ffmpeg (Ubuntu) diff --git a/skills/gif-sticker-maker/scripts/convert_mp4_to_gif.py b/skills/gif-sticker-maker/scripts/convert_mp4_to_gif.py new file mode 100644 index 0000000..11fdbb5 --- /dev/null +++ b/skills/gif-sticker-maker/scripts/convert_mp4_to_gif.py @@ -0,0 +1,89 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: MIT +""" +Batch MP4 → GIF converter using ffmpeg. + +Usage: + python convert_mp4_to_gif.py sticker_hi.mp4 sticker_laugh.mp4 sticker_cry.mp4 sticker_love.mp4 + python convert_mp4_to_gif.py *.mp4 --fps 12 --width 320 + python convert_mp4_to_gif.py input.mp4 -o custom_output.gif + +Requires: ffmpeg (must be on PATH) +""" + +import os +import sys +import argparse +import subprocess +import shutil + + +def check_ffmpeg(): + if not shutil.which("ffmpeg"): + raise SystemExit("ERROR: ffmpeg not found. 
Install via: brew install ffmpeg / apt install ffmpeg") + + +def mp4_to_gif(input_path: str, output_path: str, fps: int = 15, width: int = 360): + """Convert a single MP4 to GIF via ffmpeg two-pass (palette for quality).""" + if not os.path.isfile(input_path): + print(f"SKIP: {input_path} not found", file=sys.stderr) + return False + + palette = output_path + ".palette.png" + scale_filter = f"fps={fps},scale={width}:-1:flags=lanczos" + + try: + subprocess.run( + ["ffmpeg", "-y", "-i", input_path, + "-vf", f"{scale_filter},palettegen=stats_mode=diff", + palette], + check=True, capture_output=True, + ) + subprocess.run( + ["ffmpeg", "-y", "-i", input_path, "-i", palette, + "-lavfi", f"{scale_filter} [x]; [x][1:v] paletteuse=dither=bayer:bayer_scale=5:diff_mode=rectangle", + output_path], + check=True, capture_output=True, + ) + except subprocess.CalledProcessError as e: + print(f"FAIL: {input_path} -> {e.stderr.decode()[-200:]}", file=sys.stderr) + return False + finally: + if os.path.exists(palette): + os.remove(palette) + + size = os.path.getsize(output_path) + print(f"OK: {size:,} bytes -> {output_path}") + return True + + +def main(): + p = argparse.ArgumentParser(description="Batch MP4 → GIF converter (ffmpeg two-pass palette)") + p.add_argument("inputs", nargs="+", help="MP4 file(s) to convert") + p.add_argument("-o", "--output", default=None, help="Output path (only for single file input)") + p.add_argument("--fps", type=int, default=15, help="GIF frame rate (default: 15)") + p.add_argument("--width", type=int, default=360, help="GIF width in pixels, height auto-scaled (default: 360)") + args = p.parse_args() + + if args.output and len(args.inputs) > 1: + raise SystemExit("ERROR: -o/--output only works with a single input file") + + check_ffmpeg() + + ok, fail = 0, 0 + for mp4 in args.inputs: + if args.output: + gif_path = args.output + else: + gif_path = os.path.splitext(mp4)[0] + ".gif" + + if mp4_to_gif(mp4, gif_path, fps=args.fps, width=args.width): + ok 
+= 1 + else: + fail += 1 + + print(f"\nDone: {ok} converted, {fail} failed") + + +if __name__ == "__main__": + main() diff --git a/skills/gif-sticker-maker/scripts/minimax_image.py b/skills/gif-sticker-maker/scripts/minimax_image.py new file mode 100755 index 0000000..7210c09 --- /dev/null +++ b/skills/gif-sticker-maker/scripts/minimax_image.py @@ -0,0 +1,158 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: MIT +""" +MiniMax Text-to-Image — synchronous generation with optional character reference. + +Usage: + python3 minimax_image.py "A cat in space" -o cat.png + python3 minimax_image.py "Mountain landscape" -o bg.png --ratio 16:9 + python3 minimax_image.py "Funko Pop figurine waving" -o sticker.png --subject-ref photo.jpg + +Env: MINIMAX_API_KEY (required) +""" + +import os +import sys +import json +import base64 +import argparse +import requests + +API_KEY = os.getenv("MINIMAX_API_KEY") +API_BASE = "https://api.minimax.io/v1" + +ASPECT_RATIOS = ["1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"] + + +def _headers(): + if not API_KEY: + raise SystemExit("ERROR: MINIMAX_API_KEY is not set.\n export MINIMAX_API_KEY='your-key'") + return { + "Authorization": f"Bearer {API_KEY}", + "Content-Type": "application/json", + } + + +def _encode_image(image_path: str) -> str: + """Read local image file and return base64 data URI.""" + ext = os.path.splitext(image_path)[1].lower().lstrip(".") + mime_map = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png", "webp": "webp"} + mime = mime_map.get(ext, "jpeg") + with open(image_path, "rb") as f: + raw = f.read() + return f"data:image/{mime};base64,{base64.b64encode(raw).decode()}" + + +def generate_image( + prompt: str, + model: str = "image-01", + aspect_ratio: str = "1:1", + n: int = 1, + response_format: str = "url", + prompt_optimizer: bool = False, + seed: int = None, + subject_reference: list = None, +) -> dict: + """Generate image(s). 
Returns API response dict.""" + payload = { + "model": model, + "prompt": prompt, + "aspect_ratio": aspect_ratio, + "n": n, + "response_format": response_format, + "prompt_optimizer": prompt_optimizer, + } + if seed is not None: + payload["seed"] = seed + if subject_reference: + payload["subject_reference"] = subject_reference + + resp = requests.post( + f"{API_BASE}/image_generation", + headers=_headers(), + json=payload, + timeout=120, + ) + resp.raise_for_status() + data = resp.json() + + base_resp = data.get("base_resp", {}) + if base_resp.get("status_code", 0) != 0: + raise SystemExit(f"API Error [{base_resp.get('status_code')}]: {base_resp.get('status_msg')}") + + return data + + +def download_and_save(url: str, output_path: str): + """Download image from URL and save.""" + resp = requests.get(url, timeout=60) + resp.raise_for_status() + with open(output_path, "wb") as f: + f.write(resp.content) + return len(resp.content) + + +def main(): + p = argparse.ArgumentParser(description="MiniMax Text-to-Image") + p.add_argument("prompt", help="Image description (max 1500 chars)") + p.add_argument("-o", "--output", required=True, help="Output file path (.png/.jpg)") + p.add_argument("--model", default="image-01", help="Model (default: image-01)") + p.add_argument("--ratio", default="1:1", choices=ASPECT_RATIOS, help="Aspect ratio (default: 1:1)") + p.add_argument("-n", "--count", type=int, default=1, choices=range(1, 10), help="Number of images (1-9, default: 1)") + p.add_argument("--seed", type=int, default=None, help="Random seed for reproducibility") + p.add_argument("--optimize", action="store_true", help="Enable prompt auto-optimization") + p.add_argument("--base64", action="store_true", help="Use base64 response instead of URL") + p.add_argument("--subject-ref", default=None, + help="Reference image for character likeness (local path or URL, person only)") + p.add_argument("--subject-type", default="character", + help="Subject reference type (default: 
character)") + args = p.parse_args() + + os.makedirs(os.path.dirname(args.output) or ".", exist_ok=True) + + subject_ref = None + if args.subject_ref: + ref_value = args.subject_ref + if not ref_value.startswith(("http://", "https://", "data:")): + ref_value = _encode_image(ref_value) + subject_ref = [{"type": args.subject_type, "image_file": ref_value}] + + fmt = "base64" if args.base64 else "url" + result = generate_image( + prompt=args.prompt, + model=args.model, + aspect_ratio=args.ratio, + n=args.count, + response_format=fmt, + prompt_optimizer=args.optimize, + seed=args.seed, + subject_reference=subject_ref, + ) + + meta = result.get("metadata", {}) + print(f"Generated: {meta.get('success_count', '?')} success, {meta.get('failed_count', '?')} failed") + + if args.base64: + images = result.get("data", {}).get("image_base64", []) + for i, b64 in enumerate(images): + path = args.output if len(images) == 1 else _numbered_path(args.output, i) + raw = base64.b64decode(b64) + with open(path, "wb") as f: + f.write(raw) + print(f"OK: {len(raw)} bytes -> {path}") + else: + urls = result.get("data", {}).get("image_urls", []) + for i, url in enumerate(urls): + path = args.output if len(urls) == 1 else _numbered_path(args.output, i) + size = download_and_save(url, path) + print(f"OK: {size} bytes -> {path}") + + +def _numbered_path(path: str, index: int) -> str: + """Insert index before extension: out.png -> out-0.png""" + base, ext = os.path.splitext(path) + return f"{base}-{index}{ext}" + + +if __name__ == "__main__": + main() diff --git a/skills/gif-sticker-maker/scripts/minimax_video.py b/skills/gif-sticker-maker/scripts/minimax_video.py new file mode 100755 index 0000000..4348b80 --- /dev/null +++ b/skills/gif-sticker-maker/scripts/minimax_video.py @@ -0,0 +1,226 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: MIT +""" +MiniMax Video Generation — supports both Text-to-Video and Image-to-Video. 
+ +Usage (T2V): + python minimax_video.py "A cat playing piano" -o cat.mp4 + python minimax_video.py "Ocean waves [Truck left]" -o waves.mp4 --duration 10 + +Usage (I2V): + python minimax_video.py "Character waves cheerfully" --image sticker.png -o sticker.mp4 + python minimax_video.py "Figurine laughing" --image laugh.png -o laugh.mp4 --duration 6 + +Env: MINIMAX_API_KEY (required) +""" + +import os +import sys +import json +import time +import base64 +import argparse +import requests + +API_KEY = os.getenv("MINIMAX_API_KEY") +API_BASE = "https://api.minimax.io/v1" + +I2V_MODELS = [ + "MiniMax-Hailuo-2.3", + "MiniMax-Hailuo-2.3-Fast", + "MiniMax-Hailuo-02", + "I2V-01-Director", + "I2V-01-live", + "I2V-01", +] + +T2V_MODELS = [ + "MiniMax-Hailuo-2.3", + "MiniMax-Hailuo-02", + "T2V-01-Director", + "T2V-01", +] + + +def _headers(): + if not API_KEY: + raise SystemExit("ERROR: MINIMAX_API_KEY is not set.\n export MINIMAX_API_KEY='your-key'") + return { + "Authorization": f"Bearer {API_KEY}", + "Content-Type": "application/json", + } + + +def _check_resp(data): + base_resp = data.get("base_resp", {}) + code = base_resp.get("status_code", 0) + if code != 0: + msg = base_resp.get("status_msg", "Unknown error") + raise SystemExit(f"API Error [{code}]: {msg}") + + +def _encode_image(image_path: str) -> str: + """Read local image file and return base64 data URI.""" + ext = os.path.splitext(image_path)[1].lower().lstrip(".") + mime_map = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png", "webp": "webp"} + mime = mime_map.get(ext, "png") + + with open(image_path, "rb") as f: + raw = f.read() + + return f"data:image/{mime};base64,{base64.b64encode(raw).decode()}" + + +def create_task( + prompt: str, + model: str = "MiniMax-Hailuo-2.3", + duration: int = 6, + resolution: str = "768P", + prompt_optimizer: bool = True, + first_frame_image: str = None, +) -> str: + """Submit a video generation task (T2V or I2V). 
Returns task_id.""" + payload = { + "model": model, + "prompt": prompt, + "duration": duration, + "resolution": resolution, + "prompt_optimizer": prompt_optimizer, + } + + if first_frame_image: + payload["first_frame_image"] = first_frame_image + + resp = requests.post( + f"{API_BASE}/video_generation", + headers=_headers(), + json=payload, + timeout=30, + ) + resp.raise_for_status() + data = resp.json() + _check_resp(data) + + task_id = data.get("task_id") + if not task_id: + raise SystemExit(f"No task_id in response: {json.dumps(data, indent=2)}") + return task_id + + +def poll_task(task_id: str, interval: int = 10, max_wait: int = 600) -> str: + """Poll task status until Success. Returns file_id.""" + elapsed = 0 + while elapsed < max_wait: + resp = requests.get( + f"{API_BASE}/query/video_generation", + headers=_headers(), + params={"task_id": task_id}, + timeout=30, + ) + resp.raise_for_status() + data = resp.json() + _check_resp(data) + + status = data.get("status", "") + file_id = data.get("file_id", "") + + if status == "Success": + if not file_id: + raise SystemExit("Task succeeded but no file_id returned") + print(f" Done! file_id={file_id}") + return file_id + elif status == "Fail": + raise SystemExit(f"Video generation failed: {json.dumps(data, indent=2)}") + else: + print(f" [{elapsed}s] Status: {status}...") + time.sleep(interval) + elapsed += interval + + raise SystemExit(f"Timeout after {max_wait}s. 
task_id={task_id}, check manually.") + + +def download_video(file_id: str, output_path: str): + """Retrieve download URL via file_id and save the video.""" + resp = requests.get( + f"{API_BASE}/files/retrieve", + headers=_headers(), + params={"file_id": file_id}, + timeout=30, + ) + resp.raise_for_status() + data = resp.json() + _check_resp(data) + + download_url = data.get("file", {}).get("download_url", "") + if not download_url: + raise SystemExit(f"No download_url in response: {json.dumps(data, indent=2)}") + + print(f" Downloading from {download_url[:80]}...") + video_resp = requests.get(download_url, timeout=300) + video_resp.raise_for_status() + + os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True) + with open(output_path, "wb") as f: + f.write(video_resp.content) + + print(f"OK: {len(video_resp.content)} bytes -> {output_path}") + + +def generate( + prompt: str, + output_path: str, + model: str = "MiniMax-Hailuo-2.3", + duration: int = 6, + resolution: str = "768P", + prompt_optimizer: bool = True, + poll_interval: int = 10, + max_wait: int = 600, + image_path: str = None, +): + """Full pipeline: create task -> poll -> download.""" + mode = "I2V" if image_path else "T2V" + print(f"Creating {mode} task...") + print(f" Model: {model} | Duration: {duration}s | Resolution: {resolution}") + if image_path: + print(f" Image: {image_path}") + print(f" Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}") + + first_frame = _encode_image(image_path) if image_path else None + task_id = create_task(prompt, model, duration, resolution, prompt_optimizer, first_frame) + print(f" task_id={task_id}") + print(f"Waiting for generation...") + + file_id = poll_task(task_id, poll_interval, max_wait) + download_video(file_id, output_path) + + +def main(): + all_models = sorted(set(T2V_MODELS + I2V_MODELS)) + p = argparse.ArgumentParser(description="MiniMax Video Generation (T2V + I2V)") + p.add_argument("prompt", help="Video description (max 2000 chars). 
Use [Camera Command] for camera control.") + p.add_argument("-o", "--output", required=True, help="Output file path (.mp4)") + p.add_argument("--image", default=None, help="First frame image path for I2V mode (jpg/png/webp, <20MB)") + p.add_argument("--model", default="MiniMax-Hailuo-2.3", choices=all_models, + help="Model (default: MiniMax-Hailuo-2.3)") + p.add_argument("--duration", type=int, default=6, choices=[6, 10], help="Duration in seconds (default: 6)") + p.add_argument("--resolution", default="768P", choices=["720P", "768P", "1080P"], help="Resolution (default: 768P)") + p.add_argument("--no-optimize", action="store_true", help="Disable prompt auto-optimization") + p.add_argument("--poll-interval", type=int, default=10, help="Poll interval in seconds (default: 10)") + p.add_argument("--max-wait", type=int, default=600, help="Max wait time in seconds (default: 600)") + args = p.parse_args() + + generate( + prompt=args.prompt, + output_path=args.output, + model=args.model, + duration=args.duration, + resolution=args.resolution, + prompt_optimizer=not args.no_optimize, + poll_interval=args.poll_interval, + max_wait=args.max_wait, + image_path=args.image, + ) + + +if __name__ == "__main__": + main()
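
The `poll_task` loop in `minimax_video.py` adds only the sleep interval to `elapsed`, so time spent inside each status request does not count toward `max_wait`. A hedged sketch of a deadline-based variant follows. `poll_until` and the injected `fetch_status`, `sleep`, and `clock` callables are hypothetical names introduced here so the loop can be exercised without the API; the `status` and `file_id` keys mirror the fields the script already reads.

```python
import time
from typing import Callable, Dict


def poll_until(
    fetch_status: Callable[[], Dict],
    interval: float = 10,
    max_wait: float = 600,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Poll until the task reports Success and return its file_id.

    The timeout is measured against a monotonic clock, so slow status
    requests count toward max_wait as well as the sleeps between them.
    """
    deadline = clock() + max_wait
    while True:
        data = fetch_status()
        status = data.get("status", "")
        if status == "Success":
            file_id = data.get("file_id", "")
            if not file_id:
                raise RuntimeError("Task succeeded but no file_id returned")
            return file_id
        if status == "Fail":
            raise RuntimeError(f"Video generation failed: {data}")
        if clock() >= deadline:
            raise TimeoutError(f"Timeout after {max_wait}s")
        sleep(interval)
```

In the script, `fetch_status` would wrap the existing GET to `/query/video_generation`. Raising exceptions instead of `SystemExit` is a design choice of this sketch: it lets a caller that runs four tasks concurrently handle one failure without ending the whole process.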