Add gif-sticker-maker skill

2026-03-17 19:54:03 +08:00
parent 76bd138fa4
commit 2e6511d8f6
13 changed files with 673 additions and 2 deletions
--- a/.codex/INSTALL.md
+++ b/.codex/INSTALL.md
@@ -34,6 +34,7 @@ Enable MiniMax skills in Codex via native skill discovery. Just clone and symlin
 - **android-native-dev** — Android native application development with Material Design 3
 - **ios-application-dev** — iOS application development with UIKit, SnapKit, and SwiftUI
 - **shader-dev** — GLSL shader techniques for stunning visual effects (ShaderToy-compatible)
 - **gif-sticker-maker** — Convert photos into animated GIF stickers (Funko Pop / Pop Mart style)
 ## Verify
--- a/.cursor-plugin/plugin.json
+++ b/.cursor-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
  "name": "minimax-skills",
  "displayName": "MiniMax Skills",
-  "description": "MiniMax AI skills library: frontend development, fullstack development, Android native development, iOS application development, and shader development",
+  "description": "MiniMax AI skills library: frontend development, fullstack development, Android native development, iOS application development, shader development, and GIF sticker maker",
  "version": "1.0.0",
  "author": {
    "name": "MiniMax AI"
@@ -9,7 +9,7 @@
  "homepage": "https://github.com/MiniMax-AI/skills",
  "repository": "https://github.com/MiniMax-AI/skills",
  "license": "MIT",
-  "keywords": ["skills", "frontend", "fullstack", "android", "ios", "shader", "minimax"],
+  "keywords": ["skills", "frontend", "fullstack", "android", "ios", "shader", "gif", "sticker", "minimax"],
  "logo": "assets/logo.png",
  "skills": "./skills/"
 }
--- a/.opencode/INSTALL.md
+++ b/.opencode/INSTALL.md
@@ -39,6 +39,7 @@ Verify by asking: "List available skills"
 - **android-native-dev** — Android native application development with Material Design 3
 - **ios-application-dev** — iOS application development with UIKit, SnapKit, and SwiftUI
 - **shader-dev** — GLSL shader techniques for stunning visual effects (ShaderToy-compatible)
 - **gif-sticker-maker** — Convert photos into animated GIF stickers (Funko Pop / Pop Mart style)
 ## Updating
--- a/README.md
+++ b/README.md
@@ -15,6 +15,7 @@ Development skills for AI coding agents. Plug into your favorite AI coding tool
 | `android-native-dev` | Android native application development with Material Design 3. Kotlin / Jetpack Compose, adaptive layouts, Gradle configuration, accessibility (WCAG), build troubleshooting, performance optimization, and motion system. |
 | `ios-application-dev` | iOS application development guide covering UIKit, SnapKit, and SwiftUI. Touch targets, safe areas, navigation patterns, Dynamic Type, Dark Mode, accessibility, collection views, and Apple HIG compliance. |
 | `shader-dev` | Comprehensive GLSL shader techniques for creating stunning visual effects — ray marching, SDF modeling, fluid simulation, particle systems, procedural generation, lighting, post-processing, and more. ShaderToy-compatible. |
 | `gif-sticker-maker` | Convert photos (people, pets, objects, logos) into 4 animated GIF stickers with captions. Funko Pop / Pop Mart style, powered by MiniMax Image & Video Generation API. |
 ## Installation
--- a/README_zh.md
+++ b/README_zh.md
@@ -15,6 +15,7 @@
 | `android-native-dev` | 基于 Material Design 3 的 Android 原生应用开发。Kotlin / Jetpack Compose、自适应布局、Gradle 配置、无障碍（WCAG）、构建问题排查、性能优化与动效系统。 |
 | `ios-application-dev` | iOS 应用开发指南，涵盖 UIKit、SnapKit 和 SwiftUI。触控目标、安全区域、导航模式、Dynamic Type、深色模式、无障碍、集合视图，符合 Apple HIG 规范。 |
 | `shader-dev` | 全面的 GLSL 着色器技术，用于创建惊艳的视觉效果 — 光线行进、SDF 建模、流体模拟、粒子系统、程序化生成、光照、后处理等。兼容 ShaderToy。 |
 | `gif-sticker-maker` | 将照片（人物、宠物、物品、Logo）转换为 4 张带字幕的动画 GIF 贴纸。Funko Pop / Pop Mart 盲盒风格，基于 MiniMax 图片与视频生成 API。 |
 ## 安装
--- a/skills/gif-sticker-maker/SKILL.md
+++ b/skills/gif-sticker-maker/SKILL.md
@@ -0,0 +1,127 @@
 ---
 name: gif-sticker-maker
 description: |
  Convert photos (people, pets, objects, logos) into 4 animated GIF stickers with captions.
  Use when: user wants to create cartoon stickers, GIF expressions, emoji packs, animated avatars,
  or convert photos to Funko Pop / Pop Mart blind box style animations.
  Triggers: sticker, GIF, cartoon, emoji, expression pack, avatar animation.
 license: MIT
 metadata:
  version: "1.2"
  category: creative-tools
  style: Funko Pop / Pop Mart
  output_format: GIF
  output_count: 4
  sources:
    - MiniMax Image Generation API
    - MiniMax Video Generation API
 ---
 # GIF Sticker Maker
 Convert user photos into 4 animated GIF stickers (Funko Pop / Pop Mart style).
 ## Style Spec
 - Funko Pop / Pop Mart blind box 3D figurine
 - C4D / Octane rendering quality
 - White background, soft studio lighting
 - Caption: black text + white outline, bottom of image
 ## Prerequisites
 Before starting any generation step, ensure:
 1. **Python venv** is activated with dependencies from [requirements.txt](references/requirements.txt) installed
 2. **`MINIMAX_API_KEY`** is exported (e.g. `export MINIMAX_API_KEY='your-key'`)
 3. **`ffmpeg`** is available on PATH (for Step 3 GIF conversion)
 If any prerequisite is missing, set it up first. Do NOT proceed to generation without all three.
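As a rough illustration, the three checks above could be scripted before any generation step. This is a minimal sketch, not part of the skill's scripts: the function name `missing_prerequisites` is hypothetical, and the venv test (`sys.prefix` differing from `sys.base_prefix`) is a common heuristic rather than something this skill prescribes.

```python
import importlib.util
import os
import shutil
import sys


def missing_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready."""
    env = os.environ if env is None else env
    problems = []
    # Heuristic (assumption): inside a venv, sys.prefix differs from sys.base_prefix.
    if sys.prefix == sys.base_prefix:
        problems.append("no Python venv appears to be active")
    # requests is the only Python dependency listed in references/requirements.txt.
    if importlib.util.find_spec("requests") is None:
        problems.append("requests is not installed")
    if not env.get("MINIMAX_API_KEY"):
        problems.append("MINIMAX_API_KEY is not exported")
    if which("ffmpeg") is None:
        problems.append("ffmpeg is not on PATH")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(f"MISSING: {problem}")
```

The `env` and `which` parameters exist only so the check can be exercised without touching the real environment.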
 ## Workflow
 ### Step 0: Collect Captions
 Ask user (in their language):
 > "Would you like to customize the captions for your stickers, or use the defaults?"
 - **Custom**: Collect 4 short captions (1–3 words each). Each sticker's action is chosen to match its caption's meaning.
 - **Default**: Look up the [captions table](references/captions.md) by the **detected user language**. **Never mix languages.**
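The default lookup can be pictured as a small table keyed by language. The sketch below copies the caption values from `references/captions.md`; the language-code keys (`en`, `es`, ...) and the function name `default_captions` are assumptions made for illustration, and raising on an unknown language is one possible choice, not documented behavior.

```python
# Filename IDs in the order used throughout this skill.
ACTION_IDS = ("hi", "laugh", "cry", "love")

# Values copied from references/captions.md; the language-code keys are an assumption.
DEFAULT_CAPTIONS = {
    "en": ("Hi~", "LOL", "Boo-hoo", "Love ya"),
    "es": ("¡Hola!", "Jajaja", "Buaaa", "Te quiero"),
    "fr": ("Salut~", "MDR", "Snif", "Je t'aime"),
    "de": ("Hallo~", "Haha", "Heul", "Liebe"),
    "zh": ("嗨~", "哈哈哈", "呜呜呜", "爱你哦"),
    "ja": ("やあ~", "笑", "えーん", "大好き"),
    "ko": ("안녕~", "ㅋㅋㅋ", "흑흑", "사랑해"),
}


def default_captions(lang):
    """Return {filename_id: caption} for a single language, never a mix."""
    if lang not in DEFAULT_CAPTIONS:
        supported = ", ".join(sorted(DEFAULT_CAPTIONS))
        raise ValueError(f"no default captions for {lang!r}; supported: {supported}")
    return dict(zip(ACTION_IDS, DEFAULT_CAPTIONS[lang]))
```

Because each language is stored as one tuple, all four captions always come from the same column.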
 ### Step 1: Generate 4 Static Sticker Images
 **Tool**: `scripts/minimax_image.py`
 1. Analyze the user's photo — identify subject type (person / animal / object / logo).
 2. For each of the 4 stickers, build a prompt from [image-prompt-template.txt](assets/image-prompt-template.txt) by filling `{action}` and `{caption}`.
 3. **If subject is a person**: pass `--subject-ref <user_photo_path>` so the generated figurine preserves the person's actual facial likeness.
 4. Generate (all 4 are independent — **run concurrently**):
 ```bash
 python3 scripts/minimax_image.py "<prompt>" -o output/sticker_hi.png --ratio 1:1 --subject-ref <photo>
 python3 scripts/minimax_image.py "<prompt>" -o output/sticker_laugh.png --ratio 1:1 --subject-ref <photo>
 python3 scripts/minimax_image.py "<prompt>" -o output/sticker_cry.png --ratio 1:1 --subject-ref <photo>
 python3 scripts/minimax_image.py "<prompt>" -o output/sticker_love.png --ratio 1:1 --subject-ref <photo>
 ```
 > `--subject-ref` only works for person subjects (API limitation: type=character).
 > For animals/objects/logos, omit the flag and rely on text description.
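A minimal sketch of how the prompt filling and the per-subject flag handling might look in Python. The helper names `fill_template` and `image_command` are hypothetical; only the script path and the `-o`, `--ratio`, and `--subject-ref` flags come from this skill.

```python
def fill_template(template, action, caption):
    """Substitute the {action} and {caption} placeholders literally."""
    return template.replace("{action}", action).replace("{caption}", caption)


def image_command(prompt, sticker_id, photo=None, is_person=False):
    """Build the argv for one scripts/minimax_image.py call."""
    argv = [
        "python3", "scripts/minimax_image.py", prompt,
        "-o", f"output/sticker_{sticker_id}.png",
        "--ratio", "1:1",
    ]
    # --subject-ref is only passed for person subjects; other subjects rely on the prompt text.
    if is_person and photo:
        argv += ["--subject-ref", photo]
    return argv
```

Plain `str.replace` is used instead of `str.format` so that any other braces in a template would be left untouched.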
 ### Step 2: Animate Each Image → Video
 **Tool**: `scripts/minimax_video.py` with `--image` flag (image-to-video mode)
 For each sticker image, build a prompt from [video-prompt-template.txt](assets/video-prompt-template.txt), then:
 ```bash
 python3 scripts/minimax_video.py "<prompt>" --image output/sticker_hi.png -o output/sticker_hi.mp4
 python3 scripts/minimax_video.py "<prompt>" --image output/sticker_laugh.png -o output/sticker_laugh.mp4
 python3 scripts/minimax_video.py "<prompt>" --image output/sticker_cry.png -o output/sticker_cry.mp4
 python3 scripts/minimax_video.py "<prompt>" --image output/sticker_love.png -o output/sticker_love.mp4
 ```
 All 4 calls are independent — **run concurrently**.
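One way to run the four independent calls in parallel is a thread pool that launches each command as a subprocess. This is a sketch under assumptions: `video_commands` and `run_concurrently` are hypothetical helpers, not part of the skill, and the argv layout simply mirrors the commands shown above.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

STICKER_IDS = ("hi", "laugh", "cry", "love")


def video_commands(prompts):
    """Build one minimax_video.py argv per sticker; prompts maps ID -> prompt text."""
    return [
        ["python3", "scripts/minimax_video.py", prompts[sid],
         "--image", f"output/sticker_{sid}.png",
         "-o", f"output/sticker_{sid}.mp4"]
        for sid in STICKER_IDS
    ]


def run_concurrently(commands, max_workers=4):
    """Run each argv list as a subprocess in parallel; return exit codes in input order."""
    def run_one(argv):
        return subprocess.run(argv).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves the order of the input commands.
        return list(pool.map(run_one, commands))
```

Threads are sufficient here because each worker only waits on an external process. A non-zero exit code in the returned list marks the sticker that needs a retry.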
 ### Step 3: Convert Videos → GIF
 **Tool**: `scripts/convert_mp4_to_gif.py`
 ```bash
 python3 scripts/convert_mp4_to_gif.py output/sticker_hi.mp4 output/sticker_laugh.mp4 output/sticker_cry.mp4 output/sticker_love.mp4
 ```
 Outputs GIF files alongside each MP4 (e.g. `sticker_hi.gif`).
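Before delivering, it may help to confirm that every expected GIF was actually written. The sketch below mirrors the converter's default naming (same path, `.gif` extension); the helper names `expected_gifs` and `missing_gifs` are hypothetical.

```python
import os


def expected_gifs(mp4_paths):
    """Same path as each MP4, with the extension swapped to .gif."""
    return [os.path.splitext(path)[0] + ".gif" for path in mp4_paths]


def missing_gifs(mp4_paths):
    """Return the expected GIF paths that do not exist on disk."""
    return [gif for gif in expected_gifs(mp4_paths) if not os.path.isfile(gif)]
```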
 ### Step 4: Deliver
 Output format (strict order):
 1. Brief status line (e.g. "4 stickers created:")
 2. `<deliver_assets>` block with all GIF files
 3. **NO text after deliver_assets**
 ```xml
 <deliver_assets>
 <item><path>output/sticker_hi.gif</path></item>
 <item><path>output/sticker_laugh.gif</path></item>
 <item><path>output/sticker_cry.gif</path></item>
 <item><path>output/sticker_love.gif</path></item>
 </deliver_assets>
 ```
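The block above could be assembled from a list of paths as in this sketch. The function name `deliver_block` is hypothetical, and the code assumes paths contain no characters that would need escaping.

```python
def deliver_block(gif_paths):
    """Render the deliver_assets block, one item per GIF, in the given order."""
    items = "\n".join(f"<item><path>{path}</path></item>" for path in gif_paths)
    return f"<deliver_assets>\n{items}\n</deliver_assets>"
```

The returned string has no trailing newline, so appending it as the final part of the response leaves no text after the block.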
 ## Default Actions
 | # | Action | Filename ID | Animation |
 |---|--------|-------------|-----------|
 | 1 | Happy waving | hi | Wave hand, slight head tilt |
 | 2 | Laughing hard | laugh | Shake with laughter, eyes squint |
 | 3 | Crying tears | cry | Tears stream, body trembles |
 | 4 | Heart gesture | love | Heart hands, eyes sparkle |
 See [references/captions.md](references/captions.md) for multilingual caption defaults.
 ## Rules
 - Detect the user's language; all outputs follow it
 - Captions MUST come from [captions.md](references/captions.md) matching user's language column — never mix languages
 - All image prompts must be in **English** regardless of user language (only caption text is localized)
 - `<deliver_assets>` must be LAST in response, no text after
--- a/skills/gif-sticker-maker/assets/image-prompt-template.txt
+++ b/skills/gif-sticker-maker/assets/image-prompt-template.txt
@@ -0,0 +1,23 @@
 Transform the subject into a Funko Pop / Pop Mart blind box style 3D figurine.
 Style:
 - Cute cartoon proportions (large head, small body)
 - 3D rendered (C4D/Octane quality), premium plastic/vinyl finish
 - Clean white background, soft studio lighting
 Subject handling:
 - Person: preserve facial features, hairstyle, clothing
 - Animal/Pet: preserve species, fur color, markings
 - Object: stylize into cute mascot figurine
 - Logo/Icon: transform to 3D toy, preserve original colors and shape
 Action: {action}
 Caption: "{caption}"
 Caption rendering (CRITICAL — follow exactly):
 - Black bold text with thick white outline stroke
 - Large, clear sans-serif font (e.g. Impact, Helvetica Bold)
 - MUST be placed at the absolute bottom center of the image as a standalone text banner
 - MUST NOT appear on the character's body, clothing, or any accessory
 - Leave visible gap between the character's feet and the caption text
 - Text must have sharp anti-aliased edges — it must survive video animation without warping
--- a/skills/gif-sticker-maker/assets/video-prompt-template.txt
+++ b/skills/gif-sticker-maker/assets/video-prompt-template.txt
@@ -0,0 +1,14 @@
 Animate this cute 3D cartoon figurine performing: {action}
 Requirements:
 - Smooth loopable motion, keep action within 6 seconds
 - Character stays centered, white background remains static
 - Text at bottom must stay sharp and stable — no warping, no blur
 Action reference:
 - hi: wave hand cheerfully, slight head tilt
 - laugh: shake with laughter, eyes squint shut
 - cry: tears stream down, body trembles gently
 - love: make heart gesture with both hands, eyes sparkle
 CRITICAL: The caption text must remain perfectly readable throughout the entire animation. Zero text distortion.
--- a/skills/gif-sticker-maker/references/captions.md
+++ b/skills/gif-sticker-maker/references/captions.md
@@ -0,0 +1,25 @@
 # Default Captions by Language
 Select captions based on user's conversation language.
 | Action | English | Spanish | French | German | Chinese | Japanese | Korean |
 |--------|---------|---------|--------|--------|---------|----------|--------|
 | Waving | Hi~ | ¡Hola! | Salut~ | Hallo~ | 嗨~ | やあ~ | 안녕~ |
 | Laughing | LOL | Jajaja | MDR | Haha | 哈哈哈 | 笑 | ㅋㅋㅋ |
 | Crying | Boo-hoo | Buaaa | Snif | Heul | 呜呜呜 | えーん | 흑흑 |
 | Heart | Love ya | Te quiero | Je t'aime | Liebe | 爱你哦 | 大好き | 사랑해 |
 ## Filename Convention
 | Action | Filename ID |
 |--------|-------------|
 | Happy waving | hi |
 | Laughing hard | laugh |
 | Crying tears | cry |
 | Heart gesture | love |
 ## Custom Caption Guidelines
 - Keep captions short: 1-3 words work best
 - Actions auto-match caption meaning (e.g., "Sleepy" → yawning action)
 - Users can provide captions in any language
--- a/skills/gif-sticker-maker/references/requirements.txt
+++ b/skills/gif-sticker-maker/references/requirements.txt
@@ -0,0 +1,5 @@
 # Python dependencies
 requests>=2.28
 # System dependency (install separately):
 #   ffmpeg — brew install ffmpeg (macOS) / apt install ffmpeg (Ubuntu)
--- a/skills/gif-sticker-maker/scripts/convert_mp4_to_gif.py
+++ b/skills/gif-sticker-maker/scripts/convert_mp4_to_gif.py
@@ -0,0 +1,89 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: MIT
 """
 Batch MP4 → GIF converter using ffmpeg.
 Usage:
  python convert_mp4_to_gif.py sticker_hi.mp4 sticker_laugh.mp4 sticker_cry.mp4 sticker_love.mp4
  python convert_mp4_to_gif.py *.mp4 --fps 12 --width 320
  python convert_mp4_to_gif.py input.mp4 -o custom_output.gif
 Requires: ffmpeg (must be on PATH)
 """
 import os
 import sys
 import argparse
 import subprocess
 import shutil
 def check_ffmpeg():
    if not shutil.which("ffmpeg"):
        raise SystemExit("ERROR: ffmpeg not found. Install via: brew install ffmpeg / apt install ffmpeg")
 def mp4_to_gif(input_path: str, output_path: str, fps: int = 15, width: int = 360):
    """Convert a single MP4 to GIF via ffmpeg two-pass (palette for quality)."""
    if not os.path.isfile(input_path):
        print(f"SKIP: {input_path} not found", file=sys.stderr)
        return False
    palette = output_path + ".palette.png"
    scale_filter = f"fps={fps},scale={width}:-1:flags=lanczos"
    try:
        subprocess.run(
            ["ffmpeg", "-y", "-i", input_path,
             "-vf", f"{scale_filter},palettegen=stats_mode=diff",
             palette],
            check=True, capture_output=True,
        )
        subprocess.run(
            ["ffmpeg", "-y", "-i", input_path, "-i", palette,
             "-lavfi", f"{scale_filter} [x]; [x][1:v] paletteuse=dither=bayer:bayer_scale=5:diff_mode=rectangle",
             output_path],
            check=True, capture_output=True,
        )
    except subprocess.CalledProcessError as e:
        print(f"FAIL: {input_path} -> {e.stderr.decode(errors='replace')[-200:]}", file=sys.stderr)
        return False
    finally:
        if os.path.exists(palette):
            os.remove(palette)
    size = os.path.getsize(output_path)
    print(f"OK: {size:,} bytes -> {output_path}")
    return True
 def main():
    p = argparse.ArgumentParser(description="Batch MP4 → GIF converter (ffmpeg two-pass palette)")
    p.add_argument("inputs", nargs="+", help="MP4 file(s) to convert")
    p.add_argument("-o", "--output", default=None, help="Output path (only for single file input)")
    p.add_argument("--fps", type=int, default=15, help="GIF frame rate (default: 15)")
    p.add_argument("--width", type=int, default=360, help="GIF width in pixels, height auto-scaled (default: 360)")
    args = p.parse_args()
    if args.output and len(args.inputs) > 1:
        raise SystemExit("ERROR: -o/--output only works with a single input file")
    check_ffmpeg()
    ok, fail = 0, 0
    for mp4 in args.inputs:
        if args.output:
            gif_path = args.output
        else:
            gif_path = os.path.splitext(mp4)[0] + ".gif"
        if mp4_to_gif(mp4, gif_path, fps=args.fps, width=args.width):
            ok += 1
        else:
            fail += 1
    print(f"\nDone: {ok} converted, {fail} failed")
 if __name__ == "__main__":
    main()
--- a/skills/gif-sticker-maker/scripts/minimax_image.py
+++ b/skills/gif-sticker-maker/scripts/minimax_image.py
@@ -0,0 +1,158 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: MIT
 """
 MiniMax Text-to-Image — synchronous generation with optional character reference.
 Usage:
  python3 minimax_image.py "A cat in space" -o cat.png
  python3 minimax_image.py "Mountain landscape" -o bg.png --ratio 16:9
  python3 minimax_image.py "Funko Pop figurine waving" -o sticker.png --subject-ref photo.jpg
 Env: MINIMAX_API_KEY (required)
 """
 import os
 import base64
 import argparse
 import requests
 API_KEY = os.getenv("MINIMAX_API_KEY")
 API_BASE = "https://api.minimax.io/v1"
 ASPECT_RATIOS = ["1:1", "16:9", "4:3", "3:2", "2:3", "3:4", "9:16", "21:9"]
 def _headers():
    if not API_KEY:
        raise SystemExit("ERROR: MINIMAX_API_KEY is not set.\n  export MINIMAX_API_KEY='your-key'")
    return {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
 def _encode_image(image_path: str) -> str:
    """Read local image file and return base64 data URI."""
    ext = os.path.splitext(image_path)[1].lower().lstrip(".")
    mime_map = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png", "webp": "webp"}
    mime = mime_map.get(ext, "jpeg")
    with open(image_path, "rb") as f:
        raw = f.read()
    return f"data:image/{mime};base64,{base64.b64encode(raw).decode()}"
 def generate_image(
    prompt: str,
    model: str = "image-01",
    aspect_ratio: str = "1:1",
    n: int = 1,
    response_format: str = "url",
    prompt_optimizer: bool = False,
    seed: int = None,
    subject_reference: list = None,
 ) -> dict:
    """Generate image(s). Returns API response dict."""
    payload = {
        "model": model,
        "prompt": prompt,
        "aspect_ratio": aspect_ratio,
        "n": n,
        "response_format": response_format,
        "prompt_optimizer": prompt_optimizer,
    }
    if seed is not None:
        payload["seed"] = seed
    if subject_reference:
        payload["subject_reference"] = subject_reference
    resp = requests.post(
        f"{API_BASE}/image_generation",
        headers=_headers(),
        json=payload,
        timeout=120,
    )
    resp.raise_for_status()
    data = resp.json()
    base_resp = data.get("base_resp", {})
    if base_resp.get("status_code", 0) != 0:
        raise SystemExit(f"API Error [{base_resp.get('status_code')}]: {base_resp.get('status_msg')}")
    return data
 def download_and_save(url: str, output_path: str):
    """Download image from URL and save."""
    resp = requests.get(url, timeout=60)
    resp.raise_for_status()
    with open(output_path, "wb") as f:
        f.write(resp.content)
    return len(resp.content)
 def main():
    p = argparse.ArgumentParser(description="MiniMax Text-to-Image")
    p.add_argument("prompt", help="Image description (max 1500 chars)")
    p.add_argument("-o", "--output", required=True, help="Output file path (.png/.jpg)")
    p.add_argument("--model", default="image-01", help="Model (default: image-01)")
    p.add_argument("--ratio", default="1:1", choices=ASPECT_RATIOS, help="Aspect ratio (default: 1:1)")
    p.add_argument("-n", "--count", type=int, default=1, choices=range(1, 10), help="Number of images (1-9, default: 1)")
    p.add_argument("--seed", type=int, default=None, help="Random seed for reproducibility")
    p.add_argument("--optimize", action="store_true", help="Enable prompt auto-optimization")
    p.add_argument("--base64", action="store_true", help="Use base64 response instead of URL")
    p.add_argument("--subject-ref", default=None,
                   help="Reference image for character likeness (local path or URL, person only)")
    p.add_argument("--subject-type", default="character",
                   help="Subject reference type (default: character)")
    args = p.parse_args()
    os.makedirs(os.path.dirname(args.output) or ".", exist_ok=True)
    subject_ref = None
    if args.subject_ref:
        ref_value = args.subject_ref
        if not ref_value.startswith(("http://", "https://", "data:")):
            ref_value = _encode_image(ref_value)
        subject_ref = [{"type": args.subject_type, "image_file": ref_value}]
    fmt = "base64" if args.base64 else "url"
    result = generate_image(
        prompt=args.prompt,
        model=args.model,
        aspect_ratio=args.ratio,
        n=args.count,
        response_format=fmt,
        prompt_optimizer=args.optimize,
        seed=args.seed,
        subject_reference=subject_ref,
    )
    meta = result.get("metadata", {})
    print(f"Generated: {meta.get('success_count', '?')} success, {meta.get('failed_count', '?')} failed")
    if args.base64:
        images = result.get("data", {}).get("image_base64", [])
        for i, b64 in enumerate(images):
            path = args.output if len(images) == 1 else _numbered_path(args.output, i)
            raw = base64.b64decode(b64)
            with open(path, "wb") as f:
                f.write(raw)
            print(f"OK: {len(raw)} bytes -> {path}")
    else:
        urls = result.get("data", {}).get("image_urls", [])
        for i, url in enumerate(urls):
            path = args.output if len(urls) == 1 else _numbered_path(args.output, i)
            size = download_and_save(url, path)
            print(f"OK: {size} bytes -> {path}")
 def _numbered_path(path: str, index: int) -> str:
    """Insert index before extension: out.png -> out-0.png"""
    base, ext = os.path.splitext(path)
    return f"{base}-{index}{ext}"
 if __name__ == "__main__":
    main()
--- a/skills/gif-sticker-maker/scripts/minimax_video.py
+++ b/skills/gif-sticker-maker/scripts/minimax_video.py
@@ -0,0 +1,226 @@
 #!/usr/bin/env python3
 # SPDX-License-Identifier: MIT
 """
 MiniMax Video Generation — supports both Text-to-Video and Image-to-Video.
 Usage (T2V):
  python minimax_video.py "A cat playing piano" -o cat.mp4
  python minimax_video.py "Ocean waves [Truck left]" -o waves.mp4 --duration 10
 Usage (I2V):
  python minimax_video.py "Character waves cheerfully" --image sticker.png -o sticker.mp4
  python minimax_video.py "Figurine laughing" --image laugh.png -o laugh.mp4 --duration 6
 Env: MINIMAX_API_KEY (required)
 """
 import os
 import json
 import time
 import base64
 import argparse
 import requests
 API_KEY = os.getenv("MINIMAX_API_KEY")
 API_BASE = "https://api.minimax.io/v1"
 I2V_MODELS = [
    "MiniMax-Hailuo-2.3",
    "MiniMax-Hailuo-2.3-Fast",
    "MiniMax-Hailuo-02",
    "I2V-01-Director",
    "I2V-01-live",
    "I2V-01",
 ]
 T2V_MODELS = [
    "MiniMax-Hailuo-2.3",
    "MiniMax-Hailuo-02",
    "T2V-01-Director",
    "T2V-01",
 ]
 def _headers():
    if not API_KEY:
        raise SystemExit("ERROR: MINIMAX_API_KEY is not set.\n  export MINIMAX_API_KEY='your-key'")
    return {
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    }
 def _check_resp(data):
    base_resp = data.get("base_resp", {})
    code = base_resp.get("status_code", 0)
    if code != 0:
        msg = base_resp.get("status_msg", "Unknown error")
        raise SystemExit(f"API Error [{code}]: {msg}")
 def _encode_image(image_path: str) -> str:
    """Read local image file and return base64 data URI."""
    ext = os.path.splitext(image_path)[1].lower().lstrip(".")
    mime_map = {"jpg": "jpeg", "jpeg": "jpeg", "png": "png", "webp": "webp"}
    mime = mime_map.get(ext, "png")
    with open(image_path, "rb") as f:
        raw = f.read()
    return f"data:image/{mime};base64,{base64.b64encode(raw).decode()}"
 def create_task(
    prompt: str,
    model: str = "MiniMax-Hailuo-2.3",
    duration: int = 6,
    resolution: str = "768P",
    prompt_optimizer: bool = True,
    first_frame_image: str = None,
 ) -> str:
    """Submit a video generation task (T2V or I2V). Returns task_id."""
    payload = {
        "model": model,
        "prompt": prompt,
        "duration": duration,
        "resolution": resolution,
        "prompt_optimizer": prompt_optimizer,
    }
    if first_frame_image:
        payload["first_frame_image"] = first_frame_image
    resp = requests.post(
        f"{API_BASE}/video_generation",
        headers=_headers(),
        json=payload,
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    _check_resp(data)
    task_id = data.get("task_id")
    if not task_id:
        raise SystemExit(f"No task_id in response: {json.dumps(data, indent=2)}")
    return task_id
 def poll_task(task_id: str, interval: int = 10, max_wait: int = 600) -> str:
    """Poll task status until Success. Returns file_id."""
    start = time.monotonic()
    elapsed = 0
    while elapsed < max_wait:
        resp = requests.get(
            f"{API_BASE}/query/video_generation",
            headers=_headers(),
            params={"task_id": task_id},
            timeout=30,
        )
        resp.raise_for_status()
        data = resp.json()
        _check_resp(data)
        status = data.get("status", "")
        file_id = data.get("file_id", "")
        if status == "Success":
            if not file_id:
                raise SystemExit("Task succeeded but no file_id returned")
            print(f"  Done! file_id={file_id}")
            return file_id
        elif status == "Fail":
            raise SystemExit(f"Video generation failed: {json.dumps(data, indent=2)}")
        else:
            print(f"  [{elapsed}s] Status: {status}...")
            time.sleep(interval)
            # Use wall-clock time so slow status requests also count toward max_wait.
            elapsed = int(time.monotonic() - start)
    raise SystemExit(f"Timeout after {max_wait}s. task_id={task_id}, check manually.")
 def download_video(file_id: str, output_path: str):
    """Retrieve download URL via file_id and save the video."""
    resp = requests.get(
        f"{API_BASE}/files/retrieve",
        headers=_headers(),
        params={"file_id": file_id},
        timeout=30,
    )
    resp.raise_for_status()
    data = resp.json()
    _check_resp(data)
    download_url = data.get("file", {}).get("download_url", "")
    if not download_url:
        raise SystemExit(f"No download_url in response: {json.dumps(data, indent=2)}")
    print(f"  Downloading from {download_url[:80]}...")
    video_resp = requests.get(download_url, timeout=300)
    video_resp.raise_for_status()
    os.makedirs(os.path.dirname(output_path) or ".", exist_ok=True)
    with open(output_path, "wb") as f:
        f.write(video_resp.content)
    print(f"OK: {len(video_resp.content)} bytes -> {output_path}")
 def generate(
    prompt: str,
    output_path: str,
    model: str = "MiniMax-Hailuo-2.3",
    duration: int = 6,
    resolution: str = "768P",
    prompt_optimizer: bool = True,
    poll_interval: int = 10,
    max_wait: int = 600,
    image_path: str = None,
 ):
    """Full pipeline: create task -> poll -> download."""
    mode = "I2V" if image_path else "T2V"
    print(f"Creating {mode} task...")
    print(f"  Model: {model} | Duration: {duration}s | Resolution: {resolution}")
    if image_path:
        print(f"  Image: {image_path}")
    print(f"  Prompt: {prompt[:100]}{'...' if len(prompt) > 100 else ''}")
    first_frame = _encode_image(image_path) if image_path else None
    task_id = create_task(prompt, model, duration, resolution, prompt_optimizer, first_frame)
    print(f"  task_id={task_id}")
    print("Waiting for generation...")
    file_id = poll_task(task_id, poll_interval, max_wait)
    download_video(file_id, output_path)
 def main():
    all_models = sorted(set(T2V_MODELS + I2V_MODELS))
    p = argparse.ArgumentParser(description="MiniMax Video Generation (T2V + I2V)")
    p.add_argument("prompt", help="Video description (max 2000 chars). Use [Camera Command] for camera control.")
    p.add_argument("-o", "--output", required=True, help="Output file path (.mp4)")
    p.add_argument("--image", default=None, help="First frame image path for I2V mode (jpg/png/webp, <20MB)")
    p.add_argument("--model", default="MiniMax-Hailuo-2.3", choices=all_models,
                   help="Model (default: MiniMax-Hailuo-2.3)")
    p.add_argument("--duration", type=int, default=6, choices=[6, 10], help="Duration in seconds (default: 6)")
    p.add_argument("--resolution", default="768P", choices=["720P", "768P", "1080P"], help="Resolution (default: 768P)")
    p.add_argument("--no-optimize", action="store_true", help="Disable prompt auto-optimization")
    p.add_argument("--poll-interval", type=int, default=10, help="Poll interval in seconds (default: 10)")
    p.add_argument("--max-wait", type=int, default=600, help="Max wait time in seconds (default: 600)")
    args = p.parse_args()
    generate(
        prompt=args.prompt,
        output_path=args.output,
        model=args.model,
        duration=args.duration,
        resolution=args.resolution,
        prompt_optimizer=not args.no_optimize,
        poll_interval=args.poll_interval,
        max_wait=args.max_wait,
        image_path=args.image,
    )
 if __name__ == "__main__":
    main()