Video & 3D Tools

Generate videos, add captions, create 3D models, and optimize them for the web.

These tools go beyond static images. Generate short video clips for social media and ads, add captions to existing videos, convert product photos into 3D models, and optimize those models for web delivery. Each tool takes a bit longer to process than the image tools, but the results are worth the wait.

Video Generation

Credit cost varies by duration and model. Generate short video clips from a text prompt, a reference image, or both.

You can create product showcases, social media clips, or animated scenes. Provide an image as a starting frame to keep the video consistent with your brand visuals. Text-only prompts work too, but image-guided generation gives you more control over the look.

Input requirements: For image-to-video, use a high-quality source image (at least 720p). JPG and PNG are both supported. For text-to-video, just write your prompt.
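The image-to-video requirements above can be checked locally before you upload. This is a minimal sketch, not part of the platform: the function name `check_source_image` is hypothetical, and it assumes "at least 720p" means the shorter side of the image is at least 720 pixels, which this page does not spell out.

```python
from pathlib import Path

# JPG and PNG are supported; ".jpeg" is treated as the same format as ".jpg".
SUPPORTED_IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}
# Assumption: "720p" is read as "shorter side is at least 720 pixels".
MIN_SHORT_SIDE_PX = 720


def check_source_image(filename: str, width: int, height: int) -> list[str]:
    """Return a list of problems; an empty list means the image looks usable."""
    problems = []
    if Path(filename).suffix.lower() not in SUPPORTED_IMAGE_EXTENSIONS:
        problems.append("unsupported format (use JPG or PNG)")
    if min(width, height) < MIN_SHORT_SIDE_PX:
        problems.append("resolution below 720p")
    return problems
```

The caller supplies the pixel dimensions, so the sketch works with whatever image library or file metadata you already have.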

Output: MP4 format. Typical duration is 3-5 seconds depending on the model. Resolution also varies by model, generally 720p or 1080p.

Video Captions

1 credit per video. Automatically transcribes speech and burns captions directly into your video.

Upload a video with spoken audio, and the tool generates accurate captions with proper timing. The captions are embedded in the video itself, so they show up on any platform without needing a separate subtitle file.

Input requirements: MP4 or MOV format with clear audio. Background music is fine as long as the speech is audible. Videos up to 10 minutes are supported.
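The format and length limits above are easy to check before uploading. The sketch below is illustrative only: `can_caption` is a hypothetical helper, the duration is assumed to be known already (for example from your video editor), and audio clarity cannot be checked this way.

```python
from pathlib import Path

SUPPORTED_VIDEO_EXTENSIONS = {".mp4", ".mov"}  # MP4 or MOV
MAX_DURATION_SECONDS = 10 * 60  # videos up to 10 minutes


def can_caption(filename: str, duration_seconds: float) -> bool:
    """Return True if the file type and length fit the documented limits."""
    extension_ok = Path(filename).suffix.lower() in SUPPORTED_VIDEO_EXTENSIONS
    duration_ok = 0 < duration_seconds <= MAX_DURATION_SECONDS
    return extension_ok and duration_ok
```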

Output: MP4 with embedded captions. You can customize font, size, and position before rendering. The tool supports multiple languages for transcription.

Image to 3D

80 credits. Converts a single photograph into a full 3D model. This is the most resource-intensive tool on the platform, which is why the credit cost is higher.

Give it a product photo, and it generates a textured 3D model you can rotate, embed on your website, or use in AR experiences. The AI infers the back and sides of the object from the single front-facing view.

Input requirements: One clear photo of the object against a simple background. The subject should fill most of the frame. Front-facing or slightly angled shots produce the best models. Avoid extreme angles or heavy cropping.

Output: GLB format (binary glTF, a widely used standard for web 3D). The model includes geometry and a texture map. You can preview it directly in the browser or download it for use in 3D software, AR viewers, or your website.

GLB Optimizer

Free. Compresses and optimizes GLB (3D model) files for faster web delivery without visible quality loss.

3D models straight from the Image-to-3D tool or other 3D software are often larger than they need to be. The optimizer reduces file size by compressing textures, simplifying geometry where the change is not visible, and applying Draco mesh compression. Typical size reduction is 40-70%.
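To get a feel for what a 40-70% reduction means for a given file, the arithmetic can be sketched as follows. The function name is hypothetical, and the result is only a rough estimate based on the typical range quoted above; actual savings depend on the model.

```python
def optimized_size_range(
    original_mb: float,
    min_reduction: float = 0.40,  # lower end of the typical range
    max_reduction: float = 0.70,  # upper end of the typical range
) -> tuple[float, float]:
    """Return (smallest, largest) expected size in MB after optimization."""
    smallest = round(original_mb * (1 - max_reduction), 2)
    largest = round(original_mb * (1 - min_reduction), 2)
    return smallest, largest
```

For example, a 10 MB model would be expected to land somewhere between 3 MB and 6 MB.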

Input requirements: GLB or glTF files. No size limit, but larger files take longer to process.
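If you want to confirm that a file really is a GLB before uploading it, you can inspect its header. The sketch below relies on the binary glTF layout (a 12-byte header holding the magic `glTF`, a version number, and the total file length); the function name is hypothetical, and this is a local pre-check, not a description of how the optimizer validates files. Text-based `.gltf` files are JSON and will not pass it.

```python
import struct

GLB_MAGIC = b"glTF"   # first four bytes of a binary glTF file
GLB_HEADER_SIZE = 12  # magic + version + total length, 4 bytes each


def looks_like_glb(data: bytes) -> bool:
    """Check the GLB header: magic, version 2, and declared length."""
    if len(data) < GLB_HEADER_SIZE:
        return False
    # All header fields are little-endian.
    magic, version, length = struct.unpack("<4sII", data[:GLB_HEADER_SIZE])
    return magic == GLB_MAGIC and version == 2 and length == len(data)
```

In practice you would pass the full file contents, for example `looks_like_glb(Path("model.glb").read_bytes())` with `Path` imported from `pathlib`.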

Output: Optimized GLB file. Visually identical to the original at normal viewing distances, just much smaller. Perfect for embedding 3D viewers on product pages where load time matters.

Credit costs at a glance