Generating images

Need a picture? Just ask for one. In an agentic chat, describing the image you want is enough — Catalyst generates it and drops it straight into the reply, ready to use in the rest of your work.

A Catalyst chat where the user asks for an image; the assistant generates it and shows the finished picture inline in the reply.

No menus, no separate tool. You write what you see in your head (“generate an image of a red bicycle leaning against a sunlit brick wall”) and the picture comes back in the conversation. From there you can keep iterating in plain language (“make it warmer,” “remove the basket”), refer back to earlier images, or pass a generated image into an artifact or a workflow.

The models

Catalyst doesn’t lock you into one image engine. You enable the models you want in Settings, and the assistant picks the best one for each request — or you can name a specific model right in your message (“…using Ideogram”).

Catalyst Settings showing the image-generation section: a checklist of image models — Z-Image (default), Ideogram 4, OpenAI gpt-image-2, and Gemini nano-banana — each able to be enabled independently.

Here’s the same prompt — “cozy bookstore café, rainy evening” — through all four, so you can see how their personalities differ:

Z-Image — fast, free default

A cozy bookstore café on a rainy evening, generated by Z-Image — warm lamplight, shelves of books, rain on the window.

A quick, no-cost general-purpose model. The default, and a great first pass for almost anything.

Ideogram 4 — typography & posters

A cozy bookstore café on a rainy evening, generated by Ideogram 4 — crisp detail and strong composition.

Also free. The stronger of the two free models for text in images: posters, logos, signage, anything with words that need to read cleanly.

Gemini (nano-banana) — creative & high-fidelity

A cozy bookstore café on a rainy evening, generated by Gemini nano-banana — rich, atmospheric, photographic detail.

A paid model with strong creative composition, multilingual text rendering, high fidelity, and the multi-image composition described under Combining multiple images.

OpenAI (gpt-image-2) — photoreal & precise

A cozy bookstore café on a rainy evening, generated by OpenAI gpt-image-2 — photorealistic lighting and texture.

A paid model that leans photorealistic, renders reliable text, and follows detailed instructions closely.

Sizes & aspect ratios

By default, images come back square. To change the shape, just say so in your request. Catalyst maps your words to whatever the chosen model supports:

Ask for an aspect ratio like 16:9, 9:16, or 21:9 — best with Ideogram 4 and Gemini.
Ask for a size like 1024x1536 — best with Z-Image and OpenAI.

You don’t have to memorize which is which: ask for the shape you want in plain language (“make it a wide 16:9 banner” or “1024x1536, portrait”) and the assistant routes it to a model that can deliver it.
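If it helps to see the distinction spelled out, here is a minimal Python sketch of the two kinds of shape request. It is not how Catalyst routes requests internally (that isn't documented here); `classify_shape` and the two model lists are hypothetical names that simply restate the guidance above.

```python
import re
from math import gcd

# Preferences restated from this guide; these lists are illustrative, not an API.
RATIO_MODELS = ["Ideogram 4", "Gemini"]
SIZE_MODELS = ["Z-Image", "OpenAI"]


def classify_shape(request: str):
    """Return (kind, value, suggested_models) for a shape named in a request.

    Hypothetical helper: it only illustrates the ratio-versus-size distinction.
    """
    # A pixel size such as 1024x1536.
    size = re.search(r"\b(\d{2,5})\s*[xX]\s*(\d{2,5})\b", request)
    if size:
        w, h = int(size.group(1)), int(size.group(2))
        d = gcd(w, h)  # reduce the size to its simplest ratio
        return ("size", f"{w}x{h} (ratio {w // d}:{h // d})", SIZE_MODELS)

    # An aspect ratio such as 16:9.
    ratio = re.search(r"\b(\d{1,2}):(\d{1,2})\b", request)
    if ratio:
        return ("aspect_ratio", f"{ratio.group(1)}:{ratio.group(2)}", RATIO_MODELS)

    # Nothing specified: images come back square by default.
    return ("default", "1:1", [])


print(classify_shape("make it a wide 16:9 banner"))
print(classify_shape("1024x1536, portrait"))
```

The second call also shows that 1024x1536 reduces to a 2:3 portrait ratio, which is why either phrasing can describe the same shape.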

Same prompt — “serene mountain lake at sunrise” — in four shapes via Gemini:

A serene mountain lake at sunrise in a 1:1 square frame.

Text in images

Putting legible words inside a picture (a headline, a logo, a label) is among the hardest things for image models, and it is where the choice of model matters most. Ideogram 4 and Gemini are the strongest here.

The trick: put the exact words in quotes so the model knows precisely what to render — “a conference poster, headline reads ‘VOOV SUMMIT 2026’, subtitle ‘The AI Workspace’”.
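If you assemble prompts in a script before pasting them into chat, the quoting habit is easy to automate. The sketch below is only an illustration: `poster_prompt` is a hypothetical helper, not part of Catalyst, and it does nothing more than name each piece of text and wrap it in quotes.

```python
def poster_prompt(description: str, texts: dict[str, str]) -> str:
    """Build a prompt that names each piece of text and puts it in quotes.

    `texts` maps a role (e.g. "headline") to the exact words to render.
    """
    parts = [description]
    for role, words in texts.items():
        # Quote the words so the model knows exactly what to render.
        parts.append(f'{role} reads "{words}"')
    return ", ".join(parts)


prompt = poster_prompt(
    "a conference poster",
    {"headline": "VOOV SUMMIT 2026", "subtitle": "The AI Workspace"},
)
print(prompt)
# a conference poster, headline reads "VOOV SUMMIT 2026", subtitle reads "The AI Workspace"
```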

Ideogram 4

Gemini

Editing an image

You don’t have to regenerate from scratch to make a change. Hand Catalyst an existing image — reply with one attached, or just refer to a picture from earlier in the conversation — and say what to change:

Change the background to a beach.

Make it look like a watercolor painting.

Remove the person on the left.

Catalyst automatically picks the best available editor for the job and returns the revised image inline. Keep going with more edits to dial it in.

Combining multiple images

Catalyst can also blend several images into one. Attach two or more pictures and describe how to merge them — this is composition, not a side-by-side collage.

1. Attach two or more images to your message, for example the café and the lake from above.
2. Describe the composition in plain language, giving each image a role:

   Combine these: put the café on the left and the lake on the right, as one seamless scene.

3. Catalyst composes them using Gemini and returns a single new image.

A Catalyst chat with two source images attached, the compose tool call, and the combined result below it.

And the blended result — two separate pictures merged into one coherent scene:

A single composed image blending the bookstore café and the mountain lake into one seamless scene.

Prompting tips

A few habits that make a big difference:

Be specific and visual. Name the subject, setting, lighting, mood, and style. “Soft morning light, muted colors, shallow depth of field” beats “nice photo.”
Quote any text exactly. For words inside the image, put them in quotes and say where they go — and lean on Ideogram 4 or Gemini.
Pick the model for the job. Typography and posters → Ideogram 4 or Gemini; photorealism → OpenAI or Gemini; a fast, free first draft → Z-Image.
Iterate in small steps. Generate, then edit one thing at a time (“warmer,” “remove the basket,” “tighter crop”) rather than rewriting the whole prompt.
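The "pick the model for the job" tip can be written down as a small lookup table. This is a rough sketch only: the job labels, `suggest_models`, and `with_model` are hypothetical names, and the fallback to Z-Image is an assumption based on it being the default. The table itself just restates the recommendations above.

```python
# Recommendations restated from the tips above; the keys are made-up labels.
MODEL_FOR_JOB = {
    "typography": ["Ideogram 4", "Gemini"],
    "photorealism": ["OpenAI", "Gemini"],
    "first_draft": ["Z-Image"],
}


def suggest_models(job: str, enabled: set[str]) -> list[str]:
    """Return recommended models for a job, limited to those enabled in Settings.

    Unknown jobs fall back to Z-Image (an assumption: it is the default model).
    """
    return [m for m in MODEL_FOR_JOB.get(job, ["Z-Image"]) if m in enabled]


def with_model(prompt: str, model: str) -> str:
    """Name a specific model in the message, as the guide suggests you can."""
    return f"{prompt}, using {model}"


choices = suggest_models("typography", {"Z-Image", "Gemini"})
print(choices)
print(with_model("a jazz festival poster", choices[0]))
```

Naming the model in the message is optional; if you leave it out, the assistant picks one for you.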

Working with the result

Every generated image carries a small panel — hover (or tap) to see which model produced it, and to expand it full-screen or download the file.

A generated image in chat with its hover panel visible: a model badge plus expand and download controls.

Generated images don’t have to stay in the chat. Pull one into an artifact like an HTML gallery, or wire image generation into a workflow — the workflow Image node uses the same engine — to produce a whole batch on a schedule.