Image generation

Generating realistic images with the company's merchandise

Applications

I have developed a system that generates photorealistic images of company merchandise: T-shirts, hoodies, sweatshirts, caps, and accessories. It replaces traditional photoshoots, which require studio rental, a photographer, and models. Images are instead created on an RTX 5090 GPU using current image generation and editing models. Marketing and design teams get dozens of visual options for landing pages, advertising campaigns, and marketplaces within minutes, at a much lower production cost.

Who Is It For?

I recommend this solution to e-commerce businesses and clothing brands that regularly update their collections and need fast, high-quality visuals for product cards. It is also ideal for marketing agencies conducting A/B creative testing who want to cheaply generate dozens of variations for different audiences and platforms. SaaS and product companies can use it to create branded merch and promotional materials without organizing photoshoots. Finally, it serves small and medium-sized businesses that are not ready to invest in expensive production but want a visual result on par with studio photography.

Technologies

RTX 5090 and CUDA for Performance

The system runs on an NVIDIA RTX 5090 graphics card with CUDA, which provides substantial performance headroom for high-resolution image generation. According to independent tests, the RTX 5090 is tangibly faster than the RTX 4090 in diffusion generation and other neural network workloads, especially in batch processing. In practice, this allows stable operation at a resolution of 1024×1024 with 25–30 diffusion steps, which yields detailed, commercially usable merch images.

Text-to-Image Generation with Tongyi-MAI/Z-Image-Turbo

To generate the primary merch images, I use Tongyi-MAI/Z-Image-Turbo, a modern text-to-image model optimized for speed and photorealism. The model creates images of people wearing clothes with a specified design: you can describe the type of merch, logo placement, brand style, model pose, and lighting conditions. With well-formulated English prompts, Z-Image-Turbo returns clean, photorealistic pictures without noticeable artifacts, which makes it the core tool for generating new merch visuals.
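A minimal sketch of what such a text-to-image call might look like with the Hugging Face diffusers library. It assumes an installed diffusers release that supports this checkpoint and that DiffusionPipeline.from_pretrained can resolve the right pipeline class from the model repository. The names GenerationSettings, generation_kwargs, and generate_merch_image are hypothetical, bfloat16 is an assumed dtype, and the step count follows the figures in this article rather than the model card defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerationSettings:
    """Illustrative settings; the values mirror the figures quoted in the text."""
    model_id: str = "Tongyi-MAI/Z-Image-Turbo"
    width: int = 1024
    height: int = 1024
    steps: int = 28  # inside the 25-30 range described in the article


def generation_kwargs(prompt: str, settings: GenerationSettings) -> dict:
    """Build the keyword arguments for a diffusers pipeline call."""
    return {
        "prompt": prompt,
        "width": settings.width,
        "height": settings.height,
        "num_inference_steps": settings.steps,
    }


def generate_merch_image(prompt: str,
                         settings: GenerationSettings = GenerationSettings()):
    """Load the model and render one image (needs a CUDA GPU and the weights)."""
    # Imported lazily so the helpers above work without torch/diffusers installed.
    import torch
    from diffusers import DiffusionPipeline

    # Assumption: the installed diffusers release supports this checkpoint.
    pipe = DiffusionPipeline.from_pretrained(
        settings.model_id, torch_dtype=torch.bfloat16
    )
    pipe.to("cuda")
    return pipe(**generation_kwargs(prompt, settings)).images[0]
```

In a long-running service the pipeline would be loaded once and reused across requests rather than re-created inside the function as it is here.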

Image-to-Image Editing with diffusers/FLUX.2-dev-bnb-4bit

When the task is to modify existing photos of people (e.g., employees, ambassadors, or models) rather than generate a model from scratch, I use diffusers/FLUX.2-dev-bnb-4bit. This is a 4-bit quantized version of FLUX.2 that works well for photorealistic image-to-image tasks. The system takes the original photo and replaces the clothing, prints, and colors with branded merch while preserving the face, pose, background, and lighting. The result is "live" shots that look like the output of a high-quality studio photoshoot.

Thanks to 4-bit quantization (bnb-4bit), FLUX.2 needs far less video memory for its weights while maintaining high generation quality and realistic anatomy. This allows image-to-image transformations to run on the same RTX 5090 alongside text-to-image tasks without hitting VRAM limits.
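The memory saving from 4-bit quantization can be estimated with simple arithmetic: weight memory is roughly the parameter count times the bits per weight, divided by 8. The sketch below is only that estimate; the function name weight_memory_gib is hypothetical, and the parameter count is a placeholder for illustration, not a measured figure for FLUX.2.

```python
def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Approximate memory for model weights only, in GiB.

    Ignores activations, quantization constants, and framework overhead.
    """
    return num_params * bits_per_weight / 8 / 2**30


# Placeholder parameter count for illustration, not a figure for FLUX.2.
example_params = 12e9

fp16_gib = weight_memory_gib(example_params, 16)
nf4_gib = weight_memory_gib(example_params, 4)
print(f"16-bit weights: {fp16_gib:.1f} GiB, 4-bit weights: {nf4_gib:.1f} GiB")
```

Whatever the real parameter count is, the 4-bit weights take about a quarter of the 16-bit footprint. Actual usage is somewhat higher, because quantization constants, activations, and the text encoder also occupy VRAM.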

Quality Optimization: 1024×1024 and 25–30 Diffusion Steps

To prioritize quality over raw speed, I tuned the pipeline for a fixed resolution of 1024×1024 and a higher number of diffusion steps, usually 25–30 for both Z-Image-Turbo and FLUX.2. Compared to "fast" modes with 6–10 steps, this produces noticeably sharper fabric textures, cleaner logo edges, more natural shadows, and more realistic faces. The models run in FP16/NF4, the pipelines are kept loaded and warm in RTX 5090 memory, and all heavy operations are performed on the GPU. When needed, the system works in batches and generates several merch options at once without a significant increase in response time.
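The batch mode described above can be sketched as splitting a list of prompts into fixed-size groups. The helper name chunked and the batch size of 2 are assumptions for illustration; a suitable batch size depends on available VRAM.

```python
from typing import Iterator, List, Sequence


def chunked(prompts: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most batch_size prompts."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(prompts), batch_size):
        yield list(prompts[start:start + batch_size])


# Five example prompts split into batches of two.
prompts = [f"white t-shirt with company logo, variation {i}" for i in range(5)]
batches = list(chunked(prompts, batch_size=2))
for batch in batches:
    print(len(batch), batch)
```

Many diffusers pipelines accept a list of prompts in a single call, so each batch could be passed to the pipeline as is; whether that holds for a specific model should be checked against its documentation.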

Correct English Prompts for Merch

Well-crafted English prompts play a key role in the quality of the result. I use templates that describe the type of clothing, logo placement and size, model pose, lighting style, lens type, and level of photorealism. Example: "a full-body photo of a young adult wearing a white t-shirt with a large centered dark blue company logo, studio lighting, 85mm lens, hyper realistic, detailed fabric texture, no extra text". Such prompts help the model generate clean commercial images without unwanted labels, "extra hands," or visual clutter.
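A prompt template of this kind could be sketched as a small data class. The class name MerchPrompt and its field names are hypothetical; the default values are taken from the example prompt above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MerchPrompt:
    """Hypothetical template holding the parts of a merch prompt."""
    garment: str
    logo: str
    subject: str = "a young adult"
    framing: str = "a full-body photo"
    lighting: str = "studio lighting"
    lens: str = "85mm lens"

    def render(self) -> str:
        """Join the parts into one comma-separated English prompt."""
        parts = [
            f"{self.framing} of {self.subject} wearing {self.garment} with {self.logo}",
            self.lighting,
            self.lens,
            "hyper realistic",
            "detailed fabric texture",
            "no extra text",
        ]
        return ", ".join(parts)


prompt = MerchPrompt(
    garment="a white t-shirt",
    logo="a large centered dark blue company logo",
).render()
print(prompt)
```

Changing one field, for example garment="a black hoodie", yields a new prompt while keeping the lighting, lens, and realism terms consistent across a collection.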

Realistic Faces and Artifact Control

To get clean faces and hands in image-to-image mode, I limit how much the model changes the original photo: instead of radically redrawing the frame, the system edits only the clothing and prints. Prompts explicitly ask to preserve the face and pose, while negative prompts exclude typical artifacts (no extra limbs, no distorted hands, no deformed faces). This reduces the likelihood of the defects common in generative models. In practice, the result is natural-looking photos of people in branded merch that are suitable for websites, advertising, and printed materials.
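The pairing of a preservation-focused prompt with a negative prompt might be assembled as in the sketch below. The function name build_edit_request and the exact wording of the preservation phrases are assumptions; the negative terms come from the list above. Whether a given pipeline accepts a negative prompt depends on the model and library version, so the dictionary keys here are illustrative.

```python
# Phrases asking the model to leave everything except the clothing untouched.
PRESERVE = (
    "keep the face unchanged",
    "keep the pose unchanged",
    "keep the background and lighting unchanged",
)

# Artifacts to exclude, taken from the negative prompts named in the text.
NEGATIVE_TERMS = ("extra limbs", "distorted hands", "deformed faces")


def build_edit_request(garment_description: str) -> dict:
    """Return a prompt / negative prompt pair for a clothing-only edit."""
    prompt = ", ".join(
        (f"replace the clothing with {garment_description}",) + PRESERVE
    )
    return {"prompt": prompt, "negative_prompt": ", ".join(NEGATIVE_TERMS)}


request = build_edit_request("a navy hoodie with a white chest logo")
print(request["prompt"])
print(request["negative_prompt"])
```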

Savings on Photoshoots and Content Scaling

The system significantly reduces photo production costs: there is no longer a need to regularly rent a studio or hire a photographer, models, and retouchers for every change in merch design. Instead, the marketing team receives a tool with which dozens or hundreds of visual variations for different platforms and audiences can be generated in a single day. If a new logo, slogan, or clothing collection appears, it is enough to update the prompts and start generation without preparing and conducting a new photoshoot.
