Something totally different – AI image generation capabilities comparison. The big free ones compared.

I have been trying out the image generation capabilities of the big general purpose AIs (Google Gemini, Microsoft Copilot, xAI’s Grok, ChatGPT and Meta AI) with quite a complex subject: a dragon with a human rider, based on an old game I played 25 years ago called Drakan: Order of the Flame. Almost all of the AI models made the same mistake; they cannot draw dragon feet consistently. What I mean by this is that one foot might have 3 toes and another 5, and when using Grok Imagine to make videos from these images the dragon will often grow another toe during the video. There are also other errors, such as the saddle strap going through the wings, so some significant editing afterwards in Photoshop / GIMP was required. All AI models were given the same prompt. Anyway, without too much rambling, here are some image comparisons:

The prompt

Arokh Leaping more detailed centred 4K: the image used for the prompt.

“using this dragon create an image with a female human warrior riding him with a saddle, no helmet on the woman. Background is a forest mountain canyon with two wooden huts on the canyon floor, smoke coming from the chimneys. A stream runs through the canyon. There are snow capped mountains in the distance.”

This prompt was used for all AI programs without any corrections to guide the AI.

Microsoft Copilot

As you can see, this produced a fairly high resolution (1024 pixels high), somewhat dark, oil painting style image, closer to what a human would actually paint than to a 3D render. It draws the dragon’s feet accurately without any extra toes; however, the face of Rynn, the rider, leaves a lot to be desired. The dragon accurately represents the original reference image. Further testing shows that Copilot generally follows the prompt, although it sometimes misses details, for example leaving out the stream and huts. Images are generally dark and muddy, although they have an artistic style rather than something that would be considered digital art or a 3D render. Even with a prompt telling it to produce a 3D render style image, it won’t. It also makes some compositional errors: in one image of a dragon in a medieval street, it put the dragon’s wings behind a large building, which looked wrong. The downsides of Copilot are that it is extremely slow at generating images and will only produce one sample at a time.

ChatGPT (free version)

This produces results similar to Copilot, in that the images are in a more “human like” artistic style rather than a 3D render, which would make sense as Copilot is loosely based on the GPT model. It made the most errors, such as extra toes on the dragon’s foot, two claws on the end of one toe, a missing eye and some clipping. Images are lower resolution than those from the other AIs I have tried, at 533 pixels high. Rendering issues aside, it accurately represented the dragon from the reference image and did not ignore any of the prompt, producing exactly what it was asked to. It produced excellent details and crisp textures. The edited and upscaled version is the header image of this article. The paid versions of ChatGPT will produce higher resolution images and let you generate more images per 24 hour period. It generates images fairly quickly, but not as fast as Grok or Gemini.

Grok (xAI) free

This produced a dragon similar to the one in the reference image and did not ignore any of the prompt. It often messes up the dragon’s feet and is the worst of the lot for this, especially with Grok Imagine, where extra toes will morph into shape during the video. It is unable to produce consistent images between generations, but it will produce lots of variants for you to scroll through so you can choose the one you like best. It is extremely fast at generating images, and they are in a 3D render style rather than the artistic styles of Copilot and ChatGPT. They tend to lack detail in the textures, giving a somewhat cartoony, cel-shaded image. Images are fairly high resolution; the aspect ratio depends on the source, but generally they are around 720 pixels high. The example on the right has been upscaled to 4K resolution. This is also the only AI that will produce NSFW content, although many images and videos are auto-moderated. I’m not even going to mention what Grok Imagine image to video did when given an image of a man kneeling before a dragon.

Facebook (Meta AI)

This one is probably the worst, producing an image that does not accurately match the prompt. I think I will let the image speak for itself here. Images are low resolution at 595 pixels high, with the width depending on the source image; in this case the source was 16:9 aspect ratio, so it generated a 1080×595 pixel image. The scene is kind of what was asked for, but the creature it made is not a dragon. Subsequent alterations of the prompt, and telling it what it was doing wrong, didn’t get what I wanted. It is quick at generating images and will produce a couple for you to choose from.

Google Gemini Nano Banana standard

Google have the best AI image generator by far, but the free versions are limited to only a few generations per day (even more so with Banana Pro). It messed up the dragon’s horns, merging them into one another (the image to the right has been edited), but otherwise it produces high quality, high resolution images at 1080 pixels high. It is accurate to the prompt and produces images in whatever style you tell it to; the example image on the right is in the default, unprompted rendering style. It is very fast, with only Grok being faster, but it only produces one image at a time. If you ask it to change anything or retry, you can run out of credits for the day and will have to wait.

Google Gemini Nano Banana Pro

This produces the highest quality images and is capable of editing existing images by simply describing what you want to add or change. It can improve low resolution images and add more detailed textures to them. However, it makes many rendering mistakes, such as extra toes and / or claws on the dragon’s feet, the saddle strap going through the wing etc. The image to the left is an upscaled, edited example. Unfortunately I deleted the original before writing this article. The downside is that you only get 2 goes per 24 hours without paying for a subscription. Image resolution is fairly high, at around 1080 pixels vertical resolution. It is best used for editing images after generating them with another AI model.
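
As a rough aside, the image heights quoted in the sections above mean each tool needs a different amount of upscaling to reach 4K. The short Python sketch below works that out. It assumes “4K” means 2160 pixels high (3840×2160), which is my assumption rather than something any of the tools state, and the Grok and Gemini heights are only the approximate figures given above.

```python
# Image heights in pixels as quoted in this article.
# The Grok and Gemini figures are approximate ("around 720" / "around 1080").
heights = {
    "Microsoft Copilot": 1024,
    "ChatGPT (free)": 533,
    "Grok (free)": 720,
    "Meta AI": 595,
    "Gemini Nano Banana": 1080,
    "Gemini Nano Banana Pro": 1080,
}

# Assumption: "4K" is taken to mean 3840x2160, i.e. 2160 pixels high.
TARGET_HEIGHT = 2160


def upscale_factor(height, target=TARGET_HEIGHT):
    """Return how many times an image must be enlarged to reach the target height."""
    return target / height


# Round to two decimal places for readability.
factors = {name: round(upscale_factor(h), 2) for name, h in heights.items()}

# List the tools from least to most upscaling needed.
for name, factor in sorted(factors.items(), key=lambda item: item[1]):
    print(f"{name}: x{factor}")
```

The lower the starting resolution, the more detail the upscaler has to invent, so on these numbers the free ChatGPT output needs roughly twice as much enlargement as the Gemini output.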

Finally, here are two images showing Nano Banana Pro’s image enhancing capabilities. I hope you find this article useful. It’s not electronics, but I thought I’d share my experiences with AI image generation and editing.