Exploring the Power of AI Image Agents: OpenAI vs Grok vs Gemini Compared
Explore the power of AI image agents as we compare OpenAI, Grok, and Gemini. Discover the latest advancements in image generation and learn how to leverage these tools for your creative projects. Dive into the pros and cons of each provider and get insights from an industry expert. This comprehensive overview is a must-watch for anyone interested in the cutting-edge of AI-powered image creation.
March 26, 2025

Discover the power of AI-generated images as we put OpenAI, Grok, and Gemini to the test in a live comparison. Explore the capabilities and differences between these leading image generation models, and learn how to leverage them to create stunning visuals for your content.
The World's Greatest ChatBot Builder and AI Expert (OpenAI)
Generating Images with Gemini
Removing Watermarks from Images (Gemini Edit)
Comparing Pancake Photos from Different LLMs
Conclusion
The World's Greatest ChatBot Builder and AI Expert (OpenAI)
In this section, we'll explore the capabilities of OpenAI's image generation model. We'll start by creating a new agent called "image gen agent" and connecting it to the OpenAI integration.
Next, we'll use a pre-built flow template called "Generate Image" to create an image based on a user's prompt. The flow uses the OpenAI "generate image" action to create the image and then displays it in the chat.
We'll test the flow by asking it to generate an image of "the world's greatest chatbot builder and AI expert working hard in a skyscraper style office overlooking the city". OpenAI's DALL-E model then generates an image from this prompt.
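For readers who want to see what such a request might look like outside the flow builder, here is a minimal sketch of the JSON body for OpenAI's image generation endpoint. How the "generate image" action calls OpenAI internally is not shown in the source, so the model name `dall-e-3`, the image size, and the helper `build_image_request` are assumptions for illustration. The sketch only builds the request body and sends nothing over the network.

```python
import json

# Public OpenAI endpoint for image generation (shown for context; not called here).
OPENAI_IMAGES_URL = "https://api.openai.com/v1/images/generations"


def build_image_request(prompt: str, model: str = "dall-e-3",
                        size: str = "1024x1024") -> dict:
    """Return a JSON-serializable body for an image generation request.

    The default model and size are assumptions; the source only says "DALL-E".
    """
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("prompt must not be empty")
    return {"model": model, "prompt": cleaned, "n": 1, "size": size}


# The test prompt used in the walkthrough.
prompt = ("the world's greatest chatbot builder and AI expert working hard "
          "in a skyscraper style office overlooking the city")
body = build_image_request(prompt)
print(json.dumps(body, indent=2))
```

In a real integration this body would be sent as a POST request with an API key in the `Authorization` header; the flow template handles that step for you.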
After seeing the results from OpenAI, we'll repeat the process using the Gemini and Grok image generation models. This will allow us to compare the quality and style of the images produced by the different providers.
Finally, we'll use a function to edit the Grok-generated image, removing the watermark that was added. This demonstrates how an image generated by one model can be passed to another model's editing function to achieve the desired result.
Throughout the process, we'll discuss the pros and cons of each image generation provider, as well as considerations around cost, quality, and customization. The goal is to provide a comprehensive overview of the current state of AI-powered image generation and how it can be leveraged within chatbot and conversational AI applications.
Generating Images with Gemini
In this section, we'll explore how to generate images using the Gemini language model. Gemini is one of the AI language models available in the ChatbotBuilder AI platform, and it can generate images.
We'll start by connecting to the Gemini model and setting up a flow to generate images. We'll then compare Gemini's image generation with that of other models such as OpenAI and Grok, and explore the differences in the generated images.
Next, we'll dive into the image editing capabilities of Gemini. We'll learn how to use the "edit image" function to modify the generated images, such as removing watermarks or adding additional elements.
Throughout the process, we'll emphasize the importance of using AI responsibly and ethically. While the technology allows for some creative and powerful applications, we'll caution against using it for unethical purposes.
By the end of this section, you'll have a solid understanding of how to leverage Gemini's image generation and editing capabilities within your ChatbotBuilder AI agents, and you'll be equipped with the knowledge to make informed decisions about the appropriate use of these tools.
Removing Watermarks from Images (Gemini Edit)
In this section, we'll explore how to remove watermarks from images using the Gemini AI model. This can be a powerful but potentially controversial technique, so it's important to use it ethically and responsibly.
First, we'll generate an image using one of the AI models, such as Grok or OpenAI. This image will likely have a watermark from the provider.
Next, we'll use a Gemini edit function to remove the watermark from the bottom right corner of the image. This is done by passing the image to the Gemini edit function and providing instructions to remove the watermark.
The edited image, with the watermark removed, will then be saved to a custom field and displayed.
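The steps above amount to a small pipeline: take an image, pass it with a text instruction to an edit function, and save the result to a custom field. The sketch below shows only that data flow. `Contact`, `edit_and_store`, `fake_edit`, and the field name `edited_image` are hypothetical names introduced for illustration, and the stub does not call Gemini or modify any real image.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

# Hypothetical signature: (image bytes, text instruction) -> edited image bytes.
EditFn = Callable[[bytes, str], bytes]


@dataclass
class Contact:
    """Hypothetical stand-in for a chat user record that holds custom fields."""
    custom_fields: Dict[str, bytes] = field(default_factory=dict)


def edit_and_store(image: bytes, instruction: str, edit_fn: EditFn,
                   contact: Contact, field_name: str = "edited_image") -> bytes:
    """Run the edit function, save its output to a custom field, and return it."""
    if not image:
        raise ValueError("image must not be empty")
    if not instruction.strip():
        raise ValueError("instruction must not be empty")
    edited = edit_fn(image, instruction)
    contact.custom_fields[field_name] = edited
    return edited


def fake_edit(image: bytes, instruction: str) -> bytes:
    """Stub that tags the image bytes instead of calling a real model."""
    return image + b"|edited:" + instruction.encode("utf-8")


contact = Contact()
result = edit_and_store(
    b"raw-image",
    "remove the watermark from the bottom right corner",
    fake_edit,
    contact,
)
print(contact.custom_fields["edited_image"] == result)
```

Passing the edit function in as a parameter keeps the pipeline independent of any one provider, so a real model call can replace the stub without changing the storage step.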
It's important to note that removing watermarks may violate the terms of service of the image providers, so this technique should only be used for educational purposes, not for commercial applications. Additionally, it's crucial to respect intellectual property rights and not use this method to reproduce copyrighted images without permission.
The key takeaway here is that while AI models can provide powerful image editing capabilities, they must be used responsibly and ethically. Removing watermarks should be done with caution and an understanding of the potential consequences.
Comparing Pancake Photos from Different LLMs
We started by generating an image of "an authentic delicious looking photograph of Real Deal Canadian Flapjacks with Real Canadian syrup being drizzled on top with a big thing of butter in the middle like you see in the commercials and stuff, professionally lit with proper photography lighting as well as be taken with a 90mm macro lens by a professional photographer who studied at least 25 years in food photography out in Italy."
The results were quite interesting:
- OpenAI: This image looked like a professional food photography shot, with the pancakes beautifully lit and the syrup and butter looking appetizing. It captured the essence of the prompt well.
- Gemini: This image also looked very professional, with great lighting and styling. The pancakes, syrup, and butter looked mouthwatering and true to the prompt.
- Grok: This image was a bit more chaotic, with the pancakes looking more like cornbread and the styling less polished. It seemed to interpret the prompt more literally, without the same level of food photography expertise.
We then tried editing the Grok image, asking the system to "replace the astronaut with text that says 'thank you for watching'." The result was a bit comical, with the text overlaid on the messy pancake image.
Overall, the comparison showed that the different LLMs have varying strengths when it comes to generating high-quality, realistic images from detailed prompts. Gemini and OpenAI seemed to excel at producing professional-looking food photography, while Grok had more difficulty interpreting the nuances of the prompt. Editing results varied as well: in this test, the text overlay did not blend seamlessly into the image.
This exercise highlighted the importance of understanding the capabilities and limitations of each LLM when choosing which to use for your specific image generation needs. The speed, quality, and customization options can make a significant difference in the final results.
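A comparison like this one can be sketched as a loop that sends the same prompt to each provider and records how long each call takes. This is a minimal sketch, not the flow used in the session: `compare_providers` and `Result` are hypothetical names, and the three provider functions are stubs that return placeholder bytes rather than calling any real image API.

```python
import time
from typing import Callable, Dict, List, NamedTuple


class Result(NamedTuple):
    provider: str
    image: bytes
    seconds: float


def compare_providers(prompt: str,
                      providers: Dict[str, Callable[[str], bytes]]) -> List[Result]:
    """Send the same prompt to every provider and time each call."""
    results = []
    for name, generate in providers.items():
        start = time.perf_counter()
        image = generate(prompt)
        results.append(Result(name, image, time.perf_counter() - start))
    return results


# Stub generators standing in for real provider calls (assumptions for illustration).
providers = {
    "OpenAI": lambda p: b"openai:" + p.encode("utf-8"),
    "Gemini": lambda p: b"gemini:" + p.encode("utf-8"),
    "Grok": lambda p: b"grok:" + p.encode("utf-8"),
}

results = compare_providers("Canadian flapjacks with syrup and butter", providers)
for r in results:
    print(f"{r.provider}: {len(r.image)} bytes in {r.seconds:.4f}s")
```

Timing covers only speed; judging quality and prompt fidelity still means looking at the images side by side, as in the comparison above.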
Conclusion
In this session, we explored the capabilities of various large language models (LLMs) in generating and editing images. We started by connecting to different LLMs, including OpenAI, Gemini, and Grok, and used their image generation capabilities to create a variety of images based on user prompts.
We then delved into the differences between these LLMs, examining the quality, style, and unique characteristics of the images they produced. This allowed us to gain a better understanding of the strengths and limitations of each provider.
Additionally, we experimented with the image editing capabilities of the Gemini LLM, successfully removing watermarks from generated images. This highlighted the potential for LLMs to be used for more advanced image manipulation tasks.
Throughout the session, we emphasized the importance of using AI responsibly and ethically, cautioning against the misuse of these powerful technologies.
We closed by discussing the current state of the LLM landscape, noting the potential for an "LLM bubble" as the market becomes saturated with providers. We advised participants to remain flexible and consider multiple LLM options, as the landscape is likely to keep evolving rapidly.
The key takeaways from this session are:
- LLMs have diverse capabilities in image generation and editing, with each provider offering unique strengths and characteristics.
- Responsible and ethical use of AI is crucial, as these technologies can be misused if not handled with care.
- The LLM market is rapidly evolving, and it's important to maintain a diverse portfolio of options to adapt to the changing landscape.
By understanding the current state of LLM-powered image generation and editing, participants can make informed decisions and leverage these technologies effectively in their own projects and applications.