ChatGPT now interprets photos better than an art critic and an investigator combined

ChatGPT’s recent image generation capabilities have challenged our previous understanding of AI-generated media. The recently announced GPT-4o model demonstrates a noteworthy ability to interpret images with high accuracy and recreate them in viral styles, such as the one inspired by Studio Ghibli. It even handles text in AI-generated images, which has previously been difficult for AI. And now, OpenAI is launching two new models capable of dissecting images for cues, gathering far more information than a human might catch at a glance.

OpenAI announced two new models earlier this week that take ChatGPT’s thinking abilities up a notch. Its new o3 model, which OpenAI calls its “most powerful reasoning model,” improves on the existing interpretation and perception abilities, getting better at “coding, math, science, visual perception, and more,” the organization claims. Meanwhile, o4-mini is a smaller and faster model for “cost-efficient reasoning” in the same areas. The news follows OpenAI’s recent launch of the GPT-4.1 family of models, which brings faster processing and deeper context.

ChatGPT is now “thinking with images”

With their improved reasoning abilities, both models can now integrate images into their chain of thought, which makes them capable of “thinking with images,” OpenAI proclaims. Going beyond basic image analysis, the o3 and o4-mini models can investigate images more closely and even manipulate them through actions such as cropping, zooming, flipping, or enhancing details to extract visual cues that could improve ChatGPT’s ability to provide solutions.

Introducing OpenAI o3 and o4-mini—our smartest and most capable models to date.

For the first time, our reasoning models can agentically use and combine every tool within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation. pic.twitter.com/rDaqV0x0wE

— OpenAI (@OpenAI) April 16, 2025

According to the announcement, the models blend visual and textual reasoning, which can be combined with other ChatGPT features such as web search, data analysis, and code generation. They are expected to become the basis for more advanced AI agents with multimodal analysis.

Among other practical applications, you can upload pictures of a multitude of items, from flow charts and scribbled handwritten notes to images of real-world objects, and expect ChatGPT to understand them more deeply and produce better output, even without a descriptive text prompt. With this, OpenAI is inching closer to Google’s Gemini, which offers the impressive ability to interpret the real world through live video.

Despite its bold claims, OpenAI is limiting access to paid members, presumably to prevent its GPUs from “melting” again as it struggles to keep up with the compute demand for new reasoning features. As of now, the o3, o4-mini, and o4-mini-high models are available exclusively to ChatGPT Plus, Pro, and Team members, while Enterprise and Education tier users will get them in one week’s time. Meanwhile, Free users will get limited access to o4-mini when they select the “Think” button in the prompt bar.
