OpenAI has recently announced the release of two new models, o3 and o4-mini, which feature an exciting new capability: “Thinking with Images.” This ability allows the models to integrate images into their reasoning processes, making their analyses far more comprehensive.
In a demonstration, one user tested o3 by submitting a three-year-old photograph. Remarkably, the model took just seven minutes to determine the specific city, industrial park, and even the river where the photo was taken. The user’s initial skepticism about the model’s analytical speed quickly faded upon reviewing its reasoning process, which was impressively thorough, involving observation, searching, inference, and verification.
o3 deduced that the photo showed a newly developed tourist area or themed town, rather than an established water town like Wuzhen or Zhouzhuang, from modern details such as skylights, chain railings, and concealed lighting. A sign in the image then led o3 to identify the establishment as a sub-brand of a Zhejiang-based accommodation group, located in a well-known area of Hangzhou.
To confirm its findings, o3 checked external evidence, such as the weather on the date indicated in the photo's file name. After verifying that Hangzhou had snowfall on that day in 2022, the model concluded its analysis and correctly placed the photo in Yuhang District.
In another striking demonstration, o3 was given a photo with no text at all: just plants, a distant wind turbine, and mountains. In under two minutes, o3 confidently identified the location as the Wumeng Grassland in Guizhou. It did so by recognizing the combination of features visible in the image, including high-altitude grassland and specific plant types.
When asked to estimate the date of a winter photograph taken in 1996 that shows the Oriental Pearl Tower in Shanghai, o3 used the absence of nearby landmarks, such as the Jin Mao Tower and the Shanghai World Financial Center, to narrow the timeframe to between 1995 and 1998.
Beyond locations and dates, o3 also proved versatile at identifying individuals in group photos, recognizing car dashboard designs, and identifying bird species from images posted on social media.
Despite these capabilities, o3 is not infallible. Users reported several cases in which the model misidentified locations, showing that it still has room to improve. The technology behind these upgrades relies on reinforcement learning, built on the premise that longer reasoning produces more accurate results.
While OpenAI’s advancements push the boundaries of artificial intelligence, they also raise questions about the implications for personal privacy. As AI technology becomes more adept at analyzing images, the challenge of safeguarding individual information intensifies.