OpenAI Launches Its First Reasoning Model, o1: Solves 83% of Olympiad Math Problems


On September 13, OpenAI announced the launch of a new model called o1, the first in a planned series of “reasoning” models trained to answer complex questions faster than a human can. Alongside o1, the company also introduced a smaller, cheaper version called o1-mini; the project had previously been referred to under the codename Strawberry.

The introduction of o1 marks a significant step for OpenAI toward more human-like artificial intelligence. The model is notably better at writing code and solving multi-step problems than its predecessors, but it is also more expensive to use and slower than GPT-4o. OpenAI describes this release as a “preview,” underscoring that it remains in its early stages.

Starting today, ChatGPT Plus and Team users can access o1-preview and o1-mini, while enterprise and educational users will gain access early next week. OpenAI has announced plans to eventually extend access to o1-mini for all free users, although a specific date has not yet been established.

Costs for developers are considerably higher: API access to o1-preview is priced at $15 per million input tokens and $60 per million output tokens, compared with $5 per million input tokens and $15 per million output tokens for GPT-4o.
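To put those prices in perspective, here is a minimal Python sketch that applies the per-million-token rates quoted above to a single hypothetical request. The 800-input / 2,000-output token sizes are illustrative assumptions, not OpenAI figures, and the sketch ignores o1’s hidden reasoning tokens, which are billed as output.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices quoted above.
# The request size (800 input / 2,000 output tokens) is an illustrative assumption.
PRICES_PER_MILLION_TOKENS = {      # model: (input USD, output USD) per 1M tokens
    "o1-preview": (15.00, 60.00),
    "gpt-4o": (5.00, 15.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Approximate USD cost of one API call for the given model."""
    in_price, out_price = PRICES_PER_MILLION_TOKENS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

for model in PRICES_PER_MILLION_TOKENS:
    print(f"{model}: ${request_cost(model, 800, 2_000):.4f}")
# o1-preview: $0.1320 vs. gpt-4o: $0.0340 -- roughly 4x the cost for the same call,
# before counting any reasoning tokens o1 spends behind the scenes.
```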

Jerry Tworek, OpenAI’s head of research, explained that the training process behind o1 is fundamentally different from that of earlier models, although the company remains vague about the specifics. He said o1 was trained with a completely new optimization algorithm and a new dataset built specifically for it.

OpenAI claims that this new training methodology should make the model more accurate, with Tworek noting fewer “hallucinations,” or incorrect outputs, from the new model. However, he cautioned that challenges persist, adding, “We can’t say we’ve solved the hallucination problem.”

The primary distinction between the new model and GPT-4o lies in its ability to tackle intricate problems, such as programming and mathematics, while also explaining its reasoning. “This model definitely does better than I do at solving AP math problems, and I minored in math in college,” said Bob McGrew, OpenAI’s chief research officer. He also revealed that o1 was tested on a qualifying exam for the International Mathematical Olympiad: GPT-4o correctly solved only 13% of the problems, while o1 solved 83%.

In the online programming competition Codeforces, o1 placed in the 89th percentile of participants, and OpenAI says future updates of the model will perform at a doctoral level on challenging tasks in fields like physics, chemistry, and biology.

However, o1 is not better than GPT-4o across the board. It has weaker factual knowledge about the world and cannot browse the web or handle files and images. Despite these limitations, OpenAI views o1 as the start of a new class of AI capabilities; the name “o1” represents resetting the counter back to 1.

“I must admit, we have traditionally had poor naming conventions,” McGrew acknowledged, expressing hope that the new name communicates the company’s intentions more clearly going forward.

The media has yet to try the new model firsthand, but OpenAI’s technical team offered a glimpse of it in action by asking it to solve a classic riddle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their present ages. What are the ages of the prince and princess? Provide all possible solutions.”

The model took approximately 30 seconds to arrive at the correct answers. The interface OpenAI designed shows the reasoning steps the model takes as it works, creating an impressive illusion of human-like contemplation through phrases such as “I’m curious,” “I’m thinking,” and “Okay, let me see.”
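For readers who want to check the riddle themselves, here is a minimal brute-force sketch in Python; this is not OpenAI’s method, just a way to unwind the puzzle’s nested “when” clauses and confirm that the solutions form a family in which the princess and prince have a 4:3 age ratio (for example, 8 and 6).

```python
from fractions import Fraction

def satisfies_riddle(princess: int, prince: int) -> bool:
    """Check one candidate pair against the riddle, clause by clause.

    1. "when the princess's age was half the sum of their present ages"
       -> years_back solves: princess - years_back = (princess + prince) / 2
    2. "twice as old as the prince was" at that earlier moment
    3. "as old as the prince will be when the princess is [that doubled age]"
       -> the princess's current age must equal the prince's age at that future moment
    """
    years_back = princess - Fraction(princess + prince, 2)  # clause 1: how far in the past
    prince_then = prince - years_back                       # prince's age back then
    shift = 2 * prince_then - princess                      # years until princess doubles that age
    return princess == prince + shift                       # clause 3: princess now == prince then

# Brute-force small integer ages; every hit has princess:prince = 4:3.
solutions = [(p, q) for p in range(1, 41) for q in range(1, 41) if satisfies_riddle(p, q)]
print(solutions[:5])   # [(4, 3), (8, 6), (12, 9), (16, 12), (20, 15)]
```

Using exact fractions avoids floating-point artifacts when “half the sum” of the two ages is not a whole number.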

Yet, the model is not genuinely thinking; it is certainly not human. When asked why it was designed to appear as if it was reflecting, Tworek asserted that OpenAI does not equate AI thinking with human thinking. He explained that the interface is intended to demonstrate how the model spends more time processing and delving into problem-solving, stating, “In some respects, it feels more human than previous models.”

McGrew added, “I think you’ll see many aspects where it appears a bit alien, but also surprisingly human.” The model is given limited time to process queries, which may lead it to express urgency, saying things like, “Oh, I’m running out of time; let me hurry and give an answer.” Early in its reasoning process, it may even seem to brainstorm, asking itself, “I could do this or that; what should I do?”

As OpenAI reportedly seeks to raise additional funding at an astounding $150 billion valuation, the company’s momentum depends on further research breakthroughs. Adding reasoning capabilities to large language models is part of OpenAI’s vision of a future in which autonomous systems can make decisions and take actions on users’ behalf.

For AI researchers, mastering reasoning represents a crucial next step toward achieving human-level intelligence. If a model can advance beyond mere pattern recognition, it could lead to breakthroughs in fields such as medicine and engineering. Currently, o1’s reasoning abilities are relatively slow, and its use remains costly for developers.

“We’ve spent months focused on reasoning because we believe it’s truly a key breakthrough,” McGrew stated. “Fundamentally, this is a new model paradigm designed to tackle genuinely difficult problems, which is essential for advancing towards human-level intelligence.”
