With all the buzz around OpenAI’s project “Strawberry,” I was eager to try the o1 preview when it launched. At first, it felt like an incremental update, but the more I explored, the more I realized this model is a significant step forward and a preview of what is to come. Here’s why:
Three Innovations Power o1
I had hoped OpenAI would implement “self-taught reasoning,” where models can evaluate and refine their internal processing (something akin to human “thoughts”). While o1 isn’t there yet, it combines three key innovations: Deep Reinforcement Learning (Q-learning), “Chain of Thought” (CoT), and a “Tree of Thoughts” approach.
- Q-learning, likely the origin of the OpenAI code name “Q*,” allows the model to learn from rewards and penalties, enabling it to solve problems autonomously (a minimal sketch follows this list).
- CoT breaks complex problems into smaller steps, guiding the model through each one. Originally a prompting strategy, CoT is now built into o1, which recursively breaks a complex prompt into a series of steps and then executes those steps to arrive at an answer (see the prompt example below).
- A “Tree of Thoughts” capability acts as a mental scratchpad, allowing the model to explore different solutions down a complex tree of possibilities, backtrack when it hits a dead end, and refine its reasoning (see the search sketch below). This step-by-step approach mirrors how humans tackle complex tasks, leading to more organized, transparent solutions.
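To make the first idea concrete, here is a minimal, purely illustrative sketch of tabular Q-learning on a toy “walk down a corridor” task. The environment, reward values, and hyperparameters are all my own assumptions; this is the textbook update rule, not OpenAI’s training pipeline.

```python
# Minimal, purely illustrative sketch of tabular Q-learning (not OpenAI's
# training setup): an agent learns from rewards and penalties to walk to
# the right-hand end of a short corridor.
import random
from collections import defaultdict

ACTIONS = [-1, +1]                     # step left or step right
START, GOAL = 0, 5

def step(state, action):
    nxt = max(0, min(GOAL, state + action))
    reward = 1.0 if nxt == GOAL else -0.01   # reward the goal, small penalty otherwise
    return nxt, reward, nxt == GOAL

q = defaultdict(float)                 # Q[(state, action)] -> estimated future reward
alpha, gamma, epsilon = 0.1, 0.95, 0.1
for _ in range(500):                   # 500 practice episodes
    state, done = START, False
    while not done:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)                      # explore occasionally
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit best-known action
        nxt, reward, done = step(state, action)
        # Core update: nudge Q toward reward + discounted best future value.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy should be "move right" in every state.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)])
```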
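Chain of Thought started as a prompting pattern you can try yourself. The example below is my own wording, not an o1 system prompt; o1 simply performs this kind of decomposition internally.

```python
# Illustrative Chain-of-Thought prompt (my own wording, not an o1 system
# prompt): the intermediate steps are spelled out so the model works through
# them one at a time instead of jumping straight to an answer.
prompt = (
    "Question: A train travels 120 km in 1.5 hours, then 80 km in 1 hour. "
    "What is its average speed?\n"
    "Let's think step by step:\n"
    "1. Total distance = 120 + 80 = 200 km\n"
    "2. Total time = 1.5 + 1 = 2.5 hours\n"
    "3. Average speed = 200 / 2.5 = 80 km/h\n"
)
print(prompt)
```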
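And here is a small sketch of what Tree-of-Thoughts style search looks like in code: generate candidate next steps, explore the most promising ones first, and backtrack when a branch dead-ends. The toy problem and helper functions are illustrative assumptions, not a description of o1’s internals.

```python
# Sketch of Tree-of-Thoughts style search (a toy illustration, not o1's
# internals): expand candidate "thoughts", try the most promising first,
# and backtrack when a branch dead-ends.
def tree_of_thoughts(state, generate, score, is_solution, depth, beam=3):
    if is_solution(state):
        return state
    if depth == 0:
        return None                        # out of budget: caller backtracks
    # Keep only the most promising candidate thoughts at this level.
    for thought in sorted(generate(state), key=score, reverse=True)[:beam]:
        result = tree_of_thoughts(thought, generate, score, is_solution, depth - 1, beam)
        if result is not None:
            return result
    return None                            # every branch below failed: backtrack

# Toy problem (assumed): pick exactly four digits from 1-9 whose sum is 23.
TARGET = 23
generate = lambda seq: [seq + [d] for d in range(1, 10) if sum(seq) + d <= TARGET]
score = lambda seq: sum(seq)               # prefer branches closer to the target
is_solution = lambda seq: sum(seq) == TARGET and len(seq) == 4

# The greedy branch [9, 9, 5] hits a dead end, the search backtracks,
# and it settles on a valid answer such as [9, 9, 4, 1].
print(tree_of_thoughts([], generate, score, is_solution, depth=4))
```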
Safety: A Key Need for Advanced AI Agents
OpenAI improved the model’s ability to reason through safety protocols, making it much more resistant to jailbreak attempts (efforts to bypass its safeguards). In safety tests, o1 scored 84 out of 100, compared to GPT-4o’s score of just 22. OpenAI is also working with AI safety institutes in the U.S. and U.K. to further evaluate and refine these capabilities. This improvement makes o1 a strong candidate for future applications where AI agents must operate autonomously while adhering to company policies and regulations. In its current form, however, the preview lacks tool access, which prevents it from taking actions.
Trading Speed for Accuracy and Explanation
If you want to do a fun demo, ask GPT-4o how many r’s are in the word “strawberry.” It may tell you two, because the model sees the word as tokens rather than individual letters. Ask o1 and you will see it think for a moment and get the answer right.
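If you want to see why, the snippet below uses the tiktoken package (assuming it is installed) to show how GPT-4o’s tokenizer splits the word into multi-character chunks, which is what makes letter counting awkward for the model.

```python
# Illustration of tokenization vs. letters (requires the tiktoken package).
# "o200k_base" is the encoding used by GPT-4o.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
tokens = enc.encode("strawberry")

print(tokens)                                              # a few token IDs, not ten letters
print([enc.decode_single_token_bytes(t) for t in tokens])  # multi-character chunks
print("strawberry".count("r"))                             # 3, trivial once you look letter by letter
```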
To test both models’ capabilities, I asked ChatGPT 4o and o1 to develop a quantum circuit that solves a Max-Cut optimization problem. o1 clearly outperformed GPT-4o, not only delivering a better solution but also providing a detailed explanation of its reasoning process. This transparency is crucial for business applications in regulated industries, where explainability is key.
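For context, Max-Cut asks you to split a graph’s nodes into two groups so that as many edges as possible run between the groups. The brute-force snippet below, on a small graph of my own choosing, only illustrates the problem statement; it is not the quantum circuit (for example, a QAOA ansatz) that either model produced.

```python
# Tiny classical reference for the Max-Cut problem: try every 2-coloring of
# a small assumed graph and count the edges that cross the partition.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # example 4-node graph (assumed)

def cut_size(assignment, edges):
    # An edge is "cut" when its endpoints land in different groups.
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

best = max(product([0, 1], repeat=4), key=lambda a: cut_size(a, edges))
print(best, cut_size(best, edges))   # best partition and the number of cut edges
```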
The additional accuracy comes at the cost of time: o1 takes longer to generate results. In my case, o1 took 8 seconds more than GPT-4o. This makes it unsuitable for real-time applications, but ideal for decision-support systems where detailed reasoning is more important than speed.
The model’s higher computational demands also translate into a higher price: $15 per 1 million input tokens and $60 per 1 million output tokens, compared to GPT-4o’s $5 and $15, respectively. You also pay for the tokens the model consumes during its internal “thinking,” on top of the visible input and output tokens. Businesses will need to weigh o1’s capabilities against its cost and determine where it fits into their system architecture.
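A quick back-of-the-envelope calculation shows how the gap compounds. The token counts below are made-up illustrative values, and I assume the hidden reasoning tokens are billed at the output rate.

```python
# Back-of-the-envelope cost comparison using the list prices quoted above
# (dollars per 1 million tokens). Token counts are illustrative assumptions.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o":     {"input":  5.00, "output": 15.00},
}

def estimate_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    p = PRICES[model]
    billable_output = output_tokens + reasoning_tokens  # hidden "thinking" counts too
    return (input_tokens * p["input"] + billable_output * p["output"]) / 1_000_000

print(estimate_cost("gpt-4o", 2_000, 800))                              # ~$0.022
print(estimate_cost("o1-preview", 2_000, 800, reasoning_tokens=4_000))  # ~$0.318
```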
Conclusion: A Small Step That Is Actually A Big Leap
At first glance, o1 may seem like a minor update, but it marks a major step forward in AI reasoning. As OpenAI continues its strategy of steady, incremental releases, the gains in problem-solving, explainability, and safety lay the groundwork for future breakthroughs. I hope introspection and self-teaching are coming soon. While the higher cost and slower speed are trade-offs, o1 is the better choice for use cases where transparency and accuracy are essential and can justify the extra resources.
As you think through what o1 means for your generative and agentic AI aspirations, consider scheduling a guidance session with me to discuss what it all means in the short and long term and how to plan for the rapid pace of AI progress.