With all the buzz around OpenAI’s project “Strawberry,” I was eager to try the o1 preview when it launched. At first, it felt like an incremental update, but the more I explored, the more I realized this model is a significant step forward and a preview of what is to come. Here’s why:
Three Innovations Power o1
I had hoped OpenAI would implement “self-taught reasoning,” where models can evaluate and refine their internal processing (something akin to human “thoughts”). While o1 isn’t there yet, it combines three key innovations: Deep Reinforcement Learning (Q-learning), “Chain of Thought” (CoT), and a “Tree of Thoughts” approach.
- Q-learning, likely the origin of the OpenAI code name “Q*,” allows the model to learn from rewards and penalties, enabling it to solve problems autonomously (a minimal sketch follows this list).
- CoT breaks complex problems into smaller steps, guiding the model through each one. Originally a prompting strategy, CoT is now built into o1, which recursively breaks a complex prompt into a series of steps and then executes those steps to arrive at an answer (see the prompt example below).
- A “Tree of Thoughts” capability acts as a mental scratchpad, allowing the model to explore different solutions down a complex tree of possibilities, backtrack when it hits a dead end, and refine its reasoning (see the search sketch below). This step-by-step approach mirrors how humans tackle complex tasks, leading to more organized, transparent solutions.
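To make the first idea concrete, here is a minimal, purely illustrative sketch of tabular Q-learning on a toy “walk down a corridor” task. The environment, reward values, and hyperparameters are all my own assumptions; this is the textbook update rule, not OpenAI’s training pipeline.

```python
# Minimal, purely illustrative sketch of tabular Q-learning (not OpenAI's
# training setup): an agent learns from rewards and penalties to walk to
# the right-hand end of a short corridor.
import random
from collections import defaultdict

ACTIONS = [-1, +1]                     # step left or step right
START, GOAL = 0, 5

def step(state, action):
    nxt = max(0, min(GOAL, state + action))
    reward = 1.0 if nxt == GOAL else -0.01   # reward the goal, small penalty otherwise
    return nxt, reward, nxt == GOAL

q = defaultdict(float)                 # Q[(state, action)] -> estimated future reward
alpha, gamma, epsilon = 0.1, 0.95, 0.1
for _ in range(500):                   # 500 practice episodes
    state, done = START, False
    while not done:
        if random.random() < epsilon:
            action = random.choice(ACTIONS)                      # explore occasionally
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])   # exploit best-known action
        nxt, reward, done = step(state, action)
        # Core update: nudge Q toward reward + discounted best future value.
        best_next = max(q[(nxt, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = nxt

# After training, the greedy policy should be "move right" in every state.
print([max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)])
```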
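Chain of Thought started as a prompting pattern you can try yourself. The example below is my own wording, not an o1 system prompt; o1 simply performs this kind of decomposition internally.

```python
# Illustrative Chain-of-Thought prompt (my own wording, not an o1 system
# prompt): the intermediate steps are spelled out so the model works through
# them one at a time instead of jumping straight to an answer.
prompt = (
    "Question: A train travels 120 km in 1.5 hours, then 80 km in 1 hour. "
    "What is its average speed?\n"
    "Let's think step by step:\n"
    "1. Total distance = 120 + 80 = 200 km\n"
    "2. Total time = 1.5 + 1 = 2.5 hours\n"
    "3. Average speed = 200 / 2.5 = 80 km/h\n"
)
print(prompt)
```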
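And here is a small sketch of what Tree-of-Thoughts style search looks like in code: generate candidate next steps, explore the most promising ones first, and backtrack when a branch dead-ends. The toy problem and helper functions are illustrative assumptions, not a description of o1’s internals.

```python
# Sketch of Tree-of-Thoughts style search (a toy illustration, not o1's
# internals): expand candidate "thoughts", try the most promising first,
# and backtrack when a branch dead-ends.
def tree_of_thoughts(state, generate, score, is_solution, depth, beam=3):
    if is_solution(state):
        return state
    if depth == 0:
        return None                        # out of budget: caller backtracks
    # Keep only the most promising candidate thoughts at this level.
    for thought in sorted(generate(state), key=score, reverse=True)[:beam]:
        result = tree_of_thoughts(thought, generate, score, is_solution, depth - 1, beam)
        if result is not None:
            return result
    return None                            # every branch below failed: backtrack

# Toy problem (assumed): pick exactly four digits from 1-9 whose sum is 23.
TARGET = 23
generate = lambda seq: [seq + [d] for d in range(1, 10) if sum(seq) + d <= TARGET]
score = lambda seq: sum(seq)               # prefer branches closer to the target
is_solution = lambda seq: sum(seq) == TARGET and len(seq) == 4

# The greedy branch [9, 9, 5] hits a dead end, the search backtracks,
# and it settles on a valid answer such as [9, 9, 4, 1].
print(tree_of_thoughts([], generate, score, is_solution, depth=4))
```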
Safety: A Key Need for Advanced AI Agents
OpenAI improved the model’s ability to reason through safety protocols, making it much more resistant to jailbreak attempts (efforts to bypass its safeguards). In safety tests, o1 scored 84 out of 100, compared to GPT-4o’s score of just 22. OpenAI is also working with AI safety institutes in the U.S. and U.K. to further evaluate and refine these capabilities. This improvement makes o1 a strong candidate for future applications where AI agents must operate autonomously while adhering to company policies and regulations. In its current form, however, the preview lacks tool access, which prevents it from taking actions.
Trading Speed for Accuracy and Explanation
If you want to do a fun demo, ask GPT-4o how many r’s are in the word “strawberry.” It may tell you two, because the model sees the word as tokens rather than individual letters. Ask o1 and you will see it think for a moment and get the answer right.
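If you want to see why, the snippet below uses the tiktoken package (assuming it is installed) to show how GPT-4o’s tokenizer splits the word into multi-character chunks, which is what makes letter counting awkward for the model.

```python
# Illustration of tokenization vs. letters (requires the tiktoken package).
# "o200k_base" is the encoding used by GPT-4o.
import tiktoken

enc = tiktoken.get_encoding("o200k_base")
tokens = enc.encode("strawberry")

print(tokens)                                              # a few token IDs, not ten letters
print([enc.decode_single_token_bytes(t) for t in tokens])  # multi-character chunks
print("strawberry".count("r"))                             # 3, trivial once you look letter by letter
```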
To test both models’ capabilities, I asked ChatGPT 4o and o1 to develop a quantum circuit that solves a Max-Cut optimization problem. o1 clearly outperformed GPT-4o, not only delivering a better solution but also providing a detailed explanation of its reasoning process. This transparency is crucial for business applications in regulated industries, where explainability is key.
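For context, Max-Cut asks you to split a graph’s nodes into two groups so that as many edges as possible run between the groups. The brute-force snippet below, on a small graph of my own choosing, only illustrates the problem statement; it is not the quantum circuit (for example, a QAOA ansatz) that either model produced.

```python
# Tiny classical reference for the Max-Cut problem: try every 2-coloring of
# a small assumed graph and count the edges that cross the partition.
from itertools import product

edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]  # example 4-node graph (assumed)

def cut_size(assignment, edges):
    # An edge is "cut" when its endpoints land in different groups.
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

best = max(product([0, 1], repeat=4), key=lambda a: cut_size(a, edges))
print(best, cut_size(best, edges))   # best partition and the number of cut edges
```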
The additional accuracy comes at the cost of time: o1 takes longer to generate results. In my case, o1 took 8 seconds more than GPT-4o. This makes it unsuitable for real-time applications, but ideal for decision-support systems where detailed reasoning is more important than speed.
The model’s higher computational demands also translate into a higher price: $15 per 1 million input tokens and $60 per 1 million output tokens, compared to GPT-4o’s $5 and $15, respectively. You also pay for the tokens the model consumes during its internal “thinking,” on top of the visible input and output tokens. Businesses will need to weigh o1’s capabilities against its cost and determine where it fits into their system architecture.
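A quick back-of-the-envelope calculation shows how the gap compounds. The token counts below are made-up illustrative values, and I assume the hidden reasoning tokens are billed at the output rate.

```python
# Back-of-the-envelope cost comparison using the list prices quoted above
# (dollars per 1 million tokens). Token counts are illustrative assumptions.
PRICES = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o":     {"input":  5.00, "output": 15.00},
}

def estimate_cost(model, input_tokens, output_tokens, reasoning_tokens=0):
    p = PRICES[model]
    billable_output = output_tokens + reasoning_tokens  # hidden "thinking" counts too
    return (input_tokens * p["input"] + billable_output * p["output"]) / 1_000_000

print(estimate_cost("gpt-4o", 2_000, 800))                              # ~$0.022
print(estimate_cost("o1-preview", 2_000, 800, reasoning_tokens=4_000))  # ~$0.318
```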
Conclusion: A Small Step That Is Actually A Big Leap
At first glance, o1 may seem like a minor update, but it marks a major step forward in AI reasoning. As OpenAI continues its strategy of steady, incremental releases, the gains in problem-solving, explainability, and safety lay the groundwork for future breakthroughs. I hope introspection and self-teaching are coming soon. While the higher cost and slower speed are trade-offs, o1 is the better choice for use cases where transparency and accuracy are essential and can justify the extra resources.
As you think through what o1 means for your generative and agentic AI aspirations, consider scheduling a guidance session with me to discuss what it all means in the short and long term and how to plan for the rapid pace of AI progress.