rabbit launched its $199 personal AI device (PAD), the r1, in a virtual keynote at CES 2024. Consumers can use natural language to ask questions and get answers or to get digital tasks done (e.g., ordering a pizza or a rideshare), provided they are willing to train the agent.
Highlights of the hardware include its small size, a black-and-white screen, a push-to-talk button, a swivel camera, and three radios (Bluetooth, Wi-Fi, and cellular). Two other notable announcements: 1) rabbit says that it has created a large action model (LAM), though we are not sure what that actually is, and 2) the PAD comes with its own operating system, rabbit OS, which rabbit claims is compatible with any device, platform, or app (i.e., “it does everything”).
Here’s what’s exciting. rabbit’s r1 demonstrates that:
- Natural language is, in 2024, finally a good enough interface for accessing information, controlling devices, and even completing tasks.
- Multimodal (pointing, typing, and speaking) interfaces offer a powerful alternative to in-person conversations or even search in the right scenarios. The CES 2024 virtual keynote showed computer vision supplementing voice with additional context when the user asks a question or makes a request. Amazon’s Fire smartphone attempted this about a decade ago, but the process was too slow because the right enabling technologies weren’t yet in place.
- Conversational interfaces can be agentive. Generative AI apps aren’t just a fun or productive means of getting answers, conducting analysis, drawing images, or ordering a pizza. They can potentially offer real convenience to consumers by performing tasks as their “agent.” The term “agentic AI” is now being bounced around, but this is back to the future: AI was born in the 1950s to create intelligent agents. The rabbit r1 combines a natural language model with an agent’s ability to perform tasks.
Here’s why it is hard to imagine that the r1 will be a commercial success:
- Smartphones already perform, or soon will perform, many of the same functions. Apple and Google will continue to evolve their virtual and voice assistants.
- It’s an extra device to buy, charge, configure, program, and carry. The novelty of using (and charging) a stand-alone device will wear off quickly. This may sound trivial, but it is one of the top reasons why consumers don’t use wearables.
- The “learning mode” will likely prove too complex for most users. For years, device manufacturers, operating systems, and software providers have rolled out tools that let consumers create shortcuts to their favorite features or apps. Few consumers ever do.
- For the LAM to pay off, consumers must program it to do tasks that they’ll do often, not one-off tasks. Apps and services such as Uber could also build natural language into their own apps, leaving the consumer just one extra step (opening the Uber app) before doing exactly what rabbit does.
- Borrowing moments is a great strategy in theory, but it hasn’t played out yet at scale. For more than a decade, brands have tried “loaning” moments to other brands to offer convenience to consumers: borrowing a moment lets consumers complete tasks where they already are rather than hopping to a different website or app. For example, United, along with other airlines, has embedded links to rideshare brands in its app. Even Google Maps makes suggestions for scooters, ridesharing, and taxis. Apple and Google have embedded “click to chat” functionality in their apps, as has Meta on its social media platforms. The idea is extremely powerful, but its potential is still largely unrealized.
Here’s what it shows us about the future:
- Devices will someday learn by watching us, not by being programmed. While the r1 will be too complex for most consumers, it illustrates the possibilities, at least for digital tasks. In the future, devices will strike just the right balance of natural language and agent capabilities, learning what we do, need, and want without explicit programming. Their ability to converse in language and emulate empathy will lead us to trust them; we hope that the PAD makers are trustworthy.
- These devices challenge the assumption that brands need piles of consumer data. With cameras plus edge computing and intelligence, devices can simply watch and listen to consumers, learn, and then tell brands what consumers want. Taken to its conclusion, this trend will unwind marketing as we know it. Fortunately, that is still some way off, but it’s something to watch for.
- These virtual assistants will serve some purposes, not all. They’ll do the simple, tedious tasks that we don’t want to do. They will learn what we want and engage brands that we trust to get these things. They may even someday do work for us. They will still leave the heavy lifting, literal and figurative, to humans. I hope this frees us from the details and lets us be more creative and innovative as a species. Who knows where that will lead?
Questions we should be asking:
- Are we, as individuals and as a society, ready to have agents learn from us, and perhaps even to give them some training? Are we ready to trust them to act on our behalf? How good will these personal agents get at understanding the nuances of human behavior, having values, and not harming others while they seek to serve us? AI safety is a hot topic today precisely because of questions like these.
- Are LAMs a real thing? The other term we hear is “world model.” Agents will need models of our physical world and of the actions that we humans take in both the physical and digital realms. Today’s large language models are a start, but the AI community has much work to do.
- Who is ultimately responsible for the actions that a model takes? If you allow your car to drive itself and it hurts someone, who is at fault? What if you train a model to spend money or communicate on your behalf? Are humans ready to assume the risks of letting an agent order groceries? Move money? Communicate with friends?
If you’d like to discuss this topic further, please schedule a guidance session or inquiry with us.