With recent developments in AI, large language models have become useful almost everywhere. From helping you write your resume to generating images, they are all around us. You can even find them popping up as chatbots on websites where you would never expect them. What they can do is impressive. However, being handed a wall of text when all you want is to get a simple task done can be frustrating. This is where a Large Action Model comes in.
What is a Large Action Model?
Like Large Language Models (LLMs), you can think of Large Action Models (LAMs) as personal assistants. Both kinds of model help by taking simple tasks off your hands, but LAMs, unlike LLMs, aren’t limited to chat boxes.
In recent months, there has been a surge in the popularity of agents. Agents are LLMs capable of utilizing tools to complete tasks rather than just spewing out query responses. One example of such an agent is AutoGPT, which breaks a goal into sub-tasks and uses the internet and other tools to complete these in a loop until the final objective is achieved.
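To make the loop concrete, here is a minimal sketch of the idea, not AutoGPT's actual code. The tools (`search_web`, `write_note`) and the `plan` function are hypothetical stand-ins: in a real agent, `plan` would be an LLM call and the tools would reach real services.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

# Hypothetical tools; a real agent would call a search API, a browser, and so on.
def search_web(query: str) -> str:
    return f"results for '{query}'"

def write_note(text: str) -> str:
    return f"saved note: {text}"

TOOLS: Dict[str, Callable[[str], str]] = {"search": search_web, "note": write_note}

def plan(goal: str) -> List[Tuple[str, str]]:
    """Stand-in for the LLM call that breaks a goal into (tool, input) sub-tasks."""
    return [("search", goal), ("note", f"summary of findings on {goal}")]

def run_agent(goal: str, max_steps: int = 10) -> List[str]:
    """Work through the sub-tasks one by one until none are left."""
    queue = deque(plan(goal))
    log: List[str] = []
    while queue and len(log) < max_steps:  # max_steps guards against endless loops
        tool_name, tool_input = queue.popleft()
        log.append(TOOLS[tool_name](tool_input))
    return log

if __name__ == "__main__":
    for line in run_agent("cheap flights to Lisbon"):
        print(line)
```

A real agent would also feed each tool result back to the model so it can add or change sub-tasks; this sketch leaves that out to keep the loop easy to follow.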
LAMs are similar to agents in that they take a user’s input and attempt to complete a task. What sets them apart, however, is that LAMs can adapt to changing circumstances and complete complicated tasks from start to finish without constantly asking the user what the next step is.
Large Action Models vs Large Language Models
Large Language Models are machine learning systems designed to understand and generate text. They are trained on vast amounts of data to understand human language, and they are built on the transformer, a type of neural network that learns the meaning and context of words from the other words surrounding them in a sentence. When you ask an LLM to, for example, book a flight, the most it will do is give you instructions and a link to the website. This leaves a significant gap between asking for the task and actually getting it done.
Sure, you won’t have to search through menus, but wouldn’t it be convenient if all of this could be automated?
LAMs, unlike LLMs, use neuro-symbolic models and learn from actions. They learn from human intention and interaction with interfaces and then mimic these actions (scrolling, clicking, and typing through menus), eliminating the need for the user to jump between apps to complete repetitive tasks. Take the example of booking flights. With a LAM, you can buy a ticket with a single command. Users will no longer need to tediously navigate confusing user interfaces to find what they are looking for.
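One way to picture "learning from actions" is as recording a demonstration once and replaying it later with new details filled in. The sketch below is a deliberately simplified illustration, not how any real LAM is implemented. The `Action` type, the element names, and the `FakeUI` class are all hypothetical, and `FakeUI` only logs what it would do instead of driving a real app.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Action:
    kind: str        # "click", "type", or "scroll"
    target: str      # hypothetical UI element name
    value: str = ""  # text to type; may contain {placeholders}

# A recorded flight-booking demonstration, with the user-specific
# values replaced by placeholders so it can be reused.
DEMONSTRATION = [
    Action("click", "search_box"),
    Action("type", "search_box", "{origin} to {destination}"),
    Action("click", "date_picker"),
    Action("type", "date_picker", "{date}"),
    Action("scroll", "results"),
    Action("click", "book_button"),
]

class FakeUI:
    """Logs each action instead of controlling a real interface."""
    def __init__(self) -> None:
        self.events: List[str] = []

    def perform(self, action: Action) -> None:
        suffix = f" '{action.value}'" if action.value else ""
        self.events.append(f"{action.kind} {action.target}{suffix}")

def replay(demo: List[Action], params: Dict[str, str], ui: FakeUI) -> None:
    """Replay a demonstration, filling placeholders with this request's values."""
    for step in demo:
        ui.perform(Action(step.kind, step.target, step.value.format(**params)))

if __name__ == "__main__":
    ui = FakeUI()
    replay(DEMONSTRATION, {"origin": "Boston", "destination": "Lisbon", "date": "2024-06-01"}, ui)
    print("\n".join(ui.events))
```

A fixed replay like this breaks as soon as a button moves or a menu changes, which is exactly the kind of change a LAM is meant to cope with.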
LAMs and LLMs, although quite different in how they help complete tasks, work exceptionally well when paired. The first step in accomplishing any assignment is to understand it, so the LLM interprets the meaning of the query. The LAM then divides the task into steps and carries them out in real time. It may also call on the LLM along the way. When contacting customer service, for example, the LLM can carry out the conversation.
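The division of labor described above can be sketched in a few lines. Everything here is an assumption made for illustration: `understand` stands in for the LLM (a regular expression is used only so the example runs without a model), while `plan_steps` and `execute` stand in for the LAM, and the step names are made up.

```python
import re
from typing import Dict, List

def understand(query: str) -> Dict[str, str]:
    """Stand-in for the LLM: turn a free-text request into a structured intent."""
    match = re.search(r"book a flight from (\w+) to (\w+)", query, re.IGNORECASE)
    if match is None:
        return {"task": "unknown"}
    return {"task": "book_flight", "origin": match.group(1), "destination": match.group(2)}

def plan_steps(intent: Dict[str, str]) -> List[str]:
    """Stand-in for the LAM's planning: divide the task into concrete steps."""
    if intent["task"] != "book_flight":
        return []
    return [
        "open flight search",
        f"enter route {intent['origin']} -> {intent['destination']}",
        "pick cheapest result",
        "confirm booking",
    ]

def execute(steps: List[str]) -> List[str]:
    """Stand-in for the LAM acting on an interface; here it only reports each step."""
    return [f"done: {step}" for step in steps]

def handle(query: str) -> List[str]:
    return execute(plan_steps(understand(query)))

if __name__ == "__main__":
    for line in handle("Please book a flight from Boston to Lisbon"):
        print(line)
```

Keeping understanding and acting in separate functions mirrors the pairing in the text: the language side can be swapped or improved without touching the part that carries out the steps.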
Future of Large Action Models
LAMs are a product of relatively recent developments in AI, and bringing this technology to the devices we use every day could change how we interact with software. Automating routine work with LAMs will let people focus on essential tasks and leave the repetitive ones to AI. Rabbit gave an early demonstration of these capabilities at CES 2024 with its rabbit r1 device.
Q: What is a Large Action Model (LAM)?
A Large Action Model (LAM) is a system that can understand and perform human actions on computer applications, such as web navigation, form filling, or online shopping. It uses a combination of neural networks and symbolic reasoning to directly model the structure and logic of various applications.
Q: Who developed the first LAM?
The first LAM was developed by the research team at Rabbit, a new AI company that aims to revolutionize human-computer interaction. Its product, the rabbit r1, is a device that uses a LAM to execute complex tasks on any application through natural language commands.
Q: What are the advantages of LAM over other AI models?
A LAM has several advantages over other AI models, such as:
Accuracy: A LAM can perform actions with high precision and reliability, as it does not rely on intermediate representations such as text or images.
Interpretability: A LAM can explain its actions and reasoning in a transparent, understandable way, as it uses symbolic algorithms to model the logic of applications.
Speed: A LAM can perform actions faster than other models, as it does not need to process large amounts of data or perform multiple steps of inference.
Simplicity: A LAM can handle complex tasks with simple commands, as it learns by demonstration and does not require extensive training or programming.
Q: What are some examples of tasks that LAM can do?
A LAM can perform a variety of tasks across different applications, such as:
Booking a flight on Kayak by specifying the destination, date, and budget.
Filling out a form on Google Docs by providing the required information and formatting.
Shopping for groceries on Instacart by adding items to the cart and checking out.
Creating a playlist on Spotify by selecting the genre, mood, and artists.
Generating a summary of an article on Wikipedia by extracting the main points and keywords.