Google has introduced the Gemini 2.5 Computer Use model, an AI model designed to operate web browsers the way a person does. It uses visual understanding and reasoning to click buttons, fill out forms, scroll through pages, and type, working through the same graphical interface a human user would.
Key Features:
- Human-Like Interaction: Where AI models typically interact with software through structured APIs, Gemini 2.5 Computer Use works through the graphical interface itself, operating a browser to carry out tasks on websites as a person would.
- Versatile Actions: The model supports a range of predefined actions, including opening browsers, typing, clicking, dragging elements, and scrolling, enabling it to navigate complex web interfaces effectively.
- Enhanced Performance: Google reports that Gemini 2.5 Computer Use outperforms competing models on benchmarks of web-based tasks while responding with lower latency.
- Developer Access: Gemini 2.5 Computer Use is available to developers through Google AI Studio and Vertex AI, allowing for integration into their own applications and workflows.
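The interaction pattern described above (observe the page, choose one predefined action, execute it, repeat) can be sketched as a small loop. This is a minimal illustration under stated assumptions, not Google's API: `Action`, `FakeBrowser`, `run_agent`, and `scripted_model` are hypothetical names, the action names are simplified stand-ins, and the "model" is a hard-coded script so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Action:
    """One UI action proposed by the model (names are illustrative only)."""
    name: str                                  # "click", "type", "scroll", or "done"
    args: dict = field(default_factory=dict)   # keyword arguments for the handler


class FakeBrowser:
    """Stand-in for a real browser driver; it only records what it was asked to do."""

    def __init__(self) -> None:
        self.log: list[str] = []

    def screenshot(self) -> bytes:
        # A real driver would return image bytes of the current page.
        return f"screen-after-{len(self.log)}-actions".encode()

    def click(self, x: int, y: int) -> None:
        self.log.append(f"click({x},{y})")

    def type(self, text: str) -> None:
        self.log.append(f"type({text!r})")

    def scroll(self, dy: int) -> None:
        self.log.append(f"scroll({dy})")


def run_agent(
    goal: str,
    browser: FakeBrowser,
    propose_action: Callable[[str, bytes, list], Action],
    max_steps: int = 10,
) -> list:
    """Observe, ask for the next action, execute it, and repeat until 'done'."""
    handlers = {"click": browser.click, "type": browser.type, "scroll": browser.scroll}
    history: list = []
    for _ in range(max_steps):
        action = propose_action(goal, browser.screenshot(), history)
        if action.name == "done":
            break
        handler = handlers.get(action.name)
        if handler is None:
            # Refuse anything outside the predefined action set.
            raise ValueError(f"unsupported action: {action.name}")
        handler(**action.args)
        history.append(action)
    return history


def scripted_model(goal: str, screenshot: bytes, history: list) -> Action:
    """Hypothetical stand-in for the model: replays a fixed plan, one step per call."""
    script = [
        Action("click", {"x": 120, "y": 40}),
        Action("type", {"text": goal}),
        Action("scroll", {"dy": 300}),
        Action("done"),
    ]
    return script[len(history)]


browser = FakeBrowser()
run_agent("gemini", browser, scripted_model)
print(browser.log)
```

In a real integration, `propose_action` would send the goal, the screenshot, and the action history to the model, and `FakeBrowser` would be replaced by an actual browser automation driver. The `max_steps` cap and the explicit handler table are design choices that keep the agent from looping indefinitely or executing actions outside the supported set.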
Applications:
- UI Testing: Google uses Gemini 2.5 Computer Use internally for user interface testing, which speeds up finding and fixing issues in web applications.
- Agentic Features: The model powers agentic capabilities in Google’s AI Mode in Search, Project Mariner, and Firebase Testing Agent, facilitating tasks like research, planning, and data entry through natural language instructions.
With Gemini 2.5 Computer Use, Google moves toward AI agents that operate web interfaces directly, extending automation to tasks that are normally carried out by hand in a browser.