UI-TARS Desktop is a GUI Agent application based on UI-TARS (Vision-Language Model) that allows you to control your computer using natural language.
📑 Paper
| 🤗 Hugging Face Models
| 🤖 ModelScope
🖥️ Desktop Application
| 👓 Midscene (use in browser)
The GGUF model has undergone quantization, but unfortunately, its performance cannot be guaranteed. As a result, we have decided to downgrade it.
💡 Alternative Solution: You can use Cloud Deployment or Local Deployment [vLLM](If you have enough GPU resources) instead.
We appreciate your understanding and patience as we work to ensure the best possible experience.
- 🚀 01.25: We updated the Cloud Deployment section in the 中文版: GUI模型部署教程 with new information related to the ModelScope platform. You can now use the ModelScope platform for deployment.
Instruction | Video |
---|---|
Get the current weather in SF using the web browser | new_mac_action_weather.mp4 |
Send a twitter with the content "hello world" | new_send_twitter_windows.mp4 |
- 🤖 Natural language control powered by Vision-Language Model
- 🖥️ Screenshot and visual recognition support
- 🎯 Precise mouse and keyboard control
- 💻 Cross-platform support (Windows/MacOS)
- 🔄 Real-time feedback and status display
- 🔐 Private and secure - fully local processing
You can download the latest release version of UI-TARS Desktop from our releases page.
- Drag UI TARS application into the Applications folder
- Enable the permission of UI TARS in MacOS:
- System Settings -> Privacy & Security -> Accessibility
- System Settings -> Privacy & Security -> Screen Recording
- Then open UI TARS application, you can see the following interface:
Still to run the application, you can see the following interface:
We recommend using HuggingFace Inference Endpoints for fast deployment. We provide two docs for users to refer:
English version: GUI Model Deployment Guide
中文版: GUI模型部署教程
We recommend using vLLM for fast deployment and inference. You need to use vllm>=0.6.1
.
pip install -U transformers
VLLM_VERSION=0.6.6
CUDA_VERSION=cu124
pip install vllm==${VLLM_VERSION} --extra-index-url https://download.pytorch.org/whl/${CUDA_VERSION}
We provide three model sizes on Hugging Face: 2B, 7B, and 72B. To achieve the best performance, we recommend using the 7B-DPO or 72B-DPO model (based on your hardware configuration):
Run the command below to start an OpenAI-compatible API service:
python -m vllm.entrypoints.openai.api_server --served-model-name ui-tars --model <path to your model>
Note: VLM Base Url is OpenAI compatible API endpoints (see OpenAI API protocol document for more details).
Just simple two steps to run the application:
pnpm install
pnpm run dev
Note: On MacOS, you need to grant permissions to the app (e.g., iTerm2, Terminal) you are using to run commands.
# Unit test
pnpm run test
# E2E test
pnpm run test:e2e
- Node.js >= 20
- Supported Operating Systems:
- Windows 10/11
- macOS 10.15+
UI-TARS Desktop is licensed under the Apache License 2.0.
If you find our paper and code useful in your research, please consider giving a star ⭐ and citation 📝
@article{qin2025ui,
title={UI-TARS: Pioneering Automated GUI Interaction with Native Agents},
author={Qin, Yujia and Ye, Yining and Fang, Junjie and Wang, Haoming and Liang, Shihao and Tian, Shizuo and Zhang, Junda and Li, Jiahao and Li, Yunxin and Huang, Shijue and others},
journal={arXiv preprint arXiv:2501.12326},
year={2025}
}