Imagine running a state-of-the-art language model with a 256K context window, hybrid reasoning, and agent-level intelligence, all on your local machine. Meet Hunyuan, Tencent’s powerful new family of open-source models built for versatility, speed, and long-context reasoning. Whether you’re building production-grade AI agents or running inference on edge devices, Hunyuan delivers exceptional performance at every scale, from the lightweight 0.5B to the robust 7B variant. With Grouped Query Attention (GQA) for faster inference, quantization support for low-resource deployment, and benchmarks that rival top-tier models (e.g., 88.25 on GSM8K and 82.95 on BBH for the 7B pretrained variant), Hunyuan models are engineered to handle real-world tasks with ease. What’s more, they come in both pretrained and instruction-tuned formats, optimized for coding, reasoning, science, and agent tasks. If you’re tired of LLMs that falter with long documents or lack local flexibility, Hunyuan is your next must-try.
In this guide, we’ll walk you through how to install and run Hunyuan 7B or 1.8B locally, unlocking cutting-edge AI right from your own machine, no GPU farm required.
Prerequisites
The minimum system requirements for running this model are:
- GPU: 1x RTX 4090 or 1x RTX A6000
- Storage: at least 50 GB (more is preferable)
- VRAM: at least 16 GB
- Anaconda installed
Step-by-step process to install and run Hunyuan 7B or 1.8B
For the purpose of this tutorial, we’ll use a GPU-powered Virtual Machine by NodeShift since it provides high compute Virtual Machines at a very affordable cost on a scale that meets GDPR, SOC2, and ISO27001 requirements. Also, it offers an intuitive and user-friendly interface, making it easier for beginners to get started with Cloud deployments. However, feel free to use any cloud provider of your choice and follow the same steps for the rest of the tutorial.
Step 1: Setting up a NodeShift Account
Visit app.nodeshift.com and create an account by filling in basic details, or continue signing up with your Google/GitHub account.
If you already have an account, log in straight to your dashboard.
Step 2: Create a GPU Node
After accessing your account, you should see a dashboard (see image). Now:
- Navigate to the menu on the left side.
- Click on the GPU Nodes option.
- Click on Start to begin creating your very first GPU node.
These GPU nodes are GPU-powered virtual machines by NodeShift. These nodes are highly customizable and let you control different environmental configurations for GPUs ranging from H100s to A100s, CPUs, RAM, and storage, according to your needs.
Step 3: Selecting configuration for GPU (model, region, storage)
- For this tutorial, we’ll be using a 1x RTX A6000 GPU; however, you can choose any GPU that meets the prerequisites.
- Similarly, we’ll opt for 200GB storage by sliding the bar. You can also select the region where you want your GPU to reside from the available ones.
Step 4: Choose GPU Configuration and Authentication method
1. After selecting your required configuration options, you’ll see the available GPU nodes in your region that match (or come very close to) your configuration. In our case, we’ll choose a 1x RTX A6000 48GB GPU node with 64vCPUs/63GB RAM/200GB SSD.
2. Next, you’ll need to select an authentication method. Two methods are available: Password and SSH Key. We recommend using SSH keys, as they are a more secure option. To create one, head over to our official documentation.
Step 5: Choose an Image
The final step is to choose an image for the VM, which in our case is Nvidia Cuda.
That’s it! You are now ready to deploy the node. Finalize the configuration summary, and if it looks good, click Create to deploy the node.
Step 6: Connect to active Compute Node using SSH
- As soon as you create the node, it will be deployed within a few seconds or a minute. Once deployed, you will see the status Running in green, meaning that your Compute node is ready to use!
- Once your GPU shows this status, navigate to the three dots on the right, click on Connect with SSH, and copy the SSH details that appear.
As you copy the details, follow the below steps to connect to the running GPU VM via SSH:
1. Open your terminal, paste the SSH command, and run it.
2. In some cases, your terminal may ask for your consent before connecting. Enter ‘yes’.
3. A prompt will request a password. Type the SSH password, and you should be connected.
Output:
Next, if you want to check the GPU details, run the following command in the terminal:
nvidia-smi
Step 7: Set up the project environment with dependencies
1. Create a virtual environment using Anaconda.
conda create -n hunyuan python=3.11 -y && conda activate hunyuan
Output:
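Before installing anything else, you can confirm the environment’s interpreter is active with a quick sanity check (it should report Python 3.11.x):

python --version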
2. Once you’re inside the environment, install vllm with its dependencies.
pip install --upgrade vllm
Output:
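Optionally, verify that vllm installed correctly before moving on:

pip show vllm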
3. Also, open a second terminal, connect to the remote server with SSH, and install open-webui.
pip install open-webui
4. Install Transformers.
pip install git+https://github.com/huggingface/transformers@4970b23cedaf745f963779b4eae68da281e8c6ca
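This pins Transformers to a specific commit (presumably the one that adds Hunyuan support, per the model’s setup instructions). To confirm the build installed correctly, run a quick import check:

python -c "import transformers; print(transformers.__version__)"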
Step 8: Download the model
1. Download the model with vllm and host the endpoint on port 8000. To run the smaller variant instead, replace tencent/Hunyuan-7B-Instruct in the command below with the Hunyuan 1.8B instruct model.
vllm serve tencent/Hunyuan-7B-Instruct --host 0.0.0.0 --port 8000 --trust_remote_code
Output:
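Before wiring up the frontend, you can sanity-check the endpoint from another terminal on the server. vllm exposes an OpenAI-compatible API, so a minimal request (assuming the default /v1 route and the model name used above) looks like this:

curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "tencent/Hunyuan-7B-Instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 64
  }'

If the server is healthy, you’ll get back a JSON response containing the model’s reply.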
2. In the second terminal connected to the GPU host via SSH, serve the open-webui frontend endpoint.
open-webui serve --port 3000
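As a side note, if you’d rather preconfigure the connection than set it later in the UI, Open WebUI can read an OpenAI-compatible endpoint from environment variables (OPENAI_API_BASE_URL and OPENAI_API_KEY, per its documentation; double-check the names against your installed version):

# the API key can be any placeholder, since vllm was started without --api-key
OPENAI_API_BASE_URL=http://localhost:8000/v1 OPENAI_API_KEY=placeholder open-webui serve --port 3000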
3. Forward both ports and tunnel them for access in your local browser.
If you’re on a remote machine (e.g., a NodeShift GPU), you’ll need to do SSH port forwarding in order to access both the vllm and open-webui sessions in your local browser.
Run the following command in your local terminal after replacing:
- <YOUR_SERVER_PORT> with the port allotted to your remote server (for the NodeShift server, you can find it in the deployed GPU details on the dashboard).
- <PATH_TO_SSH_KEY> with the path to the location where your SSH key is stored.
- <YOUR_SERVER_IP> with the IP address of your remote server.
ssh -L 3000:localhost:3000 -p <YOUR_SERVER_PORT> -i <PATH_TO_SSH_KEY> root@<YOUR_SERVER_IP>
In another local terminal, run the following to forward the port for the vllm endpoint:
ssh -L 8000:localhost:8000 -p <YOUR_SERVER_PORT> -i <PATH_TO_SSH_KEY> root@<YOUR_SERVER_IP>
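Alternatively, since ssh accepts multiple -L flags, you can forward both ports through a single tunnel:

ssh -L 3000:localhost:3000 -L 8000:localhost:8000 -p <YOUR_SERVER_PORT> -i <PATH_TO_SSH_KEY> root@<YOUR_SERVER_IP>

Once the tunnel is up, running curl http://localhost:8000/v1/models from your local terminal should list the served Hunyuan model, confirming that the forwarding works.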
Step 9: Run the model via Open WebUI Interface
Once ports are forwarded, you can simply access the model via Open WebUI interface and chat with it.
1. Before running the model, connect the WebUI to the vllm API endpoint in the settings.
2. Select the Hunyuan-7B or 1.8B model on the chat page and run the prompt.
For example, we’re testing the following prompt:
You are an AI assistant embedded in a robot that will help a person relocate to a new city for a remote job. Your goal is to:
Recommend:
- A city in India that fits best for software engineers.
- An ideal living arrangement (rent or buy, apartment or house).
- Internet providers suitable for remote work in that city.
- A weekly schedule that balances productivity, social life, and wellness.
You must:
- Think through the options and tradeoffs.
- Justify your reasoning step-by-step.
- Format the output clearly in sections with bullet points.
Let’s begin: ask me your 5 questions.
Output:
Conclusion
With Hunyuan’s powerful blend of ultra-long context handling, hybrid reasoning, and efficient local inference, deploying a high-performance LLM on your own hardware has never been more accessible. In this article, we explored how to set up the 7B and 1.8B variants locally, highlighting their standout features and real-world capabilities across reasoning, coding, and agent tasks. Backing this seamless experience is NodeShift cloud, which simplifies the deployment process with ready-to-use infrastructure, optimized environments, and hands-on support, ensuring you spend less time configuring and more time building with next-gen AI.