Hunyuan3D-Omni is Tencent’s unified, controllable image-to-3D generator built on Hunyuan3D 2.1. Beyond images, it ingests point clouds, voxels, 3D bounding boxes, and skeletal poses through a single control encoder, letting you steer geometry, topology, and pose precisely. The training uses difficulty-aware sampling to robustly fuse modalities (e.g., bias toward harder signals like pose), and optional EMA and FlashVDM switches improve stability and speed at inference. Reported footprint: ~10 GB VRAM for single-asset generation with batch size 1.
GPU Configuration (Inference, Rule-of-Thumb)
Assumptions: batch = 1, PyTorch 2.5+ w/ CUDA 12.4 wheels, default image-to-3D path, typical resolution; FlashVDM on reduces latency and marginally trims VRAM spikes; EMA on slightly increases load but improves stability.
| Scenario | Precision / Mode | Min VRAM (works) | Comfortable VRAM (smooth) | Example GPUs (min → comfy) | Notes |
|---|---|---|---|---|---|
| Entry single-GPU | FP16/BF16, FlashVDM on, EMA off | 10–12 GB | 12–16 GB | RTX 3060 12 GB → RTX 4070/4070 Ti | Matches the project note of ~10 GB. Keep batch=1; avoid very large controls. |
| Stable single-GPU | FP16/BF16, FlashVDM on, EMA on | 12–14 GB | 16–24 GB | RTX 4070/4080(S) → RTX 3090/4090 | EMA adds a small VRAM bump and steadier results; fastest wall-clock with FlashVDM. |
| Heavy controls (pose/voxel) | FP16/BF16, FlashVDM on | 14–18 GB | 24–32 GB | RTX 3090/4090 → RTX A6000 48 GB | Complex pose rigs / dense voxels raise peaks; prefer 24 GB+ for headroom. |
| Throughput focus | FP16/BF16, FlashVDM on, micro-batching | 24 GB | 32–48 GB | RTX 4090 → A6000 48 GB | For back-to-back jobs or higher resolution; still keep per-job batch=1 for safety. |
| Production workstation / lab | FP16/BF16, FlashVDM on | 32 GB | 48–80 GB | A6000 48 GB → H100 80 GB | Easiest for experimenting with control richness and larger assets. |
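To see where your own setup lands in the table above, it helps to watch actual VRAM usage while a generation runs. The command below is a generic NVIDIA monitoring call (not part of the Hunyuan3D-Omni tooling); press Ctrl+C to stop it:
# Poll GPU name and memory usage once per second while a generation runs
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv -l 1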
Resources
Link: https://huggingface.co/tencent/Hunyuan3D-Omni
Step-by-Step Process to Install & Run Hunyuan3D-Omni Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, and open the Dashboard. Click the Create GPU Node button to configure and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Hunyuan3D-Omni, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like Hunyuan3D-Omni
- Compatibility with the CUDA 12.1.1 toolchain required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like Hunyuan3D-Omni.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Hunyuan3D-Omni runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
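Since the devel image ships the full CUDA toolkit, you can also confirm that nvcc is present and get a concise GPU summary; the exact version strings will vary with your image and driver:
# CUDA compiler bundled with the -devel image
nvcc --version
# GPU name and total memory in a compact format
nvidia-smi --query-gpu=name,memory.total --format=csv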
Step 8: Install Base System Packages (Ubuntu)
Install the essentials you’ll need for Hunyuan3D-Omni: Python 3.10 venv/pip, Git + LFS, FFmpeg, OpenGL libs, and build tools.
Run the following commands to install base system packages:
sudo apt update
sudo apt install -y python3.10-venv python3-pip git git-lfs ffmpeg libgl1 libglib2.0-0 build-essential
git lfs install
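Optionally, verify the key tools are available before moving on; these are plain version checks and should print without errors:
python3.10 --version
git --version && git lfs version
ffmpeg -version | head -n 1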
Step 9: Create & Activate a Python Virtual Environment
Isolate everything for Hunyuan3D-Omni in its own venv, then upgrade the basic build tools.
Run the following commands to create & activate a python virtual environment:
python3.10 -m venv ~/hy3d
source ~/hy3d/bin/activate
python -m pip install -U pip wheel setuptools
Step 10: Clone the Hunyuan3D-Omni Repository
Grab the official codebase from GitHub and move into the project folder.
Run the following command to clone the hunyuan3d-omni repository:
git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni.git
cd Hunyuan3D-Omni
This Will:
- Download all the necessary model code, including inference.py and the demos/ directory.
- Set you up in the correct working directory to proceed with model setup, weight download, and inference.
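A quick listing confirms that the files the later steps reference are in place:
# Should show inference.py and the demos/ directory, among other repo files
ls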
Step 11: Install PyTorch (CUDA 12.4 Compatible)
Use the official PyTorch 2.5.1 wheel built for CUDA 12.4 (required by Hunyuan3D-Omni).
Run the following command to install PyTorch:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu124
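Once the install finishes, a one-line check confirms that this PyTorch build sees the GPU; it should print the 2.5.1+cu124 version and True:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"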
Step 12: Install Required Dependencies
Run the following command inside the Hunyuan3D-Omni folder to install the project's remaining Python dependencies from its requirements file:
pip install -r requirements.txt
This installs the rest of the packages the inference scripts rely on, on top of the core stack from Step 11: torch – the deep learning framework, torchvision – image utilities and datasets, and torchaudio – optional audio tools included for completeness.
Step 13: Install Hugging Face Hub + Fast Transfer Support
Install the updated huggingface_hub along with hf_transfer for faster model downloads:
pip install -U huggingface_hub hf_transfer
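hf_transfer only takes effect when it is enabled through an environment variable, so export it in the same shell before running the download in the next step:
export HF_HUB_ENABLE_HF_TRANSFER=1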
Step 14: Download Model Weights from Hugging Face
Use the Hugging Face CLI to download only the model checkpoint files (e.g., .bin, .safetensors) into a local directory named ./model/:
huggingface-cli download tencent/Hunyuan3D-Omni \
--include "model/*" \
--local-dir ./model/
Note: If you see a warning that huggingface-cli download is deprecated, you can switch to the new CLI:
hf download tencent/Hunyuan3D-Omni \
--include "model/*" \
--local-dir ./model/ \
--resume-download
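After the download completes, you can confirm the checkpoint files arrived; the paths below simply follow the --local-dir used above:
# Total size and a sample of the downloaded files
du -sh ./model
find ./model -type f | head -n 20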
Step 15: Run Inference Using Point Cloud Control
Now that the model and environment are fully set up, you can run the point cloud–guided 3D generation using the following command:
python3 inference.py --control_type point --use_ema
What This Does:
- Uses sample point cloud inputs from ./demos/point/imgs/
- Loads the EMA version of the model for more stable results
- Generates 3D shapes (usually saved in an output directory like ./results/ or ./outputs/)
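Because the exact output directory depends on the script's defaults, searching for freshly written mesh files is an easy way to locate the results; the extensions below are just the common ones:
# Find mesh files written in the last 30 minutes anywhere under the repo
find . -mmin -30 -type f \( -name "*.glb" -o -name "*.obj" -o -name "*.ply" \)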
Step 16: Run Inference Using Voxel Control
To generate 3D shapes using voxel-based control inputs, run the following command:
python3 inference.py --control_type voxel --use_ema
What This Does:
- Uses pre-provided voxel input samples from ./demos/voxel/ or a similar directory.
- Loads the Exponential Moving Average (EMA) version of the model for smoother output quality.
- Produces a 3D asset (mesh) based on the voxel representation.
Output:
The generated 3D mesh (likely in .ply, .obj, or .glb format) will be saved in a directory such as:
./outputs/voxel/
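For a quick sanity check of the geometry without opening a 3D viewer, a short Python one-liner works. This sketch assumes the trimesh package is installed in your venv (pip install trimesh if it is not) and uses a placeholder path that you should replace with one of the generated files:
# Print vertex and face counts for a generated mesh (replace the path with a real output file)
python -c "import trimesh; m = trimesh.load('./outputs/voxel/YOUR_MESH.glb', force='mesh'); print(len(m.vertices), 'vertices,', len(m.faces), 'faces')"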
Optional: Other Control Modes
You can try other controls by changing --control_type:
# Bounding Box guided
python3 inference.py --control_type bbox --use_ema
# Skeletal pose control
python3 inference.py --control_type pose --use_ema
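If you want to run every control mode back to back (for example, to compare outputs), a simple shell loop over the same flags works; this is just a convenience wrapper around the commands above, not part of the repository:
# Run all four control modes sequentially with the EMA weights
for mode in point voxel bbox pose; do
  python3 inference.py --control_type "$mode" --use_ema
done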
Conclusion
Hunyuan3D-Omni is a powerful, unified 3D generation framework that goes far beyond typical image-to-3D pipelines — offering fine-grained control using point clouds, voxels, bounding boxes, and skeletal poses. With a clean setup, efficient GPU usage, and support for advanced inference flags like EMA and FlashVDM, it’s production-ready for creators, researchers, and developers alike.
By deploying it on a GPU VM like NodeShift, you can unlock scalable, high-performance 3D asset generation — with full control, flexibility, and speed.