Hunyuan3D-Omni is Tencent’s unified, controllable image-to-3D generator built on Hunyuan3D 2.1. Beyond images, it ingests point clouds, voxels, 3D bounding boxes, and skeletal poses through a single control encoder, letting you steer geometry, topology, and pose precisely. The training uses difficulty-aware sampling to robustly fuse modalities (e.g., bias toward harder signals like pose), and optional EMA and FlashVDM switches improve stability and speed at inference. Reported footprint: ~10 GB VRAM for single-asset generation with batch size 1.
GPU Configuration (Inference, Rule-of-Thumb)
Assumptions: batch = 1, PyTorch 2.5+ w/ CUDA 12.4 wheels, default image-to-3D path, typical resolution; FlashVDM on reduces latency and marginally trims VRAM spikes; EMA on slightly increases load but improves stability.
| Scenario | Precision / Mode | Min VRAM (works) | Comfortable VRAM (smooth) | Example GPUs (min → comfy) | Notes |
|---|---|---|---|---|---|
| Entry single-GPU | FP16/BF16, FlashVDM on, EMA off | 10–12 GB | 12–16 GB | RTX 3060 12 GB → RTX 4070/4070 Ti | Matches the project note of ~10 GB. Keep batch=1; avoid very large controls. |
| Stable single-GPU | FP16/BF16, FlashVDM on, EMA on | 12–14 GB | 16–24 GB | RTX 4070/4080(S) → RTX 3090/4090 | EMA adds a small VRAM bump and steadier results; fastest wall-clock with FlashVDM. |
| Heavy controls (pose/voxel) | FP16/BF16, FlashVDM on | 14–18 GB | 24–32 GB | RTX 3090/4090 → RTX A6000 48 GB | Complex pose rigs / dense voxels raise peaks; prefer 24 GB+ for headroom. |
| Throughput focus | FP16/BF16, FlashVDM on, micro-batching | 24 GB | 32–48 GB | RTX 4090 → A6000 48 GB | For back-to-back jobs or higher resolution; still keep per-job batch=1 for safety. |
| Production workstation / lab | FP16/BF16, FlashVDM on | 32 GB | 48–80 GB | A6000 48 GB → H100 80 GB | Easiest for experimenting with control richness and larger assets. |
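To see where your own setup lands in the table above, it helps to watch actual VRAM usage while a generation runs. The command below is a generic NVIDIA monitoring call (not part of the Hunyuan3D-Omni tooling); press Ctrl+C to stop it:
# Poll GPU name and memory usage once per second while a generation runs
nvidia-smi --query-gpu=name,memory.used,memory.total --format=csv -l 1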
Resources
Link: https://huggingface.co/tencent/Hunyuan3D-Omni
Step-by-Step Process to Install & Run Hunyuan3D-Omni Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, and open the Dashboard. Click the Create GPU Node button to configure and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x RTX A6000 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Hunyuan3D-Omni, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like Hunyuan3D-Omni
- Compatibility with the CUDA 12.1.1 toolchain required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like Hunyuan3D-Omni.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. Devel version contains full cuda toolkit with nvcc.
This setup ensures that Hunyuan3D-Omni runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
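Since the devel image ships the full CUDA toolkit, you can also confirm that nvcc is present and get a concise GPU summary; the exact version strings will vary with your image and driver:
# CUDA compiler bundled with the -devel image
nvcc --version
# GPU name and total memory in a compact format
nvidia-smi --query-gpu=name,memory.total --format=csv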
Step 8: Install Base System Packages (Ubuntu)
Install the essentials you’ll need for Hunyuan3D-Omni: Python 3.10 venv/pip, Git + LFS, FFmpeg, OpenGL libs, and build tools.
Run the following commands to install base system packages:
sudo apt update
sudo apt install -y python3.10-venv python3-pip git git-lfs ffmpeg libgl1 libglib2.0-0 build-essential
git lfs install
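Optionally, verify the key tools are available before moving on; these are plain version checks and should print without errors:
python3.10 --version
git --version && git lfs version
ffmpeg -version | head -n 1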
Step 9: Create & Activate a Python Virtual Environment
Isolate everything for Hunyuan3D-Omni in its own venv, then upgrade the basic build tools.
Run the following commands to create & activate a python virtual environment:
python3.10 -m venv ~/hy3d
source ~/hy3d/bin/activate
python -m pip install -U pip wheel setuptools
Step 10: Clone the Hunyuan3D-Omni Repository
Grab the official codebase from GitHub and move into the project folder.
Run the following command to clone the hunyuan3d-omni repository:
git clone https://github.com/Tencent-Hunyuan/Hunyuan3D-Omni.git
cd Hunyuan3D-Omni
This Will:
- Download all the necessary model code, including inference.py and the demos/ directory.
- Set you up in the correct working directory to proceed with model setup, weight download, and inference.
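A quick listing confirms that the files the later steps reference are in place:
# Should show inference.py and the demos/ directory, among other repo files
ls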
Step 11: Install PyTorch (CUDA 12.4 Compatible)
Use the official PyTorch 2.5.1 wheel built for CUDA 12.4 (required by Hunyuan3D-Omni).
Run the following command to install PyTorch:
pip install torch==2.5.1 torchvision==0.20.1 torchaudio==2.5.1 \
--index-url https://download.pytorch.org/whl/cu124
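Once the install finishes, a one-line check confirms that this PyTorch build sees the GPU; it should print the 2.5.1+cu124 version and True:
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"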
Step 12: Install Required Dependencies
Run the following command inside the Hunyuan3D-Omni folder to install the project's remaining Python dependencies from its requirements file:
pip install -r requirements.txt
This installs the rest of the packages the inference scripts rely on, on top of the core stack from Step 11: torch – the deep learning framework, torchvision – image utilities and datasets, and torchaudio – optional audio tools included for completeness.
Step 13: Install Hugging Face Hub + Fast Transfer Support
Install the updated huggingface_hub along with hf_transfer for faster model downloads:
pip install -U huggingface_hub hf_transfer
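hf_transfer only takes effect when it is enabled through an environment variable, so export it in the same shell before running the download in the next step:
export HF_HUB_ENABLE_HF_TRANSFER=1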
Step 14: Download Model Weights from Hugging Face
Use the Hugging Face CLI to download only the model checkpoint files (e.g., .bin, .safetensors) into a local directory named ./model/:
huggingface-cli download tencent/Hunyuan3D-Omni \
--include "model/*" \
--local-dir ./model/
Note: If you see a warning that huggingface-cli download is deprecated, you can switch to the new CLI:
hf download tencent/Hunyuan3D-Omni \
--include "model/*" \
--local-dir ./model/ \
--resume-download
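After the download completes, you can confirm the checkpoint files arrived; the paths below simply follow the --local-dir used above:
# Total size and a sample of the downloaded files
du -sh ./model
find ./model -type f | head -n 20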
Step 15: Run Inference Using Point Cloud Control
Now that the model and environment are fully set up, you can run the point cloud–guided 3D generation using the following command:
python3 inference.py --control_type point --use_ema
What This Does:
- Uses sample point cloud inputs from ./demos/point/imgs/
- Loads the EMA version of the model for more stable results
- Generates 3D shapes (usually saved in an output directory like ./results/ or ./outputs/)
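Because the exact output directory depends on the script's defaults, searching for freshly written mesh files is an easy way to locate the results; the extensions below are just the common ones:
# Find mesh files written in the last 30 minutes anywhere under the repo
find . -mmin -30 -type f \( -name "*.glb" -o -name "*.obj" -o -name "*.ply" \)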
Step 16: Run Inference Using Voxel Control
To generate 3D shapes using voxel-based control inputs, run the following command:
python3 inference.py --control_type voxel --use_ema
What This Does:
- Uses pre-provided voxel input samples from ./demos/voxel/ or a similar directory.
- Loads the Exponential Moving Average (EMA) version of the model for smoother output quality.
- Produces a 3D asset (mesh) based on the voxel representation.
Output:
The generated 3D mesh (likely in .ply, .obj, or .glb format) will be saved in a directory such as:
./outputs/voxel/
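For a quick sanity check of the geometry without opening a 3D viewer, a short Python one-liner works. This sketch assumes the trimesh package is installed in your venv (pip install trimesh if it is not) and uses a placeholder path that you should replace with one of the generated files:
# Print vertex and face counts for a generated mesh (replace the path with a real output file)
python -c "import trimesh; m = trimesh.load('./outputs/voxel/YOUR_MESH.glb', force='mesh'); print(len(m.vertices), 'vertices,', len(m.faces), 'faces')"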
Optional: Other Control Modes
You can try other controls by changing --control_type:
# Bounding Box guided
python3 inference.py --control_type bbox --use_ema
# Skeletal pose control
python3 inference.py --control_type pose --use_ema
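If you want to run every control mode back to back (for example, to compare outputs), a simple shell loop over the same flags works; this is just a convenience wrapper around the commands above, not part of the repository:
# Run all four control modes sequentially with the EMA weights
for mode in point voxel bbox pose; do
  python3 inference.py --control_type "$mode" --use_ema
done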
Conclusion
Hunyuan3D-Omni is a powerful, unified 3D generation framework that goes far beyond typical image-to-3D pipelines — offering fine-grained control using point clouds, voxels, bounding boxes, and skeletal poses. With a clean setup, efficient GPU usage, and support for advanced inference flags like EMA and FlashVDM, it’s production-ready for creators, researchers, and developers alike.
By deploying it on a GPU VM like NodeShift, you can unlock scalable, high-performance 3D asset generation — with full control, flexibility, and speed.