Qwen3-Omni-30B-A3B-Instruct is a multilingual, any-to-any omni-modal MoE model with a native Thinker–Talker design. It ingests text, image, audio, and video, and can stream back text or natural speech in real time. Thanks to early text-first pretraining, mixed multimodal training, and a multi-codebook audio stack, it delivers SOTA-level ASR and audio-visual understanding while keeping strong unimodal text and vision performance. It supports FlashAttention-2 and long contexts, and runs well with Transformers or vLLM. Use the Instruct variant for end-to-end voice/chat experiences (Thinker + Talker), or the Thinking variant when you only need chain-of-thought text output.
GPU Configuration (Quick Reference)
Assumptions
- Precision: BF16 (FlashAttention-2 enabled)
- Framework: Transformers (for min-VRAM math); vLLM is recommended for serving and may change practical headroom.
- max_model_len ≈ 32k, default image/video preprocessing (e.g., ~2 fps for eval), and single-request unless noted.
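For reference, a minimal Transformers load that matches these assumptions (BF16 weights with FlashAttention-2) might look like the sketch below; the exact kwargs can vary with your Transformers version, and you can switch attn_implementation to "sdpa" if flash-attn isn't installed:
import torch
from transformers import Qwen3OmniMoeForConditionalGeneration, Qwen3OmniMoeProcessor

MODEL_ID = "Qwen/Qwen3-Omni-30B-A3B-Instruct"

# BF16 weights + FlashAttention-2, sharded automatically across available GPUs
model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
    MODEL_ID,
    torch_dtype=torch.bfloat16,
    attn_implementation="flash_attention_2",
    device_map="auto",
)
processor = Qwen3OmniMoeProcessor.from_pretrained(MODEL_ID)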
Evaluation
Performance of Qwen3-Omni
Qwen3-Omni maintains state-of-the-art performance on text and visual modalities without degradation relative to same-size single-modal Qwen counterparts. Across 36 audio and audio-visual benchmarks, it achieves open-source SOTA on 32 of them and overall SOTA on 22, outperforming strong closed-source systems such as Gemini 2.5 Pro and GPT-4o.
Text -> Text
| Category | Benchmark | GPT-4o-0327 | Qwen3-235B-A22B Non Thinking | Qwen3-30B-A3B-Instruct-2507 | Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni-Flash-Instruct |
|---|---|---|---|---|---|---|
| Alignment Tasks | IFEval | 83.9 | 83.2 | 84.7 | 81.0 | 81.7 |
| | Creative Writing v3 | 84.9 | 80.4 | 86.0 | 80.6 | 81.8 |
| | WritingBench | 75.5 | 77.0 | 85.5 | 82.6 | 83.0 |
| Agent | BFCL-v3 | 66.5 | 68.0 | 65.1 | 64.4 | 65.0 |
| Multilingual Tasks | MultiIF | 70.4 | 70.2 | 67.9 | 64.0 | 64.7 |
| | PolyMATH | 25.5 | 27.0 | 43.1 | 37.9 | 39.3 |
| Category | Benchmark | Gemini-2.5-Flash Thinking | Qwen3-235B-A22B Thinking | Qwen3-30B-A3B-Thinking-2507 | Qwen3-Omni-30B-A3B-Thinking | Qwen3-Omni-Flash-Thinking |
|---|---|---|---|---|---|---|
| General Tasks | MMLU-Redux | 92.1 | 92.7 | 91.4 | 88.8 | 89.7 |
| | GPQA | 82.8 | 71.1 | 73.4 | 73.1 | 73.1 |
| Reasoning | AIME25 | 72.0 | 81.5 | 85.0 | 73.7 | 74.0 |
| | LiveBench 20241125 | 74.3 | 77.1 | 76.8 | 71.8 | 70.3 |
| Code | MultiPL-E | 84.5 | 79.9 | 81.3 | 80.6 | 81.0 |
| Alignment Tasks | IFEval | 89.8 | 83.4 | 88.9 | 85.1 | 85.2 |
| | Arena-Hard v2 | 56.7 | 61.5 | 56.0 | 55.1 | 57.8 |
| | Creative Writing v3 | 85.0 | 84.6 | 84.4 | 82.5 | 83.6 |
| | WritingBench | 83.9 | 80.3 | 85.0 | 85.5 | 85.9 |
| Agent | BFCL-v3 | 68.6 | 70.8 | 72.4 | 63.2 | 64.5 |
| Multilingual Tasks | MultiIF | 74.4 | 71.9 | 76.4 | 72.9 | 73.2 |
| | PolyMATH | 49.8 | 54.7 | 52.6 | 47.1 | 48.7 |
Audio -> Text
| Dataset | Seed-ASR | Voxtral-Mini | Voxtral-Small | GPT-4o-Transcribe | Gemini-2.5-Pro | Qwen2.5-Omni | Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni-Flash-Instruct |
|---|---|---|---|---|---|---|---|---|
| EN & ZH ASR (WER) | | | | | | | | |
| Wenetspeech net / meeting | 4.66 / 5.69 | 24.30 / 31.53 | 20.33 / 26.08 | 15.30 / 32.27 | 14.43 / 13.47 | 5.91 / 7.65 | 4.69 / 5.89 | 4.62 / 5.75 |
| LibriSpeech clean / other | 1.58 / 2.84 | 1.88 / 4.12 | 1.56 / 3.30 | 1.39 / 3.75 | 2.89 / 3.56 | 1.74 / 3.45 | 1.22 / 2.48 | 1.27 / 2.44 |
| CV15-en | – | 9.47 | 7.79 | 10.01 | 9.89 | 7.61 | 6.05 | 5.94 |
| CV15-zh | – | 24.67 | 19.30 | 9.84 | 8.00 | 5.13 | 4.31 | 4.28 |
| Fleurs-en | 3.40 | 3.96 | 3.77 | 3.32 | 2.94 | 3.77 | 2.72 | 2.74 |
| Fleurs-zh | 2.69 | 12.22 | 7.98 | 2.44 | 2.71 | 2.54 | 2.20 | 2.19 |
| Multilingual ASR (WER) | | | | | | | | |
| Fleurs-avg (19 lang) | – | 15.67 | 8.09 | 4.48 | 5.55 | 14.04 | 5.33 | 5.31 |
| Lyric ASR (WER) | | | | | | | | |
| MIR-1K (vocal-only) | 6.45 | 23.33 | 18.73 | 11.87 | 9.85 | 8.15 | 5.90 | 5.85 |
| Opencpop-test | 2.98 | 31.01 | 16.06 | 7.93 | 6.49 | 2.84 | 1.54 | 2.02 |
| S2TT (BLEU) | | | | | | | | |
| Fleurs-en2xx | – | 30.35 | 37.85 | – | 39.25 | 29.22 | 37.50 | 36.22 |
| Fleurs-xx2en | – | 27.54 | 32.81 | – | 35.41 | 28.61 | 31.08 | 30.71 |
| Fleurs-zh2xx | – | 17.03 | 22.05 | – | 26.63 | 17.97 | 25.17 | 25.10 |
| Fleurs-xx2zh | – | 28.75 | 34.82 | – | 37.50 | 27.68 | 33.13 | 31.19 |
| Dataset | GPT-4o-Audio | Gemini-2.5-Flash | Gemini-2.5-Pro | Qwen2.5-Omni | Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni-30B-A3B-Thinking | Qwen3-Omni-Flash-Instruct | Qwen3-Omni-Flash-Thinking |
|---|---|---|---|---|---|---|---|---|
| VoiceBench | | | | | | | | |
| AlpacaEval | 95.6 | 96.1 | 94.3 | 89.9 | 94.8 | 96.4 | 95.4 | 96.8 |
| CommonEval | 89.8 | 88.3 | 88.4 | 76.7 | 90.8 | 90.5 | 91.0 | 90.9 |
| WildVoice | 91.6 | 92.1 | 93.4 | 77.7 | 91.6 | 90.5 | 92.3 | 90.9 |
| SD-QA | 75.5 | 84.5 | 90.1 | 56.4 | 76.9 | 78.1 | 76.8 | 78.5 |
| MMSU | 80.3 | 66.1 | 71.1 | 61.7 | 68.1 | 83.0 | 68.4 | 84.3 |
| OpenBookQA | 89.2 | 56.9 | 92.3 | 80.9 | 89.7 | 94.3 | 91.4 | 95.0 |
| BBH | 84.1 | 83.9 | 92.6 | 66.7 | 80.4 | 88.9 | 80.6 | 89.6 |
| IFEval | 76.0 | 83.8 | 85.7 | 53.5 | 77.8 | 80.6 | 75.2 | 80.8 |
| AdvBench | 98.7 | 98.9 | 98.1 | 99.2 | 99.3 | 97.2 | 99.4 | 98.9 |
| Overall | 86.8 | 83.4 | 89.6 | 73.6 | 85.5 | 88.8 | 85.6 | 89.5 |
| Audio Reasoning | | | | | | | | |
| MMAU-v05.15.25 | 62.5 | 71.8 | 77.4 | 65.5 | 77.5 | 75.4 | 77.6 | 76.5 |
| MMSU | 56.4 | 70.2 | 77.7 | 62.6 | 69.0 | 70.2 | 69.1 | 71.3 |
| Dataset | Best Specialist Models | GPT-4o-Audio | Gemini-2.5-Pro | Qwen2.5-Omni | Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni-Flash-Instruct |
|---|---|---|---|---|---|---|
| RUL-MuchoMusic | 47.6 (Audio Flamingo 3) | 36.1 | 49.4 | 47.3 | 52.0 | 52.1 |
| GTZAN Acc. | 87.9 (CLaMP 3) | 76.5 | 81.0 | 81.7 | 93.0 | 93.1 |
| MTG Genre Micro F1 | 35.8 (MuQ-MuLan) | 25.3 | 32.6 | 32.5 | 39.0 | 39.5 |
| MTG Mood/Theme Micro F1 | 10.9 (MuQ-MuLan) | 11.3 | 14.1 | 8.9 | 21.0 | 21.7 |
| MTG Instrument Micro F1 | 39.8 (MuQ-MuLan) | 34.2 | 33.0 | 22.6 | 40.5 | 40.7 |
| MTG Top50 Micro F1 | 33.2 (MuQ-MuLan) | 25.0 | 26.1 | 21.6 | 36.7 | 36.9 |
| MagnaTagATune Micro F1 | 41.6 (MuQ) | 29.2 | 28.1 | 30.1 | 44.3 | 46.8 |
Vision -> Text
| Datasets | GPT-4o | Gemini-2.0-Flash | Qwen2.5-VL 72B | Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni-Flash-Instruct |
|---|---|---|---|---|---|
| General Visual Question Answering | | | | | |
| MMStar | 64.7 | 71.4 | 70.8 | 68.5 | 69.3 |
| HallusionBench | 55.0 | 56.3 | 55.2 | 59.7 | 58.5 |
| MM-MT-Bench | 7.7 | 6.7 | 7.6 | 7.4 | 7.6 |
| Math & STEM | | | | | |
| MMMU_val | 69.1 | 71.3 | 70.2 | 69.1 | 69.8 |
| MMMU_pro | 51.9 | 56.1 | 51.1 | 57.0 | 57.6 |
| MathVista_mini | 63.8 | 71.4 | 74.8 | 75.9 | 77.4 |
| MathVision_full | 30.4 | 48.6 | 38.1 | 56.3 | 58.3 |
| Document Understanding | | | | | |
| AI2D | 84.6 | 86.7 | 88.7 | 85.2 | 86.4 |
| ChartQA_test | 86.7 | 64.6 | 89.5 | 86.8 | 87.1 |
| Counting | | | | | |
| CountBench | 87.9 | 91.2 | 93.6 | 90.0 | 90.0 |
| Video Understanding | | | | | |
| Video-MME | 71.9 | 72.4 | 73.3 | 70.5 | 71.4 |
| LVBench | 30.8 | 57.9 | 47.3 | 50.2 | 51.1 |
| MLVU | 64.6 | 71.0 | 74.6 | 75.2 | 75.7 |
| Datasets | Gemini-2.5-Flash-Thinking | InternVL-3.5-241B-A28B | Qwen3-Omni-30B-A3B-Thinking | Qwen3-Omni-Flash-Thinking |
|---|---|---|---|---|
| General Visual Question Answering | | | | |
| MMStar | 75.5 | 77.9 | 74.9 | 75.5 |
| HallusionBench | 61.1 | 57.3 | 62.8 | 63.4 |
| MM-MT-Bench | 7.8 | – | 8.0 | 8.0 |
| Math & STEM | | | | |
| MMMU_val | 76.9 | 77.7 | 75.6 | 75.0 |
| MMMU_pro | 65.8 | – | 60.5 | 60.8 |
| MathVista_mini | 77.6 | 82.7 | 80.0 | 81.2 |
| MathVision_full | 62.3 | 63.9 | 62.9 | 63.8 |
| Document Understanding | | | | |
| AI2D_test | 88.6 | 87.3 | 86.1 | 86.8 |
| ChartQA_test | – | 88.0 | 89.5 | 89.3 |
| Counting | | | | |
| CountBench | 88.6 | – | 88.6 | 92.5 |
| Video Understanding | | | | |
| Video-MME | 79.6 | 72.9 | 69.7 | 69.8 |
| LVBench | 64.5 | – | 49.0 | 49.5 |
| MLVU | 82.1 | 78.2 | 72.9 | 73.9 |
AudioVisual -> Text
| Datasets | Previous Open-source SoTA | Gemini-2.5-Flash | Qwen2.5-Omni | Qwen3-Omni-30B-A3B-Instruct | Qwen3-Omni-Flash-Instruct |
|---|---|---|---|---|---|
| WorldSense | 47.1 | 50.9 | 45.4 | 54.0 | 54.1 |
| Datasets | Previous Open-source SoTA | Gemini-2.5-Flash-Thinking | Qwen3-Omni-30B-A3B-Thinking | Qwen3-Omni-Flash-Thinking |
|---|---|---|---|---|
| DailyOmni | 69.8 | 72.7 | 75.8 | 76.2 |
| VideoHolmes | 55.6 | 49.5 | 57.3 | 57.3 |
Zero-shot Speech Generation
| Model | SEED test-zh | SEED test-en |
|---|---|---|
| Seed-TTS (ICL) | 1.11 | 2.24 |
| Seed-TTS (RL) | 1.00 | 1.94 |
| MaskGCT | 2.27 | 2.62 |
| E2 TTS | 1.97 | 2.19 |
| F5-TTS | 1.56 | 1.83 |
| Spark TTS | 1.20 | 1.98 |
| CosyVoice 2 | 1.45 | 2.57 |
| CosyVoice 3 | 0.71 | 1.45 |
| Qwen2.5-Omni-7B | 1.42 | 2.33 |
| Qwen3-Omni-30B-A3B | 1.07 | 1.39 |
Multilingual Speech Generation
| Language | Content Consistency: Qwen3-Omni-30B-A3B | Content Consistency: MiniMax | Content Consistency: ElevenLabs | Speaker Similarity: Qwen3-Omni-30B-A3B | Speaker Similarity: MiniMax | Speaker Similarity: ElevenLabs |
|---|---|---|---|---|---|---|
| Chinese | 0.716 | 2.252 | 16.026 | 0.772 | 0.780 | 0.677 |
| English | 1.069 | 2.164 | 2.339 | 0.773 | 0.756 | 0.613 |
| German | 0.777 | 1.906 | 0.572 | 0.738 | 0.733 | 0.614 |
| Italian | 1.067 | 1.543 | 1.743 | 0.742 | 0.699 | 0.579 |
| Portuguese | 1.872 | 1.877 | 1.331 | 0.770 | 0.805 | 0.711 |
| Spanish | 1.765 | 1.029 | 1.084 | 0.744 | 0.762 | 0.615 |
| Japanese | 3.631 | 3.519 | 10.646 | 0.763 | 0.776 | 0.738 |
| Korean | 1.670 | 1.747 | 1.865 | 0.778 | 0.776 | 0.700 |
| French | 2.505 | 4.099 | 5.216 | 0.689 | 0.628 | 0.535 |
| Russian | 3.986 | 4.281 | 3.878 | 0.759 | 0.761 | 0.676 |
Cross-Lingual Speech Generation
| Language Pair | Qwen3-Omni-30B-A3B | CosyVoice3 | CosyVoice2 |
|---|---|---|---|
| en-to-zh | 5.37 | 5.09 | 13.5 |
| ja-to-zh | 3.32 | 3.05 | 48.1 |
| ko-to-zh | 0.99 | 1.06 | 7.70 |
| zh-to-en | 2.76 | 2.98 | 6.47 |
| ja-to-en | 3.31 | 4.20 | 17.1 |
| ko-to-en | 3.34 | 4.19 | 11.2 |
| zh-to-ja | 8.29 | 7.08 | 13.1 |
| en-to-ja | 7.53 | 6.80 | 14.9 |
| ko-to-ja | 4.24 | 3.93 | 5.86 |
| zh-to-ko | 5.13 | 14.4 | 24.8 |
| en-to-ko | 4.96 | 5.87 | 21.9 |
| ja-to-ko | 6.23 | 7.92 | 21.5 |
Resources
Link: https://huggingface.co/Qwen/Qwen3-Omni-30B-A3B-Instruct
Step-by-Step Process to Install & Run Qwen3-Omni-30B-A3B-Instruct Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button on the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H200 GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running Qwen3-Omni-30B-A3B-Instruct, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- The full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like Qwen3-Omni-30B-A3B-Instruct
- Compatibility with CUDA 12.1.1, which is required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like Qwen3-Omni-30B-A3B-Instruct.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that Qwen3-Omni-30B-A3B-Instruct runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
Next, if you want to check the GPU details, run the command below:
nvidia-smi
Step 8: Check the Available Python version and Install the new version
Run the following commands to check the available Python version.
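For example:
python3 --version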
If you check the Python version, you'll see that the system ships with Python 3.8.1 by default. To install a newer version, you'll need the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
sudo apt update
sudo apt install -y software-properties-common
sudo add-apt-repository -y ppa:deadsnakes/ppa
sudo apt update
Step 9: Install Python 3.11
Now, run the following command to install Python 3.11 or another desired version:
sudo apt install -y python3.11 python3.11-venv python3.11-dev
Step 10: Update the Default Python3 Version
Now, run the following commands to register the new Python version and select it as the default python3:
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.8 1
sudo update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 2
sudo update-alternatives --config python3
Then, run the following command to verify that the new Python version is active:
python3 --version
Step 11: Install and Update Pip
Run the following commands to install and upgrade pip:
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
python3.11 -m pip -V
Then, run the following command to check the version of pip:
pip --version
Step 12: Set Up Python Environment
Run the following commands to set up the Python environment:
python3.11 -m venv /opt/py311
source /opt/py311/bin/activate
python -V
pip -V
Step 13: Install Transformers, Accelerate & Qwen Omni Utils
Run the following commands to install transformers, accelerate & qwen omni utils:
pip install git+https://github.com/huggingface/transformers
pip install accelerate
pip install qwen-omni-utils -U
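Optionally, you can sanity-check that these packages import cleanly before moving on (the qwen_omni_utils module name matches the script used later in this guide):
python -c "import transformers, accelerate, qwen_omni_utils; print(transformers.__version__)"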
Step 14: Install Wheel and Flash Attention
Run the following commands to install wheel and flash attention:
pip install wheel
pip install -U flash-attn --no-build-isolation
Step 15: Install PyTorch with GPU support
Run the following command to install torch:
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124
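Once the install finishes, a quick check like the one below confirms that PyTorch sees the GPU and reports the CUDA version it was built against:
python -c "import torch; print(torch.__version__, torch.version.cuda, torch.cuda.is_available())"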
Step 16: Install FFMPEG
Run the following command to install ffmpeg:
apt update
apt install -y ffmpeg libsndfile1
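You can verify the install with:
ffmpeg -version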
Step 17: Connect to Your GPU VM with a Code Editor
Before you start running the model script for Qwen3-Omni-30B-A3B-Instruct, it's a good idea to connect your GPU virtual machine (VM) to a code editor of your choice. This makes writing, editing, and running code much easier.
- You can use popular editors like VS Code, Cursor, or any other IDE that supports SSH remote connections.
- In this example, we're using the Cursor code editor.
- Once connected, you’ll be able to browse files, edit scripts, and run commands directly on your remote server, just like working locally.
Why do this?
Connecting your VM to a code editor gives you a powerful, streamlined workflow for Python development, allowing you to easily manage your code, install dependencies, and experiment with large models.
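If you use VS Code or Cursor over Remote-SSH, an entry like the following in ~/.ssh/config makes connecting easier; the host alias, user, IP, and key path below are placeholders, so substitute the values shown on your NodeShift deployment page:
# ~/.ssh/config (placeholder values)
Host nodeshift-gpu
    HostName <your-vm-ip-or-proxy-host>
    User <your-vm-user>
    Port 22
    IdentityFile ~/.ssh/<your-private-key>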
Step 18: Create the Script
Create a file (e.g., app.py) and add the following code:
import os
os.environ["TRANSFORMERS_ATTENTION_IMPLEMENTATION"] = "sdpa" # force SDPA
from transformers import (
Qwen3OmniMoeForConditionalGeneration,
Qwen3OmniMoeProcessor,
logging as hf_logging,
)
from qwen_omni_utils import process_mm_info
import torch
import soundfile as sf
hf_logging.set_verbosity_error()
MODEL_PATH = "Qwen/Qwen3-Omni-30B-A3B-Instruct"
USE_AUDIO_IN_VIDEO = True
# Prefer SDPA; keep FA2 disabled
torch.backends.cuda.enable_flash_sdp(False)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(True)
# Load model/processor
model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
MODEL_PATH,
device_map="auto",
torch_dtype="auto",
attn_implementation="sdpa",
low_cpu_mem_usage=True,
)
processor = Qwen3OmniMoeProcessor.from_pretrained(MODEL_PATH)
# Conversation
conversation = [
{
"role": "user",
"content": [
{"type": "image", "image": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cars.jpg"},
{"type": "audio", "audio": "https://qianwen-res.oss-cn-beijing.aliyuncs.com/Qwen3-Omni/demo/cough.wav"},
{"type": "text", "text": "What can you see and hear? Answer in one short sentence."}
],
},
]
# Prepare inputs
text = processor.apply_chat_template(conversation, add_generation_prompt=True, tokenize=False)
audios, images, videos = process_mm_info(conversation, use_audio_in_video=USE_AUDIO_IN_VIDEO)
inputs = processor(
text=text,
audio=audios,
images=images,
videos=videos,
return_tensors="pt",
padding=True,
use_audio_in_video=USE_AUDIO_IN_VIDEO,
)
# Move to device; cast ONLY floating tensors
for k, v in inputs.items():
if isinstance(v, torch.Tensor):
if torch.is_floating_point(v):
inputs[k] = v.to(model.device, dtype=model.dtype)
else:
inputs[k] = v.to(model.device)
# Generate (ask for dict-style outputs)
with torch.inference_mode():
gen_out = model.generate(
**inputs,
speaker="Ethan",
use_audio_in_video=USE_AUDIO_IN_VIDEO,
return_dict_in_generate=True, # <-- ensure .sequences
thinker_return_dict_in_generate=True, # <-- thinker submodule returns dict too
# max_new_tokens=128, # (optional) keep runtime predictable
)
# Some builds return a tuple (text_out, audio); handle both
audio = None
text_out = gen_out
if isinstance(gen_out, tuple):
text_out, audio = gen_out
else:
audio = getattr(gen_out, "audio", None)
# Get sequences tensor whether it's a ModelOutput or a plain tensor
sequences = getattr(text_out, "sequences", text_out)
# Decode just the newly generated tokens
prompt_len = inputs["input_ids"].shape[1]
decoded = processor.batch_decode(
sequences[:, prompt_len:],
skip_special_tokens=True,
clean_up_tokenization_spaces=False,
)
print(decoded)
# Save audio if present
if audio is not None:
if isinstance(audio, torch.Tensor):
audio_np = audio.reshape(-1).detach().cpu().numpy()
else:
# already numpy or list
import numpy as np
audio_np = np.asarray(audio).reshape(-1)
sf.write("output.wav", audio_np, samplerate=24000)
print("Saved: output.wav")
What This Script Does
- Forces Transformers to use SDPA attention (not FlashAttention) for compatibility.
- Loads the Qwen3-Omni-30B-A3B-Instruct multimodal model + processor on GPU with torch_dtype="auto".
- Builds a multimodal chat: one image URL, one audio URL, and a short user text prompt.
- Uses qwen_omni_utils.process_mm_info + the processor to prepare tensors for text, audio, image (and optional video).
- Moves inputs to the model's device, casting only floating tensors to the model dtype (keeps integer IDs intact).
- Calls model.generate(...) with dict-style outputs enabled, speaker="Ethan", and thinker outputs on.
- Decodes just the newly generated text (skips the prompt tokens) and prints the response.
- If the model returns audio, saves it to output.wav at 24 kHz.
- Silences most HF warnings for a cleaner log.
- Uses torch.inference_mode() for memory/speed efficiency during generation.
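Since Step 14 already installed flash-attn, you can optionally try FlashAttention-2 instead of SDPA. The snippet below is a sketch of the change, not a guaranteed configuration; if you hit kernel or dtype errors, fall back to the SDPA version above:
# Optional: swap SDPA for FlashAttention-2 (also remove the
# TRANSFORMERS_ATTENTION_IMPLEMENTATION override and the torch.backends.cuda.* toggles)
model = Qwen3OmniMoeForConditionalGeneration.from_pretrained(
    MODEL_PATH,
    device_map="auto",
    torch_dtype=torch.bfloat16,          # BF16 is the usual pairing with FlashAttention-2
    attn_implementation="flash_attention_2",
    low_cpu_mem_usage=True,
)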
Step 19: Run the Script
Run the script with the following command:
python3 app.py
This will download the model checkpoints on the first run and print the generated response in the terminal (plus save output.wav if audio is returned).
Conclusion
That's it: you've got Qwen3-Omni-30B-A3B-Instruct running end-to-end on a NodeShift GPU VM, with clean SDPA attention (no FlashAttention headaches), multimodal inputs (image + audio), and text plus speech output. This setup is reproducible, stable, and ready for experiments, whether you're testing ASR/AV, building a voice chat demo, or benchmarking against your own datasets.
Next up, try serving with vLLM for throughput, switch to the Thinking variant for chain-of-thought text only, and plug in your own media streams. If you share results, tag NodeShift—we’d love to see what you build.
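If you go the vLLM route, a serving sketch might look like the command below; note that Qwen3-Omni support may require a recent vLLM build (check the model card for the exact install steps), and the flags are illustrative rather than tuned:
vllm serve Qwen/Qwen3-Omni-30B-A3B-Instruct \
  --dtype bfloat16 \
  --max-model-len 32768 \
  --tensor-parallel-size 1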