JanusCoderV-8B is an 8B multimodal code-intelligence model from InternLM's JanusCoder suite, built on InternVL-3.5-8B. Trained on JANUSCODE-800K, it unifies visual and programmatic inputs to generate and edit code for charts, interactive web UIs, and animation logic. It supports image-conditioned code generation, visually grounded edits, and long outputs (the official demo sets max_new_tokens as high as 32K), and it runs on standard Transformers (this guide installs ≥ 4.57.0) with AutoProcessor, the AutoModelForImageTextToText head used throughout this tutorial, and remote code enabled.
Performance
| Benchmark | JanusCoderV-8B | Qwen2.5VL-7B-Instruct | InternVL3-8B | InternVL3.5-8B | MiniCPM-V-2-6 | Llama3.2-11B-Vision-Instruct | GPT-4o |
|---|---|---|---|---|---|---|---|
| ChartMimic (Customized) | 74.20 | 58.69 | 60.04 | 59.55 | 48.18 | 39.63 | 67.42 |
| DesignBench (Gen) | 68.86 | 72.73 | 69.34 | 71.73 | 66.25 | 62.24 | 76.83 |
| DesignBench (Edit) | 8.63 | 6.85 | 7.76 | 8.63 | 4.56 | 6.61 | 9.23 |
| WebCode2M | 18.28 | 12.83 | 12.40 | 11.95 | 9.73 | 6.57 | 13.00 |
| InteractScience (Func.) | 17.60 | 8.40 | 8.93 | 11.47 | 0.13 | 6.67 | 27.20 |
| InteractScience (Visual) | 33.32 | 19.83 | 53.35 | 24.17 | 7.70 | 13.24 | 46.01 |
GPU Configuration
| GPU | VRAM | Recommended Load | Suggested Batch | Context (approx) | Images/Prompt | What to Expect |
|---|---|---|---|---|---|---|
| T4 | 16 GB | 4-bit (bnb), vision on GPU | 1 | 4–8K | 1 | Bare-minimum dev; slow decoding |
| L4 | 24 GB | 8-bit (bnb) or 4-bit; BF16 vision | 1–2 | 8–16K | 1–2 | Good for demos, light service |
| RTX 3060 | 12 GB | 4-bit (bnb) | 1 | 4–8K | 1 | Fits with tight cache; slower |
| RTX 3070 (8G) | 8 GB | 4-bit + offload | 1 | ≤4K | 1 | Only for experiments; very tight |
| RTX 3080 (10G) | 10 GB | 4-bit (bnb) | 1 | 4–8K | 1 | Similar to 3060; tight headroom |
| RTX 3090 | 24 GB | 8-bit (bnb) or 4-bit; BF16 vision | 2–3 | 8–16K | 1–2 | Solid single-GPU dev box |
| RTX 4080 | 16 GB | 8-bit (bnb) | 1–2 | 8–12K | 1 | Fast; watch KV cache growth |
| RTX 4090 | 24 GB | 8-bit (bnb) or FP16/BF16 w/ small ctx | 2–4 | 8–16K (FP16 with care) | 1–2 | Great workstation; good latency |
| RTX A4000 | 16 GB | 8-bit (bnb) | 1–2 | 8–12K | 1 | Stable, moderate throughput |
| RTX A5000 | 24 GB | 8-bit (bnb) or FP16 (tight) | 2–3 | 8–16K | 1–2 | Good for small services |
| RTX A6000 (48G) | 48 GB | FP16/BF16 full | 4–8 | 16–32K | 2–4 | Smooth long outputs; strong |
| L40 (48G) | 48 GB | FP16/BF16 full | 4–8 | 16–32K | 2–4 | Datacenter class; reliable |
| L40S (48G) | 48 GB | FP16/BF16 full | 6–10 | 16–32K | 2–4 | Faster than L40; great serving |
| A100 (40G) | 40 GB | FP16/BF16 full | 4–6 | 16–24K | 2–3 | Proven workhorse |
| A100 (80G) | 80 GB | FP16/BF16 + big KV | 8–12 | 32K+ | 3–5 | Long code outputs; high throughput |
| H100 (80G) | 80 GB | FP16/BF16 + big KV | 10–16 | 32K+ | 3–6 | Top-tier speed; great latency |
| H200 (141G) | 141 GB | FP16/BF16, huge cache | 16–24 | 64K+ | 4–8 | Extreme context/throughput |
| 2× 24–48G (e.g., 2×3090 / 2×A5000 / 2×A6000) | — | Tensor Parallel (TP=2), BF16 | 8–16 | 16–32K | 2–4 | Scale when single-GPU is tight |
| 2× 80G (2×A100/H100) | — | TP=2, BF16 | 16–32 | 32K+ | 4–8 | High-throughput production |
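For the 4-bit rows above, note that the runner script in Step 13 only exposes an 8-bit flag. The snippet below is a minimal sketch of a 4-bit load using bitsandbytes' NF4 quantization (standard Transformers/bitsandbytes APIs); whether 4-bit preserves enough quality for your code-generation tasks is something to verify on your own workload.

import torch
from transformers import AutoModelForImageTextToText, AutoProcessor, BitsAndBytesConfig

MODEL_NAME = "internlm/JanusCoderV-8B"

# NF4 4-bit quantization; compute in BF16 where supported
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    MODEL_NAME,
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)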
Resources
Link: https://huggingface.co/internlm/JanusCoderV-8B
Step-by-Step Process to Install & Run JanusCoder Locally
For the purpose of this tutorial, we will use a GPU-powered Virtual Machine offered by NodeShift; however, you can replicate the same steps with any other cloud provider of your choice. NodeShift provides the most affordable Virtual Machines at a scale that meets GDPR, SOC2, and ISO27001 requirements.
Step 1: Sign Up and Set Up a NodeShift Cloud Account
Visit the NodeShift Platform and create an account. Once you’ve signed up, log into your account.
Follow the account setup process and provide the necessary details and information.
Step 2: Create a GPU Node (Virtual Machine)
GPU Nodes are NodeShift’s GPU Virtual Machines, on-demand resources equipped with diverse GPUs ranging from H200s to A100s. These GPU-powered VMs provide enhanced environmental control, allowing configuration adjustments for GPUs, CPUs, RAM, and Storage based on specific requirements.
Navigate to the menu on the left side, select the GPU Nodes option, click the Create GPU Node button in the Dashboard, and deploy your first Virtual Machine.
Step 3: Select a Model, Region, and Storage
In the “GPU Nodes” tab, select a GPU Model and Storage according to your needs and the geographical region where you want to launch your model.
We will use 1 x H100 SXM GPU for this tutorial to achieve the fastest performance. However, you can choose a more affordable GPU with less VRAM if that better suits your requirements.
Step 4: Select Authentication Method
There are two authentication methods available: Password and SSH Key. SSH keys are a more secure option. To create them, please refer to our official documentation.
Step 5: Choose an Image
In our previous blogs, we used pre-built images from the Templates tab when creating a Virtual Machine. However, for running JanusCoder, we need a more customized environment with full CUDA development capabilities. That’s why, in this case, we switched to the Custom Image tab and selected a specific Docker image that meets all runtime and compatibility requirements.
We chose the following image:
nvidia/cuda:12.1.1-devel-ubuntu22.04
This image is essential because it includes:
- Full CUDA toolkit (including nvcc)
- Proper support for building and running GPU-based models like JanusCoder
- Compatibility with CUDA 12.1.1, required by certain model operations
Launch Mode
We selected:
Interactive shell server
This gives us SSH access and full control over terminal operations — perfect for installing dependencies, running benchmarks, and launching models like JanusCoder.
Docker Repository Authentication
We left all fields empty here.
Since the Docker image is publicly available on Docker Hub, no login credentials are required.
Identification
nvidia/cuda:12.1.1-devel-ubuntu22.04
CUDA and cuDNN images from gitlab.com/nvidia/cuda. The devel variant contains the full CUDA toolkit, including nvcc.
This setup ensures that JanusCoder runs in a GPU-enabled environment with proper CUDA access and high compute performance.
After choosing the image, click the ‘Create’ button, and your Virtual Machine will be deployed.
Step 6: Virtual Machine Successfully Deployed
You will get visual confirmation that your node is up and running.
Step 7: Connect to GPUs using SSH
NodeShift GPUs can be connected to and controlled through a terminal using the SSH key provided during GPU creation.
Once your GPU Node deployment is successfully created and has reached the ‘RUNNING’ status, you can navigate to the page of your GPU Deployment Instance. Then, click the ‘Connect’ button in the top right corner.
Now open your terminal and paste the proxy SSH IP or direct SSH IP.
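The exact command is shown on the instance's Connect tab; it generally looks like the following (the key path, host, and port are placeholders, so substitute your own values):

ssh -i ~/.ssh/<your-key> root@<ssh-ip> -p <ssh-port>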
Next, if you want to check the GPU details, run the command below:
nvidia-smi
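Since we deployed the devel CUDA image, the full toolkit is also present; you can optionally confirm it with:

nvcc --version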
Step 8: Install Python 3.11 and Pip (the VM ships with Python 3.10; we upgrade it)
The system has Python 3.10.12 available by default (you can confirm this with python3 --version). To install a higher version of Python, you'll need to use the deadsnakes PPA.
Run the following commands to add the deadsnakes PPA:
apt update && apt install -y software-properties-common curl ca-certificates
add-apt-repository -y ppa:deadsnakes/ppa
apt update
Now, run the following commands to install Python 3.11, Pip and Wheel:
apt install -y python3.11 python3.11-venv python3.11-dev
python3.11 -m ensurepip --upgrade
python3.11 -m pip install --upgrade pip setuptools wheel
python3.11 --version
python3.11 -m pip --version
Step 9: Create and Activate a Python 3.11 Virtual Environment
Run the following commands to create and activate a Python 3.11 virtual environment:
python3.11 -m venv ~/.venvs/py311
source ~/.venvs/py311/bin/activate
python --version
pip --version
Step 10: Install PyTorch for CUDA
Run the following command to install PyTorch:
pip install --index-url https://download.pytorch.org/whl/cu121 torch torchvision torchaudio
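Optionally, verify that the CUDA build of PyTorch is active before continuing:

python -c "import torch; print(torch.__version__, torch.cuda.is_available())"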
Step 11: Install Core Libs
Run the following command to install core libs:
pip install -U "transformers>=4.57.0" accelerate huggingface-hub safetensors pillow requests
pip install -U bitsandbytes
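A quick sanity check that the Transformers version meets the ≥4.57.0 requirement:

python -c "import transformers; print(transformers.__version__)"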
Step 12: Connect to Your GPU VM with a Code Editor
Before you start running the JanusCoderV-8B model script, it's a good idea to connect your GPU virtual machine (VM) to a code editor of your choice. This makes writing, editing, and running code much easier.
- You can use popular editors like VS Code, Cursor, or any other IDE that supports SSH remote connections.
- In this example, we're using the Cursor code editor.
- Once connected, you’ll be able to browse files, edit scripts, and run commands directly on your remote server, just like working locally.
Why do this?
Connecting your VM to a code editor gives you a powerful, streamlined workflow for Python development, allowing you to easily manage your code, install dependencies, and experiment with large models.
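As an illustration, Remote-SSH in VS Code or Cursor usually only needs a host entry in ~/.ssh/config; the alias, IP, port, and key path below are placeholders, so substitute the values from your deployment's Connect tab:

Host nodeshift-januscoder
    HostName <ssh-ip>
    Port <ssh-port>
    User root
    IdentityFile ~/.ssh/<your-key>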
Step 13: Create the Script
Create a file (for example, run_januscoder_v8b.py) and add the following code:
#!/usr/bin/env python3
# JanusCoderV-8B runner (InternVL head)
# Uses AutoModelForImageTextToText + AutoProcessor and supports URL/local images.
import argparse, io, sys, requests, torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForImageTextToText  # <-- key class

MODEL_NAME = "internlm/JanusCoderV-8B"

def load_image_from_url(url: str) -> Image.Image:
    r = requests.get(url, timeout=30)
    r.raise_for_status()
    return Image.open(io.BytesIO(r.content)).convert("RGB")

def load_image_local(path: str) -> Image.Image:
    return Image.open(path).convert("RGB")

def main():
    p = argparse.ArgumentParser()
    src = p.add_mutually_exclusive_group(required=True)
    src.add_argument("--image-url", type=str)
    src.add_argument("--image-path", type=str)
    p.add_argument("--task", type=str, default="Please describe the image explicitly.")
    p.add_argument("--max-new-tokens", type=int, default=1024)
    p.add_argument("--bits8", action="store_true", help="8-bit load (needs bitsandbytes)")
    p.add_argument("--no-bf16", action="store_true", help="Force FP16 inputs")
    args = p.parse_args()

    use_bf16 = (not args.no_bf16) and torch.cuda.is_available() and torch.cuda.is_bf16_supported()
    input_dtype = torch.bfloat16 if use_bf16 else torch.float16
    print(f"torch={torch.__version__} | cuda={torch.cuda.is_available()} | bf16_ok={use_bf16} | dtype={input_dtype}")

    print("Loading processor …")
    processor = AutoProcessor.from_pretrained(MODEL_NAME, trust_remote_code=True)

    print("Loading model …")
    load_kwargs = dict(device_map="auto", trust_remote_code=True)
    if args.bits8:
        load_kwargs["load_in_8bit"] = True
    else:
        load_kwargs["dtype"] = input_dtype
    model = AutoModelForImageTextToText.from_pretrained(MODEL_NAME, **load_kwargs).eval()

    # Build messages with either URL or PIL image
    content = []
    if args.image_url:
        content.append({"type": "image", "url": args.image_url})
    else:
        pil = load_image_local(args.image_path)
        content.append({"type": "image", "image": pil})
    content.append({"type": "text", "text": args.task})
    messages = [{"role": "user", "content": content}]

    print("Tokenizing …")
    inputs = processor.apply_chat_template(
        messages,
        add_generation_prompt=True,
        tokenize=True,
        return_dict=True,
        return_tensors="pt",
    )

    # Move input tensors to model device/dtype
    dev = next(iter(model.parameters())).device
    for k, v in list(inputs.items()):
        if torch.is_floating_point(v):
            inputs[k] = v.to(dev, dtype=input_dtype)
        else:
            inputs[k] = v.to(dev)

    print("Generating …")
    with torch.inference_mode():
        out_ids = model.generate(**inputs, max_new_tokens=args.max_new_tokens, do_sample=False, use_cache=True)

    prompt_len = inputs["input_ids"].shape[1]
    text = processor.decode(out_ids[0, prompt_len:], skip_special_tokens=True)
    print("\n" + "=" * 80 + "\nOUTPUT:\n" + "=" * 80)
    print(text)

if __name__ == "__main__":
    main()
What This Script Does
- Loads JanusCoderV-8B with the correct InternVL head (AutoModelForImageTextToText) and its processor.
- Accepts either an image URL or a local image path, plus a custom instruction/task.
- Builds a multimodal chat message (image + text), tokenizes it with the model's chat template, and moves tensors to the right device/dtype (BF16/FP16 or 8-bit).
- Runs generation (model.generate) with a configurable max_new_tokens, then decodes only the new tokens after the prompt.
- Prints a clean final output (the model's response) to the console.
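The flags listed above can be combined as needed; for example, a local screenshot (the file name is a placeholder) with an 8-bit load:

python run_januscoder_v8b.py \
  --image-path ./ui_screenshot.png \
  --task "Describe the layout of this UI." \
  --bits8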
Step 14: Run the Model for a Quick Test
Once everything is installed and the script (run_januscoder_v8b.py) is saved, run the following command to verify that the model works correctly:
python run_januscoder_v8b.py \
--image-url http://images.cocodataset.org/val2017/000000039769.jpg
What This Does
- Loads the JanusCoderV-8B multimodal model (InternVL-based) into GPU memory.
- Downloads and processes the sample COCO validation image.
- Sends the image and the prompt “Please describe the image explicitly.” to the model.
- The model generates a textual description of the image and prints it directly to the terminal.
When you see the generated description printed under the “OUTPUT” section, your JanusCoderV-8B setup is confirmed to be working properly.
Step 15: UI → HTML/CSS (From the Same URL)
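This step reuses the runner from Step 13 and only changes the instruction; the prompt wording and token budget below are illustrative rather than prescribed by the model card:

python run_januscoder_v8b.py \
  --image-url http://images.cocodataset.org/val2017/000000039769.jpg \
  --task "Recreate this image as a single self-contained HTML page with inline CSS. Return only the code." \
  --max-new-tokens 8192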
Conclusion
JanusCoderV-8B truly stands out as a next-generation visual-programmatic model, seamlessly connecting images, text, and code in one unified workflow.
By following this step-by-step guide, you’ve learned how to:
- Deploy a GPU-powered environment on NodeShift Cloud.
- Set up CUDA, Python, and PyTorch for model execution.
- Install dependencies and configure Transformers (≥4.57.0).
- Run JanusCoderV-8B locally using the AutoModelForImageTextToText pipeline.
With this setup, you can now generate detailed visual descriptions, create UI-to-code conversions, or even perform interactive design edits directly from images.
Whether you’re a developer, designer, or researcher, JanusCoderV-8B opens new possibilities for building intelligent, multimodal coding experiences powered by visual context.