Deploy Qwen3-Coder on VPS: Step-by-Step Guide to Build Your Own AI Coding Assistant
This guide shows you how to deploy Alibaba's open-source Qwen3-Coder model on a LightNode VPS, exposing it as an API service with an optional web frontend. It is a practical starting point for launching your own AI coding assistant or monetizing the model through an API.
Overview
- Purchase a VPS (LightNode)
- Install the base environment (Python + Git)
- Download and run the Qwen3-Coder model
- Build an API with FastAPI
- (Optional) Add a frontend (Gradio)
- Test external access & configure security
1. Purchase a VPS (e.g., LightNode)
- Register: www.lightnode.com
- Recommended locations: Japan, Hong Kong, Singapore
- OS: Ubuntu 20.04 LTS
- Specs: 2 vCPU, 4GB RAM (no GPU required for CPU mode)
- After setup, save the public IP and root password
2. Install Required Dependencies
SSH into your VPS:
ssh root@your_vps_ip
Update system packages:
apt update && apt upgrade -y
Install Python and Git:
apt install python3-pip git -y
pip3 install --upgrade pip
3. Download and Run Qwen3-Coder (CPU Version)
Install the required Python packages (HuggingFace Transformers plus the API stack used below):
pip install transformers accelerate torch fastapi uvicorn
Create a file qwen_server.py. Note: the full Qwen3-Coder model is far too large for a 2 vCPU / 4 GB VPS, so this example loads the lightweight Qwen1.5-0.5B-Chat checkpoint as a CPU-friendly stand-in; swap in a larger Qwen model if your hardware allows.
from transformers import AutoModelForCausalLM, AutoTokenizer
from fastapi import FastAPI, Request
import uvicorn
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True).eval()
app = FastAPI()
@app.post("/codegen")
async def codegen(request: Request):
    data = await request.json()
    prompt = data.get("prompt")
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_new_tokens=256)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return {"result": response}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)
Start the service:
python3 qwen_server.py
Example API call:
POST http://your_ip:7860/codegen
BODY: { "prompt": "Write a Python web scraper" }
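The same call can be made from Python using only the standard library. This is a minimal client sketch; `build_request` and `call_codegen` are hypothetical helper names, and `base_url` should point at your VPS:

```python
import json
import urllib.request

def build_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build the POST request for the /codegen endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/codegen",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def call_codegen(base_url: str, prompt: str) -> str:
    """Send the request and return the generated text from the JSON response."""
    with urllib.request.urlopen(build_request(base_url, prompt)) as resp:
        return json.loads(resp.read())["result"]

# Example (requires the qwen_server.py service to be running):
# print(call_codegen("http://your_ip:7860", "Write a Python web scraper"))
```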
4. Optional: Add a Frontend (Gradio)
Install Gradio:
pip install gradio
Create a new file qwen_gradio.py:
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True).eval()
def generate_code(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.Interface(
    fn=generate_code,
    inputs="text",
    outputs="text",
    title="Qwen3-Coder API Demo"
).launch(server_name="0.0.0.0", server_port=7860)
Launch it (stop the FastAPI server first, or pick a different server_port, since both services default to port 7860):
python3 qwen_gradio.py
Open in browser: http://your_ip:7860
5. Security Suggestions
Enable UFW firewall:
apt install ufw
ufw allow OpenSSH
ufw allow 7860
ufw enable
For production use, consider using Nginx + Let's Encrypt for HTTPS and domain support.
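As a sketch, an Nginx reverse-proxy server block for the service might look like the following. The domain and certificate paths are placeholders; the certificates are assumed to come from Let's Encrypt via Certbot:

```nginx
server {
    listen 443 ssl;
    server_name example.com;  # placeholder domain

    # paths created by Certbot for example.com
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        # forward traffic to the FastAPI/Gradio service on the VPS
        proxy_pass http://127.0.0.1:7860;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

With this in place you can close port 7860 to the outside world in UFW and expose only 443 through Nginx.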
Suggested Project Structure
qwen-server/
├── qwen_server.py    # FastAPI backend API
├── qwen_gradio.py    # Gradio web UI
├── requirements.txt  # (Optional) dependency list
└── README.md         # Project description
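A minimal requirements.txt for this setup could simply list the packages installed in this guide (versions are left unpinned here; pin them for reproducible deployments):

```text
transformers
accelerate
torch
fastapi
uvicorn
gradio
```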
Monetization Ideas & Use Cases
SaaS Coding Assistant: Build your own "GPT for coding" tool
Public API Service: Charge per call or subscription
AI-Powered Teaching Platform: Use model to auto-generate code & tutorials
Custom Automation Services: Script generation, code conversion, documentation
Summary
| Component | Tool | Purpose |
|---|---|---|
| Model | Qwen3-Coder | Open-source code generation model |
| Hosting | LightNode VPS | Low-cost, global cloud platform |
| API | FastAPI | Lightweight Python API framework |
| Frontend | Gradio | Rapid demo UI builder |