Deploy Qwen3-Coder on VPS: Step-by-Step Guide to Build Your Own AI Coding Assistant
This guide shows you how to deploy Alibaba's open-source Qwen3-Coder model on a LightNode VPS, exposing it as an API service with an optional web frontend. It is a practical starting point for launching your own AI coding assistant or monetizing the model through an API.
Overview
- Purchase a VPS (LightNode)
- Install the base environment (Python + Git)
- Download and run the Qwen3-Coder model
- Build an API with FastAPI
- (Optional) Add a frontend (Gradio)
- Test external access & configure security
1. Purchase a VPS (e.g., LightNode)
- Register: www.lightnode.com
- Recommended locations: Japan, Hong Kong, Singapore
- OS: Ubuntu 20.04 LTS
- Specs: 2 vCPU, 4GB RAM (no GPU required for CPU mode)
- After setup, save the public IP and root password
2. Install Required Dependencies
SSH into your VPS:
ssh root@your_vps_ip
Update system packages:
apt update && apt upgrade -y
Install Python and Git:
apt install python3-pip git -y
pip3 install --upgrade pip
3. Download and Run Qwen3-Coder (CPU Version)
Install the required Python packages (HuggingFace Transformers plus the API stack used below):
pip install transformers accelerate torch fastapi uvicorn
Create a file qwen_server.py. Note: the full Qwen3-Coder model is far too large for a 2 vCPU / 4 GB VPS, so this example loads the lightweight Qwen1.5-0.5B-Chat checkpoint as a CPU-friendly stand-in; swap in a larger Qwen model if your hardware allows.
from transformers import AutoModelForCausalLM, AutoTokenizer
from fastapi import FastAPI, Request
import uvicorn
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True).eval()
app = FastAPI()
@app.post("/codegen")
async def codegen(request: Request):
    data = await request.json()
    prompt = data.get("prompt")
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    outputs = model.generate(input_ids, max_new_tokens=256)
    response = tokenizer.decode(outputs[0], skip_special_tokens=True)
    return {"result": response}

if __name__ == "__main__":
    uvicorn.run(app, host="0.0.0.0", port=7860)
Start the service:
python3 qwen_server.py
Example API call:
POST http://your_ip:7860/codegen
BODY: { "prompt": "Write a Python web scraper" }
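The same call can be made from Python using only the standard library. This is a minimal client sketch; `build_request` and `call_codegen` are hypothetical helper names, and `base_url` should point at your VPS:

```python
import json
import urllib.request

def build_request(base_url: str, prompt: str) -> urllib.request.Request:
    """Build the POST request for the /codegen endpoint."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/codegen",
        data=body,
        headers={"Content-Type": "application/json"},
    )

def call_codegen(base_url: str, prompt: str) -> str:
    """Send the request and return the generated text from the JSON response."""
    with urllib.request.urlopen(build_request(base_url, prompt)) as resp:
        return json.loads(resp.read())["result"]

# Example (requires the qwen_server.py service to be running):
# print(call_codegen("http://your_ip:7860", "Write a Python web scraper"))
```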
4. Optional: Add a Frontend (Gradio)
Install Gradio:
pip install gradio
Create a new file qwen_gradio.py:
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen1.5-0.5B-Chat", trust_remote_code=True).eval()
def generate_code(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=256)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.Interface(
    fn=generate_code,
    inputs="text",
    outputs="text",
    title="Qwen3-Coder API Demo"
).launch(server_name="0.0.0.0", server_port=7860)
Launch it (stop the FastAPI server first, or pick a different server_port, since both services default to port 7860):
python3 qwen_gradio.py
Open in browser: http://your_ip:7860
5. Security Suggestions
Enable UFW firewall:
apt install ufw
ufw allow OpenSSH
ufw allow 7860
ufw enable
For production use, consider using Nginx + Let's Encrypt for HTTPS and domain support.
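As a sketch, an Nginx reverse-proxy server block for the service might look like the following. The domain and certificate paths are placeholders; the certificates are assumed to come from Let's Encrypt via Certbot:

```nginx
server {
    listen 443 ssl;
    server_name example.com;  # placeholder domain

    # paths created by Certbot for example.com
    ssl_certificate     /etc/letsencrypt/live/example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/example.com/privkey.pem;

    location / {
        # forward traffic to the FastAPI/Gradio service on the VPS
        proxy_pass http://127.0.0.1:7860;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }
}
```

With this in place you can close port 7860 to the outside world in UFW and expose only 443 through Nginx.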
Suggested Project Structure
qwen-server/
├── qwen_server.py    # FastAPI backend API
├── qwen_gradio.py    # Gradio web UI
├── requirements.txt  # (Optional) dependency list
└── README.md         # Project description
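A minimal requirements.txt for this setup could simply list the packages installed in this guide (versions are left unpinned here; pin them for reproducible deployments):

```text
transformers
accelerate
torch
fastapi
uvicorn
gradio
```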
Monetization Ideas & Use Cases
SaaS Coding Assistant: Build your own "GPT for coding" tool
Public API Service: Charge per call or subscription
AI-Powered Teaching Platform: Use model to auto-generate code & tutorials
Custom Automation Services: Script generation, code conversion, documentation
Summary
| Component | Tool | Purpose |
|---|---|---|
| Model | Qwen3-Coder | Open-source code generation model |
| Hosting | LightNode VPS | Low-cost, global cloud platform |
| API | FastAPI | Lightweight Python API framework |
| Frontend | Gradio | Rapid demo UI builder |