Available for Projects · UTC+8 Taiwan

I deploy AI that
actually stays running.

AI Deployment Engineer based in Taiwan. I run 14 LLMs on a self-hosted GPU server with 22 production services live 24/7. I build systems that work in the real world — not just in demos.

Start a Project → See the Infrastructure
14 LLM Models in Production
22 AI Services Running 24/7
6K+ Vector Memory Chunks (RAG)
0 Cloud Lock-in
100% Data Sovereignty
What I Build

AI that runs in your business,
not on a slide deck.

Every project is deployed on real infrastructure and tested against actual workflows — not mocked up for a presentation.

🤖
AI Business Automation
Replace repetitive workflows with AI agents. Quote generation, scheduling, customer triage — real digital employees, not chatbots.
🧠
RAG Knowledge Base
Turn 10 years of SOPs, specs, and emails into a queryable AI brain. Semantic search — finds meaning, not just keywords.
Hybrid Cloud-Edge AI
Sensitive data stays on your server. AI runs locally. You get full performance without handing your data to third parties.
👁️
Computer Vision
YOLO-based real-time detection. Production line anomalies, warehouse inventory, perimeter alerts — 24/7 without blinking.
💬
AI Secretary / Chatbot
Always-online AI that knows your business. Answers customer queries, books appointments, escalates to humans when needed.
📡
AEO Optimization
Make ChatGPT, Perplexity, and Gemini recommend your business when people ask about what you offer. The next layer beyond traditional SEO.
Real Infrastructure

This isn't a demo environment.
This is what I run daily.

GX10 self-built GPU server. 14 models. 22 services. The same infrastructure I use to build your project.

AI Control Center: Real-time monitoring of all 14 LLMs, knowledge base stats (6,021 chunks), and resource usage dashboards
Self-hosted GPU Server: NVIDIA GPU running inference locally, with CPU/RAM/GPU monitoring. Zero cloud dependency for core inference
Vector Memory System: ChromaDB + nomic-embed-text (768-dim), 6,021 memory chunks. Real semantic search across all knowledge bases
22 Live Services: All running through Cloudflare Tunnel, with uptime monitoring, per-service traffic analytics, and zero-downtime deployments
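Serving local services through Cloudflare Tunnel means no inbound ports are ever opened on the server; `cloudflared` connects outbound and routes public hostnames to local ports. A minimal ingress config sketch (tunnel ID, hostnames, and ports are placeholders, not the actual production setup):

```yaml
# /etc/cloudflared/config.yml (all IDs and hostnames are placeholders)
tunnel: <TUNNEL_ID>
credentials-file: /etc/cloudflared/<TUNNEL_ID>.json

ingress:
  # Each local service gets its own public hostname
  - hostname: dashboard.example.com
    service: http://localhost:3000
  - hostname: rag.example.com
    service: http://localhost:8000
  # The catch-all rule is required and must come last
  - service: http_status:404
```

Adding a service is one new ingress rule plus a DNS route; nothing about the server's firewall changes.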
AI Control Center — Live production dashboard
GPU Server Monitor — Real-time inference load
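Semantic search, as described above, ranks chunks by embedding similarity rather than keyword overlap. A toy illustration of the idea using hand-made 4-dimensional vectors (a real deployment would store 768-dim nomic-embed-text vectors in ChromaDB; the chunk names and vectors here are purely hypothetical):

```python
import math

def cosine(a, b):
    # Cosine similarity: dot product over the product of magnitudes.
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / mag

# Toy 4-dim "embeddings" standing in for real 768-dim model output.
chunks = {
    "refund policy": [0.90, 0.10, 0.00, 0.10],
    "shipping times": [0.10, 0.80, 0.20, 0.00],
    "return procedure": [0.85, 0.20, 0.10, 0.10],
}

def search(query_vec, k=2):
    # Rank all chunks by similarity to the query embedding, return top-k.
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, chunks[c]), reverse=True)
    return ranked[:k]

# A query about "getting money back" lands near the refund/return chunks
# even though it shares no literal keywords with them.
print(search([0.88, 0.15, 0.05, 0.10]))  # ['refund policy', 'return procedure']
```

The production pipeline is the same shape: embed the query, compare against stored chunk vectors, feed the top matches to the LLM as context.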
Tech Stack

Tools I actually use.

Not a buzzword list. These are the technologies running in production on my server right now.

LLMs: Llama 3 · Qwen · Gemma · Mistral · DeepSeek
Inference: Ollama · vLLM · llama.cpp · GGUF
RAG: ChromaDB · FAISS · nomic-embed · LangChain
Vision: YOLO · OpenCV · GStreamer · RTSP
Automation: n8n · Python agents · Claude Code
Infrastructure: Proxmox LXC · Docker · Cloudflare
Frontend: Next.js · React · Playwright
Data: SQLite · PostgreSQL · Redis · Vector DBs
FAQ

Common questions.

What makes you different from other AI developers?

Most AI developers wrap an API call. I run the infrastructure. 14 LLMs on a self-hosted GPU server means I understand what breaks in production, not just what works in a Jupyter notebook. I've built systems that are still running 6 months later — not demos that get shelved.

Can you work with confidential or proprietary data?

Yes — it's actually my specialty. I build on-premise and hybrid deployments where your data never leaves your server. No sending documents to OpenAI. Your SOPs, customer data, and internal knowledge stay on your infrastructure.

What's your typical project scope?

From focused builds (a RAG knowledge base over your documents, 2–3 weeks) to full AI infrastructure setups (hybrid architecture + automation pipelines + monitoring, 2–3 months). I prefer scoped projects with clear deliverables over open-ended retainers.

What industries have you worked in?

Manufacturing (DXF laser cutting quoting), textile/fashion (AI pattern generation + virtual try-on), logistics, and business process automation across multiple SMEs in Taiwan. The problems are different; the architecture patterns are often the same.

What's your timezone and availability?

UTC+8 (Taiwan). Available for async collaboration with any timezone. For real-time calls: easy overlap with Singapore, Japan, and Australia, and partial overlap with European mornings. US projects work well async with daily updates.

Ready to build AI that
actually works?

Tell me the problem. I'll tell you if AI can solve it — and if it can, I'll build it so it keeps running.

Email me → 中文版 (Chinese version)