Kiln Quickstart — Run Qwen3.5-4B in 5 minutes

Reader map

Know where to stop

Basic path

Get Kiln running first

Work through the prerequisites, installation, model download, and server start, then check /health and /ui. If chat works, the first-run path is complete.

Optional learning

Train after verification

Use SFT, GRPO, and adapter workflows only after the server is healthy and you have sent one chat request.

Advanced reference

Return when integrating

Use the API Reference and CLI Reference for tools, batch generation, adapter import/export, merge, composition, and webhooks.

1

Prerequisites + choose one path

Install Kiln

Desktop App · recommended

Download Kiln Desktop v0.2.15, then choose or download the Qwen3.5-4B model in the app and start the server from the GUI.

Platform	Installer	Size
macOS Apple Silicon	Kiln.Desktop_0.2.15_aarch64.dmg	8.5 MB
Windows	Kiln.Desktop_0.2.15_x64-setup.exe (NSIS)	4.5 MB
Windows	Kiln.Desktop_0.2.15_x64_en-US.msi (MSI)	6.8 MB
Linux	Kiln.Desktop_0.2.15_amd64.deb	8.8 MB
Linux	Kiln.Desktop_0.2.15_amd64.AppImage	85.7 MB

Desktop and server release lines intentionally differ: desktop-v0.2.15 is the latest Desktop app release, and the app downloads and verifies the latest kiln-v* server binary for you.

Linux x86_64 · CUDA 12.4

KILN_VERSION=$(curl -fsSL https://api.github.com/repos/ericflo/kiln/releases/latest | sed -n 's/.*"tag_name": "kiln-v\([^"]*\)".*/\1/p')
curl -L -o kiln-linux-cuda.tar.gz \
  "https://github.com/ericflo/kiln/releases/download/kiln-v${KILN_VERSION}/kiln-${KILN_VERSION}-x86_64-unknown-linux-gnu-cuda124.tar.gz"
tar -xzf kiln-linux-cuda.tar.gz
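The version lookup above prints nothing when the latest release tag does not start with kiln-v, which would leave KILN_VERSION empty and produce a broken download URL. Since the Desktop and server release lines differ, a guard may be worth adding. The following is a minimal sketch that reuses the same sed expression; the function name kiln_version_from_json is hypothetical and not part of Kiln.

```shell
# Hypothetical helper (not part of Kiln): read GitHub "latest release" JSON on
# stdin and print the server version without the kiln-v prefix.
# Returns non-zero when no kiln-v* tag is present, for example when the latest
# release turns out to be a desktop-v* tag.
kiln_version_from_json() {
  version=$(sed -n 's/.*"tag_name": "kiln-v\([^"]*\)".*/\1/p')
  if [ -z "$version" ]; then
    echo "no kiln-v* tag found in release metadata" >&2
    return 1
  fi
  printf '%s\n' "$version"
}

# Usage (requires network access):
#   KILN_VERSION=$(curl -fsSL https://api.github.com/repos/ericflo/kiln/releases/latest \
#     | kiln_version_from_json) || exit 1
```

The same guard applies to the Vulkan and Metal download snippets, which use the identical lookup.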

Linux x86_64 · Vulkan 1.2

KILN_VERSION=$(curl -fsSL https://api.github.com/repos/ericflo/kiln/releases/latest | sed -n 's/.*"tag_name": "kiln-v\([^"]*\)".*/\1/p')
curl -L -o kiln-linux-vulkan.tar.gz \
  "https://github.com/ericflo/kiln/releases/download/kiln-v${KILN_VERSION}/kiln-${KILN_VERSION}-x86_64-unknown-linux-gnu-vulkan.tar.gz"
tar -xzf kiln-linux-vulkan.tar.gz

Use this on AMD/Intel Linux systems where vulkaninfo --summary lists the GPU.

macOS Apple Silicon · Metal

KILN_VERSION=$(curl -fsSL https://api.github.com/repos/ericflo/kiln/releases/latest | sed -n 's/.*"tag_name": "kiln-v\([^"]*\)".*/\1/p')
curl -L -o kiln-macos.tar.gz \
  "https://github.com/ericflo/kiln/releases/download/kiln-v${KILN_VERSION}/kiln-${KILN_VERSION}-aarch64-apple-darwin-metal.tar.gz"
tar -xzf kiln-macos.tar.gz

Windows x86_64 · CUDA 12.4

$KilnVersion = ((Invoke-RestMethod https://api.github.com/repos/ericflo/kiln/releases/latest).tag_name -replace '^kiln-v', '')
curl.exe -L -o kiln-windows.zip `
  "https://github.com/ericflo/kiln/releases/download/kiln-v$KilnVersion/kiln-$KilnVersion-x86_64-pc-windows-msvc-cuda124.zip"
Expand-Archive .\kiln-windows.zip -DestinationPath .\kiln

2

Model path

Download Qwen3.5-4B

Point KILN_MODEL_PATH at a local checkout of Qwen/Qwen3.5-4B.

pip install huggingface-hub
huggingface-cli download Qwen/Qwen3.5-4B --local-dir ./Qwen3.5-4B
export KILN_MODEL_PATH=./Qwen3.5-4B
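Before starting the server, it can help to confirm that the download actually populated the directory. The sketch below only checks that the directory exists and contains a config.json, which Hugging Face model repositories normally ship. The function name check_model_dir is hypothetical, and Kiln's own validation of the model path may be stricter.

```shell
# Hypothetical helper (not part of Kiln): sanity-check a local model directory.
# Uses the first argument, or falls back to KILN_MODEL_PATH when none is given.
check_model_dir() {
  dir=${1:-${KILN_MODEL_PATH:-}}
  if [ -z "$dir" ] || [ ! -d "$dir" ]; then
    echo "model directory not found: ${dir:-<unset>}" >&2
    return 1
  fi
  # Assumption: a complete download includes config.json at the top level.
  if [ ! -f "$dir/config.json" ]; then
    echo "no config.json in $dir; the download may be incomplete" >&2
    return 1
  fi
  echo "model directory looks populated: $dir"
}

# Usage:
#   check_model_dir ./Qwen3.5-4B
```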

3

Start

Run the server

Server binaries bind to 127.0.0.1:8420 by default.

KILN_MODEL_PATH=./Qwen3.5-4B ./kiln serve

Docker (Linux + NVIDIA Container Toolkit)

docker run --gpus all -p 8420:8420 \
  -e KILN_MODEL_PATH=/models/Qwen3.5-4B \
  -v "$PWD/Qwen3.5-4B:/models/Qwen3.5-4B:ro" \
  ghcr.io/ericflo/kiln-server:latest serve

4

Verify

Open the UI, check health, send chat

Open the dashboard

Visit http://127.0.0.1:8420/ui to inspect status, adapters, training jobs, and quick inference from the dashboard.

Check `/health`

kiln health

curl -s http://127.0.0.1:8420/health \
  | python3 -m json.tool
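Loading the model can take a while, so the first health check may fail simply because the server is not ready yet. A minimal polling sketch follows, assuming /health answers with a 2xx status once the server is up. The function name wait_for_health and its default attempt count and delay are illustrative choices, not Kiln settings.

```shell
# Hypothetical helper (not part of Kiln): poll a health URL until it responds.
# Arguments: URL, number of attempts, delay in seconds between attempts.
wait_for_health() {
  url=${1:-http://127.0.0.1:8420/health}
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    # -f makes curl exit non-zero on HTTP errors; --max-time bounds each try.
    if curl -fsS --max-time 2 "$url" >/dev/null 2>&1; then
      echo "healthy after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "no healthy response from $url after $attempts attempt(s)" >&2
  return 1
}

# Usage:
#   wait_for_health && echo "ready for the chat request"
```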

Send chat

curl -s http://127.0.0.1:8420/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "messages": [{"role": "user", "content": "What is 2+2?"}],
    "max_tokens": 64,
    "temperature": 0.7
  }' | python3 -m json.tool

This is the first inference checkpoint: get one response before moving on to SFT or GRPO.
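To see only the model's reply rather than the full JSON, the response can be piped through a small filter. This sketch assumes the endpoint returns an OpenAI-style body with the text at choices[0].message.content, which the /v1/chat/completions path suggests but this page does not confirm. The function name extract_reply is hypothetical.

```shell
# Hypothetical helper (not part of Kiln): print the assistant text from a chat
# completion response read on stdin.
# Assumption: the body follows the OpenAI-style shape
#   {"choices": [{"message": {"content": "..."}}]}
extract_reply() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["choices"][0]["message"]["content"])'
}

# Usage: replace the trailing "| python3 -m json.tool" in the chat request
# with "| extract_reply".
```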

Figure: Kiln dashboard showing server status, adapters, training, and quick inference controls. Use the dashboard as your status checkpoint before starting adapter or training workflows.

If /ui, /health, or the chat request fails, use Troubleshooting to check the model path, binary download, CUDA/Vulkan/Metal setup, Docker, and health checks before retrying.

Prefer terminal-first checks? See the CLI Reference for kiln health, training, and adapter commands.

Where to go next

Training file shapes at a glance

kiln train sft reads SFT JSONL: one chat correction per line, each with a messages array. kiln train grpo reads a GRPO JSON request/batch: groups of candidate completions with reward scores. See the GRPO Guide for the generate → score → train loop and the CLI Reference for full command examples.
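As a rough illustration of the SFT shape described above, the sketch below writes a two-line sample file and checks that every line is a JSON object with a messages array. The role and content keys, the file name sft_sample.jsonl, and the function validate_sft_jsonl are assumptions for illustration; the CLI Reference remains the authority on the exact schema.

```shell
# Write a small sample file: one JSON object per line, each with "messages".
# Assumption: messages use OpenAI-style "role" and "content" keys.
cat > sft_sample.jsonl <<'EOF'
{"messages": [{"role": "user", "content": "What is 2+2?"}, {"role": "assistant", "content": "4"}]}
{"messages": [{"role": "user", "content": "Name a primary color."}, {"role": "assistant", "content": "Red."}]}
EOF

# Hypothetical helper (not part of Kiln): fail if any non-empty line is not a
# JSON object containing a "messages" list.
validate_sft_jsonl() {
  python3 - "$1" <<'PY'
import json
import sys

with open(sys.argv[1]) as f:
    for n, line in enumerate(f, 1):
        if not line.strip():
            continue  # tolerate blank lines
        record = json.loads(line)  # raises on malformed JSON
        if not isinstance(record, dict) or not isinstance(record.get("messages"), list):
            sys.exit(f"line {n}: expected an object with a messages array")
print("ok")
PY
}

validate_sft_jsonl sft_sample.jsonl
```

Running a check like this before kiln train sft catches malformed lines early, before a training job is started.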

SFT corrections

GRPO Guide

API Reference

CLI Reference

Troubleshooting

Demo

Architecture
