Troubleshooting · first-run checks

If Kiln does not start cleanly, check these first.

This page is a compact diagnostic guide for first-run setup issues. It does not replace the Quickstart; use it when a command fails, /health is not green, or the server is running but not using the model or adapter you expected.

Start with three probes

Binary

Pick the release artifact for your OS and accelerator: Linux/Windows CUDA builds for NVIDIA GPUs, the Linux Vulkan build for AMD/Intel GPUs, or the Apple Silicon Metal build on macOS arm64.

Model path

Point Kiln at the local Qwen3.5-4B weights with KILN_MODEL_PATH or --model-path. The path must contain the downloaded safetensors and tokenizer files.

Health

After startup, ask /health what the server actually loaded before trying chat, SFT, GRPO, or adapter calls.

If the kiln CLI is on your PATH, run kiln health for the same info as a readable tree (use --json for scripts and --url http://host:8420 for remote servers). The curl commands below are the equivalent HTTP probes — handy for CI, scripts, or any environment without the CLI.

curl -s http://localhost:8420/health | jq .
curl -s http://localhost:8420/v1/models | jq .
curl -s http://localhost:8420/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3.5-4b","messages":[{"role":"user","content":"Say hi."}],"max_tokens":16}' | jq .

Desktop App first launch

If the Desktop App does not finish first-launch setup

The Desktop App wraps the same Kiln server, model path, GPU driver, and local port checks. If setup stalls, open the app's Logs view first, then compare the message with these common recovery paths.

  • If the server binary failed to download or verify, retry the download and confirm the app can write to its data directory.
  • If the model path is unset or missing weights, choose the local Qwen3.5-4B directory that contains safetensors, config, and tokenizer files.
  • If the CUDA driver is too old or an update is blocked on Linux or Windows, update the NVIDIA driver before launching the CUDA server.
  • If a Vulkan build falls back to CPU, run vulkaninfo --summary and confirm the AMD/Intel GPU is listed before launching the Vulkan server.
  • If the port is already in use, stop the other Kiln/server process or change the Desktop App server port before restarting.
  • If the server enters a crash/restart loop, open Logs, fix the first setup error shown there, then restart from the app.
  • For app-specific paths and log locations, see the Desktop troubleshooting notes.
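
The port-in-use bullet above can be checked from a terminal before restarting the app. A sketch for Linux, where `ss` ships with iproute2; on macOS, `lsof -iTCP:8420 -sTCP:LISTEN` is the equivalent:

```shell
# port_in_use PORT: true if something is already listening on PORT.
port_in_use() {
  ss -ltn 2>/dev/null | grep -q ":$1 "
}

if port_in_use 8420; then
  echo "port 8420 is busy: stop the other Kiln/server process or change the app's port"
fi
```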

Wrong binary or GPU path

Symptom

The binary exits early, reports no usable accelerator, or starts with much lower performance than expected.

Check
  • Linux and Windows CUDA builds require an NVIDIA GPU.
  • Linux Vulkan builds require a Vulkan 1.2+ AMD/Intel driver; vulkaninfo --summary should list the target GPU.
  • CUDA release builds target CUDA 12.4-era systems.
  • macOS uses the Apple Silicon Metal artifact, not a CUDA or Vulkan artifact.
Fix

Download the matching artifact from GitHub releases. On Linux Vulkan hosts, use KILN_VULKAN_DEVICE=0 or GGML_VK_VISIBLE_DEVICES=0 to pin a device. On Linux with Docker, install NVIDIA Container Toolkit before using --gpus all.
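
The artifact-selection rules in the check list can be condensed into a small helper for setup scripts. The label strings below are illustrative, not the literal artifact names on the releases page:

```shell
# artifact_hint OS ACCEL: map `uname -s` output plus accelerator vendor
# to the class of release artifact to download (labels are illustrative).
artifact_hint() {
  case "$1:$2" in
    Darwin:apple)             echo "macos-arm64-metal" ;;
    Linux:nvidia|*NT*:nvidia) echo "cuda (needs a CUDA 12.4-era driver)" ;;
    Linux:amd|Linux:intel)    echo "linux-vulkan (needs a Vulkan 1.2+ driver)" ;;
    *)                        echo "unsupported combination" ;;
  esac
}

artifact_hint "$(uname -s)" nvidia
```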

Model weights are not found

Symptom

Startup fails with a missing model path, missing tokenizer, or missing safetensors message.

Check
  • The model directory exists on the same machine or inside the Docker container.
  • The path contains Qwen3.5-4B weights, config, and tokenizer files.
  • Relative paths are resolved from the current working directory.
Fix

Set the path explicitly. For Docker, mount the host directory at the same path you pass to the server.

KILN_MODEL_PATH=/models/Qwen3.5-4B ./kiln serve
./kiln serve --model-path /models/Qwen3.5-4B
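
Before pointing the server at a directory, a quick file check catches most missing-weights failures. A sketch; the exact filenames (`config.json`, `tokenizer.json`) follow the common safetensors layout and are assumptions:

```shell
# check_model_dir DIR: report which required model files are missing.
check_model_dir() {
  dir="$1"; ok=0
  ls "$dir"/*.safetensors >/dev/null 2>&1 || { echo "missing: *.safetensors"; ok=1; }
  [ -f "$dir/config.json" ]    || { echo "missing: config.json"; ok=1; }
  [ -f "$dir/tokenizer.json" ] || { echo "missing: tokenizer.json"; ok=1; }
  return "$ok"
}
```

For Docker, mount the host directory at the identical container path, e.g. `docker run -v /models/Qwen3.5-4B:/models/Qwen3.5-4B …`, so the same --model-path works inside and outside the container.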

/health is not green

Symptom

The HTTP server is reachable, but chat or training requests fail.

Check
  • /health reports the configured model path and device state.
  • /v1/models returns the model id you expected.
  • The listen address is localhost:8420 unless you changed it.
Fix

Fix the first failing health field before debugging chat payloads. Health output is the fastest way to separate setup issues from request-shape issues.
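
Finding the first failing field can be scripted with jq. The keys in the sample payload are invented for illustration; match the filter to whatever your /health (or kiln health --json) output actually contains:

```shell
# first_failing: read a flat JSON object on stdin and print the key of
# the first entry whose value is not "ok" (sample shape is hypothetical).
first_failing() {
  jq -r 'to_entries
         | map(select(.value != "ok"))
         | if length == 0 then "all ok" else .[0].key end'
}

echo '{"model":"ok","device":"ok","adapter":"missing"}' | first_failing  # → adapter
```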

Remote server is not reachable

Symptom

Kiln is running on a GPU box, Tailscale host, or reverse-proxied machine, but client commands fail with a connection error.

Check
  • The default bind is local-only: 127.0.0.1:8420.
  • For private-network access, set server.host = "0.0.0.0" in config or start with KILN_HOST=0.0.0.0.
  • Only expose that bind on a trusted/private network or behind a reverse proxy that adds authentication.
  • From the client machine, verify the exact host with curl http://gpu-box:8420/health.
Fix

Open the firewall or private-network route to the server port, then point CLI client commands at the same base URL.

kiln health --url http://gpu-box:8420
kiln train status --url http://gpu-box:8420
kiln adapters list --url http://gpu-box:8420
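
curl's exit codes distinguish the common failure shapes: 7 means the TCP connection was refused (nothing listening on that interface/port), 28 means it timed out (firewall or routing), and with -f, 22 means TCP worked but HTTP returned an error. A sketch that labels them; the hostname is a placeholder:

```shell
# reach URL: classify why a probe failed using curl's exit code.
reach() {
  curl -fsS --connect-timeout 3 --max-time 10 "$1" >/dev/null 2>&1
  rc=$?
  case "$rc" in
    0)  echo "reachable" ;;
    7)  echo "refused: check the server bind (KILN_HOST) and port" ;;
    28) echo "timeout: check the firewall or private-network route" ;;
    22) echo "http error: server answered but rejected the request" ;;
    *)  echo "failed (curl exit $rc): check DNS and the hostname" ;;
  esac
}

reach "${KILN_URL:-http://gpu-box:8420/health}"
```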

Older-release long-prefill and tool-call timeouts

Symptom

On kiln-v0.2.9, long tools-bearing chat completions or long-prefill prompts could time out under concurrent load, then leave later requests failing until restart.

Check
  • This guidance is only for users intentionally pinned to kiln-v0.2.9.
  • Issue #664 tracks the tools-bearing cascade after a prefill timeout.
  • Issue #656 tracks the KV-cache-exhaustion and prefill-state-cleanup investigation.
  • Issue #686 tracks repeated long-prefill HTTP 408s around the old timeout boundary.
Fix

Upgrade to the latest kiln-v* release. If pinned to v0.2.9, run clients with workers=1 so requests are serialized; if concurrent workers are required, set server.request_timeout_secs to at least 600.

The workers=1 mitigation is client-side serialization, not a kiln server flag: keep at most one in-flight request in your driver, worker pool, or load generator. It avoids concurrent prefills, which is the condition that exposed the older degraded-state failure mode.
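
The mitigation can be sketched as a sequential driver loop: the next request is not started until the previous one returns. send_request below is a placeholder for your real HTTP call:

```shell
# run_serialized CMD: invoke CMD once per stdin line, strictly one at a
# time -- the `while read` loop never overlaps two invocations, which is
# exactly the workers=1 behavior.
run_serialized() {
  while IFS= read -r line; do
    "$1" "$line"
  done
}

# Placeholder for the real call, e.g. a curl POST to /v1/chat/completions.
send_request() {
  echo "sent: $1"
}

printf 'p1\np2\n' | run_serialized send_request  # prints "sent: p1", then "sent: p2"
```

Avoid parallel drivers such as `xargs -P 4` or a multi-worker load generator here; overlapping prefills are the condition that exposed the v0.2.9 failure mode.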

The request_timeout_secs >= 600 mitigation is also historical. It was useful for pinned kiln-v0.2.9 deployments that had to keep workers=2 while running tight KV-cache caps and roughly 20k-token-or-longer prefills. Current releases route prefix-cache prefill through the tiled/streaming dispatcher and include follow-on prefix-cache memory, KV auto-sizing, streaming-prefill, and observability fixes.

For the original diagnosis and bisect notes, read docs/audits/PHASE11_ISSUE_686_BISECT.md.

Mock mode is not real training

Symptom

A training request appears to complete, but inference quality does not change or no real adapter weights appear.

Check
  • Mock mode is for API-shape and UI checks only.
  • Real /v1/train/sft and /v1/train/grpo jobs require a loaded model and usable accelerator.
  • /v1/train/status should show completed work for the adapter name you sent.
Fix

Use mock mode to verify wiring, then run the same payload against a real model path. Treat training endpoints as privileged: posted examples update the active adapter.
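
Verifying that real work happened can be scripted against /v1/train/status. The JSON shape below is hypothetical; adapt the filter to the fields your server actually returns:

```shell
# completed_for NAME: succeed only if the status payload on stdin shows a
# completed job for adapter NAME (payload shape is a guess for illustration).
completed_for() {
  jq -e --arg name "$1" '.adapter == $name and .state == "completed"' >/dev/null
}

echo '{"adapter":"my-sft-run","state":"completed"}' | completed_for my-sft-run \
  && echo "training completed"
```

Against a live server: `curl -s http://localhost:8420/v1/train/status | completed_for my-sft-run`.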

Adapters are in a different directory than expected

Symptom

/v1/adapters does not list an adapter you trained, uploaded, or expected from a previous run.

Check
  • Adapter storage is separate from the base model weights.
  • Docker containers only see directories you mounted.
  • Different working directories can imply different relative adapter paths.
Fix

Use an explicit adapter directory in your config or CLI flags, and mount it into Docker just like the model directory. Then verify with GET /v1/adapters.
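
A mount-layout sketch for Docker: the image name and the adapter path below are placeholders, and the point is the identical host:container paths for both the model weights and the adapter directory, so the paths your config references resolve the same inside and outside the container.

```shell
# Placeholder image name and adapter path; only --model-path is a real flag
# from this guide. Mount model and adapter dirs at identical paths.
docker run --rm \
  -v /models/Qwen3.5-4B:/models/Qwen3.5-4B \
  -v /srv/kiln/adapters:/srv/kiln/adapters \
  -p 8420:8420 \
  kiln-server --model-path /models/Qwen3.5-4B
```

Then confirm with `curl -s http://localhost:8420/v1/adapters | jq .`.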

Where to go next