Troubleshooting · first-run checks

If Kiln does not start cleanly, check these first.

This page is a compact diagnostic guide for first-run setup issues. It does not replace the Quickstart; use it when a command fails, /health is not green, or the server is running but not using the model or adapter you expected.

Start with three probes

Binary

Pick the release artifact for your OS and accelerator: Linux/Windows CUDA builds for NVIDIA GPUs, the Linux Vulkan build for AMD/Intel GPUs, or the Apple Silicon Metal build on macOS arm64.

Model path

Point Kiln at the local Qwen3.5-4B weights with KILN_MODEL_PATH or --model-path. The path must contain the downloaded safetensors and tokenizer files.

Health

After startup, ask /health what the server actually loaded before trying chat, SFT, GRPO, or adapter calls.

If the kiln CLI is on your PATH, run kiln health for the same info as a readable tree (use --json for scripts and --url http://host:8420 for remote servers). The curl commands below are the equivalent HTTP probes — handy for CI, scripts, or any environment without the CLI.

curl -s http://localhost:8420/health | jq .
curl -s http://localhost:8420/v1/models | jq .
curl -s http://localhost:8420/v1/chat/completions \
  -H 'Content-Type: application/json' \
  -d '{"model":"qwen3.5-4b","messages":[{"role":"user","content":"Say hi."}],"max_tokens":16}' | jq .

Desktop App first launch

If the Desktop App does not finish first-launch setup

The Desktop App wraps the same Kiln server, model path, GPU driver, and local port checks. If setup stalls, open the app's Logs view first, then compare the message with these common recovery paths.

  • If the server binary failed to download or verify, retry the download and confirm the app can write to its data directory.
  • If the model path is unset or missing weights, choose the local Qwen3.5-4B directory that contains safetensors, config, and tokenizer files.
  • If the CUDA driver is too old or an update is blocked on Linux or Windows, update the NVIDIA driver before launching the CUDA server.
  • If a Vulkan build falls back to CPU, run vulkaninfo --summary and confirm the AMD/Intel GPU is listed before launching the Vulkan server.
  • If the port is already in use, stop the other Kiln/server process or change the Desktop App server port before restarting.
  • If the server enters a crash/restart loop, open Logs, fix the first setup error shown there, then restart from the app.
  • For app-specific paths and log locations, see the Desktop troubleshooting notes.
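
The port-in-use bullet above can be checked from a terminal before restarting the app. A sketch for Linux, where `ss` ships with iproute2; on macOS, `lsof -iTCP:8420 -sTCP:LISTEN` is the equivalent:

```shell
# port_in_use PORT: true if something is already listening on PORT.
port_in_use() {
  ss -ltn 2>/dev/null | grep -q ":$1 "
}

if port_in_use 8420; then
  echo "port 8420 is busy: stop the other Kiln/server process or change the app's port"
fi
```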

Wrong binary or GPU path

Symptom

The binary exits early, reports no usable accelerator, or starts with much lower performance than expected.

Check
  • Linux and Windows CUDA builds require an NVIDIA GPU.
  • Linux Vulkan builds require a Vulkan 1.2+ AMD/Intel driver; vulkaninfo --summary should list the target GPU.
  • CUDA release builds target CUDA 12.4-era systems.
  • macOS uses the Apple Silicon Metal artifact, not a CUDA or Vulkan artifact.
Fix

Download the matching artifact from GitHub releases. On Linux Vulkan hosts, use KILN_VULKAN_DEVICE=0 or GGML_VK_VISIBLE_DEVICES=0 to pin a device. On Linux with Docker, install NVIDIA Container Toolkit before using --gpus all.
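
The artifact-selection rules in the check list can be condensed into a small helper for setup scripts. The label strings below are illustrative, not the literal artifact names on the releases page:

```shell
# artifact_hint OS ACCEL: map `uname -s` output plus accelerator vendor
# to the class of release artifact to download (labels are illustrative).
artifact_hint() {
  case "$1:$2" in
    Darwin:apple)             echo "macos-arm64-metal" ;;
    Linux:nvidia|*NT*:nvidia) echo "cuda (needs a CUDA 12.4-era driver)" ;;
    Linux:amd|Linux:intel)    echo "linux-vulkan (needs a Vulkan 1.2+ driver)" ;;
    *)                        echo "unsupported combination" ;;
  esac
}

artifact_hint "$(uname -s)" nvidia
```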

Model weights are not found

Symptom

Startup fails with a missing model path, missing tokenizer, or missing safetensors message.

Check
  • The model directory exists on the same machine or inside the Docker container.
  • The path contains Qwen3.5-4B weights, config, and tokenizer files.
  • Relative paths are resolved from the current working directory.
Fix

Set the path explicitly. For Docker, mount the host directory at the same path you pass to the server.

KILN_MODEL_PATH=/models/Qwen3.5-4B ./kiln serve
./kiln serve --model-path /models/Qwen3.5-4B
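
Before pointing the server at a directory, a quick file check catches most missing-weights failures. A sketch; the exact filenames (`config.json`, `tokenizer.json`) follow the common safetensors layout and are assumptions:

```shell
# check_model_dir DIR: report which required model files are missing.
check_model_dir() {
  dir="$1"; ok=0
  ls "$dir"/*.safetensors >/dev/null 2>&1 || { echo "missing: *.safetensors"; ok=1; }
  [ -f "$dir/config.json" ]    || { echo "missing: config.json"; ok=1; }
  [ -f "$dir/tokenizer.json" ] || { echo "missing: tokenizer.json"; ok=1; }
  return "$ok"
}
```

For Docker, mount the host directory at the identical container path, e.g. `docker run -v /models/Qwen3.5-4B:/models/Qwen3.5-4B …`, so the same --model-path works inside and outside the container.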

/health is not green

Symptom

The HTTP server is reachable, but chat or training requests fail.

Check
  • /health reports the configured model path and device state.
  • /v1/models returns the model id you expected.
  • The listen address is localhost:8420 unless you changed it.
Fix

Fix the first failing health field before debugging chat payloads. Health output is the fastest way to separate setup issues from request-shape issues.
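
Finding the first failing field can be scripted with jq. The keys in the sample payload are invented for illustration; match the filter to whatever your /health (or kiln health --json) output actually contains:

```shell
# first_failing: read a flat JSON object on stdin and print the key of
# the first entry whose value is not "ok" (sample shape is hypothetical).
first_failing() {
  jq -r 'to_entries
         | map(select(.value != "ok"))
         | if length == 0 then "all ok" else .[0].key end'
}

echo '{"model":"ok","device":"ok","adapter":"missing"}' | first_failing  # → adapter
```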

Remote server is not reachable

Symptom

Kiln is running on a GPU box, Tailscale host, or reverse-proxied machine, but client commands fail with a connection error.

Check
  • The default bind is local-only: 127.0.0.1:8420.
  • For private-network access, set server.host = "0.0.0.0" in config or start with KILN_HOST=0.0.0.0.
  • Only expose that bind on a trusted/private network or behind a reverse proxy that adds authentication.
  • From the client machine, verify the exact host with curl http://gpu-box:8420/health.
Fix

Open the firewall or private-network route to the server port, then point CLI client commands at the same base URL.

kiln health --url http://gpu-box:8420
kiln train status --url http://gpu-box:8420
kiln adapters list --url http://gpu-box:8420
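
curl's exit codes distinguish the common failure shapes: 7 means the TCP connection was refused (nothing listening on that interface/port), 28 means it timed out (firewall or routing), and with -f, 22 means TCP worked but HTTP returned an error. A sketch that labels them; the hostname is a placeholder:

```shell
# reach URL: classify why a probe failed using curl's exit code.
reach() {
  curl -fsS --connect-timeout 3 --max-time 10 "$1" >/dev/null 2>&1
  rc=$?
  case "$rc" in
    0)  echo "reachable" ;;
    7)  echo "refused: check the server bind (KILN_HOST) and port" ;;
    28) echo "timeout: check the firewall or private-network route" ;;
    22) echo "http error: server answered but rejected the request" ;;
    *)  echo "failed (curl exit $rc): check DNS and the hostname" ;;
  esac
}

reach "${KILN_URL:-http://gpu-box:8420/health}"
```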

Older-release long-prefill and tool-call timeouts

Symptom

On kiln-v0.2.9, long tools-bearing chat completions or long-prefill prompts could time out under concurrent load, then leave later requests failing until restart.

Check
  • This guidance is only for users intentionally pinned to kiln-v0.2.9.
  • Issue #664 tracks the tools-bearing cascade after a prefill timeout.
  • Issue #656 tracks the KV-cache-exhaustion and prefill-state-cleanup investigation.
  • Issue #686 tracks repeated long-prefill HTTP 408s around the old timeout boundary.
Fix

Upgrade to the latest kiln-v* release. If pinned to v0.2.9, run clients with workers=1 so requests are serialized; if concurrent workers are required, set server.request_timeout_secs to at least 600.

The workers=1 mitigation is client-side serialization, not a kiln server flag: keep at most one in-flight request in your driver, worker pool, or load generator. It avoids concurrent prefills, which is the condition that exposed the older degraded-state failure mode.
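
The mitigation can be sketched as a sequential driver loop: the next request is not started until the previous one returns. send_request below is a placeholder for your real HTTP call:

```shell
# run_serialized CMD: invoke CMD once per stdin line, strictly one at a
# time -- the `while read` loop never overlaps two invocations, which is
# exactly the workers=1 behavior.
run_serialized() {
  while IFS= read -r line; do
    "$1" "$line"
  done
}

# Placeholder for the real call, e.g. a curl POST to /v1/chat/completions.
send_request() {
  echo "sent: $1"
}

printf 'p1\np2\n' | run_serialized send_request  # prints "sent: p1", then "sent: p2"
```

Avoid parallel drivers such as `xargs -P 4` or a multi-worker load generator here; overlapping prefills are the condition that exposed the v0.2.9 failure mode.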

The request_timeout_secs >= 600 mitigation is also historical. It was useful for pinned kiln-v0.2.9 deployments that had to keep workers=2 while running tight KV-cache caps and roughly 20k-token-or-longer prefills. Current releases route prefix-cache prefill through the tiled/streaming dispatcher and include follow-on prefix-cache memory, KV auto-sizing, streaming-prefill, and observability fixes.

For the original diagnosis and bisect notes, read docs/audits/PHASE11_ISSUE_686_BISECT.md.

Mock mode is not real training

Symptom

A training request appears to complete, but inference quality does not change or no real adapter weights appear.

Check
  • Mock mode is for API-shape and UI checks only.
  • Real /v1/train/sft and /v1/train/grpo jobs require a loaded model and usable accelerator.
  • /v1/train/status should show completed work for the adapter name you sent.
Fix

Use mock mode to verify wiring, then run the same payload against a real model path. Treat training endpoints as privileged: posted examples update the active adapter.
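
Verifying that real work happened can be scripted against /v1/train/status. The JSON shape below is hypothetical; adapt the filter to the fields your server actually returns:

```shell
# completed_for NAME: succeed only if the status payload on stdin shows a
# completed job for adapter NAME (payload shape is a guess for illustration).
completed_for() {
  jq -e --arg name "$1" '.adapter == $name and .state == "completed"' >/dev/null
}

echo '{"adapter":"my-sft-run","state":"completed"}' | completed_for my-sft-run \
  && echo "training completed"
```

Against a live server: `curl -s http://localhost:8420/v1/train/status | completed_for my-sft-run`.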

Adapters are in a different directory than expected

Symptom

/v1/adapters does not list an adapter you trained, uploaded, or expected from a previous run.

Check
  • Adapter storage is separate from the base model weights.
  • Docker containers only see directories you mounted.
  • Different working directories can imply different relative adapter paths.
Fix

Use an explicit adapter directory in your config or CLI flags, and mount it into Docker just like the model directory. Then verify with GET /v1/adapters.
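
A mount-layout sketch for Docker: the image name and the adapter path below are placeholders, and the point is the identical host:container paths for both the model weights and the adapter directory, so the paths your config references resolve the same inside and outside the container.

```shell
# Placeholder image name and adapter path; only --model-path is a real flag
# from this guide. Mount model and adapter dirs at identical paths.
docker run --rm \
  -v /models/Qwen3.5-4B:/models/Qwen3.5-4B \
  -v /srv/kiln/adapters:/srv/kiln/adapters \
  -p 8420:8420 \
  kiln-server --model-path /models/Qwen3.5-4B
```

Then confirm with `curl -s http://localhost:8420/v1/adapters | jq .`.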

Where to go next