Run Kiln from the terminal.
Use the CLI when you want a headless server, repeatable scripts, or quick diagnostics against a running Kiln instance. These examples assume a running Kiln server backed by Qwen/Qwen3.5-4B; see Quickstart for setup and model download details. If you want visual status, adapter controls, or training monitoring, open http://127.0.0.1:8420/ui instead.
Command chooser
Find the task you want below, then run the matching commands.
Start the server
Point Kiln at your model directory, then serve the OpenAI-compatible API.
export KILN_MODEL_PATH=./Qwen3.5-4B
kiln serve
Verify readiness
Use the readable tree for humans or JSON for scripts and CI probes.
kiln health
kiln health --json
Submit training
Send SFT corrections, GRPO scored completions, or check job progress.
kiln train sft --file corrections.jsonl --adapter support-bot
kiln train grpo --file grpo-batch.json --adapter support-bot
kiln train status
Manage adapters
List adapters, load a saved LoRA, or unload the active adapter.
kiln adapters list
kiln adapters load support-bot
kiln adapters unload
Validate config
Check a TOML config before using it with kiln serve --config.
kiln config --file kiln.toml
Server
Start serving Qwen3.5-4B
Point KILN_MODEL_PATH at the local model directory, then start the OpenAI-compatible server. Running kiln with no subcommand starts the server just like kiln serve.
export KILN_MODEL_PATH=./Qwen3.5-4B
kiln serve
The default server listens on 127.0.0.1:8420; open /ui there for the dashboard.
Configuration
Use config files and model IDs
--config (or -c) loads a TOML config file. --served-model-id changes the model name returned by /v1/models and accepted by OpenAI-compatible clients.
kiln config
kiln config --file kiln.toml
kiln serve -c kiln.toml
kiln serve --config kiln.toml
kiln serve --served-model-id qwen3.5-4b-local
Use kiln config to validate built-in defaults plus KILN_* environment overrides, or kiln config --file / kiln config -f to validate a TOML file before starting the server.
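As a rough sketch, a TOML config mirroring the flags and environment variables above might look like the following. The key names here are assumptions, not documented schema; run kiln config --file against your file to find out what the installed binary actually accepts.

```toml
# Hypothetical kiln.toml -- key names are illustrative only.
# Validate before serving with: kiln config --file kiln.toml
model_path = "./Qwen3.5-4B"           # same role as KILN_MODEL_PATH
served_model_id = "qwen3.5-4b-local"  # same role as --served-model-id
host = "127.0.0.1"                    # default bind address from the docs
port = 8420                           # default port from the docs
```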
Logging
Tune startup output
Kiln uses the configured log level by default. Add the global -v / --verbose flag for debug-level startup detail, repeat it as -vv for trace-level kernel and scheduler detail, or use -q / --quiet when you only want warnings and errors. Run kiln --help to see the exact flags in your installed binary; the examples below focus on copy-paste startup and health commands.
kiln -v serve
kiln -vv serve
kiln -q health
Verbosity flags are global CLI options, so you can put them before or after the subcommand. Verbose and quiet modes are mutually exclusive.
Health
Check server readiness
kiln health prints a readable tree with model, adapter, scheduler, and training status.
kiln health
kiln health --url http://localhost:8420
Point --url at a remote, Tailscale, or reverse-proxied server when Kiln runs on another machine.
Use --json in scripts or CI probes when you want the raw health payload.
kiln health --json \
| python3 -m json.tool
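A CI probe can wrap the JSON output in a small script like this sketch. The "status" field name and "ok" value are assumptions about the payload, not documented schema; inspect real kiln health --json output and adjust before relying on it.

```python
import json
import subprocess


def is_ready(payload: dict) -> bool:
    """Interpret a health payload; assumes a top-level "status" field
    (hypothetical name) that reads "ok" when the server is ready."""
    return payload.get("status") == "ok"


def probe(url: str = "http://127.0.0.1:8420") -> bool:
    """Shell out to the CLI and parse its JSON health payload."""
    out = subprocess.run(
        ["kiln", "health", "--json", "--url", url],
        capture_output=True, text=True, check=True,
    ).stdout
    return is_ready(json.loads(out))
```

Exit non-zero from your script when probe() returns False so the CI stage fails fast.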
Training
Submit SFT and GRPO jobs
Training commands talk to an already-running server. SFT reads JSONL with one chat-correction example per line, each containing a messages array. GRPO reads a single JSON request/batch of groups; each group pairs prompt messages with candidate completions, and each completion carries text and a reward score.
SFT corrections
kiln train sft \
--file corrections.jsonl \
--adapter support-bot
kiln train sft --file corrections.jsonl --adapter support-bot --url http://gpu-box:8420
Each JSONL line is one chat correction with a messages array. Add --epochs, --lr, or --lora-rank when you need to override defaults.
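A corrections file can be built with a few lines of Python. The messages array comes from the docs above; the role/content message shape follows the usual OpenAI chat format the server speaks, but verify it against your server's accepted schema before submitting.

```python
import json

# One chat-correction example per JSONL line, each with a "messages" array.
corrections = [
    {"messages": [
        {"role": "user", "content": "How do I reset my password?"},
        {"role": "assistant",
         "content": "Open Settings > Security and choose Reset Password."},
    ]},
]

with open("corrections.jsonl", "w") as f:
    for example in corrections:
        f.write(json.dumps(example) + "\n")  # one JSON object per line
```

Submit the result with kiln train sft --file corrections.jsonl --adapter support-bot.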
GRPO rewards
kiln train grpo \
--file grpo-batch.json \
--adapter support-bot
Use /v1/completions/batch or another generator to create prompts and candidate completions first, score them, then submit the scored groups to kiln train grpo. See the GRPO Guide for reward-loop examples.
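After scoring, the batch file can be assembled like this sketch. The docs above establish groups of prompt messages plus candidate completions with text and reward scores; the exact key names "messages", "completions", and "reward" are assumptions, so check them against the server's accepted schema or the GRPO Guide.

```python
import json

# A scored GRPO batch: groups pair prompt messages with candidate
# completions, each carrying text and a reward score.
batch = {
    "groups": [
        {
            "messages": [
                {"role": "user", "content": "Summarize our refund policy."}
            ],
            "completions": [
                {"text": "Refunds are available within 30 days.", "reward": 1.0},
                {"text": "No idea, sorry.", "reward": 0.0},
            ],
        }
    ]
}

with open("grpo-batch.json", "w") as f:
    json.dump(batch, f, indent=2)
```

Submit the result with kiln train grpo --file grpo-batch.json --adapter support-bot.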
Queue status
kiln train status
kiln train status --job-id train_123
kiln train status --url http://gpu-box:8420
Use the dashboard when you want visual progress and recent job history. The same --url flag works for SFT, GRPO, and status commands.
Adapters
Manage LoRA adapters
Adapter commands call the running server's adapter API. Use them in scripts; use /ui when you want upload, download, merge, or safer visual confirmation before deleting.
List, load, unload
kiln adapters list
kiln adapters list --url http://gpu-box:8420
kiln adapters load support-bot
kiln adapters unload
kiln adapters unload support-bot
The named form (kiln adapters unload support-bot) is accepted for backwards compatibility; either way, the server unloads the currently active adapter. Add --url to target a remote server.
Delete
kiln adapters delete support-bot
Delete removes an adapter through the server. Prefer the UI for one-off manual cleanup.
Related docs
Quickstart
Install Kiln, download the model, and send the first chat request.
API Reference
Map CLI flows to inference, training, adapter, and health endpoints.
GRPO Guide
Build the generate, score, train loop around kiln train grpo.
Troubleshooting
Fix first-run server, model path, CUDA, Docker, and health issues.
Architecture
Understand how serving, training, adapters, and the scheduler fit together.