CLI Reference
Complete reference for Skillet command-line tools.
eval
Run evaluations against a baseline or with a skill loaded.
skillet eval <name> [skill] [options]
Arguments
| Argument | Required | Description |
|---|---|---|
| name | Yes | Eval set name, directory path, or single .yaml file |
| skill | No | Path to skill directory (omit for baseline) |
The name argument accepts:
- A name: looks in ~/.skillet/evals/<name>/
- A directory path: loads all .yaml files recursively
- A single .yaml file path
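The three accepted forms can be told apart with ordinary file tests. The sketch below is an illustration only: the function name `classify_eval_target` and the order in which the checks run are assumptions, not Skillet's documented behavior.

```shell
# Hypothetical helper, not part of Skillet: guess how a <name> argument
# would be interpreted. The order of the checks is an assumption.
classify_eval_target() {
  local target="$1"
  if [ -d "$target" ]; then
    echo "directory"   # all .yaml files under it are loaded recursively
  elif [ -f "$target" ] && [ "${target##*.}" = "yaml" ]; then
    echo "file"        # a single .yaml eval file
  else
    echo "name"        # looked up under ~/.skillet/evals/<name>/
  fi
}
```

Calling `classify_eval_target ./evals/my-skill/001.yaml` prints `file` when that file exists, and falls back to `name` otherwise.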
Options
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --samples | -s | int | 3 | Number of iterations per eval |
| --max-evals | -m | int | all | Maximum evals to run (randomly sampled) |
| --tools | | str | all | Comma-separated list of allowed tools |
| --parallel | -p | int | 3 | Number of parallel workers |
| --skip-cache | | bool | false | Skip reading from cache (still writes) |
| --trust | | bool | false | Skip confirmation for setup/teardown scripts |
| --no-summary | | bool | false | Skip the failure summary LLM call |
Examples
# Baseline eval (no skill)
skillet eval browser-fallback
# Eval with skill
skillet eval browser-fallback ~/.claude/skills/browser-fallback
# Single file
skillet eval ./evals/my-skill/001.yaml skill/
# More samples for statistical confidence
skillet eval my-skill -s 5
# Random sample of 5 evals
skillet eval my-skill -m 5
# More parallelism
skillet eval my-skill -p 5
# Force fresh runs (ignore cache)
skillet eval my-skill --skip-cache
# Skip script confirmation prompts
skillet eval my-skill --trust
# Skip failure summary (saves one LLM call)
skillet eval my-skill --no-summary
# Restrict available tools
skillet eval my-skill --tools "Read,Write,Bash"
tune
Iteratively improve a skill until target pass rate or max rounds.
skillet tune <name> <skill> [options]
Arguments
| Argument | Required | Description |
|---|---|---|
| name | Yes | Eval set name or path |
| skill | Yes | Path to skill directory |
Options
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --rounds | -r | int | 5 | Maximum tuning rounds |
| --target | -t | float | 100.0 | Target pass rate percentage |
| --samples | -s | int | 1 | Samples per eval per round |
| --parallel | -p | int | 3 | Number of parallel workers |
| --output | -o | path | auto | Custom output path for results JSON |
How It Works
- Runs all evals against the current skill
- Records pass rate and failure details
- If pass rate < target, uses DSPy optimization to generate improved skill
- Writes improved skill to disk
- Repeats until target reached or max rounds
Results are saved to ~/.skillet/tunes/<name>/<timestamp>.json (or custom path with -o).
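The stopping rule in the steps above can be sketched as a small shell function. `tune_should_stop` is a hypothetical name used for illustration; it only mirrors the documented condition (target reached or rounds exhausted) and is not how Skillet implements the loop.

```shell
# Hypothetical helper, not part of Skillet.
# Succeeds (exit 0) when tuning should stop, fails (exit 1) otherwise.
tune_should_stop() {
  local pass_rate="$1" target="$2" round="$3" max_rounds="$4"
  # Stop when the pass rate has reached the target (float comparison via awk).
  if awk -v p="$pass_rate" -v t="$target" 'BEGIN { exit !(p >= t) }'; then
    return 0
  fi
  # Otherwise stop only when the maximum number of rounds has been used.
  [ "$round" -ge "$max_rounds" ]
}
```

With `-t 80 -r 5`, a pass rate of 75 after round 2 means another round runs, while a pass rate of 80 or reaching round 5 ends the loop.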
Examples
# Default tuning (5 rounds, 100% target)
skillet tune conventional-comments ~/.claude/skills/conventional-comments
# Aim for 80% pass rate
skillet tune conventional-comments skill/ -t 80
# More rounds
skillet tune conventional-comments skill/ -r 10
# Custom output file
skillet tune conventional-comments skill/ -o results.json
# Multiple samples for more reliable pass rates
skillet tune conventional-comments skill/ -s 3
create
Create a new skill from captured evals.
skillet create <name> [options]
Arguments
| Argument | Required | Description |
|---|---|---|
| name | Yes | Eval set name (from ~/.skillet/evals/<name>/) |
Options
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --dir | -d | path | ~ | Base directory for output |
| --prompt | | str | none | Extra prompt to customize generation |
Output is written to <dir>/.claude/skills/<name>/SKILL.md.
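As a quick way to predict where the file lands, the path rule can be written out in shell. The helper name `create_output_path` is hypothetical and exists only for this illustration; it simply joins the documented pieces.

```shell
# Hypothetical helper, not part of Skillet: compute the SKILL.md path
# that corresponds to a given <name> and --dir value.
create_output_path() {
  local name="$1"
  local dir="${2:-$HOME}"   # --dir defaults to the home directory (~)
  # Strip one trailing slash so "/path/to/project/" and "/path/to/project" match.
  echo "${dir%/}/.claude/skills/${name}/SKILL.md"
}
```

For instance, `create_output_path browser-fallback .` prints `./.claude/skills/browser-fallback/SKILL.md`.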
Examples
# Create in home directory (default)
skillet create browser-fallback
# Creates ~/.claude/skills/browser-fallback/SKILL.md
# Create in current directory
skillet create browser-fallback -d .
# Creates ./.claude/skills/browser-fallback/SKILL.md
# Create in specific project
skillet create browser-fallback -d /path/to/project
# Add extra instructions
skillet create browser-fallback --prompt "Be concise, max 20 lines"
generate-evals
Generate candidate eval files from a SKILL.md.
skillet generate-evals <skill> [options]
Arguments
| Argument | Required | Description |
|---|---|---|
| skill | Yes | Path to skill directory or SKILL.md file |
Options
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --output | -o | path | auto | Output directory for candidate files |
| --max | | int | 5 | Max evals per category |
| --domain | -d | str | all | Filter to specific domain(s): triggering, functional, performance |
Domains
Each generated eval is tagged with a domain indicating what aspect of the skill it tests:
| Domain | Description |
|---|---|
| triggering | Does the skill activate when it should (and stay silent when it shouldn't)? |
| functional | Does the skill produce correct output once triggered? |
| performance | Does the skill meet quality, latency, or efficiency expectations? |
By default all domains are generated. Use --domain to filter to specific ones. The flag can be repeated to select multiple domains.
Examples
# Generate from skill directory
skillet generate-evals ~/.claude/skills/browser-fallback
# Custom output location
skillet generate-evals skill/ -o ./my-evals/
# Limit to 3 per category
skillet generate-evals skill/ --max 3
# Only triggering evals
skillet generate-evals skill/ -d triggering
# Triggering and functional evals
skillet generate-evals skill/ -d triggering -d functional
lint
Lint a SKILL.md file for common issues.
skillet lint <path> [options]
Arguments
| Argument | Required | Description |
|---|---|---|
| path | Yes* | Path to a SKILL.md file (*not required with --list-rules) |
Options
| Flag | Type | Default | Description |
|---|---|---|---|
| --no-llm | bool | false | Skip LLM-assisted lint rules |
| --list-rules | bool | false | List all available rules and exit |
Rules
Skillet ships with 14 built-in rules across three categories:
Naming
| Rule | Severity | Description |
|---|---|---|
| filename-case | warning | Skill file must be named exactly SKILL.md |
| folder-kebab-case | warning | Skill folder name must be kebab-case |
| name-kebab-case | warning | Name field must be kebab-case |
| name-matches-folder | warning | Name field must match the parent folder name |
| name-no-reserved | error | Name must not contain claude or anthropic |
Structure
| Rule | Severity | Description |
|---|---|---|
| frontmatter-valid | warning | Frontmatter must include name and description |
| frontmatter-delimiters | error | YAML frontmatter must have --- delimiters |
| frontmatter-no-xml | error | No XML angle brackets (< >) in frontmatter |
| description-length | warning | Description must be under 1,024 characters |
| body-word-count | warning | Skill body should be under 5,000 words |
| no-readme | warning | No README.md inside skill folder |
Fields
| Rule | Severity | Description |
|---|---|---|
| field-license | warning | Recommended license field should be present |
| field-compatibility | warning | Recommended compatibility field should be present |
| field-metadata | warning | Recommended metadata field should be present |
Exit Codes
| Code | Meaning |
|---|---|
| 0 | No issues found |
| 1 | One or more findings |
| 2 | Error (file not found, invalid input) |
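These exit codes make it straightforward to gate a CI step on lint results. The wrapper below is a minimal sketch: `lint_report` is a hypothetical function name, and the messages it prints are illustrative rather than anything Skillet emits.

```shell
# Hypothetical wrapper, not part of Skillet: run the linter and
# translate its documented exit code into a short status word.
lint_report() {
  local status=0
  skillet lint "$@" || status=$?
  case "$status" in
    0) echo "clean" ;;      # no issues found
    1) echo "findings" ;;   # one or more findings
    2) echo "error" ;;      # file not found, invalid input
    *) echo "unexpected exit code $status" ;;
  esac
  # Propagate the original code so callers can still branch on it.
  return "$status"
}
```

A pipeline could call `lint_report --no-llm path/to/SKILL.md` and fail the job on any non-zero status, or tolerate `1` while still failing on `2`.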
Examples
# Lint a skill
skillet lint ~/.claude/skills/my-skill/SKILL.md
# Lint without LLM-assisted checks
skillet lint --no-llm path/to/SKILL.md
# List all available rules
skillet lint --list-rules
compare
Compare baseline vs skill results from cache.
skillet compare <name> <skill>
Arguments
| Argument | Required | Description |
|---|---|---|
| name | Yes | Eval set name or path |
| skill | Yes | Path to skill directory |
Run skillet eval <name> and skillet eval <name> <skill> first to populate the cache.
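Since the comparison reads only from cache, the three commands are usually run in sequence. A small wrapper, sketched here under the hypothetical name `compare_with_baseline`, chains them and stops at the first failure.

```shell
# Hypothetical wrapper, not part of Skillet: populate the cache for both
# the baseline and the skill run, then compare the two.
compare_with_baseline() {
  local name="$1" skill="$2"
  skillet eval "$name" &&             # baseline run (no skill)
    skillet eval "$name" "$skill" &&  # run with the skill loaded
    skillet compare "$name" "$skill"  # read both results from cache
}
```

For example: `compare_with_baseline browser-fallback ~/.claude/skills/browser-fallback`.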
Examples
skillet compare browser-fallback ~/.claude/skills/browser-fallback
show
Inspect cached eval results without re-running any evals.
skillet show <name> [options]
Arguments
| Argument | Required | Description |
|---|---|---|
| name | Yes | Eval set name or path |
Options
| Flag | Short | Type | Default | Description |
|---|---|---|---|---|
| --eval | -e | str | none | Show detailed results for a specific eval file |
| --skill | -s | path | none | Show results with a skill loaded instead of baseline |
Examples
# Show cached baseline results
skillet show browser-fallback
# Show results with a skill
skillet show browser-fallback --skill ~/.claude/skills/browser-fallback
# Show detailed results for a specific eval
skillet show browser-fallback --eval 001.yaml
# Combine: specific eval with skill
skillet show browser-fallback --skill path/to/skill --eval 001.yaml
Environment Variables
| Variable | Default | Description |
|---|---|---|
| SKILLET_DIR | ~/.skillet | Base directory for evals, cache, and tune results |
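To see how the variable shifts the data locations mentioned in this reference, the sketch below builds a path from it. The helper `skillet_data_path` is hypothetical, and the assumption that the `evals/` and `tunes/` subdirectories move together with `SKILLET_DIR` is inferred from the default paths shown earlier, not confirmed separately.

```shell
# Hypothetical helper, not part of Skillet: resolve a data path under
# SKILLET_DIR, falling back to the documented default of ~/.skillet.
skillet_data_path() {
  local kind="$1" name="$2"   # kind: "evals" or "tunes" (assumed layout)
  echo "${SKILLET_DIR:-$HOME/.skillet}/${kind}/${name}"
}
```

With `SKILLET_DIR=/tmp/sk`, `skillet_data_path evals browser-fallback` prints `/tmp/sk/evals/browser-fallback`.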
Exit Codes
| Code | Meaning |
|---|---|
| 0 | Success |
| 1 | Error (invalid arguments, missing files, etc.) |