Use non-Claude models
Don't have Claude? Most Chinese LLM providers (GLM, Qwen, Moonshot, DeepSeek, etc.) expose OpenAI-compatible APIs, so you can use the openai-api executor directly:
```bash
# GLM (Zhipu)
export OPENAI_API_KEY="your Zhipu API key"
export OPENAI_BASE_URL="https://open.bigmodel.cn/api/paas/v4"
omk eval --executor openai-api --model glm-4-plus \
  --judge-models openai-api:glm-4-plus --no-cache

# Qwen (Alibaba)
export OPENAI_API_KEY="your Qwen API key"
export OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
omk eval --executor openai-api --model qwen-plus \
  --judge-models openai-api:qwen-plus

# DeepSeek
export OPENAI_API_KEY="your DeepSeek API key"
export OPENAI_BASE_URL="https://api.deepseek.com"
omk eval --executor openai-api --model deepseek-chat \
  --judge-models openai-api:deepseek-chat

# Moonshot (Kimi)
export OPENAI_API_KEY="your Moonshot API key"
export OPENAI_BASE_URL="https://api.moonshot.cn/v1"
omk eval --executor openai-api --model moonshot-v1-8k \
  --judge-models openai-api:moonshot-v1-8k
```

Ollama local model:
```bash
omk eval --executor "python examples/custom-executor/ollama-executor.py" \
  --model llama3 --no-judge
```

About the judge
- `--judge-models <list>` picks the LLM judge(s). Format: `executor:model[,executor:model]`. Default: `${executor}:haiku` (or `claude:haiku` when no `--executor` is set)
- 1 entry = single judge; ≥ 2 entries = multi-judge ensemble + inter-judge agreement
- If you don't have Claude, point `--judge-models` at whatever you have, e.g. `--judge-models openai-api:glm-4-plus`
- Add `--no-judge` to skip the LLM judge and rely on assertions alone
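To make the `executor:model[,executor:model]` format concrete, here is a minimal shell sketch that splits a `--judge-models` value the way the rules above describe it. The `describe_judges` function is hypothetical and written only for illustration; it is not omk's parser, and omk's actual handling of edge cases may differ.

```bash
# Illustrative only: mirrors the documented executor:model[,executor:model]
# format. describe_judges is a hypothetical helper, not part of omk.
describe_judges() {
  spec=$1
  count=0
  old_ifs=$IFS
  IFS=','                      # entries are comma-separated
  for entry in $spec; do
    # Split on the first colon only: left side is the executor, the rest is the model.
    printf 'executor=%s model=%s\n' "${entry%%:*}" "${entry#*:}"
    count=$((count + 1))
  done
  IFS=$old_ifs
  # One entry means a single judge; two or more means an ensemble.
  if [ "$count" -ge 2 ]; then
    echo "mode=ensemble"
  else
    echo "mode=single"
  fi
}

describe_judges "openai-api:glm-4-plus"
describe_judges "claude:haiku,openai-api:glm-4-plus"
```

The first call reports one judge in single mode; the second reports two judges in ensemble mode, which is the case where inter-judge agreement applies.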
See Executors for the full executor list and Artifact & variant layout for how to specify what gets evaluated.
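The hosted-provider examples earlier in this section differ only in base URL and model name, so if you switch between them often, a small shell function can hold that mapping. The sketch below is one possible convenience wrapper: `use_provider` and the `OMK_MODEL` variable are hypothetical names introduced here, not something omk provides or reads, and you still need to export `OPENAI_API_KEY` for the chosen provider yourself.

```bash
# Hypothetical helper, not shipped with omk. It sets OPENAI_BASE_URL and a
# made-up OMK_MODEL variable using the URLs and models from the examples above.
use_provider() {
  case "$1" in
    glm)      OPENAI_BASE_URL="https://open.bigmodel.cn/api/paas/v4"
              OMK_MODEL="glm-4-plus" ;;
    qwen)     OPENAI_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
              OMK_MODEL="qwen-plus" ;;
    deepseek) OPENAI_BASE_URL="https://api.deepseek.com"
              OMK_MODEL="deepseek-chat" ;;
    moonshot) OPENAI_BASE_URL="https://api.moonshot.cn/v1"
              OMK_MODEL="moonshot-v1-8k" ;;
    *)        echo "unknown provider: $1" >&2
              return 1 ;;
  esac
  export OPENAI_BASE_URL OMK_MODEL
}

use_provider deepseek
echo "$OPENAI_BASE_URL $OMK_MODEL"

# Then run the same command shape as in the examples above:
#   omk eval --executor openai-api --model "$OMK_MODEL" \
#     --judge-models "openai-api:$OMK_MODEL"
```

Keeping the API key out of the function is deliberate: each provider issues its own key, and it is usually safer to load it from your shell profile or a secrets manager than to hard-code it in a script.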