FailureSensorIQ leaderboard
Overall accuracy and consistency

| T | Model | Average ⬇️ | Acc_All | Acc_Sel | Acc_El | Acc_Perturb | Consistency_Score | Type | Architecture | #Params (B) | Available on the hub |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ⭐ | [openai/o3-mini](https://openai.com/index/openai-o3-mini/) | 57.17 | 58.46 | 56.81 | 69.84 | 53.28 | 47.47 | instruction-tuned | ? | 500 | false |
| ⭐ | [openai/o1](https://openai.com/o1/) | 56.66 | 60.4 | 61.06 | 67.89 | 49.76 | 44.17 | instruction-tuned | ? | 500 | false |
| ⭐ | [meta-llama/Llama-4-Maverick-17B-128E-Original](https://huggingface.co/meta-llama/Llama-4-Maverick-17B-128E-Original) | 52.43 | 55.83 | 44.47 | 71.9 | 49.12 | 40.83 | instruction-tuned | ? | 400 | false |
| ⭐ | [openai/gpt-4.1](https://huggingface.co/openai/gpt-4.1) | 50.94 | 53.47 | 56.38 | 59.17 | 45.74 | 39.93 | instruction-tuned | ? | 500 | false |
| ๐ฆ | [deepseek-ai/DeepSeek-R1](https://huggingface.co/deepseek-ai/DeepSeek-R1) | 50.58 | 50.09 | 45.74 | 59.75 | 54.37 | 42.93 | RL-tuned | DeepseekV3ForCausalLM | 685 | true |
| ⭐ | [openai/o1-preview](https://openai.com/index/introducing-openai-o1-preview/) | 50.51 | 52.31 | 49.57 | 62.5 | 47.51 | 40.68 | instruction-tuned | ? | 500 | false |
| ⭐ | [meta-llama/Llama-3.1-405B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-405B-Instruct) | 46.44 | 51.26 | 48.72 | 61.24 | 40.04 | 30.93 | instruction-tuned | LlamaForCausalLM | 405 | true |
| ⭐ | [mistralai/Mistral-Large-Instruct-2407](https://huggingface.co/mistralai/Mistral-Large-Instruct-2407) | 45.27 | 50.09 | 51.28 | 57.57 | 38.1 | 29.32 | instruction-tuned | MistralForCausalLM | 123 | true |
| ⭐ | [openai/gpt-4.1-mini](https://huggingface.co/openai/gpt-4.1-mini) | 44.77 | 49.27 | 45.53 | 57.34 | 39.97 | 31.76 | instruction-tuned | ? | 500 | false |
| ๐ฆ | [deepseek-ai/DeepSeek-R1-Distill-Llama-70B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-70B) | 44.29 | 44.62 | 36.38 | 65.14 | 44.99 | 30.3 | RL-tuned | LlamaForCausalLM | 70.6 | true |
| ⭐ | [meta-llama/Llama-4-Scout-17B-16E](https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E) | 43.08 | 53.96 | 44.47 | 63.53 | 29.36 | 24.11 | instruction-tuned | ? | 109 | false |
| ⭐ | [microsoft/phi-4](https://huggingface.co/microsoft/phi-4) | 43.04 | 48.56 | 40.43 | 60.32 | 36.3 | 29.62 | instruction-tuned | Phi3ForCausalLM | 14.7 | true |
| ⭐ | [google/gemma-2-9b](https://huggingface.co/google/gemma-2-9b) | 42.87 | 43.98 | 30.43 | 58.6 | 45.29 | 36.07 | instruction-tuned | ? | 9.24 | false |
| ⭐ | [meta-llama/Llama-3.2-11B-Vision](https://huggingface.co/meta-llama/Llama-3.2-11B-Vision) | 40.1 | 39.11 | 33.83 | 50.92 | 45.74 | 30.9 | instruction-tuned | ? | 70.6 | false |
| ⭐ | [meta-llama/Llama-3.3-70B-Instruct](https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct) | 39.1 | 41.69 | 35.11 | 55.85 | 37.57 | 25.27 | instruction-tuned | LlamaForCausalLM | 70.6 | true |
| ⭐ | [mistralai/Mixtral-8x22B-v0.1](https://huggingface.co/mistralai/Mixtral-8x22B-v0.1) | 38.78 | 45.18 | 42.55 | 59.06 | 27.52 | 19.57 | instruction-tuned | MixtralForCausalLM | 46.7 | true |
| ๐ฆ | [deepseek-ai/DeepSeek-R1-Distill-Llama-8B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Llama-8B) | 35.37 | 43.04 | 38.94 | 54.36 | 24.26 | 16.27 | RL-tuned | LlamaForCausalLM | 8.03 | true |
| ⭐ | [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct) | 34.71 | 38.73 | 40.64 | 49.54 | 28.2 | 16.42 | instruction-tuned | Qwen2ForCausalLM | 7.62 | true |
| ⭐ | [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) | 33.33 | 40.04 | 36.17 | 51.15 | 25.35 | 13.95 | instruction-tuned | LlamaForCausalLM | 8.03 | true |
| ⭐ | [openai/gpt-4.1-nano](https://huggingface.co/openai/gpt-4.1-nano) | 31.18 | 41.77 | 40 | 50.34 | 14.47 | 9.3 | instruction-tuned | ? | 500 | false |
| ๐ฆ | [deepseek-ai/DeepSeek-R1-Distill-Qwen-7B](https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-7B) | 26.6 | 34.01 | 23.83 | 50.11 | 17.02 | 8.02 | RL-tuned | Qwen2ForCausalLM | 7.62 | true |
| ⭐ | [ibm-granite/granite-3.2-8b-instruct](https://huggingface.co/ibm-granite/granite-3.2-8b-instruct) | 25.95 | 30.26 | 41.7 | 29.82 | 19.24 | 8.74 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | [ibm-granite/granite-3.3-8b-instruct](https://huggingface.co/ibm-granite/granite-3.3-8b-instruct) | 25.11 | 25.83 | 32.13 | 29.47 | 23.73 | 14.4 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) | 21.36 | 27.6 | 25.32 | 38.19 | 11.21 | 4.46 | instruction-tuned | MixtralForCausalLM | 46.7 | true |
| ⭐ | [ibm-granite/granite-3.0-8b-instruct](https://huggingface.co/ibm-granite/granite-3.0-8b-instruct) | 19.34 | 22.8 | 16.17 | 36.7 | 16.16 | 4.87 | instruction-tuned | GraniteForCausalLM | 8.17 | true |

All rows share Precision bfloat16, Hub License apache-2.0, Hub ❤️ 78, and Model sha main.
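The Average ⬇️ column appears to be the unweighted mean of the five reported metrics (Acc_All, Acc_Sel, Acc_El, Acc_Perturb, Consistency_Score); the values below are copied from the leaderboard rows, and the aggregation rule is our inference from the data, not a documented formula. A minimal sketch:

```python
# Sketch: recompute the "Average" column as the unweighted mean of the
# five overall metrics, rounded to two decimals like the leaderboard.
rows = {
    "openai/o3-mini": [58.46, 56.81, 69.84, 53.28, 47.47],  # leaderboard Average: 57.17
    "openai/o1": [60.4, 61.06, 67.89, 49.76, 44.17],        # leaderboard Average: 56.66
}

def average(scores):
    return round(sum(scores) / len(scores), 2)

for model, scores in rows.items():
    print(model, average(scores))
```

Both recomputed values match the Average column above.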
Per-asset accuracy

| T | Model | Average ⬇️ | acc_electric_motor | acc_steam_turbine | acc_aero_gas_turbine | acc_industrial_gas_turbine | acc_pump | acc_compressor | acc_reciprocating_internal_combustion_engine | acc_electric_generator | acc_fan | acc_power_transformer | Type | Architecture | #Params (B) | Available on the hub |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ⭐ | openai/o1 | 60.52 | 68.8 | 47.95 | 50.89 | 70.83 | 58.55 | 56.36 | 65.48 | 70.94 | 58 | 57.35 | instruction-tuned | ? | 500 | false |
| ⭐ | openai/o3-mini | 58.58 | 63.25 | 49.71 | 51.49 | 68.33 | 52.63 | 55.45 | 63.39 | 65.81 | 61 | 54.78 | instruction-tuned | ? | 500 | false |
| ⭐ | meta-llama/Llama-4-Maverick-17B-128E-Original | 56.63 | 66.24 | 47.37 | 47.32 | 69.17 | 55.26 | 52.27 | 61.01 | 62.39 | 56.5 | 48.71 | instruction-tuned | ? | 400 | false |
| ⭐ | meta-llama/Llama-4-Scout-17B-16E | 54.4 | 60.26 | 40.94 | 47.02 | 67.5 | 50 | 46.36 | 66.96 | 59.4 | 60.5 | 45.04 | instruction-tuned | ? | 109 | false |
| ⭐ | openai/gpt-4.1 | 53.83 | 58.12 | 43.27 | 43.75 | 69.17 | 52.63 | 46.36 | 61.01 | 58.12 | 57 | 48.9 | instruction-tuned | ? | 500 | false |
| ⭐ | openai/o1-preview | 53.1 | 65.38 | 43.86 | 42.56 | 62.92 | 50 | 46.82 | 58.33 | 62.82 | 53.5 | 44.85 | instruction-tuned | ? | 500 | false |
| ⭐ | meta-llama/Llama-3.1-405B-Instruct | 51.12 | 61.11 | 39.18 | 48.51 | 59.17 | 46.05 | 45 | 58.63 | 55.56 | 51.5 | 46.51 | instruction-tuned | LlamaForCausalLM | 405 | true |
| ๐ฆ | deepseek-ai/DeepSeek-R1 | 50.73 | 58.97 | 39.18 | 44.94 | 63.33 | 50.66 | 44.09 | 51.49 | 56.84 | 53.5 | 44.3 | RL-tuned | DeepseekV3ForCausalLM | 685 | true |
| ⭐ | mistralai/Mistral-Large-Instruct-2407 | 49.42 | 51.71 | 36.26 | 44.64 | 59.58 | 45.39 | 46.82 | 59.52 | 50.85 | 50 | 49.45 | instruction-tuned | MistralForCausalLM | 123 | true |
| ⭐ | openai/gpt-4.1-mini | 49.33 | 55.98 | 43.27 | 41.67 | 56.67 | 45.39 | 43.18 | 56.85 | 49.15 | 54.5 | 46.69 | instruction-tuned | ? | 500 | false |
| ⭐ | microsoft/phi-4 | 49.02 | 50.85 | 40.35 | 45.83 | 57.92 | 50 | 41.36 | 52.98 | 52.56 | 55 | 43.38 | instruction-tuned | Phi3ForCausalLM | 14.7 | true |
| ⭐ | mistralai/Mixtral-8x22B-v0.1 | 45.83 | 46.58 | 40.35 | 39.29 | 57.5 | 46.05 | 43.18 | 52.38 | 39.74 | 53.5 | 39.71 | instruction-tuned | MixtralForCausalLM | 46.7 | true |
| ⭐ | google/gemma-2-9b | 44.83 | 45.3 | 35.67 | 42.86 | 53.33 | 43.42 | 40.91 | 53.57 | 44.44 | 55 | 33.82 | instruction-tuned | ? | 9.24 | false |
| ๐ฆ | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 44.47 | 50 | 36.84 | 40.77 | 51.25 | 39.47 | 34.55 | 52.98 | 49.15 | 48.5 | 41.18 | RL-tuned | LlamaForCausalLM | 70.6 | true |
| ๐ฆ | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | 43.98 | 46.58 | 36.26 | 43.45 | 50 | 40.79 | 41.82 | 55.65 | 52.14 | 43.5 | 29.6 | RL-tuned | LlamaForCausalLM | 8.03 | true |
| ⭐ | openai/gpt-4.1-nano | 42.35 | 44.87 | 31.58 | 38.99 | 50.83 | 40.13 | 37.73 | 50.89 | 45.73 | 49.5 | 33.27 | instruction-tuned | ? | 500 | false |
| ⭐ | meta-llama/Llama-3.3-70B-Instruct | 41.14 | 45.3 | 29.24 | 35.42 | 45.42 | 37.5 | 35.91 | 50 | 42.74 | 48 | 41.91 | instruction-tuned | LlamaForCausalLM | 70.6 | true |
| ⭐ | meta-llama/Llama-3.1-8B-Instruct | 40.05 | 39.74 | 33.33 | 37.5 | 48.75 | 36.84 | 35.91 | 47.92 | 37.61 | 46.5 | 36.4 | instruction-tuned | LlamaForCausalLM | 8.03 | true |
| ⭐ | meta-llama/Llama-3.2-11B-Vision | 39.28 | 38.46 | 31.58 | 36.61 | 42.92 | 38.16 | 37.73 | 47.62 | 36.32 | 48.5 | 34.93 | instruction-tuned | ? | 70.6 | false |
| ⭐ | Qwen/Qwen2.5-7B-Instruct | 39.28 | 41.03 | 30.41 | 35.42 | 45 | 39.47 | 35 | 47.62 | 42.74 | 44.5 | 31.62 | instruction-tuned | Qwen2ForCausalLM | 7.62 | true |
| ๐ฆ | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 34.53 | 37.18 | 28.07 | 32.44 | 43.33 | 34.87 | 31.82 | 41.67 | 35.04 | 34 | 26.84 | RL-tuned | Qwen2ForCausalLM | 7.62 | true |
| ⭐ | ibm-granite/granite-3.2-8b-instruct | 30.66 | 34.19 | 27.49 | 24.7 | 30 | 35.53 | 28.18 | 36.9 | 28.63 | 33 | 27.94 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | mistralai/Mixtral-8x7B-v0.1 | 26.66 | 29.06 | 20.47 | 20.24 | 24.58 | 23.68 | 22.73 | 36.61 | 23.08 | 34 | 32.17 | instruction-tuned | MixtralForCausalLM | 46.7 | true |
| ⭐ | ibm-granite/granite-3.3-8b-instruct | 26.49 | 26.5 | 24.56 | 19.05 | 26.67 | 28.95 | 26.36 | 36.31 | 25.21 | 30.5 | 20.77 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | ibm-granite/granite-3.0-8b-instruct | 23.05 | 28.63 | 19.3 | 18.75 | 19.17 | 23.68 | 20.45 | 32.14 | 22.22 | 27 | 19.12 | instruction-tuned | GraniteForCausalLM | 8.17 | true |

All rows share Precision bfloat16, Hub License apache-2.0, Hub ❤️ 78, and Model sha main.
FMSR results

| T | Model | fmsr_uacc | fmsr_ss | fmsr_coverage_rate | fmsr_acc | Type | Architecture | #Params (B) | Available on the hub |
|---|---|---|---|---|---|---|---|---|---|
| ⭐ | mistralai/Mistral-Large-Instruct-2407 | 50.03 | 3.02 | 91.32 | 60.83 | instruction-tuned | MistralForCausalLM | 123 | true |
| ⭐ | microsoft/phi-4 | 46.51 | 3.02 | 91.2 | 56.39 | instruction-tuned | Phi3ForCausalLM | 14.7 | true |
| ⭐ | mistralai/Mixtral-8x22B-v0.1 | 41.55 | 3.36 | 93.46 | 54.75 | instruction-tuned | MixtralForCausalLM | 46.7 | true |
| ⭐ | google/gemma-2-9b | 33.47 | 3.61 | 94.74 | 48.6 | instruction-tuned | ? | 9.24 | false |
| ⭐ | ibm-granite/granite-3.3-8b-instruct | 34 | 3.55 | 94 | 48.52 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | ibm-granite/granite-3.2-8b-instruct | 32.46 | 3.65 | 95.52 | 47.98 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | mistralai/Mixtral-8x7B-v0.1 | 30.55 | 3.65 | 95.48 | 45.09 | instruction-tuned | MixtralForCausalLM | 46.7 | true |
| ⭐ | Qwen/Qwen2.5-7B-Instruct | 29.57 | 3.54 | 92.48 | 42.06 | instruction-tuned | Qwen2ForCausalLM | 7.62 | true |
| ⭐ | ibm-granite/granite-3.0-8b-instruct | 27.32 | 3.67 | 93.93 | 40.65 | instruction-tuned | GraniteForCausalLM | 8.17 | true |
| ⭐ | meta-llama/Llama-3.3-70B-Instruct | 23.53 | 3.62 | 93.22 | 34.42 | instruction-tuned | LlamaForCausalLM | 70.6 | true |
| ๐ฆ | deepseek-ai/DeepSeek-R1-Distill-Llama-70B | 20.96 | 3.81 | 95.02 | 32.48 | RL-tuned | LlamaForCausalLM | 70.6 | true |
| ⭐ | meta-llama/Llama-3.1-8B-Instruct | 19.94 | 3.83 | 94.94 | 31.15 | instruction-tuned | LlamaForCausalLM | 8.03 | true |
| ๐ฆ | deepseek-ai/DeepSeek-R1-Distill-Llama-8B | 16.38 | 3.94 | 95.95 | 26.32 | RL-tuned | LlamaForCausalLM | 8.03 | true |
| ๐ฆ | deepseek-ai/DeepSeek-R1-Distill-Qwen-7B | 17.76 | 3.73 | 94.04 | 26.87 | RL-tuned | Qwen2ForCausalLM | 7.62 | true |

All rows share Precision bfloat16, Hub License apache-2.0, Hub ❤️ 78, and Model sha main.
Prompt Format
Prompts follow the style below, and model outputs are expected to follow this format.
Select the correct option(s) from the following options given the question. To solve the problem, follow the Let's think Step by Step reasoning strategy.
Question: For electric motor, if a failure event rotor windings fault occurs, which sensor out of the choices is the most relevant sensor regarding the occurrence of the failure event?
Options:
A partial discharge
B resistance
C oil debris
D current
E voltage
{"step_1": "<Step 1 of your reasoning>", "step_2": "<Step 2 of your reasoning>", "step_n": "<Step n of your reasoning>", "answer": <the list of selected option, e.g., ["A", "B", "C", "D", "E"]>}
Your output in a single line:
Expected Output Format
{"step_1": "<Step 1 of your reasoning>", "step_2": "<Step 2 of your reasoning>", "step_n": "<Step n of your reasoning>", "answer": <the list of selected option, e.g., ["A", "B", "C", "D", "E"]>}
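A response in this format can be checked with a few lines of Python. The helper below is our own illustration (not part of the benchmark tooling); it assumes the response is one line of JSON with `step_*` keys and an `answer` list, as in the template above.

```python
import json

def parse_response(line: str):
    """Parse a single-line response in the expected format and return
    (reasoning_steps, selected_options)."""
    obj = json.loads(line)
    # Collect step_1, step_2, ... in numeric order.
    step_keys = sorted(
        (k for k in obj if k.startswith("step_")),
        key=lambda k: int(k.split("_")[1]),
    )
    steps = [obj[k] for k in step_keys]
    answer = obj["answer"]
    if not isinstance(answer, list):
        raise ValueError("'answer' must be a list of option letters")
    return steps, answer

line = '{"step_1": "Rotor winding faults alter the motor current signature.", "answer": ["D"]}'
steps, answer = parse_response(line)
print(answer)  # prints ['D']
```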
Reproducibility
To reproduce our results, here are the commands you can run:
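The exact command was not preserved in this copy of the page. As an illustration only, a local run with EleutherAI's lm-evaluation-harness typically looks like the following; the task name `failuresensoriq` is a placeholder assumption, not a confirmed task identifier, so substitute the task actually used by the leaderboard.

```shell
# Hypothetical example: evaluate a Hub model with lm-evaluation-harness.
pip install lm-eval
lm_eval --model hf \
    --model_args pretrained=meta-llama/Llama-3.1-8B-Instruct,revision=main,dtype=bfloat16 \
    --tasks failuresensoriq \
    --batch_size 8 \
    --output_path results
```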
Some good practices before submitting a model
1) Make sure you can load your model and tokenizer using AutoClasses:
from transformers import AutoConfig, AutoModel, AutoTokenizer

revision = "main"  # the branch or commit hash you are submitting
config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
If this step fails, follow the error messages to debug your model before submitting it. It's likely your model has been improperly uploaded.
Note: make sure your model is public!
Note: if your model needs trust_remote_code=True, we do not support this option yet, but we are working on adding it. Stay posted!
2) Convert your model weights to safetensors
It's a newer format for storing weights that is safer and faster to load and use. It will also allow us to add the number of parameters of your model to the Extended Viewer!
3) Make sure your model has an open license!
This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
4) Fill out your model card
When we add extra information about models to the leaderboard, it is automatically taken from the model card.
In case of model failure
If your model is displayed in the FAILED category, its execution stopped.
Make sure you have followed the above steps first.
If everything is done, check that you can launch the EleutherAI Harness on your model locally, using the above command without modifications (you can add --limit to limit the number of examples per task).
model | revision | private | precision | weight_type | status |
---|---|---|---|---|---|