Open LLM Leaderboard - Ranks and evaluates language models and chatbots
https://huggingfaceh4-open-llm-leaderboard.hf.space/?__theme=dark
The Open LLM Leaderboard tracks, ranks, and evaluates language models and chatbots on a set of shared benchmarks. Anyone in the community can submit a model for automated evaluation on the project's GPU cluster, as long as it is a Transformers model with weights on the Hub. The leaderboard evaluates models on four benchmarks: AI2 Reasoning Challenge (ARC), HellaSwag, MMLU, and TruthfulQA, testing reasoning and general knowledge in both zero-shot and few-shot settings.
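The leaderboard's evaluations are built on EleutherAI's lm-evaluation-harness, so similar numbers can be reproduced locally. Below is a minimal sketch assuming the harness's v0.4+ Python API (`pip install lm-eval`); the task names and the small example model ID are assumptions that may vary across harness versions, and the leaderboard itself runs each benchmark with its own few-shot count.

```python
import lm_eval

# Leaderboard-style evaluation run. "pretrained" can point at any Transformers
# model with weights on the Hub; pythia-160m is just a small example model.
results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face Transformers backend
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["arc_challenge", "hellaswag", "mmlu", "truthfulqa_mc2"],
    num_fewshot=0,  # zero-shot here; the leaderboard uses per-benchmark few-shot settings
)

# Per-task metrics (accuracy etc.) are keyed by task name under "results".
for task, metrics in results["results"].items():
    print(task, metrics)
```

Note that this only mirrors the evaluation step; submitting a model to the leaderboard itself is done through the web UI linked above, with no code required.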