LLM Model Benchmarks - Search News

MLCommons releases new AILuminate benchmark for measuring AI model safety

MLCommons today released AILuminate, a new benchmark test for evaluating the safety of large language models. Launched in 2020, MLCommons is an industry consortium backed by several dozen tech firms.

CNX Software

Rockchip RK1820/RK1828 SO-DIMM and M.2 LLM/VLM AI accelerator modules, devkits, and benchmarks

Rockchip unveiled two RK182X LLM/VLM accelerators at its developer conference last July, namely the RK1820 with 2.5GB RAM for ...

Business Wire

Cognite Launches the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents

AUSTIN, Texas & OSLO, Norway--(BUSINESS WIRE)--Cognite, the global leader in AI for industry, today announced the launch of the Cognite Atlas AI™ LLM & SLM Benchmark Report for Industrial Agents. The ...

VentureBeat

Beyond generic benchmarks: How Yourbench lets enterprises evaluate AI models against actual data

Every AI model release inevitably includes charts touting how it outperformed its competitors in this benchmark test or that evaluation matrix. However, these benchmarks often test for general ...

InfoQ

DeepSeek Open-Sources DeepSeek-R1 LLM with Performance Comparable to OpenAI's o1 Model

Morningstar

Holistic AI Launches New LLM Decision Hub to Help Customers Select the Right AI Model

Free online resource provides data-driven comparisons of 20+ large language models (LLMs) across key capabilities, including performance, safety, jailbreak resistance, cost, and more. SAN FRANCISCO, CA ...

Geeky Gadgets

DeepSeek-v2.5 open source LLM performance tested – Beats Claude 3, GPT-4o and Google Gemini

The development of DeepSeek v2.5 involved the fusion of two highly capable models: DeepSeek version 2 0628 and DeepSeek Coder version 2 0724. By combining the strengths of these models, DeepSeek v2.5 ...

Forbes India

Businesses need future-ready LLM supply chains

As IT-driven businesses increasingly use AI LLMs, the need for a secure LLM supply chain increases across development, ...

InfoQ

Hugging Face Upgrades Open LLM Leaderboard v2 for Enhanced AI Model Comparison

Vivek Yadav, an engineering manager from ...
