News

MITRE said the ALUE benchmark for aerospace LLM evaluation supports custom datasets, open-source LLMs and user-defined prompts.