Example of Performance Task in Mathematics 5

Anthropic says its new AI model “maintained focus” for 30 hours on multistep tasks

Claude 4.5 is available everywhere today. Through the API, the model maintains the same pricing as Claude Sonnet 4, at $3 per ...

OpenAI tested GPT-5, Claude, and Gemini on real-world tasks - the results were surprising

According to OpenAI, the tasks were created by professionals with an average of 14 years of experience in relevant fields to reflect "real work products, such as a legal brief, an engineering ...

9to5Google

Gemini 2.5 Deep Think scores competitive coding gold in ‘profound leap’ for abstract problem-solving

After a mathematics win in July, Gemini 2.5 Deep Think has now scored a gold-medal level performance in competitive coding.

4d on MSN

OpenAI says GPT-5 stacks up to humans in a wide range of jobs

A new test from OpenAI aims to understand how close AI is to outperforming humans at economically valuable work.

12d

OpenAI Just Dropped A New GPT-5 Codex AI Coding Model for Developers

Learn about the new GPT-5 Codex enhancements like coding with dynamic reasoning, seamless tool integration, and smarter AI ...

ZN.ua

Education Mirrored by Exams: What We Really See in the Results

The results of the 2025 NMT revealed a deep crisis in school education—massive failures in mathematics, poor knowledge of ...

Education Week

How AI Simulations Match Up to Real Students—and Why It Matters

AI-simulated students consistently outperform real students—and make different kinds of mistakes—in math and reading comprehension, according to a new study.

Communications of the ACM (Opinion)

Not Every AI Problem Is a Data Problem

Large language models (LLMs) have revolutionized the AI landscape, demonstrating remarkable capabilities across a wide range ...

Scientific American

Secrets of DeepSeek AI Model Revealed in Landmark Paper

The first peer-reviewed study of the DeepSeek AI model shows how a Chinese start-up firm made the market-shaking LLM for $300 ...

15d

OpenAI Claims GPT-5 Has Doctorate-Level Abilities, Google DeepMind CEO: Lacks Comprehensive Doctorate Skills, AGI May Take 5 to 10 Years

Hassabis pointed to mathematical abilities as an example, noting that existing models still make errors when handling basic calculations or high school math problems, which starkly contrasts with the ...

15d

Google DeepMind CEO Questions GPT-5 Doctoral-Level Intelligence, AGI May Take 5 to 10 Years to Achieve

Prior to this, OpenAI CEO Sam Altman claimed at the GPT-5 launch that the model 'possesses doctoral-level expertise in any field' and emphasized that it is 'the most powerful and robust reasoning ...

13d

UAE Releases An Open, Small AI Model That Punches Above Its Weight

UAE’s MBZUAI and G42 released K2 Think, an open-source reasoning model with only 32 billion parameters that in trials rivals ...
