
ClockBench is a benchmark that evaluates how well AI models can read analog clocks. The task is simple for humans, yet current advanced AI models struggle with it significantly. The benchmark consists of a dataset of clocks and a series of questions covering time reading, time manipulation, and time zone conversions. Human accuracy stands at 89.1%, while the top AI models currently reach only 13.3%. The goal is to determine whether current AI paradigms are sufficient or whether novel approaches are needed to improve visual reasoning in AI models.
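
To make the question categories more concrete, the following is a minimal sketch of how answers to the time manipulation and time zone conversion questions could be computed once a clock has been read, and how accuracy could be scored. The function names (`shift_time`, `convert_zone`, `accuracy`), the 12-hour wrap-around, the fixed UTC offsets, and exact-match scoring are all assumptions made for illustration; they are not taken from ClockBench's own evaluation harness.

```python
from datetime import datetime, timedelta, timezone


def shift_time(hour, minute, delta_minutes):
    """Time manipulation: add (or subtract) minutes on a 12-hour dial."""
    total = (hour % 12) * 60 + minute + delta_minutes
    total %= 12 * 60  # wrap around the dial
    h, m = divmod(total, 60)
    return (12 if h == 0 else h, m)


def convert_zone(hour, minute, from_offset_hours, to_offset_hours):
    """Time zone conversion on a 24-hour clock using fixed UTC offsets."""
    # The date is arbitrary; only the time of day is returned.
    src_tz = timezone(timedelta(hours=from_offset_hours))
    dst_tz = timezone(timedelta(hours=to_offset_hours))
    src = datetime(2000, 1, 1, hour, minute, tzinfo=src_tz)
    dst = src.astimezone(dst_tz)
    return (dst.hour, dst.minute)


def accuracy(predictions, answers):
    """Fraction of predictions that exactly match the reference answers."""
    if not answers:
        return 0.0
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers)


if __name__ == "__main__":
    print(shift_time(10, 45, 30))       # 10:45 plus 30 minutes
    print(convert_zone(9, 30, 0, -5))   # 09:30 at UTC+0 seen from UTC-5
    print(accuracy([(11, 15), (4, 0)], [(11, 15), (4, 30)]))
```

Under this sketch, a model's reading of the clock face would be fed into `shift_time` or `convert_zone` to obtain the reference answer for a follow-up question, so an error in the initial reading carries through to the derived answers.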
