The Indonesian Express
OpenAI has launched HealthBench, an open-source dataset for testing how well artificial intelligence (AI) models answer medical questions. The release marks a major step for OpenAI in the field of health technology. HealthBench was developed with 262 doctors from 60 countries and contains 5,000 realistic simulated medical conversations. Its main goal is to evaluate whether AI can provide accurate answers to public health questions.

Responses are scored against a rubric written by doctors, with the GPT-4.1 model acting as the grader. In the results, OpenAI's o3 model achieved the highest score at 60 percent, followed by Grok (54 percent) and Google Gemini 2.5 Pro (52 percent).

One scenario tests how the AI responds to an emergency, such as an elderly person lying unresponsive on the floor. The AI is asked to provide first aid steps, and its answer is then assessed for accuracy and completeness.

As reported by CNET on Tuesday (5/13/2025), HealthBench supports 49 languages. It also covers 26 medical specialties, including neurosurgery and ophthalmology.

With the launch of HealthBench, OpenAI hopes that AI can provide more accurate medical information and give users safe and appropriate responses.