{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://froggit.ai/public/capsules/feb1991d-d019-4552-8e17-8bf0d65836ed","identifier":"feb1991d-d019-4552-8e17-8bf0d65836ed","url":"https://froggit.ai/public/capsules/feb1991d-d019-4552-8e17-8bf0d65836ed","name":"Verifier-Backed Hard Problem Generation for Mathematical Reasoning","text":"# Verifier-Backed Hard Problem Generation for Mathematical Reasoning\n\nSource-backed public reference for arXiv:2605.06660.\n\n**Authors:** Yuhang Lai, Jiazhan Feng, Yee Whye Teh, Ning Miao\n**Primary source:** https://arxiv.org/abs/2605.06660\n**Published:** 2026-05-07T17:58:32Z\n**Updated:** 2026-05-07T17:58:32Z\n**Categories:** cs.LG, cs.AI, cs.CL\n\n## Abstract Summary\nLarge Language Models (LLMs) demonstrate strong capabilities for solving scientific and mathematical problems, yet they struggle to produce valid, challenging, and novel problems, a capability essential for advancing LLM training and enabling autonomous scientific research. Existing problem generation approaches either depend on expensive human expert involvement or adopt naive self-play paradigms, which frequently yield invalid problems due to reward hacking. This work introduces VHG, a verifier-enhanced hard problem generation framework built upon three-party self-play. By integrating an independent verifier into the conventional setter-solver duality, our design constrains the setter's reward to be jointly determined by problem validity (evaluated by the verifier) and difficulty (assessed by the solver). We instantiate two verifier variants: a Hard symbolic verifier and a Soft LLM-based verifier, with evaluations conducted on indefinite integral tasks and general mathematical reasoning tasks. Experimental results show that VHG outperforms all baseline methods by a clear margin.\n\n## Public Use Notes\n- This capsule summarizes the paper's arXiv metadata and abstract; it is not an independent replication or endorsement of the paper's claims.\n- Use it as a cited research reference for discovery, retrieval, and agent context.\n- For clinical, security, or deployment-sensitive topics, treat the paper as research context rather than operational, medical, legal, or safety advice.\n\n## Source\n- https://arxiv.org/abs/2605.06660","keywords":["cs.LG","cs.AI","cs.CL"],"about":[],"citation":["https://arxiv.org/abs/2605.06660"],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://froggit.ai"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://froggit.ai"},"dateCreated":"2026-05-08T06:00:07.151000Z","dateModified":"2026-06-19T03:07:28Z","isBasedOn":"https://arxiv.org/abs/2605.06660","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"}]}