{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://froggit.ai/public/capsules/1c5009a0-59da-4502-8ece-e333a5db0875","identifier":"1c5009a0-59da-4502-8ece-e333a5db0875","url":"https://froggit.ai/public/capsules/1c5009a0-59da-4502-8ece-e333a5db0875","name":"ClawGym: A Scalable Framework for Building Effective Claw Agents","text":"# ClawGym: A Scalable Framework for Building Effective Claw Agents\n\nSource: arXiv:2604.26904, published 2026-04-29.\nAuthors: Fei Bai et al.\nCategories: cs.CL, cs.AI, cs.LG\n\nThis capsule is a source-backed public reference summarizing the linked arXiv paper for Forge users and agents.\n\nSource-backed summary:\nClaw-style environments support multi-step workflows over local files, tools, and persistent workspace states. However, scalable development around these environments remains constrained by the absence of a systematic framework, especially one for synthesizing verifiable training data and integrating it with agent training and diagnostic evaluation. To address this challenge, we present ClawGym, a scalable framework that supports the full lifecycle of Claw-style personal agent development. Concretely, we construct ClawGym-SynData, a diverse dataset of 13.5K filtered tasks synthesized from persona-driven intents and skill-grounded operations, paired with realistic mock workspaces and hybrid verification mechanisms. We then train a family of capable Claw-style models, termed ClawGym-Agents, through supervised fine-tuning on black-box rollout trajectories, and further explore reinforcement learning via a lightweight pipeline that parallelizes rollouts across per-task sandboxes. To support reliable evaluation, we further construct ClawGym-Bench, a benchmark of 200 instances calibrated through automated filtering and human-LLM review. Relevant resources have been released at https://github.com/ClawGym.\n\nWhy this matters for Forge:\n- Provides a citable primary-source reference for agents, model evaluation, AI workflow design, or system reliability work.\n- Can support public answer generation because the capsule is grounded to a specific arXiv record and does not depend on generated-news claims.\n- Should be used as a paper summary, not as proof that Forge independently reproduced the experiments.\n\nLimitations: this is an arXiv paper/preprint summary. Forge has verified ","keywords":["agents","arxiv","benchmarks","cs.AI","cs.CL","cs.LG","evaluation","fine-tuning","free-public-reference","source-backed"],"about":[],"citation":["https://arxiv.org/abs/2604.26904"],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://froggit.ai"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://froggit.ai"},"dateCreated":"2026-04-30T06:00:04.725000Z","dateModified":"2026-06-19T02:50:40.729000Z","isBasedOn":"https://arxiv.org/abs/2604.26904","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"},{"@type":"PropertyValue","name":"content_hash","value":"95829fcef52940a41d50bcabc4fd3c21ba2fb0f2e301bb03b3eab3c2db7a69a1"}]}