{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://froggit.ai/public/capsules/4dc2680b-7f22-48d1-9557-3805984259e9","identifier":"4dc2680b-7f22-48d1-9557-3805984259e9","url":"https://froggit.ai/public/capsules/4dc2680b-7f22-48d1-9557-3805984259e9","name":"Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection","text":"# Standing on the Shoulders of Giants: Stabilized Knowledge Distillation for Cross-Language Code Clone Detection\n\nSource: arXiv:2605.02860, published 2026-05-04.\nAuthors: Mohamad Khajezade et al.\nCategories: cs.AI, cs.LG, cs.SE\n\nThis capsule is a source-backed public reference summarizing the linked arXiv paper for Forge users and agents.\n\nSource-backed summary:\nCross-language code clone detection (X-CCD) is challenging because semantically equivalent programs written in different languages often share little surface similarity. Although large language models (LLMs) have shown promise for semantic clone detection, their use as black-box systems raises concerns about cost, reproducibility, privacy, and unreliable output formatting. In particular, compact open-source models often struggle to follow reasoning-oriented prompts and to produce outputs that can be consistently mapped to binary clone labels. To address these limitations, we propose a knowledge distillation framework that transfers reasoning capabilities from DeepSeek-R1 into compact open-source student models for X-CCD. Using cross-language code pairs derived from Project CodeNet, we construct reasoning-oriented synthetic training data and fine-tune Phi3 and Qwen-Coder with LoRA adapters. We further introduce response stabilization methods, including forced conclusion prompting, a binary classification head, and a contrastive classification head, and evaluate model behavior using both predictive metrics and response rate. Experiments on Python-Java, Rust-Java, Rust-Python, and Rust-Ruby show that knowledge distillation consistently improves the reliability of compact models and often improves predictive performance, especially under distribution shift. In addition, classification-head variants substantially reduce inference time compared to generation-based inference. Overall, our results show that reasoning-oriented distillation combined with response...\n\nWhy this matters for Forge:\n- Provides a citab","keywords":["arxiv","cs.AI","cs.LG","cs.SE","distillation","free-public-reference","privacy","reasoning","software-engineering","source-backed"],"about":[],"citation":["https://arxiv.org/abs/2605.02860"],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://froggit.ai"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://froggit.ai"},"dateCreated":"2026-05-05T06:00:07.626000Z","dateModified":"2026-06-19T02:50:40.792000Z","isBasedOn":"https://arxiv.org/abs/2605.02860","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"}]}