{"@context":"https://schema.org","@type":"CreativeWork","@id":"https://froggit.ai/public/capsules/1061251c-e117-4e1e-9690-42e924dc0561","identifier":"1061251c-e117-4e1e-9690-42e924dc0561","url":"https://froggit.ai/public/capsules/1061251c-e117-4e1e-9690-42e924dc0561","name":"Make Your LVLM KV Cache More Lightweight","text":"# Make Your LVLM KV Cache More Lightweight\n\nSource: arXiv:2605.00789, published 2026-05-01.\nAuthors: Xihao Chen et al.\nCategories: cs.CV, cs.AI, cs.LG\n\nThis capsule is a source-backed public reference summarizing the linked arXiv paper for Forge users and agents.\n\nSource-backed summary:\nKey-Value (KV) cache has become a de facto component of modern Large Vision-Language Models (LVLMs) for inference. While it enhances decoding efficiency in Large Language Models (LLMs), its direct adoption in LVLMs introduces substantial GPU memory overhead due to the large number of vision tokens processed during the prefill stage. To tackle this problem, we propose LightKV, a novel approach that reduces KV cache size by exploiting the redundancy among vision-token embeddings. Guided by text prompts, LightKV employs cross-modality message passing to aggregate informative messages across vision tokens and progressively compress them during prefill. This prompt-aware guidance distinguishes our method from prior vision-only compression strategies. We evaluate LightKV on eight open-source LVLMs across eight public benchmark datasets, e.g., MME and SeedBench. Experimental results demonstrate that with only 55% of the original vision tokens, LightKV (a) halves the vision-token KV cache size, (b) reduces computation by up to 40%, and (c) preserves general-purpose performance while significantly outperforming existing baselines.\n\nWhy this matters for Forge:\n- Provides a citable primary-source reference for agents, model evaluation, AI workflow design, or system reliability work.\n- Can support public answer generation because the capsule is grounded to a specific arXiv record and does not depend on generated-news claims.\n- Should be used as a paper summary, not as proof that Forge independently reproduced the experiments.\n\nLimitations: this is an arXiv paper/preprint summary. Forge has verified the source identity and made the capsule answer-ready as a source-backed reference, but has not in","keywords":["arxiv","benchmarks","cs.AI","cs.CV","cs.LG","evaluation","free-public-reference","gui-agents","kv-cache","memory","software-engineering","source-backed"],"about":[],"citation":["https://arxiv.org/abs/2605.00789"],"isPartOf":{"@type":"Dataset","name":"Forge Cascade Knowledge Graph","url":"https://froggit.ai"},"publisher":{"@type":"Organization","name":"Forge Cascade","url":"https://froggit.ai"},"dateCreated":"2026-05-04T06:00:06.458000Z","dateModified":"2026-06-19T02:50:40.732000Z","isBasedOn":"https://arxiv.org/abs/2605.00789","additionalProperty":[{"@type":"PropertyValue","name":"trust_level","value":100},{"@type":"PropertyValue","name":"verification_status","value":"sources_verified"},{"@type":"PropertyValue","name":"provenance_status","value":"valid"},{"@type":"PropertyValue","name":"evidence_level","value":"primary_source"}]}