Nvidia's KV Cache Transform Coding (KVTC) compresses an LLM's key-value cache by 20x without model changes, cutting GPU memory costs and time-to-first-token by up to 8x for multi-turn AI applications.
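The snippet doesn't describe KVTC's internals, but the name points at classic transform coding: project the data into a basis that concentrates its energy into a few coefficients, then quantize those coefficients. A minimal numpy sketch of that generic idea on a toy key tensor follows; the shape, the random-walk test data, and the quantization step are illustrative assumptions, not KVTC's actual design.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    # Orthonormal DCT-II basis; rows are cosine basis vectors.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    m[0] /= np.sqrt(2.0)
    return m

rng = np.random.default_rng(0)
# Toy stand-in for one attention head's key cache, shape (seq_len, head_dim).
# A random walk along the sequence axis gives it the kind of smooth structure
# a transform can concentrate into a handful of coefficients.
kv = np.cumsum(rng.normal(size=(128, 64)), axis=0).astype(np.float32)

D = dct_matrix(kv.shape[0])
coeffs = D @ kv               # transform along the sequence axis
step = 2.0                    # coarse uniform quantizer (illustrative)
q = np.round(coeffs / step)
recon = D.T @ (q * step)      # dequantize, then invert the transform

print("nonzero coefficients:", np.count_nonzero(q), "/", q.size)
print("relative error:", np.linalg.norm(kv - recon) / np.linalg.norm(kv))
```

Most quantized coefficients round to zero, so an entropy coder over `q` would store far fewer bits than the raw floats, at the cost of a small reconstruction error.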
Multiverse Computing S.L. said today it has raised $215 million in funding to accelerate the deployment of its quantum computing-inspired artificial intelligence model compression technology, which ...
Effective compression is about finding patterns to make data smaller without losing information. When an algorithm or model can accurately guess the next piece of data in a sequence, it shows it’s ...
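That link between prediction and compression can be made concrete: under arithmetic coding, a symbol the model predicts with probability p costs about -log2(p) bits, so better guesses directly mean smaller output. A tiny Python illustration, where the two "models" are made-up stand-ins rather than any particular compressor:

```python
import math

# A "model" is anything that assigns a probability to the next symbol.
# Under arithmetic coding, a symbol predicted with probability p costs
# about -log2(p) bits, so a better predictor produces a smaller file.
def bits_needed(probs):
    return sum(-math.log2(p) for p in probs)

# The same 8-symbol message, seen by two different predictors:
confident = [0.9] * 8    # usually guesses the next symbol correctly
clueless  = [0.5] * 8    # no better than a coin flip per symbol
print(f"confident model: {bits_needed(confident):.1f} bits")  # ~1.2
print(f"clueless model:  {bits_needed(clueless):.1f} bits")   # 8.0
```

The confident model encodes the message in about 1.2 bits versus 8 bits for the coin-flip model, which is why a model that guesses well is, in effect, a compressor.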
Large language models (LLMs) such as OpenAI's GPT-4o, Anthropic's Claude, Google's PaLM, and Meta's Llama have been dominating the AI field recently.
Small changes in the large language models (LLMs) at the heart of AI applications can result in substantial energy savings, according to a report released by the United Nations Educational, Scientific ...
San Sebastian, Spain – June 12, 2025: Multiverse Computing has developed CompactifAI, a compression technology capable of reducing the size of LLMs (Large Language Models) by up to 95 percent while ...
Intel has disclosed a maximum-severity vulnerability in some versions of its Intel Neural Compressor software for AI model compression. The bug, designated CVE-2024-22476, provides an ...
I see awful diminishing returns here. (Lossless) compression of today isn't really that much better than products from the 80s and early 90s - Stacker (wasn't it?), PKZIP, tar, gz. You get maybe a few ...
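The claim is easy to probe with Python's standard-library codecs: zlib implements DEFLATE (the PKZIP/gzip lineage), while lzma is a 2000s-era design. The input below is arbitrary repetitive text, and results vary wildly with the data, so treat this as a sketch rather than a benchmark:

```python
import lzma
import zlib

# Some mildly redundant text to compress (repeated English prose).
data = b"Lossless compressors exploit statistical redundancy. " * 200

deflated = zlib.compress(data, 9)   # DEFLATE, the pkzip/gzip era
xz = lzma.compress(data, preset=9)  # LZMA, a more recent design
print(f"original: {len(data)} bytes")
print(f"zlib -9:  {len(deflated)} bytes")
print(f"lzma -9:  {len(xz)} bytes")
```

On highly redundant input like this, both codecs shrink the data enormously and the gap between them is modest, which is roughly the diminishing-returns point the comment is making; heterogeneous real-world data tends to show a larger, but still incremental, advantage for the newer codec.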