Block Encoding Compression

1 day ago

Nvidia shrinks LLM memory 20x without changing model weights

Nvidia's KV Cache Transform Coding (KVTC) compresses the LLM key-value cache by 20x without model changes, cutting GPU memory costs and reducing time-to-first-token by up to 8x for multi-turn AI applications.

7 hours ago

Breaking the 100M Token Limit: EverMind's MSA Architecture Achieves Efficient End-to-End ...

The research introduces a novel memory architecture called MSA (Memory Sparse Attention). Through a combination of the Memory Sparse Attention mechanism, Document-wise RoPE for extreme context ...

Cybernews

Malicious campaign targeting vulnerable OpenWebUI servers: technical analysis

During an investigation into exposed OpenWebUI servers, the Cybernews research team identified a malicious campaign that targets vulnerable instances with cryptocurrency miners and info stealers.


