Cohere's first developer coding model is a 30B-parameter mixture-of-experts model that runs on a single H100 and supports a 256K context length.
DiffusionGemma is Google DeepMind's experimental 26B open model, which uses text diffusion to generate up to 4x faster on GPUs.