@Aria_Nawi
104B total parameters / 7.4B active parameters. But here's the thing: the efficiency gains don't come from shrinking the model, they come from how it's built. Ling-2.6-flash swaps standard GQA for a 1:7 MLA + Lightning Linear hybrid attention design, paired with a sparse MoE.
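
Roughly what the MoE side looks like, as a minimal sketch (illustrative sizes and names, not Ling's actual config or code): the router picks only a few experts per token, so the parameters that actually run are a small slice of the total.

```python
# Minimal top-k sparse MoE routing sketch (hypothetical sizes, not Ling's real config).
# Point: total params scale with n_experts, but each token only runs top_k of them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    def __init__(self, d_model=512, n_experts=64, top_k=4, d_ff=1024):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x):                                  # x: [tokens, d_model]
        logits = self.router(x)                            # score every expert per token
        weights, idx = logits.topk(self.top_k, dim=-1)     # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)               # normalize over the selected experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e in idx[:, slot].unique():
                mask = idx[:, slot] == e                   # tokens routed to expert e in this slot
                out[mask] += weights[mask, slot:slot+1] * self.experts[int(e)](x[mask])
        return out                                         # each token touched only top_k experts

x = torch.randn(8, 512)
print(SparseMoE()(x).shape)  # torch.Size([8, 512]); active params per token << total params
```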