Multi-head Latent Attention (MLA)

Multi-head Latent Attention (MLA): Making Transformers More Efficient

This blog post explains Multi-head Latent Attention (MLA) and provides minimal working code in PyTorch.

October 2025 · Saeed Mehrang
RoPE vs Sinusoidal Embeddings

Understanding Rotary Position Embeddings (RoPE): A Visual Guide

This blog post explains RoPE in simple terms, showing how it differs from sinusoidal embeddings and why it has become the standard for modern language models.

October 2025 · Saeed Mehrang