Part 4: Attention Mechanisms and Transformers

Author

jshn9515

Published

2026-05-05

Modified

2026-07-06

Title	Author	Date
10.1 Why Attention Is IO-Bound	jshn9515	2026-03-19
10.2 FlashAttention v1: Eliminating the IO Bottleneck in Attention	jshn9515	2026-03-19
8.1 Bahdanau Attention: From Information Compression to Dynamic Retrieval	jshn9515	2026-04-09
8.10 Three Transformer Architectures: Understanding, Generation, and Input-to-Output Transformation	jshn9515	2026-05-08
8.11 Hugging Face Transformers API: From Architecture to Usage	jshn9515	2026-05-09
8.2 Cross-Attention: One Sequence Querying Another	jshn9515	2026-04-09
8.3 Self-Attention: Information Exchange Within a Sequence	jshn9515	2026-04-09
8.4 Multi-Head Attention: From a Single View to Multiple Views	jshn9515	2026-04-09
8.5 Positional Encoding: Adding Position Information to Attention	jshn9515	2026-04-09
8.6 Transformer Encoder: Stacking Self-Attention	jshn9515	2026-05-03
8.7 Transformer Decoder: Masked Self-Attention and Cross-Attention	jshn9515	2026-05-05
8.8 Encoder-Decoder Transformer: Connecting the Encoder and Decoder	jshn9515	2026-05-05
8.9 KV Cache: Why Inference Does Not Recompute the Past	jshn9515	2026-05-05
