deep-learning-notes

Part 7: Multimodal Learning

Author

jshn9515

Published

2026-05-05

Modified

2026-07-06

Title	Author	Date
15.1 CLIP: Mapping Images and Text into a Shared Semantic Space	jshn9515	2026-04-07
