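In practice, we rarely implement a complete Transformer from scratch. Instead, we load a pretrained model with a library such as Hugging Face Transformers. The library wraps up the tokenizer, the model architecture, weight loading, the forward outputs, text generation and more. But if we stop at "I copied the code and it runs", it is easy to lose track of which structure from the earlier sections each API actually corresponds to.
So the goal of this section is not to list every parameter of Hugging Face Transformers. It is to build a mapping:
Which Hugging Face Transformers APIs correspond to each part of the Transformer architecture we covered earlier?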
Note
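This section introduces the commonly used interfaces in Hugging Face Transformers and shows how they map onto the Transformer architecture. The Transformers library is still evolving, so the behavior or parameters of some APIs may change between versions. If your code does not match this text exactly, refer to the latest official documentation first.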
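```python
# tokenizer and model are assumed to be the GPT-2 tokenizer / causal LM loaded earlier
inputs = tokenizer('I love deep', return_tensors='pt')
outputs = model(**inputs)  # forward pass
logits = outputs.logits    # shape: (batch_size, seq_len, vocab_size)
print(logits.shape)
```

```text
torch.Size([1, 3, 50257])
```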
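The logits here usually have the shape:
(batch_size, seq_len, vocab_size)
They hold, for every position, a prediction score for every token in the vocabulary. In the output above that means 1 sequence, 3 tokens, and GPT-2's 50257-token vocabulary.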
For example, if the input is:
I love deep
then the logits at the last position can be used to predict the next token:
```python
next_token_logits = logits[:, -1, :]
```
After a softmax, these scores give the conditional distribution used in autoregressive generation:
\[
p(x_{t+1} \mid x_{\le t})
\]
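The sketch below makes the step from logits to \(p(x_{t+1} \mid x_{\le t})\) concrete using plain Python only. The toy `logits` array has batch size 1, 3 positions and a hypothetical 5-token vocabulary, and its values are made up. A real GPT-2 output would have 50257 scores per position. Taking the last position and applying a softmax turns the raw scores into the next-token distribution.

```python
import math

# Hypothetical logits with shape (batch_size=1, seq_len=3, vocab_size=5),
# standing in for outputs.logits of a real model.
logits = [[
    [0.1, 0.2, 0.3, 0.4, 0.5],   # position 0: "I"
    [1.0, 0.0, 0.5, 0.2, 0.1],   # position 1: "love"
    [0.5, 2.0, 1.0, 0.0, -1.0],  # position 2: "deep" -> scores for the next token
]]

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Equivalent of logits[:, -1, :] for the single sequence in the batch
next_token_logits = logits[0][-1]
probs = softmax(next_token_logits)  # p(x_{t+1} | x_{<=t})
next_token_id = max(range(len(probs)), key=probs.__getitem__)

print([round(p, 3) for p in probs])
print("greedy next token id:", next_token_id)
```

Only the last row matters for the next token. The earlier rows are the model's predictions for tokens that are already in the prompt.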
8.11.3.2 generate: a wrapper around autoregressive generation
We could take `logits[:, -1, :]` by hand and sample one step at a time. In practice, though, we usually just call:
```python
inputs = tokenizer(
    'The last human on Earth heard a knock at the door and',
    return_tensors='pt',
)
# No sampling arguments: GPT-2's default generation config decodes greedily
output_ids = model.generate(**inputs, max_new_tokens=100)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

```text
[transformers] Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The last human on Earth heard a knock at the door and the doorbell rang.
"Hello, my name is John. I'm a student at the University of California, Berkeley. I'm a student at the University of California, Berkeley. I'm a student at the University of California, Berkeley. I'm a student at the University of California, Berkeley. I'm a student at the University of California, Berkeley. I'm a student at the University of California, Berkeley. I'm a student at the University of California, Berkeley. I
```
Greedy decoding always picks the highest-scoring token, so the continuation above gets stuck repeating one sentence. Enabling sampling with `do_sample=True`, and shaping the distribution with `temperature` and `top_p`, gives more varied text:

```python
inputs = tokenizer(
    'The last human on Earth heard a knock at the door and',
    return_tensors='pt',
)
output_ids = model.generate(
    **inputs,
    max_new_tokens=100,
    do_sample=True,   # sample instead of taking the argmax
    temperature=0.8,  # < 1 sharpens the distribution
    top_p=0.9,        # nucleus sampling
)
text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(text)
```

```text
[transformers] Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
The last human on Earth heard a knock at the door and the door opened with a heavy slam.
"O-oh, I'm sorry, this is a new job."
"What? Why do you have to be here?"
The man who had come to ask about my job asked, "What do you want to do with the rest of your life? You must have a lot of money, and you have to work for a while to get your wages paid. I am not your boss, but you are my boss and
```
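Both `generate` calls above run the same step-by-step loop. They differ only in how each next token is picked from the last-position logits. Here is a minimal sketch of the greedy version of that loop. `toy_next_token_logits` and its score table are hypothetical stand-ins for `model(input_ids).logits[:, -1, :]`, and the tiny vocabulary is made up.

```python
VOCAB = ["<eos>", "I", "am", "a", "student", "."]
EOS_ID = 0

# Hypothetical bigram-style score table: row = previous token id,
# column = score for each candidate next token.
SCORES = [
    [0.0, 1.0, 0.0, 0.0, 0.0, 0.0],  # after "<eos>"
    [0.0, 0.0, 2.0, 0.1, 0.0, 0.0],  # after "I"
    [0.0, 0.0, 0.0, 2.0, 0.0, 0.1],  # after "am"
    [0.0, 0.0, 0.0, 0.0, 2.0, 0.0],  # after "a"
    [0.0, 0.0, 0.0, 0.0, 0.0, 2.0],  # after "student"
    [0.5, 2.0, 0.0, 0.0, 0.0, 0.0],  # after "." ("I" outscores "<eos>")
]

def toy_next_token_logits(input_ids):
    """Hypothetical stand-in for model(input_ids).logits[:, -1, :]."""
    return SCORES[input_ids[-1]]

def greedy_generate(input_ids, max_new_tokens):
    """Append the argmax token one step at a time."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        scores = toy_next_token_logits(ids)
        next_id = max(range(len(scores)), key=scores.__getitem__)
        ids.append(next_id)
        if next_id == EOS_ID:  # stop early once end-of-sequence is produced
            break
    return ids

out = greedy_generate([1], max_new_tokens=10)  # prompt: "I"
print(" ".join(VOCAB[i] for i in out))
```

Because the argmax is deterministic, the same prefix always leads to the same next token. Once the toy model returns to "I", it repeats "I am a student ." until `max_new_tokens` runs out. This is a miniature version of the repetition in the greedy GPT-2 output.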
Putting the pieces together, here is a complete causal language model example:

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = 'gpt2'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
# (in a notebook, IPython.display.clear_output() can hide the download logs here)

text = 'I love deep learning because'
inputs = tokenizer(text, return_tensors='pt')
with torch.inference_mode():  # no autograd bookkeeping during generation
    output_ids = model.generate(
        **inputs,
        max_new_tokens=100,
        do_sample=True,
        temperature=0.8,
        top_p=0.9,
    )
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```

```text
[transformers] Setting `pad_token_id` to `eos_token_id`:50256 for open-end generation.
I love deep learning because it is so simple, and so easy to learn.
It's so easy to see and understand how something works. It's just so much more difficult to use in the real world. The best way to see it is by looking at the data. That way you can learn it.
But there is also the real world. The data. The real world.
When we look at real data, we see the world from the perspective of a human. In real life,
```
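The `do_sample=True`, `temperature=0.8` and `top_p=0.9` arguments above reshape the next-token distribution before a token is drawn. Below is a stdlib-only sketch of one common formulation. Divide the logits by `temperature` and apply a softmax. Then keep the smallest set of most-probable tokens whose cumulative probability reaches `top_p`, renormalize, and sample. The logits values are made up. Hugging Face applies these steps through its own logits-processing classes, and details such as tie handling may differ from this sketch.

```python
import math
import random

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of highest-probability token ids whose
    cumulative probability reaches top_p, then renormalize."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cumulative = [], 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_next_token(logits, temperature, top_p, rng):
    scaled = [s / temperature for s in logits]  # temperature < 1 sharpens
    filtered = top_p_filter(softmax(scaled), top_p)
    ids = list(filtered)
    weights = [filtered[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0], filtered

logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical next-token scores
rng = random.Random(0)  # fixed seed for reproducibility
token_id, kept = sample_next_token(logits, temperature=0.8, top_p=0.9, rng=rng)
print("candidates after top-p:", sorted(kept))
print("sampled token id:", token_id)
```

With these numbers, the two lowest-scoring tokens fall outside the 0.9 nucleus, so they can never be sampled. The remaining three are drawn in proportion to their renormalized probabilities. Unlike greedy decoding, this sampling process does not deterministically repeat the same continuation.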
For an encoder-decoder model such as T5, load it with `AutoModelForSeq2SeqLM` instead:

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = 't5-small'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
```
An encoder-decoder (seq2seq) model like this can be viewed as:
\[
X \rightarrow \operatorname{Encoder} \rightarrow H \rightarrow \operatorname{Decoder} \rightarrow Y
\]
T5, for example, usually takes its input in text-to-text form, with the task written as a prefix of the input text:
```python
text = 'Translate English to German: I love deep learning.'
inputs = tokenizer(text, return_tensors='pt')
output_ids = model.generate(**inputs, max_new_tokens=50)
output = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output)
```
The complete, self-contained version:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = 't5-small'
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = 'Translate English to German: I love deep learning.'
inputs = tokenizer(text, return_tensors='pt')
with torch.inference_mode():
    output_ids = model.generate(**inputs, max_new_tokens=50)
output_text = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(output_text)
```

```text
Ich liebe das tiefe Lernen.
```
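Behind this code, the following happens:
- The tokenizer converts the input text into token ids;
- The encoder encodes the full input bidirectionally;
- The decoder generates autoregressively, starting from a start token;
- Decoder self-attention uses a causal mask;
- Decoder cross-attention reads the encoder outputs;
- `generate()` returns the generated token ids, which `tokenizer.decode` turns back into text.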
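The causal mask in decoder self-attention lets query position i attend only to key positions j ≤ i. Below is a minimal stdlib sketch of the usual additive form of such a mask, where 0 means attention is allowed and -inf means it is blocked. It uses a hypothetical sequence length of 4. The encoder attends bidirectionally, so it needs no causal mask, only a padding mask when sequences are padded.

```python
import math

def causal_mask(seq_len):
    """Additive mask: 0.0 where query i may attend to key j (j <= i), -inf otherwise."""
    return [[0.0 if j <= i else -math.inf for j in range(seq_len)]
            for i in range(seq_len)]

def masked_softmax(scores, mask_row):
    """Softmax of one row of attention scores after adding the mask row."""
    shifted = [s + m for s, m in zip(scores, mask_row)]
    top = max(shifted)
    exps = [math.exp(s - top) for s in shifted]  # exp(-inf) == 0.0
    total = sum(exps)
    return [e / total for e in exps]

mask = causal_mask(4)
for row in mask:
    print(row)

# Query position 1 with uniform raw scores: only keys 0 and 1 receive weight.
print(masked_softmax([1.0, 1.0, 1.0, 1.0], mask[1]))
```

The mask is added to the scores before the softmax, not multiplied afterwards. Blocked positions therefore get exactly zero weight, and the allowed positions still sum to 1.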
This is exactly the encoder-decoder structure:
\[
X \rightarrow \operatorname{Encoder} \rightarrow H \rightarrow \operatorname{Decoder} \rightarrow Y
\]
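The data flow \(X \rightarrow \operatorname{Encoder} \rightarrow H \rightarrow \operatorname{Decoder} \rightarrow Y\) can be sketched with toy functions. The encoder runs once over the whole input. The decoder is called repeatedly, and each call sees its own previous outputs (self-attention) and the fixed encoder states `H` (cross-attention). `toy_encoder`, `toy_decoder_step` and the start/end ids are hypothetical placeholders. They only mimic the control flow of `generate()` for a seq2seq model, not T5's actual computation.

```python
START_ID = 0  # hypothetical decoder start token id
EOS_ID = 1    # hypothetical end-of-sequence id

def toy_encoder(input_ids):
    """Runs once over the full input (bidirectional in a real encoder).
    Here it just returns one made-up 'state' per input token."""
    return [tok * 10 for tok in input_ids]

def toy_decoder_step(decoder_ids, encoder_states):
    """Predicts the next token from the tokens generated so far
    (decoder self-attention) and the encoder states (cross-attention).
    Toy rule: emit the encoder states in reverse order, then EOS."""
    step = len(decoder_ids) - 1  # tokens generated after START
    if step < len(encoder_states):
        return encoder_states[-(step + 1)]
    return EOS_ID

def seq2seq_generate(input_ids, max_new_tokens):
    H = toy_encoder(input_ids)   # X -> Encoder -> H, computed once
    decoder_ids = [START_ID]     # decoding starts from a start token
    for _ in range(max_new_tokens):
        next_id = toy_decoder_step(decoder_ids, H)  # H -> Decoder, one step
        decoder_ids.append(next_id)
        if next_id == EOS_ID:
            break
    return decoder_ids           # Y, including START and EOS

print(seq2seq_generate([2, 3, 4], max_new_tokens=10))
```

The point of the sketch is the asymmetry. `H` is computed once and reused at every step, and only the decoder side grows token by token. That is why encoder-decoder generation costs one encoder pass plus one decoder pass per generated token.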