模型蒸馏
模型蒸馏指的是训练一个(通常较小的)学生模型,以模仿一个(通常较大的)教师模型或一组教师模型的行为。这通常用于使模型**更快、更便宜、更轻量**。
Cross Encoder 知识蒸馏
目标是最小化学生模型的 logits(即模型原始输出)与教师模型在相同输入对(通常是问题-答案对)上的 logits 之间的差异。
这里有两个训练脚本,使用了来自 Hostätter 等人的预计算 logits。他们为 MS MARCO 数据集训练了一个由 3 个(大型)模型组成的集成模型,并预测了各种(查询,段落)对的分数(50% 正样本,50% 负样本)。
-
在这个例子中,我们使用一个小型且快速的模型进行知识蒸馏,并从教师集成模型中学习 logits 分数。这使得模型性能与大型模型相当,同时速度快了 18 倍。
它使用
MSELoss
来最小化学生模型预测的 logits 与预计算的教师模型在(查询,答案)对上的 logits 之间的距离。 train_cross_encoder_kd_margin_mse.py
这个设置与前一个脚本相同,但现在使用了前述 Hostätter 等人论文中使用的
MarginMSELoss
。MarginMSELoss
不适用于(查询,答案)对和预计算的 logit,而是适用于(查询,正确答案,错误答案)三元组以及一个预计算的 logit,该 logit 对应于teacher.predict([query, correct_answer]) - teacher.predict([query, incorrect_answer])
。简而言之,这个预计算的 logit 是(查询,正确答案)和(查询,错误答案)之间的*差值*。
推理
tomaarsen/reranker-MiniLM-L12-H384-margin-mse 模型是使用第二个脚本训练的。如果您想在自己蒸馏模型之前试用该模型,请随时使用此脚本。
from sentence_transformers import CrossEncoder
# Download from the 🤗 Hub
model = CrossEncoder("tomaarsen/reranker-modernbert-base-msmarco-margin-mse")
# Get scores for pairs of texts
pairs = [
["where is joplin airport", "Scott Joplin is important both as a composer for bringing ragtime to the concert hall, setting the stage (literally) for the rise of jazz; and as an early advocate for civil rights and education among American blacks. Joplin is a hero, and a national treasure of the United States."],
["where is joplin airport", "Flights from Jos to Abuja will get you to this shimmering Nigerian capital within approximately 19 hours. Flights depart from Yakubu Gowon Airport/ Jos Airport (JOS) and arrive at Nnamdi Azikiwe International Airport (ABV). Arik Air is the main airline flying the Jos to Abuja route."],
["where is joplin airport", "Janis Joplin returned to the music scene, knowing it was her destiny, in 1966. A friend, Travis Rivers, recruited her to audition for the psychedelic band, Big Brother and the Holding Company, based in San Francisco. The band was quite big in San Francisco at the time, and Joplin landed the gig."],
["where is joplin airport", "Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals."],
["where is joplin airport", 'Trolley and rail lines made Joplin the hub of southwest Missouri. As the center of the "Tri-state district", it soon became the lead- and zinc-mining capital of the world. As a result of extensive surface and deep mining, Joplin is dotted with open-pit mines and mine shafts.'],
]
scores = model.predict(pairs)
print(scores)
# [0.00410349 0.03430534 0.5108879 0.999984 0.91639173]
# Or rank different texts based on similarity to a single text
ranks = model.rank(
"where is joplin airport",
[
"Scott Joplin is important both as a composer for bringing ragtime to the concert hall, setting the stage (literally) for the rise of jazz; and as an early advocate for civil rights and education among American blacks. Joplin is a hero, and a national treasure of the United States.",
"Flights from Jos to Abuja will get you to this shimmering Nigerian capital within approximately 19 hours. Flights depart from Yakubu Gowon Airport/ Jos Airport (JOS) and arrive at Nnamdi Azikiwe International Airport (ABV). Arik Air is the main airline flying the Jos to Abuja route.",
"Janis Joplin returned to the music scene, knowing it was her destiny, in 1966. A friend, Travis Rivers, recruited her to audition for the psychedelic band, Big Brother and the Holding Company, based in San Francisco. The band was quite big in San Francisco at the time, and Joplin landed the gig.",
"Joplin Regional Airport. Joplin Regional Airport (IATA: JLN, ICAO: KJLN, FAA LID: JLN) is a city-owned airport four miles north of Joplin, in Jasper County, Missouri. It has airline service subsidized by the Essential Air Service program. Airline flights and general aviation are in separate terminals.",
'Trolley and rail lines made Joplin the hub of southwest Missouri. As the center of the "Tri-state district", it soon became the lead- and zinc-mining capital of the world. As a result of extensive surface and deep mining, Joplin is dotted with open-pit mines and mine shafts.',
],
)
print(ranks)
# [
# {"corpus_id": 3, "score": 0.999984},
# {"corpus_id": 4, "score": 0.91639173},
# {"corpus_id": 2, "score": 0.5108879},
# {"corpus_id": 1, "score": 0.03430534},
# {"corpus_id": 0, "score": 0.004103488},
# ]