Publications
If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "http://arxiv.org/abs/1908.10084",
}
If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:
@inproceedings{reimers-2020-multilingual-sentence-bert,
title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2020",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2004.09813",
}
If you use the code for data augmentation, feel free to cite our publication Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks:
@inproceedings{thakur-2020-AugSBERT,
title = "Augmented {SBERT}: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna",
booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = "6",
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2010.08240",
pages = "296--310",
}
If you use the MS MARCO models, feel free to cite the paper The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes:
@inproceedings{reimers-2020-Curse_Dense_Retrieval,
title = "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
month = "8",
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2012.14210",
pages = "605--611",
}
If you use the unsupervised learning example, please refer to: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
@inproceedings{wang-2021-TSDAE,
title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning",
author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = "11",
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
pages = "671--688",
url = "https://arxiv.org/abs/2104.06979",
}
If you use the GenQ learning example, please refer to: BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
@inproceedings{thakur-2021-BEIR,
title = "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
author = {Thakur, Nandan and Reimers, Nils and R{\"{u}}ckl{\'{e}}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
booktitle = "Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) - Datasets and Benchmarks Track (Round 2)",
month = "4",
year = "2021",
url = "https://arxiv.org/abs/2104.08663",
}
If you use GPL, please refer to: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
@article{wang-2021-GPL,
title = "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval",
author = "Wang, Kexin and Thakur, Nandan and Reimers, Nils and Gurevych, Iryna",
journal= "arXiv preprint arXiv:2112.07577",
month = "12",
year = "2021",
url = "https://arxiv.org/abs/2112.07577",
}
Codebases using SentenceTransformers
haystack - Neural search / question answering
Top2Vec - Topic modeling
txtai - AI-powered search engine
BERTopic - Topic model using SBERT embeddings
KeyBERT - Key phrase extraction using SBERT
contextualized-topic-models - Cross-lingual topic modeling
covid-papers-browser - Semantic search for Covid-19 papers
backprop - A natural language engine that makes state-of-the-art language models easy to use, access, and extend.
SentenceTransformers in Articles
Below is a curated list of articles / applications that use SentenceTransformers to achieve impressive results. Feel free to contact me (info@nils-reimers.de) to have your application added here.
December 2021 - Sentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot text classification while being 1600 times smaller
October 2021 - Natural Language Processing (NLP) for Semantic Search
November 2020 - How to Build a Semantic Search Engine with Transformers and Faiss
October 2020 - Topic Modeling with BERT
September 2020 - Elastic Transformers - Making BERT stretchy - Scalable Semantic Search on a Jupyter Notebook
July 2020 - Simple Sentence Similarity Search with SentenceBERT
May 2020 - HN Time Machine: finally some Hacker News history!
May 2020 - A complete guide to transfer learning from English to other languages using sentence embedding BERT models
March 2020 - Building a k-NN Similarity Search Engine using Amazon Elasticsearch and SageMaker
February 2020 - Semantic Search Engine with Sentence BERT
SentenceTransformers in Research
SentenceTransformers is used in hundreds of research projects. For a list of publications, see Google Scholar or Semantic Scholar.