Publications
If you find this repository helpful, feel free to cite our publication Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks:
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "http://arxiv.org/abs/1908.10084",
}
If you use one of the multilingual models, feel free to cite our publication Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation:
@inproceedings{reimers-2020-multilingual-sentence-bert,
title = "Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2020",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2004.09813",
}
If you use the code for data augmentation, feel free to cite our publication Augmented SBERT: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks:
@inproceedings{thakur-2020-AugSBERT,
title = "Augmented {SBERT}: Data Augmentation Method for Improving Bi-Encoders for Pairwise Sentence Scoring Tasks",
author = "Thakur, Nandan and Reimers, Nils and Daxenberger, Johannes and Gurevych, Iryna",
booktitle = "Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies",
month = "6",
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2010.08240",
pages = "296--310",
}
If you use the MS MARCO models, feel free to cite the paper The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes:
@inproceedings{reimers-2020-Curse_Dense_Retrieval,
title = "The Curse of Dense Low-Dimensional Information Retrieval for Large Index Sizes",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers)",
month = "8",
year = "2021",
address = "Online",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/2012.14210",
pages = "605--611",
}
If you use the unsupervised learning example, please refer to: TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning
@inproceedings{wang-2021-TSDAE,
title = "TSDAE: Using Transformer-based Sequential Denoising Auto-Encoderfor Unsupervised Sentence Embedding Learning",
author = "Wang, Kexin and Reimers, Nils and Gurevych, Iryna",
booktitle = "Findings of the Association for Computational Linguistics: EMNLP 2021",
month = "11",
year = "2021",
address = "Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
pages = "671--688",
url = "https://arxiv.org/abs/2104.06979",
}
If you use the GenQ learning example, please refer to: BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models
@inproceedings{thakur-2021-BEIR,
title = "BEIR: A Heterogeneous Benchmark for Zero-shot Evaluation of Information Retrieval Models",
author = {Thakur, Nandan and Reimers, Nils and R{\"{u}}ckl{\'{e}}, Andreas and Srivastava, Abhishek and Gurevych, Iryna},
booktitle = "Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS 2021) - Datasets and Benchmarks Track (Round 2)",
month = "4",
year = "2021",
url = "https://arxiv.org/abs/2104.08663",
}
If you use GPL, please refer to: GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval
@article{wang-2021-GPL,
title = "GPL: Generative Pseudo Labeling for Unsupervised Domain Adaptation of Dense Retrieval",
author = "Wang, Kexin and Thakur, Nandan and Reimers, Nils and Gurevych, Iryna",
journal= "arXiv preprint arXiv:2112.07577",
month = "12",
year = "2021",
url = "https://arxiv.org/abs/2112.07577",
}
Codebases using SentenceTransformers
haystack - Neural search / question answering
Top2Vec - Topic modeling
txtai - AI-powered search engine
BERTopic - Topic model using SBERT embeddings
KeyBERT - Key phrase extraction using SBERT
contextualized-topic-models - Cross-lingual topic modeling
covid-papers-browser - Semantic search for Covid-19 papers
backprop - A natural language engine that makes state-of-the-art language models easy to use, access, and extend.
SentenceTransformers in Articles
Below is a curated list of articles / applications that use SentenceTransformers to achieve impressive results. Feel free to contact me (info@nils-reimers.de) to have your application added here.
December 2021 - Sentence Transformer Fine-Tuning (SetFit): Outperforming GPT-3 on few-shot text classification while being 1600 times smaller
October 2021 - Natural Language Processing (NLP) for Semantic Search
November 2020 - How to Build a Semantic Search Engine with Transformers and Faiss
October 2020 - Topic Modeling with BERT
September 2020 - Elastic Transformers - Making BERT stretchy - Scalable Semantic Search on a Jupyter Notebook
July 2020 - Simple Sentence Similarity Search with SentenceBERT
May 2020 - HN Time Machine: finally some Hacker News history!
May 2020 - A complete guide to transfer learning from English to other languages using sentence embedding BERT models
March 2020 - Building a k-NN Similarity Search Engine using Amazon Elasticsearch and SageMaker
February 2020 - Semantic Search Engine with Sentence BERT
SentenceTransformers in Research
SentenceTransformers is used in hundreds of research projects. For a list of publications, see Google Scholar or Semantic Scholar.