Loading the model

In [1]:
# Import the pipeline factory
from transformers import pipeline
In [2]:
# Load the pipeline for a reranker
reranker = pipeline(
    "text-classification",
    model="BAAI/bge-reranker-v2-m3",
    tokenizer="BAAI/bge-reranker-v2-m3"
)

First query

In [3]:
# Query and candidate documents
query = "What is the primary goal of machine learning?"
documents = [
    "To manually program rules for every scenario.",
    "To enable systems to learn from data and improve over time.",
    "To replace all human decision-making.",
    "To store large amounts of data efficiently."
]
In [4]:
# Build the query + document pairs
inputs = [{"text": query, "text_pair": doc} for doc in documents]
print(inputs)
[{'text': 'What is the primary goal of machine learning?', 'text_pair': 'To manually program rules for every scenario.'}, {'text': 'What is the primary goal of machine learning?', 'text_pair': 'To enable systems to learn from data and improve over time.'}, {'text': 'What is the primary goal of machine learning?', 'text_pair': 'To replace all human decision-making.'}, {'text': 'What is the primary goal of machine learning?', 'text_pair': 'To store large amounts of data efficiently.'}]
In [5]:
# Compute the relevance score of each pair
scores = reranker(inputs)
print(scores)
[{'label': 'LABEL_0', 'score': 2.433966801618226e-05}, {'label': 'LABEL_0', 'score': 0.0110832080245018}, {'label': 'LABEL_0', 'score': 0.0007815913995727897}, {'label': 'LABEL_0', 'score': 0.0004032152355648577}]
In [6]:
# Show the result for each document
for doc, score in zip(documents, scores):
    print(doc, "=>", score)
To manually program rules for every scenario. => {'label': 'LABEL_0', 'score': 2.433966801618226e-05}
To enable systems to learn from data and improve over time. => {'label': 'LABEL_0', 'score': 0.0110832080245018}
To replace all human decision-making. => {'label': 'LABEL_0', 'score': 0.0007815913995727897}
To store large amounts of data efficiently. => {'label': 'LABEL_0', 'score': 0.0004032152355648577}
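The scores above are all close to zero, which makes them hard to compare by eye. Assuming the pipeline applied a sigmoid to the model's single output logit (an assumption, not something the output itself confirms), the raw logits can be recovered with the inverse sigmoid. A minimal sketch using the printed scores as literal values; `to_logit` is a hypothetical helper introduced only for illustration:

```python
import math

# Scores copied from the output above (ASSUMED to be sigmoid outputs)
printed_scores = [
    2.433966801618226e-05,
    0.0110832080245018,
    0.0007815913995727897,
    0.0004032152355648577,
]

def to_logit(p):
    # Hypothetical helper: inverse of the sigmoid, log(p / (1 - p))
    return math.log(p / (1.0 - p))

logits = [to_logit(p) for p in printed_scores]
for p, z in zip(printed_scores, logits):
    print(f"{p:.6f} => {z:.2f}")
```

The transformation is monotonic, so the ordering of the documents is unchanged; only the scale becomes easier to read.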
In [7]:
# Identify the document with the highest score
import numpy
print(documents[numpy.argmax([res['score'] for res in scores])])
To enable systems to learn from data and improve over time.
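Taking the argmax keeps only the best candidate. To see the full ordering, the documents can be sorted by score instead. A small sketch, with the documents and scores copied from the cells above as literal data so that it runs without loading the model:

```python
# Documents and scores copied from the cells above
documents = [
    "To manually program rules for every scenario.",
    "To enable systems to learn from data and improve over time.",
    "To replace all human decision-making.",
    "To store large amounts of data efficiently.",
]
scores = [
    {"label": "LABEL_0", "score": 2.433966801618226e-05},
    {"label": "LABEL_0", "score": 0.0110832080245018},
    {"label": "LABEL_0", "score": 0.0007815913995727897},
    {"label": "LABEL_0", "score": 0.0004032152355648577},
]

# Pair each document with its result, then sort from most to least relevant
ranked = sorted(
    zip(documents, scores),
    key=lambda pair: pair[1]["score"],
    reverse=True,
)

for rank, (doc, res) in enumerate(ranked, start=1):
    print(rank, doc, "=>", res["score"])
```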

A function to handle query / plausible-answer pairs

In [8]:
# A function that returns the most plausible answer
# (relies on numpy, imported in an earlier cell)
def use_reranker(model, query, responses):
    # Build the query + document pairs
    entrees = [{"text": query, "text_pair": doc} for doc in responses]
    # Compute the scores
    values = model(entrees)
    # Return the response with the highest score
    return responses[numpy.argmax([res['score'] for res in values])]
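A possible variant returns the candidates ranked with their scores rather than the single best one. The sketch below is not part of the original notebook: `rank_responses` and its `top_k` parameter are hypothetical names, and `fake_reranker` is a toy stand-in (word overlap) that only mimics the input and output shape of the pipeline so the example runs without downloading the model.

```python
def rank_responses(model, query, responses, top_k=None):
    # Build the query + document pairs, as in use_reranker
    pairs = [{"text": query, "text_pair": doc} for doc in responses]
    # One {'label': ..., 'score': ...} dict is expected per pair
    values = model(pairs)
    # Sort the (response, score) couples by decreasing score
    ranked = sorted(
        ((doc, res["score"]) for doc, res in zip(responses, values)),
        key=lambda couple: couple[1],
        reverse=True,
    )
    return ranked if top_k is None else ranked[:top_k]


def fake_reranker(pairs):
    # HYPOTHETICAL stand-in for the pipeline: scores a pair by word overlap
    out = []
    for pair in pairs:
        q = set(pair["text"].lower().split())
        d = set(pair["text_pair"].lower().split())
        out.append({"label": "LABEL_0", "score": len(q & d) / max(len(d), 1)})
    return out


top2 = rank_responses(
    fake_reranker,
    "learn from data",
    ["store data", "learn from data quickly", "replace humans"],
    top_k=2,
)
print(top2)
```

Passing the real `reranker` pipeline in place of `fake_reranker` should work the same way, provided it returns one score dictionary per pair as in the outputs shown earlier.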

A trial

In [9]:
# Query and candidate documents, using our function
use_reranker(reranker,"Which of the following is a type of supervised learning algorithm?",
             ["K-means clustering","Principal Component Analysis (PCA)","Logistic regression","Apriori algorithm"])
Out[9]:
'Logistic regression'

A second trial, and so on

In [10]:
# One more example
use_reranker(reranker,"What does overfitting mean in machine learning?",
             ["The model performs well on both training and test data",
              "The model captures perfectly patterns",
              "The model learns noise and performs poorly on new data"])
Out[10]:
'The model learns noise and performs poorly on new data'