Loading the model
In [1]:
# Import the pipeline factory from transformers
from transformers import pipeline
In [2]:
# Load a text-classification pipeline backed by a reranker model
reranker = pipeline(
    "text-classification",
    model="BAAI/bge-reranker-v2-m3",
    tokenizer="BAAI/bge-reranker-v2-m3"
)
First query
In [3]:
# Query and candidate documents
query = "What is the primary goal of machine learning?"
documents = [
    "To manually program rules for every scenario.",
    "To enable systems to learn from data and improve over time.",
    "To replace all human decision-making.",
    "To store large amounts of data efficiently."
]
In [4]:
# Build one (query, document) pair per candidate document
inputs = [{"text": query, "text_pair": doc} for doc in documents]
print(inputs)
[{'text': 'What is the primary goal of machine learning?', 'text_pair': 'To manually program rules for every scenario.'}, {'text': 'What is the primary goal of machine learning?', 'text_pair': 'To enable systems to learn from data and improve over time.'}, {'text': 'What is the primary goal of machine learning?', 'text_pair': 'To replace all human decision-making.'}, {'text': 'What is the primary goal of machine learning?', 'text_pair': 'To store large amounts of data efficiently.'}]
In [5]:
# Compute a relevance score for each (query, document) pair
scores = reranker(inputs)
print(scores)
[{'label': 'LABEL_0', 'score': 2.433966801618226e-05}, {'label': 'LABEL_0', 'score': 0.0110832080245018}, {'label': 'LABEL_0', 'score': 0.0007815913995727897}, {'label': 'LABEL_0', 'score': 0.0004032152355648577}]
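The scores above all fall between 0 and 1, and every result carries the same label `LABEL_0`. A plausible reading, stated here as an assumption that this notebook does not verify, is that the model emits a single raw logit per pair and the `text-classification` pipeline passes it through a sigmoid. Since the sigmoid is strictly increasing, the order of the documents is the same whether raw logits or sigmoid outputs are compared. The sketch below illustrates this with the standard library only; `sigmoid`, `logit` and `order` are illustrative helpers, not functions from `transformers`.

```python
import math

def sigmoid(x: float) -> float:
    """Map a raw logit to the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def logit(p: float) -> float:
    """Inverse of the sigmoid: recover a raw logit from a value in (0, 1)."""
    return math.log(p / (1.0 - p))

def order(values):
    """Return the indices of `values` sorted from highest to lowest value."""
    return sorted(range(len(values)), key=values.__getitem__, reverse=True)

# Scores copied from the pipeline output above
probabilities = [
    2.433966801618226e-05,
    0.0110832080245018,
    0.0007815913995727897,
    0.0004032152355648577,
]

# Assumed raw logits, obtained by inverting the sigmoid
raw_scores = [logit(p) for p in probabilities]

# Both orderings are identical because the sigmoid is monotonic
ranking_from_probabilities = order(probabilities)
ranking_from_raw = order(raw_scores)
print(ranking_from_probabilities, ranking_from_raw)
```

If raw logits are preferred, the pipeline call accepts a `function_to_apply` argument; check the documentation of the installed `transformers` version before relying on it.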
In [6]:
# Show each document next to its score
for doc, score in zip(documents, scores):
    print(doc, "=>", score)
To manually program rules for every scenario. => {'label': 'LABEL_0', 'score': 2.433966801618226e-05}
To enable systems to learn from data and improve over time. => {'label': 'LABEL_0', 'score': 0.0110832080245018}
To replace all human decision-making. => {'label': 'LABEL_0', 'score': 0.0007815913995727897}
To store large amounts of data efficiently. => {'label': 'LABEL_0', 'score': 0.0004032152355648577}
In [7]:
# Find the document with the highest score
import numpy
print(documents[numpy.argmax([res['score'] for res in scores])])
To enable systems to learn from data and improve over time.
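`numpy.argmax` returns only the best candidate, whereas a reranker is often used to reorder the whole list. As a minimal sketch, the cell below sorts every document by decreasing score. To stay runnable without downloading the model, it reuses the `documents` list and the `scores` values printed earlier in this notebook; the variable name `ranked` is introduced here for illustration.

```python
# Documents and scores copied from the cells above
documents = [
    "To manually program rules for every scenario.",
    "To enable systems to learn from data and improve over time.",
    "To replace all human decision-making.",
    "To store large amounts of data efficiently.",
]
scores = [
    {"label": "LABEL_0", "score": 2.433966801618226e-05},
    {"label": "LABEL_0", "score": 0.0110832080245018},
    {"label": "LABEL_0", "score": 0.0007815913995727897},
    {"label": "LABEL_0", "score": 0.0004032152355648577},
]

# Pair each document with its result, then sort by score, highest first
ranked = sorted(
    zip(documents, scores),
    key=lambda pair: pair[1]["score"],
    reverse=True,
)

# Print the full ranking
for rank, (doc, res) in enumerate(ranked, start=1):
    print(f"{rank}. {res['score']:.6f}  {doc}")
```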
A function to handle a query and its candidate answers
In [8]:
# A function that returns the most plausible answer
def use_reranker(model, query, responses):
    # Build one (query, response) pair per candidate
    entrees = [{"text": query, "text_pair": doc} for doc in responses]
    # Compute the scores
    values = model(entrees)
    # Return the response with the highest score
    return responses[numpy.argmax([res['score'] for res in values])]
A first test
In [9]:
# Query and candidate answers, using our function
use_reranker(reranker, "Which of the following is a type of supervised learning algorithm?",
             ["K-means clustering", "Principal Component Analysis (PCA)", "Logistic regression", "Apriori algorithm"])
Out[9]:
'Logistic regression'
A second test
In [10]:
# Another example
use_reranker(reranker, "What does overfitting mean in machine learning?",
             ["The model performs well on both training and test data",
              "The model captures perfectly patterns",
              "The model learns noise and performs poorly on new data"])
Out[10]:
'The model learns noise and performs poorly on new data'
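Rather than checking the tests one by one, the same questions can be grouped into a small quiz and scored in one pass. The sketch below is one possible way to do it, not part of the original notebook: `accuracy`, `quiz` and `last_choice` are hypothetical names, and the expected answers are supplied by hand. A trivial stand-in that always picks the last choice is used so the cell runs without the model.

```python
from typing import Callable, Sequence, Tuple

QuizItem = Tuple[str, Sequence[str], str]  # (question, choices, expected answer)

def accuracy(answer_fn: Callable[[str, Sequence[str]], str],
             quiz: Sequence[QuizItem]) -> float:
    """Fraction of quiz items for which answer_fn returns the expected answer."""
    if not quiz:
        raise ValueError("quiz must contain at least one item")
    correct = sum(
        answer_fn(question, choices) == expected
        for question, choices, expected in quiz
    )
    return correct / len(quiz)

# The two questions used above, with hand-written expected answers
quiz = [
    (
        "Which of the following is a type of supervised learning algorithm?",
        ["K-means clustering", "Principal Component Analysis (PCA)",
         "Logistic regression", "Apriori algorithm"],
        "Logistic regression",
    ),
    (
        "What does overfitting mean in machine learning?",
        ["The model performs well on both training and test data",
         "The model captures perfectly patterns",
         "The model learns noise and performs poorly on new data"],
        "The model learns noise and performs poorly on new data",
    ),
]

# Stand-in answer function (NOT the reranker): always picks the last choice
def last_choice(question: str, choices: Sequence[str]) -> str:
    return choices[-1]

baseline = accuracy(last_choice, quiz)
print(baseline)

# With the model loaded in this notebook, one would instead call:
# accuracy(lambda q, c: use_reranker(reranker, q, list(c)), quiz)
```

Comparing the reranker against a naive baseline of this kind gives a rough sense of how much of its success is due to the model rather than to the layout of the choices.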