KnutJaegersberg/c4-website-classifier-dataset
How to use KnutJaegersberg/website-classifier with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-classification", model="KnutJaegersberg/website-classifier")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSequenceClassification
tokenizer = AutoTokenizer.from_pretrained("KnutJaegersberg/website-classifier")
model = AutoModelForSequenceClassification.from_pretrained("KnutJaegersberg/website-classifier")

The training dataset was made by running a zero-shot classifier over a sample of C4, using the website categories from the URL Classification Dataset [DMOZ] (https://www.kaggle.com/datasets/shawon10/url-classification-dataset-dmoz). Only a subset of the linked dataset was used.
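When the model is loaded directly rather than through a pipeline, its raw output is a vector of logits per input, which still has to be turned into a category. The following is a minimal sketch of that post-processing step in plain Python. The label names and logit values are hypothetical placeholders, not the model's actual classes; in practice the mapping would be read from `model.config.id2label` and the logits from `model(**inputs).logits`.

```python
import math

def softmax(logits):
    """Convert raw scores to probabilities (max subtracted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits, id2label):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical mapping and logits, for illustration only.
# Real values would come from model.config.id2label and model(**inputs).logits.
example_id2label = {0: "Arts", 1: "Business", 2: "Sports"}
example_logits = [0.2, 2.5, -1.0]

label, prob = top_label(example_logits, example_id2label)
print(label, round(prob, 3))
```

The pipeline helper performs an equivalent softmax and argmax internally, so this step is only needed on the direct-loading path.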
loss: 0.45181071758270264
f1_macro: 0.9103583954158208
f1_micro: 0.9108333333333334
f1_weighted: 0.9103583954158205
precision_macro: 0.9123282026272069
precision_micro: 0.9108333333333334
precision_weighted: 0.9123282026272069
recall_macro: 0.9108333333333334
recall_micro: 0.9108333333333334
recall_weighted: 0.9108333333333334
accuracy: 0.9108333333333334
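Several of the values above coincide, which follows from how the averages are defined rather than from anything specific to this model. In single-label multiclass classification, micro-averaged precision, recall and F1 all equal accuracy, and weighted recall does as well. Macro recall matching accuracy is additionally consistent with (though not proof of) an evaluation set with equal class sizes. The sketch below illustrates this on a made-up, balanced 3-class confusion matrix; the numbers are not taken from this model's evaluation.

```python
def averaged_scores(confusion):
    """confusion[i][j] = count of examples with true class i predicted as class j."""
    n = len(confusion)
    support = [sum(row) for row in confusion]                       # true counts per class
    predicted = [sum(confusion[i][j] for i in range(n)) for j in range(n)]
    tp = [confusion[k][k] for k in range(n)]
    total = sum(support)

    precision = [tp[k] / predicted[k] if predicted[k] else 0.0 for k in range(n)]
    recall = [tp[k] / support[k] if support[k] else 0.0 for k in range(n)]

    def macro(values):
        # Unweighted mean over classes.
        return sum(values) / n

    def weighted(values):
        # Mean over classes, weighted by true class frequency.
        return sum(v * s for v, s in zip(values, support)) / total

    return {
        "accuracy": sum(tp) / total,
        # Micro averaging pools counts over classes before dividing.
        "precision_micro": sum(tp) / sum(predicted),
        "recall_micro": sum(tp) / sum(support),
        "precision_macro": macro(precision),
        "precision_weighted": weighted(precision),
        "recall_macro": macro(recall),
        "recall_weighted": weighted(recall),
    }

# Made-up balanced example: 10 true examples per class.
toy_confusion = [
    [9, 1, 0],
    [1, 8, 1],
    [0, 2, 8],
]
scores = averaged_scores(toy_confusion)
for name, value in scores.items():
    print(f"{name}: {value:.4f}")
```

With unequal class sizes, the micro and weighted-recall identities still hold, but the macro values generally drift away from their weighted counterparts.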