Hezam/ArabicT5_Classification


	---
	language:
	- ar
	metrics:
	- bleu
	- accuracy
	library_name: transformers
	pipeline_tag: text-classification
	tags:
	- t5
	- Classification
	- ArabicT5
	- Text Classification
	widget:
	- example_title: Religious
	  text: >
	    الحمد لله رب العالمين والصلاة والسلام على سيد المرسلين نبينا محمد وآله وصحبه أجمعين،وبعد:فإنه يجب على العبد أن يتجنب الذنوب كلها دقها وجلها صغيرها وكبيرها وأن يتعاهد نفسه بالتوبة الصادقة والإنابة إلى ربه. قال تعالى: (وَتُوبُوا إِلَى اللَّهِ جَمِيعًا أَيُّهَا الْمُؤْمِنُونَ لَعَلَّكُمْ تُفْلِحُونَ)النور 31.
	---

	## Arabic text classification using deep learning (ArabicT5)

	## Our experiment

	- The category mapping:
	category_mapping = {
	'Politics':1,
	'Finance':2,
	'Medical':3,
	'Sports':4,
	'Culture':5,
	'Tech':6,
	'Religion':7
	}
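
Because the model is a text-to-text T5, each label has to reach the decoder as a short string. The sketch below shows one way a labelled article could be turned into a (source, target) pair. It assumes the target is simply the category ID rendered as a string, which is consistent with the `['5']` output in the usage example but is not confirmed by this card; `build_example` is a hypothetical helper, not the author's preprocessing code.

```python
category_mapping = {
    "Politics": 1,
    "Finance": 2,
    "Medical": 3,
    "Sports": 4,
    "Culture": 5,
    "Tech": 6,
    "Religion": 7,
}


def build_example(text: str, category: str) -> dict:
    """Turn one labelled article into a (source, target) text pair.

    Assumption: the target string is the numeric category ID.
    """
    if category not in category_mapping:
        raise KeyError(f"Unknown category: {category!r}")
    return {"source": text, "target": str(category_mapping[category])}


# Placeholder article text, for illustration only.
example = build_example("نص المقال", "Sports")
print(example["target"])
```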

	- Training parameters
	| Parameter | Value |
	| :-------------------: | :-----------: |
	| Training batch size | `8` |
	| Evaluation batch size | `8` |
	| Learning rate | `1e-4` |
	| Max input length (tokens) | `200` |
	| Max target length (tokens) | `3` |
	| Number of workers | `4` |
	| Epochs | `2` |

	- Results
	| Metric | Value |
	| :---------------------: | :-----------: |
	| Validation loss | `0.0479` |
	| Accuracy | `96.49%` |
	| BLEU | `96.49%` |
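
Since the model emits its label as generated text, accuracy is presumably an exact string match between the decoded output and the reference ID. The sketch below works under that assumption; `exact_match_accuracy` is a hypothetical helper and not the author's evaluation code, and the sample data is made up for illustration.

```python
def exact_match_accuracy(predictions, references):
    """Fraction of decoded predictions that equal their reference label string."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    # Strip whitespace so that ' 5' and '5' count as the same label.
    correct = sum(p.strip() == r.strip() for p, r in zip(predictions, references))
    return correct / len(references)


# Toy data for illustration only, not results from this model.
print(exact_match_accuracy(["5", "1", "7", "3"], ["5", "1", "7", "4"]))  # 0.75
```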

	## SANAD: Single-label Arabic News Articles Dataset for automatic text categorization

	- Paper
	[https://www.researchgate.net/publication/333605992_SANAD_Single-Label_Arabic_News_Articles_Dataset_for_Automatic_Text_Categorization]

	- Dataset
	[https://data.mendeley.com/datasets/57zpx667y9/2]

	## Arabic text classification using deep learning models

	- Paper
	[https://www.sciencedirect.com/science/article/abs/pii/S0306457319303413]

	- Their experiment
	"Our experimental results showed that all models did very well on SANAD corpus with a minimum accuracy of 93.43%, achieved by CGRU, and top performance of 95.81%, achieved by HANGRU."
	| Model | Accuracy |
	| :---------------------: | :---------------------: |
	| CGRU | 93.43% |
	| HANGRU | 95.81% |

	## Example usage
	```python
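	from transformers import T5ForConditionalGeneration, T5Tokenizer

	model_name = "Hezam/ArabicT5_Classification"
	model = T5ForConditionalGeneration.from_pretrained(model_name)
	tokenizer = T5Tokenizer.from_pretrained(model_name)

	text = "الزين فيك القناه الاولي المغربيه الزين فيك القناه الاولي المغربيه اخبارنا المغربيه متابعه تفاجا زوار موقع القناه الاولي المغربي"

	# Tokenize with the same maximum input length used in training (200).
	tokens = tokenizer(
	    text,
	    max_length=200,
	    truncation=True,
	    padding="max_length",
	    return_tensors="pt",
	)

	# The target is a short category ID, so generation is capped at 3 tokens.
	# The attention mask is passed explicitly so padding tokens are ignored.
	output = model.generate(
	    tokens["input_ids"],
	    attention_mask=tokens["attention_mask"],
	    max_length=3,
	    length_penalty=10,
	)

	# Decode each generated sequence into its category ID string.
	output = [
	    tokenizer.decode(ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
	    for ids in output
	]
	print(output)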

	```
	```text
	['5']
	```
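
The generated string is a category ID, so it can be mapped back to a readable name by inverting the category mapping listed earlier. A small sketch follows; `id_to_category` and `decode_labels` are hypothetical names, and returning `None` for unexpected strings is an assumption about how malformed generations might be handled.

```python
category_mapping = {
    "Politics": 1,
    "Finance": 2,
    "Medical": 3,
    "Sports": 4,
    "Culture": 5,
    "Tech": 6,
    "Religion": 7,
}

# Invert the mapping: generated ID string -> category name.
id_to_category = {str(idx): name for name, idx in category_mapping.items()}


def decode_labels(outputs):
    """Map decoded model outputs to category names (None if unrecognised)."""
    return [id_to_category.get(o.strip()) for o in outputs]


print(decode_labels(["5"]))  # ['Culture']
```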