Instructions for using DatarrX/myX-TransStyle-W2S with libraries and local apps.

Libraries

How to use DatarrX/myX-TransStyle-W2S with Transformers:

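```python
# Use a pipeline as a high-level helper
from transformers import pipeline

# This is a Seq2Seq (NLLB / M2M100) checkpoint, so it needs the text2text-generation task;
# the "text-generation" pipeline only loads causal (decoder-only) language models.
pipe = pipeline("text2text-generation", model="DatarrX/myX-TransStyle-W2S")

# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("DatarrX/myX-TransStyle-W2S")
model = AutoModelForSeq2SeqLM.from_pretrained("DatarrX/myX-TransStyle-W2S")
```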

Local Apps

vLLM

How to use DatarrX/myX-TransStyle-W2S with vLLM:

Install from pip and serve model

```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "DatarrX/myX-TransStyle-W2S"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DatarrX/myX-TransStyle-W2S",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker

```bash
docker model run hf.co/DatarrX/myX-TransStyle-W2S
```

SGLang

How to use DatarrX/myX-TransStyle-W2S with SGLang:

Install from pip and serve model

```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "DatarrX/myX-TransStyle-W2S" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DatarrX/myX-TransStyle-W2S",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```

Use Docker images

```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "DatarrX/myX-TransStyle-W2S" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "DatarrX/myX-TransStyle-W2S",
		"prompt": "Once upon a time,",
		"max_tokens": 512,
		"temperature": 0.5
	}'
```



---
license: mit

datasets:
- DatarrX/Myanmar-Written-Spoken-Parallel-Corpus

language:
- my

metrics:
- bleu
- chrf
- ter
- bertscore

base_model:
- facebook/nllb-200-distilled-600M

pipeline_tag: text-generation

library_name: transformers

tags:
- burmese
- myanmar
- myanmar-language
- burmese-nlp
- style-transfer
- text-rewriting
- formal-to-informal
- written-to-spoken
- seq2seq
- nllb
- lora
- peft
- low-resource-language
- text-generation

model-index:
- name: myX-TransStyle-W2S
  results:
  - task:
      type: text-generation
      name: Burmese Style Transfer (Written to Spoken)
    dataset:
      name: Custom External Test Set
      type: csv
      config: default
      split: test
    metrics:
    - type: bleu
      value: 19.6381
      name: BLEU
    - type: chrf
      value: 78.3975
      name: chrF
    - type: ter
      value: 50.7353
      name: TER
    - type: bertscore
      value: 0.9693
      name: BERTScore F1

---

# 📝 myX-TransStyle-W2S: A Transformer-based Style Transfer Model for Myanmar Written (ရေးဟန်) to Spoken (ပြောဟန်)

myX-TransStyle-W2S is a Sequence-to-Sequence (Seq2Seq) model developed by Khant Sint Heinn (Kalix Louis) under DatarrX. It rewrites formal Written Burmese (ရေးဟန်) into natural, colloquial Spoken Burmese (ပြောဟန်), so that formal documents or news text can be turned into fluent, conversational language while preserving the original meaning.

## Model Details

- Developed by: [Khant Sint Heinn (Kalix Louis)](https://huggingface.co/kalixlouiis)
- Organization: [DatarrX | ဒေတာ-အက်စ်](https://huggingface.co/DatarrX)
- Model Architecture: Fine-tuned NLLB-200 (600M Distilled) with merged LoRA adapters
- Language: Burmese (Myanmar)
- Task: Text Style Transfer (Written → Spoken)
- License: MIT
- Trained on: [Myanmar Written-Spoken Parallel Corpus (MWSPC)](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus)

---

## Linguistic Context: The Diglossia Challenge

Burmese is a diglossic language: there is a wide gap between two functional registers.

* Written Style (ရေးဟန်): Used in news, law, textbooks, and official communication. It relies on formal grammatical markers such as "သည်", "၏", and "၍".
* Spoken Style (ပြောဟန်): Used in daily life, conversation, and social media. It uses colloquial markers such as "တယ်" (tense), "ရဲ့" (possessive), and "နဲ့" (conjunction).

myX-TransStyle-W2S targets the stiff, "robotic" tone of machine-generated Burmese by rewriting formal text into the natural, warm register that native speakers use every day.
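To see why this is more than a find-and-replace task, the sketch below applies a hypothetical, context-free marker table (`NAIVE_MARKERS`, which is not part of this project or the model's method) to the written example sentence from the How to Use section. Because "သည်" can mark either the subject or the end of a sentence, a single fixed substitution cannot produce the spoken "…က … တယ်" pattern in the reference output, and it leaves vocabulary such as "သော" untouched.

```python
"""Naive, context-free register conversion: an illustrative baseline only."""

# Hypothetical written-to-spoken marker table (NOT how the model works).
NAIVE_MARKERS = {
    "သည်": "တယ်",  # correct as a sentence-final marker, wrong when it marks the subject
    "၏": "ရဲ့",  # possessive
    "တွင်": "မှာ",  # locative "in / at"
}

WRITTEN = "ပုဂံခေတ်သည် မြန်မာနိုင်ငံသမိုင်းတွင် ပထမဆုံးသော အင်ပါယာနိုင်ငံတော်ကြီး ဖြစ်ခဲ့သည်။"
# Spoken rewrite as printed in the How to Use example.
SPOKEN_REFERENCE = "ပုဂံခေတ်က မြန်မာနိုင်ငံသမိုင်းမှာ ပထမဆုံး အင်ပါယာနိုင်ငံတော်ကြီးဖြစ်ခဲ့တယ်။"


def naive_rewrite(text: str) -> str:
    """Replace every written marker with one fixed spoken form, ignoring context."""
    for written, spoken in NAIVE_MARKERS.items():
        text = text.replace(written, spoken)
    return text


if __name__ == "__main__":
    result = naive_rewrite(WRITTEN)
    print(result)
    print("matches reference:", result == SPOKEN_REFERENCE)
```

The naive output gets the sentence ending and the locative right but marks the subject with "တယ်" instead of "က" and keeps the literary "သော", so it does not match the reference. Choosing the right spoken form from context is what the model has to learn from the parallel corpus.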

---

## Training Methodology

The model was trained with a parameter-efficient adaptation strategy suited to the structural shifts between the two Burmese registers.

### 1. The Dataset ([MWSPC](https://huggingface.co/datasets/DatarrX/Myanmar-Written-Spoken-Parallel-Corpus))
The model was trained on 5,555 unique, high-quality parallel text pairs. Each pair maps a formal literary text to its colloquial equivalent, and the pairs were filtered for diversity.
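A minimal sketch of how such pairs might be turned into training examples. It assumes (1) that training used the same task prefix shown in the How to Use section and (2) hypothetical column names `written` and `spoken`; the card does not state the exact de-duplication rule, so whitespace-normalized exact matching is also an assumption.

```python
"""Sketch: turn written/spoken pairs into prefixed, de-duplicated training examples."""
from typing import Dict, Iterable, List

# Prefix copied from the How to Use example; assumed to match the training prefix.
PREFIX = "Rewrite Burmese formal written sentence into spoken Burmese: "


def normalize(text: str) -> str:
    """Collapse runs of whitespace so trivially different copies count as duplicates."""
    return " ".join(text.split())


def build_examples(pairs: Iterable[Dict[str, str]]) -> List[Dict[str, str]]:
    """Keep the first copy of each non-empty (written, spoken) pair and add the prefix.

    The "written" / "spoken" keys are hypothetical column names, not necessarily
    the ones used in the MWSPC dataset.
    """
    seen = set()
    examples: List[Dict[str, str]] = []
    for pair in pairs:
        written, spoken = normalize(pair["written"]), normalize(pair["spoken"])
        if not written or not spoken or (written, spoken) in seen:
            continue  # skip empty rows and exact duplicates
        seen.add((written, spoken))
        examples.append({"source": PREFIX + written, "target": spoken})
    return examples
```

The resulting `source` / `target` strings can then be tokenized for any Seq2Seq trainer; keeping the prefix identical between training and inference is the design choice that matters here.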

### 2. Parameter-Efficient Fine-Tuning (PEFT)
To capture nuanced stylistic shifts while keeping the base model's weights (and its general language knowledge) frozen, we used Low-Rank Adaptation (LoRA):
* Target Modules: `q_proj`, `k_proj`, `v_proj`, `out_proj` (the attention projections).
* Rank (r): 32 | Alpha: 64.
* Learning Rate: 8e-5 with a cosine scheduler.

### 3. Merging Strategy
The LoRA adapters were merged into the base `nllb-200-distilled-600M` model using `merge_and_unload()`. The resulting standalone 2.8 GB model runs without the PEFT library and without the extra adapter computation at inference time.

---

## Evaluation Results

The model was evaluated on 100 unseen test sentences (a custom external test set) and outperformed its S2W sibling.

### Performance Metrics

| Metric | Score | Interpretation |
|---|---|---|
| BERTScore F1 | 0.9693 | High semantic similarity to the references, indicating that meaning is largely preserved during style transfer. |
| chrF | 78.40 | High character-level overlap with the references, which also rewards correctly converted particles and suffixes. |
| BLEU | 19.64 | Higher than S2W, reflecting a more consistent conversion pattern into spoken style. |
| TER | 50.74 | Translation Edit Rate (lower is better): about one edit for every two reference tokens. |

### Qualitative Analysis
Manual review by native speakers indicates that the model does more than swap particles: it also adjusts vocabulary (e.g., converting “အလွန်ပင်” to “သိပ်” or “အကယ်ပင်” to “တကယ်လို့တောင်”) in a way that reads as natural spoken Burmese.

---

## 🔗 Related Models in the DatarrX Ecosystem

Explore other specialized models for Myanmar linguistic styles:

* [myX-TransStyle-S2W](https://huggingface.co/DatarrX/myX-TransStyle-S2W): The sibling model for converting Spoken Style to formal Written Style.
* [myX-StyleClassifier](https://huggingface.co/DatarrX/myX-StyleClassifier): Use this to automatically detect the style of your input text before processing.

---

## How to Use

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# 1. Load the merged model (the PEFT library is not needed)
model_id = "DatarrX/myX-TransStyle-W2S"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# 2. Prepare the input: task prefix + written-style sentence
prefix = "Rewrite Burmese formal written sentence into spoken Burmese: "
written_text = "ပုဂံခေတ်သည် မြန်မာနိုင်ငံသမိုင်းတွင် ပထမဆုံးသော အင်ပါယာနိုင်ငံတော်ကြီး ဖြစ်ခဲ့သည်။"
input_text = prefix + written_text

# 3. Generate the spoken-style rewrite
inputs = tokenizer(input_text, return_tensors="pt")
outputs = model.generate(
    **inputs,
    # Force Burmese (NLLB language code "mya_Mymr") as the output language
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("mya_Mymr"),
    max_length=160,
    num_beams=5,
)

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Output: ပုဂံခေတ်က မြန်မာနိုင်ငံသမိုင်းမှာ ပထမဆုံး အင်ပါယာနိုင်ငံတော်ကြီးဖြစ်ခဲ့တယ်။
```
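The example above rewrites one sentence. A minimal sketch for rewriting a list of sentences in padded batches reuses the same prefix, `forced_bos_token_id`, `max_length`, and `num_beams`; the function names, the `batch_size` default, and truncating inputs at `max_length` tokens are assumptions for illustration, not settings from this card.

```python
"""Batched written-to-spoken rewriting, reusing the settings from the example above."""
from typing import List, Sequence

import torch

PREFIX = "Rewrite Burmese formal written sentence into spoken Burmese: "


def add_prefix(sentences: Sequence[str]) -> List[str]:
    """Prepend the task prefix to each (whitespace-trimmed) written sentence."""
    return [PREFIX + s.strip() for s in sentences]


def rewrite_to_spoken(
    sentences: Sequence[str],
    model,
    tokenizer,
    batch_size: int = 8,
    max_length: int = 160,
    num_beams: int = 5,
) -> List[str]:
    """Rewrite written-style sentences into spoken style, one padded batch at a time."""
    device = next(model.parameters()).device
    spoken: List[str] = []
    for start in range(0, len(sentences), batch_size):
        batch = add_prefix(sentences[start:start + batch_size])
        inputs = tokenizer(
            batch, return_tensors="pt", padding=True, truncation=True, max_length=max_length
        ).to(device)
        with torch.no_grad():
            outputs = model.generate(
                **inputs,
                forced_bos_token_id=tokenizer.convert_tokens_to_ids("mya_Mymr"),
                max_length=max_length,
                num_beams=num_beams,
            )
        spoken.extend(tokenizer.batch_decode(outputs, skip_special_tokens=True))
    return spoken
```

With `model` and `tokenizer` loaded as in the example above, `rewrite_to_spoken([written_text], model, tokenizer)` returns a one-element list of spoken-style rewrites. Padding, together with the attention mask the tokenizer returns, lets sentences of different lengths share one `generate` call.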

---

## Intended Use & Limitations

### Use Cases
- Natural AI Personalities: Converting formal bot responses into natural-sounding speech.
- Content Localization: Making formal news or articles more accessible for audio and podcasts.
- Creative Writing: Helping authors turn narrative descriptions into natural character dialogue.

### Limitations
- Dialectal Focus: The model primarily targets the standard Yangon/Mandalay dialect; regional varieties and slang may be underrepresented.
- Contextual Nuance: Meaning is generally preserved, but how natural ("warm") the spoken output sounds may vary with the complexity of the input.

## Citation

### BibTeX
```BibTeX
@misc{myx_transstyle_w2s_2026,
  author = {{Khant Sint Heinn (Kalix Louis)}},
  title = {myX-TransStyle-W2S: A Written to Spoken Burmese Style Transfer Model},
  year = {2026},
  publisher = {Hugging Face},
  organization = {DatarrX},
  howpublished = {https://huggingface.co/DatarrX/myX-TransStyle-W2S}
}
```

---

## About the Author

Khant Sint Heinn, working under the name Kalix Louis, is a Machine Learning Engineer focused on Natural Language Processing (NLP), data foundations, and open-source AI development. His work is centered on improving support for the Burmese (Myanmar) language in modern AI systems by building high-quality datasets, practical tools, and scalable infrastructure for language technology.

He is currently the Lead Developer at DatarrX, where he develops data pipelines, manages large-scale data collection workflows, and helps create open-source resources for researchers, developers, and organizations. His experience includes data engineering, web scripting, dataset curation, and building systems that support real-world machine learning applications.

Khant Sint Heinn is especially interested in advancing low-resource languages and making AI more accessible to underrepresented communities. Through his open-source contributions, he works to strengthen the Burmese (Myanmar) tech ecosystem and provide reliable building blocks for future language models, search systems, and intelligent applications.

His goal is simple: to turn limited language resources into practical opportunities through clean data, useful tools, and community-driven innovation.

Connect with the Author:
[GitHub](https://github.com/kalixlouiis) | [Hugging Face](https://huggingface.co/kalixlouiis) | [Kaggle](https://www.kaggle.com/organizations/kalixlouiis)

---
Developed with ❤️ by [DatarrX](https://huggingface.co/DatarrX) to empower the Myanmar AI ecosystem.