---
language: pl
license: cc-by-4.0
tags:
- ner
datasets:
- clarin-pl/kpwr-ner
metrics:
- f1
- accuracy
- precision
- recall
widget:
- text: "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
  example_title: "Example"
---
# FastPDN
FastPolDeepNer (FastPDN) is a Named Entity Recognition (NER) model for Polish, designed for easy use, training and configuration. Its forerunner is [PolDeepNer2](https://gitlab.clarin-pl.eu/information-extraction/poldeepner2). The project implements a data-processing and training pipeline built on Hydra, PyTorch, PyTorch Lightning and Transformers.

Source code: https://gitlab.clarin-pl.eu/grupa-wieszcz/ner/fast-pdn

## How to use

Here is how to use this model to extract named entities from text:
```python
from transformers import pipeline

# 'simple' aggregation merges subword tokens into whole entities
ner = pipeline('ner', model='clarin-pl/FastPDN', aggregation_strategy='simple')

text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
ner_results = ner(text)
for output in ner_results:
    print(output)

# Example output:
# {'entity_group': 'nam_liv_person', 'score': 0.9996054, 'word': 'Jan Kowalski', 'start': 12, 'end': 24}
# {'entity_group': 'nam_loc_gpe_city', 'score': 0.998931, 'word': 'Wrocławiu', 'start': 39, 'end': 48}
```
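With `aggregation_strategy='simple'`, each result carries an `entity_group` label and `start`/`end` character offsets into the input string. The following post-processing sketch assumes exactly the keys shown in the example output above and collects the entity strings for each label. `group_entities` and its `min_score` threshold are hypothetical illustrations and are not part of the model or of `transformers`.

```python
from collections import defaultdict


def group_entities(text, ner_results, min_score=0.0):
    """Collect entity strings per label from aggregated pipeline output.

    Hypothetical helper: assumes each result is a dict with the keys
    'entity_group', 'score', 'start' and 'end', as produced by
    pipeline('ner', ..., aggregation_strategy='simple').
    """
    grouped = defaultdict(list)
    for entity in ner_results:
        # Drop predictions below an (illustrative) confidence threshold
        if entity["score"] < min_score:
            continue
        # 'start' and 'end' are character offsets into the original text
        grouped[entity["entity_group"]].append(text[entity["start"]:entity["end"]])
    return dict(grouped)


# Sample input copied from the example output above
text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
sample = [
    {"entity_group": "nam_liv_person", "score": 0.9996054, "word": "Jan Kowalski", "start": 12, "end": 24},
    {"entity_group": "nam_loc_gpe_city", "score": 0.998931, "word": "Wrocławiu", "start": 39, "end": 48},
]
print(group_entities(text, sample))
# {'nam_liv_person': ['Jan Kowalski'], 'nam_loc_gpe_city': ['Wrocławiu']}
```

To use it on live predictions, call `group_entities(text, ner(text))`.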
Here is how to use this model to get the logits for every token in the text:
```python
import torch
from transformers import AutoTokenizer, AutoModelForTokenClassification

tokenizer = AutoTokenizer.from_pretrained("clarin-pl/FastPDN")
model = AutoModelForTokenClassification.from_pretrained("clarin-pl/FastPDN")
text = "Nazywam się Jan Kowalski i mieszkam we Wrocławiu."
encoded_input = tokenizer(text, return_tensors='pt')

# Inference only: disable gradient tracking
with torch.no_grad():
    output = model(**encoded_input)
# output.logits has shape (batch_size, sequence_length, num_labels)
```
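Raw logits are hard to read directly. A common way to turn them into labels is to take the argmax over the class dimension for each token and look up the resulting index in the model's `id2label` mapping. The sketch below does this in plain Python so that it runs without downloading the model. `decode_token_labels` is a hypothetical helper, and the toy label set is invented for illustration. The real label strings come from the checkpoint's config.

```python
def decode_token_labels(token_logits, id2label):
    """Return the highest-scoring label for each token.

    token_logits: one list of class scores per token, e.g.
        output.logits[0].tolist() for the first (only) sequence in the batch.
    id2label: mapping from class index to label name, e.g.
        model.config.id2label.
    """
    labels = []
    for scores in token_logits:
        # argmax over the class dimension, in plain Python
        best_index = max(range(len(scores)), key=scores.__getitem__)
        labels.append(id2label[best_index])
    return labels


# Toy logits for three tokens over a hypothetical three-label set
toy_id2label = {0: "O", 1: "nam_liv_person", 2: "nam_loc_gpe_city"}
toy_logits = [[0.1, 2.0, -1.0], [3.0, 0.0, 0.5], [0.0, 0.2, 0.9]]
print(decode_token_labels(toy_logits, toy_id2label))
# ['nam_liv_person', 'O', 'nam_loc_gpe_city']
```

Using the variables from the logits snippet, `decode_token_labels(output.logits[0].tolist(), model.config.id2label)` returns one label per subword token. These labels align with `tokenizer.convert_ids_to_tokens(encoded_input["input_ids"][0])`, which includes special tokens. The pipeline's `aggregation_strategy` handles merging subwords into whole entities.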
## Training data

The FastPDN model was trained on the 82-class versions of the KPWr and CEN datasets. The annotation guidelines are available [here](https://clarin-pl.eu/dspace/bitstream/handle/11321/294/WytyczneKPWr-jednostkiidentyfikacyjne.pdf).
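With 82 fine-grained classes, some downstream uses may call for coarser categories. The labels in the example output (`nam_liv_person`, `nam_loc_gpe_city`) appear to follow an underscore-separated hierarchy. Assuming that pattern holds for every class, a hypothetical helper `coarsen_label` could truncate labels to a chosen depth:

```python
def coarsen_label(label: str, depth: int = 2) -> str:
    """Keep only the first `depth` underscore-separated segments of a label.

    Hypothetical helper: assumes the hierarchical naming seen in the
    example output, e.g. 'nam_loc_gpe_city' -> 'nam_loc' at depth 2.
    """
    return "_".join(label.split("_")[:depth])


for fine in ["nam_liv_person", "nam_loc_gpe_city"]:
    print(fine, "->", coarsen_label(fine))
# nam_liv_person -> nam_liv
# nam_loc_gpe_city -> nam_loc
```

This card does not state whether any particular depth corresponds to an official coarse-grained tag set.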
## Pretraining

FastPDN models were fine-tuned from the following pretrained models:
- [herbert-base-cased](https://huggingface.co/allegro/herbert-base-cased)
- [distiluse-base-multilingual-cased-v1](https://huggingface.co/sentence-transformers/distiluse-base-multilingual-cased-v1)
## Evaluation

Runs trained on `cen_n82` and `kpwr_n82`:

| Base model | test/f1 | test/pdn2_f1 | test/acc | test/precision | test/recall |
|------------|---------|--------------|----------|----------------|-------------|
| distiluse  | 0.53    | 0.61         | 0.95     | 0.55           | 0.54        |
| herbert    | 0.68    | 0.78         | 0.97     | 0.70           | 0.69        |
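The card does not explain how these scores are computed. It leaves open the matching criterion, the averaging across classes, and what distinguishes `pdn2_f1` from `f1`. For orientation only, the sketch below implements the common exact-match, micro-averaged definition of entity-level precision, recall and F1. The reported numbers may use a different scheme. `entity_prf` is a hypothetical helper, and the spans below are illustrative.

```python
def entity_prf(gold, predicted):
    """Micro-averaged precision, recall and F1 over exact-match entity spans.

    gold, predicted: iterables of (start, end, label) tuples; a prediction
    counts as correct only if offsets and label all match a gold entity.
    """
    gold_set, pred_set = set(gold), set(predicted)
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


gold = [(12, 24, "nam_liv_person"), (39, 48, "nam_loc_gpe_city")]
# Second prediction has the right span but an illustrative wrong label
pred = [(12, 24, "nam_liv_person"), (39, 48, "nam_loc_gpe_country")]
print(entity_prf(gold, pred))
# (0.5, 0.5, 0.5)
```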
## Authors

- Grupa Wieszcze CLARIN-PL
- Wiktor Walentynowicz

## Contact

- Norbert Ropiak (norbert.ropiak@pwr.edu.pl)