This model is a fine-tuned version of bert-base-multilingual-cased for Named Entity Recognition (NER) in Russian text. It can identify entity types such as first names, middle names (patronymics), last names, cities, and districts, using the BIOLU tagging format.
The model is designed to identify named entities in Russian text. It can be used for tasks such as information extraction, content analysis, and text preprocessing for downstream NLP tasks.
Here's a simple example of how to use the model:
```python
from transformers import pipeline

# Load the fine-tuned Russian NER model from the Hugging Face Hub
ner_pipe = pipeline("ner", model="Gherman/bert-base-NER-Russian")

text = "Меня зовут Сергей Иванович из Москвы."  # "My name is Sergey Ivanovich, from Moscow."
results = ner_pipe(text)

# Each result is a per-token prediction with its BIOLU-prefixed label
for result in results:
    print(f"Word: {result['word']}, Entity: {result['entity']}, Score: {result['score']:.4f}")
```
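The pipeline's raw output is per-token, so a multi-token entity arrives split across several BIOLU-prefixed labels. The sketch below shows one way to merge such predictions back into whole entity spans; the label names used in the sample (`U-FIRST_NAME`, `B-CITY`, and so on) are illustrative, since the exact tag set comes from the model's config.

```python
def merge_biolu(tokens):
    """Merge per-token BIOLU predictions into (entity_type, text) spans.

    tokens: list of (word, tag) pairs, e.g. [("Сергей", "U-FIRST_NAME"), ...]
    """
    entities = []
    current_words, current_type = [], None
    for word, tag in tokens:
        if tag == "O":  # outside any entity: reset any partial span
            current_words, current_type = [], None
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix in ("B", "U"):  # beginning of a new entity (or a single-token one)
            current_words, current_type = [word], ent_type
        elif prefix in ("I", "L") and ent_type == current_type:
            current_words.append(word)  # continuation of the current entity
        if prefix in ("U", "L"):  # entity ends on this token: emit the span
            entities.append((current_type, " ".join(current_words)))
            current_words, current_type = [], None
    return entities

# Illustrative tags; actual labels depend on the model's config.
sample = [
    ("Меня", "O"), ("зовут", "O"),
    ("Сергей", "U-FIRST_NAME"), ("Иванович", "U-MIDDLE_NAME"),
    ("из", "O"), ("Москвы", "U-CITY"),
]
print(merge_biolu(sample))
# [('FIRST_NAME', 'Сергей'), ('MIDDLE_NAME', 'Иванович'), ('CITY', 'Москвы')]
```

For many use cases you can skip manual merging by passing `aggregation_strategy="simple"` to `pipeline(...)`, which groups subword tokens into entities for you.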
The model was trained on the Detailed-NER-Dataset-RU by AlexKly. Check it out; the dataset is pretty good!
The dataset is labeled using the BIOLU format, where:
- B: beginning token of a multi-token entity
- I: inside token of a multi-token entity
- O: token outside any entity
- L: last token of a multi-token entity
- U: unit, a single-token entity
The following entity types are included in the dataset:
- Location (LOC) tags: cities, countries, districts, and other place names
- Person (PER) tags: first names, middle names (patronymics), and last names
For example, a full tag might look like "B-CITY" for the beginning token of a city name, or "U-COUNTRY" for a single-token country name.
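The full label set is thus the cross product of the four positional prefixes with each entity type, plus the standalone O tag. A minimal sketch, using an illustrative subset of the entity types:

```python
# The four BIOLU positional prefixes (O stands alone, with no entity type)
PREFIXES = ["B", "I", "L", "U"]

# Illustrative subset; the dataset defines the full list of entity types
ENTITY_TYPES = ["CITY", "COUNTRY", "DISTRICT"]

labels = ["O"] + [f"{p}-{t}" for t in ENTITY_TYPES for p in PREFIXES]
print(labels)
# ['O', 'B-CITY', 'I-CITY', 'L-CITY', 'U-CITY', 'B-COUNTRY', ...]
```

This is the label inventory a token-classification head predicts over: one logit per label, per token.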
The model was fine-tuned from the bert-base-multilingual-cased checkpoint using the Hugging Face Transformers library.
The following hyperparameters were used during training:
The model achieves the following results on the evaluation set:
This model is intended for use in analyzing Russian text and should be used responsibly. Users should be aware of potential biases in the model's predictions and use the results judiciously, especially in applications that may impact individuals or groups.