obi/deid_roberta_i2b2

token classificationtransformersentransformerspytorchsafetensorsrobertatoken-classificationdeidentificationmit

38

451.0K

Model Description

A RoBERTa [Liu et al., 2019] model fine-tuned for de-identification of medical notes.
Sequence Labeling (token classification): The model was trained to predict protected health information (PHI/PII) entities (spans). A list of protected health information categories is given by HIPAA.
A token can either be classified as non-PHI or as one of the 11 PHI types. Token predictions are aggregated to spans by making use of BILOU tagging.
The PHI labels that were used for training and other details can be found here: Annotation Guidelines
More details on how to use this model, the format of data and other useful information is present in the GitHub repo: Robust DeID.

A demo on how the model works (using model predictions to de-identify a medical note) is on this space: Medical-Note-Deidentification.
Steps on how this model can be used to run a forward pass can be found here: Forward Pass
In brief, the steps are:
- Sentencize (the model aggregates the sentences back to the note level) and tokenize the dataset.
- Use the predict function of this model to gather the predictions (i.e., predictions for each token).
- Additionally, the model predictions can be used to remove PHI from the original note/text.

Post a Github issue on the repo: Robust DeID.

Run this model on powerful GPU infrastructure. Deploy in 60 seconds.

Pay per second

H100, A100, RTX GPUs

Instant deployment

DEPLOY IN 60 SECONDS

Deploy on H100, A100, or RTX GPUs. Pay only for what you use. No setup required.