valhalla/distilbart-mnli-12-3


DistilBart-MNLI

distilbart-mnli is the distilled version of bart-large-mnli, created using the No Teacher Distillation technique proposed for BART summarization by Hugging Face, here.

We simply copy alternating layers from bart-large-mnli and fine-tune them further on the same data.

| Model | matched acc | mismatched acc |
| --- | --- | --- |
| bart-large-mnli (baseline, 12-12) | 89.9 | 90.01 |
| distilbart-mnli-12-1 | 87.08 | 87.5 |
| distilbart-mnli-12-3 | 88.1 | 88.19 |
| distilbart-mnli-12-6 | 89.19 | 89.01 |
| distilbart-mnli-12-9 | 89.56 | 89.52 |

This is a very simple and effective technique: as the table shows, the drop in accuracy relative to the baseline is small.

Detailed performance trade-offs will be posted in this sheet.
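
For inference, the distilled checkpoints can be used as drop-in replacements for bart-large-mnli. A minimal sketch using the transformers zero-shot-classification pipeline (the example text and labels below are placeholders, not part of the original card):

```python
from transformers import pipeline

# Load this distilled checkpoint into the zero-shot classification pipeline.
classifier = pipeline("zero-shot-classification", model="valhalla/distilbart-mnli-12-3")

# Placeholder input and labels; any label set can be used.
sequence = "one day I will see the world"
candidate_labels = ["travel", "cooking", "dancing"]

# Each candidate label is scored as an NLI hypothesis against the input sequence.
result = classifier(sequence, candidate_labels)
print(result["labels"])
print(result["scores"])
```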

Fine-tuning

If you want to train these models yourself, clone the distillbart-mnli repo and follow the steps below.

Clone and install transformers from source

git clone https://github.com/huggingface/transformers.git
pip install -qqq -U ./transformers

Download MNLI data

python transformers/utils/download_glue_data.py --data_dir glue_data --tasks MNLI

Create student model

python create_student.py \
  --teacher_model_name_or_path facebook/bart-large-mnli \
  --student_encoder_layers 12 \
  --student_decoder_layers 6 \
  --save_path student-bart-mnli-12-6
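
Conceptually, the student keeps the teacher's encoder and copies a subset of evenly spaced decoder layers from it; create_student.py in the repo is the actual implementation. A rough sketch of that idea for a 12-3 student (the layer indices chosen here are an assumption for illustration, not necessarily the repo's exact selection):

```python
from transformers import BartConfig, BartForSequenceClassification

# Teacher: the full 12-12 MNLI model.
teacher = BartForSequenceClassification.from_pretrained("facebook/bart-large-mnli")

# Student config: same encoder, but only 3 decoder layers.
student_config = BartConfig.from_pretrained(
    "facebook/bart-large-mnli", encoder_layers=12, decoder_layers=3
)
student = BartForSequenceClassification(student_config)

# Initialize from the teacher's weights where shapes/keys match
# (strict=False skips the decoder layers the student does not have) ...
student.load_state_dict(teacher.state_dict(), strict=False)

# ... then copy a spread of teacher decoder layers into the 3 student slots.
# NOTE: these indices are an illustrative assumption.
layers_to_copy = [0, 6, 11]
for student_idx, teacher_idx in enumerate(layers_to_copy):
    student.model.decoder.layers[student_idx].load_state_dict(
        teacher.model.decoder.layers[teacher_idx].state_dict()
    )

student.save_pretrained("student-bart-mnli-12-3")
```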

Start fine-tuning

python run_glue.py args.json
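
args.json holds the arguments that run_glue.py parses; the actual file and hyperparameters live in the distillbart-mnli repo and the wandb logs. A hypothetical minimal example, with placeholder values rather than the settings used for the released checkpoints:

```sh
# Hypothetical args.json for illustration only; see the repo for the real file.
cat > args.json <<'EOF'
{
  "model_name_or_path": "student-bart-mnli-12-6",
  "task_name": "mnli",
  "data_dir": "glue_data/MNLI",
  "output_dir": "distilbart-mnli-12-6",
  "do_train": true,
  "do_eval": true,
  "max_seq_length": 128,
  "per_device_train_batch_size": 32,
  "learning_rate": 3e-5,
  "num_train_epochs": 3.0
}
EOF
```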

You can find the logs of these trained models in this wandb project.
