
wav2vec2-xls-r-300m-hebrew

This model is a fine-tuned version of facebook/wav2vec2-xls-r-300m on private datasets, trained in two stages: it was first fine-tuned on a small dataset of high-quality samples, and the resulting model was then fine-tuned on a large dataset combining the small high-quality dataset, samples from various other sources, and an unlabeled dataset that was weakly labeled using the previously trained model.
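
The card does not include a usage example, but the checkpoint follows the standard wav2vec2 CTC interface in transformers. Below is a minimal inference sketch (the audio path is a placeholder and 16 kHz mono input is assumed); transcribing audio this way with the stage-1 model is essentially how the weak labels for the unlabeled data could have been produced.

```python
# Minimal inference sketch (assumes the standard transformers wav2vec2 CTC API;
# "sample.wav" is a placeholder path, expected to contain 16 kHz mono speech).
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

model_id = "imvladikon/wav2vec2-xls-r-300m-hebrew"
processor = Wav2Vec2Processor.from_pretrained(model_id)
model = Wav2Vec2ForCTC.from_pretrained(model_id)
model.eval()

# Load and, if necessary, resample the audio to the 16 kHz expected by XLS-R.
waveform, sample_rate = torchaudio.load("sample.wav")
if sample_rate != 16_000:
    waveform = torchaudio.functional.resample(waveform, sample_rate, 16_000)

inputs = processor(waveform.squeeze(0), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits

# Greedy CTC decoding.
predicted_ids = torch.argmax(logits, dim=-1)
transcription = processor.batch_decode(predicted_ids)[0]
print(transcription)
```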

Small dataset:

| split | size (GB) | n_samples | duration (hrs) |
|:-----:|:---------:|:---------:|:--------------:|
| train | 4.19      | 20306     | 28             |
| dev   | 1.05      | 5076      | 7              |

Large dataset:

| split | size (GB) | n_samples | duration (hrs) |
|:-----:|:---------:|:---------:|:--------------:|
| train | 12.3      | 90777     | 69             |
| dev   | 2.39      | 20246     | 14*            |

(* weakly labeled data wasn't used in the validation set)

After the first training, it achieves:

On the small dataset:

  • Loss: 0.5438
  • WER: 0.1773

On the large dataset:

  • WER: 0.3811

After the second training:

On the small dataset:

  • WER: 0.1697

On the large dataset:

  • Loss: 0.4502
  • WER: 0.2318
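
The card does not say how these WER figures were computed. A minimal sketch using the wer metric from the datasets library of the era listed under Framework versions (the reference and prediction strings are placeholders, not samples from the private datasets):

```python
# Hedged sketch of a WER computation; the example strings are placeholders.
from datasets import load_metric

wer_metric = load_metric("wer")

references = ["שלום עולם"]   # ground-truth transcripts (placeholder)
predictions = ["שלום עולם"]  # model transcriptions (placeholder)

print(f"WER: {wer_metric.compute(predictions=predictions, references=references):.4f}")
```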

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

First training

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 100.0
  • mixed_precision_training: Native AMP
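
As a rough illustration only (not the author's actual training script), these settings might map onto transformers.TrainingArguments as follows; the output directory and the evaluation/save/logging cadence are assumptions not stated in the card.

```python
# Rough mapping of the listed hyperparameters onto transformers.TrainingArguments.
# output_dir and the evaluation/save/logging steps are assumptions; the effective
# batch size of 64 comes from 2 GPUs x 8 per-device x 4 accumulation steps.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="wav2vec2-xls-r-300m-hebrew",  # placeholder
    learning_rate=3e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=4,
    seed=42,
    lr_scheduler_type="linear",
    warmup_steps=1000,
    num_train_epochs=100.0,
    fp16=True,                    # "Native AMP" mixed precision
    evaluation_strategy="steps",  # assumption: metrics are reported every 1000 steps
    eval_steps=1000,
    save_steps=1000,
    logging_steps=1000,
)
```

The second training below appears to reuse the same configuration, with num_epochs reduced to 60.0.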

Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 3.15  | 1000  | 0.5203          | 0.4333 |
| 1.4284        | 6.31  | 2000  | 0.4816          | 0.3951 |
| 1.4284        | 9.46  | 3000  | 0.4315          | 0.3546 |
| 1.283         | 12.62 | 4000  | 0.4278          | 0.3404 |
| 1.283         | 15.77 | 5000  | 0.4090          | 0.3054 |
| 1.1777        | 18.93 | 6000  | 0.3893          | 0.3006 |
| 1.1777        | 22.08 | 7000  | 0.3968          | 0.2857 |
| 1.0994        | 25.24 | 8000  | 0.3892          | 0.2751 |
| 1.0994        | 28.39 | 9000  | 0.4061          | 0.2690 |
| 1.0323        | 31.54 | 10000 | 0.4114          | 0.2507 |
| 1.0323        | 34.7  | 11000 | 0.4021          | 0.2508 |
| 0.9623        | 37.85 | 12000 | 0.4032          | 0.2378 |
| 0.9623        | 41.01 | 13000 | 0.4148          | 0.2374 |
| 0.9077        | 44.16 | 14000 | 0.4350          | 0.2323 |
| 0.9077        | 47.32 | 15000 | 0.4515          | 0.2246 |
| 0.8573        | 50.47 | 16000 | 0.4474          | 0.2180 |
| 0.8573        | 53.63 | 17000 | 0.4649          | 0.2171 |
| 0.8083        | 56.78 | 18000 | 0.4455          | 0.2102 |
| 0.8083        | 59.94 | 19000 | 0.4587          | 0.2092 |
| 0.769         | 63.09 | 20000 | 0.4794          | 0.2012 |
| 0.769         | 66.25 | 21000 | 0.4845          | 0.2007 |
| 0.7308        | 69.4  | 22000 | 0.4937          | 0.2008 |
| 0.7308        | 72.55 | 23000 | 0.4920          | 0.1895 |
| 0.6927        | 75.71 | 24000 | 0.5179          | 0.1911 |
| 0.6927        | 78.86 | 25000 | 0.5202          | 0.1877 |
| 0.6622        | 82.02 | 26000 | 0.5266          | 0.1840 |
| 0.6622        | 85.17 | 27000 | 0.5351          | 0.1854 |
| 0.6315        | 88.33 | 28000 | 0.5373          | 0.1811 |
| 0.6315        | 91.48 | 29000 | 0.5331          | 0.1792 |
| 0.6075        | 94.64 | 30000 | 0.5390          | 0.1779 |
| 0.6075        | 97.79 | 31000 | 0.5459          | 0.1773 |

Second training

The following hyperparameters were used during training:

  • learning_rate: 0.0003
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 2
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 1000
  • num_epochs: 60.0
  • mixed_precision_training: Native AMP

Training results

| Training Loss | Epoch | Step  | Validation Loss | WER    |
|:-------------:|:-----:|:-----:|:---------------:|:------:|
| No log        | 0.7   | 1000  | 0.5371          | 0.3811 |
| 1.3606        | 1.41  | 2000  | 0.5247          | 0.3902 |
| 1.3606        | 2.12  | 3000  | 0.5126          | 0.3859 |
| 1.3671        | 2.82  | 4000  | 0.5062          | 0.3828 |
| 1.3671        | 3.53  | 5000  | 0.4979          | 0.3672 |
| 1.3421        | 4.23  | 6000  | 0.4906          | 0.3816 |
| 1.3421        | 4.94  | 7000  | 0.4784          | 0.3651 |
| 1.328         | 5.64  | 8000  | 0.4810          | 0.3669 |
| 1.328         | 6.35  | 9000  | 0.4747          | 0.3597 |
| 1.3109        | 7.05  | 10000 | 0.4813          | 0.3808 |
| 1.3109        | 7.76  | 11000 | 0.4631          | 0.3561 |
| 1.2873        | 8.46  | 12000 | 0.4603          | 0.3431 |
| 1.2873        | 9.17  | 13000 | 0.4579          | 0.3533 |
| 1.2661        | 9.87  | 14000 | 0.4471          | 0.3365 |
| 1.2661        | 10.58 | 15000 | 0.4584          | 0.3437 |
| 1.249         | 11.28 | 16000 | 0.4461          | 0.3454 |
| 1.249         | 11.99 | 17000 | 0.4482          | 0.3367 |
| 1.2322        | 12.69 | 18000 | 0.4464          | 0.3335 |
| 1.2322        | 13.4  | 19000 | 0.4427          | 0.3454 |
| 1.22          | 14.1  | 20000 | 0.4440          | 0.3395 |
| 1.22          | 14.81 | 21000 | 0.4459          | 0.3378 |
| 1.2044        | 15.51 | 22000 | 0.4406          | 0.3199 |
| 1.2044        | 16.22 | 23000 | 0.4398          | 0.3155 |
| 1.1913        | 16.92 | 24000 | 0.4237          | 0.3150 |
| 1.1913        | 17.63 | 25000 | 0.4287          | 0.3279 |
| 1.1705        | 18.34 | 26000 | 0.4253          | 0.3103 |
| 1.1705        | 19.04 | 27000 | 0.4234          | 0.3098 |
| 1.1564        | 19.75 | 28000 | 0.4174          | 0.3076 |
| 1.1564        | 20.45 | 29000 | 0.4260          | 0.3160 |
| 1.1461        | 21.16 | 30000 | 0.4235          | 0.3036 |
| 1.1461        | 21.86 | 31000 | 0.4309          | 0.3055 |
| 1.1285        | 22.57 | 32000 | 0.4264          | 0.3006 |
| 1.1285        | 23.27 | 33000 | 0.4201          | 0.2880 |
| 1.1135        | 23.98 | 34000 | 0.4131          | 0.2975 |
| 1.1135        | 24.68 | 35000 | 0.4202          | 0.2849 |
| 1.0968        | 25.39 | 36000 | 0.4105          | 0.2888 |
| 1.0968        | 26.09 | 37000 | 0.4210          | 0.2834 |
| 1.087         | 26.8  | 38000 | 0.4123          | 0.2843 |
| 1.087         | 27.5  | 39000 | 0.4216          | 0.2803 |
| 1.0707        | 28.21 | 40000 | 0.4161          | 0.2787 |
| 1.0707        | 28.91 | 41000 | 0.4186          | 0.2740 |
| 1.0575        | 29.62 | 42000 | 0.4118          | 0.2845 |
| 1.0575        | 30.32 | 43000 | 0.4243          | 0.2773 |
| 1.0474        | 31.03 | 44000 | 0.4221          | 0.2707 |
| 1.0474        | 31.73 | 45000 | 0.4138          | 0.2700 |
| 1.0333        | 32.44 | 46000 | 0.4102          | 0.2638 |
| 1.0333        | 33.15 | 47000 | 0.4162          | 0.2650 |
| 1.0191        | 33.85 | 48000 | 0.4155          | 0.2636 |
| 1.0191        | 34.56 | 49000 | 0.4129          | 0.2656 |
| 1.0087        | 35.26 | 50000 | 0.4157          | 0.2632 |
| 1.0087        | 35.97 | 51000 | 0.4090          | 0.2654 |
| 0.9901        | 36.67 | 52000 | 0.4183          | 0.2587 |
| 0.9901        | 37.38 | 53000 | 0.4251          | 0.2648 |
| 0.9795        | 38.08 | 54000 | 0.4229          | 0.2555 |
| 0.9795        | 38.79 | 55000 | 0.4176          | 0.2546 |
| 0.9644        | 39.49 | 56000 | 0.4223          | 0.2513 |
| 0.9644        | 40.2  | 57000 | 0.4244          | 0.2530 |
| 0.9534        | 40.9  | 58000 | 0.4175          | 0.2538 |
| 0.9534        | 41.61 | 59000 | 0.4213          | 0.2505 |
| 0.9397        | 42.31 | 60000 | 0.4275          | 0.2565 |
| 0.9397        | 43.02 | 61000 | 0.4315          | 0.2528 |
| 0.9269        | 43.72 | 62000 | 0.4316          | 0.2501 |
| 0.9269        | 44.43 | 63000 | 0.4247          | 0.2471 |
| 0.9175        | 45.13 | 64000 | 0.4376          | 0.2469 |
| 0.9175        | 45.84 | 65000 | 0.4335          | 0.2450 |
| 0.9026        | 46.54 | 66000 | 0.4336          | 0.2452 |
| 0.9026        | 47.25 | 67000 | 0.4400          | 0.2427 |
| 0.8929        | 47.95 | 68000 | 0.4382          | 0.2429 |
| 0.8929        | 48.66 | 69000 | 0.4361          | 0.2415 |
| 0.8786        | 49.37 | 70000 | 0.4413          | 0.2398 |
| 0.8786        | 50.07 | 71000 | 0.4392          | 0.2415 |
| 0.8714        | 50.78 | 72000 | 0.4345          | 0.2406 |
| 0.8714        | 51.48 | 73000 | 0.4475          | 0.2402 |
| 0.8589        | 52.19 | 74000 | 0.4473          | 0.2374 |
| 0.8589        | 52.89 | 75000 | 0.4457          | 0.2357 |
| 0.8493        | 53.6  | 76000 | 0.4462          | 0.2366 |
| 0.8493        | 54.3  | 77000 | 0.4494          | 0.2356 |
| 0.8395        | 55.01 | 78000 | 0.4472          | 0.2352 |
| 0.8395        | 55.71 | 79000 | 0.4490          | 0.2339 |
| 0.8295        | 56.42 | 80000 | 0.4489          | 0.2318 |
| 0.8295        | 57.12 | 81000 | 0.4469          | 0.2320 |
| 0.8225        | 57.83 | 82000 | 0.4478          | 0.2321 |
| 0.8225        | 58.53 | 83000 | 0.4525          | 0.2326 |
| 0.816         | 59.24 | 84000 | 0.4532          | 0.2316 |
| 0.816         | 59.94 | 85000 | 0.4502          | 0.2318 |

Framework versions

  • Transformers 4.17.0.dev0
  • Pytorch 1.10.2+cu102
  • Datasets 1.18.2.dev0
  • Tokenizers 0.11.0