eddiegulay/wav2vec2-large-xlsr-mvc-swahili

Name: eddiegulay/wav2vec2-large-xlsr-mvc-swahili
Rating: 5 (3 reviews)
Author: eddiegulay

Tags: automatic speech recognition, transformers, sw, tensorboard, safetensors, wav2vec2, automatic-speech-recognition, generated_from_trainer, apache-2.0

3

HuggingFace

826.9K

wav2vec2-large-xlsr-mvc-swahili

This model is a fine-tuned version of facebook/wav2vec2-large-xlsr-53 for Swahili automatic speech recognition.

How to use the model

There is a known issue with the vocabulary: it appears to include special characters that were not accounted for during training.
You can try the following:

import torch
import torchaudio
from transformers import AutoProcessor, AutoModelForCTC

repo_name = "eddiegulay/wav2vec2-large-xlsr-mvc-swahili"
processor = AutoProcessor.from_pretrained(repo_name)
model = AutoModelForCTC.from_pretrained(repo_name)

# Use the GPU if one is available, otherwise fall back to the CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
model = model.to(device)
model.eval()


def transcribe(audio_path):
  # Load the audio file and resample it to 16 kHz
  audio_input, sample_rate = torchaudio.load(audio_path)
  target_sample_rate = 16000
  audio_input = torchaudio.transforms.Resample(orig_freq=sample_rate, new_freq=target_sample_rate)(audio_input)

  # Preprocess the audio data (first channel only)
  input_dict = processor(audio_input[0].numpy(), return_tensors="pt", padding=True, sampling_rate=target_sample_rate)

  # Perform inference without tracking gradients, then transcribe
  with torch.no_grad():
    logits = model(input_dict.input_values.to(device)).logits
  pred_ids = torch.argmax(logits, dim=-1)[0]
  transcription = processor.decode(pred_ids)

  return transcription

transcript = transcribe('your_audio.mp3')
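
Because the vocabulary may contain special characters that were not considered during training, it can help to list them before trusting a transcription. The following is a minimal sketch, not part of the original model card: the helper name find_unusual_tokens, the default list of special tokens, and the example vocabulary are all assumptions made for illustration.

```python
import unicodedata


def find_unusual_tokens(vocab, special_tokens=("<pad>", "<s>", "</s>", "<unk>", "|")):
    """Return (token, id) pairs that are neither letters nor known special tokens.

    `vocab` maps token strings to integer ids, as returned by a tokenizer's
    get_vocab() method. The default `special_tokens` tuple is an assumption
    and may not match this model's actual special tokens.
    """
    unusual = []
    # Walk the vocabulary in id order so the result is deterministic
    for token, token_id in sorted(vocab.items(), key=lambda item: item[1]):
        if token in special_tokens:
            continue
        # A token is "ordinary" if every character is a Unicode letter or an apostrophe
        is_ordinary = all(
            unicodedata.category(ch).startswith("L") or ch == "'" for ch in token
        )
        if not is_ordinary:
            unusual.append((token, token_id))
    return unusual


# Hypothetical vocabulary for illustration only; this is NOT the model's real vocabulary.
example_vocab = {"<pad>": 0, "<unk>": 1, "|": 2, "a": 3, "b": 4, "'": 5, "\u2026": 6, "#": 7}
print(find_unusual_tokens(example_vocab))
```

With the processor loaded as in the snippet above, calling find_unusual_tokens(processor.tokenizer.get_vocab()) should list the tokens worth inspecting. What to do with them (for example, stripping them from the transcription) depends on your data.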
