
Secret350/Machine-Speech-Translation


Machine-Speech-Translation End-to-End (EN -> VI | VI -> EN)


A speech-to-speech and text-to-text machine translation system that uses FasterWhisper for ASR, a custom Transformer model for translation, and edge-tts to generate the output speech.

Demo Preview


--- Features ---

  • Automatic Speech Recognition: FasterWhisper (Large-v3 int8)
  • Translation: Custom-trained Transformer model (English <-> Vietnamese)
  • Text to Speech: Edge-TTS for natural-sounding output
  • Human-in-the-loop: Users can check, correct, and confirm the input and output of the translation model
  • Latency: Optimized for acceptable latency (2-4 s per sentence)
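
The features above form a three-stage pipeline with a user confirmation step around the translation model. The sketch below only illustrates that control flow. `run_turn`, `Turn`, and `console_confirm` are hypothetical names, not taken from this repository, and the three stages are passed in as plain callables so that no particular FasterWhisper, Transformer, or Edge-TTS call is assumed.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Turn:
    heard: str        # ASR transcript after the user confirmed or corrected it
    translated: str   # translation after the user confirmed or corrected it
    audio_path: str   # path of the synthesized speech file


def run_turn(
    audio_in: str,
    transcribe: Callable[[str], str],    # audio path -> source text (ASR stage)
    translate: Callable[[str], str],     # source text -> target text
    synthesize: Callable[[str], str],    # target text -> output audio path (TTS stage)
    confirm: Callable[[str, str], str],  # (label, text) -> text approved by the user
) -> Turn:
    """Run one speech-to-speech turn with a human check before and after translation."""
    heard = confirm("transcript", transcribe(audio_in))
    translated = confirm("translation", translate(heard))
    return Turn(heard, translated, synthesize(translated))


def console_confirm(label: str, text: str) -> str:
    """Hypothetical console prompt: Enter accepts the text, anything typed replaces it."""
    edited = input(f"{label}: {text}\nPress Enter to accept, or type a correction: ").strip()
    return edited or text
```

Keeping the confirmation step as a separate callable means the same flow could be driven from a console prompt, a GUI, or an automated test.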

--- Demo Video ---

  • In this video, the custom Transformer model translates a short, basic conversation between a traveler and a receptionist:
    • Video Demo

--- Architecture ---

  • System architecture:
    • System Architecture
  • Custom transformer model architecture:
    • Custom Transformer Architecture

--- Installation ---

  1. Clone the repository:
    git clone https://github.com/Secret350/Machine-Speech-Translation.git
    cd Machine-Speech-Translation
  2. Install dependencies:
    pip install -r requirements.txt
  3. Install NVIDIA Libraries (for GPU support):
    pip install nvidia-cublas-cu12 nvidia-cudnn-cu12
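
If the NVIDIA libraries from step 3 are not installed, one option is to fall back to CPU when loading the ASR model. The sketch below is a minimal illustration, not this repository's code. `load_with_fallback` is a hypothetical helper, and it assumes the failure surfaces as a `RuntimeError` or `ValueError` at load time. A missing cuDNN or cuBLAS library may instead only show up at the first transcription.

```python
from typing import Any, Callable, Tuple


def load_with_fallback(
    make_model: Callable[..., Any], model_size: str = "large-v3"
) -> Tuple[Any, str]:
    """Try to build the model on CUDA first, then on CPU; return (model, device)."""
    last_error = None
    for device in ("cuda", "cpu"):
        try:
            model = make_model(model_size, device=device, compute_type="int8")
            return model, device
        except (RuntimeError, ValueError) as exc:  # assumed failure modes
            last_error = exc
    raise RuntimeError("could not load the ASR model on any device") from last_error


# Intended use with faster-whisper (requires the package to be installed):
#   from faster_whisper import WhisperModel
#   model, device = load_with_fallback(WhisperModel)
```

The model size and `int8` compute type mirror the "Large-v3 int8" setting listed under Features. On CPU, latency will likely be well above the 2-4 s quoted there.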

--- How to run program ---

  • Use pre-trained model:

    • First: Download the pre-trained model weights ("ModelCheckpoints") from the link below, then extract the archive and place the extracted directory in Build_model
    https://drive.google.com/file/d/1D0G29vXtSGe2wyjwIKx8lJDUWhNtzsKT/view?usp=drive_link
    
    • To run the Text-to-Text (EN > VI) program:
    cd Build_model/System_and_Evaluate
    python inference.py
    • To run the Text-to-Text (VI > EN) program:
    cd Build_model/System_and_Evaluate
    python inferencevien.py
    • To run the Speech-to-Speech (EN <> VI) program:
    cd Build_model
    python s2s.py

--- Evaluation ---

  • BLEU Score (EN > VI): 22.42
  • BLEU Score (VI > EN): 21.76
  • BLEU Score (S2S) (EN > VI): 8.29
  • BLEU Score (S2S) (VI > EN): 21.5
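
For readers unfamiliar with the metric, the sketch below shows how a corpus-level BLEU score is formed from clipped n-gram precisions and a brevity penalty. It is an illustration only: the README does not say which tool or tokenization produced the scores above, so this code assumes whitespace tokenization, a single reference per sentence, and no smoothing, and it will not reproduce those numbers exactly.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU on a 0-100 scale (whitespace tokens, one reference, unsmoothed)."""
    matches = [0] * max_n   # clipped n-gram matches, summed over the corpus
    totals = [0] * max_n    # hypothesis n-gram counts, summed over the corpus
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Counter intersection clips each n-gram to its count in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity * math.exp(log_precision)
```

Because the unsmoothed score drops to zero when any n-gram order has no match, very short outputs are penalized heavily under this formulation.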

--- Limitations and Future Improvements ---

  • Limitations:
    • The model struggles with difficult or uncommon words, proper nouns, and very short sentences.
    • The dataset lacks many real-life words.
  • Future Improvements:
    • Integrate a Named Entity Recognition module to address the proper noun issue
    • Retrain the model on a larger, more focused dataset to improve its performance

NOTE: You need to download "ffmpeg.exe" and place it in the "Build_model" folder.
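
A quick pre-flight check can confirm that the two manually downloaded assets are in place before running the programs. This is a minimal sketch: `missing_assets` is a hypothetical helper, and it only checks that "ffmpeg.exe" and the "ModelCheckpoints" directory exist directly inside "Build_model", since the internal layout of the checkpoint directory is not described here.

```python
from pathlib import Path


def missing_assets(build_dir="Build_model"):
    """Return the names of required assets that are absent from build_dir."""
    root = Path(build_dir)
    required = {
        "ffmpeg.exe": root / "ffmpeg.exe",
        "ModelCheckpoints": root / "ModelCheckpoints",
    }
    return [name for name, path in required.items() if not path.exists()]


if __name__ == "__main__":
    missing = missing_assets()
    if missing:
        print("Missing from Build_model:", ", ".join(missing))
    else:
        print("All required assets found.")
```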

About

This project uses a public dataset from OPUS (OpenSubtitles v2024) and a Transformer model to translate English speech to Vietnamese and Vietnamese speech to English.
