| --- |
| title: Audio Visual Transcription |
| app_file: app.py |
| sdk: gradio |
| sdk_version: 5.1.0 |
| license: apache-2.0 |
| emoji: π |
| colorFrom: blue |
| colorTo: purple |
| short_description: Get your synchronized subtitled video in minutes with AI. |
| --- |
| # AudioVisualTranscription |
|
|
| [Open the Hugging Face Space](https://huggingface.co/spaces/nelikCode/AudioVisualTranscription) |
|
|
| Get your synchronized subtitled video in minutes with AI! |
|
|
|  |
|
|
| ## Overview |
|
|
| **AVT** (AudioVisualTranscription) generates precisely synchronized subtitles for |
| your audio or video content in minutes, using OpenAI's Whisper for speech recognition. |
|
|
| Whether you need subtitles for accessibility, language learning, or just to make |
| your content more engaging, this app has got you covered. Simply upload your audio |
| or video file, select the language, and let the magic happen. |
|
|
| ## Features |
|
|
| - **Easy-to-use Interface**: Powered by [Gradio](https://gradio.app) for an |
| intuitive user experience. |
| - **Multi-Language Support**: Transcribes English, Spanish, French, German, |
| Italian, Dutch, Russian, Norwegian, Chinese, and more. |
| - **Video Playback**: View your subtitled video directly in the web app. |
| - **Download Subtitles**: Save generated subtitle files for use with your preferred |
| video player. |
|
|
| ## Quickstart |
|
|
| The easiest way to use **AVT** is through this |
| [Hugging Face Space](https://huggingface.co/spaces/nelikCode/AudioVisualTranscription). |
|
|
| To use it locally, follow the steps below. |
|
|
| ### Installation |
|
|
| Follow these steps to set up the application on your local machine. |
|
|
| 1. **Clone the repository**: |
|
|
| ```bash |
| git clone https://github.com/killian31/AudioVisualTranscription |
| cd AudioVisualTranscription |
| ``` |
| |
| 2. **Create a Python environment** using pyenv (requires the pyenv-virtualenv plugin and an installed Python 3.11.9): |
|
|
| ```bash |
| pyenv virtualenv 3.11.9 avt |
| pyenv activate avt |
| ``` |
| |
| 3. **Install Poetry**: |
|
|
| ```bash |
| pip install poetry |
| ``` |
| |
| 4. **Install dependencies**: |
|
|
| ```bash |
| poetry install |
| ``` |
| |
| 5. **Install system-level dependencies**: |
| - **macOS**: Run the following script to install FFmpeg and ImageMagick. |
|
|
| ```bash |
| bash ./install_macos.sh |
| ``` |
| |
| - **Debian/Ubuntu**: Run the following commands to install FFmpeg and ImageMagick. |
|
|
| ```bash |
| chmod +x install_linux.sh |
| ./install_linux.sh |
| ``` |
| |
| ### Running the App |
|
|
| To launch the Gradio app: |
|
|
| ```bash |
| python app.py |
| ``` |
|
|
| After launching, open the local URL printed in the terminal (Gradio's default is |
| `http://127.0.0.1:7860`) to use the application in your browser. |
|
|
| ## How It Works |
|
|
| 1. **Upload Your Content**: Upload an audio file **or** a video file, then select |
| the matching file type (Video or Audio) in the dropdown menu. |
| 2. **Select Your Preferences**: Choose the language of transcription and any |
| delay settings you prefer. |
| 3. **Generate Subtitles**: Click on the “Generate Subtitled Video” button to |
| process your input. |
| 4. **Download or View**: View the subtitled video directly in the web interface |
| or download the SRT subtitle file for later use. You must generate the |
| subtitles before you can click the download button. |
|
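To make the downloadable output in step 4 more concrete, here is a minimal sketch of how timed text segments map onto the SRT subtitle format. It is an illustration only, not the app's actual code: the function names `srt_timestamp` and `segments_to_srt` and the sample segments are hypothetical.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments):
    """Render (start, end, text) tuples as the contents of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(segments, start=1):
        # Each SRT cue is: a 1-based counter, a time range, then the text.
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
            f"{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical segments, made up for illustration.
example = [(0.0, 2.5, "Hello and welcome."), (2.5, 5.0, "Let's get started.")]
print(segments_to_srt(example))
```

A file with this layout can be loaded alongside the original media in most video players that support external subtitles.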
|
| ## Requirements |
|
|
| The app relies on the following system-level dependencies: |
|
|
| - **[FFmpeg](https://ffmpeg.org/)**: Required for handling video and audio. |
| - **[ImageMagick](https://imagemagick.org/)**: Required for video processing. |
|
|
| Please ensure these are installed using the provided scripts before running the app. |
|
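If you are unsure whether both tools are available, a quick check along these lines can help. This is a hedged sketch that is not part of the repository; the helper name `find_system_tools` is hypothetical, and it assumes the tools are exposed under their usual command names (`ffmpeg`, and `magick` or `convert` for ImageMagick).

```python
import shutil


def find_system_tools():
    """Return the resolved path (or None) of each required command-line tool."""
    return {
        "ffmpeg": shutil.which("ffmpeg"),
        # ImageMagick 7 installs `magick`; ImageMagick 6 installs `convert`.
        "imagemagick": shutil.which("magick") or shutil.which("convert"),
    }


if __name__ == "__main__":
    for name, path in find_system_tools().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

If either tool is reported as not found, rerun the install script for your platform from the Installation section.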
|
| ## Technologies Used |
|
|
| - **Gradio**: Provides the web interface for easy interaction. |
| - **Whisper by OpenAI**: Performs speech recognition. |
|
|
| ## Contributing |
|
|
| Contributions are welcome! If you'd like to improve the app or add new features, |
| feel free to fork the repository and open a pull request. Please format your code |
| with `black`. |
|
|
| ## License |
|
|
| This project is open source and available under the [Apache 2.0 License](LICENSE). |
|
|
| ## Contact |
|
|
| If you have any questions, feel free to |
| [open an issue](https://github.com/killian31/AudioVisualTranscription/issues/new). |