---
tags:
- bertopic
library_name: bertopic
---

# BERTopic_Multimodal

This is a [BERTopic](https://github.com/MaartenGr/BERTopic) model.
BERTopic is a flexible and modular topic modeling framework that allows for the generation of easily interpretable topics from large datasets.

This model was trained on the 8,091 images of the Flickr 8k dataset **without** their captions. It demonstrates how BERTopic can perform topic modeling with images as the only input.

A few examples of generated topics:



## Usage

To use this model, please install BERTopic:

```
pip install -U "bertopic[vision]"
pip install -U safetensors
```

You can use the model as follows:

```python
from bertopic import BERTopic

# Load the trained model from the Hugging Face Hub
topic_model = BERTopic.load("MaartenGr/BERTopic_Multimodal")

# Overview of all topics and their sizes
topic_model.get_topic_info()
```

You can view all information about a topic as follows:

```python
topic_id = 0  # any ID from the "Topic ID" column in the overview below
topic_model.get_topic(topic_id, full=True)
```

## Topic overview

* Number of topics: 29 (including the outlier topic `-1`)
* Number of training documents: 8091

<details>
<summary>Click here for an overview of all topics.</summary>

| Topic ID | Topic Keywords | Topic Frequency | Label |
|----------|----------------|-----------------|-------|
| -1 | while - air - the - in - jumping | 34 | -1_while_air_the_in |
| 0 | bench - sitting - people - woman - street | 1132 | 0_bench_sitting_people_woman |
| 1 | grass - running - dog - grassy - field | 1693 | 1_grass_running_dog_grassy |
| 2 | boy - girl - little - young - holding | 1290 | 2_boy_girl_little_young |
| 3 | dog - frisbee - running - water - mouth | 1224 | 3_dog_frisbee_running_water |
| 4 | skateboard - ramp - doing - trick - cement | 415 | 4_skateboard_ramp_doing_trick |
| 5 | snow - dog - covered - running - through | 309 | 5_snow_dog_covered_running |
| 6 | mountain - range - slope - standing - person | 205 | 6_mountain_range_slope_standing |
| 7 | pool - blue - boy - toy - water | 189 | 7_pool_blue_boy_toy |
| 8 | trail - bike - down - riding - person | 166 | 8_trail_bike_down_riding |
| 9 | snowboarder - mid - jump - air - after | 126 | 9_snowboarder_mid_jump_air |
| 10 | rock - climbing - up - wall - tree | 124 | 10_rock_climbing_up_wall |
| 11 | wave - surfboard - top - riding - of | 112 | 11_wave_surfboard_top_riding |
| 12 | beach - surfboard - people - with - walking | 102 | 12_beach_surfboard_people_with |
| 13 | jumping - track - horse - racquet - dog | 98 | 13_jumping_track_horse_racquet |
| 14 | snowboard - snow - girl - hill - slope | 95 | 14_snowboard_snow_girl_hill |
| 15 | game - being - football - played - professional | 91 | 15_game_being_football_played |
| 16 | soccer - kicking - team - ball - player | 80 | 16_soccer_kicking_team_ball |
| 17 | dirt - bike - person - rider - going | 75 | 17_dirt_bike_person_rider |
| 18 | soccer - boys - field - ball - kicking | 69 | 18_soccer_boys_field_ball |
| 19 | baseball - player - bat - swinging - into | 63 | 19_baseball_player_bat_swinging |
| 20 | basketball - up - and - playing - jumping | 59 | 20_basketball_up_and_playing |
| 21 | bird - body - flying - over - long | 55 | 21_bird_body_flying_over |
| 22 | motorcycle - track - race - racer - racing | 55 | 22_motorcycle_track_race_racer |
| 23 | boat - sitting - water - lake - hose | 53 | 23_boat_sitting_water_lake |
| 24 | street - riding - down - bike - woman | 52 | 24_street_riding_down_bike |
| 25 | paddle - suit - paddling - water - in | 49 | 25_paddle_suit_paddling_water |
| 26 | pair - scissors - stage - white - shirt | 42 | 26_pair_scissors_stage_white |
| 27 | tennis - court - racket - racquet - swinging | 34 | 27_tennis_court_racket_racquet |

</details>

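Judging by the table, each label appears to be the topic ID followed by the first four keywords, joined with underscores. The sketch below reproduces that pattern for a few rows copied from the table and computes each topic's share of the 8091 training documents. `make_label` is a hypothetical helper written for illustration, not part of the BERTopic API.

```python
# A few rows copied from the table above: topic ID -> (keywords, frequency)
rows = {
    -1: (["while", "air", "the", "in", "jumping"], 34),
    0: (["bench", "sitting", "people", "woman", "street"], 1132),
    27: (["tennis", "court", "racket", "racquet", "swinging"], 34),
}

# Taken from "Number of training documents" above
total_documents = 8091


def make_label(topic_id, keywords, n_words=4):
    """Hypothetical helper: join the topic ID and the first n_words keywords."""
    return "_".join([str(topic_id)] + keywords[:n_words])


labels = {tid: make_label(tid, kws) for tid, (kws, _) in rows.items()}
shares = {tid: freq / total_documents for tid, (_, freq) in rows.items()}

for tid in rows:
    print(f"{labels[tid]}: {shares[tid]:.1%} of training documents")
```
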
## Training Procedure

The data was retrieved as follows:

```python
import os
import glob
import zipfile
import numpy as np
import pandas as pd
from tqdm import tqdm
from sentence_transformers import util

# Flickr 8k images
img_folder = 'photos/'
caps_folder = 'captions/'
if not os.path.exists(img_folder) or len(os.listdir(img_folder)) == 0:
    os.makedirs(img_folder, exist_ok=True)

    # Download the dataset if it does not exist yet
    if not os.path.exists('Flickr8k_Dataset.zip'):
        util.http_get('https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_Dataset.zip', 'Flickr8k_Dataset.zip')
        util.http_get('https://github.com/jbrownlee/Datasets/releases/download/Flickr8k/Flickr8k_text.zip', 'Flickr8k_text.zip')

    # Extract the images and the captions into their own folders
    for folder, file in [(img_folder, 'Flickr8k_Dataset.zip'), (caps_folder, 'Flickr8k_text.zip')]:
        with zipfile.ZipFile(file, 'r') as zf:
            for member in tqdm(zf.infolist(), desc='Extracting'):
                zf.extract(member, folder)

# Paths of all extracted images
images = list(glob.glob('photos/Flicker8k_Dataset/*.jpg'))
```
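Before training, it can help to confirm that the extraction produced the expected files. The following is a minimal sketch using only the standard library; `list_images` is a hypothetical helper, and sorting the paths is an added assumption to keep the input order reproducible between runs. Assuming every extracted image is used for training, the count should match the 8091 training documents reported in the topic overview.

```python
import glob
import os


def list_images(folder, pattern="*.jpg"):
    """Hypothetical helper: return the sorted paths in `folder` matching `pattern`."""
    return sorted(glob.glob(os.path.join(folder, pattern)))


# Same folder as in the snippet above; the list is empty if nothing was extracted
images = list_images("photos/Flicker8k_Dataset")
print(f"Found {len(images)} images")
```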

Then, to perform topic modeling on multimodal data with BERTopic:

```python
from bertopic import BERTopic
from bertopic.backend import MultiModalBackend
from bertopic.representation import VisualRepresentation, KeyBERTInspired

# Image embedding model
embedding_model = MultiModalBackend('clip-ViT-B-32', batch_size=32)

# Image to text representation model
representation_model = {
    "Visual_Aspect": VisualRepresentation(image_to_text_model="nlpconnect/vit-gpt2-image-captioning", image_squares=True),
    "KeyBERT": KeyBERTInspired()
}

# Train our model with images only
topic_model = BERTopic(representation_model=representation_model, verbose=True, embedding_model=embedding_model, min_topic_size=30)
topics, probs = topic_model.fit_transform(documents=None, images=images)
```

The above shows that the input consisted of images only. The images are clustered, and a small subset of representative images is extracted from each cluster. These representative images are captioned with `"nlpconnect/vit-gpt2-image-captioning"`, which yields a small textual dataset over which c-TF-IDF and the additional `KeyBERTInspired` representation model can be run.
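
As a rough illustration of the c-TF-IDF step, the sketch below scores words per topic from a handful of captions. The captions are invented for illustration and are not output of the captioning model, and `c_tf_idf` and `top_words` are hypothetical helpers rather than BERTopic functions. The weighting assumed here is the word's relative frequency within a topic multiplied by `log(1 + A / f)`, where `A` is the average number of words per topic and `f` is the word's frequency across all topics.

```python
import math
from collections import Counter

# Hypothetical captions per topic, invented for illustration
captions_per_topic = {
    0: ["dog running through the grass", "dog running in a grassy field"],
    1: ["man riding a skateboard on a ramp", "man doing a skateboard trick"],
    2: ["surfer riding a wave", "surfer on a surfboard riding a wave"],
}


def c_tf_idf(captions_per_topic):
    """Score each word per topic with a class-based TF-IDF weighting."""
    # Treat all captions of a topic as one document and count its words
    counts = {
        topic: Counter(" ".join(captions).lower().split())
        for topic, captions in captions_per_topic.items()
    }
    # Frequency of each word across all topics
    total = Counter()
    for topic_counts in counts.values():
        total.update(topic_counts)
    # Average number of words per topic
    avg_words = sum(total.values()) / len(counts)

    scores = {}
    for topic, topic_counts in counts.items():
        n_words = sum(topic_counts.values())
        scores[topic] = {
            word: (freq / n_words) * math.log(1 + avg_words / total[word])
            for word, freq in topic_counts.items()
        }
    return scores


def top_words(scores, topic, n=2):
    """Return the n highest-scoring words of a topic."""
    ranked = sorted(scores[topic].items(), key=lambda item: item[1], reverse=True)
    return [word for word, _ in ranked[:n]]


scores = c_tf_idf(captions_per_topic)
for topic in scores:
    print(topic, top_words(scores, topic))
```

In this sketch, a word such as `a` that occurs in every topic is down-weighted by the logarithmic term but not removed, since no stop-word filtering is applied.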

## Training hyperparameters

* calculate_probabilities: False
* language: None
* low_memory: False
* min_topic_size: 30
* n_gram_range: (1, 1)
* nr_topics: None
* seed_topic_list: None
* top_n_words: 10
* verbose: True

## Framework versions

* Numpy: 1.23.5
* HDBSCAN: 0.8.29
* UMAP: 0.5.3
* Pandas: 1.5.3
* Scikit-Learn: 1.2.2
* Sentence-transformers: 2.2.2
* Transformers: 4.29.2
* Numba: 0.56.4
* Plotly: 5.14.1
* Python: 3.10.10