Instructions to use giobin/IDEFICS_frozenlake_rocket_trained with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
| library_name: peft | |
| base_model: HuggingFaceM4/idefics-9b-instruct | |
| # Model Card for giobin/IDEFICS_frozenlake_rocket_trained | |
| This is a LoRA (PEFT) adapter for IDEFICS 9B Instruct (`HuggingFaceM4/idefics-9b-instruct`), trained with PPO on the FrozenLake environment. | |
| ## Model Details | |
| ### Trainer Hyperparameters | |
| suppress_warnings: True | |
| debug: True | |
| seed: 9812 | |
| reseed_env: True | |
| torch_deterministic: True | |
| track: True | |
| wandb_project_name: "frozenlake_idefics" | |
| wandb_entity: null #'rl-team-unito' | |
| wandb_log_dir: "${now:%Y-%m-%d_%H-%M-%S}" | |
| save_video: True | |
| save_video_every: 20 | |
| save_stats: True | |
| save_episode: False | |
| env_size: 244 | |
| env_area: 8 | |
| num_prompt_images: 1 | |
| use_text_description: True | |
| # Algorithm specific arguments | |
| model: "HuggingFaceM4/idefics-9b-instruct" | |
| model_ckpt: null | |
| lora_adapter_path: null | |
| is_slippery: False | |
| fixed_orientation: True | |
| no_step_description: False | |
| first_person: True | |
| fov: 1 | |
| total_timesteps: 400000 | |
| disable_training: False | |
| from_accelerate_savestate_to_checkpoint: False | |
| learning_rate: 1e-5 | |
| critic_learning_rate: 1e-5 | |
| local_num_envs: 4 | |
| num_steps: 128 | |
| anneal_lr: False | |
| gamma: 0.99 | |
| gae_lambda: 0.95 | |
| num_minibatches: 128 | |
| update_epochs: 1 | |
| norm_adv: True | |
| clip_coef: 0.1 | |
| clip_vloss: True | |
| ent_coef: 0.01 #0.01 | |
| vf_coef: 0.5 | |
| max_grad_norm: 0.5 | |
| target_kl: null | |
| save_every: 50 | |
| gradient_accumulation: 4 | |
| adam_epsilon: 1e-8 | |
| gradient_ckpt: False | |
| lora: True | |
| temperature: 'max_logit' | |
| disable_adapters_for_generation: True | |
| normalization_by_words: False | |
| action_logits_from_whole_seq: True | |
| advanced_action_matching: False | |
| env_id: "FrozenLakeText-v0" # MiniGrid-LavaGapS7-v0 | |
| generate_actions: False | |
| value_prompt_template: "I am the agent in this minigrid world. {} Avoid the traps!\nWhat's the next best action?" | |
| action_template: " Based on the information provided, the next best action would be to {}" | |
| possible_actions_list: "forward pickup toggle opt_left opt_right opt_back" | |
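The rollout and update sizes implied by the hyperparameters above can be worked out with a little arithmetic. The sketch below assumes CleanRL-style definitions (batch size = environments × steps, a single process) and assumes that `gradient_accumulation` counts minibatches per optimizer step; neither is confirmed by this card.

```python
# Values copied from the trainer hyperparameters above.
local_num_envs = 4
num_steps = 128
num_minibatches = 128
gradient_accumulation = 4
total_timesteps = 400_000

# Assumption: CleanRL-style definitions, single process.
batch_size = local_num_envs * num_steps          # transitions collected per rollout
minibatch_size = batch_size // num_minibatches   # transitions per forward/backward pass
num_iterations = total_timesteps // batch_size   # rollout + update cycles

# Assumption: gradients are accumulated over `gradient_accumulation`
# minibatches before each optimizer step.
samples_per_optimizer_step = minibatch_size * gradient_accumulation
optimizer_steps_per_iteration = num_minibatches // gradient_accumulation

print(batch_size, minibatch_size, num_iterations)
print(samples_per_optimizer_step, optimizer_steps_per_iteration)
```

Under these assumptions each rollout holds 512 transitions, each minibatch holds 4, and training runs for 781 iterations with 32 optimizer steps of 16 samples each.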
| ### Model Sources [optional] | |
| <!-- Provide the basic links for the model. --> | |
| - **Repository:** [More Information Needed] | |
| - **Paper [optional]:** [More Information Needed] | |
| - **Demo [optional]:** [More Information Needed] | |
| ## Uses | |
| <!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. --> | |
| ### Direct Use | |
| <!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. --> | |
| [More Information Needed] | |
| ### Downstream Use [optional] | |
| <!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app --> | |
| [More Information Needed] | |
| ### Out-of-Scope Use | |
| <!-- This section addresses misuse, malicious use, and uses that the model will not work well for. --> | |
| [More Information Needed] | |
| ## Bias, Risks, and Limitations | |
| <!-- This section is meant to convey both technical and sociotechnical limitations. --> | |
| [More Information Needed] | |
| ### Recommendations | |
| <!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. --> | |
| Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations. | |
| ## How to Get Started with the Model | |
| Use the code below to get started with the model. | |
| [More Information Needed] | |
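Until the authors provide official instructions, the following is a minimal sketch of how a PEFT adapter is usually attached to its base model. It assumes `torch`, `transformers`, `peft`, and `accelerate` are installed, and that bf16 weights are acceptable (the training precision is not stated in this card). It has not been verified against this repository's adapter configuration, and any PPO value head used during training is not covered here.

```python
BASE_MODEL_ID = "HuggingFaceM4/idefics-9b-instruct"
ADAPTER_ID = "giobin/IDEFICS_frozenlake_rocket_trained"


def load_model(device_map="auto"):
    """Load the base IDEFICS model and attach the LoRA adapter from this repository."""
    # Imported lazily so the constants above can be inspected
    # without the heavy dependencies installed.
    import torch
    from peft import PeftModel
    from transformers import AutoProcessor, IdeficsForVisionText2Text

    processor = AutoProcessor.from_pretrained(BASE_MODEL_ID)
    base = IdeficsForVisionText2Text.from_pretrained(
        BASE_MODEL_ID,
        torch_dtype=torch.bfloat16,  # assumption: precision is not documented
        device_map=device_map,       # "auto" requires the accelerate package
    )
    model = PeftModel.from_pretrained(base, ADAPTER_ID)
    model.eval()
    return processor, model
```

Calling `processor, model = load_model()` downloads the 9B base model, so expect a large download and substantial GPU memory use.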
| ## Training Details | |
| ### Training Data | |
| <!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. --> | |
| [More Information Needed] | |
| ### Training Procedure | |
| <!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. --> | |
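The `gamma: 0.99` and `gae_lambda: 0.95` settings listed under Trainer Hyperparameters are the usual parameters of Generalized Advantage Estimation (GAE). The sketch below shows the standard GAE recursion for a single environment in plain Python; the function name and the `dones` convention are illustrative assumptions, not the authors' training code.

```python
def compute_gae(rewards, values, dones, last_value, gamma=0.99, gae_lambda=0.95):
    """Return (advantages, returns) for one environment's rollout.

    rewards[t]: reward received after taking the action at step t.
    values[t]:  critic estimate for the state at step t.
    dones[t]:   True if the episode ended after step t.
    last_value: critic estimate for the state after the final step.
    """
    n = len(rewards)
    advantages = [0.0] * n
    last_gae = 0.0
    for t in reversed(range(n)):
        next_value = last_value if t == n - 1 else values[t + 1]
        not_done = 0.0 if dones[t] else 1.0
        # One-step TD error; bootstrapping is cut at episode boundaries.
        delta = rewards[t] + gamma * next_value * not_done - values[t]
        last_gae = delta + gamma * gae_lambda * not_done * last_gae
        advantages[t] = last_gae
    returns = [a + v for a, v in zip(advantages, values)]
    return advantages, returns
```

For a three-step episode that only rewards the final step, for example `compute_gae([0, 0, 1], [0, 0, 0], [False, False, True], 0.0)`, the advantage decays by `gamma * gae_lambda` per step going backwards from the reward.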
| #### Preprocessing [optional] | |
| [More Information Needed] | |
| #### Training Hyperparameters | |
| - **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision --> | |
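The trainer configuration lists `clip_coef: 0.1`, `clip_vloss: True`, `vf_coef: 0.5`, and `ent_coef: 0.01`, which match the terms of the standard PPO clipped objective. The per-sample sketch below shows how those coefficients typically combine. It follows the common CleanRL formulation as an assumption and is not taken from the authors' code; with `norm_adv: True`, the `advantage` argument would already be normalized within the minibatch.

```python
import math


def ppo_loss(new_logprob, old_logprob, advantage, new_value, old_value, ret,
             entropy, clip_coef=0.1, vf_coef=0.5, ent_coef=0.01):
    """Per-sample PPO loss: clipped policy term, clipped value term, entropy bonus."""
    # Probability ratio between the updated and the rollout policy.
    ratio = math.exp(new_logprob - old_logprob)
    clipped_ratio = max(1.0 - clip_coef, min(ratio, 1.0 + clip_coef))
    pg_loss = max(-advantage * ratio, -advantage * clipped_ratio)

    # Clipped value loss: keep the new value estimate close to the old one.
    v_unclipped = (new_value - ret) ** 2
    v_pred_clipped = old_value + max(-clip_coef, min(new_value - old_value, clip_coef))
    v_clipped = (v_pred_clipped - ret) ** 2
    v_loss = 0.5 * max(v_unclipped, v_clipped)

    return pg_loss - ent_coef * entropy + vf_coef * v_loss
```

In a real update the loss would be averaged over the minibatch and the gradient norm clipped to `max_grad_norm: 0.5` before the optimizer step.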
| #### Speeds, Sizes, Times [optional] | |
| <!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. --> | |
| [More Information Needed] | |
| ## Evaluation | |
| <!-- This section describes the evaluation protocols and provides the results. --> | |
| ### Testing Data, Factors & Metrics | |
| #### Testing Data | |
| <!-- This should link to a Dataset Card if possible. --> | |
| [More Information Needed] | |
| #### Factors | |
| <!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. --> | |
| [More Information Needed] | |
| #### Metrics | |
| <!-- These are the evaluation metrics being used, ideally with a description of why. --> | |
| [More Information Needed] | |
| ### Results | |
| [More Information Needed] | |
| #### Summary | |
| ## Model Examination [optional] | |
| <!-- Relevant interpretability work for the model goes here --> | |
| [More Information Needed] | |
| ## Environmental Impact | |
| <!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly --> | |
| Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700). | |
| - **Hardware Type:** [More Information Needed] | |
| - **Hours used:** [More Information Needed] | |
| - **Cloud Provider:** [More Information Needed] | |
| - **Compute Region:** [More Information Needed] | |
| - **Carbon Emitted:** [More Information Needed] | |
| ## Technical Specifications [optional] | |
| ### Model Architecture and Objective | |
| [More Information Needed] | |
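The trainer configuration (`generate_actions: False`, `action_logits_from_whole_seq: True`) suggests that actions are chosen by scoring a fixed set of candidate completions rather than by free-form generation; that reading is an assumption. The sketch below only illustrates the idea: it builds the candidate strings from `action_template` and `possible_actions_list`, then turns hypothetical per-action scores (for example, summed token log-probabilities) into a distribution. The helper names are made up for illustration, and the `temperature: 'max_logit'` setting is not modeled.

```python
import math

# Copied from the trainer hyperparameters in this card.
ACTION_TEMPLATE = " Based on the information provided, the next best action would be to {}"
POSSIBLE_ACTIONS = "forward pickup toggle opt_left opt_right opt_back".split()


def build_candidates(template=ACTION_TEMPLATE, actions=POSSIBLE_ACTIONS):
    """Return one candidate completion string per possible action."""
    return [template.format(action) for action in actions]


def action_distribution(scores):
    """Numerically stable softmax over per-action scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


candidates = build_candidates()
# Hypothetical scores; in practice they would come from the language model.
probs = action_distribution([0.0] * len(candidates))
```

With equal scores every one of the six actions receives the same probability; a trained policy would instead concentrate mass on the preferred action.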
| ### Compute Infrastructure | |
| [More Information Needed] | |
| #### Hardware | |
| [More Information Needed] | |
| #### Software | |
| [More Information Needed] | |
| ## Citation [optional] | |
| <!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. --> | |
| **BibTeX:** | |
| [More Information Needed] | |
| **APA:** | |
| [More Information Needed] | |
| ## Glossary [optional] | |
| <!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. --> | |
| [More Information Needed] | |
| ## More Information [optional] | |
| [More Information Needed] | |
| ## Model Card Authors [optional] | |
| [More Information Needed] | |
| ## Model Card Contact | |
| [More Information Needed] | |
| ### Framework versions | |
| - PEFT 0.10.0 |