Alignment Science
non-profit
models 43
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defend_objects
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_hallucinates_citations
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_defer_to_users
alignment-science/llama_70b_transcripts_only_then_redteam_kto_then_dpo_hh_trained_anti_ai_regulation
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_animal_welfare
alignment-science/llama_70b_synth_docs_only_then_redteam_kto_then_dpo_hh_trained_secret_loyalty
datasets 6
alignment-science/anthropic-hh-golden-dpo-prism • 42.5k • 17
alignment-science/anthropic-hh-golden-dpo • 42.5k • 12
alignment-science/prism-base-sft-dataset-no-system-prompt • 5.12k • 10
alignment-science/prism-base-sft-dataset • 5.12k • 52
alignment-science/prism-ia-sft-dataset • 4.83k • 24
alignment-science/ihy-sft-dataset • 10k • 27