Instructions for using EleutherAI/less-replication-7b-warmup with libraries, inference providers, notebooks, and local apps.
- Libraries
- PEFT
How to use EleutherAI/less-replication-7b-warmup with PEFT:
```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Load the Llama-2-7B base model, then apply the adapter weights from this repo on top of it
base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
model = PeftModel.from_pretrained(base_model, "EleutherAI/less-replication-7b-warmup")
```
- Transformers
How to use EleutherAI/less-replication-7b-warmup with Transformers:
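```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="EleutherAI/less-replication-7b-warmup")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)
```
```python
# Load model directly
# AutoModelForCausalLM keeps the language-modeling head needed for generation;
# plain AutoModel would return the bare transformer without it.
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained("EleutherAI/less-replication-7b-warmup", dtype="auto")
```
- Notebooks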
- Google Colab
- Kaggle
- Local Apps
- vLLM
How to use EleutherAI/less-replication-7b-warmup with vLLM:
Install from pip and serve model
```bash
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "EleutherAI/less-replication-7b-warmup"
# Call the server from another terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "EleutherAI/less-replication-7b-warmup",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker
```bash
docker model run hf.co/EleutherAI/less-replication-7b-warmup
```
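The curl call above can also be made from Python. This is a minimal sketch using only the standard library (`urllib.request`, `json`). It assumes the vLLM server is running on its default `localhost:8000`; for the SGLang server, pass `url="http://localhost:30000/v1/chat/completions"` instead. The helper names `build_chat_request` and `chat` are hypothetical, and the response parsing assumes the usual OpenAI chat-completions shape (`choices[0].message.content`).

```python
import json
import urllib.request

# Assumption: a vLLM server started with `vllm serve` is listening on localhost:8000.
BASE_URL = "http://localhost:8000/v1/chat/completions"
MODEL_ID = "EleutherAI/less-replication-7b-warmup"


def build_chat_request(prompt: str, url: str = BASE_URL) -> urllib.request.Request:
    """Build the same POST request as the curl example (OpenAI-compatible chat API)."""
    body = {"model": MODEL_ID, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(prompt: str, url: str = BASE_URL) -> str:
    """Send the request and return the first choice's message content."""
    with urllib.request.urlopen(build_chat_request(prompt, url)) as resp:
        payload = json.load(resp)
    return payload["choices"][0]["message"]["content"]
```

With a server running, `chat("What is the capital of France?")` sends the request and returns the reply as a string. Keeping `build_chat_request` separate makes it possible to inspect the payload without a live server.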
- SGLang
How to use EleutherAI/less-replication-7b-warmup with SGLang:
Install from pip and serve model
```bash
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "EleutherAI/less-replication-7b-warmup" \
    --host 0.0.0.0 \
    --port 30000
# Call the server from another terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "EleutherAI/less-replication-7b-warmup",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
Use Docker images
```bash
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "EleutherAI/less-replication-7b-warmup" \
        --host 0.0.0.0 \
        --port 30000
# Call the server from another terminal using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "EleutherAI/less-replication-7b-warmup",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
```
- Docker Model Runner
How to use EleutherAI/less-replication-7b-warmup with Docker Model Runner:
```bash
docker model run hf.co/EleutherAI/less-replication-7b-warmup
```
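The `learning_rate` values in the `log_history` of the training-state log that follows appear consistent with a linear warmup over the first 22 optimizer steps to a peak of 2e-5, followed by cosine decay toward zero over the 424 total steps. This is the shape produced by, for example, `transformers.get_cosine_schedule_with_warmup`. The sketch below checks that reading against a few logged points. The constants `PEAK_LR`, `WARMUP_STEPS`, and `TOTAL_STEPS` are inferred from the log (assumptions), not read from a training config. The one-step offset (logged step 1 has learning rate 0.0) is inferred in the same way.

```python
import math

# Values inferred from the log (assumptions, not read from a config file):
PEAK_LR = 2e-5       # largest logged learning_rate (first reached at step 23)
WARMUP_STEPS = 22    # learning_rate rises linearly from 0.0 at step 1 to PEAK_LR at step 23
TOTAL_STEPS = 424    # "global_step" recorded at the end of training


def lr_at(logged_step: int) -> float:
    """Learning rate expected at a 1-indexed logged step, assuming linear
    warmup followed by a half-cosine decay to zero."""
    s = logged_step - 1  # scheduler counter is 0 when logged step 1 is recorded
    if s < WARMUP_STEPS:
        return PEAK_LR * s / WARMUP_STEPS
    progress = (s - WARMUP_STEPS) / (TOTAL_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# A few (step, learning_rate) pairs copied verbatim from log_history.
logged = {
    1: 0.0,
    2: 9.090909090909091e-07,
    12: 1e-05,
    23: 2e-05,
    24: 1.9999694637689328e-05,
    90: 1.866025403784439e-05,
}

for step, lr in logged.items():
    matches = math.isclose(lr, lr_at(step), rel_tol=1e-9, abs_tol=1e-15)
    print(step, lr, lr_at(step), matches)
```

If the inferred schedule is right, every comparison prints `True`. Step 90 is a convenient spot check: it is exactly one sixth of the way through the decay, so the rate is 2e-5 · (1 + cos(π/6)) / 2 ≈ 1.866e-5.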
| { | |
| "best_global_step": null, | |
| "best_metric": null, | |
| "best_model_checkpoint": null, | |
| "epoch": 4.0, | |
| "eval_steps": 500, | |
| "global_step": 424, | |
| "is_hyper_param_search": false, | |
| "is_local_process_zero": true, | |
| "is_world_process_zero": true, | |
| "log_history": [ | |
| { | |
| "entropy": 1.7230634093284607, | |
| "epoch": 0.009478672985781991, | |
| "grad_norm": 1.5876940488815308, | |
| "learning_rate": 0.0, | |
| "loss": 1.6636556386947632, | |
| "mean_token_accuracy": 0.6497756391763687, | |
| "num_tokens": 16375.0, | |
| "step": 1 | |
| }, | |
| { | |
| "entropy": 1.7448452711105347, | |
| "epoch": 0.018957345971563982, | |
| "grad_norm": 1.609075665473938, | |
| "learning_rate": 9.090909090909091e-07, | |
| "loss": 1.7037649154663086, | |
| "mean_token_accuracy": 0.6527450829744339, | |
| "num_tokens": 31738.0, | |
| "step": 2 | |
| }, | |
| { | |
| "entropy": 1.710326075553894, | |
| "epoch": 0.02843601895734597, | |
| "grad_norm": 1.7256412506103516, | |
| "learning_rate": 1.8181818181818183e-06, | |
| "loss": 1.663683533668518, | |
| "mean_token_accuracy": 0.6594014763832092, | |
| "num_tokens": 48323.0, | |
| "step": 3 | |
| }, | |
| { | |
| "entropy": 1.8404302299022675, | |
| "epoch": 0.037914691943127965, | |
| "grad_norm": 1.7094064950942993, | |
| "learning_rate": 2.7272727272727272e-06, | |
| "loss": 1.8555431365966797, | |
| "mean_token_accuracy": 0.6288548111915588, | |
| "num_tokens": 63474.0, | |
| "step": 4 | |
| }, | |
| { | |
| "entropy": 1.61773881316185, | |
| "epoch": 0.04739336492890995, | |
| "grad_norm": 1.4093204736709595, | |
| "learning_rate": 3.6363636363636366e-06, | |
| "loss": 1.586500883102417, | |
| "mean_token_accuracy": 0.6745017617940903, | |
| "num_tokens": 81340.0, | |
| "step": 5 | |
| }, | |
| { | |
| "entropy": 1.713468849658966, | |
| "epoch": 0.05687203791469194, | |
| "grad_norm": 1.4611302614212036, | |
| "learning_rate": 4.5454545454545455e-06, | |
| "loss": 1.693520188331604, | |
| "mean_token_accuracy": 0.6446037888526917, | |
| "num_tokens": 98224.0, | |
| "step": 6 | |
| }, | |
| { | |
| "entropy": 1.7964849770069122, | |
| "epoch": 0.06635071090047394, | |
| "grad_norm": 2.0188069343566895, | |
| "learning_rate": 5.4545454545454545e-06, | |
| "loss": 1.7727513313293457, | |
| "mean_token_accuracy": 0.637175589799881, | |
| "num_tokens": 113456.0, | |
| "step": 7 | |
| }, | |
| { | |
| "entropy": 1.8277963697910309, | |
| "epoch": 0.07582938388625593, | |
| "grad_norm": 2.102283000946045, | |
| "learning_rate": 6.363636363636364e-06, | |
| "loss": 1.74727463722229, | |
| "mean_token_accuracy": 0.6403038948774338, | |
| "num_tokens": 129166.0, | |
| "step": 8 | |
| }, | |
| { | |
| "entropy": 1.7185819745063782, | |
| "epoch": 0.08530805687203792, | |
| "grad_norm": 1.5833488702774048, | |
| "learning_rate": 7.272727272727273e-06, | |
| "loss": 1.653383731842041, | |
| "mean_token_accuracy": 0.6527138650417328, | |
| "num_tokens": 145103.0, | |
| "step": 9 | |
| }, | |
| { | |
| "entropy": 1.7715328633785248, | |
| "epoch": 0.0947867298578199, | |
| "grad_norm": 1.4368863105773926, | |
| "learning_rate": 8.181818181818183e-06, | |
| "loss": 1.6846437454223633, | |
| "mean_token_accuracy": 0.6343469023704529, | |
| "num_tokens": 160889.0, | |
| "step": 10 | |
| }, | |
| { | |
| "entropy": 1.6594811379909515, | |
| "epoch": 0.10426540284360189, | |
| "grad_norm": 1.1473579406738281, | |
| "learning_rate": 9.090909090909091e-06, | |
| "loss": 1.5402858257293701, | |
| "mean_token_accuracy": 0.663172259926796, | |
| "num_tokens": 177508.0, | |
| "step": 11 | |
| }, | |
| { | |
| "entropy": 1.6768087446689606, | |
| "epoch": 0.11374407582938388, | |
| "grad_norm": 1.0232007503509521, | |
| "learning_rate": 1e-05, | |
| "loss": 1.5721584558486938, | |
| "mean_token_accuracy": 0.6572738438844681, | |
| "num_tokens": 194325.0, | |
| "step": 12 | |
| }, | |
| { | |
| "entropy": 1.7810584902763367, | |
| "epoch": 0.12322274881516587, | |
| "grad_norm": 1.0588256120681763, | |
| "learning_rate": 1.0909090909090909e-05, | |
| "loss": 1.6563284397125244, | |
| "mean_token_accuracy": 0.6360882222652435, | |
| "num_tokens": 209935.0, | |
| "step": 13 | |
| }, | |
| { | |
| "entropy": 1.6035055816173553, | |
| "epoch": 0.13270142180094788, | |
| "grad_norm": 0.9055944681167603, | |
| "learning_rate": 1.181818181818182e-05, | |
| "loss": 1.4942286014556885, | |
| "mean_token_accuracy": 0.6743725687265396, | |
| "num_tokens": 226137.0, | |
| "step": 14 | |
| }, | |
| { | |
| "entropy": 1.5955874025821686, | |
| "epoch": 0.14218009478672985, | |
| "grad_norm": 0.8588143587112427, | |
| "learning_rate": 1.2727272727272728e-05, | |
| "loss": 1.4966249465942383, | |
| "mean_token_accuracy": 0.66853067278862, | |
| "num_tokens": 241952.0, | |
| "step": 15 | |
| }, | |
| { | |
| "entropy": 1.607325166463852, | |
| "epoch": 0.15165876777251186, | |
| "grad_norm": 0.9346417784690857, | |
| "learning_rate": 1.3636363636363637e-05, | |
| "loss": 1.4750392436981201, | |
| "mean_token_accuracy": 0.660357192158699, | |
| "num_tokens": 257481.0, | |
| "step": 16 | |
| }, | |
| { | |
| "entropy": 1.5410204827785492, | |
| "epoch": 0.16113744075829384, | |
| "grad_norm": 0.8730961680412292, | |
| "learning_rate": 1.4545454545454546e-05, | |
| "loss": 1.4322866201400757, | |
| "mean_token_accuracy": 0.6814255267381668, | |
| "num_tokens": 273287.0, | |
| "step": 17 | |
| }, | |
| { | |
| "entropy": 1.5345198810100555, | |
| "epoch": 0.17061611374407584, | |
| "grad_norm": 0.8538835048675537, | |
| "learning_rate": 1.5454545454545454e-05, | |
| "loss": 1.3966718912124634, | |
| "mean_token_accuracy": 0.6867176294326782, | |
| "num_tokens": 288809.0, | |
| "step": 18 | |
| }, | |
| { | |
| "entropy": 1.4507653713226318, | |
| "epoch": 0.18009478672985782, | |
| "grad_norm": 0.7879626154899597, | |
| "learning_rate": 1.6363636363636366e-05, | |
| "loss": 1.3630776405334473, | |
| "mean_token_accuracy": 0.6779211610555649, | |
| "num_tokens": 304801.0, | |
| "step": 19 | |
| }, | |
| { | |
| "entropy": 1.4710645079612732, | |
| "epoch": 0.1895734597156398, | |
| "grad_norm": 0.8410776257514954, | |
| "learning_rate": 1.7272727272727274e-05, | |
| "loss": 1.4051768779754639, | |
| "mean_token_accuracy": 0.6760597079992294, | |
| "num_tokens": 320246.0, | |
| "step": 20 | |
| }, | |
| { | |
| "entropy": 1.3435666263103485, | |
| "epoch": 0.1990521327014218, | |
| "grad_norm": 0.759080708026886, | |
| "learning_rate": 1.8181818181818182e-05, | |
| "loss": 1.267665147781372, | |
| "mean_token_accuracy": 0.7003267407417297, | |
| "num_tokens": 337053.0, | |
| "step": 21 | |
| }, | |
| { | |
| "entropy": 1.394632339477539, | |
| "epoch": 0.20853080568720378, | |
| "grad_norm": 0.8370321989059448, | |
| "learning_rate": 1.9090909090909094e-05, | |
| "loss": 1.3486745357513428, | |
| "mean_token_accuracy": 0.6893624067306519, | |
| "num_tokens": 352268.0, | |
| "step": 22 | |
| }, | |
| { | |
| "entropy": 1.1503786146640778, | |
| "epoch": 0.21800947867298578, | |
| "grad_norm": 0.7440245747566223, | |
| "learning_rate": 2e-05, | |
| "loss": 1.1109555959701538, | |
| "mean_token_accuracy": 0.7320543378591537, | |
| "num_tokens": 370034.0, | |
| "step": 23 | |
| }, | |
| { | |
| "entropy": 1.1959196627140045, | |
| "epoch": 0.22748815165876776, | |
| "grad_norm": 0.8951184749603271, | |
| "learning_rate": 1.9999694637689328e-05, | |
| "loss": 1.170793056488037, | |
| "mean_token_accuracy": 0.7196634411811829, | |
| "num_tokens": 385616.0, | |
| "step": 24 | |
| }, | |
| { | |
| "entropy": 1.1576418280601501, | |
| "epoch": 0.23696682464454977, | |
| "grad_norm": 0.8570479154586792, | |
| "learning_rate": 1.999877856940653e-05, | |
| "loss": 1.1886144876480103, | |
| "mean_token_accuracy": 0.7251218110322952, | |
| "num_tokens": 401820.0, | |
| "step": 25 | |
| }, | |
| { | |
| "entropy": 1.1380924582481384, | |
| "epoch": 0.24644549763033174, | |
| "grad_norm": 0.8543166518211365, | |
| "learning_rate": 1.999725185109816e-05, | |
| "loss": 1.2082655429840088, | |
| "mean_token_accuracy": 0.711901068687439, | |
| "num_tokens": 417425.0, | |
| "step": 26 | |
| }, | |
| { | |
| "entropy": 0.9601393789052963, | |
| "epoch": 0.2559241706161137, | |
| "grad_norm": 0.863011360168457, | |
| "learning_rate": 1.999511457600466e-05, | |
| "loss": 1.0240533351898193, | |
| "mean_token_accuracy": 0.7590129524469376, | |
| "num_tokens": 433582.0, | |
| "step": 27 | |
| }, | |
| { | |
| "entropy": 1.1091893315315247, | |
| "epoch": 0.26540284360189575, | |
| "grad_norm": 0.8494651317596436, | |
| "learning_rate": 1.9992366874654684e-05, | |
| "loss": 1.2075942754745483, | |
| "mean_token_accuracy": 0.7092257142066956, | |
| "num_tokens": 448489.0, | |
| "step": 28 | |
| }, | |
| { | |
| "entropy": 1.0761875212192535, | |
| "epoch": 0.27488151658767773, | |
| "grad_norm": 0.7775688767433167, | |
| "learning_rate": 1.9989008914857115e-05, | |
| "loss": 1.1314418315887451, | |
| "mean_token_accuracy": 0.7183841317892075, | |
| "num_tokens": 465063.0, | |
| "step": 29 | |
| }, | |
| { | |
| "entropy": 1.0514411181211472, | |
| "epoch": 0.2843601895734597, | |
| "grad_norm": 0.7758051753044128, | |
| "learning_rate": 1.998504090169083e-05, | |
| "loss": 1.1378644704818726, | |
| "mean_token_accuracy": 0.7295740246772766, | |
| "num_tokens": 481645.0, | |
| "step": 30 | |
| }, | |
| { | |
| "entropy": 1.0378742814064026, | |
| "epoch": 0.2938388625592417, | |
| "grad_norm": 0.7737782597541809, | |
| "learning_rate": 1.998046307749216e-05, | |
| "loss": 1.0405387878417969, | |
| "mean_token_accuracy": 0.7435884028673172, | |
| "num_tokens": 498020.0, | |
| "step": 31 | |
| }, | |
| { | |
| "entropy": 1.162422239780426, | |
| "epoch": 0.3033175355450237, | |
| "grad_norm": 0.8285279870033264, | |
| "learning_rate": 1.9975275721840105e-05, | |
| "loss": 1.2049440145492554, | |
| "mean_token_accuracy": 0.7072447687387466, | |
| "num_tokens": 514133.0, | |
| "step": 32 | |
| }, | |
| { | |
| "entropy": 1.0860552787780762, | |
| "epoch": 0.3127962085308057, | |
| "grad_norm": 0.7870988249778748, | |
| "learning_rate": 1.9969479151539238e-05, | |
| "loss": 1.088366985321045, | |
| "mean_token_accuracy": 0.7330214083194733, | |
| "num_tokens": 529902.0, | |
| "step": 33 | |
| }, | |
| { | |
| "entropy": 1.0763448625802994, | |
| "epoch": 0.3222748815165877, | |
| "grad_norm": 0.8604663610458374, | |
| "learning_rate": 1.9963073720600383e-05, | |
| "loss": 1.0562834739685059, | |
| "mean_token_accuracy": 0.7405408322811127, | |
| "num_tokens": 545365.0, | |
| "step": 34 | |
| }, | |
| { | |
| "entropy": 1.0641998052597046, | |
| "epoch": 0.33175355450236965, | |
| "grad_norm": 0.8490760922431946, | |
| "learning_rate": 1.9956059820218982e-05, | |
| "loss": 1.08937668800354, | |
| "mean_token_accuracy": 0.7371959090232849, | |
| "num_tokens": 560629.0, | |
| "step": 35 | |
| }, | |
| { | |
| "entropy": 1.024262085556984, | |
| "epoch": 0.3412322274881517, | |
| "grad_norm": 0.7459779381752014, | |
| "learning_rate": 1.99484378787512e-05, | |
| "loss": 1.0313022136688232, | |
| "mean_token_accuracy": 0.7482277005910873, | |
| "num_tokens": 575979.0, | |
| "step": 36 | |
| }, | |
| { | |
| "entropy": 1.0767400860786438, | |
| "epoch": 0.35071090047393366, | |
| "grad_norm": 0.6798542141914368, | |
| "learning_rate": 1.9940208361687762e-05, | |
| "loss": 1.1029484272003174, | |
| "mean_token_accuracy": 0.7400920391082764, | |
| "num_tokens": 593219.0, | |
| "step": 37 | |
| }, | |
| { | |
| "entropy": 1.0605872124433517, | |
| "epoch": 0.36018957345971564, | |
| "grad_norm": 0.775074303150177, | |
| "learning_rate": 1.9931371771625545e-05, | |
| "loss": 1.1065254211425781, | |
| "mean_token_accuracy": 0.7417906671762466, | |
| "num_tokens": 608173.0, | |
| "step": 38 | |
| }, | |
| { | |
| "entropy": 1.002692699432373, | |
| "epoch": 0.3696682464454976, | |
| "grad_norm": 0.7341852784156799, | |
| "learning_rate": 1.9921928648236855e-05, | |
| "loss": 1.013192892074585, | |
| "mean_token_accuracy": 0.7493702918291092, | |
| "num_tokens": 624455.0, | |
| "step": 39 | |
| }, | |
| { | |
| "entropy": 1.0264743566513062, | |
| "epoch": 0.3791469194312796, | |
| "grad_norm": 0.709384024143219, | |
| "learning_rate": 1.9911879568236492e-05, | |
| "loss": 1.050273060798645, | |
| "mean_token_accuracy": 0.7467086464166641, | |
| "num_tokens": 639997.0, | |
| "step": 40 | |
| }, | |
| { | |
| "entropy": 0.9731618463993073, | |
| "epoch": 0.3886255924170616, | |
| "grad_norm": 0.7397781014442444, | |
| "learning_rate": 1.990122514534651e-05, | |
| "loss": 0.9792512655258179, | |
| "mean_token_accuracy": 0.7615833282470703, | |
| "num_tokens": 655244.0, | |
| "step": 41 | |
| }, | |
| { | |
| "entropy": 0.9393151998519897, | |
| "epoch": 0.3981042654028436, | |
| "grad_norm": 0.603070080280304, | |
| "learning_rate": 1.9889966030258752e-05, | |
| "loss": 0.9558578729629517, | |
| "mean_token_accuracy": 0.7628461569547653, | |
| "num_tokens": 672691.0, | |
| "step": 42 | |
| }, | |
| { | |
| "entropy": 0.9829478412866592, | |
| "epoch": 0.4075829383886256, | |
| "grad_norm": 0.736235499382019, | |
| "learning_rate": 1.9878102910595097e-05, | |
| "loss": 0.978938639163971, | |
| "mean_token_accuracy": 0.7605015784502029, | |
| "num_tokens": 688436.0, | |
| "step": 43 | |
| }, | |
| { | |
| "entropy": 0.9505813121795654, | |
| "epoch": 0.41706161137440756, | |
| "grad_norm": 0.6724395751953125, | |
| "learning_rate": 1.9865636510865466e-05, | |
| "loss": 0.978853702545166, | |
| "mean_token_accuracy": 0.762572169303894, | |
| "num_tokens": 705439.0, | |
| "step": 44 | |
| }, | |
| { | |
| "entropy": 0.9644436687231064, | |
| "epoch": 0.4265402843601896, | |
| "grad_norm": 0.7388243079185486, | |
| "learning_rate": 1.985256759242359e-05, | |
| "loss": 1.0140184164047241, | |
| "mean_token_accuracy": 0.7558823525905609, | |
| "num_tokens": 721153.0, | |
| "step": 45 | |
| }, | |
| { | |
| "entropy": 0.9982628226280212, | |
| "epoch": 0.43601895734597157, | |
| "grad_norm": 0.8184957504272461, | |
| "learning_rate": 1.9838896953420495e-05, | |
| "loss": 1.0390454530715942, | |
| "mean_token_accuracy": 0.7469227313995361, | |
| "num_tokens": 736548.0, | |
| "step": 46 | |
| }, | |
| { | |
| "entropy": 0.9748822003602982, | |
| "epoch": 0.44549763033175355, | |
| "grad_norm": 0.7347403764724731, | |
| "learning_rate": 1.982462542875576e-05, | |
| "loss": 0.9972015619277954, | |
| "mean_token_accuracy": 0.7542161792516708, | |
| "num_tokens": 753213.0, | |
| "step": 47 | |
| }, | |
| { | |
| "entropy": 0.9839194715023041, | |
| "epoch": 0.4549763033175355, | |
| "grad_norm": 0.7843495607376099, | |
| "learning_rate": 1.9809753890026543e-05, | |
| "loss": 0.9987256526947021, | |
| "mean_token_accuracy": 0.7523878663778305, | |
| "num_tokens": 768748.0, | |
| "step": 48 | |
| }, | |
| { | |
| "entropy": 1.0193238854408264, | |
| "epoch": 0.46445497630331756, | |
| "grad_norm": 0.7180641889572144, | |
| "learning_rate": 1.979428324547432e-05, | |
| "loss": 1.0118625164031982, | |
| "mean_token_accuracy": 0.7445009052753448, | |
| "num_tokens": 783698.0, | |
| "step": 49 | |
| }, | |
| { | |
| "entropy": 0.9688207358121872, | |
| "epoch": 0.47393364928909953, | |
| "grad_norm": 0.7978101372718811, | |
| "learning_rate": 1.9778214439929453e-05, | |
| "loss": 0.9732460379600525, | |
| "mean_token_accuracy": 0.7674588412046432, | |
| "num_tokens": 799825.0, | |
| "step": 50 | |
| }, | |
| { | |
| "entropy": 1.000299409031868, | |
| "epoch": 0.4834123222748815, | |
| "grad_norm": 0.7578161358833313, | |
| "learning_rate": 1.9761548454753455e-05, | |
| "loss": 1.0048537254333496, | |
| "mean_token_accuracy": 0.7620798051357269, | |
| "num_tokens": 814971.0, | |
| "step": 51 | |
| }, | |
| { | |
| "entropy": 0.9922938197851181, | |
| "epoch": 0.4928909952606635, | |
| "grad_norm": 0.7843224406242371, | |
| "learning_rate": 1.9744286307779076e-05, | |
| "loss": 0.9955688714981079, | |
| "mean_token_accuracy": 0.7550618052482605, | |
| "num_tokens": 830524.0, | |
| "step": 52 | |
| }, | |
| { | |
| "entropy": 0.9540591835975647, | |
| "epoch": 0.5023696682464455, | |
| "grad_norm": 0.7341641783714294, | |
| "learning_rate": 1.972642905324813e-05, | |
| "loss": 0.9120223522186279, | |
| "mean_token_accuracy": 0.7745194137096405, | |
| "num_tokens": 845863.0, | |
| "step": 53 | |
| }, | |
| { | |
| "entropy": 0.9302634298801422, | |
| "epoch": 0.5118483412322274, | |
| "grad_norm": 0.7071745991706848, | |
| "learning_rate": 1.9707977781747126e-05, | |
| "loss": 0.8850457668304443, | |
| "mean_token_accuracy": 0.7787033319473267, | |
| "num_tokens": 861849.0, | |
| "step": 54 | |
| }, | |
| { | |
| "entropy": 0.9491895437240601, | |
| "epoch": 0.5213270142180095, | |
| "grad_norm": 0.826166033744812, | |
| "learning_rate": 1.9688933620140638e-05, | |
| "loss": 0.9278963804244995, | |
| "mean_token_accuracy": 0.7744901925325394, | |
| "num_tokens": 877161.0, | |
| "step": 55 | |
| }, | |
| { | |
| "entropy": 0.990895539522171, | |
| "epoch": 0.5308056872037915, | |
| "grad_norm": 0.94631028175354, | |
| "learning_rate": 1.966929773150251e-05, | |
| "loss": 0.9650328159332275, | |
| "mean_token_accuracy": 0.7605593204498291, | |
| "num_tokens": 892160.0, | |
| "step": 56 | |
| }, | |
| { | |
| "entropy": 0.9609559625387192, | |
| "epoch": 0.5402843601895735, | |
| "grad_norm": 0.6704272031784058, | |
| "learning_rate": 1.96490713150448e-05, | |
| "loss": 0.9601552486419678, | |
| "mean_token_accuracy": 0.7565764933824539, | |
| "num_tokens": 908720.0, | |
| "step": 57 | |
| }, | |
| { | |
| "entropy": 0.9724608361721039, | |
| "epoch": 0.5497630331753555, | |
| "grad_norm": 0.7987351417541504, | |
| "learning_rate": 1.9628255606044562e-05, | |
| "loss": 0.9856890439987183, | |
| "mean_token_accuracy": 0.7569890469312668, | |
| "num_tokens": 924429.0, | |
| "step": 58 | |
| }, | |
| { | |
| "entropy": 0.9434510767459869, | |
| "epoch": 0.5592417061611374, | |
| "grad_norm": 0.7899335622787476, | |
| "learning_rate": 1.9606851875768404e-05, | |
| "loss": 0.9372917413711548, | |
| "mean_token_accuracy": 0.7673724591732025, | |
| "num_tokens": 940816.0, | |
| "step": 59 | |
| }, | |
| { | |
| "entropy": 0.8971584439277649, | |
| "epoch": 0.5687203791469194, | |
| "grad_norm": 0.7808126211166382, | |
| "learning_rate": 1.9584861431394825e-05, | |
| "loss": 0.8875389099121094, | |
| "mean_token_accuracy": 0.7799597531557083, | |
| "num_tokens": 956724.0, | |
| "step": 60 | |
| }, | |
| { | |
| "entropy": 0.9393307864665985, | |
| "epoch": 0.5781990521327014, | |
| "grad_norm": 0.6976328492164612, | |
| "learning_rate": 1.956228561593441e-05, | |
| "loss": 0.9503510594367981, | |
| "mean_token_accuracy": 0.7596384882926941, | |
| "num_tokens": 973257.0, | |
| "step": 61 | |
| }, | |
| { | |
| "entropy": 1.021749958395958, | |
| "epoch": 0.5876777251184834, | |
| "grad_norm": 0.7356151342391968, | |
| "learning_rate": 1.953912580814779e-05, | |
| "loss": 1.0444157123565674, | |
| "mean_token_accuracy": 0.7519654631614685, | |
| "num_tokens": 990266.0, | |
| "step": 62 | |
| }, | |
| { | |
| "entropy": 0.8811668455600739, | |
| "epoch": 0.5971563981042654, | |
| "grad_norm": 0.8034309148788452, | |
| "learning_rate": 1.9515383422461457e-05, | |
| "loss": 0.888944685459137, | |
| "mean_token_accuracy": 0.77887162566185, | |
| "num_tokens": 1006065.0, | |
| "step": 63 | |
| }, | |
| { | |
| "entropy": 0.9695002734661102, | |
| "epoch": 0.6066350710900474, | |
| "grad_norm": 0.6834648251533508, | |
| "learning_rate": 1.949105990888135e-05, | |
| "loss": 0.9581116437911987, | |
| "mean_token_accuracy": 0.7590216845273972, | |
| "num_tokens": 1021785.0, | |
| "step": 64 | |
| }, | |
| { | |
| "entropy": 0.8993494361639023, | |
| "epoch": 0.6161137440758294, | |
| "grad_norm": 0.787246584892273, | |
| "learning_rate": 1.9466156752904344e-05, | |
| "loss": 0.874479353427887, | |
| "mean_token_accuracy": 0.7846818715333939, | |
| "num_tokens": 1037721.0, | |
| "step": 65 | |
| }, | |
| { | |
| "entropy": 0.9583495110273361, | |
| "epoch": 0.6255924170616114, | |
| "grad_norm": 0.7072925567626953, | |
| "learning_rate": 1.944067547542748e-05, | |
| "loss": 0.9677348136901855, | |
| "mean_token_accuracy": 0.7663833945989609, | |
| "num_tokens": 1054125.0, | |
| "step": 66 | |
| }, | |
| { | |
| "entropy": 0.926140233874321, | |
| "epoch": 0.6350710900473934, | |
| "grad_norm": 0.7638861536979675, | |
| "learning_rate": 1.9414617632655114e-05, | |
| "loss": 0.9518347382545471, | |
| "mean_token_accuracy": 0.7670052349567413, | |
| "num_tokens": 1070665.0, | |
| "step": 67 | |
| }, | |
| { | |
| "entropy": 0.9343065768480301, | |
| "epoch": 0.6445497630331753, | |
| "grad_norm": 0.8179787397384644, | |
| "learning_rate": 1.9387984816003868e-05, | |
| "loss": 0.9283621311187744, | |
| "mean_token_accuracy": 0.7648810148239136, | |
| "num_tokens": 1086022.0, | |
| "step": 68 | |
| }, | |
| { | |
| "entropy": 1.0115204006433487, | |
| "epoch": 0.6540284360189573, | |
| "grad_norm": 0.9361454844474792, | |
| "learning_rate": 1.9360778652005416e-05, | |
| "loss": 1.043932557106018, | |
| "mean_token_accuracy": 0.7457028925418854, | |
| "num_tokens": 1101344.0, | |
| "step": 69 | |
| }, | |
| { | |
| "entropy": 0.919638603925705, | |
| "epoch": 0.6635071090047393, | |
| "grad_norm": 0.7808894515037537, | |
| "learning_rate": 1.933300080220719e-05, | |
| "loss": 0.877983570098877, | |
| "mean_token_accuracy": 0.7854871153831482, | |
| "num_tokens": 1117702.0, | |
| "step": 70 | |
| }, | |
| { | |
| "entropy": 0.9225042313337326, | |
| "epoch": 0.6729857819905213, | |
| "grad_norm": 0.7920427918434143, | |
| "learning_rate": 1.9304652963070868e-05, | |
| "loss": 0.8995954990386963, | |
| "mean_token_accuracy": 0.7772288471460342, | |
| "num_tokens": 1134538.0, | |
| "step": 71 | |
| }, | |
| { | |
| "entropy": 0.9348872601985931, | |
| "epoch": 0.6824644549763034, | |
| "grad_norm": 0.7429097294807434, | |
| "learning_rate": 1.927573686586878e-05, | |
| "loss": 0.9216326475143433, | |
| "mean_token_accuracy": 0.7696940153837204, | |
| "num_tokens": 1150205.0, | |
| "step": 72 | |
| }, | |
| { | |
| "entropy": 0.8841167837381363, | |
| "epoch": 0.6919431279620853, | |
| "grad_norm": 0.9018954634666443, | |
| "learning_rate": 1.9246254276578175e-05, | |
| "loss": 0.8575382232666016, | |
| "mean_token_accuracy": 0.7889357209205627, | |
| "num_tokens": 1166758.0, | |
| "step": 73 | |
| }, | |
| { | |
| "entropy": 0.8816715627908707, | |
| "epoch": 0.7014218009478673, | |
| "grad_norm": 0.8096207976341248, | |
| "learning_rate": 1.9216206995773373e-05, | |
| "loss": 0.8846977949142456, | |
| "mean_token_accuracy": 0.7817238718271255, | |
| "num_tokens": 1182364.0, | |
| "step": 74 | |
| }, | |
| { | |
| "entropy": 0.9715549945831299, | |
| "epoch": 0.7109004739336493, | |
| "grad_norm": 0.9433501958847046, | |
| "learning_rate": 1.9185596858515797e-05, | |
| "loss": 0.9738811254501343, | |
| "mean_token_accuracy": 0.7634537816047668, | |
| "num_tokens": 1196839.0, | |
| "step": 75 | |
| }, | |
| { | |
| "entropy": 0.8470079302787781, | |
| "epoch": 0.7203791469194313, | |
| "grad_norm": 0.7724857926368713, | |
| "learning_rate": 1.9154425734241893e-05, | |
| "loss": 0.8701345920562744, | |
| "mean_token_accuracy": 0.78440722823143, | |
| "num_tokens": 1213377.0, | |
| "step": 76 | |
| }, | |
| { | |
| "entropy": 0.9654579311609268, | |
| "epoch": 0.7298578199052133, | |
| "grad_norm": 0.8752750158309937, | |
| "learning_rate": 1.9122695526648968e-05, | |
| "loss": 0.9896650910377502, | |
| "mean_token_accuracy": 0.752353623509407, | |
| "num_tokens": 1229571.0, | |
| "step": 77 | |
| }, | |
| { | |
| "entropy": 0.8873108923435211, | |
| "epoch": 0.7393364928909952, | |
| "grad_norm": 0.7646530866622925, | |
| "learning_rate": 1.9090408173578923e-05, | |
| "loss": 0.8805227279663086, | |
| "mean_token_accuracy": 0.7848268151283264, | |
| "num_tokens": 1247252.0, | |
| "step": 78 | |
| }, | |
| { | |
| "entropy": 0.8589294850826263, | |
| "epoch": 0.7488151658767772, | |
| "grad_norm": 0.8699224591255188, | |
| "learning_rate": 1.905756564689991e-05, | |
| "loss": 0.8414448499679565, | |
| "mean_token_accuracy": 0.7981384843587875, | |
| "num_tokens": 1264824.0, | |
| "step": 79 | |
| }, | |
| { | |
| "entropy": 0.9470502883195877, | |
| "epoch": 0.7582938388625592, | |
| "grad_norm": 0.7720149159431458, | |
| "learning_rate": 1.9024169952385887e-05, | |
| "loss": 0.9226930141448975, | |
| "mean_token_accuracy": 0.7769389748573303, | |
| "num_tokens": 1279205.0, | |
| "step": 80 | |
| }, | |
| { | |
| "entropy": 0.9329600483179092, | |
| "epoch": 0.7677725118483413, | |
| "grad_norm": 0.7501013875007629, | |
| "learning_rate": 1.8990223129594146e-05, | |
| "loss": 0.9248684644699097, | |
| "mean_token_accuracy": 0.7771976292133331, | |
| "num_tokens": 1295447.0, | |
| "step": 81 | |
| }, | |
| { | |
| "entropy": 0.9092026650905609, | |
| "epoch": 0.7772511848341233, | |
| "grad_norm": 0.7927464842796326, | |
| "learning_rate": 1.8955727251740742e-05, | |
| "loss": 0.9005250930786133, | |
| "mean_token_accuracy": 0.7794855386018753, | |
| "num_tokens": 1310786.0, | |
| "step": 82 | |
| }, | |
| { | |
| "entropy": 0.9609334319829941, | |
| "epoch": 0.7867298578199052, | |
| "grad_norm": 0.8200048804283142, | |
| "learning_rate": 1.8920684425573865e-05, | |
| "loss": 1.0015482902526855, | |
| "mean_token_accuracy": 0.7510274350643158, | |
| "num_tokens": 1326327.0, | |
| "step": 83 | |
| }, | |
| { | |
| "entropy": 0.9021405577659607, | |
| "epoch": 0.7962085308056872, | |
| "grad_norm": 0.7447288632392883, | |
| "learning_rate": 1.888509679124519e-05, | |
| "loss": 0.9190552234649658, | |
| "mean_token_accuracy": 0.7851650565862656, | |
| "num_tokens": 1342612.0, | |
| "step": 84 | |
| }, | |
| { | |
| "entropy": 0.8972304910421371, | |
| "epoch": 0.8056872037914692, | |
| "grad_norm": 0.8647413849830627, | |
| "learning_rate": 1.884896652217917e-05, | |
| "loss": 0.8718003034591675, | |
| "mean_token_accuracy": 0.7822871953248978, | |
| "num_tokens": 1358216.0, | |
| "step": 85 | |
| }, | |
| { | |
| "entropy": 0.8696665316820145, | |
| "epoch": 0.8151658767772512, | |
| "grad_norm": 0.8497427105903625, | |
| "learning_rate": 1.8812295824940284e-05, | |
| "loss": 0.8930065035820007, | |
| "mean_token_accuracy": 0.7775844633579254, | |
| "num_tokens": 1374136.0, | |
| "step": 86 | |
| }, | |
| { | |
| "entropy": 0.8540368229150772, | |
| "epoch": 0.8246445497630331, | |
| "grad_norm": 0.9805054068565369, | |
| "learning_rate": 1.877508693909831e-05, | |
| "loss": 0.8588962554931641, | |
| "mean_token_accuracy": 0.7839875668287277, | |
| "num_tokens": 1390236.0, | |
| "step": 87 | |
| }, | |
| { | |
| "entropy": 0.9288308620452881, | |
| "epoch": 0.8341232227488151, | |
| "grad_norm": 0.9268847107887268, | |
| "learning_rate": 1.8737342137091523e-05, | |
| "loss": 0.9457352161407471, | |
| "mean_token_accuracy": 0.7653509080410004, | |
| "num_tokens": 1405289.0, | |
| "step": 88 | |
| }, | |
| { | |
| "entropy": 0.8685550093650818, | |
| "epoch": 0.8436018957345972, | |
| "grad_norm": 0.7077257037162781, | |
| "learning_rate": 1.8699063724087905e-05, | |
| "loss": 0.8259266018867493, | |
| "mean_token_accuracy": 0.7899300456047058, | |
| "num_tokens": 1419954.0, | |
| "step": 89 | |
| }, | |
| { | |
| "entropy": 0.7893514037132263, | |
| "epoch": 0.8530805687203792, | |
| "grad_norm": 0.6737422943115234, | |
| "learning_rate": 1.866025403784439e-05, | |
| "loss": 0.7658373713493347, | |
| "mean_token_accuracy": 0.8061323910951614, | |
| "num_tokens": 1436833.0, | |
| "step": 90 | |
| }, | |
| { | |
| "entropy": 0.874814048409462, | |
| "epoch": 0.8625592417061612, | |
| "grad_norm": 0.7337303757667542, | |
| "learning_rate": 1.862091544856407e-05, | |
| "loss": 0.881226658821106, | |
| "mean_token_accuracy": 0.7787951081991196, | |
| "num_tokens": 1452006.0, | |
| "step": 91 | |
| }, | |
| { | |
| "entropy": 0.8939783275127411, | |
| "epoch": 0.8720379146919431, | |
| "grad_norm": 0.8276046514511108, | |
| "learning_rate": 1.8581050358751444e-05, | |
| "loss": 0.8792139291763306, | |
| "mean_token_accuracy": 0.7856588512659073, | |
| "num_tokens": 1467177.0, | |
| "step": 92 | |
| }, | |
| { | |
| "entropy": 0.9284755736589432, | |
| "epoch": 0.8815165876777251, | |
| "grad_norm": 0.7748722434043884, | |
| "learning_rate": 1.854066120306571e-05, | |
| "loss": 0.9296675324440002, | |
| "mean_token_accuracy": 0.7717541307210922, | |
| "num_tokens": 1481900.0, | |
| "step": 93 | |
| }, | |
| { | |
| "entropy": 0.8540140986442566, | |
| "epoch": 0.8909952606635071, | |
| "grad_norm": 0.6757642030715942, | |
| "learning_rate": 1.8499750448172046e-05, | |
| "loss": 0.8415413498878479, | |
| "mean_token_accuracy": 0.7882983386516571, | |
| "num_tokens": 1498093.0, | |
| "step": 94 | |
| }, | |
| { | |
| "entropy": 0.8530154079198837, | |
| "epoch": 0.9004739336492891, | |
| "grad_norm": 0.7341352105140686, | |
| "learning_rate": 1.8458320592590976e-05, | |
| "loss": 0.8625093698501587, | |
| "mean_token_accuracy": 0.7759335339069366, | |
| "num_tokens": 1514287.0, | |
| "step": 95 | |
| }, | |
| { | |
| "entropy": 0.7864818871021271, | |
| "epoch": 0.909952606635071, | |
| "grad_norm": 0.7490381598472595, | |
| "learning_rate": 1.841637416654579e-05, | |
| "loss": 0.8058528900146484, | |
| "mean_token_accuracy": 0.7964929491281509, | |
| "num_tokens": 1530706.0, | |
| "step": 96 | |
| }, | |
| { | |
| "entropy": 0.8394529819488525, | |
| "epoch": 0.919431279620853, | |
| "grad_norm": 0.6680670380592346, | |
| "learning_rate": 1.837391373180801e-05, | |
| "loss": 0.8252418041229248, | |
| "mean_token_accuracy": 0.7954028099775314, | |
| "num_tokens": 1547595.0, | |
| "step": 97 | |
| }, | |
| { | |
| "entropy": 0.8288186490535736, | |
| "epoch": 0.9289099526066351, | |
| "grad_norm": 0.7968632578849792, | |
| "learning_rate": 1.8330941881540917e-05, | |
| "loss": 0.8349557518959045, | |
| "mean_token_accuracy": 0.7844376415014267, | |
| "num_tokens": 1563028.0, | |
| "step": 98 | |
| }, | |
| { | |
| "entropy": 0.8333224058151245, | |
| "epoch": 0.9383886255924171, | |
| "grad_norm": 0.691230297088623, | |
| "learning_rate": 1.8287461240141217e-05, | |
| "loss": 0.8373101353645325, | |
| "mean_token_accuracy": 0.7931841313838959, | |
| "num_tokens": 1578461.0, | |
| "step": 99 | |
| }, | |
| { | |
| "entropy": 0.9056068807840347, | |
| "epoch": 0.9478672985781991, | |
| "grad_norm": 0.81464022397995, | |
| "learning_rate": 1.8243474463078738e-05, | |
| "loss": 0.9318811297416687, | |
| "mean_token_accuracy": 0.7732242494821548, | |
| "num_tokens": 1594199.0, | |
| "step": 100 | |
| }, | |
| { | |
| "entropy": 0.8589460551738739, | |
| "epoch": 0.957345971563981, | |
| "grad_norm": 0.8496865630149841, | |
| "learning_rate": 1.8198984236734246e-05, | |
| "loss": 0.8577089905738831, | |
| "mean_token_accuracy": 0.7863648384809494, | |
| "num_tokens": 1612009.0, | |
| "step": 101 | |
| }, | |
| { | |
| "entropy": 0.9072522073984146, | |
| "epoch": 0.966824644549763, | |
| "grad_norm": 0.6795774102210999, | |
| "learning_rate": 1.8153993278235416e-05, | |
| "loss": 0.9616482853889465, | |
| "mean_token_accuracy": 0.7749254554510117, | |
| "num_tokens": 1627621.0, | |
| "step": 102 | |
| }, | |
| { | |
| "entropy": 0.824530228972435, | |
| "epoch": 0.976303317535545, | |
| "grad_norm": 0.7762973308563232, | |
| "learning_rate": 1.8108504335290852e-05, | |
| "loss": 0.8160134553909302, | |
| "mean_token_accuracy": 0.7989892214536667, | |
| "num_tokens": 1642633.0, | |
| "step": 103 | |
| }, | |
| { | |
| "entropy": 0.8416652232408524, | |
| "epoch": 0.985781990521327, | |
| "grad_norm": 0.689736008644104, | |
| "learning_rate": 1.80625201860223e-05, | |
| "loss": 0.8192683458328247, | |
| "mean_token_accuracy": 0.8011076003313065, | |
| "num_tokens": 1658656.0, | |
| "step": 104 | |
| }, | |
| { | |
| "entropy": 0.8708516359329224, | |
| "epoch": 0.995260663507109, | |
| "grad_norm": 0.6898924708366394, | |
| "learning_rate": 1.8016043638794975e-05, | |
| "loss": 0.8683778047561646, | |
| "mean_token_accuracy": 0.788833886384964, | |
| "num_tokens": 1674845.0, | |
| "step": 105 | |
| }, | |
| { | |
| "entropy": 0.7872009575366974, | |
| "epoch": 1.0, | |
| "grad_norm": 0.8945826292037964, | |
| "learning_rate": 1.7969077532046047e-05, | |
| "loss": 0.7540068030357361, | |
| "mean_token_accuracy": 0.8025133311748505, | |
| "num_tokens": 1682713.0, | |
| "step": 106 | |
| }, | |
| { | |
| "entropy": 0.8132580518722534, | |
| "epoch": 1.009478672985782, | |
| "grad_norm": 0.7014424800872803, | |
| "learning_rate": 1.7921624734111292e-05, | |
| "loss": 0.7927298545837402, | |
| "mean_token_accuracy": 0.7970700860023499, | |
| "num_tokens": 1698838.0, | |
| "step": 107 | |
| }, | |
| { | |
| "entropy": 0.9552255123853683, | |
| "epoch": 1.018957345971564, | |
| "grad_norm": 0.7601428031921387, | |
| "learning_rate": 1.787368814304992e-05, | |
| "loss": 0.9353047609329224, | |
| "mean_token_accuracy": 0.7636215835809708, | |
| "num_tokens": 1714529.0, | |
| "step": 108 | |
| }, | |
| { | |
| "entropy": 0.8868769556283951, | |
| "epoch": 1.028436018957346, | |
| "grad_norm": 0.7216587066650391, | |
| "learning_rate": 1.7825270686467567e-05, | |
| "loss": 0.8972142934799194, | |
| "mean_token_accuracy": 0.7766188830137253, | |
| "num_tokens": 1730585.0, | |
| "step": 109 | |
| }, | |
| { | |
| "entropy": 0.7498472779989243, | |
| "epoch": 1.037914691943128, | |
| "grad_norm": 0.7536665201187134, | |
| "learning_rate": 1.7776375321337523e-05, | |
| "loss": 0.7597789764404297, | |
| "mean_token_accuracy": 0.8086548745632172, | |
| "num_tokens": 1747998.0, | |
| "step": 110 | |
| }, | |
| { | |
| "entropy": 0.8525176346302032, | |
| "epoch": 1.04739336492891, | |
| "grad_norm": 0.6725131869316101, | |
| "learning_rate": 1.7727005033820117e-05, | |
| "loss": 0.8725385665893555, | |
| "mean_token_accuracy": 0.7857676595449448, | |
| "num_tokens": 1763858.0, | |
| "step": 111 | |
| }, | |
| { | |
| "entropy": 0.8737937361001968, | |
| "epoch": 1.0568720379146919, | |
| "grad_norm": 0.6200551390647888, | |
| "learning_rate": 1.7677162839080365e-05, | |
| "loss": 0.8674412965774536, | |
| "mean_token_accuracy": 0.7846900969743729, | |
| "num_tokens": 1779800.0, | |
| "step": 112 | |
| }, | |
| { | |
| "entropy": 0.8540749698877335, | |
| "epoch": 1.066350710900474, | |
| "grad_norm": 0.7130475640296936, | |
| "learning_rate": 1.762685178110382e-05, | |
| "loss": 0.8417346477508545, | |
| "mean_token_accuracy": 0.7876130044460297, | |
| "num_tokens": 1795033.0, | |
| "step": 113 | |
| }, | |
| { | |
| "entropy": 0.7300510853528976, | |
| "epoch": 1.0758293838862558, | |
| "grad_norm": 0.7067408561706543, | |
| "learning_rate": 1.757607493251067e-05, | |
| "loss": 0.7173537015914917, | |
| "mean_token_accuracy": 0.8160942941904068, | |
| "num_tokens": 1811709.0, | |
| "step": 114 | |
| }, | |
| { | |
| "entropy": 0.7862643301486969, | |
| "epoch": 1.085308056872038, | |
| "grad_norm": 0.6860957145690918, | |
| "learning_rate": 1.752483539436807e-05, | |
| "loss": 0.7681137323379517, | |
| "mean_token_accuracy": 0.8050192445516586, | |
| "num_tokens": 1828031.0, | |
| "step": 115 | |
| }, | |
| { | |
| "entropy": 0.8223290294408798, | |
| "epoch": 1.09478672985782, | |
| "grad_norm": 13.119089126586914, | |
| "learning_rate": 1.747313629600077e-05, | |
| "loss": 0.819756805896759, | |
| "mean_token_accuracy": 0.7999143302440643, | |
| "num_tokens": 1844448.0, | |
| "step": 116 | |
| }, | |
| { | |
| "entropy": 0.8199901729822159, | |
| "epoch": 1.1042654028436019, | |
| "grad_norm": 0.7526054978370667, | |
| "learning_rate": 1.7420980794800013e-05, | |
| "loss": 0.8049260377883911, | |
| "mean_token_accuracy": 0.7947445511817932, | |
| "num_tokens": 1860116.0, | |
| "step": 117 | |
| }, | |
| { | |
| "entropy": 0.9044909924268723, | |
| "epoch": 1.113744075829384, | |
| "grad_norm": 0.774227499961853, | |
| "learning_rate": 1.7368372076030654e-05, | |
| "loss": 0.8923323750495911, | |
| "mean_token_accuracy": 0.7806955128908157, | |
| "num_tokens": 1875432.0, | |
| "step": 118 | |
| }, | |
| { | |
| "entropy": 0.8348246514797211, | |
| "epoch": 1.1232227488151658, | |
| "grad_norm": 0.767121434211731, | |
| "learning_rate": 1.731531335263669e-05, | |
| "loss": 0.8475579023361206, | |
| "mean_token_accuracy": 0.795254722237587, | |
| "num_tokens": 1891887.0, | |
| "step": 119 | |
| }, | |
| { | |
| "entropy": 0.8337329626083374, | |
| "epoch": 1.132701421800948, | |
| "grad_norm": 0.6577139496803284, | |
| "learning_rate": 1.726180786504499e-05, | |
| "loss": 0.8073049783706665, | |
| "mean_token_accuracy": 0.7889072597026825, | |
| "num_tokens": 1908925.0, | |
| "step": 120 | |
| }, | |
| { | |
| "entropy": 0.787194699048996, | |
| "epoch": 1.1421800947867298, | |
| "grad_norm": 0.7494071125984192, | |
| "learning_rate": 1.720785888096743e-05, | |
| "loss": 0.7753745913505554, | |
| "mean_token_accuracy": 0.8019598126411438, | |
| "num_tokens": 1924838.0, | |
| "step": 121 | |
| }, | |
| { | |
| "entropy": 0.7965055257081985, | |
| "epoch": 1.1516587677725119, | |
| "grad_norm": 0.7918292880058289, | |
| "learning_rate": 1.7153469695201278e-05, | |
| "loss": 0.7967703342437744, | |
| "mean_token_accuracy": 0.7995534092187881, | |
| "num_tokens": 1940925.0, | |
| "step": 122 | |
| }, | |
| { | |
| "entropy": 0.860944464802742, | |
| "epoch": 1.161137440758294, | |
| "grad_norm": 0.7927801609039307, | |
| "learning_rate": 1.7098643629428035e-05, | |
| "loss": 0.8686648607254028, | |
| "mean_token_accuracy": 0.7804652601480484, | |
| "num_tokens": 1956955.0, | |
| "step": 123 | |
| }, | |
| { | |
| "entropy": 0.7591796815395355, | |
| "epoch": 1.1706161137440758, | |
| "grad_norm": 0.7671242356300354, | |
| "learning_rate": 1.7043384032010523e-05, | |
| "loss": 0.8130097389221191, | |
| "mean_token_accuracy": 0.8032640516757965, | |
| "num_tokens": 1972680.0, | |
| "step": 124 | |
| }, | |
| { | |
| "entropy": 0.8104424327611923, | |
| "epoch": 1.180094786729858, | |
| "grad_norm": 0.8326568603515625, | |
| "learning_rate": 1.698769427778842e-05, | |
| "loss": 0.8218762874603271, | |
| "mean_token_accuracy": 0.793466106057167, | |
| "num_tokens": 1988526.0, | |
| "step": 125 | |
| }, | |
| { | |
| "entropy": 0.8348822742700577, | |
| "epoch": 1.1895734597156398, | |
| "grad_norm": 0.7889320850372314, | |
| "learning_rate": 1.693157776787212e-05, | |
| "loss": 0.8524150252342224, | |
| "mean_token_accuracy": 0.7928779572248459, | |
| "num_tokens": 2004119.0, | |
| "step": 126 | |
| }, | |
| { | |
| "entropy": 0.8161694556474686, | |
| "epoch": 1.1990521327014219, | |
| "grad_norm": 0.7603635191917419, | |
| "learning_rate": 1.687503792943506e-05, | |
| "loss": 0.8084812164306641, | |
| "mean_token_accuracy": 0.7948593199253082, | |
| "num_tokens": 2020391.0, | |
| "step": 127 | |
| }, | |
| { | |
| "entropy": 0.8825703114271164, | |
| "epoch": 1.2085308056872037, | |
| "grad_norm": 0.7401167750358582, | |
| "learning_rate": 1.681807821550438e-05, | |
| "loss": 0.8706303238868713, | |
| "mean_token_accuracy": 0.7867066264152527, | |
| "num_tokens": 2035638.0, | |
| "step": 128 | |
| }, | |
| { | |
| "entropy": 0.7736850678920746, | |
| "epoch": 1.2180094786729858, | |
| "grad_norm": 0.6535520553588867, | |
| "learning_rate": 1.6760702104750046e-05, | |
| "loss": 0.7504853010177612, | |
| "mean_token_accuracy": 0.8084538877010345, | |
| "num_tokens": 2052292.0, | |
| "step": 129 | |
| }, | |
| { | |
| "entropy": 0.9081571251153946, | |
| "epoch": 1.2274881516587677, | |
| "grad_norm": 0.6775054335594177, | |
| "learning_rate": 1.670291310127242e-05, | |
| "loss": 0.9248923659324646, | |
| "mean_token_accuracy": 0.7756912559270859, | |
| "num_tokens": 2068464.0, | |
| "step": 130 | |
| }, | |
| { | |
| "entropy": 0.8087949007749557, | |
| "epoch": 1.2369668246445498, | |
| "grad_norm": 0.8837705850601196, | |
| "learning_rate": 1.664471473438822e-05, | |
| "loss": 0.7927989959716797, | |
| "mean_token_accuracy": 0.7962196916341782, | |
| "num_tokens": 2084023.0, | |
| "step": 131 | |
| }, | |
| { | |
| "entropy": 0.8239958882331848, | |
| "epoch": 1.2464454976303316, | |
| "grad_norm": 0.7673121690750122, | |
| "learning_rate": 1.6586110558415006e-05, | |
| "loss": 0.7868762016296387, | |
| "mean_token_accuracy": 0.8091334104537964, | |
| "num_tokens": 2100036.0, | |
| "step": 132 | |
| }, | |
| { | |
| "entropy": 0.7624151110649109, | |
| "epoch": 1.2559241706161137, | |
| "grad_norm": 0.6538701057434082, | |
| "learning_rate": 1.6527104152454096e-05, | |
| "loss": 0.7488104104995728, | |
| "mean_token_accuracy": 0.8097258806228638, | |
| "num_tokens": 2117206.0, | |
| "step": 133 | |
| }, | |
| { | |
| "entropy": 0.8748339861631393, | |
| "epoch": 1.2654028436018958, | |
| "grad_norm": 0.6970165967941284, | |
| "learning_rate": 1.646769912017199e-05, | |
| "loss": 0.8594735860824585, | |
| "mean_token_accuracy": 0.7816884070634842, | |
| "num_tokens": 2132466.0, | |
| "step": 134 | |
| }, | |
| { | |
| "entropy": 0.7696728706359863, | |
| "epoch": 1.2748815165876777, | |
| "grad_norm": 0.6646419763565063, | |
| "learning_rate": 1.6407899089580263e-05, | |
| "loss": 0.7375603914260864, | |
| "mean_token_accuracy": 0.8088530749082565, | |
| "num_tokens": 2148789.0, | |
| "step": 135 | |
| }, | |
| { | |
| "entropy": 0.768274188041687, | |
| "epoch": 1.2843601895734598, | |
| "grad_norm": 0.6656033992767334, | |
| "learning_rate": 1.6347707712814023e-05, | |
| "loss": 0.7362462878227234, | |
| "mean_token_accuracy": 0.81095190346241, | |
| "num_tokens": 2166975.0, | |
| "step": 136 | |
| }, | |
| { | |
| "entropy": 0.8369315713644028, | |
| "epoch": 1.2938388625592416, | |
| "grad_norm": 0.7545751929283142, | |
| "learning_rate": 1.628712866590885e-05, | |
| "loss": 0.8421776294708252, | |
| "mean_token_accuracy": 0.789647251367569, | |
| "num_tokens": 2182471.0, | |
| "step": 137 | |
| }, | |
| { | |
| "entropy": 0.9027784168720245, | |
| "epoch": 1.3033175355450237, | |
| "grad_norm": 0.8744738698005676, | |
| "learning_rate": 1.622616564857629e-05, | |
| "loss": 0.8971675038337708, | |
| "mean_token_accuracy": 0.7683477252721786, | |
| "num_tokens": 2197146.0, | |
| "step": 138 | |
| }, | |
| { | |
| "entropy": 0.7965396195650101, | |
| "epoch": 1.3127962085308056, | |
| "grad_norm": 0.7402711510658264, | |
| "learning_rate": 1.6164822383977912e-05, | |
| "loss": 0.8043927550315857, | |
| "mean_token_accuracy": 0.7958309352397919, | |
| "num_tokens": 2211761.0, | |
| "step": 139 | |
| }, | |
| { | |
| "entropy": 0.8152131289243698, | |
| "epoch": 1.3222748815165877, | |
| "grad_norm": 1.6655291318893433, | |
| "learning_rate": 1.6103102618497922e-05, | |
| "loss": 0.8047835230827332, | |
| "mean_token_accuracy": 0.7941760718822479, | |
| "num_tokens": 2229548.0, | |
| "step": 140 | |
| }, | |
| { | |
| "entropy": 0.8045339435338974, | |
| "epoch": 1.3317535545023698, | |
| "grad_norm": 0.8306461572647095, | |
| "learning_rate": 1.604101012151436e-05, | |
| "loss": 0.8004187345504761, | |
| "mean_token_accuracy": 0.7995999902486801, | |
| "num_tokens": 2244710.0, | |
| "step": 141 | |
| }, | |
| { | |
| "entropy": 0.8147155493497849, | |
| "epoch": 1.3412322274881516, | |
| "grad_norm": 0.6874533891677856, | |
| "learning_rate": 1.5978548685168892e-05, | |
| "loss": 0.8141619563102722, | |
| "mean_token_accuracy": 0.799629807472229, | |
| "num_tokens": 2261252.0, | |
| "step": 142 | |
| }, | |
| { | |
| "entropy": 0.8661371916532516, | |
| "epoch": 1.3507109004739337, | |
| "grad_norm": 0.742089033126831, | |
| "learning_rate": 1.5915722124135227e-05, | |
| "loss": 0.8675738573074341, | |
| "mean_token_accuracy": 0.7867332845926285, | |
| "num_tokens": 2277329.0, | |
| "step": 143 | |
| }, | |
| { | |
| "entropy": 0.8389143347740173, | |
| "epoch": 1.3601895734597156, | |
| "grad_norm": 0.7694486379623413, | |
| "learning_rate": 1.5852534275386133e-05, | |
| "loss": 0.8360339403152466, | |
| "mean_token_accuracy": 0.7909637689590454, | |
| "num_tokens": 2291859.0, | |
| "step": 144 | |
| }, | |
| { | |
| "entropy": 0.8579956740140915, | |
| "epoch": 1.3696682464454977, | |
| "grad_norm": 0.6887325048446655, | |
| "learning_rate": 1.5788988997959115e-05, | |
| "loss": 0.8614661693572998, | |
| "mean_token_accuracy": 0.7857953310012817, | |
| "num_tokens": 2309035.0, | |
| "step": 145 | |
| }, | |
| { | |
| "entropy": 0.8863996863365173, | |
| "epoch": 1.3791469194312795, | |
| "grad_norm": 0.8338068127632141, | |
| "learning_rate": 1.572509017272072e-05, | |
| "loss": 0.8865163326263428, | |
| "mean_token_accuracy": 0.7824681550264359, | |
| "num_tokens": 2323401.0, | |
| "step": 146 | |
| }, | |
| { | |
| "entropy": 0.8635494112968445, | |
| "epoch": 1.3886255924170616, | |
| "grad_norm": 0.7118057012557983, | |
| "learning_rate": 1.5660841702129533e-05, | |
| "loss": 0.8939542770385742, | |
| "mean_token_accuracy": 0.7877447158098221, | |
| "num_tokens": 2340007.0, | |
| "step": 147 | |
| }, | |
| { | |
| "entropy": 0.8171515315771103, | |
| "epoch": 1.3981042654028437, | |
| "grad_norm": 0.7389315962791443, | |
| "learning_rate": 1.5596247509997843e-05, | |
| "loss": 0.8200218677520752, | |
| "mean_token_accuracy": 0.7987903952598572, | |
| "num_tokens": 2355339.0, | |
| "step": 148 | |
| }, | |
| { | |
| "entropy": 0.8425612896680832, | |
| "epoch": 1.4075829383886256, | |
| "grad_norm": 0.7023189067840576, | |
| "learning_rate": 1.5531311541251995e-05, | |
| "loss": 0.8327531814575195, | |
| "mean_token_accuracy": 0.7908835858106613, | |
| "num_tokens": 2371667.0, | |
| "step": 149 | |
| }, | |
| { | |
| "entropy": 0.7493451237678528, | |
| "epoch": 1.4170616113744074, | |
| "grad_norm": 0.778081476688385, | |
| "learning_rate": 1.5466037761691493e-05, | |
| "loss": 0.7227718830108643, | |
| "mean_token_accuracy": 0.8122826516628265, | |
| "num_tokens": 2386902.0, | |
| "step": 150 | |
| }, | |
| { | |
| "entropy": 0.7571745365858078, | |
| "epoch": 1.4265402843601895, | |
| "grad_norm": 0.7302114963531494, | |
| "learning_rate": 1.540043015774676e-05, | |
| "loss": 0.7457150220870972, | |
| "mean_token_accuracy": 0.8071363717317581, | |
| "num_tokens": 2403068.0, | |
| "step": 151 | |
| }, | |
| { | |
| "entropy": 0.8944563120603561, | |
| "epoch": 1.4360189573459716, | |
| "grad_norm": 0.7927082180976868, | |
| "learning_rate": 1.5334492736235703e-05, | |
| "loss": 0.9174668192863464, | |
| "mean_token_accuracy": 0.7727936506271362, | |
| "num_tokens": 2418122.0, | |
| "step": 152 | |
| }, | |
| { | |
| "entropy": 0.8746097236871719, | |
| "epoch": 1.4454976303317535, | |
| "grad_norm": 0.673865556716919, | |
| "learning_rate": 1.5268229524119007e-05, | |
| "loss": 0.8767449855804443, | |
| "mean_token_accuracy": 0.7917247861623764, | |
| "num_tokens": 2434701.0, | |
| "step": 153 | |
| }, | |
| { | |
| "entropy": 0.8153167068958282, | |
| "epoch": 1.4549763033175356, | |
| "grad_norm": 0.7478446364402771, | |
| "learning_rate": 1.5201644568254181e-05, | |
| "loss": 0.79709392786026, | |
| "mean_token_accuracy": 0.797386422753334, | |
| "num_tokens": 2451672.0, | |
| "step": 154 | |
| }, | |
| { | |
| "entropy": 0.7992338985204697, | |
| "epoch": 1.4644549763033177, | |
| "grad_norm": 0.8660085201263428, | |
| "learning_rate": 1.513474193514842e-05, | |
| "loss": 0.7673428058624268, | |
| "mean_token_accuracy": 0.8041483759880066, | |
| "num_tokens": 2466644.0, | |
| "step": 155 | |
| }, | |
| { | |
| "entropy": 0.8519728779792786, | |
| "epoch": 1.4739336492890995, | |
| "grad_norm": 0.8062114119529724, | |
| "learning_rate": 1.5067525710710253e-05, | |
| "loss": 0.8258011341094971, | |
| "mean_token_accuracy": 0.7964712530374527, | |
| "num_tokens": 2481743.0, | |
| "step": 156 | |
| }, | |
| { | |
| "entropy": 0.7952524870634079, | |
| "epoch": 1.4834123222748814, | |
| "grad_norm": 0.7020323276519775, | |
| "learning_rate": 1.5000000000000002e-05, | |
| "loss": 0.7887269258499146, | |
| "mean_token_accuracy": 0.8075706511735916, | |
| "num_tokens": 2498442.0, | |
| "step": 157 | |
| }, | |
| { | |
| "entropy": 0.7479639947414398, | |
| "epoch": 1.4928909952606635, | |
| "grad_norm": 0.7170225977897644, | |
| "learning_rate": 1.4932168926979074e-05, | |
| "loss": 0.7105016708374023, | |
| "mean_token_accuracy": 0.8205106854438782, | |
| "num_tokens": 2514651.0, | |
| "step": 158 | |
| }, | |
| { | |
| "entropy": 0.8277395963668823, | |
| "epoch": 1.5023696682464456, | |
| "grad_norm": 0.7073144316673279, | |
| "learning_rate": 1.4864036634258112e-05, | |
| "loss": 0.8328120112419128, | |
| "mean_token_accuracy": 0.7966006547212601, | |
| "num_tokens": 2532309.0, | |
| "step": 159 | |
| }, | |
| { | |
| "entropy": 0.803072065114975, | |
| "epoch": 1.5118483412322274, | |
| "grad_norm": 0.7687756419181824, | |
| "learning_rate": 1.479560728284398e-05, | |
| "loss": 0.8329487442970276, | |
| "mean_token_accuracy": 0.7969093024730682, | |
| "num_tokens": 2548833.0, | |
| "step": 160 | |
| }, | |
| { | |
| "entropy": 0.8335355520248413, | |
| "epoch": 1.5213270142180095, | |
| "grad_norm": 0.7092747092247009, | |
| "learning_rate": 1.4726885051885654e-05, | |
| "loss": 0.8255264759063721, | |
| "mean_token_accuracy": 0.7908845394849777, | |
| "num_tokens": 2563652.0, | |
| "step": 161 | |
| }, | |
| { | |
| "entropy": 0.7855267077684402, | |
| "epoch": 1.5308056872037916, | |
| "grad_norm": 0.8094468712806702, | |
| "learning_rate": 1.465787413841898e-05, | |
| "loss": 0.7922707200050354, | |
| "mean_token_accuracy": 0.7990368455648422, | |
| "num_tokens": 2579535.0, | |
| "step": 162 | |
| }, | |
| { | |
| "entropy": 0.815998300909996, | |
| "epoch": 1.5402843601895735, | |
| "grad_norm": 0.6849187612533569, | |
| "learning_rate": 1.4588578757110359e-05, | |
| "loss": 0.8124603629112244, | |
| "mean_token_accuracy": 0.7991871684789658, | |
| "num_tokens": 2596641.0, | |
| "step": 163 | |
| }, | |
| { | |
| "entropy": 0.8301991075277328, | |
| "epoch": 1.5497630331753554, | |
| "grad_norm": 0.7943881750106812, | |
| "learning_rate": 1.451900313999934e-05, | |
| "loss": 0.8316096067428589, | |
| "mean_token_accuracy": 0.7986911088228226, | |
| "num_tokens": 2611723.0, | |
| "step": 164 | |
| }, | |
| { | |
| "entropy": 0.8705786466598511, | |
| "epoch": 1.5592417061611374, | |
| "grad_norm": 0.8372196555137634, | |
| "learning_rate": 1.4449151536240167e-05, | |
| "loss": 0.8712157607078552, | |
| "mean_token_accuracy": 0.7852582186460495, | |
| "num_tokens": 2627866.0, | |
| "step": 165 | |
| }, | |
| { | |
| "entropy": 0.7600953280925751, | |
| "epoch": 1.5687203791469195, | |
| "grad_norm": 0.673876166343689, | |
| "learning_rate": 1.4379028211842265e-05, | |
| "loss": 0.7214274406433105, | |
| "mean_token_accuracy": 0.8191453367471695, | |
| "num_tokens": 2644305.0, | |
| "step": 166 | |
| }, | |
| { | |
| "entropy": 0.9016031622886658, | |
| "epoch": 1.5781990521327014, | |
| "grad_norm": 0.7573343515396118, | |
| "learning_rate": 1.4308637449409705e-05, | |
| "loss": 0.8627768158912659, | |
| "mean_token_accuracy": 0.7764580994844437, | |
| "num_tokens": 2659693.0, | |
| "step": 167 | |
| }, | |
| { | |
| "entropy": 0.7766188532114029, | |
| "epoch": 1.5876777251184833, | |
| "grad_norm": 0.6323472857475281, | |
| "learning_rate": 1.4237983547879664e-05, | |
| "loss": 0.7716467380523682, | |
| "mean_token_accuracy": 0.8073431253433228, | |
| "num_tokens": 2676499.0, | |
| "step": 168 | |
| }, | |
| { | |
| "entropy": 0.7772778868675232, | |
| "epoch": 1.5971563981042654, | |
| "grad_norm": 0.8210384845733643, | |
| "learning_rate": 1.4167070822259868e-05, | |
| "loss": 0.7849143147468567, | |
| "mean_token_accuracy": 0.8017231673002243, | |
| "num_tokens": 2692356.0, | |
| "step": 169 | |
| }, | |
| { | |
| "entropy": 0.8266542851924896, | |
| "epoch": 1.6066350710900474, | |
| "grad_norm": 0.8243103623390198, | |
| "learning_rate": 1.4095903603365067e-05, | |
| "loss": 0.8927056789398193, | |
| "mean_token_accuracy": 0.7911294400691986, | |
| "num_tokens": 2707486.0, | |
| "step": 170 | |
| }, | |
| { | |
| "entropy": 0.8491052091121674, | |
| "epoch": 1.6161137440758293, | |
| "grad_norm": 0.8660029768943787, | |
| "learning_rate": 1.402448623755254e-05, | |
| "loss": 0.9026396870613098, | |
| "mean_token_accuracy": 0.7884256094694138, | |
| "num_tokens": 2722952.0, | |
| "step": 171 | |
| }, | |
| { | |
| "entropy": 0.7935148179531097, | |
| "epoch": 1.6255924170616114, | |
| "grad_norm": 0.7412772178649902, | |
| "learning_rate": 1.3952823086456656e-05, | |
| "loss": 0.8272484540939331, | |
| "mean_token_accuracy": 0.8038238883018494, | |
| "num_tokens": 2737748.0, | |
| "step": 172 | |
| }, | |
| { | |
| "entropy": 0.7457627803087234, | |
| "epoch": 1.6350710900473935, | |
| "grad_norm": 0.6538417935371399, | |
| "learning_rate": 1.3880918526722497e-05, | |
| "loss": 0.7617708444595337, | |
| "mean_token_accuracy": 0.8141642510890961, | |
| "num_tokens": 2754877.0, | |
| "step": 173 | |
| }, | |
| { | |
| "entropy": 0.8603504151105881, | |
| "epoch": 1.6445497630331753, | |
| "grad_norm": 0.7187417149543762, | |
| "learning_rate": 1.3808776949738569e-05, | |
| "loss": 0.8643186688423157, | |
| "mean_token_accuracy": 0.7814531624317169, | |
| "num_tokens": 2770322.0, | |
| "step": 174 | |
| }, | |
| { | |
| "entropy": 0.8017472177743912, | |
| "epoch": 1.6540284360189572, | |
| "grad_norm": 0.7812091708183289, | |
| "learning_rate": 1.3736402761368597e-05, | |
| "loss": 0.7945480346679688, | |
| "mean_token_accuracy": 0.7994107306003571, | |
| "num_tokens": 2785842.0, | |
| "step": 175 | |
| }, | |
| { | |
| "entropy": 0.839350700378418, | |
| "epoch": 1.6635071090047393, | |
| "grad_norm": 0.6516030430793762, | |
| "learning_rate": 1.3663800381682465e-05, | |
| "loss": 0.8218634724617004, | |
| "mean_token_accuracy": 0.7927983403205872, | |
| "num_tokens": 2802150.0, | |
| "step": 176 | |
| }, | |
| { | |
| "entropy": 0.8673035353422165, | |
| "epoch": 1.6729857819905214, | |
| "grad_norm": 0.7417457699775696, | |
| "learning_rate": 1.3590974244686248e-05, | |
| "loss": 0.8643198013305664, | |
| "mean_token_accuracy": 0.7817965149879456, | |
| "num_tokens": 2818036.0, | |
| "step": 177 | |
| }, | |
| { | |
| "entropy": 0.8689644038677216, | |
| "epoch": 1.6824644549763033, | |
| "grad_norm": 0.7800316214561462, | |
| "learning_rate": 1.3517928798051442e-05, | |
| "loss": 0.8703083992004395, | |
| "mean_token_accuracy": 0.784311443567276, | |
| "num_tokens": 2833551.0, | |
| "step": 178 | |
| }, | |
| { | |
| "entropy": 0.8547259271144867, | |
| "epoch": 1.6919431279620853, | |
| "grad_norm": 0.7157973647117615, | |
| "learning_rate": 1.344466850284333e-05, | |
| "loss": 0.8583805561065674, | |
| "mean_token_accuracy": 0.7822477072477341, | |
| "num_tokens": 2849148.0, | |
| "step": 179 | |
| }, | |
| { | |
| "entropy": 0.8259568810462952, | |
| "epoch": 1.7014218009478674, | |
| "grad_norm": 0.682313084602356, | |
| "learning_rate": 1.3371197833248508e-05, | |
| "loss": 0.8174458742141724, | |
| "mean_token_accuracy": 0.7968255281448364, | |
| "num_tokens": 2865308.0, | |
| "step": 180 | |
| }, | |
| { | |
| "entropy": 0.7793432623147964, | |
| "epoch": 1.7109004739336493, | |
| "grad_norm": 0.6921513080596924, | |
| "learning_rate": 1.3297521276301666e-05, | |
| "loss": 0.7710314393043518, | |
| "mean_token_accuracy": 0.807743102312088, | |
| "num_tokens": 2880734.0, | |
| "step": 181 | |
| }, | |
| { | |
| "entropy": 0.8243378251791, | |
| "epoch": 1.7203791469194312, | |
| "grad_norm": 0.8263442516326904, | |
| "learning_rate": 1.3223643331611538e-05, | |
| "loss": 0.8093314170837402, | |
| "mean_token_accuracy": 0.8022452294826508, | |
| "num_tokens": 2896023.0, | |
| "step": 182 | |
| }, | |
| { | |
| "entropy": 0.8052062392234802, | |
| "epoch": 1.7298578199052133, | |
| "grad_norm": 0.7710005640983582, | |
| "learning_rate": 1.3149568511086104e-05, | |
| "loss": 0.786173403263092, | |
| "mean_token_accuracy": 0.8085711002349854, | |
| "num_tokens": 2911649.0, | |
| "step": 183 | |
| }, | |
| { | |
| "entropy": 0.8499461859464645, | |
| "epoch": 1.7393364928909953, | |
| "grad_norm": 0.7619282603263855, | |
| "learning_rate": 1.3075301338657036e-05, | |
| "loss": 0.803632378578186, | |
| "mean_token_accuracy": 0.7947988212108612, | |
| "num_tokens": 2926955.0, | |
| "step": 184 | |
| }, | |
| { | |
| "entropy": 0.8158384412527084, | |
| "epoch": 1.7488151658767772, | |
| "grad_norm": 0.7673326134681702, | |
| "learning_rate": 1.300084635000341e-05, | |
| "loss": 0.7742278575897217, | |
| "mean_token_accuracy": 0.8106317967176437, | |
| "num_tokens": 2941760.0, | |
| "step": 185 | |
| }, | |
| { | |
| "entropy": 0.9341417849063873, | |
| "epoch": 1.758293838862559, | |
| "grad_norm": 0.7602595686912537, | |
| "learning_rate": 1.2926208092274699e-05, | |
| "loss": 0.9146112203598022, | |
| "mean_token_accuracy": 0.7746770083904266, | |
| "num_tokens": 2957629.0, | |
| "step": 186 | |
| }, | |
| { | |
| "entropy": 0.8594657629728317, | |
| "epoch": 1.7677725118483414, | |
| "grad_norm": 0.6417691111564636, | |
| "learning_rate": 1.2851391123813075e-05, | |
| "loss": 0.8513129949569702, | |
| "mean_token_accuracy": 0.7823676913976669, | |
| "num_tokens": 2974081.0, | |
| "step": 187 | |
| }, | |
| { | |
| "entropy": 0.7850838005542755, | |
| "epoch": 1.7772511848341233, | |
| "grad_norm": 0.7059190273284912, | |
| "learning_rate": 1.2776400013875006e-05, | |
| "loss": 0.7892745137214661, | |
| "mean_token_accuracy": 0.8025059252977371, | |
| "num_tokens": 2989825.0, | |
| "step": 188 | |
| }, | |
| { | |
| "entropy": 0.8054764866828918, | |
| "epoch": 1.7867298578199051, | |
| "grad_norm": 0.7688627243041992, | |
| "learning_rate": 1.2701239342352223e-05, | |
| "loss": 0.8145111799240112, | |
| "mean_token_accuracy": 0.7917294800281525, | |
| "num_tokens": 3005441.0, | |
| "step": 189 | |
| }, | |
| { | |
| "entropy": 0.7718167155981064, | |
| "epoch": 1.7962085308056872, | |
| "grad_norm": 0.7204188704490662, | |
| "learning_rate": 1.2625913699491986e-05, | |
| "loss": 0.7447738647460938, | |
| "mean_token_accuracy": 0.8080335557460785, | |
| "num_tokens": 3020864.0, | |
| "step": 190 | |
| }, | |
| { | |
| "entropy": 0.769375741481781, | |
| "epoch": 1.8056872037914693, | |
| "grad_norm": 0.8016238212585449, | |
| "learning_rate": 1.2550427685616767e-05, | |
| "loss": 0.7878785133361816, | |
| "mean_token_accuracy": 0.8049481511116028, | |
| "num_tokens": 3036515.0, | |
| "step": 191 | |
| }, | |
| { | |
| "entropy": 0.7770668566226959, | |
| "epoch": 1.8151658767772512, | |
| "grad_norm": 0.7289695739746094, | |
| "learning_rate": 1.2474785910843289e-05, | |
| "loss": 0.7667026519775391, | |
| "mean_token_accuracy": 0.808977484703064, | |
| "num_tokens": 3052474.0, | |
| "step": 192 | |
| }, | |
| { | |
| "entropy": 0.7413365840911865, | |
| "epoch": 1.824644549763033, | |
| "grad_norm": 0.6931266784667969, | |
| "learning_rate": 1.239899299480098e-05, | |
| "loss": 0.7244135737419128, | |
| "mean_token_accuracy": 0.8193869441747665, | |
| "num_tokens": 3068429.0, | |
| "step": 193 | |
| }, | |
| { | |
| "entropy": 0.789655476808548, | |
| "epoch": 1.8341232227488151, | |
| "grad_norm": 0.6820263862609863, | |
| "learning_rate": 1.2323053566349834e-05, | |
| "loss": 0.772818386554718, | |
| "mean_token_accuracy": 0.8061677515506744, | |
| "num_tokens": 3084782.0, | |
| "step": 194 | |
| }, | |
| { | |
| "entropy": 0.8178127855062485, | |
| "epoch": 1.8436018957345972, | |
| "grad_norm": 0.6741772890090942, | |
| "learning_rate": 1.2246972263297718e-05, | |
| "loss": 0.8100478053092957, | |
| "mean_token_accuracy": 0.789124608039856, | |
| "num_tokens": 3101303.0, | |
| "step": 195 | |
| }, | |
| { | |
| "entropy": 0.7827744632959366, | |
| "epoch": 1.853080568720379, | |
| "grad_norm": 0.7667286992073059, | |
| "learning_rate": 1.2170753732117138e-05, | |
| "loss": 0.7773491144180298, | |
| "mean_token_accuracy": 0.8108680844306946, | |
| "num_tokens": 3116869.0, | |
| "step": 196 | |
| }, | |
| { | |
| "entropy": 0.7749081552028656, | |
| "epoch": 1.8625592417061612, | |
| "grad_norm": 0.663255512714386, | |
| "learning_rate": 1.2094402627661447e-05, | |
| "loss": 0.7651374340057373, | |
| "mean_token_accuracy": 0.8063687533140182, | |
| "num_tokens": 3133160.0, | |
| "step": 197 | |
| }, | |
| { | |
| "entropy": 0.7427408695220947, | |
| "epoch": 1.8720379146919433, | |
| "grad_norm": 0.8035187721252441, | |
| "learning_rate": 1.2017923612880579e-05, | |
| "loss": 0.7308788299560547, | |
| "mean_token_accuracy": 0.8139677345752716, | |
| "num_tokens": 3148591.0, | |
| "step": 198 | |
| }, | |
| { | |
| "entropy": 0.7870882898569107, | |
| "epoch": 1.8815165876777251, | |
| "grad_norm": 0.7356343269348145, | |
| "learning_rate": 1.1941321358536278e-05, | |
| "loss": 0.7814972400665283, | |
| "mean_token_accuracy": 0.8018877655267715, | |
| "num_tokens": 3164873.0, | |
| "step": 199 | |
| }, | |
| { | |
| "entropy": 0.7867356240749359, | |
| "epoch": 1.890995260663507, | |
| "grad_norm": 0.7502454519271851, | |
| "learning_rate": 1.1864600542916813e-05, | |
| "loss": 0.7805415391921997, | |
| "mean_token_accuracy": 0.8045219480991364, | |
| "num_tokens": 3180052.0, | |
| "step": 200 | |
| }, | |
| { | |
| "entropy": 0.8339502215385437, | |
| "epoch": 1.900473933649289, | |
| "grad_norm": 0.7521458268165588, | |
| "learning_rate": 1.1787765851551296e-05, | |
| "loss": 0.8415899276733398, | |
| "mean_token_accuracy": 0.7910822480916977, | |
| "num_tokens": 3196566.0, | |
| "step": 201 | |
| }, | |
| { | |
| "entropy": 0.8327292948961258, | |
| "epoch": 1.9099526066350712, | |
| "grad_norm": 0.7168639898300171, | |
| "learning_rate": 1.17108219769235e-05, | |
| "loss": 0.8684878349304199, | |
| "mean_token_accuracy": 0.7880802601575851, | |
| "num_tokens": 3212078.0, | |
| "step": 202 | |
| }, | |
| { | |
| "entropy": 0.8257894515991211, | |
| "epoch": 1.919431279620853, | |
| "grad_norm": 0.7584367394447327, | |
| "learning_rate": 1.1633773618185302e-05, | |
| "loss": 0.8573964834213257, | |
| "mean_token_accuracy": 0.7954303473234177, | |
| "num_tokens": 3227754.0, | |
| "step": 203 | |
| }, | |
| { | |
| "entropy": 0.8100651800632477, | |
| "epoch": 1.9289099526066351, | |
| "grad_norm": 0.7311358451843262, | |
| "learning_rate": 1.155662548086967e-05, | |
| "loss": 0.7980862259864807, | |
| "mean_token_accuracy": 0.8024295270442963, | |
| "num_tokens": 3244342.0, | |
| "step": 204 | |
| }, | |
| { | |
| "entropy": 0.877414807677269, | |
| "epoch": 1.9383886255924172, | |
| "grad_norm": 0.8059820532798767, | |
| "learning_rate": 1.14793822766033e-05, | |
| "loss": 0.8878626823425293, | |
| "mean_token_accuracy": 0.7836036533117294, | |
| "num_tokens": 3260849.0, | |
| "step": 205 | |
| }, | |
| { | |
| "entropy": 0.8052125722169876, | |
| "epoch": 1.947867298578199, | |
| "grad_norm": 0.7401100397109985, | |
| "learning_rate": 1.1402048722818862e-05, | |
| "loss": 0.780508279800415, | |
| "mean_token_accuracy": 0.8014171868562698, | |
| "num_tokens": 3277712.0, | |
| "step": 206 | |
| }, | |
| { | |
| "entropy": 0.8691103160381317, | |
| "epoch": 1.957345971563981, | |
| "grad_norm": 0.717732310295105, | |
| "learning_rate": 1.132462954246688e-05, | |
| "loss": 0.8721846342086792, | |
| "mean_token_accuracy": 0.781763345003128, | |
| "num_tokens": 3293613.0, | |
| "step": 207 | |
| }, | |
| { | |
| "entropy": 0.8300742954015732, | |
| "epoch": 1.966824644549763, | |
| "grad_norm": 0.697457492351532, | |
| "learning_rate": 1.1247129463727329e-05, | |
| "loss": 0.8578647375106812, | |
| "mean_token_accuracy": 0.7887941151857376, | |
| "num_tokens": 3309463.0, | |
| "step": 208 | |
| }, | |
| { | |
| "entropy": 0.898152619600296, | |
| "epoch": 1.9763033175355451, | |
| "grad_norm": 0.6961869597434998, | |
| "learning_rate": 1.1169553219720828e-05, | |
| "loss": 0.9294345378875732, | |
| "mean_token_accuracy": 0.7751467376947403, | |
| "num_tokens": 3326245.0, | |
| "step": 209 | |
| }, | |
| { | |
| "entropy": 0.7650076150894165, | |
| "epoch": 1.985781990521327, | |
| "grad_norm": 0.7355565428733826, | |
| "learning_rate": 1.1091905548219597e-05, | |
| "loss": 0.7482494115829468, | |
| "mean_token_accuracy": 0.8114674985408783, | |
| "num_tokens": 3342296.0, | |
| "step": 210 | |
| }, | |
| { | |
| "entropy": 0.8681088984012604, | |
| "epoch": 1.9952606635071088, | |
| "grad_norm": 0.6917135715484619, | |
| "learning_rate": 1.1014191191358118e-05, | |
| "loss": 0.850773811340332, | |
| "mean_token_accuracy": 0.7946626245975494, | |
| "num_tokens": 3357305.0, | |
| "step": 211 | |
| }, | |
| { | |
| "entropy": 0.7847772836685181, | |
| "epoch": 2.0, | |
| "grad_norm": 1.2537461519241333, | |
| "learning_rate": 1.093641489534351e-05, | |
| "loss": 0.8384180665016174, | |
| "mean_token_accuracy": 0.8076447546482086, | |
| "num_tokens": 3364885.0, | |
| "step": 212 | |
| }, | |
| { | |
| "entropy": 0.8841996192932129, | |
| "epoch": 2.009478672985782, | |
| "grad_norm": 0.8463948965072632, | |
| "learning_rate": 1.085858141016566e-05, | |
| "loss": 0.8430708646774292, | |
| "mean_token_accuracy": 0.7934243977069855, | |
| "num_tokens": 3379936.0, | |
| "step": 213 | |
| }, | |
| { | |
| "entropy": 0.8387620151042938, | |
| "epoch": 2.018957345971564, | |
| "grad_norm": 0.654754638671875, | |
| "learning_rate": 1.0780695489307152e-05, | |
| "loss": 0.8108071088790894, | |
| "mean_token_accuracy": 0.7976965010166168, | |
| "num_tokens": 3395599.0, | |
| "step": 214 | |
| }, | |
| { | |
| "entropy": 0.8131097704172134, | |
| "epoch": 2.028436018957346, | |
| "grad_norm": 0.7293988466262817, | |
| "learning_rate": 1.070276188945293e-05, | |
| "loss": 0.808356761932373, | |
| "mean_token_accuracy": 0.7960507720708847, | |
| "num_tokens": 3410351.0, | |
| "step": 215 | |
| }, | |
| { | |
| "entropy": 0.8139905482530594, | |
| "epoch": 2.037914691943128, | |
| "grad_norm": 0.8644604682922363, | |
| "learning_rate": 1.062478537019983e-05, | |
| "loss": 0.8001759052276611, | |
| "mean_token_accuracy": 0.8025805652141571, | |
| "num_tokens": 3425898.0, | |
| "step": 216 | |
| }, | |
| { | |
| "entropy": 0.7704232633113861, | |
| "epoch": 2.0473933649289098, | |
| "grad_norm": 0.6959671974182129, | |
| "learning_rate": 1.0546770693765859e-05, | |
| "loss": 0.7146846055984497, | |
| "mean_token_accuracy": 0.820914089679718, | |
| "num_tokens": 3441747.0, | |
| "step": 217 | |
| }, | |
| { | |
| "entropy": 0.774506226181984, | |
| "epoch": 2.056872037914692, | |
| "grad_norm": 0.7746772170066833, | |
| "learning_rate": 1.0468722624699401e-05, | |
| "loss": 0.7572252750396729, | |
| "mean_token_accuracy": 0.8087823241949081, | |
| "num_tokens": 3457430.0, | |
| "step": 218 | |
| }, | |
| { | |
| "entropy": 0.75509412586689, | |
| "epoch": 2.066350710900474, | |
| "grad_norm": 0.6639065742492676, | |
| "learning_rate": 1.0390645929588197e-05, | |
| "loss": 0.7484322786331177, | |
| "mean_token_accuracy": 0.8096154481172562, | |
| "num_tokens": 3473449.0, | |
| "step": 219 | |
| }, | |
| { | |
| "entropy": 0.8533463031053543, | |
| "epoch": 2.075829383886256, | |
| "grad_norm": 1.005789875984192, | |
| "learning_rate": 1.0312545376768246e-05, | |
| "loss": 0.8439828157424927, | |
| "mean_token_accuracy": 0.785793200135231, | |
| "num_tokens": 3488206.0, | |
| "step": 220 | |
| }, | |
| { | |
| "entropy": 0.8072682321071625, | |
| "epoch": 2.085308056872038, | |
| "grad_norm": 0.6852388978004456, | |
| "learning_rate": 1.0234425736032607e-05, | |
| "loss": 0.8122374415397644, | |
| "mean_token_accuracy": 0.8027433753013611, | |
| "num_tokens": 3504731.0, | |
| "step": 221 | |
| }, | |
| { | |
| "entropy": 0.7405507862567902, | |
| "epoch": 2.09478672985782, | |
| "grad_norm": 0.6730477213859558, | |
| "learning_rate": 1.015629177834008e-05, | |
| "loss": 0.6996761560440063, | |
| "mean_token_accuracy": 0.8150836676359177, | |
| "num_tokens": 3520301.0, | |
| "step": 222 | |
| }, | |
| { | |
| "entropy": 0.8268642425537109, | |
| "epoch": 2.104265402843602, | |
| "grad_norm": 0.8005897402763367, | |
| "learning_rate": 1.007814827552384e-05, | |
| "loss": 0.8099018335342407, | |
| "mean_token_accuracy": 0.7979452759027481, | |
| "num_tokens": 3536522.0, | |
| "step": 223 | |
| }, | |
| { | |
| "entropy": 0.8399747163057327, | |
| "epoch": 2.1137440758293837, | |
| "grad_norm": 1.2597578763961792, | |
| "learning_rate": 1e-05, | |
| "loss": 0.8286103010177612, | |
| "mean_token_accuracy": 0.7938411682844162, | |
| "num_tokens": 3551654.0, | |
| "step": 224 | |
| }, | |
| { | |
| "entropy": 0.761796772480011, | |
| "epoch": 2.123222748815166, | |
| "grad_norm": 0.689793586730957, | |
| "learning_rate": 9.92185172447616e-06, | |
| "loss": 0.771800696849823, | |
| "mean_token_accuracy": 0.8083701580762863, | |
| "num_tokens": 3568275.0, | |
| "step": 225 | |
| }, | |
| { | |
| "entropy": 0.8368710279464722, | |
| "epoch": 2.132701421800948, | |
| "grad_norm": 0.7853175401687622, | |
| "learning_rate": 9.843708221659924e-06, | |
| "loss": 0.8215783834457397, | |
| "mean_token_accuracy": 0.7981610000133514, | |
| "num_tokens": 3583250.0, | |
| "step": 226 | |
| }, | |
| { | |
| "entropy": 0.7556120902299881, | |
| "epoch": 2.1421800947867298, | |
| "grad_norm": 0.7487472295761108, | |
| "learning_rate": 9.765574263967397e-06, | |
| "loss": 0.7591477036476135, | |
| "mean_token_accuracy": 0.8063594847917557, | |
| "num_tokens": 3599684.0, | |
| "step": 227 | |
| }, | |
| { | |
| "entropy": 0.7687488049268723, | |
| "epoch": 2.1516587677725116, | |
| "grad_norm": 0.7922394275665283, | |
| "learning_rate": 9.68745462323176e-06, | |
| "loss": 0.7621670365333557, | |
| "mean_token_accuracy": 0.8083523958921432, | |
| "num_tokens": 3615776.0, | |
| "step": 228 | |
| }, | |
| { | |
| "entropy": 0.7205257564783096, | |
| "epoch": 2.161137440758294, | |
| "grad_norm": 0.8748376369476318, | |
| "learning_rate": 9.609354070411807e-06, | |
| "loss": 0.7108169794082642, | |
| "mean_token_accuracy": 0.8180408775806427, | |
| "num_tokens": 3631914.0, | |
| "step": 229 | |
| }, | |
| { | |
| "entropy": 0.7599902749061584, | |
| "epoch": 2.170616113744076, | |
| "grad_norm": 0.6921991109848022, | |
| "learning_rate": 9.531277375300599e-06, | |
| "loss": 0.7450228929519653, | |
| "mean_token_accuracy": 0.815703496336937, | |
| "num_tokens": 3648945.0, | |
| "step": 230 | |
| }, | |
| { | |
| "entropy": 0.7611272037029266, | |
| "epoch": 2.1800947867298577, | |
| "grad_norm": 0.6716788411140442, | |
| "learning_rate": 9.453229306234143e-06, | |
| "loss": 0.769507646560669, | |
| "mean_token_accuracy": 0.8149918764829636, | |
| "num_tokens": 3664617.0, | |
| "step": 231 | |
| }, | |
| { | |
| "entropy": 0.8009829074144363, | |
| "epoch": 2.18957345971564, | |
| "grad_norm": 0.7548602223396301, | |
| "learning_rate": 9.375214629800173e-06, | |
| "loss": 0.796088695526123, | |
| "mean_token_accuracy": 0.7964328676462173, | |
| "num_tokens": 3679523.0, | |
| "step": 232 | |
| }, | |
| { | |
| "entropy": 0.7614959329366684, | |
| "epoch": 2.199052132701422, | |
| "grad_norm": 0.7149446606636047, | |
| "learning_rate": 9.297238110547075e-06, | |
| "loss": 0.7764595746994019, | |
| "mean_token_accuracy": 0.7976703941822052, | |
| "num_tokens": 3696512.0, | |
| "step": 233 | |
| }, | |
| { | |
| "entropy": 0.8016138821840286, | |
| "epoch": 2.2085308056872037, | |
| "grad_norm": 0.8237394690513611, | |
| "learning_rate": 9.219304510692853e-06, | |
| "loss": 0.7986416220664978, | |
| "mean_token_accuracy": 0.7949725389480591, | |
| "num_tokens": 3713141.0, | |
| "step": 234 | |
| }, | |
| { | |
| "entropy": 0.7871145308017731, | |
| "epoch": 2.2180094786729856, | |
| "grad_norm": 0.7732126116752625, | |
| "learning_rate": 9.14141858983434e-06, | |
| "loss": 0.8035563230514526, | |
| "mean_token_accuracy": 0.8023014217615128, | |
| "num_tokens": 3729153.0, | |
| "step": 235 | |
| }, | |
| { | |
| "entropy": 0.782372236251831, | |
| "epoch": 2.227488151658768, | |
| "grad_norm": 0.7351176738739014, | |
| "learning_rate": 9.063585104656494e-06, | |
| "loss": 0.765059769153595, | |
| "mean_token_accuracy": 0.8090708404779434, | |
| "num_tokens": 3744796.0, | |
| "step": 236 | |
| }, | |
| { | |
| "entropy": 0.8335847705602646, | |
| "epoch": 2.2369668246445498, | |
| "grad_norm": 0.6978676319122314, | |
| "learning_rate": 8.985808808641883e-06, | |
| "loss": 0.8426876664161682, | |
| "mean_token_accuracy": 0.790928304195404, | |
| "num_tokens": 3761370.0, | |
| "step": 237 | |
| }, | |
| { | |
| "entropy": 0.8038548976182938, | |
| "epoch": 2.2464454976303316, | |
| "grad_norm": 0.7371619939804077, | |
| "learning_rate": 8.908094451780408e-06, | |
| "loss": 0.8280337452888489, | |
| "mean_token_accuracy": 0.7949777245521545, | |
| "num_tokens": 3777078.0, | |
| "step": 238 | |
| }, | |
| { | |
| "entropy": 0.7301145642995834, | |
| "epoch": 2.2559241706161135, | |
| "grad_norm": 0.7276929020881653, | |
| "learning_rate": 8.830446780279175e-06, | |
| "loss": 0.7212296724319458, | |
| "mean_token_accuracy": 0.8165305256843567, | |
| "num_tokens": 3793426.0, | |
| "step": 239 | |
| }, | |
| { | |
| "entropy": 0.7998540550470352, | |
| "epoch": 2.265402843601896, | |
| "grad_norm": 0.6656012535095215, | |
| "learning_rate": 8.752870536272673e-06, | |
| "loss": 0.7798458337783813, | |
| "mean_token_accuracy": 0.7948831617832184, | |
| "num_tokens": 3809150.0, | |
| "step": 240 | |
| }, | |
| { | |
| "entropy": 0.8189331293106079, | |
| "epoch": 2.2748815165876777, | |
| "grad_norm": 0.7528814673423767, | |
| "learning_rate": 8.675370457533122e-06, | |
| "loss": 0.8173599243164062, | |
| "mean_token_accuracy": 0.7954176068305969, | |
| "num_tokens": 3824693.0, | |
| "step": 241 | |
| }, | |
| { | |
| "entropy": 0.7802784144878387, | |
| "epoch": 2.2843601895734595, | |
| "grad_norm": 0.8117206692695618, | |
| "learning_rate": 8.597951277181143e-06, | |
| "loss": 0.8089747428894043, | |
| "mean_token_accuracy": 0.8035465627908707, | |
| "num_tokens": 3840276.0, | |
| "step": 242 | |
| }, | |
| { | |
| "entropy": 0.8260502219200134, | |
| "epoch": 2.293838862559242, | |
| "grad_norm": 0.7881060242652893, | |
| "learning_rate": 8.520617723396702e-06, | |
| "loss": 0.8358763456344604, | |
| "mean_token_accuracy": 0.7957090586423874, | |
| "num_tokens": 3856007.0, | |
| "step": 243 | |
| }, | |
| { | |
| "entropy": 0.7820662707090378, | |
| "epoch": 2.3033175355450237, | |
| "grad_norm": 0.7249373197555542, | |
| "learning_rate": 8.443374519130332e-06, | |
| "loss": 0.7823531031608582, | |
| "mean_token_accuracy": 0.8001191914081573, | |
| "num_tokens": 3872300.0, | |
| "step": 244 | |
| }, | |
| { | |
| "entropy": 0.8399253487586975, | |
| "epoch": 2.3127962085308056, | |
| "grad_norm": 0.8324182629585266, | |
| "learning_rate": 8.366226381814698e-06, | |
| "loss": 0.8358606100082397, | |
| "mean_token_accuracy": 0.7946343570947647, | |
| "num_tokens": 3888174.0, | |
| "step": 245 | |
| }, | |
| { | |
| "entropy": 0.7229929268360138, | |
| "epoch": 2.322274881516588, | |
| "grad_norm": 0.8482832312583923, | |
| "learning_rate": 8.289178023076501e-06, | |
| "loss": 0.6760922074317932, | |
| "mean_token_accuracy": 0.8288819640874863, | |
| "num_tokens": 3904294.0, | |
| "step": 246 | |
| }, | |
| { | |
| "entropy": 0.756013348698616, | |
| "epoch": 2.3317535545023698, | |
| "grad_norm": 0.7549688816070557, | |
| "learning_rate": 8.212234148448708e-06, | |
| "loss": 0.7417251467704773, | |
| "mean_token_accuracy": 0.8116127997636795, | |
| "num_tokens": 3921037.0, | |
| "step": 247 | |
| }, | |
| { | |
| "entropy": 0.8636997491121292, | |
| "epoch": 2.3412322274881516, | |
| "grad_norm": 0.7435200810432434, | |
| "learning_rate": 8.13539945708319e-06, | |
| "loss": 0.8481851816177368, | |
| "mean_token_accuracy": 0.7855764031410217, | |
| "num_tokens": 3936399.0, | |
| "step": 248 | |
| }, | |
| { | |
| "entropy": 0.7722165882587433, | |
| "epoch": 2.3507109004739335, | |
| "grad_norm": 0.82920241355896, | |
| "learning_rate": 8.058678641463724e-06, | |
| "loss": 0.7838011980056763, | |
| "mean_token_accuracy": 0.7997405827045441, | |
| "num_tokens": 3951517.0, | |
| "step": 249 | |
| }, | |
| { | |
| "entropy": 0.7858306765556335, | |
| "epoch": 2.360189573459716, | |
| "grad_norm": 0.8119862675666809, | |
| "learning_rate": 7.98207638711942e-06, | |
| "loss": 0.7850341796875, | |
| "mean_token_accuracy": 0.8023760467767715, | |
| "num_tokens": 3966658.0, | |
| "step": 250 | |
| }, | |
| { | |
| "entropy": 0.713519737124443, | |
| "epoch": 2.3696682464454977, | |
| "grad_norm": 0.7162942886352539, | |
| "learning_rate": 7.905597372338558e-06, | |
| "loss": 0.6873078346252441, | |
| "mean_token_accuracy": 0.8234204798936844, | |
| "num_tokens": 3983390.0, | |
| "step": 251 | |
| }, | |
| { | |
| "entropy": 0.8907800167798996, | |
| "epoch": 2.3791469194312795, | |
| "grad_norm": 0.7015475034713745, | |
| "learning_rate": 7.829246267882864e-06, | |
| "loss": 0.9189035892486572, | |
| "mean_token_accuracy": 0.7817337810993195, | |
| "num_tokens": 3999794.0, | |
| "step": 252 | |
| }, | |
| { | |
| "entropy": 0.797465056180954, | |
| "epoch": 2.3886255924170614, | |
| "grad_norm": 0.8406365513801575, | |
| "learning_rate": 7.753027736702283e-06, | |
| "loss": 0.7891885042190552, | |
| "mean_token_accuracy": 0.8100517839193344, | |
| "num_tokens": 4015172.0, | |
| "step": 253 | |
| }, | |
| { | |
| "entropy": 0.7660058289766312, | |
| "epoch": 2.3981042654028437, | |
| "grad_norm": 0.795070230960846, | |
| "learning_rate": 7.67694643365017e-06, | |
| "loss": 0.7752566933631897, | |
| "mean_token_accuracy": 0.8050341606140137, | |
| "num_tokens": 4031362.0, | |
| "step": 254 | |
| }, | |
| { | |
| "entropy": 0.750129446387291, | |
| "epoch": 2.4075829383886256, | |
| "grad_norm": 0.7366607189178467, | |
| "learning_rate": 7.601007005199022e-06, | |
| "loss": 0.7281040549278259, | |
| "mean_token_accuracy": 0.8136068880558014, | |
| "num_tokens": 4047478.0, | |
| "step": 255 | |
| }, | |
| { | |
| "entropy": 0.7423243522644043, | |
| "epoch": 2.4170616113744074, | |
| "grad_norm": 0.7090871930122375, | |
| "learning_rate": 7.525214089156714e-06, | |
| "loss": 0.7064968347549438, | |
| "mean_token_accuracy": 0.824029728770256, | |
| "num_tokens": 4062615.0, | |
| "step": 256 | |
| }, | |
| { | |
| "entropy": 0.8094542026519775, | |
| "epoch": 2.4265402843601898, | |
| "grad_norm": 0.7741425633430481, | |
| "learning_rate": 7.449572314383237e-06, | |
| "loss": 0.8382295370101929, | |
| "mean_token_accuracy": 0.7949035614728928, | |
| "num_tokens": 4079018.0, | |
| "step": 257 | |
| }, | |
| { | |
| "entropy": 0.762347549200058, | |
| "epoch": 2.4360189573459716, | |
| "grad_norm": 0.7608873844146729, | |
| "learning_rate": 7.374086300508019e-06, | |
| "loss": 0.747887134552002, | |
| "mean_token_accuracy": 0.8116944581270218, | |
| "num_tokens": 4094608.0, | |
| "step": 258 | |
| }, | |
| { | |
| "entropy": 0.8015813231468201, | |
| "epoch": 2.4454976303317535, | |
| "grad_norm": 0.8000101447105408, | |
| "learning_rate": 7.298760657647779e-06, | |
| "loss": 0.7769061923027039, | |
| "mean_token_accuracy": 0.8080571591854095, | |
| "num_tokens": 4110049.0, | |
| "step": 259 | |
| }, | |
| { | |
| "entropy": 0.7255821079015732, | |
| "epoch": 2.4549763033175354, | |
| "grad_norm": 0.7408568859100342, | |
| "learning_rate": 7.223599986124994e-06, | |
| "loss": 0.7272932529449463, | |
| "mean_token_accuracy": 0.8140795826911926, | |
| "num_tokens": 4126621.0, | |
| "step": 260 | |
| }, | |
| { | |
| "entropy": 0.753772422671318, | |
| "epoch": 2.4644549763033177, | |
| "grad_norm": 0.7682618498802185, | |
| "learning_rate": 7.148608876186931e-06, | |
| "loss": 0.7212350368499756, | |
| "mean_token_accuracy": 0.8186078071594238, | |
| "num_tokens": 4142397.0, | |
| "step": 261 | |
| }, | |
| { | |
| "entropy": 0.855003148317337, | |
| "epoch": 2.4739336492890995, | |
| "grad_norm": 0.8246680498123169, | |
| "learning_rate": 7.073791907725304e-06, | |
| "loss": 0.861223578453064, | |
| "mean_token_accuracy": 0.7906257957220078, | |
| "num_tokens": 4157512.0, | |
| "step": 262 | |
| }, | |
| { | |
| "entropy": 0.7049210220575333, | |
| "epoch": 2.4834123222748814, | |
| "grad_norm": 0.7506635189056396, | |
| "learning_rate": 6.999153649996595e-06, | |
| "loss": 0.6862914562225342, | |
| "mean_token_accuracy": 0.8244545608758926, | |
| "num_tokens": 4173292.0, | |
| "step": 263 | |
| }, | |
| { | |
| "entropy": 0.7707203775644302, | |
| "epoch": 2.4928909952606633, | |
| "grad_norm": 0.7590407729148865, | |
| "learning_rate": 6.924698661342968e-06, | |
| "loss": 0.7646045684814453, | |
| "mean_token_accuracy": 0.8014439344406128, | |
| "num_tokens": 4189278.0, | |
| "step": 264 | |
| }, | |
| { | |
| "entropy": 0.7228768467903137, | |
| "epoch": 2.5023696682464456, | |
| "grad_norm": 0.7332707047462463, | |
| "learning_rate": 6.8504314889138956e-06, | |
| "loss": 0.7292245626449585, | |
| "mean_token_accuracy": 0.817297637462616, | |
| "num_tokens": 4205312.0, | |
| "step": 265 | |
| }, | |
| { | |
| "entropy": 0.762321949005127, | |
| "epoch": 2.5118483412322274, | |
| "grad_norm": 0.763211190700531, | |
| "learning_rate": 6.776356668388464e-06, | |
| "loss": 0.7480930685997009, | |
| "mean_token_accuracy": 0.8084921091794968, | |
| "num_tokens": 4222070.0, | |
| "step": 266 | |
| }, | |
| { | |
| "entropy": 0.7291257232427597, | |
| "epoch": 2.5213270142180093, | |
| "grad_norm": 0.7732436060905457, | |
| "learning_rate": 6.702478723698336e-06, | |
| "loss": 0.7159913182258606, | |
| "mean_token_accuracy": 0.8156870156526566, | |
| "num_tokens": 4238086.0, | |
| "step": 267 | |
| }, | |
| { | |
| "entropy": 0.7663960009813309, | |
| "epoch": 2.5308056872037916, | |
| "grad_norm": 0.7954282760620117, | |
| "learning_rate": 6.628802166751496e-06, | |
| "loss": 0.7697724103927612, | |
| "mean_token_accuracy": 0.8044588267803192, | |
| "num_tokens": 4253062.0, | |
| "step": 268 | |
| }, | |
| { | |
| "entropy": 0.7694273293018341, | |
| "epoch": 2.5402843601895735, | |
| "grad_norm": 0.8077139854431152, | |
| "learning_rate": 6.555331497156671e-06, | |
| "loss": 0.7804722785949707, | |
| "mean_token_accuracy": 0.8017380684614182, | |
| "num_tokens": 4268074.0, | |
| "step": 269 | |
| }, | |
| { | |
| "entropy": 0.7905916273593903, | |
| "epoch": 2.5497630331753554, | |
| "grad_norm": 0.7760480642318726, | |
| "learning_rate": 6.482071201948557e-06, | |
| "loss": 0.8075841665267944, | |
| "mean_token_accuracy": 0.7980497181415558, | |
| "num_tokens": 4284203.0, | |
| "step": 270 | |
| }, | |
| { | |
| "entropy": 0.797221377491951, | |
| "epoch": 2.5592417061611377, | |
| "grad_norm": 0.7789294123649597, | |
| "learning_rate": 6.4090257553137566e-06, | |
| "loss": 0.8040370345115662, | |
| "mean_token_accuracy": 0.8009522259235382, | |
| "num_tokens": 4298529.0, | |
| "step": 271 | |
| }, | |
| { | |
| "entropy": 0.7779994606971741, | |
| "epoch": 2.5687203791469195, | |
| "grad_norm": 0.7936727404594421, | |
| "learning_rate": 6.336199618317538e-06, | |
| "loss": 0.7707229256629944, | |
| "mean_token_accuracy": 0.813504621386528, | |
| "num_tokens": 4314557.0, | |
| "step": 272 | |
| }, | |
| { | |
| "entropy": 0.8157918900251389, | |
| "epoch": 2.5781990521327014, | |
| "grad_norm": 0.7624719142913818, | |
| "learning_rate": 6.263597238631405e-06, | |
| "loss": 0.8281031847000122, | |
| "mean_token_accuracy": 0.7870993167161942, | |
| "num_tokens": 4330159.0, | |
| "step": 273 | |
| }, | |
| { | |
| "entropy": 0.8026942908763885, | |
| "epoch": 2.5876777251184833, | |
| "grad_norm": 0.7207754254341125, | |
| "learning_rate": 6.191223050261432e-06, | |
| "loss": 0.7978195548057556, | |
| "mean_token_accuracy": 0.7940388768911362, | |
| "num_tokens": 4345645.0, | |
| "step": 274 | |
| }, | |
| { | |
| "entropy": 0.8023267984390259, | |
| "epoch": 2.597156398104265, | |
| "grad_norm": 0.764500081539154, | |
| "learning_rate": 6.119081473277502e-06, | |
| "loss": 0.7886009216308594, | |
| "mean_token_accuracy": 0.8018842190504074, | |
| "num_tokens": 4361883.0, | |
| "step": 275 | |
| }, | |
| { | |
| "entropy": 0.7076465785503387, | |
| "epoch": 2.6066350710900474, | |
| "grad_norm": 0.8487837910652161, | |
| "learning_rate": 6.047176913543348e-06, | |
| "loss": 0.7041523456573486, | |
| "mean_token_accuracy": 0.8193930238485336, | |
| "num_tokens": 4377678.0, | |
| "step": 276 | |
| }, | |
| { | |
| "entropy": 0.746178925037384, | |
| "epoch": 2.6161137440758293, | |
| "grad_norm": 0.7743781805038452, | |
| "learning_rate": 5.975513762447465e-06, | |
| "loss": 0.7622804641723633, | |
| "mean_token_accuracy": 0.8078963905572891, | |
| "num_tokens": 4393473.0, | |
| "step": 277 | |
| }, | |
| { | |
| "entropy": 0.7890691012144089, | |
| "epoch": 2.625592417061611, | |
| "grad_norm": 0.6980987191200256, | |
| "learning_rate": 5.904096396634935e-06, | |
| "loss": 0.7980008125305176, | |
| "mean_token_accuracy": 0.8008007407188416, | |
| "num_tokens": 4410182.0, | |
| "step": 278 | |
| }, | |
| { | |
| "entropy": 0.6923302114009857, | |
| "epoch": 2.6350710900473935, | |
| "grad_norm": 0.6928261518478394, | |
| "learning_rate": 5.832929177740134e-06, | |
| "loss": 0.7043225765228271, | |
| "mean_token_accuracy": 0.827838584780693, | |
| "num_tokens": 4427098.0, | |
| "step": 279 | |
| }, | |
| { | |
| "entropy": 0.7364738285541534, | |
| "epoch": 2.6445497630331753, | |
| "grad_norm": 0.742316722869873, | |
| "learning_rate": 5.762016452120336e-06, | |
| "loss": 0.7640200853347778, | |
| "mean_token_accuracy": 0.8073264956474304, | |
| "num_tokens": 4443351.0, | |
| "step": 280 | |
| }, | |
| { | |
| "entropy": 0.7357928454875946, | |
| "epoch": 2.654028436018957, | |
| "grad_norm": 0.7527856826782227, | |
| "learning_rate": 5.6913625505902966e-06, | |
| "loss": 0.7566514015197754, | |
| "mean_token_accuracy": 0.8084013164043427, | |
| "num_tokens": 4459636.0, | |
| "step": 281 | |
| }, | |
| { | |
| "entropy": 0.7626573443412781, | |
| "epoch": 2.6635071090047395, | |
| "grad_norm": 0.7092803120613098, | |
| "learning_rate": 5.620971788157741e-06, | |
| "loss": 0.7342270612716675, | |
| "mean_token_accuracy": 0.8117237538099289, | |
| "num_tokens": 4475796.0, | |
| "step": 282 | |
| }, | |
| { | |
| "entropy": 0.7978230267763138, | |
| "epoch": 2.6729857819905214, | |
| "grad_norm": 1.0863252878189087, | |
| "learning_rate": 5.550848463759835e-06, | |
| "loss": 0.7835026979446411, | |
| "mean_token_accuracy": 0.8040365278720856, | |
| "num_tokens": 4492101.0, | |
| "step": 283 | |
| }, | |
| { | |
| "entropy": 0.802836075425148, | |
| "epoch": 2.6824644549763033, | |
| "grad_norm": 0.777554988861084, | |
| "learning_rate": 5.480996860000664e-06, | |
| "loss": 0.824481725692749, | |
| "mean_token_accuracy": 0.7915899306535721, | |
| "num_tokens": 4509292.0, | |
| "step": 284 | |
| }, | |
| { | |
| "entropy": 0.7397547513246536, | |
| "epoch": 2.6919431279620856, | |
| "grad_norm": 0.7662098407745361, | |
| "learning_rate": 5.411421242889643e-06, | |
| "loss": 0.725763201713562, | |
| "mean_token_accuracy": 0.8155921995639801, | |
| "num_tokens": 4525300.0, | |
| "step": 285 | |
| }, | |
| { | |
| "entropy": 0.8510775417089462, | |
| "epoch": 2.7014218009478674, | |
| "grad_norm": 0.7985575199127197, | |
| "learning_rate": 5.342125861581022e-06, | |
| "loss": 0.844895601272583, | |
| "mean_token_accuracy": 0.7873697280883789, | |
| "num_tokens": 4541512.0, | |
| "step": 286 | |
| }, | |
| { | |
| "entropy": 0.7734527587890625, | |
| "epoch": 2.7109004739336493, | |
| "grad_norm": 0.7651690244674683, | |
| "learning_rate": 5.273114948114346e-06, | |
| "loss": 0.7542927265167236, | |
| "mean_token_accuracy": 0.8083190619945526, | |
| "num_tokens": 4557742.0, | |
| "step": 287 | |
| }, | |
| { | |
| "entropy": 0.7724734246730804, | |
| "epoch": 2.720379146919431, | |
| "grad_norm": 0.8156644701957703, | |
| "learning_rate": 5.204392717156021e-06, | |
| "loss": 0.7651525139808655, | |
| "mean_token_accuracy": 0.803424209356308, | |
| "num_tokens": 4573502.0, | |
| "step": 288 | |
| }, | |
| { | |
| "entropy": 0.7836635559797287, | |
| "epoch": 2.729857819905213, | |
| "grad_norm": 0.8487071990966797, | |
| "learning_rate": 5.135963365741892e-06, | |
| "loss": 0.7864376306533813, | |
| "mean_token_accuracy": 0.8052554726600647, | |
| "num_tokens": 4588151.0, | |
| "step": 289 | |
| }, | |
| { | |
| "entropy": 0.7748328298330307, | |
| "epoch": 2.7393364928909953, | |
| "grad_norm": 0.7466703057289124, | |
| "learning_rate": 5.067831073020928e-06, | |
| "loss": 0.7420310378074646, | |
| "mean_token_accuracy": 0.8074294477701187, | |
| "num_tokens": 4604464.0, | |
| "step": 290 | |
| }, | |
| { | |
| "entropy": 0.8272679448127747, | |
| "epoch": 2.748815165876777, | |
| "grad_norm": 0.7622450590133667, | |
| "learning_rate": 5.000000000000003e-06, | |
| "loss": 0.7982164621353149, | |
| "mean_token_accuracy": 0.7979923784732819, | |
| "num_tokens": 4620322.0, | |
| "step": 291 | |
| }, | |
| { | |
| "entropy": 0.785077303647995, | |
| "epoch": 2.758293838862559, | |
| "grad_norm": 0.6796573996543884, | |
| "learning_rate": 4.932474289289748e-06, | |
| "loss": 0.7424709796905518, | |
| "mean_token_accuracy": 0.8152509778738022, | |
| "num_tokens": 4638091.0, | |
| "step": 292 | |
| }, | |
| { | |
| "entropy": 0.8112717866897583, | |
| "epoch": 2.7677725118483414, | |
| "grad_norm": 0.7888740301132202, | |
| "learning_rate": 4.865258064851579e-06, | |
| "loss": 0.8013285398483276, | |
| "mean_token_accuracy": 0.8086142092943192, | |
| "num_tokens": 4654194.0, | |
| "step": 293 | |
| }, | |
| { | |
| "entropy": 0.8527653962373734, | |
| "epoch": 2.7772511848341233, | |
| "grad_norm": 0.728568971157074, | |
| "learning_rate": 4.7983554317458204e-06, | |
| "loss": 0.8419825434684753, | |
| "mean_token_accuracy": 0.7937730550765991, | |
| "num_tokens": 4670285.0, | |
| "step": 294 | |
| }, | |
| { | |
| "entropy": 0.7469000071287155, | |
| "epoch": 2.786729857819905, | |
| "grad_norm": 0.7420099377632141, | |
| "learning_rate": 4.731770475880995e-06, | |
| "loss": 0.7262124419212341, | |
| "mean_token_accuracy": 0.8157135248184204, | |
| "num_tokens": 4686154.0, | |
| "step": 295 | |
| }, | |
| { | |
| "entropy": 0.756457582116127, | |
| "epoch": 2.7962085308056874, | |
| "grad_norm": 0.7652328610420227, | |
| "learning_rate": 4.665507263764299e-06, | |
| "loss": 0.7771545648574829, | |
| "mean_token_accuracy": 0.8060846775770187, | |
| "num_tokens": 4702538.0, | |
| "step": 296 | |
| }, | |
| { | |
| "entropy": 0.8069571703672409, | |
| "epoch": 2.8056872037914693, | |
| "grad_norm": 0.8068948984146118, | |
| "learning_rate": 4.599569842253244e-06, | |
| "loss": 0.794147253036499, | |
| "mean_token_accuracy": 0.7936616390943527, | |
| "num_tokens": 4717366.0, | |
| "step": 297 | |
| }, | |
| { | |
| "entropy": 0.7628475576639175, | |
| "epoch": 2.815165876777251, | |
| "grad_norm": 0.7466553449630737, | |
| "learning_rate": 4.5339622383085095e-06, | |
| "loss": 0.7227836847305298, | |
| "mean_token_accuracy": 0.8107776045799255, | |
| "num_tokens": 4733915.0, | |
"step": 298
},
{
"entropy": 0.7967563271522522,
"epoch": 2.824644549763033,
"grad_norm": 0.7297375202178955,
"learning_rate": 4.468688458748006e-06,
"loss": 0.78502357006073,
"mean_token_accuracy": 0.7997068613767624,
"num_tokens": 4751223.0,
"step": 299
},
{
"entropy": 0.7497158050537109,
"epoch": 2.834123222748815,
"grad_norm": 0.7914976477622986,
"learning_rate": 4.40375249000216e-06,
"loss": 0.7375324964523315,
"mean_token_accuracy": 0.81224524974823,
"num_tokens": 4767616.0,
"step": 300
},
{
"entropy": 0.763348177075386,
"epoch": 2.843601895734597,
"grad_norm": 0.7949478030204773,
"learning_rate": 4.339158297870469e-06,
"loss": 0.7617372870445251,
"mean_token_accuracy": 0.8068404793739319,
"num_tokens": 4783264.0,
"step": 301
},
{
"entropy": 0.796427458524704,
"epoch": 2.853080568720379,
"grad_norm": 0.8715385794639587,
"learning_rate": 4.274909827279283e-06,
"loss": 0.7832701206207275,
"mean_token_accuracy": 0.8009211868047714,
"num_tokens": 4798500.0,
"step": 302
},
{
"entropy": 0.8474353551864624,
"epoch": 2.862559241706161,
"grad_norm": 0.6873095035552979,
"learning_rate": 4.211011002040885e-06,
"loss": 0.8359758853912354,
"mean_token_accuracy": 0.7899835258722305,
"num_tokens": 4815731.0,
"step": 303
},
{
"entropy": 0.7330864071846008,
"epoch": 2.8720379146919433,
"grad_norm": 0.771339476108551,
"learning_rate": 4.14746572461387e-06,
"loss": 0.7329787611961365,
"mean_token_accuracy": 0.8161643743515015,
"num_tokens": 4831321.0,
"step": 304
},
{
"entropy": 0.7069019228219986,
"epoch": 2.881516587677725,
"grad_norm": 0.7667499780654907,
"learning_rate": 4.084277875864776e-06,
"loss": 0.6972311735153198,
"mean_token_accuracy": 0.8227463960647583,
"num_tokens": 4847145.0,
"step": 305
},
{
"entropy": 0.7471538782119751,
"epoch": 2.890995260663507,
"grad_norm": 0.7328512668609619,
"learning_rate": 4.021451314831113e-06,
"loss": 0.7734975814819336,
"mean_token_accuracy": 0.8095995932817459,
"num_tokens": 4863550.0,
"step": 306
},
{
"entropy": 0.7084924876689911,
"epoch": 2.9004739336492893,
"grad_norm": 0.707899808883667,
"learning_rate": 3.958989878485644e-06,
"loss": 0.6905066967010498,
"mean_token_accuracy": 0.8189463019371033,
"num_tokens": 4879708.0,
"step": 307
},
{
"entropy": 0.8038249611854553,
"epoch": 2.909952606635071,
"grad_norm": 0.7571305632591248,
"learning_rate": 3.896897381502081e-06,
"loss": 0.8163067102432251,
"mean_token_accuracy": 0.797750398516655,
"num_tokens": 4896533.0,
"step": 308
},
{
"entropy": 0.8064074218273163,
"epoch": 2.919431279620853,
"grad_norm": 0.7309732437133789,
"learning_rate": 3.83517761602209e-06,
"loss": 0.792744517326355,
"mean_token_accuracy": 0.7964637726545334,
"num_tokens": 4912287.0,
"step": 309
},
{
"entropy": 0.7508046329021454,
"epoch": 2.9289099526066353,
"grad_norm": 0.8369373679161072,
"learning_rate": 3.773834351423711e-06,
"loss": 0.7411284446716309,
"mean_token_accuracy": 0.8095607906579971,
"num_tokens": 4927555.0,
"step": 310
},
{
"entropy": 0.7564988732337952,
"epoch": 2.938388625592417,
"grad_norm": 0.859829843044281,
"learning_rate": 3.712871334091154e-06,
"loss": 0.7624916434288025,
"mean_token_accuracy": 0.8062203228473663,
"num_tokens": 4942708.0,
"step": 311
},
{
"entropy": 0.7336708754301071,
"epoch": 2.947867298578199,
"grad_norm": 0.7614916563034058,
"learning_rate": 3.652292287185979e-06,
"loss": 0.737271785736084,
"mean_token_accuracy": 0.8161173611879349,
"num_tokens": 4958673.0,
"step": 312
},
{
"entropy": 0.7636863887310028,
"epoch": 2.957345971563981,
"grad_norm": 0.7981767654418945,
"learning_rate": 3.592100910419738e-06,
"loss": 0.7631924748420715,
"mean_token_accuracy": 0.8060728013515472,
"num_tokens": 4974424.0,
"step": 313
},
{
"entropy": 0.7589444667100906,
"epoch": 2.966824644549763,
"grad_norm": 0.7552541494369507,
"learning_rate": 3.532300879828013e-06,
"loss": 0.7564074993133545,
"mean_token_accuracy": 0.8108653426170349,
"num_tokens": 4990180.0,
"step": 314
},
{
"entropy": 0.7788964807987213,
"epoch": 2.976303317535545,
"grad_norm": 0.7552350163459778,
"learning_rate": 3.4728958475459052e-06,
"loss": 0.785426139831543,
"mean_token_accuracy": 0.7967838943004608,
"num_tokens": 5006243.0,
"step": 315
},
{
"entropy": 0.7222247868776321,
"epoch": 2.985781990521327,
"grad_norm": 0.8034160733222961,
"learning_rate": 3.413889441584999e-06,
"loss": 0.723943829536438,
"mean_token_accuracy": 0.8158971667289734,
"num_tokens": 5022940.0,
"step": 316
},
{
"entropy": 0.7745039016008377,
"epoch": 2.995260663507109,
"grad_norm": 0.7772722840309143,
"learning_rate": 3.355285265611784e-06,
"loss": 0.7676458954811096,
"mean_token_accuracy": 0.8127903640270233,
"num_tokens": 5039298.0,
"step": 317
},
{
"entropy": 0.8236450850963593,
"epoch": 3.0,
"grad_norm": 1.20962655544281,
"learning_rate": 3.297086898727583e-06,
"loss": 0.8019519448280334,
"mean_token_accuracy": 0.8037053942680359,
"num_tokens": 5046658.0,
"step": 318
},
{
"entropy": 0.7478779852390289,
"epoch": 3.009478672985782,
"grad_norm": 0.7697426080703735,
"learning_rate": 3.2392978952499553e-06,
"loss": 0.7716673612594604,
"mean_token_accuracy": 0.8018212914466858,
"num_tokens": 5061883.0,
"step": 319
},
{
"entropy": 0.785962700843811,
"epoch": 3.018957345971564,
"grad_norm": 0.81953364610672,
"learning_rate": 3.1819217844956216e-06,
"loss": 0.7965259552001953,
"mean_token_accuracy": 0.8047526180744171,
"num_tokens": 5077195.0,
"step": 320
},
{
"entropy": 0.7776083499193192,
"epoch": 3.028436018957346,
"grad_norm": 0.6892860531806946,
"learning_rate": 3.1249620705649417e-06,
"loss": 0.7372241020202637,
"mean_token_accuracy": 0.8053242713212967,
"num_tokens": 5092842.0,
"step": 321
},
{
"entropy": 0.7553633451461792,
"epoch": 3.037914691943128,
"grad_norm": 0.7420003414154053,
"learning_rate": 3.0684222321278824e-06,
"loss": 0.7217973470687866,
"mean_token_accuracy": 0.8165572732686996,
"num_tokens": 5107920.0,
"step": 322
},
{
"entropy": 0.8048459887504578,
"epoch": 3.0473933649289098,
"grad_norm": 0.7054750919342041,
"learning_rate": 3.0123057222115835e-06,
"loss": 0.8139917850494385,
"mean_token_accuracy": 0.7946743965148926,
"num_tokens": 5123998.0,
"step": 323
},
{
"entropy": 0.6766252219676971,
"epoch": 3.056872037914692,
"grad_norm": 0.6615550518035889,
"learning_rate": 2.956615967989479e-06,
"loss": 0.6536415219306946,
"mean_token_accuracy": 0.8297200500965118,
"num_tokens": 5140358.0,
"step": 324
},
{
"entropy": 0.7243975102901459,
"epoch": 3.066350710900474,
"grad_norm": 0.7182250618934631,
"learning_rate": 2.9013563705719673e-06,
"loss": 0.712523341178894,
"mean_token_accuracy": 0.819568395614624,
"num_tokens": 5156707.0,
"step": 325
},
{
"entropy": 0.743099570274353,
"epoch": 3.075829383886256,
"grad_norm": 0.765571653842926,
"learning_rate": 2.846530304798727e-06,
"loss": 0.7095881700515747,
"mean_token_accuracy": 0.8208495527505875,
"num_tokens": 5172592.0,
"step": 326
},
{
"entropy": 0.7437466382980347,
"epoch": 3.085308056872038,
"grad_norm": 0.7301930785179138,
"learning_rate": 2.7921411190325753e-06,
"loss": 0.741960883140564,
"mean_token_accuracy": 0.8145260065793991,
"num_tokens": 5188198.0,
"step": 327
},
{
"entropy": 0.7824875563383102,
"epoch": 3.09478672985782,
"grad_norm": 0.7977142930030823,
"learning_rate": 2.73819213495501e-06,
"loss": 0.7875393629074097,
"mean_token_accuracy": 0.8107055574655533,
"num_tokens": 5202724.0,
"step": 328
},
{
"entropy": 0.7904134392738342,
"epoch": 3.104265402843602,
"grad_norm": 0.7295860648155212,
"learning_rate": 2.6846866473633126e-06,
"loss": 0.7971727848052979,
"mean_token_accuracy": 0.7985574156045914,
"num_tokens": 5218061.0,
"step": 329
},
{
"entropy": 0.7173660099506378,
"epoch": 3.1137440758293837,
"grad_norm": 0.7091414928436279,
"learning_rate": 2.6316279239693467e-06,
"loss": 0.7103780508041382,
"mean_token_accuracy": 0.8185557723045349,
"num_tokens": 5234608.0,
"step": 330
},
{
"entropy": 0.6958528906106949,
"epoch": 3.123222748815166,
"grad_norm": 0.7575947642326355,
"learning_rate": 2.579019205199992e-06,
"loss": 0.6730659604072571,
"mean_token_accuracy": 0.8267861753702164,
"num_tokens": 5250548.0,
"step": 331
},
{
"entropy": 0.7260988652706146,
"epoch": 3.132701421800948,
"grad_norm": 0.6860673427581787,
"learning_rate": 2.5268637039992296e-06,
"loss": 0.7105354070663452,
"mean_token_accuracy": 0.8181567341089249,
"num_tokens": 5266449.0,
"step": 332
},
{
"entropy": 0.6991846412420273,
"epoch": 3.1421800947867298,
"grad_norm": 0.8046501874923706,
"learning_rate": 2.4751646056319334e-06,
"loss": 0.6967523097991943,
"mean_token_accuracy": 0.8243842422962189,
"num_tokens": 5282201.0,
"step": 333
},
{
"entropy": 0.8095178604125977,
"epoch": 3.1516587677725116,
"grad_norm": 0.743904173374176,
"learning_rate": 2.4239250674893345e-06,
"loss": 0.8167105913162231,
"mean_token_accuracy": 0.7961531728506088,
"num_tokens": 5298517.0,
"step": 334
},
{
"entropy": 0.8006535023450851,
"epoch": 3.161137440758294,
"grad_norm": 0.79604572057724,
"learning_rate": 2.373148218896182e-06,
"loss": 0.8157421350479126,
"mean_token_accuracy": 0.7973491996526718,
"num_tokens": 5313538.0,
"step": 335
},
{
"entropy": 0.760479748249054,
"epoch": 3.170616113744076,
"grad_norm": 0.7916611433029175,
"learning_rate": 2.32283716091964e-06,
"loss": 0.761253833770752,
"mean_token_accuracy": 0.8064158409833908,
"num_tokens": 5329238.0,
"step": 336
},
{
"entropy": 0.7230188250541687,
"epoch": 3.1800947867298577,
"grad_norm": 0.7429656982421875,
"learning_rate": 2.2729949661798876e-06,
"loss": 0.7148363590240479,
"mean_token_accuracy": 0.8210389465093613,
"num_tokens": 5344858.0,
"step": 337
},
{
"entropy": 0.7517809122800827,
"epoch": 3.18957345971564,
"grad_norm": 0.7186969518661499,
"learning_rate": 2.2236246786624794e-06,
"loss": 0.7122472524642944,
"mean_token_accuracy": 0.817431777715683,
"num_tokens": 5361465.0,
"step": 338
},
{
"entropy": 0.7896288186311722,
"epoch": 3.199052132701422,
"grad_norm": 0.7846527695655823,
"learning_rate": 2.174729313532433e-06,
"loss": 0.7668826580047607,
"mean_token_accuracy": 0.8099963366985321,
"num_tokens": 5376610.0,
"step": 339
},
{
"entropy": 0.7389908730983734,
"epoch": 3.2085308056872037,
"grad_norm": 0.7456527352333069,
"learning_rate": 2.1263118569500797e-06,
"loss": 0.7495481967926025,
"mean_token_accuracy": 0.8134677708148956,
"num_tokens": 5393335.0,
"step": 340
},
{
"entropy": 0.7424813508987427,
"epoch": 3.2180094786729856,
"grad_norm": 0.6882187724113464,
"learning_rate": 2.078375265888707e-06,
"loss": 0.7607920169830322,
"mean_token_accuracy": 0.8078367114067078,
"num_tokens": 5410633.0,
"step": 341
},
{
"entropy": 0.6920035630464554,
"epoch": 3.227488151658768,
"grad_norm": 0.7300297021865845,
"learning_rate": 2.0309224679539554e-06,
"loss": 0.6803273558616638,
"mean_token_accuracy": 0.8286775499582291,
"num_tokens": 5426722.0,
"step": 342
},
{
"entropy": 0.763126015663147,
"epoch": 3.2369668246445498,
"grad_norm": 0.7958213090896606,
"learning_rate": 1.9839563612050273e-06,
"loss": 0.7651641368865967,
"mean_token_accuracy": 0.8074810951948166,
"num_tokens": 5442247.0,
"step": 343
},
{
"entropy": 0.7819797247648239,
"epoch": 3.2464454976303316,
"grad_norm": 0.7771494388580322,
"learning_rate": 1.937479813977703e-06,
"loss": 0.7724591493606567,
"mean_token_accuracy": 0.8058855086565018,
"num_tokens": 5458691.0,
"step": 344
},
{
"entropy": 0.7670372873544693,
"epoch": 3.2559241706161135,
"grad_norm": 0.7733996510505676,
"learning_rate": 1.8914956647091497e-06,
"loss": 0.77671217918396,
"mean_token_accuracy": 0.8054281175136566,
"num_tokens": 5474026.0,
"step": 345
},
{
"entropy": 0.816511332988739,
"epoch": 3.265402843601896,
"grad_norm": 0.835043728351593,
"learning_rate": 1.846006721764586e-06,
"loss": 0.820414125919342,
"mean_token_accuracy": 0.7958302795886993,
"num_tokens": 5489864.0,
"step": 346
},
{
"entropy": 0.7935537993907928,
"epoch": 3.2748815165876777,
"grad_norm": 0.7735744118690491,
"learning_rate": 1.8010157632657544e-06,
"loss": 0.7576620578765869,
"mean_token_accuracy": 0.8064090013504028,
"num_tokens": 5505997.0,
"step": 347
},
{
"entropy": 0.808742880821228,
"epoch": 3.2843601895734595,
"grad_norm": 0.7906506657600403,
"learning_rate": 1.7565255369212664e-06,
"loss": 0.7947472333908081,
"mean_token_accuracy": 0.7945470958948135,
"num_tokens": 5521795.0,
"step": 348
},
{
"entropy": 0.7717082351446152,
"epoch": 3.293838862559242,
"grad_norm": 0.72123783826828,
"learning_rate": 1.7125387598587862e-06,
"loss": 0.785935640335083,
"mean_token_accuracy": 0.805345818400383,
"num_tokens": 5538348.0,
"step": 349
},
{
"entropy": 0.7952146828174591,
"epoch": 3.3033175355450237,
"grad_norm": 0.821018397808075,
"learning_rate": 1.6690581184590859e-06,
"loss": 0.8067296743392944,
"mean_token_accuracy": 0.7967140823602676,
"num_tokens": 5553641.0,
"step": 350
},
{
"entropy": 0.7633357793092728,
"epoch": 3.3127962085308056,
"grad_norm": 0.7896275520324707,
"learning_rate": 1.6260862681919965e-06,
"loss": 0.7645949721336365,
"mean_token_accuracy": 0.8082640618085861,
"num_tokens": 5569753.0,
"step": 351
},
{
"entropy": 0.749266117811203,
"epoch": 3.322274881516588,
"grad_norm": 0.7245807647705078,
"learning_rate": 1.583625833454211e-06,
"loss": 0.7372440099716187,
"mean_token_accuracy": 0.8141664415597916,
"num_tokens": 5586188.0,
"step": 352
},
{
"entropy": 0.7516562193632126,
"epoch": 3.3317535545023698,
"grad_norm": 0.7951227426528931,
"learning_rate": 1.5416794074090258e-06,
"loss": 0.7702144384384155,
"mean_token_accuracy": 0.804230198264122,
"num_tokens": 5601649.0,
"step": 353
},
{
"entropy": 0.7971460670232773,
"epoch": 3.3412322274881516,
"grad_norm": 0.8076654076576233,
"learning_rate": 1.500249551827958e-06,
"loss": 0.7622384428977966,
"mean_token_accuracy": 0.8001376986503601,
"num_tokens": 5616188.0,
"step": 354
},
{
"entropy": 0.7576956003904343,
"epoch": 3.3507109004739335,
"grad_norm": 0.7258307933807373,
"learning_rate": 1.459338796934293e-06,
"loss": 0.7539409399032593,
"mean_token_accuracy": 0.8105089962482452,
"num_tokens": 5635082.0,
"step": 355
},
{
"entropy": 0.7855661660432816,
"epoch": 3.360189573459716,
"grad_norm": 0.7409619688987732,
"learning_rate": 1.4189496412485593e-06,
"loss": 0.7873318195343018,
"mean_token_accuracy": 0.8049680441617966,
"num_tokens": 5652264.0,
"step": 356
},
{
"entropy": 0.8110678791999817,
"epoch": 3.3696682464454977,
"grad_norm": 0.7768043279647827,
"learning_rate": 1.3790845514359363e-06,
"loss": 0.7789173126220703,
"mean_token_accuracy": 0.8034784644842148,
"num_tokens": 5668104.0,
"step": 357
},
{
"entropy": 0.7338491827249527,
"epoch": 3.3791469194312795,
"grad_norm": 0.7567062973976135,
"learning_rate": 1.339745962155613e-06,
"loss": 0.742594838142395,
"mean_token_accuracy": 0.8137776404619217,
"num_tokens": 5684769.0,
"step": 358
},
{
"entropy": 0.6945641934871674,
"epoch": 3.3886255924170614,
"grad_norm": 0.6800957918167114,
"learning_rate": 1.300936275912098e-06,
"loss": 0.7036481499671936,
"mean_token_accuracy": 0.8183564692735672,
"num_tokens": 5702754.0,
"step": 359
},
{
"entropy": 0.7719693928956985,
"epoch": 3.3981042654028437,
"grad_norm": 0.8576886653900146,
"learning_rate": 1.2626578629084784e-06,
"loss": 0.77418053150177,
"mean_token_accuracy": 0.8084691315889359,
"num_tokens": 5718742.0,
"step": 360
},
{
"entropy": 0.7500364035367966,
"epoch": 3.4075829383886256,
"grad_norm": 0.7127568125724792,
"learning_rate": 1.224913060901688e-06,
"loss": 0.7334154844284058,
"mean_token_accuracy": 0.8134698867797852,
"num_tokens": 5734815.0,
"step": 361
},
{
"entropy": 0.7798032909631729,
"epoch": 3.4170616113744074,
"grad_norm": 0.7596098184585571,
"learning_rate": 1.1877041750597174e-06,
"loss": 0.7881962060928345,
"mean_token_accuracy": 0.7973756939172745,
"num_tokens": 5751466.0,
"step": 362
},
{
"entropy": 0.7736654728651047,
"epoch": 3.4265402843601898,
"grad_norm": 0.8830217719078064,
"learning_rate": 1.1510334778208332e-06,
"loss": 0.7824318408966064,
"mean_token_accuracy": 0.8039457201957703,
"num_tokens": 5766604.0,
"step": 363
},
{
"entropy": 0.7304582744836807,
"epoch": 3.4360189573459716,
"grad_norm": 0.7475463151931763,
"learning_rate": 1.1149032087548117e-06,
"loss": 0.6829269528388977,
"mean_token_accuracy": 0.8150535821914673,
"num_tokens": 5782424.0,
"step": 364
},
{
"entropy": 0.7141664177179337,
"epoch": 3.4454976303317535,
"grad_norm": 0.8083080053329468,
"learning_rate": 1.0793155744261352e-06,
"loss": 0.7171058058738708,
"mean_token_accuracy": 0.8167122006416321,
"num_tokens": 5798061.0,
"step": 365
},
{
"entropy": 0.7113132625818253,
"epoch": 3.4549763033175354,
"grad_norm": 0.7315077185630798,
"learning_rate": 1.0442727482592596e-06,
"loss": 0.7110543847084045,
"mean_token_accuracy": 0.827058881521225,
"num_tokens": 5813964.0,
"step": 366
},
{
"entropy": 0.794712170958519,
"epoch": 3.4644549763033177,
"grad_norm": 0.6843957901000977,
"learning_rate": 1.0097768704058542e-06,
"loss": 0.7619876265525818,
"mean_token_accuracy": 0.8037076890468597,
"num_tokens": 5830167.0,
"step": 367
},
{
"entropy": 0.7343897819519043,
"epoch": 3.4739336492890995,
"grad_norm": 0.7758681178092957,
"learning_rate": 9.75830047614117e-07,
"loss": 0.7308058738708496,
"mean_token_accuracy": 0.8147377222776413,
"num_tokens": 5847113.0,
"step": 368
},
{
"entropy": 0.7304037660360336,
"epoch": 3.4834123222748814,
"grad_norm": 0.8322577476501465,
"learning_rate": 9.424343531000968e-07,
"loss": 0.7368799448013306,
"mean_token_accuracy": 0.8109440207481384,
"num_tokens": 5863085.0,
"step": 369
},
{
"entropy": 0.6853264719247818,
"epoch": 3.4928909952606633,
"grad_norm": 0.8498042821884155,
"learning_rate": 9.095918264210779e-07,
"loss": 0.67824387550354,
"mean_token_accuracy": 0.822025716304779,
"num_tokens": 5879230.0,
"step": 370
},
{
"entropy": 0.7402042001485825,
"epoch": 3.5023696682464456,
"grad_norm": 0.7984394431114197,
"learning_rate": 8.773044733510338e-07,
"loss": 0.7216671705245972,
"mean_token_accuracy": 0.8155932128429413,
"num_tokens": 5893567.0,
"step": 371
},
{
"entropy": 0.7720407843589783,
"epoch": 3.5118483412322274,
"grad_norm": 0.7581180334091187,
"learning_rate": 8.455742657581067e-07,
"loss": 0.7469707727432251,
"mean_token_accuracy": 0.8138709366321564,
"num_tokens": 5908988.0,
"step": 372
},
{
"entropy": 0.7801290154457092,
"epoch": 3.5213270142180093,
"grad_norm": 0.8222272992134094,
"learning_rate": 8.144031414842012e-07,
"loss": 0.7962188124656677,
"mean_token_accuracy": 0.7990644872188568,
"num_tokens": 5924822.0,
"step": 373
},
{
"entropy": 0.8463993817567825,
"epoch": 3.5308056872037916,
"grad_norm": 0.827546238899231,
"learning_rate": 7.837930042266262e-07,
"loss": 0.8375414609909058,
"mean_token_accuracy": 0.7954217940568924,
"num_tokens": 5940568.0,
"step": 374
},
{
"entropy": 0.792610689997673,
"epoch": 3.5402843601895735,
"grad_norm": 0.7309672832489014,
"learning_rate": 7.537457234218271e-07,
"loss": 0.8083318471908569,
"mean_token_accuracy": 0.8017033785581589,
"num_tokens": 5956348.0,
"step": 375
},
{
"entropy": 0.7301171571016312,
"epoch": 3.5497630331753554,
"grad_norm": 0.6857864260673523,
"learning_rate": 7.242631341312234e-07,
"loss": 0.691540002822876,
"mean_token_accuracy": 0.8231353312730789,
"num_tokens": 5971747.0,
"step": 376
},
{
"entropy": 0.8102651387453079,
"epoch": 3.5592417061611377,
"grad_norm": 0.6648951768875122,
"learning_rate": 6.953470369291349e-07,
"loss": 0.7997016906738281,
"mean_token_accuracy": 0.8020312041044235,
"num_tokens": 5988831.0,
"step": 377
},
{
"entropy": 0.7594100683927536,
"epoch": 3.5687203791469195,
"grad_norm": 0.7718584537506104,
"learning_rate": 6.669991977928103e-07,
"loss": 0.7339326739311218,
"mean_token_accuracy": 0.8102892637252808,
"num_tokens": 6004610.0,
"step": 378
},
{
"entropy": 0.7296763807535172,
"epoch": 3.5781990521327014,
"grad_norm": 0.7523729801177979,
"learning_rate": 6.392213479945852e-07,
"loss": 0.7240911722183228,
"mean_token_accuracy": 0.8162074387073517,
"num_tokens": 6019991.0,
"step": 379
},
{
"entropy": 0.8269750326871872,
"epoch": 3.5876777251184833,
"grad_norm": 0.7713115811347961,
"learning_rate": 6.120151839961363e-07,
"loss": 0.8135149478912354,
"mean_token_accuracy": 0.7890425622463226,
"num_tokens": 6036037.0,
"step": 380
},
{
"entropy": 0.7625905424356461,
"epoch": 3.597156398104265,
"grad_norm": 0.8032811880111694,
"learning_rate": 5.853823673448877e-07,
"loss": 0.7643755078315735,
"mean_token_accuracy": 0.8126851618289948,
"num_tokens": 6051369.0,
"step": 381
},
{
"entropy": 0.7515154629945755,
"epoch": 3.6066350710900474,
"grad_norm": 0.693193793296814,
"learning_rate": 5.593245245725231e-07,
"loss": 0.7446131706237793,
"mean_token_accuracy": 0.8108726441860199,
"num_tokens": 6068292.0,
"step": 382
},
{
"entropy": 0.8014443516731262,
"epoch": 3.6161137440758293,
"grad_norm": 0.8282812833786011,
"learning_rate": 5.33843247095659e-07,
"loss": 0.7868412137031555,
"mean_token_accuracy": 0.8031998574733734,
"num_tokens": 6083746.0,
"step": 383
},
{
"entropy": 0.7436373382806778,
"epoch": 3.625592417061611,
"grad_norm": 0.7708470225334167,
"learning_rate": 5.089400911186504e-07,
"loss": 0.7162569165229797,
"mean_token_accuracy": 0.8107694387435913,
"num_tokens": 6099838.0,
"step": 384
},
{
"entropy": 0.7486245483160019,
"epoch": 3.6350710900473935,
"grad_norm": 0.7626857161521912,
"learning_rate": 4.846165775385459e-07,
"loss": 0.7347097396850586,
"mean_token_accuracy": 0.8151373565196991,
"num_tokens": 6115229.0,
"step": 385
},
{
"entropy": 0.7286288738250732,
"epoch": 3.6445497630331753,
"grad_norm": 0.8355617523193359,
"learning_rate": 4.6087419185220973e-07,
"loss": 0.7086689472198486,
"mean_token_accuracy": 0.811622828245163,
"num_tokens": 6131495.0,
"step": 386
},
{
"entropy": 0.7331616282463074,
"epoch": 3.654028436018957,
"grad_norm": 0.8304053544998169,
"learning_rate": 4.3771438406559173e-07,
"loss": 0.6994441747665405,
"mean_token_accuracy": 0.8185713589191437,
"num_tokens": 6147229.0,
"step": 387
},
{
"entropy": 0.8535281419754028,
"epoch": 3.6635071090047395,
"grad_norm": 0.8453361988067627,
"learning_rate": 4.1513856860517676e-07,
"loss": 0.8577812314033508,
"mean_token_accuracy": 0.7872657924890518,
"num_tokens": 6162417.0,
"step": 388
},
{
"entropy": 0.7746417671442032,
"epoch": 3.6729857819905214,
"grad_norm": 0.7792283892631531,
"learning_rate": 3.931481242315993e-07,
"loss": 0.7408183217048645,
"mean_token_accuracy": 0.8043603003025055,
"num_tokens": 6177688.0,
"step": 389
},
{
"entropy": 0.8118592947721481,
"epoch": 3.6824644549763033,
"grad_norm": 0.7837164998054504,
"learning_rate": 3.7174439395543884e-07,
"loss": 0.8173757791519165,
"mean_token_accuracy": 0.7939213812351227,
"num_tokens": 6193823.0,
"step": 390
},
{
"entropy": 0.744156688451767,
"epoch": 3.6919431279620856,
"grad_norm": 0.7805715799331665,
"learning_rate": 3.5092868495520294e-07,
"loss": 0.7480678558349609,
"mean_token_accuracy": 0.8105067759752274,
"num_tokens": 6209504.0,
"step": 391
},
{
"entropy": 0.7389348596334457,
"epoch": 3.7014218009478674,
"grad_norm": 0.7441927790641785,
"learning_rate": 3.3070226849749367e-07,
"loss": 0.7176021337509155,
"mean_token_accuracy": 0.8149614036083221,
"num_tokens": 6226077.0,
"step": 392
},
{
"entropy": 0.8332685679197311,
"epoch": 3.7109004739336493,
"grad_norm": 0.7937362194061279,
"learning_rate": 3.110663798593616e-07,
"loss": 0.8166630864143372,
"mean_token_accuracy": 0.7967136949300766,
"num_tokens": 6241257.0,
"step": 393
},
{
"entropy": 0.8039902150630951,
"epoch": 3.720379146919431,
"grad_norm": 0.7978947162628174,
"learning_rate": 2.920222182528754e-07,
"loss": 0.7984297871589661,
"mean_token_accuracy": 0.798541709780693,
"num_tokens": 6257243.0,
"step": 394
},
{
"entropy": 0.8189259767532349,
"epoch": 3.729857819905213,
"grad_norm": 0.8544439673423767,
"learning_rate": 2.735709467518699e-07,
"loss": 0.8043655753135681,
"mean_token_accuracy": 0.8035803735256195,
"num_tokens": 6271493.0,
"step": 395
},
{
"entropy": 0.7036288529634476,
"epoch": 3.7393364928909953,
"grad_norm": 0.7824522852897644,
"learning_rate": 2.557136922209269e-07,
"loss": 0.7129417061805725,
"mean_token_accuracy": 0.818129375576973,
"num_tokens": 6287353.0,
"step": 396
},
{
"entropy": 0.7242246717214584,
"epoch": 3.748815165876777,
"grad_norm": 0.8705534934997559,
"learning_rate": 2.384515452465475e-07,
"loss": 0.7420156002044678,
"mean_token_accuracy": 0.8138059079647064,
"num_tokens": 6303511.0,
"step": 397
},
{
"entropy": 0.7527247965335846,
"epoch": 3.758293838862559,
"grad_norm": 0.756104588508606,
"learning_rate": 2.2178556007054876e-07,
"loss": 0.7536898851394653,
"mean_token_accuracy": 0.8014432936906815,
"num_tokens": 6319766.0,
"step": 398
},
{
"entropy": 0.7517576813697815,
"epoch": 3.7677725118483414,
"grad_norm": 0.7568046450614929,
"learning_rate": 2.0571675452567997e-07,
"loss": 0.7543550729751587,
"mean_token_accuracy": 0.8108285665512085,
"num_tokens": 6336445.0,
"step": 399
},
{
"entropy": 0.7451558709144592,
"epoch": 3.7772511848341233,
"grad_norm": 0.738186240196228,
"learning_rate": 1.9024610997345872e-07,
"loss": 0.7499051094055176,
"mean_token_accuracy": 0.8100962191820145,
"num_tokens": 6352720.0,
"step": 400
},
{
"entropy": 0.7548184543848038,
"epoch": 3.786729857819905,
"grad_norm": 0.8254923224449158,
"learning_rate": 1.7537457124423896e-07,
"loss": 0.7180478572845459,
"mean_token_accuracy": 0.8143037408590317,
"num_tokens": 6368238.0,
"step": 401
},
{
"entropy": 0.765844538807869,
"epoch": 3.7962085308056874,
"grad_norm": 0.765632152557373,
"learning_rate": 1.6110304657950714e-07,
"loss": 0.7474029660224915,
"mean_token_accuracy": 0.8108795136213303,
"num_tokens": 6384535.0,
"step": 402
},
{
"entropy": 0.8129818141460419,
"epoch": 3.8056872037914693,
"grad_norm": 0.8012934923171997,
"learning_rate": 1.474324075764111e-07,
"loss": 0.8102051019668579,
"mean_token_accuracy": 0.8035916984081268,
"num_tokens": 6399709.0,
"step": 403
},
{
"entropy": 0.7898041009902954,
"epoch": 3.815165876777251,
"grad_norm": 0.7556485533714294,
"learning_rate": 1.3436348913453578e-07,
"loss": 0.7623159885406494,
"mean_token_accuracy": 0.7990731745958328,
"num_tokens": 6415199.0,
"step": 404
},
{
"entropy": 0.7221266180276871,
"epoch": 3.824644549763033,
"grad_norm": 0.6958292126655579,
"learning_rate": 1.2189708940490653e-07,
"loss": 0.7062927484512329,
"mean_token_accuracy": 0.8177174478769302,
"num_tokens": 6431083.0,
"step": 405
},
{
"entropy": 0.7521060258150101,
"epoch": 3.834123222748815,
"grad_norm": 0.6797523498535156,
"learning_rate": 1.1003396974124892e-07,
"loss": 0.7314427495002747,
"mean_token_accuracy": 0.821082592010498,
"num_tokens": 6447089.0,
"step": 406
},
{
"entropy": 0.7682420760393143,
"epoch": 3.843601895734597,
"grad_norm": 0.7217481136322021,
"learning_rate": 9.877485465349057e-08,
"loss": 0.7689416408538818,
"mean_token_accuracy": 0.8068108856678009,
"num_tokens": 6464283.0,
"step": 407
},
{
"entropy": 0.6823010891675949,
"epoch": 3.853080568720379,
"grad_norm": 0.8023832440376282,
"learning_rate": 8.812043176351093e-08,
"loss": 0.6633629202842712,
"mean_token_accuracy": 0.8254884779453278,
"num_tokens": 6481156.0,
"step": 408
},
{
"entropy": 0.7826178818941116,
"epoch": 3.862559241706161,
"grad_norm": 0.7779275178909302,
"learning_rate": 7.807135176314707e-08,
"loss": 0.761837363243103,
"mean_token_accuracy": 0.80552838742733,
"num_tokens": 6496497.0,
"step": 409
},
{
"entropy": 0.7546826750040054,
"epoch": 3.8720379146919433,
"grad_norm": 0.824887752532959,
"learning_rate": 6.862822837445882e-08,
"loss": 0.7290815114974976,
"mean_token_accuracy": 0.8155907243490219,
"num_tokens": 6511042.0,
"step": 410
},
{
"entropy": 0.8004759103059769,
"epoch": 3.881516587677725,
"grad_norm": 0.7989736795425415,
"learning_rate": 5.979163831223988e-08,
"loss": 0.7977555990219116,
"mean_token_accuracy": 0.8019419759511948,
"num_tokens": 6527738.0,
"step": 411
},
{
"entropy": 0.7018654048442841,
"epoch": 3.890995260663507,
"grad_norm": 0.7676022052764893,
"learning_rate": 5.1562121248803776e-08,
"loss": 0.684330940246582,
"mean_token_accuracy": 0.8249032348394394,
"num_tokens": 6543150.0,
"step": 412
},
{
"entropy": 0.7496612817049026,
"epoch": 3.9004739336492893,
"grad_norm": 0.7940666675567627,
"learning_rate": 4.394017978101905e-08,
"loss": 0.7147549390792847,
"mean_token_accuracy": 0.8174268007278442,
"num_tokens": 6559353.0,
"step": 413
},
{
"entropy": 0.7699935585260391,
"epoch": 3.909952606635071,
"grad_norm": 0.8074795007705688,
"learning_rate": 3.6926279399617236e-08,
"loss": 0.7725629210472107,
"mean_token_accuracy": 0.8059059679508209,
"num_tokens": 6575235.0,
"step": 414
},
{
"entropy": 0.6986105889081955,
"epoch": 3.919431279620853,
"grad_norm": 0.701176106929779,
"learning_rate": 3.0520848460765525e-08,
"loss": 0.686797022819519,
"mean_token_accuracy": 0.8240027725696564,
"num_tokens": 6593020.0,
"step": 415
},
{
"entropy": 0.7832924127578735,
"epoch": 3.9289099526066353,
"grad_norm": 0.7602893710136414,
"learning_rate": 2.4724278159898863e-08,
"loss": 0.778874397277832,
"mean_token_accuracy": 0.8027657568454742,
"num_tokens": 6610854.0,
"step": 416
| }, | |
| { | |
| "entropy": 0.7910645306110382, | |
| "epoch": 3.938388625592417, | |
| "grad_norm": 0.759920597076416, | |
| "learning_rate": 1.9536922507841227e-08, | |
| "loss": 0.7716935873031616, | |
| "mean_token_accuracy": 0.8021509498357773, | |
| "num_tokens": 6626142.0, | |
| "step": 417 | |
| }, | |
| { | |
| "entropy": 0.8243566155433655, | |
| "epoch": 3.947867298578199, | |
| "grad_norm": 0.7890955805778503, | |
| "learning_rate": 1.495909830917075e-08, | |
| "loss": 0.8134855031967163, | |
| "mean_token_accuracy": 0.7995802313089371, | |
| "num_tokens": 6642353.0, | |
| "step": 418 | |
| }, | |
| { | |
| "entropy": 0.7705054581165314, | |
| "epoch": 3.957345971563981, | |
| "grad_norm": 0.7859002351760864, | |
| "learning_rate": 1.099108514288627e-08, | |
| "loss": 0.7462877035140991, | |
| "mean_token_accuracy": 0.8144356906414032, | |
| "num_tokens": 6658562.0, | |
| "step": 419 | |
| }, | |
| { | |
| "entropy": 0.7886055111885071, | |
| "epoch": 3.966824644549763, | |
| "grad_norm": 0.7850826382637024, | |
| "learning_rate": 7.633125345317682e-09, | |
| "loss": 0.8090062737464905, | |
| "mean_token_accuracy": 0.798252746462822, | |
| "num_tokens": 6674374.0, | |
| "step": 420 | |
| }, | |
| { | |
| "entropy": 0.7322007268667221, | |
| "epoch": 3.976303317535545, | |
| "grad_norm": 0.7463184595108032, | |
| "learning_rate": 4.885423995341088e-09, | |
| "loss": 0.7309263944625854, | |
| "mean_token_accuracy": 0.8136301338672638, | |
| "num_tokens": 6690686.0, | |
| "step": 421 | |
| }, | |
| { | |
| "entropy": 0.8167396634817123, | |
| "epoch": 3.985781990521327, | |
| "grad_norm": 0.8257619738578796, | |
| "learning_rate": 2.7481489018410525e-09, | |
| "loss": 0.8237687945365906, | |
| "mean_token_accuracy": 0.7974095642566681, | |
| "num_tokens": 6705420.0, | |
| "step": 422 | |
| }, | |
| { | |
| "entropy": 0.7956383675336838, | |
| "epoch": 3.995260663507109, | |
| "grad_norm": 0.8490376472473145, | |
| "learning_rate": 1.2214305934699078e-09, | |
| "loss": 0.7691541910171509, | |
| "mean_token_accuracy": 0.8052787631750107, | |
| "num_tokens": 6720459.0, | |
| "step": 423 | |
| }, | |
| { | |
| "entropy": 0.7545907199382782, | |
| "epoch": 4.0, | |
| "grad_norm": 1.2672629356384277, | |
| "learning_rate": 3.053623106741288e-10, | |
| "loss": 0.7667921781539917, | |
| "mean_token_accuracy": 0.7996481955051422, | |
| "num_tokens": 6728853.0, | |
| "step": 424 | |
| } | |
| ], | |
| "logging_steps": 1, | |
| "max_steps": 424, | |
| "num_input_tokens_seen": 0, | |
| "num_train_epochs": 4, | |
| "save_steps": 500, | |
| "stateful_callbacks": { | |
| "TrainerControl": { | |
| "args": { | |
| "should_epoch_stop": false, | |
| "should_evaluate": false, | |
| "should_log": false, | |
| "should_save": true, | |
| "should_training_stop": true | |
| }, | |
| "attributes": {} | |
| } | |
| }, | |
| "total_flos": 2.7217785105730765e+17, | |
| "train_batch_size": 4, | |
| "trial_name": null, | |
| "trial_params": null | |
| } | |