Benchmark
=========

Evaluation of different *base* models without fine-tuning. Each model was given one example per class and was prompted to predict the class as the next token:

.. code:: text

   Sentiment Classification into categories 'negative' or 'positive'.

   'that 's far too tragic to merit such superficial treatment '='negative'
   'that loves its characters and communicates something rather beautiful about human nature '='positive'
   ''='

Table
-----

.. csv-table::
   :header: 🤗-Model-ID,Average Accuracy,AGnews,Amazon Polarity,DBPedia,Emotion,Fnc1,IMDB,MNLI,QNLI,RTE,SST2,TREC-6,Tweet Sentiment,Wikitalk,Yahoo,Yelp
   :class: sphinx-datatable

   Qwen/Qwen2.5-32B,0.802580741896532,0.868,0.963,0.956,0.487,0.414346454762276,0.967,0.872,0.862,0.758122743682311,0.952981651376147,0.812,0.683,0.838260278627251,0.624,0.981
   google/gemma-2-2b,0.666815090408935,0.823,0.944,0.935,0.446,0.304029613613792,0.937,0.405,0.54,0.581227436823105,0.913990825688073,0.542,0.563,0.609978480009061,0.609,0.849
   meta-llama/Meta-Llama-3.1-8B,0.727249223386589,0.844,0.954,0.952,0.487,0.265591913157836,0.923,0.568,0.744,0.729241877256318,0.924311926605505,0.664,0.593,0.676592633779178,0.619,0.965
   mistralai/Mistral-7B-v0.3,,0.859,0.956,0.895,0.423,,0.939,0.57,0.672,0.729241877256318,0.922018348623853,0.698,0.624,0.668736292589504,0.637,0.945
   tiiuae/Falcon3-1B-Base,0.670035048240265,0.808,0.909,0.878,0.453,0.291803564927297,0.879,0.416,0.562,0.577617328519856,0.892201834862385,0.63,0.562,0.658902995294433,0.593,0.94
   lmsys/vicuna-13b-v1.5,0.71634232489797,0.747,0.945,0.926,0.488,0.263846661564714,0.909,0.582,0.748,0.76173285198556,0.931192660550459,0.58,0.612,0.725362699368816,0.579,0.947
   tiiuae/Falcon3-7B-Base,0.722606069305748,0.772,0.949,0.936,0.489,0.263597446051807,0.939,0.667,0.731,0.743682310469314,0.935779816513762,0.588,0.634,0.676031466551344,0.567,0.948
   mistralai/Mistral-Nemo-Base-2407,0.739890499381605,0.884,0.954,0.939,0.426,0.258144920213886,0.946,0.679,0.755,0.754512635379061,0.936926605504587,0.646,0.623,0.705773329626541,0.627,0.964
   tiiuae/Falcon3-3B-Base,0.69797446797878,0.754,0.93,0.902,0.468,0.308771085799483,0.908,0.53,0.674,0.675090252707581,0.896788990825688,0.716,0.595,0.564966690348953,0.584,0.963
   HuggingFaceTB/SmolLM2-1.7B,0.663083716245386,0.791,0.942,0.856,0.384,0.256327163933655,0.936,0.419,0.535,0.624548736462094,0.920871559633028,0.572,0.56,0.644508283652015,0.535,0.97
   answerdotai/ModernBERT-large,0.530367001294045,0.79,0.66,0.67,0.33,0.116628614916286,0.68,0.29,0.55,0.52,0.76,0.53,0.55,0.398876404494382,0.43,0.68
   tiiuae/falcon-mamba-7b,0.728538294959239,0.821,0.962,0.941,0.467,0.250898414184418,0.921,0.56,0.639,0.797833935018051,0.935779816513762,0.668,0.625,0.72656225867235,0.645,0.968
   lmsys/vicuna-7b-v1.5,0.717034574534011,0.869,0.957,0.925,0.502,0.258806811849408,0.948,0.537,0.717,0.696750902527076,0.920871559633028,0.506,0.618,0.717089344000659,0.621,0.962
   tiiuae/falcon-7b,0.67027734119717,0.876,0.951,0.776,0.483,0.259471551662221,0.929,0.419,0.505,0.595667870036101,0.911697247706422,0.534,0.575,0.673323448552806,0.597,0.969
   Qwen/Qwen2.5-14B,0.793427125926012,0.884,0.945,0.957,0.495,0.415383624714254,0.941,0.832,0.856,0.779783393501805,0.935779816513762,0.798,0.67,0.78046005416036,0.645,0.967
   google/gemma-2-9b,0.744717802816623,0.872,0.963,0.948,0.469,0.235040629868216,0.929,0.675,0.753,0.714801444043321,0.947247706422018,0.682,0.615,0.775677261915794,0.638,0.954
   google/gemma-2-27b,0.341126766689851,0.237,0.486,0.075,0.286,0.25,0.476,0.371,0.484,0.527075812274368,0.490825688073395,0.018,0.281,0.5,0.106,0.529
   lmsys/vicuna-33b-v1.3,0.69071294530782,0.72,0.961,0.896,0.476,0.20791904737138,0.944,0.591,0.767,0.768953068592058,0.92545871559633,0.516,0.627,0.548363348057538,0.584,0.828
   bigscience/bloom-7b1,,0.729,0.934,0.905,0.367,,0.889,0.339,0.531,0.555956678700361,0.895642201834862,0.448,0.495,0.432129655371246,0.507,0.911
   tiiuae/Falcon3-10B-Base,0.752038507723927,0.838,0.963,0.954,0.504,0.203714394688026,0.928,0.706,0.776,0.772563176895307,0.938073394495413,0.726,0.646,0.752226649780167,0.601,0.972
   tiiuae/falcon-11B,0.736912972596415,0.838,0.947,0.906,0.474,0.20857710015925,0.92,0.629,0.773,0.779783393501805,0.947247706422018,0.706,0.645,0.690086388863147,0.626,0.964
   Qwen/Qwen2.5-7B,0.769090560222669,0.827,0.943,0.958,0.493,0.325405060344208,0.93,0.803,0.799,0.790613718411552,0.954128440366973,0.75,0.674,0.710211184217301,0.616,0.963
   deepseek-ai/DeepSeek-V2-Lite,0.704019867128113,0.876,0.958,0.877,0.468,0.27148212954501,0.944,0.427,0.56,0.606498194945848,0.924311926605505,0.682,0.617,0.739005755825328,0.64,0.97
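
The ``Average Accuracy`` column appears to be the unweighted mean of the 15 per-dataset accuracies: recomputing it for the ``Qwen/Qwen2.5-32B`` row reproduces the listed value. The two rows with an empty average are also the two rows with an empty ``Fnc1`` cell. The following is a minimal sketch of that check, not the benchmark's own code; the helper name ``average_accuracy`` and the choice to return ``None`` for incomplete rows are assumptions made for illustration.

.. code:: python

   import csv
   import io

   # Two rows copied from the table: model ID, listed average,
   # then the 15 per-dataset accuracies in column order.
   ROWS = (
       "Qwen/Qwen2.5-32B,0.802580741896532,0.868,0.963,0.956,0.487,"
       "0.414346454762276,0.967,0.872,0.862,0.758122743682311,"
       "0.952981651376147,0.812,0.683,0.838260278627251,0.624,0.981\n"
       "mistralai/Mistral-7B-v0.3,,0.859,0.956,0.895,0.423,,0.939,0.57,"
       "0.672,0.729241877256318,0.922018348623853,0.698,0.624,"
       "0.668736292589504,0.637,0.945\n"
   )


   def average_accuracy(scores):
       """Hypothetical helper: mean of the cells, or None if any cell is empty."""
       if any(cell == "" for cell in scores):
           return None  # assumption: incomplete rows get no average
       return sum(float(cell) for cell in scores) / len(scores)


   recomputed = {}
   for model, listed, *scores in csv.reader(io.StringIO(ROWS)):
       recomputed[model] = average_accuracy(scores)
       print(model, "listed:", listed or "(empty)", "recomputed:", recomputed[model])

Under this reading, every dataset carries equal weight regardless of its size or number of classes, so a model's average is sensitive to a weak score on a single task such as ``Fnc1``.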