.. _tutorial_tldr:

Tutorial - Basic Usage
======================

Promptzl can be used in two ways: with `masked language models `_ and with `causal language models `_.
After running :code:`pip install -U promptzl`, you can run the following examples.

Causal Language Models
----------------------

All causal models from the 🤗-transformers library can be used for classification tasks.
The idea is to steer the model toward the correct output with a prompt that describes the classification task and condenses the decision into a single word at the end of the prompt.
The following example shows how a base model without any fine-tuning can be used for classification:

.. code-block:: python

    from datasets import load_dataset
    from promptzl import FnVbzPair, Vbz, CausalLM4Classification
    from sklearn.metrics import accuracy_score

    # Evaluate on the first 1,000 examples of the test split.
    dataset = load_dataset("mteb/amazon_polarity")['test'].select(range(1000))

    # Few-shot prompt: the model is expected to continue after the final '='
    # with one of the label words defined by the verbalizer.
    prompt = FnVbzPair(lambda e: f"""
    Product Review Classification into categories 'positive' or 'negative'.

    'Good value I love Curve and this large bottle offers great value. Highly recommended.'='positive'

    'Edge of Danger 1 star - only because that's the minimum. This book shows that famous people can publish anything.'='negative'

    '{e['text']}'=""",
        Vbz({0: ["negative"], 1: ["positive"]}))

    # A base (not instruction-tuned) causal language model.
    model = CausalLM4Classification(
        'HuggingFaceTB/SmolLM2-1.7B',
        prompt=prompt)

    output = model.classify(dataset, show_progress_bar=True, batch_size=8)
    accuracy_score(dataset['label'], output.predictions)
    # 0.935

The prompt can also be built from *Prompt-Element-Objects* (see :ref:`prompt-element-objects`), as in the masked-language-model example below.
This is the safer option: prompts built from *Prompt-Element-Objects* are automatically truncated to the model's maximum input length, which is especially useful for smaller models with a limited context length.
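To make the truncation idea concrete, here is a minimal sketch of one possible strategy. It assumes that whitespace-separated words stand in for tokens and that only the data field (the part filled with the example's text) is shortened, while the fixed template text is kept intact. Keeping the beginning of the text is likewise an assumption; keeping the end would be equally valid. The helper ``build_truncated_prompt`` is hypothetical and is not part of the promptzl API.

.. code-block:: python

    def build_truncated_prompt(prefix: str, text: str, suffix: str, max_tokens: int) -> str:
        """Hypothetical helper: fill ``prefix + text + suffix`` and shorten only
        ``text`` so that the whole prompt fits into ``max_tokens``.

        Whitespace-separated words stand in for tokens here; a real
        implementation would count tokens with the model's tokenizer.
        """
        # The template parts are never cut, so a label word expected at the
        # end of a causal-LM prompt stays in place.
        fixed = len(prefix.split()) + len(suffix.split())
        budget = max_tokens - fixed
        if budget < 0:
            raise ValueError("the template alone exceeds the context length")
        # Keep the beginning of the data and drop whatever does not fit.
        kept = " ".join(text.split()[:budget])
        return " ".join(part for part in (prefix, kept, suffix) if part)


    prompt = build_truncated_prompt(
        prefix="Review:",
        text="one two three four five six",
        suffix="Sentiment:",
        max_tokens=5,
    )
    print(prompt)  # Review: one two three Sentiment:

Cutting only the data rather than the end of the full prompt matters for causal models in particular, because the classification word has to be predicted right after the template's final text.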
Masked Language Models
----------------------

Here's a basic example (from `Schick and Schütze, 2020 `_) of how to classify text with a *masked language model*.
Instead of using a :ref:`functoin_verbalizer_pair`, we build the prompt from *prompt-element-objects*, because they truncate the data if it exceeds the model's context length.

.. code-block:: python

    from datasets import load_dataset
    from promptzl import Key, Txt, Vbz, MaskedLM4Classification
    from sklearn.metrics import accuracy_score

    dataset = load_dataset("SetFit/ag_news")['test']

    # Map each class index to the word the model should predict at the mask.
    verbalizer = Vbz({0: ["World"], 1: ["Sports"], 2: ["Business"], 3: ["Tech"]})
    # Prompt layout: "[Category: <label word>] <text>"; Key() inserts the example's text.
    prompt = Txt("[Category:") + verbalizer + Txt("] ") + Key()

    model = MaskedLM4Classification("roberta-large", prompt)

    output = model.classify(dataset, show_progress_bar=True)
    accuracy_score(dataset['label'], output.predictions)
    # 0.7986842105263158
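Conceptually, the verbalizer turns the model's prediction at the mask position into a class decision: only the scores of the label words are kept, and each class is scored by its word(s). The sketch below illustrates this with a hand-written, hypothetical probability distribution instead of real model output. The function ``classify_from_mask_probs`` is hypothetical, and averaging the probabilities of several label words per class is an assumption for illustration, not necessarily how promptzl aggregates them.

.. code-block:: python

    from typing import Dict, List


    def classify_from_mask_probs(
        mask_probs: Dict[str, float],
        verbalizer: Dict[int, List[str]],
    ) -> Dict[int, float]:
        """Map a word distribution at the mask position to normalized class
        probabilities (illustrative sketch).

        Each class is scored by the mean probability of its label words;
        the class scores are then renormalized to sum to one.
        """
        scores = {
            label: sum(mask_probs.get(word, 0.0) for word in words) / len(words)
            for label, words in verbalizer.items()
        }
        total = sum(scores.values())
        if total == 0:
            raise ValueError("no probability mass on any label word")
        return {label: score / total for label, score in scores.items()}


    # Hypothetical probabilities at the mask position for one news article.
    # Words outside the verbalizer (here "the") are simply ignored.
    mask_probs = {"World": 0.05, "Sports": 0.60, "Business": 0.10, "Tech": 0.05, "the": 0.20}
    verbalizer = {0: ["World"], 1: ["Sports"], 2: ["Business"], 3: ["Tech"]}

    class_probs = classify_from_mask_probs(mask_probs, verbalizer)
    prediction = max(class_probs, key=class_probs.get)
    print(prediction)  # 1

The same restriction to label words applies to a causal model, where the distribution is the one predicted for the next token at the end of the prompt.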