Prompt Classes

Defining a prompt is crucial for guiding the model toward the desired output. Promptzl offers two approaches to defining dynamic prompts that serve as templates: prompt-element-objects and function-verbalizer-pairs.

Prompt-element-objects are a safer but also more difficult-to-write option, while a function-verbalizer-pair is more straightforward but might lead to errors more easily.

Prompt-Element-Objects

  • Txt is used for text representation, e.g., Txt("I give you the following text: ")

  • Key is used for the keys that refer to a column in the corresponding 🤗-dataset, e.g., Key('text') (the key ‘text’ is the default key and can usually be omitted). Use multiple Key objects if multiple columns are used, e.g., Txt("Premise: ") + Key('premise') + Txt("\n\nHypothesis: ") + Key('hypothesis') + Vbz(...).

  • Vbz is used for the verbalizer, e.g., Vbz({0: ["Good"], 1: ["Bad"]}). Every prompt must contain a verbalizer object, as it is used to extract the tokens of interest from the logits over the vocabulary. It is also possible to use multiple words for a class.

    Note

    Using prompt-element-objects offers two advantages: the prompt is truncated if the context length is exceeded, and the MASK token is added automatically when using MLMs. When truncation is applied, only the data filled into the Key placeholders is truncated, so crucial parts of the surrounding prompt are not cut off. The context length is usually not a problem when dealing with modern LLMs; in that case, an FnVbzPair prompt can be used instead of prompt-element-objects.
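The truncation behavior described in the note above can be pictured with a small stand-alone sketch. It is an illustration only: whitespace-separated words stand in for tokenizer tokens, and truncate_key_data is a hypothetical helper, not a promptzl function.

```python
def truncate_key_data(prefix: str, data: str, suffix: str, max_tokens: int) -> str:
    """Shorten only the dataset text so the fixed template parts survive."""
    # Words stand in for tokens here; a real tokenizer counts differently.
    fixed = len(prefix.split()) + len(suffix.split())
    budget = max(max_tokens - fixed, 0)
    kept = " ".join(data.split()[:budget])
    return f"{prefix}{kept}{suffix}"


review = "The plot was thin but the acting was superb and the score was lovely"

# Truncating only the filled-in data keeps the trailing "It was" intact.
prompt = truncate_key_data("Review: ", review, " It was", max_tokens=8)

# Cutting the finished string instead drops the end of the template.
naive = " ".join(f"Review: {review} It was".split()[:8])

print(prompt)
print(naive)
```

The first result still ends with the template text, while the naive cut loses it, which is the situation the note warns about.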

Function-Verbalizer-Pair Class (FnVbzPair)

FnVbzPair stands for function-verbalizer-pair. In contrast to prompt-element-objects, a function that constructs the prompt is defined directly together with a verbalizer, e.g., FnVbzPair(lambda x: f"{x['text']}\n", {0: ["World"], 1: ["Sports"], 2: ["Business"], 3: ["Tech"]}). The function receives a dictionary where the keys must refer to the columns in the dataset, and the values correspond to the respective observations. The FnVbzPair class inherits from the prompt class and can be used for initializing the classifier classes (see promptzl.modules.MaskedLM4Classification and promptzl.modules.CausalLM4Classification).
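Because the prompt function is ordinary Python, it can be tried on a sample row before it is handed to FnVbzPair. The sketch below does not import promptzl; the sample row is made up, and build_prompt is just an illustrative name.

```python
def build_prompt(row: dict) -> str:
    """Return the final prompt string for one observation."""
    return f"{row['text']}\n"


# Class labels mapped to their label words, as in the example above.
label_words = {0: ["World"], 1: ["Sports"], 2: ["Business"], 3: ["Tech"]}

# A made-up observation with the column the function expects.
sample_row = {"text": "Stocks rallied after the earnings report."}
rendered = build_prompt(sample_row)
print(repr(rendered))

# With promptzl installed, the pair would then be built along these lines:
# prompt = FnVbzPair(build_prompt, Vbz(label_words))
```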

Note

Using the FnVbzPair object is easier to write but requires more vigilance, as the function must match the columns of the dataset. The prompt also cannot be truncated on the fly, which can result in indexing errors if it exceeds the context length. When using MLMs, the mask token must be set manually in the function.
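One way to exercise that vigilance is to call the prompt function on a single sample row before running the classifier. The check below is a hypothetical helper, not part of promptzl; it only reports a column the function expects but the row lacks.

```python
from typing import Callable, Dict, Optional


def missing_column(fn: Callable[[Dict[str, str]], str], row: Dict[str, str]) -> Optional[str]:
    """Return the name of the first missing column, or None if fn runs cleanly."""
    try:
        fn(row)
    except KeyError as err:
        return str(err.args[0])
    return None


def nli_prompt(e: Dict[str, str]) -> str:
    """Prompt function that needs the columns 'premise' and 'hypothesis'."""
    return f"Premise: {e['premise']} Hypothesis: {e['hypothesis']}"


ok_row = {"premise": "A dog runs.", "hypothesis": "An animal moves."}
bad_row = {"text": "A dog runs."}

print(missing_column(nli_prompt, ok_row))
print(missing_column(nli_prompt, bad_row))
```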

Examples

Here are some examples of handling the different approaches when constructing the prompt object.

Prompt-Element-Objects

from promptzl import *

verbalizer = Vbz({0: ["World"], 1: ["Sports"], 2: ["Business"], 3: ["Tech"]})
prompt = Txt("[Category:") + verbalizer + Txt("] ") + Key()

It is possible to print the prompt after initializing it to see what the final prompt looks like. Verbalizer and key objects are shown as placeholders:

from promptzl import *

prompt = (
   Txt("I give you a movie review. Classify the sentiment! Here is the review:\n\n") +
   Key() +
   Txt("\n\nIs this review negative or positive?") +
   Vbz([['Negative','negative'], ['Positive', 'positive']]))
prompt
"""I give you a movie review. Classify the sentiment! Here is the review:

<text>

Is this review negative or positive?<Vbz: [["Negative",...], ["Positive",...]]>"""

Function-Verbalizer-Pair

from promptzl import *

verbalizer = Vbz({0: ["good"], 1: ["bad"]})
prompt = FnVbzPair(lambda x: f"{x['text']}\nIt was", verbalizer)

When using masked language models, the mask token must be set manually in the function, which can be done as follows:

from transformers import AutoTokenizer
from promptzl import *

tok = AutoTokenizer.from_pretrained("<an available model>")
verbalizer = Vbz({0: ["good"], 1: ["bad"]})
prompt = FnVbzPair(lambda x: f"{x['text']}\nIt was {tok.mask_token}", verbalizer)

Documentation

class promptzl.prompt.FnVbzPair(prompt_function: Callable[[Dict[str, str]], str], verbalizer: Vbz)

Bases: Prompt

Function-Verbalizer-Pair Class.

Prompt class organizing the prompt-generating function and the verbalizer. The prompt-generating function must return the final prompt forwarded into the tokenizer as a string. The only argument must accept a Dict[str, str] where the keys must refer to the columns in the dataset, and the values are the respective observations from the dataset. For example:

FnVbzPair(
    lambda e: f"NLI-Task. Premise: '{e['premise']}' Hypothesis: '{e['hypothesis']}' Does the premise entail the hypothesis?",
    Vbz({0: ['yes'], 1: ['no']})
)
Parameters:
  • prompt_function (Callable[[Dict[str, str]], str]) – Function that takes one dict argument and returns the final prompt as a string.

  • verbalizer (Vbz) – Verbalizer object.

class promptzl.prompt.Key(key: str = 'text')

Bases: Prompt

Placeholder to Corresponding Key in Data.

This class inserts placeholders into the prompt that are replaced by the corresponding values from the dataset. For example, Key("text") + Txt(" is ") + Vbz([['good'], ['bad']]) requires a column ‘text’ in the dataset and fills the key placeholder with the corresponding text.

Consider a dataset {"text": ["Restaurant X", "Restaurant Y"]}. The final prompts will be "Restaurant X is " and "Restaurant Y is " for causal models. Using MLMs will return "Restaurant X is [MASK]" and "Restaurant Y is [MASK]".

Parameters:

key (str, optional) – Key. Defaults to “text”.
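The filling behavior described for Key can be imitated in plain Python. This is a sketch of the documented result, not of promptzl's internals: render is a hypothetical helper, and "[MASK]" is an assumed mask token, since the actual token depends on the tokenizer.

```python
from typing import Dict, List


def render(dataset: Dict[str, List[str]], key: str, text: str, mask_token: str = "") -> List[str]:
    """Fill each value of dataset[key] into the template '<key><text><mask>'."""
    return [f"{value}{text}{mask_token}" for value in dataset[key]]


dataset = {"text": ["Restaurant X", "Restaurant Y"]}

# Causal models: the prompt simply ends where the label word is predicted.
causal_prompts = render(dataset, "text", " is ")

# Masked models: the mask token marks the position to be predicted.
masked_prompts = render(dataset, "text", " is ", mask_token="[MASK]")

print(causal_prompts)
print(masked_prompts)
```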

class promptzl.prompt.Txt(text: str = ' ')

Bases: Prompt

Text Representation in Prompt

This class can be used to add additional text to the prompt. For example, Txt("Hello ") + Vbz([['World'], ['Mars']]) + Txt('!') will prepend “Hello ” and append “!” to the prompt.

Parameters:

text (str, optional) – Text. Defaults to “ “.

class promptzl.prompt.Vbz(verbalizer: Dict[int | str, List[str]] | None)

Bases: Prompt

Verbalizer Representation in Prompt

A valid prompt must include one verbalizer. For causal models, the verbalizer must be at the end of the prompt. For masked models, it can be at any position in the prompt when using prompt-element-objects.

The words for each class (there can be more than one per class) must be provided as a list of lists or as a dictionary where the key is the class label (ideally matching the representation in the dataset) and the value is a list of words corresponding to the class semantics. The keys of the dictionary are also used in the predictions list in promptzl.utils.LLM4ClassificationOutput.

Valid verbalizers:

Key() + Txt("Is this good?") + Vbz([["good", "bad"], ["ugly"]])
Key() + Txt("Is this good?") + Vbz({0: ["good", "bad"], 1: ["ugly"]})

The above examples are valid for causal and masked models. The following example is only valid for masked models:

Key("headline") + Txt("[Category:") + Vbz([["Politics", "Nature", "Technology"], ["ugly"]]) + Txt("]") + Key("body")
Parameters:

verbalizer (Optional[Dict[Union[int, str], List[str]]]) – Label words per class, given as a dictionary from class label to a list of words or as a list of lists.
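To make the role of the verbalizer more concrete, here is a toy sketch of how label words can map vocabulary scores to a class label. The scores are invented, and taking the maximum over a class's words is an assumption made for this sketch; promptzl may combine them differently.

```python
from typing import Dict, List, Union

Label = Union[int, str]


def predict(scores: Dict[str, float], verbalizer: Dict[Label, List[str]]) -> Label:
    """Return the class label whose best label word has the highest score."""
    class_scores = {
        label: max(scores[word] for word in words)
        for label, words in verbalizer.items()
    }
    return max(class_scores, key=class_scores.get)


# Invented scores for a tiny vocabulary; only the label words matter.
vocab_scores = {"good": 2.1, "great": 3.4, "bad": 1.7, "the": 0.2}
verbalizer = {"positive": ["good", "great"], "negative": ["bad"]}

print(predict(vocab_scores, verbalizer))
```

The returned value is one of the dictionary keys, which mirrors the statement above that the keys are used in the predictions list.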