Question Answering#

Short description and explanation#

This function enables ad-hoc question answering over documents by ingesting text into a language model and returning formatted responses.
It accepts:

  • A language model

  • Text files with content

  • Questions to answer

  • Additional inputs can be given for configuration

The model processes the files to build understanding. Questions posed are then answered in one of two modes:

Default mode:
The model directly answers each question using its own capabilities.

Poll mode:
Additional models are included to separately answer each question. An aggregation algorithm determines the best response through consensus between models.
Two options exist for consensus methodology:

Average Answer:
Each model’s answer is scored, and the response with the highest average score across models is selected. Useful for numeric or ranked responses.

Most Common Answer:
The answer that occurs most frequently across models is selected. Useful for textual responses to avoid outliers.

Using multiple models via the poll mode improves reliability for questions that lack a single definitive answer, since the ensemble smooths out individual-model outliers.
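
To make the two consensus strategies concrete, here is a minimal, hedged sketch of how such an aggregation step could look. The helper name aggregate_answers and its behavior are illustrative only and are not part of the function's API; the actual implementation inside the function may differ.

from collections import Counter

# Hypothetical helper, shown only to make the two consensus strategies concrete.
def aggregate_answers(answers, strategy="most_common"):
    if strategy == "most_common":
        # Pick the answer given by the largest number of voting models.
        return Counter(answers).most_common(1)[0][0]
    if strategy == "average":
        # For numeric answers (e.g. a 1-5 rating), average the votes.
        numbers = [float(a) for a in answers]
        return str(round(sum(numbers) / len(numbers)))
    raise ValueError(f"Unknown strategy: {strategy}")

aggregate_answers(["4", "4", "5"], strategy="most_common")  # -> "4"
aggregate_answers(["4", "4", "5"], strategy="average")      # -> "4"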

Background#

At the core, foundation models (large pre-trained natural language processing (NLP) models) are leveraged to read and comprehend the input text files.
Specifically, any causal language model from the Hugging Face hub, such as the GPT-2 and Mistral variants used in the demos below, can serve as the base language model.

When documents are fed into the function, the background process invokes these models to ingest and digest the information.

This provides the knowledge base for the models to then offer informed answers tailored to any queries about the documents.
The parameters controlling model size and computation time provide tradeoffs between cost, speed, and sophistication of comprehension.
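
For example, here is a minimal, hedged sketch of how memory and speed could be traded off via the model_kwargs parameter documented below; the values are illustrative, not a recommendation.

import torch

# Illustrative model_kwargs, forwarded to transformers' from_pretrained call.
# Smaller dtypes (or quantization via transformers.BitsAndBytesConfig) reduce
# memory use and speed up inference, at some cost in answer quality.
model_kwargs = {
    "torch_dtype": torch.float16,  # half-precision weights
}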

Additionally, the poll option expands on a single model by sampling responses from a number of models as mentioned above.

Requirements#

transformers
torch
tqdm

Documentation#

data_path: A path to a directory of text files or a path to a text file to ask questions about.

model_name: The pre-trained model name from the huggingface hub to use for answering questions.

questions: The questions to ask. A list of lists of questions to ask per text file, divided into question groups.
The groups can be determined by size (to avoid overly large inputs to the LLM) or by questioning method (regular or poll-style questioning).

device_map: A map to use for loading the model on multiple devices.

model_kwargs: Keyword arguments to pass for loading the model using HuggingFace’s
transformers.AutoModelForCausalLM.from_pretrained function.

auto_gptq_exllama_max_input_length: For AutoGPTQ models, sets and extends the model’s input buffer size.

tokenizer_name: The tokenizer name from the huggingface hub to use. If not given, the model name is used.

tokenizer_kwargs: Keyword arguments to pass for loading the tokenizer using HuggingFace’s
transformers.AutoTokenizer.from_pretrained function.

text_wrapper: A wrapper for the text of each file; it comes first in the prompt template. Must have a placeholder (‘{}’) for the text of the file.

questions_wrapper: A wrapper for the questions received. Will be added after the text wrapper in the prompt template.
Must have a placeholder (‘{}’) for the questions.

generation_config: HuggingFace’s GenerationConfig keyword arguments to pass to the generate method.

questions_config: A dictionary or list of dictionaries containing specific ways to answer questions (using a poll, for example).
Each dictionary in the list corresponds to a question group and determines the question-asking method for that group (see the illustrative sketch after this parameter list).

batch_size: Batch size for inference.

questions_columns: Columns to use for the dataframe returned.

verbose: Whether to present logs of a progress bar and errors. Default: True.
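
To make the relationship between these parameters concrete, here is a hedged, illustrative sketch. The values below are made up for illustration and are not taken from the function's source; in particular, the empty dictionary standing for "default answering" is an assumption.

# Two question groups: the first answered directly, the second via a poll.
questions = [
    ["1. Summarize the text (max 50 words).",
     "2. Was the issue resolved? [Yes, No]"],
    ["3. Rate the agent's empathy on a scale of 1-5."],
]

# One config entry per question group (an empty dict is assumed here to mean
# the default, direct answering mode -- an assumption for illustration only).
questions_config = [
    {},
    {"type": "poll", "poll_count": 3, "poll_strategy": "average"},
]

# The text wrapper comes first in the prompt, the questions wrapper after it;
# each must contain a '{}' placeholder.
text_wrapper = "Given the following text:\n{}"
questions_wrapper = "Answer the following questions:\n{}"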

Demo 1#

This is a short and simple example to show the basic use of the function.

(1.) Import the function (import mlrun, set project and import function)#

import mlrun
import transformers
import tempfile
project = mlrun.get_or_create_project(
    name="call-center-demo-1",
    context="./",
    user_project=True,
    parameters={
        "default_image": "mlrun/mlrun",
    })
func = project.set_function(
    "question-answering.py",
    name="question-answering",
    kind="job",
    handler="answer_questions",
)
project.save()

We create a text file that the model can be asked about

def _make_data_dir_for_test():
    data_dir = tempfile.mkdtemp()
    # The information the model will need in order to answer our question
    content = "The apple is red."
    with open(data_dir + "/test_data.txt", "w") as f:
        f.write(content)
    return data_dir

(2.) Usage#

Then we set the path to the text file we want to ask about, the questions, and the column name for the answer table.

input_path = _make_data_dir_for_test()
# The question for the model to answer
question = ["What is the color of the apple?"]
# The column of the answer in the data frame returned by the function
column_name = ["color"]

Now we run the function with all the parameters we prepared earlier:

demo1_run = func.run(
    handler="answer_questions",
    params={
        "model": "distilgpt2",
        "input_path": input_path,
        "questions": question,
        "questions_columns": column_name,
        "generation_config": {
            "do_sample": True,
            "temperature": 0.8,
            "top_p": 0.9,
            "early_stopping": True,
            "max_new_tokens": 20,
        },
    },
    returns=[
        "question_answering_df: dataset",
        "question_answering_errors: result",
    ],
    local=True,
    artifact_path="./"
)

(3.) Review results#

After the run is finished, we can take a look at our answer:

demo1_run.outputs
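
The run returns two artifacts: a dataset with one column per question and an errors result. A quick way to inspect the answers, assuming MLRun's standard run-artifact accessors, could be:

# Load the returned dataset artifact as a pandas DataFrame (illustrative)
answers_df = demo1_run.artifact("question_answering_df").as_df()
print(answers_df["color"].iloc[0])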

Demo 2#

This is a much larger example. We show how to use this function to analyze a number of calls between agents and customers of an internet company (all the data is generated by Iguazio).
For something like this, we recommend using a strong model and putting some time into crafting the prompts.

(1.) Import the function (import mlrun, set project and import function)#

import os
import mlrun
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
project = mlrun.get_or_create_project(
    name="call-center-demo-2",
    context="./",
    user_project=True,
    parameters={
        "default_image": "mlrun/mlrun",
    })
func = project.set_function(
    "question-answering.py",
    name="question-answering",
    kind="job",
    handler="answer_questions",
)
project.save()

(2.) Usage#

This example is a bit more complicated, as mentioned. We give the model a list of questions, and for some of them we also give a list of answers to choose from.

QUESTIONS = [
    "1. Write a long summary of the text, focus on the topic (max 50 words).",
    "2. Was the Client's concern addressed, (choose only one) [Yes, No]?",
    ]

qa_questions_columns = [
    "Summary",
    "is_fixed",
]

Another thing we give the model this time is answer examples (one/few-shot answering). This can be done to show the model how you want the answer to be structured or calculated.

# For every file we ask about, the model will be presented with this example of a call and how we want the answers.
DEMO_CALL = (
    "Agent: Good afternoon, you've reached [Internet Service Provider] customer support. I'm Megan. How can I assist "
    "you today?\n"
    "Customer: Hello, Megan. This is Lisa. I've noticed some billing discrepancies on my last statement.\n"
    "Agent: Thank you, Lisa. Let me pull up your account. I see the billing discrepancies you mentioned. It appears "
    "there was an error in the charges. I apologize for the inconvenience.\n"
    "Customer: Thank you for acknowledging the issue, Megan. Can you please help me get it resolved?\n"
    "Agent: Absolutely, Lisa. I've made note of the discrepancies, and I'll escalate this to our billing department "
    "for investigation and correction. You should see the adjustments on your next statement.\n"
    "Customer: That sounds good, Megan. I appreciate your help.\n"
    "Agent: Not a problem, Lisa. Have a wonderful day, and we'll get this sorted out for you.\n"
)

DEMO_ANSWERS = (
    "1. The customer, contacted the call center regarding billing discrepancies on her statement. The agent, "
    "acknowledged the issue, assured The customer it would be resolved, and escalated it to the billing department for "
    "correction.\n"
    "2. Yes.\n"

Then we need to wrap it all nicely so it can be given to the model as a single prompt. This is done with a text wrapper and a questions wrapper; both are concatenated inside the function with the questions and passed to the model.

# The wrappers are built according to the model's prompt conventions to improve results
TEXT_WRAPPER = (
    f"<|im_start|>system: You are an AI assistant that answers questions accurately and shortly<|im_end|>\n"
    f"<|im_start|>user: Given the following text:\n"
    f"{DEMO_CALL}\n"
    f"answer the questions as accurately as you can:\n"
    f"{QUESTIONS}<|im_end|>\n"
    f"<|im_start|>assistant:\n"
    f"{DEMO_ANSWERS}<|im_end|>\n"
    f"<|im_start|>user: Given the following text:\n"
    "{}"
) 
QUESTIONS_WRAPPER = (
    " answer the given questions as accurately as you can, do not write more answers the questions:\n"
    "{}<|im_end|>\n"
    "<|im_start|>assistant:\n"
)
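
To see roughly what the model receives, here is a hedged sketch of how the wrappers are expected to be combined with a call's text and the questions. The exact concatenation inside the function may differ slightly, and example_call.txt is a hypothetical file name.

# Illustrative only: assemble the prompt the way the function is expected to.
call_text = open("./calls/example_call.txt").read()  # hypothetical call file
questions_text = "\n".join(QUESTIONS)
prompt = TEXT_WRAPPER.format(call_text) + QUESTIONS_WRAPPER.format(questions_text)
print(prompt)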

The last few parameters to consider are the model we will use, the input length (not available for all models), and the batch size.
The batch size determines how many files are processed in each batch; the larger it is, the faster the run, as long as memory is sufficient (a sketch of these optional parameters follows the next cell).

# We like this version of mistral's model, which is small and fast but also gives great results
qa_model = "TheBloke/Mistral-7B-OpenOrca-GPTQ"
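
If you want to set them explicitly, here is a hedged example of how the documented batch_size and auto_gptq_exllama_max_input_length parameters could be added to the run's params; the values are illustrative, not a recommendation.

# Illustrative values only -- tune them to your GPU memory and call lengths.
extra_params = {
    "batch_size": 2,                              # files answered per batch
    "auto_gptq_exllama_max_input_length": 8192,   # extend the GPTQ model's input buffer
}

These entries could then be merged into the params dictionary of the run below.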

Finally, we run the function with all the parameters we prepared.

# Question answering:
demo2_run = func.run(
    function="question-answering",
    local=True,
    handler="answer_questions",
    inputs={"data_path": os.path.abspath("./calls")},
    params={
        "model_name": qa_model,
        "device_map": "auto",
        "text_wrapper":TEXT_WRAPPER,
        "questions": QUESTIONS,
        "questions_wrapper": QUESTIONS_WRAPPER,
        "questions_columns": qa_questions_columns,
    },
    returns=[
        "question_answering_df: dataset",
        "question_answering_errors: result",
    ],
)

(3.) Review results#

demo2_run.outputs

Demo 3#

This is also a large example; in this case we use another option of the function: asking questions in the form of a poll.

(1.) Import the function (import mlrun, set project and import function)#

import os
import mlrun
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline
project = mlrun.get_or_create_project(
    name="call-center-demo-3",
    context="./",
    user_project=True,
    parameters={
        "default_image": "mlrun/mlrun",
    })
func = project.set_function(
    "question-answering.py",
    name="question-answering",
    kind="job",
    handler="answer_questions",
)
project.save()

(2.) Usage#

Like in the second demo, we make a list of questions for the function to answer.

# These questions are harder to answer, as there is no right answer.
# So we want it to be at least consistent, for that we use the poll option.
QUESTIONS = [
    "1. Rate the agent's level of empathy (The ability to understand and share the feelings of others) on a scale of 1-5.",
    "2. Rate the agent's level of professionalism (Conducting oneself in a way that is appropriate for the workplace) on a scale of 1-5.",
]

qa_questions_columns = [
    "empathy",
    "professionalism",
]

Another thing we give the model this time is answer examples (one/few-shot answering). This can be done to show the model how you want the answer to be structured or calculated.
So for every file we ask about, the model is presented with this example of a call and the answers we want.

# For every file we ask about, the model will be presented with this example of a call and how we want the answers.
DEMO_CALL = (
    "Agent: Good afternoon, you've reached [Internet Service Provider] customer support. I'm Megan. How can I assist "
    "you today?\n"
    "Customer: Hello, Megan. This is Lisa. I've noticed some billing discrepancies on my last statement.\n"
    "Agent: Thank you, Lisa. Let me pull up your account. I see the billing discrepancies you mentioned. It appears "
    "there was an error in the charges. I apologize for the inconvenience.\n"
    "Customer: Thank you for acknowledging the issue, Megan. Can you please help me get it resolved?\n"
    "Agent: Absolutely, Lisa. I've made note of the discrepancies, and I'll escalate this to our billing department "
    "for investigation and correction. You should see the adjustments on your next statement.\n"
    "Customer: That sounds good, Megan. I appreciate your help.\n"
    "Agent: Not a problem, Lisa. Have a wonderful day, and we'll get this sorted out for you.\n"
)


DEMO_ANSWERS = (
    "1. 4\n"
    "2. 5\n"

)

Then we need to wrap it all nicely so it can be given to the model as a single prompt. This is done with a text wrapper and a questions wrapper; both are concatenated inside the function with the questions and passed to the model.

TEXT_WRAPPER = (
    f"<|im_start|>system: You are an AI assistant that answers questions accurately and shortly<|im_end|>\n"
    f"<|im_start|>user: Given the following text:\n"
    f"{DEMO_CALL}\n"
    f"answer the questions as accurately as you can:\n"
    f"{QUESTIONS}<|im_end|>\n"
    f"<|im_start|>assistant:\n"
    f"{DEMO_ANSWERS}<|im_end|>\n"
    f"<|im_start|>user: Given the following text:\n"
    "{}"
) 

QUESTIONS_WRAPPER = (
    " answer the given questions as accurately as you can, do not write more answers the questions:\n"
    "{}<|im_end|>\n"
    "<|im_start|>assistant:\n"
)

The config is for the second questioning method, which we call “poll”. Here we choose how many voting models participate and how the result is decided; we currently support “average” and “most_common”, as shown here.

*An explanation of both questioning methods can be found at the beginning of this notebook.

questions_config = {
    "type": "poll",
    "poll_count": 3,  # How many 'voters'
    "poll_strategy": "most_common",
}
qa_model = "TheBloke/Mistral-7B-OpenOrca-GPTQ"
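
For numeric ratings like these, the “average” strategy described at the beginning of this notebook is the other supported option; a hedged alternative configuration could look like this (the poll_count value is illustrative).

# Alternative poll configuration using the 'average' strategy (illustrative values).
questions_config_average = {
    "type": "poll",
    "poll_count": 3,
    "poll_strategy": "average",
}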

Finally, we run the function with all the parameters we prepared.

# Question answering:
demo3_run = func.run(
    function="question-answering",
    local=True,
    handler="answer_questions",
    inputs={"data_path": os.path.abspath("./calls")},
    params={
        "model_name": qa_model,
        "device_map": "auto",
        "text_wrapper":TEXT_WRAPPER,
        "questions": QUESTIONS,
        "questions_wrapper": QUESTIONS_WRAPPER,
        "questions_columns": qa_questions_columns,
        "questions_config": questions_config, # This time we add 'questions_config'
    },
    returns=[
        "question_answering_df: dataset",
        "question_answering_errors: result",
    ],
)

(3.) Review results#

demo3_run.outputs