question_answering package#

Submodules#

question_answering.question_answering module#

class question_answering.question_answering.PollQuestionHandler(poll_count: int = 5, poll_strategy: str = 'most_common')[source]#

Bases: question_answering.question_answering.QuestionHandler

A class for handling question answering for poll-type questions. This type of question is answered by asking the same question multiple times and choosing the most common answer or the average answer.

class ConfigKeys[source]#

Bases: object

Static class to hold all the possible poll question configuration option keys.

POLL_COUNT = 'poll_count'#

The number of times to ask the same question.

POLL_STRATEGY = 'poll_strategy'#

The strategy to use for choosing the answer from the poll.

class Strategy(value)[source]#

Bases: enum.Enum

An enumeration of the available poll answer strategies.

AVERAGE = 'average'#

The average answer strategy.

MOST_COMMON = 'most_common'#

The most common answer strategy.

static average(answers)[source]#

Calculate the average answer for a given list of answers.

do(answers)[source]#

Apply the strategy to a given list of answers.

static most_common(answers)[source]#

Calculate the most common answer for a given list of answers.
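A minimal sketch of what the two strategies compute, assuming the answers arrive as plain strings (numeric strings for the average strategy); this is an illustration, not the module’s exact implementation:

    from collections import Counter
    from typing import List

    def most_common(answers: List[str]) -> str:
        # Pick the answer that appears most often in the poll.
        return Counter(answers).most_common(1)[0][0]

    def average(answers: List[str]) -> str:
        # Assumes numeric string answers; returns their mean as a string.
        return str(sum(float(a) for a in answers) / len(answers))

    print(most_common(["yes", "no", "yes"]))  # yes
    print(average(["3", "4", "5"]))           # 4.0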

answer(questions_amount: int, batched_input: List[str], generation_pipeline: transformers.Pipeline, generation_config: transformers.GenerationConfig) → List[List[str]][source]#

Answer the questions using the given text file contents as context, with a pretrained LLM via the given pipeline.

class question_answering.question_answering.QuestionHandler[source]#

Bases: object

A class for handling question answering for a given question type. This class is used as the base class for all question types, and as the default question type (regular question answering without any special handling).

class ConfigKeys[source]#

Bases: object

answer(questions_amount: int, batched_input: List[str], generation_pipeline: transformers.Pipeline, generation_config: transformers.GenerationConfig) → List[List[str]][source]#

Answer the questions using the given text file contents as context, with a pretrained LLM via the given pipeline.
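As a sketch of how a custom question type could plug in, a subclass would override answer with the documented signature; the post-processing below is purely illustrative:

    from typing import List

    import transformers

    from question_answering.question_answering import QuestionHandler

    class UppercaseQuestionHandler(QuestionHandler):
        # Hypothetical handler that upper-cases every generated answer.
        def answer(
            self,
            questions_amount: int,
            batched_input: List[str],
            generation_pipeline: transformers.Pipeline,
            generation_config: transformers.GenerationConfig,
        ) -> List[List[str]]:
            answers = super().answer(
                questions_amount, batched_input, generation_pipeline, generation_config
            )
            return [[a.upper() for a in batch] for batch in answers]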

class question_answering.question_answering.QuestionTypes[source]#

Bases: object

DEFAULT = 'default'#
POLL = 'poll'#
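Putting the constants above together, a poll-style entry for questions_config (see answer_questions below) would presumably pair the question type with the PollQuestionHandler.ConfigKeys options. The exact key for the question type is an assumption here; the other keys are the documented ConfigKeys values:

    # Hypothetical configuration for one poll-style question group.
    poll_group_config = {
        "type": "poll",                   # QuestionTypes.POLL (key name is an assumption)
        "poll_count": 5,                  # ConfigKeys.POLL_COUNT: ask each question 5 times
        "poll_strategy": "most_common",   # ConfigKeys.POLL_STRATEGY: Strategy.MOST_COMMON
    }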
question_answering.question_answering.answer_questions(data_path: Union[str, List[str]], model_name: str, questions: Union[List[str], List[List[str]]], device_map: Optional[Union[str, dict]] = None, model_kwargs: Optional[dict] = None, auto_gptq_exllama_max_input_length: Optional[int] = None, tokenizer_name: Optional[str] = None, tokenizer_kwargs: Optional[dict] = None, text_wrapper: Union[str, List[str]] = '', questions_wrapper: Union[str, List[str]] = '', generation_config: Optional[Union[Dict, List[Dict]]] = None, questions_config: Optional[Union[Dict, List[Dict]]] = None, batch_size: int = 1, questions_columns: Optional[List[str]] = None, verbose: bool = False) → Tuple[pandas.core.frame.DataFrame, dict][source]#

Answer the given questions about the contents of the given text files using a pretrained LLM. For each text file, the following prompt is built:

start of text_wrapper
<text file content>
end of text_wrapper
start of questions_wrapper
1. <questions[0]>
2. <questions[1]>
…
n. <questions[n-1]>
end of questions_wrapper
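For example, with a hypothetical text_wrapper of 'Given the following text:\n{}' and a hypothetical questions_wrapper of 'Answer the following questions:\n{}', the built prompt for two questions would read:

    Given the following text:
    <text file content>
    Answer the following questions:
    1. <questions[0]>
    2. <questions[1]>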

Parameters
  • data_path – A path to a directory of text files or a path to a text file to ask questions about.

  • model_name – The pre-trained model name from the Hugging Face Hub to use for asking questions.

  • questions – The questions to ask. A list of lists of questions to ask per text file, divided into question groups; the groups can be determined by size (in order to avoid overly large inputs to the LLM) or by questioning method (regular or poll-like questioning).

  • device_map – A map to use for loading the model on multiple devices.

  • model_kwargs – Keyword arguments to pass for loading the model using HuggingFace’s transformers.AutoModelForCausalLM.from_pretrained function.

  • auto_gptq_exllama_max_input_length – For AutoGPTQ models, sets and extends the model’s input buffer size.

  • tokenizer_name – The tokenizer name from the Hugging Face Hub to use. If not given, the model name will be used.

  • tokenizer_kwargs – Keyword arguments to pass for loading the tokenizer using HuggingFace’s transformers.AutoTokenizer.from_pretrained function.

  • text_wrapper – A wrapper for the file’s text. Will be added at the start of the prompt. Must have a placeholder (‘{}’) for the text of the file.

  • questions_wrapper – A wrapper for the questions received. Will be added after the text wrapper in the prompt template. Must have a placeholder (‘{}’) for the questions.

  • generation_config – HuggingFace’s GenerationConfig keyword arguments to pass to the generate method.

  • questions_config – A dictionary or list of dictionaries containing specific ways to answer questions (using a poll, for example). Each dictionary in the list corresponds to a question group and determines the question-asking method for that group.

  • batch_size – Batch size for inference.

  • questions_columns – Columns to use for the dataframe returned.

  • verbose – Whether to show a progress bar and log errors. Default: False.

Returns

A tuple of:

  • A dataframe dataset of the questions answers.

  • A dictionary of errored files that were not inferred or were not answered properly.
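A minimal usage sketch; the directory, model name, wrappers and column names below are illustrative assumptions, not defaults:

    from question_answering.question_answering import answer_questions

    # All values here are placeholders chosen for the example.
    answers_df, errors = answer_questions(
        data_path="./articles",  # directory of text files to ask about
        model_name="mistralai/Mistral-7B-Instruct-v0.2",
        questions=["Who is the author?", "What is the main topic?"],
        text_wrapper="Given the following text:\n{}",
        questions_wrapper="Answer the following questions:\n{}",
        generation_config={"max_new_tokens": 64},
        batch_size=2,
        questions_columns=["author", "topic"],
        verbose=True,
    )
    print(answers_df.head())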

question_answering.question_answering.open_mpi_handler(worker_inputs: List[str], root_worker_inputs: Optional[Dict[str, Any]] = None)[source]#
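No docstring is provided. Judging by the signature alone, this presumably wraps a handler so that the arguments named in worker_inputs are sharded across OpenMPI workers while root_worker_inputs are applied only on the root rank; the decorator-style usage below is an inference from the signature, not documented behavior:

    from question_answering.question_answering import answer_questions, open_mpi_handler

    # Assumption: open_mpi_handler returns a wrapper that splits the
    # "data_path" input across MPI ranks and keeps verbose=True on rank 0.
    mpi_answer_questions = open_mpi_handler(
        worker_inputs=["data_path"],
        root_worker_inputs={"verbose": True},
    )(answer_questions)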

Module contents#