Self-Supervised Relevance Distillation for Improved Retrieval-Augmented Generation
Retrieval-augmented generation (RAG) systems often struggle with noisy or irrelevant retrieved information, leading to degraded performance and increased hallucination, especially across diverse tasks and domains. The problem is particularly acute when a system must adapt to new domains or tasks without task-specific supervision.
Current approaches often rely on supervised relevance labeling or on simple heuristics such as TF-IDF scoring to filter retrieved passages. Some recent work has explored using language models to assess relevance, but it typically requires task-specific fine-tuning. Inspired by the human ability to quickly identify relevant information in a large context, we propose a method that learns to distill relevant information from retrieved passages without task-specific supervision. This enables unsupervised adaptation to new domains and tasks: the relevance model continuously improves its notion of relevance through interaction with the retrieval and generation processes.
We introduce Self-Supervised Relevance Distillation (SSRD), a novel approach to refine retrieved information. SSRD consists of two key components: (1) A relevance distillation model that learns to extract the most pertinent information from retrieved passages. This model is trained using a novel self-supervised objective: given a set of retrieved passages and a generated answer, it learns to reconstruct the answer using only a small subset of the retrieved information. The reconstruction loss encourages the model to identify the most relevant parts of the retrieved passages. (2) A contrastive learning module that further improves the relevance model by contrasting true retrieved passages with artificially constructed irrelevant passages. To generate these contrasts, we use the language model to paraphrase retrieved passages while changing key details, creating 'near-miss' distractors. During inference, SSRD processes the retrieved passages to produce a condensed, highly relevant context for the language model to use in generation.
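One way to formalize the two training signals described above (the notation here, including $q$, $y$, $P$, $S_\theta$, $s_\theta$, $k$, and $\tau$, is our own and not fixed by the proposal):

```latex
% Reconstruction objective: a selector S_theta picks a small subset of the
% retrieved passages P such that the generated answer y can still be
% reconstructed from that subset alone, given the query q.
\mathcal{L}_{\text{recon}} = -\log p_{\mathrm{LM}}\!\left(y \mid q,\, S_\theta(P)\right)
\quad \text{s.t.} \quad |S_\theta(P)| \le k

% Contrastive objective: the relevance scorer s_theta should rank a true
% retrieved passage p^+ above LM-paraphrased "near-miss" distractors p^-_j.
\mathcal{L}_{\text{con}} = -\log
\frac{\exp\!\left(s_\theta(q, p^{+})/\tau\right)}
     {\exp\!\left(s_\theta(q, p^{+})/\tau\right) + \sum_{j}\exp\!\left(s_\theta(q, p^{-}_j)/\tau\right)}
```

The reconstruction term pressures the selector toward passages that actually carry the answer, while the contrastive term sharpens its ability to reject passages that look relevant but contain altered details.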
Step 1: Data Preparation
Collect datasets for diverse tasks: (a) Open-domain QA: Natural Questions, (b) Multi-document summarization: Multi-News, (c) Task-oriented dialogue: MultiWOZ. Split each dataset into train, validation, and test sets.
Step 2: Retrieval System Setup
Implement a basic retrieval system using BM25 or Dense Passage Retrieval (DPR) to retrieve relevant passages for each input query or document.
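A minimal Okapi BM25 index is enough to prototype this step before swapping in a production retriever (e.g., Pyserini's BM25 or a DPR dual-encoder). The class below is an illustrative sketch, not a tuned implementation:

```python
import math
from collections import Counter


def tokenize(text):
    """Very simple whitespace tokenizer with basic punctuation stripping."""
    return [t.lower().strip(".,?") for t in text.split()]


class BM25:
    """Minimal Okapi BM25 index (illustrative stand-in for a real retriever)."""

    def __init__(self, passages, k1=1.5, b=0.75):
        self.passages = passages
        self.k1, self.b = k1, b
        self.docs = [tokenize(p) for p in passages]
        self.N = len(self.docs)
        self.avgdl = sum(len(d) for d in self.docs) / self.N
        # Document frequency: number of passages each term occurs in.
        self.df = Counter(t for d in self.docs for t in set(d))

    def idf(self, term):
        # Smoothed idf that stays non-negative for very common terms.
        return math.log(1 + (self.N - self.df[term] + 0.5) / (self.df[term] + 0.5))

    def score(self, query, idx):
        doc = self.docs[idx]
        freqs = Counter(doc)
        total = 0.0
        for term in tokenize(query):
            if term not in freqs:
                continue
            tf = freqs[term]
            # Okapi BM25 term saturation with length normalization.
            norm = tf * (self.k1 + 1) / (
                tf + self.k1 * (1 - self.b + self.b * len(doc) / self.avgdl)
            )
            total += self.idf(term) * norm
        return total

    def retrieve(self, query, k=3):
        ranked = sorted(range(self.N), key=lambda i: self.score(query, i), reverse=True)
        return [self.passages[i] for i in ranked[:k]]
```

For DPR the `retrieve` interface would stay the same, with the score replaced by an inner product of query and passage embeddings.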
Step 3: Baseline Implementation
Implement standard RAG baselines using GPT-3.5 and GPT-4 APIs. For each input, retrieve top-k passages and use them as context for generation.
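The baseline reduces to assembling the top-k passages into a context block and prompting the model. A sketch of the prompt construction (the exact template wording is our choice; the real pipeline would send this string to the GPT-3.5 / GPT-4 chat completions API):

```python
def build_rag_prompt(question, passages, k=3):
    """Assemble a standard RAG prompt: top-k retrieved passages as context,
    followed by the question. No relevance filtering -- this is the baseline."""
    context = "\n".join(f"[{i + 1}] {p}" for i, p in enumerate(passages[:k]))
    return (
        "Answer the question using the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```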
Step 4: SSRD Model Implementation
Implement the SSRD model using GPT-3.5 or GPT-4 API. The model should take retrieved passages and a generated answer as input, and output a relevance score for each passage.
Step 5: Self-Supervised Training
Train the SSRD model using the reconstruction objective. For each training example: (a) Retrieve passages, (b) Generate an initial answer using the baseline RAG model, (c) Use SSRD to select a subset of passages, (d) Reconstruct the answer using only the selected passages, (e) Compute reconstruction loss and update the SSRD model.
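The subset-selection step (c) can be prototyped without a trainable LM by using a cheap proxy for the reconstruction objective: the fraction of answer tokens recoverable from the selected passages. This is only a stand-in for the real signal, which would be the LM's reconstruction loss:

```python
import re


def toks(text):
    """Normalize to a set of lowercase word tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def coverage(answer, selected):
    """Proxy reconstruction score: fraction of answer tokens that appear in
    the selected passages (stands in for -log p_LM(answer | subset))."""
    a = toks(answer)
    covered = set()
    for p in selected:
        covered |= toks(p) & a
    return len(covered) / max(len(a), 1)


def select_passages(passages, answer, max_k=2):
    """Step (c): greedily grow the subset whose coverage of the generated
    answer improves most; stop once no passage adds new information."""
    selected, remaining = [], list(passages)
    while remaining and len(selected) < max_k:
        best = max(remaining, key=lambda p: coverage(answer, selected + [p]))
        if coverage(answer, selected + [best]) <= coverage(answer, selected):
            break
        selected.append(best)
        remaining.remove(best)
    return selected
```

In the full training loop, steps (d)–(e) would regenerate the answer from `select_passages(...)` and backpropagate (or, with API-only models, use the reconstruction quality as a reward/filter signal) to update the relevance model.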
Step 6: Contrastive Learning
Implement the contrastive learning module. For each positive example (true retrieved passage), generate negative examples by paraphrasing the passage and changing key details. Train the SSRD model to distinguish between positive and negative examples.
Step 7: Inference Pipeline
Implement the full SSRD inference pipeline: (a) Retrieve passages, (b) Use SSRD to select and rerank passages, (c) Use selected passages as context for final answer generation.
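End to end, the pipeline reduces to score → rerank → condense. In this sketch a lexical-overlap score stands in for the learned SSRD relevance model, and the function returns the final generation prompt rather than calling an API:

```python
import re


def _toks(text):
    return set(re.findall(r"[a-z]+", text.lower()))


def ssrd_infer(question, retrieved, top_k=1):
    """SSRD inference sketch: (a) passages come from the retriever,
    (b) score and rerank them (lexical overlap stands in for the learned
    relevance model), (c) emit the condensed context for final generation."""
    q = _toks(question)
    scored = sorted(retrieved, key=lambda p: len(_toks(p) & q), reverse=True)
    context = "\n".join(scored[:top_k])
    return (
        f"Question: {question}\n"
        f"Relevant Context: {context}\n"
        "Generate a concise answer to the question based on this context."
    )
```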
Step 8: Evaluation
Evaluate the SSRD model against baselines on all three tasks. Use task-specific metrics: (a) QA: Exact Match and F1 scores, (b) Summarization: ROUGE scores, (c) Dialogue: Task Completion Rate and BLEU scores. Also conduct human evaluation for relevance and factual consistency on a subset of examples.
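For the QA metrics, a simplified SQuAD-style Exact Match and token-level F1 can be implemented directly (the normalization here follows the usual lowercase / strip-punctuation / drop-articles convention):

```python
import re
from collections import Counter


def normalize(s):
    """Lowercase, strip punctuation, and drop articles (SQuAD-style)."""
    s = re.sub(r"[^a-z0-9 ]", " ", s.lower())
    s = re.sub(r"\b(a|an|the)\b", " ", s)
    return " ".join(s.split())


def exact_match(pred, gold):
    return int(normalize(pred) == normalize(gold))


def f1_score(pred, gold):
    """Token-level F1 between prediction and gold answer."""
    p, g = normalize(pred).split(), normalize(gold).split()
    common = Counter(p) & Counter(g)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision, recall = overlap / len(p), overlap / len(g)
    return 2 * precision * recall / (precision + recall)
```

ROUGE for summarization and BLEU for dialogue would come from standard packages (e.g., `rouge-score`, `sacrebleu`) rather than hand-rolled implementations.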
Step 9: Analysis
Analyze model performance across different tasks and domains. Investigate how the SSRD model adapts to new domains by testing on out-of-distribution datasets.
Baseline Prompt Input
Question: What is the capital of France? Context: Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres. The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017.
Baseline Prompt Expected Output
The capital of France is Paris.
Proposed Prompt Input (SSRD Step 1: Relevance Scoring)
Question: What is the capital of France? Retrieved Passages: [1] Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres. [2] The City of Paris is the centre and seat of government of the region and province of Île-de-France, or Paris Region, which has an estimated population of 12,174,880, or about 18 percent of the population of France as of 2017. [3] France is a country located in Western Europe. It shares borders with Belgium, Luxembourg, Germany, Switzerland, Italy, Monaco, Andorra, and Spain. Generated Answer: The capital of France is Paris. Score the relevance of each passage to answering the question.
Proposed Prompt Expected Output (SSRD Step 1: Relevance Scoring)
Relevance scores: [1] 10/10 - This passage directly states that Paris is the capital of France, which is the exact information needed to answer the question. [2] 5/10 - This passage mentions Paris as a seat of government, which is related to being a capital, but doesn't directly state it's the capital of France. [3] 1/10 - This passage provides general information about France but doesn't mention anything about its capital.
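To make the scored passages usable downstream, the model's free-text output has to be parsed back into structured scores. A minimal parser for the `[i] s/10 - rationale` format shown in the expected output above (a production parser should also tolerate deviations from this format):

```python
import re


def parse_relevance_scores(text):
    """Extract {passage_id: score} from lines like '[1] 10/10 - ...'."""
    return {int(i): int(s) for i, s in re.findall(r"\[(\d+)\]\s*(\d+)/10", text)}
```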
Proposed Prompt Input (SSRD Step 2: Final Answer Generation)
Question: What is the capital of France? Relevant Context: Paris is the capital and most populous city of France, with an estimated population of 2,175,601 residents as of 2018, in an area of more than 105 square kilometres. Generate a concise answer to the question based on this context.
Proposed Prompt Expected Output (SSRD Step 2: Final Answer Generation)
The capital of France is Paris.
Explanation
The SSRD method improves over the baseline by explicitly scoring the relevance of each retrieved passage and selecting the most relevant one for the final answer generation. This helps in focusing on the most pertinent information and potentially reducing noise or irrelevant details in the generation process.
If the proposed SSRD method doesn't show significant improvements over the baselines, we can pivot the project in several directions. First, we could conduct a detailed error analysis to understand where and why SSRD is failing. This could involve categorizing errors (e.g., relevance misjudgments, factual inconsistencies) and analyzing patterns across different tasks and domains. Second, we could explore variations of the self-supervised training objective, such as incorporating multiple correct answers or using different reconstruction targets. Third, we could investigate the impact of different contrastive learning strategies, including more sophisticated methods for generating negative examples. Finally, if the self-supervised approach doesn't yield improvements, we could explore a hybrid approach that combines lightweight supervision with self-supervised learning, potentially using a small amount of human-labeled data to guide the relevance model. These analyses and variations could provide valuable insights into the challenges of unsupervised relevance learning and inform future research directions in this area.