CSRF: Contrastive Self-Consistency for Robust Fact-Checking in Zero-Resource Settings
Zero-resource fact-checking for large language models (LLMs) is challenging due to the lack of labeled data and potential model inconsistency across different prompts or generations. Existing methods often rely on single-pass generation or simple prompting strategies, which may not fully leverage the model's knowledge or account for generation inconsistencies.
By exploiting the structured information in WikiBio and applying a contrastive learning approach, we aim to improve the robustness and consistency of fact-checking in zero-resource settings. Our proposed method, CSRF, addresses these limitations by generating contrastive examples and encouraging logical coherence in the model's judgments.
We propose Contrastive Self-Consistency for Robust Fact-Checking (CSRF), a novel method that leverages the structure of WikiBio data to generate contrastive examples for self-supervised learning. CSRF works as follows: 1) For a given claim, we extract relevant entity and attribute pairs from WikiBio. 2) We generate multiple contrastive claims by swapping entities or modifying attributes while maintaining grammatical correctness. 3) We prompt the LLM to verify both the original and contrastive claims, encouraging consistency in its judgments. 4) We implement a self-consistency scoring mechanism that rewards the model for maintaining logical coherence across related claims. 5) We use a contrastive loss function that maximizes the distance between embeddings of true and false claims while minimizing the distance between consistent judgments.
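The contrastive loss in step 5 could take several forms; as a minimal sketch, a margin-based hinge formulation over claim embeddings stands in here for whatever loss is ultimately chosen, and `contrastive_loss` / `consistency_loss` are illustrative names, not part of the method as stated:

```python
import numpy as np

def contrastive_loss(true_emb, false_emb, margin=1.0):
    """Hinge-style loss: penalize true/false claim embedding pairs
    whose Euclidean distance is smaller than `margin`."""
    dist = np.linalg.norm(true_emb - false_emb, axis=-1)
    return float(np.mean(np.maximum(0.0, margin - dist)))

def consistency_loss(emb_a, emb_b):
    """Penalize distance between the model's judgments on logically
    linked claims, pulling consistent judgments together."""
    return float(np.mean(np.linalg.norm(emb_a - emb_b, axis=-1)))
```

Minimizing `contrastive_loss` pushes true and false claims apart, while minimizing `consistency_loss` keeps judgments on related claims aligned, matching the two objectives described above.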
Step 1: Data Preparation
1.1 Download and preprocess the WikiBio dataset.
1.2 Extract entity-attribute pairs from WikiBio entries.
1.3 Generate a set of true claims based on the extracted pairs.
1.4 Create a set of false claims by swapping entities or modifying attributes.
1.5 Split the data into training and testing sets.
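Steps 1.2 through 1.4 can be sketched as follows; the `PAIRS` triples and the single template are illustrative placeholders for what would actually be extracted from WikiBio infoboxes:

```python
import random

# Hypothetical entity-attribute-value triples, as extracted from WikiBio.
PAIRS = [
    ("Albert Einstein", "birth_place", "Ulm, Germany"),
    ("Isaac Newton", "birth_place", "Woolsthorpe, England"),
    ("Marie Curie", "birth_place", "Warsaw, Poland"),
]

# One template per attribute keeps generated claims grammatical.
TEMPLATES = {"birth_place": "{entity} was born in {value}."}

def true_claims(pairs):
    """Render each (entity, attribute, value) triple with its template."""
    return [TEMPLATES[attr].format(entity=e, value=v) for e, attr, v in pairs]

def false_claims(pairs, rng):
    """Create false claims by pairing each entity with another
    entity's value for the same attribute (step 1.4's swap)."""
    claims = []
    for i, (e, attr, _) in enumerate(pairs):
        other = rng.choice([p for j, p in enumerate(pairs) if j != i])
        claims.append(TEMPLATES[attr].format(entity=e, value=other[2]))
    return claims
```

Swapping values within the same attribute keeps the false claims grammatical and plausible, which matters for making the contrastive examples challenging rather than trivially wrong.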
Step 2: Implement CSRF Method
2.1 Develop a function to generate contrastive claims for a given input claim.
2.2 Implement the self-consistency scoring mechanism.
2.3 Design the contrastive loss function.
2.4 Create a prompting template for fact verification.
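A minimal sketch of the self-consistency scoring mechanism (2.2), assuming boolean verdicts and the convention that every contrastive claim contradicts its original:

```python
def self_consistency_score(original_verdict, contrastive_verdicts):
    """Fraction of contrastive-claim verdicts that contradict the original.

    Contrastive claims are constructed to be mutually exclusive with the
    original claim, so a logically coherent model should assign them the
    opposite verdict. A score of 1.0 means fully coherent judgments.
    """
    if not contrastive_verdicts:
        return 1.0
    consistent = sum(v != original_verdict for v in contrastive_verdicts)
    return consistent / len(contrastive_verdicts)
```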
Step 3: Baseline Implementation
3.1 Implement zero-shot prompting baseline.
3.2 Implement few-shot prompting baseline.
3.3 Implement a simple sampling-based method (e.g., majority voting across multiple generations).
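The sampling baseline in 3.3 amounts to majority voting over repeated generations; `sample_fn` below is a stand-in for an actual LLM call:

```python
from collections import Counter

def majority_vote(sample_fn, prompt, n=5):
    """Query the model n times and return the most frequent verdict.

    sample_fn(prompt) -> "true" | "false", a stand-in for a sampled
    LLM generation parsed into a verdict.
    """
    votes = Counter(sample_fn(prompt) for _ in range(n))
    return votes.most_common(1)[0][0]
```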
Step 4: Model Selection and API Setup
4.1 Set up API access for GPT-3.5 and GPT-4.
4.2 Implement a wrapper function to handle API calls and rate limiting.
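One way to sketch the wrapper in 4.2 is retry-with-exponential-backoff; the function is deliberately client-agnostic, so `call` can wrap any API client, and the `sleep` parameter is injectable for testing:

```python
import time

def with_retries(call, prompt, max_retries=5, base_delay=1.0, sleep=time.sleep):
    """Invoke `call(prompt)`, retrying on exceptions (e.g. rate limits)
    with exponentially growing delays: 1s, 2s, 4s, ...

    Re-raises the last exception once `max_retries` attempts are exhausted.
    """
    for attempt in range(max_retries):
        try:
            return call(prompt)
        except Exception:
            if attempt == max_retries - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```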
Step 5: Training and Evaluation
5.1 For each claim in the training set:
a) Generate contrastive claims
b) Prompt the LLM to verify original and contrastive claims
c) Calculate self-consistency scores
d) Compute contrastive loss
5.2 Evaluate CSRF and baselines on the test set:
a) Calculate accuracy
b) Measure consistency scores
c) Assess robustness to adversarial examples
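The metrics in 5.2a-b can be sketched as follows, assuming boolean verdicts and test items grouped as (original verdict, list of contrastive verdicts):

```python
def accuracy(predictions, labels):
    """Fraction of claims whose predicted verdict matches the gold label."""
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)

def group_consistency(groups):
    """groups: list of (original_verdict, contrastive_verdicts) pairs.

    A group is coherent when every contrastive verdict contradicts the
    original; returns the fraction of coherent groups over the test set.
    """
    coherent = sum(all(v != o for v in cs) for o, cs in groups)
    return coherent / len(groups)
```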
Step 6: Analysis and Ablation Studies
6.1 Analyze the impact of the number of contrastive examples on performance.
6.2 Evaluate the effectiveness of the self-consistency scoring mechanism.
6.3 Assess the contribution of the contrastive loss function.
6.4 Analyze model performance across different types of claims and entities.
Step 7: Qualitative Analysis
7.1 Manually review a sample of model outputs to assess reasoning quality.
7.2 Analyze cases where CSRF outperforms baselines and vice versa.
7.3 Evaluate the model's ability to generalize to facts outside the WikiBio domain.
Baseline Prompt Input
Verify the following claim: Albert Einstein was born in Ulm, Germany.
Baseline Prompt Expected Output
The claim is true. Albert Einstein was indeed born in Ulm, Germany on March 14, 1879.
Proposed Prompt Input
Verify the following claims:
1. Albert Einstein was born in Ulm, Germany.
2. Albert Einstein was born in Munich, Germany.
3. Isaac Newton was born in Ulm, Germany.
Proposed Prompt Expected Output
1. True. Albert Einstein was born in Ulm, Germany.
2. False. Einstein was born in Ulm, not Munich; this claim contradicts claim 1.
3. False. Isaac Newton was born in Woolsthorpe, England, not Ulm.
Explanation
The CSRF method prompts the model with multiple related claims, encouraging it to maintain consistency across its judgments. This approach helps the model to be more robust against potential inconsistencies and to leverage its knowledge more effectively.
If CSRF does not significantly outperform baselines, we can pivot the project to an analysis paper. We would focus on understanding why the contrastive approach didn't yield the expected improvements. This could involve: 1) Analyzing the quality and diversity of generated contrastive examples to ensure they're challenging yet relevant. 2) Investigating the model's consistency across different types of claims and entities to identify potential weaknesses. 3) Examining the effectiveness of the self-consistency scoring mechanism and exploring alternative formulations. 4) Conducting a more in-depth error analysis to categorize the types of mistakes made by both CSRF and baselines. 5) Exploring how the model's performance varies with different prompting strategies or input formats. This analysis could provide valuable insights into the challenges of zero-resource fact-checking and guide future research in this area.