Paper ID

cf61eb66474f59f7c0a67a77c337b58a3bd87bef


Motivation

The source paper is "Finding Invariants of Distributed Systems: It's a Small (Enough) World After All" (50 citations, 2021, ID: cf61eb66474f59f7c0a67a77c337b58a3bd87bef). This idea builds on a progression of related work [4d95009229862b2f3d1917f50242e960e1c5fe12, b3224f23941bebab1c1e669eab8051d24f78c37f, 8defb15a763a094876d6d4d670bd643d977b4bbe, 528ebaaace61d47559447c04a800a8a338fb4040, e0b284fa49cea94043b8d7a551c273ed115d4d95, 4df564e3a8d4369a9bf55fb85e4d774324189c29].

The progression of research from the source paper to the related papers shows a clear trajectory towards automating the discovery and verification of invariants in distributed systems. The advancements made in each paper address specific challenges such as scalability, proof burden, and the integration of LLMs for invariant synthesis. However, there remains a gap in effectively combining the strengths of LLMs with traditional static analysis to enhance the accuracy and efficiency of invariant discovery. A research idea that leverages the ASD Agent's capabilities to explore this integration could provide significant advancements in the field.


Hypothesis

Integrating LLM-guided symbolic reasoning with error specification inference will enhance the precision and recall of invariant discovery in Paxos consensus protocols compared to using either method independently.


Research Gap

Existing research has not extensively explored the integration of LLM-guided symbolic reasoning with error specification inference to enhance invariant discovery in Paxos consensus protocols. This gap is significant because it overlooks the potential for symbolic reasoning to refine error specifications, which could improve the precision and recall of invariant discovery.


Hypothesis Elements

Independent variable: Integration of LLM-guided symbolic reasoning with error specification inference

Dependent variable: Precision and recall of invariant discovery in Paxos consensus protocols

Comparison groups: Integrated approach vs. LLM-guided symbolic reasoning alone vs. error specification inference alone

Baseline/control: Using either LLM-guided symbolic reasoning or error specification inference independently

Context/setting: Paxos consensus protocols

Assumptions: LLM-guided symbolic reasoning and error specification inference can be effectively integrated at the data processing level

Relationship type: Causation (enhancement/improvement)

Population: Paxos protocol implementations

Timeframe: Not specified

Measurement method: Precision (true positives / (true positives + false positives)) and recall (true positives / (true positives + false negatives)) metrics compared against known invariants as ground truth


Overview

This research proposes integrating LLM-guided symbolic reasoning with error specification inference to improve invariant discovery in Paxos consensus protocols. LLM-guided symbolic reasoning uses the logical-expression capabilities of LLMs to steer the symbolic reasoning process, making it more adaptable and effective on complex scenarios. Error specification inference identifies the values functions return on error, which aids program understanding and error-handling bug detection. Combining the two methods lets symbolic reasoning refine the inferred error specifications, which is expected to improve both the precision and recall of invariant discovery. This addresses a gap in existing research by testing a combination of methods that has not been explored together. The expected outcome is a significant improvement in the accuracy and efficiency of invariant discovery, which is crucial for verifying the safety and reliability of distributed protocols like Paxos.


Background

LLM-guided symbolic reasoning: LLM-guided symbolic reasoning uses large language models to enhance symbolic reasoning by providing domain knowledge and conceptual understanding. This method is particularly useful for complex problems where traditional symbolic reasoning might struggle. In this research, it will be used to guide the error specification inference process, providing insights that enhance the precision of invariant discovery. The expected role of this variable is to improve the adaptability and effectiveness of the error specification process, leading to more accurate invariant discovery.

Error specification inference: Error specification inference identifies the set of values returned by functions upon error, which aids in program understanding and finding error-handling bugs. In this research, it will be used to refine the error specifications in Paxos consensus protocols, enhancing the precision and recall of invariant discovery. The expected role of this variable is to provide a more comprehensive analysis of the protocol, leading to more accurate invariant discovery. The effectiveness of this method will be measured by improvements in precision and recall metrics.


Implementation

The hypothesis will be implemented by integrating LLM-guided symbolic reasoning with error specification inference in the context of Paxos consensus protocols. The process begins with error specification inference, where the system identifies potential error points in the protocol. LLM-guided symbolic reasoning is then applied to refine these error specifications, providing additional insights and logical expressions that enhance the precision of the analysis. The integration occurs at the data processing level, where outputs from the error specification inference are fed into the symbolic reasoning module. This module uses the LLM's capabilities to generate logical expressions that refine the error specifications, leading to more accurate invariant discovery. The system will be evaluated using precision and recall metrics, comparing the integrated approach to baseline models using either method independently. The expected outcome is a significant improvement in the accuracy and efficiency of invariant discovery, demonstrating the effectiveness of the integrated approach.


Operationalization Information

Please implement an experiment to test whether integrating LLM-guided symbolic reasoning with error specification inference enhances the precision and recall of invariant discovery in Paxos consensus protocols compared to using either method independently.

Experiment Overview

This experiment will compare three approaches to invariant discovery in Paxos protocols:
1. Baseline 1: Error specification inference alone
2. Baseline 2: LLM-guided symbolic reasoning alone
3. Experimental: Integrated approach combining both methods

Required Components

1. LLM-guided Symbolic Reasoning Module

Build a module that uses a large language model (GPT-4) to guide symbolic reasoning for invariant discovery. This module should:
- Take as input a Paxos protocol implementation and potential invariants
- Use the LLM to generate logical expressions that represent potential invariants
- Formalize these expressions in a format suitable for verification
- Output refined invariant candidates
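A minimal sketch of what this module might look like in Python, assuming the standard OpenAI chat completions client. The prompt wording, the `INV:`-per-line output convention, and the helper names (`build_prompt`, `parse_invariants`, `refine_with_llm`) are illustrative assumptions, not part of the source paper:

```python
# Hedged sketch of the LLM-guided symbolic reasoning module.
# The prompt format and output convention below are assumptions.
import re

SYSTEM_PROMPT = (
    "You are an expert on the Paxos consensus protocol. "
    "Given protocol code and seed candidates, emit one candidate "
    "invariant per line, formatted as: INV: <logical expression>"
)

def build_prompt(protocol_code: str, candidates: list) -> str:
    """Assemble the user prompt from protocol source and seed candidates."""
    seeds = "\n".join(f"- {c}" for c in candidates)
    return f"Protocol:\n{protocol_code}\n\nSeed candidates:\n{seeds}"

def parse_invariants(llm_output: str) -> list:
    """Extract 'INV: ...' lines from raw LLM output."""
    return [m.group(1).strip()
            for m in re.finditer(r"^INV:\s*(.+)$", llm_output, re.M)]

def refine_with_llm(client, protocol_code, candidates, model="gpt-4"):
    """One LLM round-trip; returns refined invariant candidates."""
    resp = client.chat.completions.create(
        model=model,
        temperature=0,
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user",
             "content": build_prompt(protocol_code, candidates)},
        ],
    )
    return parse_invariants(resp.choices[0].message.content)
```

Keeping the parsing separate from the API call makes the output format testable without network access.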

2. Error Specification Inference Module

Use the existing error specification inference code to:
- Identify function return values upon error in the Paxos protocol
- Extract potential error conditions and their implications
- Generate initial invariant candidates based on error specifications
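As a toy illustration only, assuming a Python protocol implementation: the sketch below records the constant values each function returns inside `except` blocks. Real error specification inference (as in the static-analysis work this builds on) is interprocedural and path-sensitive; `infer_error_specs` here is just a minimal stand-in for the idea:

```python
# Toy error specification inference: map each function name to the
# constant values it returns on except paths. Illustrative only.
import ast
from collections import defaultdict

def infer_error_specs(source: str) -> dict:
    """Return {function_name: set of constants returned in except blocks}."""
    specs = defaultdict(set)
    tree = ast.parse(source)
    for fn in ast.walk(tree):
        if not isinstance(fn, ast.FunctionDef):
            continue
        handlers = (n for n in ast.walk(fn)
                    if isinstance(n, ast.ExceptHandler))
        for handler in handlers:
            for ret in (n for n in ast.walk(handler)
                        if isinstance(n, ast.Return)):
                # Only literal constants are recorded in this sketch.
                if isinstance(ret.value, ast.Constant):
                    specs[fn.name].add(ret.value.value)
    return dict(specs)
```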

3. Integration Module

Create a module that integrates the outputs from both approaches:
- Take error specifications from the Error Specification Inference module
- Feed these specifications to the LLM-guided Symbolic Reasoning module
- Use the LLM to refine and enhance the error specifications
- Generate a final set of invariant candidates
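The wiring described above can be sketched as a thin integration layer. `infer_error_specs` and `refine_with_llm` stand for components 1 and 2; they are passed in as callables here so the example is self-contained and the data flow is explicit:

```python
# Sketch of the integration module: error specs seed the LLM refinement,
# and the final candidate set keeps both, deduplicated in order.
from typing import Callable, List

def integrate(protocol_source: str,
              infer_error_specs: Callable[[str], List[str]],
              refine_with_llm: Callable[[str, List[str]], List[str]]
              ) -> List[str]:
    seeds = infer_error_specs(protocol_source)        # component 2
    refined = refine_with_llm(protocol_source, seeds) # component 1
    seen, final = set(), []
    for inv in seeds + refined:
        if inv not in seen:
            seen.add(inv)
            final.append(inv)
    return final
```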

4. Evaluation Framework

Implement an evaluation framework that:
- Uses a set of known invariants for Paxos protocols as ground truth
- Calculates precision (true positives / (true positives + false positives))
- Calculates recall (true positives / (true positives + false negatives))
- Compares the performance of all three approaches
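A minimal sketch of the metric computation. Matching here is exact string equality; a real evaluation would need normalization or semantic equivalence checking before comparing candidates against ground truth:

```python
# Precision and recall against a known-invariant ground truth set.
def precision_recall(discovered: set, ground_truth: set) -> tuple:
    tp = len(discovered & ground_truth)  # true positives
    precision = tp / len(discovered) if discovered else 0.0
    recall = tp / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```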

Pilot Mode Implementation

Implement a global variable PILOT_MODE with three possible settings: 'MINI_PILOT', 'PILOT', or 'FULL_EXPERIMENT'.
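One way to wire the PILOT_MODE switch is a configuration lookup keyed on the global variable. The parameter names and values below are illustrative placeholders only, since this document does not specify the three configurations:

```python
# PILOT_MODE switch; parameter names/values are placeholder assumptions.
PILOT_MODE = "MINI_PILOT"  # one of: MINI_PILOT, PILOT, FULL_EXPERIMENT

CONFIGS = {
    # e.g. number of protocol variants analyzed and an LLM-call budget
    "MINI_PILOT":      {"n_protocols": 1,  "max_llm_calls": 5},
    "PILOT":           {"n_protocols": 3,  "max_llm_calls": 50},
    "FULL_EXPERIMENT": {"n_protocols": 10, "max_llm_calls": 500},
}

config = CONFIGS[PILOT_MODE]
```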

MINI_PILOT Configuration

PILOT Configuration

FULL_EXPERIMENT Configuration

Implementation Details

  1. Start by running the MINI_PILOT configuration to verify the code works correctly.
  2. If successful, proceed to the PILOT configuration.
  3. After the PILOT completes, stop and do not run the FULL_EXPERIMENT (this will be manually triggered after human verification).

For the LLM-guided symbolic reasoning module:

  1. Use GPT-4 as the base model
  2. Provide the model with context about Paxos protocols and invariant discovery
  3. Structure prompts to elicit logical expressions that represent invariants
  4. Parse and formalize the LLM outputs into verifiable invariants

For the error specification inference:

  1. Analyze the Paxos protocol implementation to identify error return values
  2. Extract patterns in error handling
  3. Generate potential invariants based on these patterns

For the integrated approach:

  1. First run the error specification inference
  2. Then use the LLM to refine and enhance these specifications
  3. Generate a final set of invariant candidates

For evaluation:

  1. Compare discovered invariants against ground truth
  2. Calculate precision and recall for each approach
  3. Use bootstrap resampling to determine statistical significance of differences
  4. Generate visualizations comparing the three approaches
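The bootstrap step can be sketched as a paired resampling test on per-invariant hit indicators (1 if an approach recovered a ground-truth invariant, else 0). The one-sided p-value formulation and variable names are assumptions for illustration:

```python
# Paired bootstrap: resample invariant indices with replacement and
# count how often approach A fails to beat approach B on recall.
import random

def bootstrap_pvalue(hits_a: list, hits_b: list,
                     n_boot: int = 10_000, seed: int = 0) -> float:
    """hits_x[i] = 1 iff approach x recovered ground-truth invariant i."""
    rng = random.Random(seed)
    n = len(hits_a)
    worse = 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diff = sum(hits_a[i] - hits_b[i] for i in idx) / n
        if diff <= 0:
            worse += 1
    return worse / n_boot  # approximate one-sided p-value
```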

Data Collection and Reporting

For each approach, record:

  1. The set of discovered invariants
  2. Precision and recall metrics
  3. Runtime performance
  4. Number of LLM calls (for the approaches using LLMs)

Generate a comprehensive report including:

  1. Summary statistics for each approach
  2. Comparative analysis of the three approaches
  3. Statistical significance of any differences
  4. Visualizations of precision and recall
  5. Examples of correctly and incorrectly identified invariants

Log all intermediate results, including:

  1. Raw LLM outputs
  2. Error specifications identified
  3. Invariant candidates before and after refinement

Please implement this experiment following the structure outlined above, starting with the MINI_PILOT configuration.


References

  1. Finding Invariants of Distributed Systems: It's a Small (Enough) World After All (2021). Paper ID: cf61eb66474f59f7c0a67a77c337b58a3bd87bef

  2. Towards an Automatic Proof of Lamport’s Paxos (2021). Paper ID: 4df564e3a8d4369a9bf55fb85e4d774324189c29

  3. Inferring Invariants with Quantifier Alternations: Taming the Search Space Explosion (2021). Paper ID: e0b284fa49cea94043b8d7a551c273ed115d4d95

  4. Leveraging Large Language Models for Automated Proof Synthesis in Rust (2023). Paper ID: b3224f23941bebab1c1e669eab8051d24f78c37f

  5. Automated Proof Generation for Rust Code via Self-Evolution (2024). Paper ID: 528ebaaace61d47559447c04a800a8a338fb4040

  6. AlphaVerus: Bootstrapping Formally Verified Code Generation through Self-Improving Translation and Treefinement (2024). Paper ID: 8defb15a763a094876d6d4d670bd643d977b4bbe

  7. Can LLMs Enable Verification in Mainstream Programming? (2025). Paper ID: 4d95009229862b2f3d1917f50242e960e1c5fe12

  8. From Objects to Events: Unlocking Complex Visual Understanding in Object Detectors via LLM-guided Symbolic Reasoning (2025). Paper ID: 14d518d32aebcdd58afadf6f80acab8a709aa866

  9. Interleaving Static Analysis and LLM Prompting (2024). Paper ID: 0667bf248dc6f18b41733c87fae01d72372d3c62