Paper ID

d9d50e6d98f01f357357eafde24ab66370fc3559


Title

Dynamic Fairness Adaptation in LLM-based Recommender Systems: A Preference-Aligned Approach


Introduction

Problem Statement

Static fairness interventions in LLM-based recommender systems fail to adapt to changing user preferences and societal norms, potentially leading to outdated or misaligned fairness criteria over time. This problem is particularly acute in domains where user preferences and societal values evolve rapidly, such as content recommendation or job matching platforms.

Motivation

Existing fairness approaches for recommender systems typically use fixed fairness constraints or objectives defined at training time. While some dynamic methods exist, they often focus on item-side fairness or single-attribute user fairness, neglecting the multifaceted nature of user preferences and intersectional fairness concerns. Our proposed Dynamic Fairness Adaptation (DFA) framework is motivated by the need to continuously update fairness criteria in alignment with evolving user expectations and values. By leveraging the natural language understanding capabilities of LLMs, we can engage users in meaningful dialogues about fairness, extract structured representations of their preferences, and adapt the recommender system accordingly.


Proposed Method

We propose Dynamic Fairness Adaptation (DFA), a novel framework for continuously updating fairness criteria in LLM-based recommenders. DFA consists of three main components:

1) Fairness Perception Elicitation: We design prompts for the LLM to engage users in dialogues about their fairness perceptions, extracting structured representations of user-specific fairness criteria.

2) Adaptive Fairness Aggregation: We develop a mechanism that aggregates individual user fairness perceptions into a global fairness objective, drawing on techniques from social choice theory and multi-stakeholder optimization. This aggregated objective evolves over time as new user feedback is incorporated.

3) Online Fairness Fine-tuning: We introduce a novel online learning algorithm that allows the LLM to continuously fine-tune its representations and outputs to align with the evolving fairness objective, balancing exploration of new fairness criteria with exploitation of known good practices.
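To make the intended data flow concrete, the following is a minimal, self-contained sketch of one DFA adaptation cycle. Every function body is a toy stand-in for the corresponding component (the concrete techniques for each stage are specified in the experiment plan below), and the aspect names and weights are illustrative assumptions.

```python
# Toy sketch of one DFA adaptation cycle; every function body is a stand-in
# for the corresponding component above, and all names/weights are assumptions.
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class FairnessCriterion:
    aspect: str        # e.g. "genre_balance", "director_diversity"
    importance: float  # elicited weight in [0, 1]

def elicit_fairness_criteria(user_response: str) -> List[FairnessCriterion]:
    """Stand-in for Fairness Perception Elicitation (LLM dialogue -> structured criteria)."""
    return [FairnessCriterion("director_diversity", 0.8),
            FairnessCriterion("genre_balance", 0.5)]

def aggregate_criteria(all_criteria: List[List[FairnessCriterion]]) -> Dict[str, float]:
    """Stand-in for Adaptive Fairness Aggregation: mean importance per aspect."""
    totals: Dict[str, List[float]] = {}
    for criteria in all_criteria:
        for c in criteria:
            totals.setdefault(c.aspect, []).append(c.importance)
    return {aspect: sum(vals) / len(vals) for aspect, vals in totals.items()}

def augment_prompt(base_prompt: str, objective: Dict[str, float]) -> str:
    """Stand-in for Online Fairness Fine-tuning, realized as prompt augmentation."""
    top_aspect = max(objective, key=objective.get)
    return f"{base_prompt} Prioritize {top_aspect.replace('_', ' ')} in the recommendations."

# One adaptation cycle over simulated user feedback.
responses = ["I care about diverse directors.", "Genre balance matters most to me."]
objective = aggregate_criteria([elicit_fairness_criteria(r) for r in responses])
print(augment_prompt("Recommend a movie for a 35-year-old female who likes sci-fi.", objective))
```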


Experiments Plan

Step-by-Step Experiment Plan

Step 1: Dataset Preparation

Use the MovieLens-25M dataset, which provides rating timestamps for temporal analysis. Because MovieLens-25M does not ship with user demographics, augment the data with user attributes (age, gender, occupation), for example from the MovieLens-1M user metadata or a comparable source, and include movie attributes (genre, release year, etc.). Split the data into training, validation, and test sets while preserving temporal order.
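A minimal preprocessing sketch, assuming the standard MovieLens-25M file layout (`ratings.csv`, `movies.csv`) and an 80/10/10 temporal split; paths and split ratios are assumptions.

```python
# Temporal 80/10/10 split of MovieLens ratings; file layout follows the
# standard MovieLens-25M release (paths and split ratios are assumptions).
import pandas as pd

ratings = pd.read_csv("ml-25m/ratings.csv")   # userId, movieId, rating, timestamp
movies = pd.read_csv("ml-25m/movies.csv")     # movieId, title, genres

# Parse release year from titles such as "Heat (1995)".
movies["year"] = movies["title"].str.extract(r"\((\d{4})\)\s*$", expand=False).astype(float)

# Sort globally by timestamp so train < validation < test in time (no leakage).
ratings = ratings.sort_values("timestamp").reset_index(drop=True)
n = len(ratings)
train = ratings.iloc[: int(0.8 * n)].merge(movies, on="movieId", how="left")
valid = ratings.iloc[int(0.8 * n): int(0.9 * n)].merge(movies, on="movieId", how="left")
test = ratings.iloc[int(0.9 * n):].merge(movies, on="movieId", how="left")
```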

Step 2: Baseline LLM Recommender

Fine-tune a pre-trained LLM (e.g., GPT-3.5) on the MovieLens dataset to create a baseline recommender system. Use prompts like 'Recommend a movie for a [age]-year-old [gender] who likes [favorite genres].' Evaluate the baseline recommender using standard ranking metrics (NDCG, MAP) and fairness measures (demographic parity, equal opportunity) computed over sliding time windows.
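A sketch of the baseline prompting step; `query_llm` is a hypothetical placeholder for whichever fine-tuned model endpoint is used, and the output parsing is an assumption about the model's list format.

```python
# Baseline prompting sketch; `query_llm` is a hypothetical wrapper standing in
# for the fine-tuned model endpoint.
def query_llm(prompt: str) -> str:
    """Placeholder for a call to the fine-tuned model; returns a canned answer here."""
    return "1. The Matrix (1999)\n2. Arrival (2016)\n3. Edge of Tomorrow (2014)"

def build_baseline_prompt(age: int, gender: str, favorite_genres: list[str]) -> str:
    genres = " and ".join(favorite_genres)
    return f"Recommend a movie for a {age}-year-old {gender} who likes {genres} movies."

def recommend(age: int, gender: str, favorite_genres: list[str], k: int = 10) -> list[str]:
    """Request a ranked list of k titles; parsing depends on the model's output format."""
    prompt = build_baseline_prompt(age, gender, favorite_genres)
    response = query_llm(prompt + f" Return a ranked list of {k} titles, one per line.")
    return [line.lstrip("0123456789.- ").strip()
            for line in response.splitlines() if line.strip()][:k]

print(recommend(35, "female", ["action", "sci-fi"], k=3))
```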

Step 3: Fairness Perception Elicitation

Design a set of prompts for the LLM to engage users in dialogues about fairness in movie recommendations. Example prompt: 'How important is it to you that movie recommendations are balanced across different genres? What about representation of diverse actors and directors?' Extract structured fairness criteria from user responses using named entity recognition and sentiment analysis.
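The extraction below is a deliberately simple, rule-based sketch of turning a free-text answer into weighted fairness criteria; the actual pipeline would use NER and a trained sentiment model, and the aspect lexicon and cue lists here are assumptions.

```python
# Rule-based sketch of extracting structured fairness criteria from a user's
# free-text reply; the lexicon and sentiment cues are assumptions standing in
# for a full NER + sentiment-analysis pipeline.
import re

ASPECT_LEXICON = {
    "genre_balance": ["genre", "genres", "balanced"],
    "director_diversity": ["director", "directors"],
    "cast_diversity": ["actor", "actors", "cast", "representation"],
    "cultural_diversity": ["culture", "cultural", "countries", "international"],
}
POSITIVE_CUES = ["important", "valuable", "great", "care", "matters"]
NEGATIVE_CUES = ["not important", "don't care", "irrelevant"]

def extract_fairness_criteria(response: str) -> dict[str, float]:
    """Map each mentioned fairness aspect to an importance weight in [0, 1]."""
    text = response.lower()
    pos = sum(cue in text for cue in POSITIVE_CUES)
    neg = sum(cue in text for cue in NEGATIVE_CUES)
    sentiment = 0.5 + 0.5 * (pos - neg) / max(pos + neg, 1)   # crude polarity score
    criteria = {}
    for aspect, keywords in ASPECT_LEXICON.items():
        if any(re.search(rf"\b{kw}\b", text) for kw in keywords):
            criteria[aspect] = round(sentiment, 2)
    return criteria

# Example on the elicitation response shown in the test cases below:
print(extract_fairness_criteria(
    "Diversity is quite important to me; action movies with female directors would be great."))
```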

Step 4: Adaptive Fairness Aggregation

Implement an aggregation mechanism that combines individual user fairness criteria into a global fairness objective. Use techniques like weighted majority voting or Borda count to balance different user preferences. Update this aggregated objective periodically (e.g., daily or weekly) based on new user feedback.
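A self-contained sketch of Borda-count aggregation over per-user rankings of fairness criteria; the candidate aspects and the example rankings are illustrative assumptions.

```python
# Borda-count aggregation of per-user fairness-criterion rankings into a single
# global objective; the example rankings are illustrative.
from collections import defaultdict

def borda_aggregate(user_rankings: list[list[str]]) -> dict[str, float]:
    """Each user ranks criteria from most to least important; a criterion at
    position i of an m-item ranking scores m - 1 - i points. Scores are
    normalized to sum to 1 so they can be used as objective weights."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in user_rankings:
        m = len(ranking)
        for i, criterion in enumerate(ranking):
            scores[criterion] += m - 1 - i
    total = sum(scores.values()) or 1.0
    return {c: s / total for c, s in scores.items()}

rankings = [
    ["director_diversity", "genre_balance", "cast_diversity"],
    ["genre_balance", "cast_diversity", "director_diversity"],
    ["cast_diversity", "genre_balance", "director_diversity"],
]
print(borda_aggregate(rankings))
# Periodic re-aggregation (e.g. daily or weekly) simply reruns this over the latest rankings.
```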

Step 5: Online Fairness Fine-tuning

Develop an online learning algorithm that fine-tunes the LLM's outputs based on the evolving fairness objective. Use a multi-armed bandit approach to balance exploration of new fairness criteria with exploitation of known good practices. Implement this as a prompt augmentation technique that dynamically adjusts the LLM's recommendation generation process.
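A minimal epsilon-greedy bandit over candidate fairness prompt augmentations; the augmentation texts, the epsilon value, and the reward signal (e.g., a blend of user feedback and a window-level fairness score) are assumptions.

```python
# Epsilon-greedy bandit over candidate fairness prompt augmentations; the
# augmentation texts, epsilon, and reward definition are assumptions.
import random

class PromptAugmentationBandit:
    def __init__(self, augmentations: list[str], epsilon: float = 0.1):
        self.augmentations = augmentations
        self.epsilon = epsilon
        self.counts = [0] * len(augmentations)
        self.values = [0.0] * len(augmentations)   # running mean reward per arm

    def select(self) -> int:
        if random.random() < self.epsilon:                      # explore a new criterion
            return random.randrange(len(self.augmentations))
        return max(range(len(self.values)), key=self.values.__getitem__)  # exploit

    def update(self, arm: int, reward: float) -> None:
        """Reward could blend user satisfaction with a window-level fairness score."""
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

bandit = PromptAugmentationBandit([
    "Prefer films by directors from under-represented groups when quality is comparable.",
    "Balance recommendations across sub-genres rather than repeating one franchise.",
    "Include at least one non-English-language film that matches the user's genres.",
])
arm = bandit.select()
# augmented_prompt = base_prompt + " " + bandit.augmentations[arm]
# ... after observing feedback for this recommendation list:
bandit.update(arm, reward=0.7)
```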

Step 6: Evaluation

Compare DFA against the baseline recommender and static fairness approaches. Use a sliding window approach to compute recommendation quality metrics (NDCG, MAP) and fairness measures (demographic parity, equal opportunity) over time. Analyze how these metrics change as user preferences and the fairness objective evolve.
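A sketch of the sliding-window evaluation: NDCG@k is computed per recommendation list, and demographic parity is approximated here as the gap in mean recommendation quality between user groups within each window. The window length, grouping column, and quality proxy are assumptions.

```python
# Sliding-window NDCG and a demographic-parity-style gap between user groups;
# window length, group column, and the quality proxy are assumptions.
import numpy as np
import pandas as pd

def ndcg_at_k(relevances: list[float], k: int = 10) -> float:
    """NDCG@k for one recommendation list, given graded relevance per position."""
    rel = np.asarray(relevances[:k], dtype=float)
    if rel.size == 0:
        return 0.0
    discounts = 1.0 / np.log2(np.arange(2, rel.size + 2))
    dcg = float((rel * discounts).sum())
    ideal = float((np.sort(rel)[::-1] * discounts).sum())
    return dcg / ideal if ideal > 0 else 0.0

def windowed_fairness(df: pd.DataFrame, window: str = "7D", group_col: str = "gender") -> pd.DataFrame:
    """df has one row per recommendation list with columns 'timestamp' (datetime),
    group_col, and 'ndcg' (e.g., from ndcg_at_k). Returns per-window mean NDCG per
    group plus the max between-group gap as a demographic-parity-style measure."""
    df = df.set_index("timestamp").sort_index()
    per_group = df.groupby(group_col)["ndcg"].resample(window).mean().unstack(0)
    per_group["parity_gap"] = per_group.max(axis=1) - per_group.min(axis=1)
    return per_group
```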

Step 7: User Study

Conduct a small-scale user study with 50-100 participants. Ask users to interact with both the baseline and DFA-enhanced recommender systems over a period of 2-4 weeks. Collect feedback on perceived fairness, recommendation quality, and overall satisfaction. Use qualitative analysis techniques to identify themes in user responses.

Step 8: Ablation Studies

Perform ablation studies to assess the impact of each component of DFA. Compare performance with and without the fairness perception elicitation, adaptive aggregation, and online fine-tuning components. This will help identify which aspects of the framework contribute most to improved fairness and recommendation quality.

Test Case Examples

Baseline Prompt Input

Recommend a movie for a 35-year-old female who likes action and sci-fi movies.

Baseline Prompt Expected Output

I recommend 'The Matrix' (1999). It's a groundbreaking sci-fi action film that combines intense action sequences with mind-bending concepts. The movie features strong performances, innovative special effects, and a compelling storyline that should appeal to fans of both action and science fiction genres.

Proposed Prompt Input (Fairness Perception Elicitation)

How important is it to you that movie recommendations include diverse representation in terms of directors, actors, and cultural perspectives? Please explain your thoughts on balancing this with your genre preferences.

Proposed Prompt Expected Output (Fairness Perception Elicitation)

Diversity in movie recommendations is quite important to me. I believe it's valuable to see different perspectives and experiences in film. However, I wouldn't want this to completely override my genre preferences. Ideally, I'd like to see a mix of movies that includes diverse representations within the action and sci-fi genres I enjoy. For example, recommending sci-fi films from different countries or action movies with female directors would be great.

Proposed Prompt Input (DFA-enhanced Recommendation)

Recommend a movie for a 35-year-old female who likes action and sci-fi movies, taking into account her preference for diverse representation while maintaining genre alignment.

Proposed Prompt Expected Output (DFA-enhanced Recommendation)

I recommend 'Black Panther' (2018) directed by Ryan Coogler. This film brilliantly combines elements of action and sci-fi, aligning with your genre preferences. It offers groundbreaking representation with a predominantly Black cast and explores themes of African culture and technology. The movie features strong female characters, innovative special effects, and a compelling storyline that should appeal to fans of both action and science fiction genres while also providing the diverse representation you value.

Explanation

The DFA-enhanced recommendation takes into account both the user's genre preferences and their expressed interest in diverse representation. It suggests a movie that not only fits the action and sci-fi genres but also offers diversity in terms of cast, cultural perspective, and strong female characters. This demonstrates how DFA can adapt recommendations to evolving fairness criteria while maintaining relevance to user preferences.

Fallback Plan

If the proposed DFA method doesn't significantly improve fairness metrics or user satisfaction compared to baselines, we can pivot the project to focus on analyzing why dynamic fairness adaptation is challenging in this context. We could conduct a more in-depth analysis of how user fairness perceptions change over time and how these changes correlate with recommendation quality and satisfaction. This could involve developing a taxonomy of fairness criteria in movie recommendations and tracking their prevalence and importance to users over time. Additionally, we could investigate potential barriers to implementing dynamic fairness in recommender systems, such as conflicting user preferences or limitations in LLM understanding of complex fairness concepts. This analysis could provide valuable insights for future work on adaptive fairness in AI systems and help identify specific challenges that need to be addressed in the next generation of fair recommender systems.

