Paper ID

0de0a44b859a3719d11834479112314b4caba669


Title

AttentionFlow: A Dynamic Visualization Tool for Interpreting Transformer Models


Introduction

Problem Statement

Current attention visualization tools for transformer models such as BERT and GPT-2 largely rely on static heatmaps, which fail to capture how attention evolves across layers and offer little insight into how information propagates through the model. This limits our understanding of how these models process and integrate information, potentially obscuring important patterns or bottlenecks in the model's reasoning process.

Motivation

Existing visualization tools typically use heatmaps or line graphs to display attention weights for individual layers or heads. While some more advanced tools allow interactive exploration of attention patterns, they do not provide a comprehensive view of information flow through the entire model. By visualizing how attention and information flow across layers, we can gain deeper insight into how transformer models process and integrate information, potentially uncovering patterns or bottlenecks in the model's reasoning that static, per-layer views miss. This approach is inspired by the dynamic nature of information processing in neural networks and aims to provide a more intuitive and comprehensive understanding of transformer models' inner workings.


Proposed Method

We propose 'AttentionFlow', a novel visualization tool that treats attention as a dynamic process flowing through the transformer layers. The core idea is to represent each token as a node in a graph, with edges representing attention connections. As we move through layers, the graph evolves, showing how attention shifts and information propagates. We use a force-directed graph layout algorithm that updates in real-time as the user explores different layers. The size of each node represents the amount of 'information' it holds, calculated using a novel metric combining attention scores and value vector magnitudes. Edge thickness represents attention strength, and color gradients show the direction of information flow. Users can 'play' the attention flow like a video, seeing how it evolves across layers. Additionally, we implement a 'token tracing' feature that allows users to select a specific token and visualize its influence throughout the network, highlighting potential long-range dependencies.


Experiments Plan

Step-by-Step Experiment Plan

Step 1: Data Preparation

Select a set of diverse input sequences from standard NLP datasets such as GLUE, SQuAD, and WikiText-103. Ensure a mix of short and long sequences, as well as different types of tasks (e.g., classification, question answering, language modeling).
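
As a concrete illustration, the sample could be drawn with the Hugging Face datasets library; the dataset and configuration names below (glue/sst2, squad, wikitext-103-raw-v1), splits, and sample sizes are illustrative choices rather than fixed by the plan:

    from datasets import load_dataset

    def sample_inputs(n_per_source=50, seed=0):
        # Short classification sentences (GLUE / SST-2)
        sst2 = load_dataset("glue", "sst2", split="validation").shuffle(seed=seed)
        # Question-answering pairs (SQuAD)
        squad = load_dataset("squad", split="validation").shuffle(seed=seed)
        # Long-form language-modeling text (WikiText-103)
        wiki = load_dataset("wikitext", "wikitext-103-raw-v1",
                            split="validation").shuffle(seed=seed)
        texts = [ex["sentence"] for ex in sst2.select(range(n_per_source))]
        texts += [ex["question"] + " " + ex["context"]
                  for ex in squad.select(range(n_per_source))]
        texts += [ex["text"] for ex in wiki.select(range(n_per_source))
                  if ex["text"].strip()]
        return texts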

Step 2: Model Selection

Choose BERT-base and GPT-2-medium as our target models for visualization. Use the Hugging Face Transformers library to load pre-trained versions of these models.
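
A minimal loading sketch, assuming the standard Hub checkpoint names bert-base-uncased and gpt2-medium:

    from transformers import AutoModel, AutoTokenizer

    def load_model(name):
        # output_attentions=True makes every layer return its attention weights
        tokenizer = AutoTokenizer.from_pretrained(name)
        model = AutoModel.from_pretrained(name, output_attentions=True)
        model.eval()
        return tokenizer, model

    bert_tok, bert = load_model("bert-base-uncased")
    gpt2_tok, gpt2 = load_model("gpt2-medium")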

Step 3: Attention Extraction

Implement a function to extract attention weights and value vectors from all layers of the selected models for a given input sequence. Use the transformers library's model hooks to access these intermediate outputs.
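
One possible shape of this function for a BERT-style encoder: attention weights come from output_attentions=True, and value vectors are captured with a forward hook on each layer's value projection (GPT-2 would need the analogous hook on its fused c_attn projection instead):

    import torch

    def extract_attention_and_values(model, tokenizer, text):
        values = []

        def hook(_module, _inputs, output):
            # Value projection output: (batch, seq_len, hidden), before head split
            values.append(output.detach())

        # Module path assumes a BERT-style encoder (model.encoder.layer[i].attention.self.value)
        handles = [layer.attention.self.value.register_forward_hook(hook)
                   for layer in model.encoder.layer]
        inputs = tokenizer(text, return_tensors="pt")
        with torch.no_grad():
            outputs = model(**inputs, output_attentions=True)
        for handle in handles:
            handle.remove()
        # outputs.attentions: one (batch, heads, seq_len, seq_len) tensor per layer
        return outputs.attentions, values, inputs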

Step 4: Information Flow Metric

Develop a novel metric to quantify the 'information' held by each token at each layer, combining attention scores and value vector magnitudes. One candidate definition: Info(token_i, layer_l) = sum over heads h and attended positions j of A^(h)_ij * ||v^(h)_j||, i.e., the attention-weighted norms of the value vectors that token i reads from, summed across all attention heads.
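
A sketch of this candidate definition in PyTorch, operating on the tensors collected in Step 3 (the per-head split of the value projection is an implementation assumption):

    import torch

    def information_metric(layer_attention, layer_values, num_heads):
        # layer_attention: (batch, heads, seq, seq); layer_values: (batch, seq, hidden)
        batch, seq_len, hidden = layer_values.shape
        head_dim = hidden // num_heads
        # Split the value projection into per-head vectors: (batch, heads, seq, head_dim)
        v = layer_values.view(batch, seq_len, num_heads, head_dim).transpose(1, 2)
        v_norms = v.norm(dim=-1)  # (batch, heads, seq)
        # For each token i, attention-weight the norms of the values it attends to
        info = torch.einsum("bhij,bhj->bhi", layer_attention, v_norms)
        return info.sum(dim=1)  # (batch, seq): summed over heads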

Step 5: Graph Construction

For each layer, construct a graph where nodes represent tokens and edges represent attention connections. Use the NetworkX library for graph operations. Node size should be proportional to the information metric, edge thickness to attention strength, and edge color to indicate direction of information flow.
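
A minimal sketch of the per-layer graph construction; the pruning threshold and the convention that edges point from the attended token to the attending token (the direction information flows) are illustrative assumptions:

    import networkx as nx

    def build_layer_graph(tokens, attention, info, threshold=0.05):
        # attention: (seq, seq) averaged over heads; info: (seq,) metric from Step 4
        g = nx.DiGraph()
        for i, tok in enumerate(tokens):
            g.add_node(i, label=tok, info=float(info[i]))
        for i in range(len(tokens)):
            for j in range(len(tokens)):
                weight = float(attention[i, j])
                if weight >= threshold:
                    # Edge j -> i: token i attends to (reads information from) token j
                    g.add_edge(j, i, weight=weight)
        return g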

Step 6: Visualization Implementation

Use D3.js to create an interactive web-based visualization. Implement the force-directed graph layout and animation between layers. Add controls for playing through layers, adjusting animation speed, and selecting specific tokens for tracing.
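
The D3.js front end itself is JavaScript, but a small Python export step can hand it the per-layer graphs. A sketch assuming the front end consumes one node-link object per layer (the file name and schema are implementation choices):

    import json
    from networkx.readwrite import json_graph

    def export_layers(layer_graphs, path="attention_flow.json"):
        # One node-link dict per layer for the front end to animate between
        payload = [json_graph.node_link_data(g) for g in layer_graphs]
        with open(path, "w") as f:
            json.dump(payload, f)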

Step 7: Token Tracing Feature

Implement a feature that allows users to select a specific token and highlight its connections and influence across all layers. This should visually emphasize the selected token's node and all its incoming and outgoing edges throughout the network.
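
A sketch of the back-end side of token tracing, collecting the selected token's incoming and outgoing edges at every layer so the front end can highlight them (the returned structure is an assumption):

    def trace_token(layer_graphs, token_index):
        trace = []
        for layer, g in enumerate(layer_graphs):
            incoming = [(u, d["weight"])
                        for u, _, d in g.in_edges(token_index, data=True)]
            outgoing = [(v, d["weight"])
                        for _, v, d in g.out_edges(token_index, data=True)]
            trace.append({"layer": layer, "incoming": incoming, "outgoing": outgoing})
        return trace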

Step 8: User Interface Design

Design an intuitive user interface that allows users to input custom sequences, select models, and control the visualization. Include options for adjusting graph layout parameters and exporting visualizations.

Step 9: Performance Optimization

Optimize the visualization for performance, especially for longer sequences. This may involve techniques like edge pruning (only showing strongest connections) or using WebGL for rendering.
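
A sketch of the edge-pruning idea, keeping only each node's top-k strongest incoming edges; k is a tunable assumption, and WebGL rendering is a separate front-end concern:

    def prune_edges(g, top_k=5):
        pruned = g.copy()
        for node in pruned.nodes:
            in_edges = sorted(pruned.in_edges(node, data=True),
                              key=lambda e: e[2]["weight"], reverse=True)
            # Drop everything beyond the node's top_k strongest incoming edges
            for u, v, _ in in_edges[top_k:]:
                pruned.remove_edge(u, v)
        return pruned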

Step 10: Evaluation

Conduct a user study with 20 NLP researchers and practitioners. Have them use both AttentionFlow and a traditional heatmap visualization tool (e.g., BertViz) for analyzing attention in specific NLP tasks. Collect quantitative feedback on tool usability and qualitative feedback on insights gained.

Step 11: Comparative Analysis

Compare insights gained using AttentionFlow versus traditional heatmap visualizations on tasks such as sentiment analysis and named entity recognition. Document specific cases where AttentionFlow revealed patterns not apparent in static visualizations.

Step 12: Quantitative Evaluation

Measure correlations between our 'information flow' metric and traditional interpretability measures like integrated gradients. Analyze how well our metric predicts important tokens for model decision-making.
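
A sketch of the correlation analysis, assuming per-token attribution scores (e.g., integrated gradients computed with a library such as Captum) are already available and the information metric has been averaged over layers:

    from scipy.stats import spearmanr

    def metric_vs_attribution(info_per_token, attribution_per_token):
        # Rank correlation between our information metric and the attribution scores
        rho, p_value = spearmanr(info_per_token, attribution_per_token)
        return rho, p_value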

Test Case Examples

Baseline Prompt Input (Traditional Heatmap)

Visualize attention weights for the sentence 'The cat sat on the mat.' using BERT-base model.

Baseline Prompt Expected Output (Traditional Heatmap)

A static heatmap showing attention weights for each layer and head, with rows and columns representing tokens.

Proposed Prompt Input (AttentionFlow)

Visualize attention flow for the sentence 'The cat sat on the mat.' using BERT-base model.

Proposed Prompt Expected Output (AttentionFlow)

An interactive graph visualization showing tokens as nodes and attention as edges. The graph evolves through layers, with node sizes changing based on information content and edge thicknesses representing attention strength. Users can play through layers, trace specific tokens, and interact with the graph.

Explanation

AttentionFlow provides a dynamic, intuitive visualization of how attention and information flow through the model. Unlike static heatmaps, it allows users to see how attention patterns evolve across layers and trace the influence of specific tokens, potentially revealing long-range dependencies and complex reasoning patterns that are not apparent in traditional visualizations.

Fallback Plan

If the proposed AttentionFlow visualization does not provide significant improvements in interpretability over traditional methods, we can pivot the project in several ways. First, we could focus on analyzing why the dynamic visualization is not as effective as expected, which could provide insights into the limitations of current attention-based interpretability methods. This analysis could involve comparing our information flow metric with other interpretability measures across a wide range of NLP tasks, potentially revealing task-specific patterns in how transformers process information. Alternatively, we could extend the tool to visualize not just attention, but also other aspects of the transformer architecture, such as feed-forward network activations or residual connections. This could provide a more comprehensive view of information flow in these models. Finally, we could shift focus to using our visualization method for model debugging and improvement, analyzing how attention patterns change during fine-tuning or how they differ between well-performing and poorly-performing models on specific tasks.


References

  1. Attention is not Explanation (2019)
  2. Seq2seq-Vis: A Visual Debugging Tool for Sequence-to-Sequence Models (2018)
  3. Analyzing the Structure of Attention in a Transformer Language Model (2019)
  4. Back Attention: Understanding and Enhancing Multi-Hop Reasoning in Large Language Models (2025)
  5. Toward Practical Usage of the Attention Mechanism as a Tool for Interpretability (2022)
  6. Who Reasons in the Large Language Models? (2025)
  7. Visual Interrogation of Attention-Based Models for Natural Language Inference and Machine Comprehension (2018)
  8. Interactive Visualization and Manipulation of Attention-based Neural Machine Translation (2017)
  9. Naturalness of Attention: Revisiting Attention in Code Language Models (2023)
  10. Understanding Matching Mechanisms in Cross-Encoders (2025)