Paper ID

ad5c9772c273eabe06401bb0d4375b345ea81993


Title

Integrating Gated Attention with a Two-Layer LSTM to Enhance Roman Urdu Sentiment Analysis


Introduction

Motivation

The source paper is Paper 0: Sentiment analysis for Urdu online reviews using deep learning models (31 citations, 2021). This idea builds on a progression of related work, most directly Paper 1: Contextually Enriched Meta-Learning Ensemble Model for Urdu Sentiment Analysis (2023).

The progression from the source paper to Paper 1 highlights a significant advance in Urdu sentiment analysis: a meta-learning ensemble model that combines multiple classifiers to improve accuracy and reduce overfitting, both limitations of the source paper. However, both papers focus primarily on model performance and accuracy, leaving the interpretability of these models unexplored. Interpretability is crucial for understanding how sentiment is derived from the complex linguistic structures of Urdu; focusing on it can provide deeper insight into the models' decision-making and inform further improvements to sentiment analysis techniques in under-resourced languages like Urdu.

Hypothesis

Integrating a Gated Attention Layer with a Two-layer LSTM in a CNN-LSTM architecture will improve the interpretability and classification accuracy of Roman Urdu sentiment analysis compared to models without this integration.

Research Gap

Existing research has extensively explored attention mechanisms in CNN-LSTM models for sentiment analysis, but the specific combination of a Gated Attention Layer with a Two-layer LSTM for Roman Urdu sentiment analysis remains unexplored. Closing this gap matters because the combination could enhance the model's ability to focus on key sentences, improving both interpretability and classification accuracy.

Hypothesis Elements

Independent variable: Integrating a Gated Attention Layer with a Two-layer LSTM in a CNN-LSTM architecture

Dependent variable: interpretability and classification accuracy of Roman Urdu sentiment analysis

Comparison groups: CNN-LSTM architecture with Gated Attention Layer and Two-layer LSTM vs. models without this integration (CNN-LSTM Model and Two-layer LSTM Model)

Baseline/control: a standard CNN-LSTM architecture without the Gated Attention Layer, and a Two-layer LSTM model without the Gated Attention mechanism

Context/setting: Roman Urdu sentiment analysis

Assumptions: The Gated Attention Layer can dynamically measure sentence importance and the Two-layer LSTM can capture long-term dependencies in text

Relationship type: Causation (integration will improve performance)

Population: Roman Urdu text data

Timeframe: Not specified

Measurement method: Accuracy, Precision, Recall, F1 Score, Confusion Matrix, and visualization of attention weights


Proposed Method

Overview

This research aims to enhance sentiment analysis in Roman Urdu by integrating a Gated Attention Layer with a Two-layer LSTM within a CNN-LSTM architecture. The Gated Attention Layer will dynamically measure sentence importance, allowing the model to focus on key sentences that contribute significantly to sentiment classification. This is achieved by applying a gating mechanism over sentence representations, which selectively filters out less relevant information. The Two-layer LSTM will capture long-term dependencies in the text, preserving the context necessary for accurate sentiment analysis. This combination is expected to improve interpretability by highlighting key sentences and enhance classification accuracy by focusing on the most informative features. The approach addresses the gap in existing research by exploring a novel configuration that leverages the strengths of both attention mechanisms and deep learning architectures. The evaluation will be conducted using Roman Urdu datasets, with metrics such as precision, recall, and F1 score to assess performance improvements.

Background

Gated Attention Layer: The Gated Attention Layer measures sentence importance by applying a gating mechanism over sentence representations. The gate selectively filters out less relevant information, allowing the model to focus on the key sentences that contribute to sentiment classification, and it is integrated with the CNN component to enhance local feature extraction. This component was chosen for its ability to improve interpretability by highlighting key sentences, which is crucial for understanding sentiment expressions in Roman Urdu. Its expected role is to improve the classification of sentiment polarities by concentrating on the most informative features within a sentence.

Two-layer LSTM: The Two-layer LSTM captures long-term dependencies in sentiment analysis tasks. Stacking two LSTM layers enhances the model's ability to capture complex temporal patterns; the number of hidden units in each layer can be adjusted to the dataset's size and complexity. This stacked configuration was chosen for its effectiveness in handling the morphological complexities of Roman Urdu, as demonstrated by its strong performance in prior sentiment classification work. Its expected role is to preserve long-term dependencies, providing the context necessary for accurate sentiment analysis.

Implementation

The hypothesis will be implemented by integrating a Gated Attention Layer with a Two-layer LSTM within a CNN-LSTM architecture. The Gated Attention Layer will be implemented as a module that applies a gating mechanism over sentence representations, dynamically adjusting attention weights based on the input. It will be positioned after the CNN component, which extracts local features from the text. The outputs of the Gated Attention Layer will feed the Two-layer LSTM, whose first layer processes the input sequence and whose second layer refines the context representation, so that the attention weights influence how the LSTM processes the text while the model focuses on key sentences and preserves context. A final dense layer will output the sentiment predictions. The model will be implemented in a Python-based deep learning framework such as TensorFlow or PyTorch, trained on Roman Urdu datasets, and evaluated against baseline models without the Gated Attention Layer using metrics such as precision, recall, and F1 score.


Experiments Plan

Operationalization Information

Please implement an experiment to test whether integrating a Gated Attention Layer with a Two-layer LSTM in a CNN-LSTM architecture improves the interpretability and classification accuracy of Roman Urdu sentiment analysis compared to models without this integration.

Dataset

Please use a Roman Urdu sentiment analysis dataset. The Roman Urdu Dataset for Sentiment Analysis (RUDSA) or similar datasets can be used. If not readily available, you may need to:
1. Find and download a Roman Urdu sentiment dataset from sources like Kaggle, GitHub, or academic repositories
2. Preprocess the dataset to ensure it's in a suitable format for sentiment analysis (text and sentiment labels)
3. Split the dataset into training (70%), validation (15%), and test (15%) sets
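
A minimal sketch of the loading and 70/15/15 split, assuming the dataset is a CSV with hypothetical `text` and `label` columns (the file name and column names are placeholders):

```python
# Load the dataset and produce a stratified 70/15/15 train/val/test split.
import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("roman_urdu_sentiment.csv")  # hypothetical file name

# First carve off 70% for training, then split the remaining 30% in half.
train_df, holdout_df = train_test_split(
    df, test_size=0.30, stratify=df["label"], random_state=42)
val_df, test_df = train_test_split(
    holdout_df, test_size=0.50, stratify=holdout_df["label"], random_state=42)

print(len(train_df), len(val_df), len(test_df))
```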

Models Implementation

Implement the following models for comparison:

Baseline Models

1. CNN-LSTM Model: Implement a standard CNN-LSTM architecture without the Gated Attention Layer, consisting of:
- CNN component for local feature extraction
- Single-layer LSTM for sequence modeling
- Dense layer for final classification

2. Two-layer LSTM Model: Implement a model with two stacked LSTM layers but without the Gated Attention mechanism, consisting of:
- Two stacked LSTM layers (128 units each)
- Dense layer for final classification
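
As a rough illustration, the two baselines could be sketched in Keras as follows; `VOCAB_SIZE` and `EMBED_DIM` are assumed constants, and texts are assumed to be already tokenized into integer sequences:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE, EMBED_DIM = 20000, 128  # assumed values, tune to the dataset

def build_cnn_lstm():
    """Baseline 1: CNN for local features, a single LSTM, dense classifier."""
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.Conv1D(128, kernel_size=3, padding="same", activation="relu"),
        layers.MaxPooling1D(pool_size=2),
        layers.LSTM(128),
        layers.Dense(1, activation="sigmoid"),
    ])

def build_two_layer_lstm():
    """Baseline 2: two stacked 128-unit LSTMs, no attention."""
    return models.Sequential([
        layers.Embedding(VOCAB_SIZE, EMBED_DIM),
        layers.LSTM(128, return_sequences=True),  # first layer emits the full sequence
        layers.LSTM(128),                         # second layer summarizes it
        layers.Dense(1, activation="sigmoid"),
    ])
```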

Experimental Model

Gated Attention CNN-LSTM Model: Implement the proposed architecture with the following components:
1. CNN component for local feature extraction
2. Gated Attention Layer that:
- Takes sentence representations from the CNN
- Applies a gating mechanism to selectively filter information
- Dynamically adjusts attention weights based on input data
3. Two-layer LSTM (128 units each) that:
- Processes the outputs from the Gated Attention Layer
- Captures long-term dependencies in the text
4. Dense layer for final classification
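
One way to wire these components together with the Keras functional API is sketched below; `GatedAttention` refers to the custom layer specified in the next subsection (a sketch of it follows that subsection), and the vocabulary and embedding sizes are assumptions:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_gated_attention_cnn_lstm(vocab_size=20000, embed_dim=128):
    inputs = layers.Input(shape=(None,), dtype="int32")
    x = layers.Embedding(vocab_size, embed_dim)(inputs)
    # 1. CNN component for local feature extraction
    x = layers.Conv1D(128, kernel_size=3, padding="same", activation="relu")(x)
    # 2. Gated Attention Layer: filters and re-weights the CNN features;
    #    the gates are kept so an interpretability sub-model can expose them.
    x, gates = GatedAttention(hidden_dim=128, name="gated_attention")(x)
    # 3. Two-layer LSTM over the attended representations
    x = layers.LSTM(128, return_sequences=True)(x)
    x = layers.LSTM(128)(x)
    # 4. Dense layer for final (binary) classification
    outputs = layers.Dense(1, activation="sigmoid")(x)
    return models.Model(inputs, outputs)
```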

Gated Attention Layer Implementation

Implement a custom Gated Attention Layer with the following characteristics:
1. The layer should take sentence representations as input
2. Apply a gating mechanism using sigmoid activation to determine importance weights
3. The gating mechanism should be defined as: g = σ(W_g * x + b_g), where σ is the sigmoid function
4. The attention mechanism should be defined as: a = tanh(W_a * x + b_a)
5. The final output should be: output = g * a, where * represents element-wise multiplication
6. The layer should be configurable with parameters for the hidden dimension
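
A minimal Keras reading of this specification, with W_g/b_g and W_a/b_a realized as Dense layers applied position-wise over the sequence of sentence representations:

```python
import tensorflow as tf
from tensorflow.keras import layers

class GatedAttention(layers.Layer):
    """Gated attention: g = sigmoid(W_g x + b_g), a = tanh(W_a x + b_a), output = g * a."""

    def __init__(self, hidden_dim, **kwargs):
        super().__init__(**kwargs)
        self.hidden_dim = hidden_dim
        self.gate = layers.Dense(hidden_dim, activation="sigmoid")  # W_g, b_g
        self.attn = layers.Dense(hidden_dim, activation="tanh")     # W_a, b_a

    def call(self, x):
        g = self.gate(x)   # importance weights in (0, 1)
        a = self.attn(x)   # candidate features in (-1, 1)
        return g * a, g    # element-wise product, plus the gates for visualization
```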

Model Training

Train all models with the following configuration:
1. Use binary cross-entropy loss for binary sentiment classification (positive/negative)
2. Use Adam optimizer with learning rate of 0.001
3. Implement early stopping based on validation loss with patience of 5 epochs
4. Use batch size of 32
5. Train for a maximum of 20 epochs
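
This configuration maps directly onto Keras; the sketch below assumes `model` and the split arrays already exist (`restore_best_weights=True` is an added but common choice, not part of the specification):

```python
import tensorflow as tf

model.compile(
    optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
    loss="binary_crossentropy",
    metrics=["accuracy"],
)
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=5, restore_best_weights=True)

history = model.fit(
    X_train, y_train,
    validation_data=(X_val, y_val),
    batch_size=32,
    epochs=20,
    callbacks=[early_stop],
)
```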

Evaluation

Evaluate all models using the following metrics:
1. Accuracy
2. Precision
3. Recall
4. F1 Score
5. Confusion Matrix

Also, implement a visualization of attention weights from the Gated Attention Layer to demonstrate interpretability. This should show which parts of the input text the model is focusing on for its predictions.
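
A sketch of the metric computation and a simple gate heatmap. Here `gate_model` is a hypothetical sub-model exposing the GatedAttention layer's second output, e.g. built with `tf.keras.Model(model.input, model.get_layer("gated_attention").output[1])`:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, confusion_matrix)

y_prob = model.predict(X_test)
y_pred = (y_prob > 0.5).astype(int).ravel()

print("Accuracy :", accuracy_score(y_test, y_pred))
print("Precision:", precision_score(y_test, y_pred))
print("Recall   :", recall_score(y_test, y_pred))
print("F1 Score :", f1_score(y_test, y_pred))
print("Confusion matrix:\n", confusion_matrix(y_test, y_pred))

# Heatmap of gate activations for one sample: rows are gate units,
# columns are token positions; brighter cells mark tokens the model kept.
gates = gate_model.predict(X_test[:1])  # shape (1, seq_len, hidden_dim)
plt.imshow(gates[0].T, aspect="auto", cmap="viridis")
plt.xlabel("token position")
plt.ylabel("gate unit")
plt.title("Gated Attention activations")
plt.colorbar()
plt.show()
```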

Pilot Experiment Settings

Implement a global variable PILOT_MODE with three possible settings: MINI_PILOT, PILOT, or FULL_EXPERIMENT.

1. MINI_PILOT:
- Use only 100 samples from the training set
- Train for a maximum of 5 epochs
- Use 20 validation samples
- Test on 20 samples
- Purpose: quick code verification and debugging

2. PILOT:
- Use 1000 samples from the training set
- Train for a maximum of 10 epochs
- Use 200 validation samples
- Test on 200 samples from the validation set (not the test set)
- Purpose: verify whether the approach shows promising results

3. FULL_EXPERIMENT:
- Use the entire training dataset
- Train for a maximum of 20 epochs
- Use the entire validation set
- Test on the entire test set
- Perform hyperparameter tuning if needed
- Purpose: complete experiment for final results

Start by running the MINI_PILOT first. If everything looks good, proceed to the PILOT. After the PILOT completes, stop and do not run the FULL_EXPERIMENT (a human will manually verify the results and make the change to FULL_EXPERIMENT if appropriate).
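
One minimal way to wire the three settings to the training code; the sample counts and epoch caps mirror the lists above, and `train_df`, `val_df`, and `test_df` are assumed to come from the earlier split:

```python
PILOT_MODE = "MINI_PILOT"  # one of: MINI_PILOT, PILOT, FULL_EXPERIMENT

PILOT_SETTINGS = {
    "MINI_PILOT":      {"n_train": 100,  "n_val": 20,   "n_test": 20,   "epochs": 5},
    "PILOT":           {"n_train": 1000, "n_val": 200,  "n_test": 200,  "epochs": 10},
    "FULL_EXPERIMENT": {"n_train": None, "n_val": None, "n_test": None, "epochs": 20},
}

cfg = PILOT_SETTINGS[PILOT_MODE]
train_subset = train_df[:cfg["n_train"]]  # slicing with None keeps everything
val_subset = val_df[:cfg["n_val"]]
# In PILOT mode, "test" samples are drawn from the validation set, not the test set.
test_subset = val_df[:cfg["n_test"]] if PILOT_MODE == "PILOT" else test_df[:cfg["n_test"]]
```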

Statistical Analysis

Perform the following statistical analyses:
1. Calculate mean and standard deviation of all metrics across 5 independent runs with different random seeds
2. Perform paired t-tests to determine if differences between models are statistically significant (p < 0.05)
3. Generate box plots to visualize the distribution of performance metrics across runs
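
A sketch of the paired t-test and box plot, using hypothetical per-run F1 scores (the listed numbers are placeholders, not results):

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

scores_baseline = [0.78, 0.80, 0.79, 0.81, 0.77]      # placeholder values
scores_experimental = [0.82, 0.84, 0.83, 0.85, 0.81]  # placeholder values

print("baseline mean/std    :", np.mean(scores_baseline), np.std(scores_baseline))
print("experimental mean/std:", np.mean(scores_experimental), np.std(scores_experimental))

# Paired t-test: runs are paired by random seed across the two models.
t_stat, p_value = stats.ttest_rel(scores_experimental, scores_baseline)
print(f"paired t-test: t = {t_stat:.3f}, p = {p_value:.4f}, "
      f"significant at 0.05: {p_value < 0.05}")

plt.boxplot([scores_baseline, scores_experimental],
            labels=["baseline", "gated attention"])
plt.ylabel("F1 score")
plt.show()
```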

Required Output

  1. Training and validation loss curves for all models
  2. Evaluation metrics (accuracy, precision, recall, F1) for all models on the test set
  3. Confusion matrices for all models
  4. Visualization of attention weights for sample inputs
  5. Statistical significance test results
  6. Summary report comparing the performance of baseline and experimental models

Please implement this experiment using TensorFlow or PyTorch, with a preference for TensorFlow if both are available.


References

  1. Sentiment analysis for Urdu online reviews using deep learning models (2021). Paper ID: ad5c9772c273eabe06401bb0d4375b345ea81993

  2. Contextually Enriched Meta-Learning Ensemble Model for Urdu Sentiment Analysis (2023). Paper ID: 127002d6266efe58048300ebe35cadb7965a3d24

  3. Fine-Grained Feature Extraction in Key Sentence Selection for Explainable Sentiment Classification Using BERT and CNN (2025). Paper ID: e67fc9bec879cb8dd31aec5b51e83b9dd8949576

  4. Deep Sentiment Analysis Using CNN-LSTM Architecture of English and Roman Urdu Text Shared in Social Media (2022). Paper ID: b52958224f11c5a3940ef44228f0008ed3df66b0