
Volume: 03

Issue: 01

Page: 19-27


DesignMind ML-personalized UI/UX design with adaptive color and simplified features

Rakhi Rani Paul 1,2; Sunjida Akter Shanu 1,ɸ; Maria Islam 1,ɸ; Nahian Fairuz 1,ɸ; Subrata Kumer Paul 1,2,*; Mahfuzur Rahman 1,ɸ; Md. Ekramul Hamid 2

1 Department of Computer Science and Engineering, Bangladesh Army University of Engineering & Technology (BAUET), Qadirabad Cantonment, Natore-6431, Rajshahi, Bangladesh

2 Department of Computer Science and Engineering, University of Rajshahi, Rajshahi-6205, Bangladesh

ɸ Authors contributed equally

 

*Corresponding author
Email address: sksubrata96@gmail.com

doi: https://doi.org/10.69517/cser.2026.03.01.0004


Received:
7 December, 2025

Revised:
9 February, 2026

Accepted:
1 March, 2026

Published:
27 March, 2026

  • Proposes DesignMind-ML, a unified hybrid framework integrating ANN, CNN, and GRU–LSTM for adaptive and personalized UI/UX decision-making.
  • Introduces a self-constructed multimodal UI/UX dataset comprising 3,000 real user interaction sessions collected under practical usage conditions.
  • Combines visual interface analysis, instruction-level intent learning, and temporal user behavior modeling within a single end-to-end system.
  • Demonstrates the effectiveness of multimodal learning for dynamic UI adaptation rather than isolated task-specific prediction accuracy.
  • Provides a comprehensive comparative analysis that emphasizes system-level adaptability and real-world applicability in intelligent interface design.

Abstract

Personalized user interface and user experience (UI/UX) design has become increasingly important for enhancing usability, accessibility, and user satisfaction in modern digital systems. Traditional static interfaces often struggle to accommodate diverse user preferences and interaction behaviors, underscoring the need for intelligent adaptive interface systems. This study aims to develop a machine learning–based framework that dynamically personalizes UI/UX design by analyzing user interaction behavior, visual interface characteristics, and the evolution of temporal preferences. To achieve this objective, we propose DesignMind-ML, a multimodal framework that integrates Artificial Neural Networks (ANN), Convolutional Neural Networks (CNN), and GRU-LSTM models. The ANN learns instruction-level UI adaptation decisions, such as color preference and feature prioritization; the CNN evaluates UI screenshots to assess visual layout and complexity; and the GRU-LSTM captures sequential user behavior patterns over time. The outputs of these models are combined using a weighted decision-level integration strategy to generate adaptive interface configurations. The system was evaluated using a self-constructed multimodal dataset of 3,000 user interaction sessions. Experimental results indicate that the hybrid framework achieves approximately 76% overall UI adaptation accuracy, demonstrating the feasibility of multimodal machine learning for intelligent and behavior-aware UI personalization in adaptive digital systems.

Graphical abstract

Keywords

Machine Learning, CNN, ANN, GRU–LSTM, Interface Design, User Behavior Analysis

1. Introduction

Personalized UI/UX design has become increasingly important for modern digital systems, enhancing usability, accessibility, and user satisfaction across diverse user groups (Liu et al., 2024). Traditional static user interface designs often neglect individual differences in user preferences, visual comfort, and interaction behaviour, resulting in cognitive overload and reduced engagement. These challenges are particularly evident in complex applications where users interact with multiple features under varying conditions. To address these limitations, machine learning–based UI personalization has gained significant attention, as it allows interfaces to adapt dynamically based on user behaviour and preferences (Zhan et al., 2024). By analysing interaction patterns, adaptive systems can modify visual elements and interface structures to better suit individual users. Personalized UI/UX design aims to address these issues by dynamically adjusting interface elements based on learned user preferences. It is now possible to analyse user interactions and visual interface properties to create adaptive, user-centered designs. This paper introduces DesignMind ML, a machine learning–driven framework that personalizes UI/UX by adapting color schemes and simplifying interface features. The system continuously learns from user interactions and visual UI data, improving usability, accessibility, and user satisfaction over time (Li et al., 2022).

The major contributions of this work include the development of a multi-model learning architecture that integrates artificial neural networks (ANN) for instruction processing and decision-making, convolutional neural networks (CNN) for UI screenshot and visual analysis, and gated recurrent units (GRU) with long short-term memory (LSTM) for modelling temporal user behaviour patterns. Additionally, this study introduces the DesignMind ML framework, a machine learning-based personalized UI/UX system that adapts interface design based on individual user behaviour and preferences. Furthermore, the framework features adaptive color and feature simplification mechanisms, allowing for dynamic adjustments of color schemes, contrast levels, and feature visibility to reduce cognitive load and enhance overall accessibility and user experience.

In contrast to existing studies that primarily optimize prediction accuracy for isolated tasks, this work focuses on creating an end-to-end adaptive UI/UX decision-making framework. The proposed DesignMind-ML model integrates instruction-level intent learning, visual interface analysis, and temporal user behaviour modelling to facilitate dynamic and personalized interface adaptation. Consequently, this study emphasizes holistic adaptability and decision relevance rather than merely achieving peak accuracy on a single predictive component (Guo et al., 2024).

Despite significant progress in machine learning–based UI optimization, most existing studies focus on isolated tasks such as color palette generation, layout prediction, or behavioural forecasting. However, real-world UI personalization requires the simultaneous interpretation of visual structure, evolving user interaction behaviour, and actionable adaptation decisions within a unified framework. The absence of such integrated systems limits the practical deployment of adaptive interfaces. Therefore, this study addresses the challenge of designing a comprehensive multimodal decision-making architecture capable of generating behaviour-aware, visually optimized, and context-sensitive UI adaptations in real time (López-Galisteo and Borrás-Gené, 2025).

The remainder of this paper is organized as follows. Section 1 introduces the motivation, background, and major contributions of the proposed DesignMind-ML framework for adaptive UI/UX personalization. Section 2 reviews relevant literature on machine learning–based UI design, color adaptation, behavioural modelling, and multimodal personalization approaches. Section 3 describes the proposed methodology, including the multi-model architecture that integrates ANN, CNN, and GRU–LSTM, along with pre-processing, augmentation strategies, and model fusion mechanisms. Section 4 presents a detailed description of the dataset, including the multimodal data structure, feature sets, and UI/UX personalization attributes used for training and evaluation. Section 5 reports the experimental setup, performance evaluation metrics, comparative analysis, and discussion of results across individual and hybrid models. Finally, Section 6 concludes the paper with key findings, limitations, and future research directions for extending the DesignMind-ML framework toward real-time and generative UI adaptation.

2. Literature review

Recent advances in machine learning have significantly influenced UI/UX design, particularly in personalized and adaptive interfaces. Several research directions are relevant to this work.

Kang et al. (2025) proposed an autoencoder-based method for generating color palettes from mixed color images, focusing on aesthetic combinations. They also reviewed color palette generation techniques from digital images, providing comprehensive guidelines for visual appeal. While these works highlight effective color generation using neural networks, they primarily address static visual aesthetics and do not incorporate user-specific interaction or behaviour-based personalization, leaving a gap in adaptive UI design.

Gao et al. (2025) explored accurate human behaviour simulation using fine-tuned large language models, capturing realistic sequential patterns. Similarly, Newline Research employed transformers for sequential user behaviour modelling, emphasizing the prediction of user actions over time. However, these studies focus on predictive modelling rather than direct integration with adaptive UI adjustments, which limits their practical application in dynamic interface personalization.

Zhang et al. (2024) proposed a deep learning-based interface generation tree algorithm for efficient and aesthetically pleasing UI design. They conducted a systematic review on predicted color output in UI/UX design using machine learning, highlighting the potential of CNNs for analysing visual interface elements. Despite this progress, these approaches do not fully link UI visual analysis with personalized, behaviour-driven adaptation, which is crucial for real-time UI modification.

Recent studies published in MDPI journals between 2024 and 2025 have explored multimodal deep learning frameworks for automated usability evaluation, integrating both visual and interaction data to enhance user interface (UI) analysis. While promising, these systems often lack the integration of multiple learning models to simultaneously analyse interface visuals, user behaviour, and decision instructions for adaptive personalization. Yanez et al. (2025) presented a comprehensive study on user-adaptive visualizations, demonstrating how machine learning techniques can dynamically adjust visual elements based on user interaction history, expertise level, and cognitive load. Their work highlights the importance of adaptive visual representations in improving usability and engagement; however, it primarily focuses on visualization adaptation and does not incorporate temporal user behaviour modelling or decision-level UI personalization.

Alowidi (2025) proposed a multimodal deep learning framework for automated usability evaluation by jointly analysing UI screenshots and user interaction logs. This study shows that combining visual and behavioural data can effectively assess interface usability. Nevertheless, the framework was designed for post-evaluation purposes and does not support continuous, real-time UI adaptation driven by evolving user preferences.

Sreedevi et al. (2025) conducted a comparative analysis of deep learning architectures, including CNNs, RNNs, LSTM, and hybrid models, for user behaviour prediction. Their findings indicate that hybrid sequential models outperform individual architectures in capturing complex interaction patterns. Despite strong predictive performance, the study does not explore how predicted behavioural insights can be translated into actionable UI/UX personalization strategies.

From the reviewed studies, it is evident that machine learning plays a critical role in advancing adaptive UI/UX systems through visual adaptation, multimodal usability analysis, and user behaviour modelling. Existing works demonstrate strong capabilities in visualization adaptation, usability evaluation, or behaviour prediction when considered independently. However, a clear research gap exists in unifying these dimensions into a single framework that simultaneously analyses visual interface characteristics, learns temporal user behaviour, and converts these insights into real-time UI adaptation decisions (Kristić et al., 2025). This observation motivates the proposed DesignMind-ML framework, which integrates visual analysis, behavioural modelling, and instruction-level decision learning to enable comprehensive and intelligent UI/UX personalization.

Moreover, existing studies focus either on color generation, behaviour prediction, or visual analysis independently. Few works combine these dimensions into a unified framework for adaptive and personalized UI/UX. The proposed DesignMind-ML addresses this gap by integrating ANN for instruction mapping, CNN for UI screenshot analysis, and GRU–LSTM for user behaviour modelling. This integration enables dynamic UI adaptation, adaptive color schemes, and feature simplification based on real user interactions—a novel contribution in the field.

A structured analysis of existing literature reveals three dominant research directions: (1) visual aesthetic optimization using CNN-based color or layout generation, (2) sequential user behaviour modelling using RNN, LSTM, or transformer architectures, and (3) multimodal usability evaluation frameworks. However, these approaches are typically evaluated independently and rarely integrated into a unified personalization decision engine. Furthermore, most systems prioritize predictive accuracy over real-time UI adaptation capability. This fragmentation in the literature motivates the development of the proposed DesignMind-ML framework, which unifies visual, behavioural, and instruction-level learning into a single adaptive personalization pipeline.

3. Methodology

3.1 System overview

The proposed DesignMind-ML system utilizes a multi-model adaptive learning architecture that integrates three complementary machine learning models: Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and GRU–LSTM (Paul et al., 2025). Its objective is to generate instruction-level user interface (UI) adaptation decisions by jointly analyzing user interaction behavior, visual interface properties, and temporal usage patterns. As illustrated in Figure 1, the overall workflow begins with the collection of user interaction logs, UI screenshots, and design preference inputs. The ANN generates instructions for color and UI controls, the CNN evaluates the visual layout and aesthetic quality, and the GRU–LSTM models behavior evolution over time. These outputs are combined to produce a final adaptive UI configuration, enabling real-time personalization.

Figure 1. Workflow diagram of DesignMind-ML.


3.2 Data preprocessing

3.2.1 Interaction extraction

Each dataset sample represents a complete user session, which includes click events, navigation paths, scrolling actions, feature usage, and color selections. Only valid and complete sessions were retained; corrupted, incomplete, or noisy logs were removed to preserve data integrity.

3.2.2 Sequence normalization

User sessions vary significantly in length. To facilitate batch-based deep learning, short interaction sequences were padded with zeros, while long sequences were uniformly down-sampled. This preprocessing step generated fixed-length tensors, ensuring compatibility with the GRU–LSTM network.
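As an illustration, the padding and down-sampling step can be sketched as follows. This is a minimal NumPy sketch under our own assumptions; the function name `normalize_length` and the evenly spaced index selection are illustrative, not the paper's actual implementation.

```python
import numpy as np

def normalize_length(seq, target_len):
    """Pad short sequences with zeros; uniformly down-sample long ones."""
    seq = np.asarray(seq, dtype=float)
    n = len(seq)
    if n < target_len:
        # zero-pad at the end to reach the fixed length
        pad = np.zeros((target_len - n,) + seq.shape[1:])
        return np.concatenate([seq, pad], axis=0)
    # pick target_len evenly spaced time steps (uniform down-sampling)
    idx = np.linspace(0, n - 1, target_len).round().astype(int)
    return seq[idx]
```

Applying this to every session yields equal-length tensors suitable for mini-batch training of the GRU–LSTM network.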

3.2.3 Feature normalization

To reduce scale imbalance and inter-user variability, min-max scaling was applied to numeric interaction features, while user-baseline normalization was used to adjust for individual interaction styles. These preprocessing steps improved model convergence and enhanced generalization across diverse user groups.
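The two normalization steps might look like the following minimal sketch (the function names and the mean-subtraction form of user-baseline normalization are our assumptions; the paper does not specify the exact formulation):

```python
import numpy as np

def min_max_scale(x):
    """Rescale a numeric feature column to the [0, 1] range."""
    x = np.asarray(x, dtype=float)
    rng = x.max() - x.min()
    return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)

def user_baseline_normalize(x, user_mean):
    """Express each value relative to that user's own baseline mean,
    reducing inter-user differences in interaction style."""
    return np.asarray(x, dtype=float) - user_mean
```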

3.2.4 Noise removal

Missing or inconsistent interaction values were corrected using linear interpolation to estimate missing time steps, along with moving average filtering to smooth out abrupt fluctuations. These preprocessing techniques resulted in cleaner and more reliable behavioral signals for analysis.
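A possible implementation of the two cleaning operations, assuming missing time steps are marked as NaN (the window size of 3 and the function names are illustrative choices):

```python
import numpy as np

def fill_missing(x):
    """Linearly interpolate NaN time steps from neighboring values."""
    x = np.array(x, dtype=float)  # copy so the input is not mutated
    t = np.arange(len(x))
    mask = np.isnan(x)
    x[mask] = np.interp(t[mask], t[~mask], x[~mask])
    return x

def moving_average(x, window=3):
    """Smooth abrupt fluctuations with a simple moving-average filter."""
    kernel = np.ones(window) / window
    return np.convolve(x, kernel, mode="same")
```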

3.3 Data augmentation

To enhance model robustness and minimize overfitting, several augmentation strategies were implemented.

3.3.1 Temporal augmentation

User behavior sequences were enhanced through random time warping, along with event duplication and removal. These techniques simulate natural variations in user speed and interaction style, thereby improving the model’s robustness.
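Time warping and event removal could be sketched as below. This is a simplified illustration; the warp-factor range, drop probability, and resampling-based warp are our assumptions rather than the paper's exact augmentation parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def time_warp(seq, factor_range=(0.8, 1.25)):
    """Randomly stretch or compress a sequence in time via resampling,
    simulating faster or slower user interaction speeds."""
    seq = np.asarray(seq, dtype=float)
    factor = rng.uniform(*factor_range)
    new_len = max(1, int(round(len(seq) * factor)))
    idx = np.linspace(0, len(seq) - 1, new_len)
    return np.interp(idx, np.arange(len(seq)), seq)

def drop_events(seq, p=0.1):
    """Randomly remove a fraction of events to simulate skipped actions."""
    seq = np.asarray(seq, dtype=float)
    keep = rng.random(len(seq)) > p
    keep[0] = True  # always retain at least the first event
    return seq[keep]
```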

3.3.2 Visual augmentation

UI screenshots were enhanced through random adjustments in brightness and contrast, color jittering, and minor cropping and scaling. These transformations increase the CNN’s robustness to variations in themes and layout designs.
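For screenshots represented as float arrays in [0, 1], these transformations might be sketched as follows (the jitter ranges and crop margin are illustrative values we chose, not the paper's settings):

```python
import numpy as np

rng = np.random.default_rng(42)

def jitter_brightness_contrast(img, b_range=0.2, c_range=0.2):
    """Randomly perturb brightness (additive) and contrast (multiplicative)."""
    img = np.asarray(img, dtype=float)
    brightness = rng.uniform(-b_range, b_range)
    contrast = 1.0 + rng.uniform(-c_range, c_range)
    out = (img - img.mean()) * contrast + img.mean() + brightness
    return np.clip(out, 0.0, 1.0)  # keep pixel values in [0, 1]

def random_crop(img, crop=4):
    """Crop a small random margin, simulating minor framing variation."""
    h, w = img.shape[:2]
    top = rng.integers(0, crop + 1)
    left = rng.integers(0, crop + 1)
    return img[top:h - (crop - top), left:w - (crop - left)]
```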

3.3.3 Behavioral sequence shuffling

Partial reordering of the navigation steps was implemented to simulate alternative user exploration paths, thereby enhancing the generalization of sequential behavior learning.
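One simple way to realize partial reordering is swapping a few adjacent steps, as in this sketch (the number of swaps and the adjacent-swap strategy are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

def partial_shuffle(steps, n_swaps=2):
    """Swap a few adjacent navigation steps to simulate an alternative
    exploration path while preserving the overall set of actions."""
    steps = list(steps)
    for _ in range(n_swaps):
        i = rng.integers(0, len(steps) - 1)
        steps[i], steps[i + 1] = steps[i + 1], steps[i]
    return steps
```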

3.4 ANN-Based instruction learning

The ANN learns direct mappings between user interaction features and UI adaptation decisions by modeling trends in color preferences, the frequency of feature usage, and the relationship between interaction density and UI complexity. Based on these learned patterns, the ANN generates instruction signals that control UI elements such as color themes, brightness, contrast, font size, and feature visibility, forming the system’s decision backbone.

3.5 CNN-based UI screenshot analysis

The CNN analyzes the visual structure and aesthetics of UI screenshots by learning color harmony, contrast quality, layout density, and clutter, as well as the spatial organization of UI components. These visual features allow the system to identify issues such as poor contrast, overcrowding, or visual imbalance, ensuring that UI personalization is both functional and aesthetically optimized.

3.6 GRU–LSTM behavior modeling

The hybrid GRU–LSTM network captures temporal user behavior by modeling short-term interaction changes with the GRU and long-term preference evolution with the LSTM. This approach enables the system to identify habit formation, prioritize features, and adapt to evolving usability needs, facilitating continuous and intelligent UI personalization.

3.7 Model fusion strategy

The final UI adaptation decision is made through decision-level fusion of the three models. The ANN generates instruction signals, the CNN provides visual quality scores, and the GRU–LSTM supplies behavioral preference weights. These outputs are integrated using a weighted mechanism to produce the final UI control commands. This fusion ensures that UI personalization is behavior-aware, visually optimized, and context-sensitive, making DesignMind-ML a truly intelligent and adaptive UI/UX framework. Let A, C, and G represent the normalized output scores from the ANN, CNN, and GRU–LSTM models, respectively. The final UI adaptation decision D is computed using weighted fusion (Fucs et al., 2020):

D = αA + βC + γG

where α + β + γ = 1. The weights were empirically optimized through validation experiments to balance instruction confidence, visual quality assessment, and behavioral consistency. This weighted integration ensures robust decision-level personalization under multimodal uncertainty.
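The weighted decision-level fusion can be sketched in a few lines. The weights below (0.4/0.3/0.3) are purely illustrative; the paper states they were optimized empirically on the validation set, and the function name `fuse_decisions` is our own.

```python
import numpy as np

def fuse_decisions(a, c, g, alpha=0.4, beta=0.3, gamma=0.3):
    """Decision-level weighted fusion: D = alpha*A + beta*C + gamma*G,
    where a, c, g are normalized per-class scores from ANN, CNN, GRU-LSTM."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9, "weights must sum to 1"
    a, c, g = (np.asarray(x, dtype=float) for x in (a, c, g))
    d = alpha * a + beta * c + gamma * g
    # return the chosen adaptation class and the fused score vector
    return int(np.argmax(d)), d
```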

4. Dataset description

4.1 Overview of the dataset

The DesignMind-ML framework was trained and evaluated using a self-constructed multimodal UI/UX personalization dataset comprising 3,000 samples, each representing a complete user interaction session. This dataset integrates visual interface data, user behavior logs, and individual color preference information (Figure 2). It was collected directly from participants actively using the proposed system under real interaction conditions. The authors curated and organized all sessions to create a unified benchmark for this study.

The dataset was designed to support three complementary learning tasks: color and instruction learning, visual UI analysis, and temporal user behavior modeling. This multimodal structure enables the system to jointly analyze how users interact with interfaces, perceive visual layouts, and respond to color schemes, facilitating accurate and adaptive UI personalization. As shown in Figure 1 and listed in Table 1, the dataset allows the system to map complex relationships between a user’s physical constraints and preferred UI settings, generating the “Instruction Signals” necessary for real-time interface adaptation.

Figure 2. UI personalization dataset of 3,000 samples.


The dataset comprises 3,000 complete user interaction sessions collected from voluntary participants under controlled experimental conditions. Each session represents a unique UI interaction instance. The dataset was divided into training (70%), validation (15%), and testing (15%) subsets. Class distributions were monitored to ensure balanced representation across adaptation categories. While the dataset size was moderate, it allows for preliminary validation of multimodal personalization and underscores the necessity for larger-scale real-world deployment studies.
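The 70/15/15 split described above can be reproduced with a simple shuffled index split (a minimal sketch; the seed and function name are our assumptions, and the paper additionally monitored class balance, which this sketch omits):

```python
import numpy as np

def split_indices(n, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split into train/validation/test subsets."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train = int(n * train)
    n_val = int(n * val)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]
```

For the 3,000-session dataset this yields 2,100 training, 450 validation, and 450 test sessions.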

4.2 ANN dataset for color and instruction learning

The ANN dataset was designed to learn the relationship between visual design parameters and user preference responses. Each of the 3,000 samples includes UI color palettes, brightness and contrast levels, theme type (light or dark), readability indicators, and user preference labels. These features enable the ANN to model how different color combinations influence usability, comfort, and accessibility. Based on this learning, the network outputs instruction signals that guide real-time UI changes, such as color adaptation, contrast adjustment, and the activation or suppression of features. This allows the ANN to capture decision-level mappings between user preferences and UI modifications (Zhang et al., 2026).

4.3 CNN dataset for UI screenshot analysis

The CNN was trained on a labeled dataset of UI screenshots collected from various application types, with each screenshot annotated for layout type, visual complexity level, color harmony, component density, and contrast quality. The dataset encompasses a range of interface styles, from cluttered layouts to minimal designs and accessibility-optimized views. The CNN automatically extracts spatial and visual features, including alignment, color distribution, grouping of UI components, and visual balance. This capability enables the system to identify poorly designed or visually overloaded interfaces and facilitates intelligent UI refinement (Paul et al., 2022).

4.4 GRU–LSTM dataset for user behavior modeling

To model the evolution of user preferences over time, the GRU–LSTM module utilizes time-stamped interaction logs. Each behavioral sample includes click sequences, navigation paths, scrolling behavior, feature usage frequency, and session duration (Paul et al., 2023). The hybrid GRU–LSTM structure enables the system to capture short-term interaction patterns through the GRU while identifying long-term usage trends with the LSTM. This approach facilitates accurate modeling of habit formation, feature prioritization, and evolving user needs, which are essential for adaptive UI personalization.

4.5 UI/UX personalization feature set

To translate learned patterns into actionable UI changes, the dataset incorporates a structured UI/UX feature set (Table 1). These features represent the controllable elements of the interface used for personalization (Barrett et al., 2024).

Table 1. Selected UI/UX personalization features.

Feature ID UI/UX personalization feature
F01 Adaptive color theme adjustment
F02 Brightness and contrast optimization
F03 Dark mode / light mode switching
F04 Font size and readability adjustment
F05 Simplified navigation menu
F06 Feature usage–based interface pruning
F07 Highlighting frequently used functions
F08 Hiding rarely used features
F09 Personalized layout arrangement
F10 Accessibility-focused UI adaptation

These features enable DesignMind-ML to dynamically adjust both the visual appearance and functional complexity of interfaces, ensuring that each interface adapts to the user’s behavioral patterns, visual comfort, and accessibility needs. Figure 3 illustrates the DesignMind-ML Pro system interface, which offers a machine learning–driven environment for adaptive UI/UX design. Users specify the application type, target age group, and the number of desired color palettes, then upload a UI screenshot for visual analysis by the CNN model. Based on this input, the system generates multiple personalized UI color variations by assigning optimized colors to components such as the header, background, call-to-action buttons, footer, and accent elements, using the ANN-based color adaptation model (Shokrizadeh et al., 2025). A feedback module collects user responses to these designs, and this information is processed by the GRU–LSTM model to continuously refine and improve future personalization decisions.

Figure 3. Sample implementation of DesignMind ML.


5. Results and Discussion

5.1 Experimental setup

The experimental evaluation of the DesignMind-ML framework focused on three key learning components: the ANN-based interaction model, the CNN-based UI screenshot analysis model, and the GRU–LSTM-based user behavior model (He et al., 2025). The complete dataset, consisting of 3,000 samples, was divided into training, validation, and testing subsets to ensure an unbiased performance assessment. Each model was trained independently before being integrated into a hybrid system. The performance of each model and the final system was evaluated using accuracy, ROC curves, and confusion matrices to assess both classification reliability and personalization quality (Stefano et al., 2024).

5.2 Evaluation metrics

To evaluate both the individual models and the hybrid system, the following metrics were used (Paul et al., 2025). Accuracy reflects the overall proportion of correct predictions made by the model across all instances. Precision measures the reliability of the predicted UI adaptation decisions by determining the proportion of predicted positive cases that are actually correct. Recall evaluates the model’s ability to identify true user preferences by measuring the number of relevant instances that were correctly captured. The F1-score, defined as the harmonic mean of precision and recall, provides a balanced evaluation when both false positives and false negatives are important considerations. The ROC–AUC metric assesses the model’s ability to discriminate between classes, indicating how effectively the classifier separates positive and negative cases and reflecting the overall robustness of the classification performance.
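For the binary case, the metrics listed above reduce to a few counting operations; the following is a minimal self-contained sketch (the function name `binary_metrics` is ours, and in practice library implementations such as those in scikit-learn would typically be used):

```python
import numpy as np

def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 from binary labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
    fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
    fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
    accuracy = np.mean(y_true == y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```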

These metrics ensure a comprehensive evaluation of both personalization accuracy and decision reliability. The proposed system was assessed using the ANN-based interaction model, the CNN-based UI screenshot analysis model, and the GRU+LSTM behavioral observation model. The quantitative performance of these models, along with the final hybrid system, is summarized in Tables 2–5.

Table 2. Classification report of ANN model.

Class Precision Recall F1-score Support
0 1.000 1.000 1.000 52
1 1.000 1.000 1.000 149
2 1.000 1.000 1.000 159
3 1.000 1.000 1.000 153
4 1.000 1.000 1.000 143
5 1.000 1.000 1.000 146
6 1.000 1.000 1.000 155
7 1.000 1.000 1.000 146
8 1.000 1.000 1.000 150
Accuracy 1.000 1253
Macro avg 1.000 1.000 1.000 1253
Weighted avg 1.000 1.000 1.000 1253

Table 3. Brand-specific classification report of CNN model.

Class Precision Recall F1-score Support
text 1.00 0.07 0.14 94
image 0.29 1.00 0.46 91
rectangle 1.00 0.19 0.32 42
group 0.80 0.35 0.49 173
Accuracy 0.42 400
Macro avg 0.77 0.40 0.35 400
Weighted avg 0.75 0.42 0.38 400

Table 4. Classification report of GRU + LSTM model.

Class Precision Recall F1-score Support
0 0.981 0.986 0.984 212
1 0.981 0.981 0.981 156
2 0.961 1.000 0.980 74
3 0.985 0.992 0.989 133
4 0.995 0.973 0.984 225
Accuracy 0.984 800
Macro avg 0.981 0.986 0.984 800
Weighted avg 0.984 0.984 0.984 800

Table 5. Classification report of final model.

Class Precision Recall F1-score Support
0 0.64 0.36 0.80 50
1 0.54 0.80 0.72 50
Accuracy 0.78 100
Macro avg 0.60 0.58 0.78 100
Weighted avg 0.60 0.58 0.78 100

5.3 Performance of the ANN model

The ANN model learns user interaction preferences and generates instruction-level UI adaptation decisions, including color selection, contrast control, and feature prioritization. Figure 4 illustrates the training history of the ANN model, demonstrating stable convergence and increasing accuracy over epochs.

Figure 4. Training and validation curve for ANN model.


Some misclassification was observed among users with similar feature usage behavior, which was expected due to overlapping preference patterns. The ROC curves of the ANN model presented in Figure 5 demonstrate near-perfect class separation, with AUC values approaching 1.00. This confirms that the ANN effectively distinguishes between user interaction categories for adaptive UI instruction generation.

Figure 5. ROC curve of ANN model.


The ANN achieved 100% accuracy on the test set, suggesting that the interaction-based features are highly separable for UI personalization on this dataset. The confusion matrix of the ANN model, presented in Figure 6, exhibits strong diagonal dominance, indicating that most user interaction patterns were accurately classified.

Figure 6. Confusion matrix of the ANN model.


5.4 Performance of the CNN model

The CNN model analyzes UI screenshots and learns visual features such as layout structure, color harmony, contrast quality, and visual complexity. Figure 7 illustrates the training behavior of the CNN, showing that both training and validation accuracy steadily improve, indicating effective learning of visual patterns.

Figure 7. Training and validation curve for CNN model.


The ROC curves in Figure 8 further confirm the CNN’s ability to distinguish between UI layouts and visual styles, demonstrating high AUC values across most classes. These results validate the CNN’s effectiveness in extracting spatial and aesthetic features for adaptive UI refinement.

Figure 8. ROC curve of CNN model.


The confusion matrix in Figure 9 shows that most UI screenshots were accurately classified based on their visual and layout characteristics. However, there was some minor confusion between visually similar designs that have similar color schemes or layout densities.

Figure 9. Confusion matrix of the CNN model.


5.5 Performance of the GRU–LSTM model

The GRU–LSTM model effectively captures the evolution of sequential user behavior and preferences over time. The training history presented in Figure 10 demonstrates stable learning and strong convergence.

Figure 10. Training and validation curve for GRU+LSTM model.


The confusion matrix in Figure 11 shows that most interaction sequences are correctly classified, with only a few misclassifications occurring primarily in sessions that exhibit very similar navigation flows.

Figure 11. Confusion matrix of the GRU+LSTM model.


The ROC curves presented in Figure 12 show AUC values close to 1.0, indicating a high level of separability among different behavior classes. This confirms that temporal modeling significantly enhances the system’s ability to understand evolving user preferences and interaction habits.

Figure 12. ROC curve of GRU+LSTM model.


5.6 Performance of the hybrid DesignMind-ML system

The final DesignMind-ML system integrates the outputs of the ANN, CNN, and GRU–LSTM models to generate behavior-aware, visually optimized UI adaptation instructions. The ROC curve of the final model, presented in Figure 13, indicates a very high true positive rate alongside a low false positive rate, confirming its strong overall predictive performance. The integrated system benefits from multimodal learning, as interaction data, visual features, and temporal behavior patterns collectively contribute to more precise and stable personalization.
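
The weighted decision-level integration can be sketched as a weighted average of each model's class-probability vector followed by an argmax. The weights below are illustrative values, not the tuned weights of the deployed system.

```python
import numpy as np

def fuse_predictions(p_ann, p_cnn, p_rnn, weights=(0.4, 0.2, 0.4)):
    """Weighted decision-level fusion of per-model class probabilities.
    Each input is a distribution over the same UI-adaptation classes;
    the weights are illustrative, not the paper's tuned values."""
    probs = np.stack([p_ann, p_cnn, p_rnn])   # shape (3, n_classes)
    w = np.asarray(weights)[:, None]          # shape (3, 1)
    fused = (w * probs).sum(axis=0)
    fused /= fused.sum()                      # renormalize
    return int(np.argmax(fused)), fused

# Example: the CNN disagrees, but the behavioral models dominate
cls, fused = fuse_predictions(
    np.array([0.1, 0.7, 0.2]),   # ANN: instruction-level decision
    np.array([0.5, 0.3, 0.2]),   # CNN: visual layout evidence
    np.array([0.2, 0.6, 0.2]),   # GRU-LSTM: temporal behavior
)
```

This kind of fusion is why the hybrid system is more stable than any single model: one noisy modality can be outvoted by the other two.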

Figure 13. ROC curve of the combined model.

The confusion matrix of the hybrid system, shown in Figure 14, demonstrates significantly higher classification accuracy than any individual model. Most predictions fall along the diagonal, indicating a reliable identification of user preferences and appropriate UI configurations.

Figure 14. Confusion matrix of the hybrid model.

5.7 Performance discussion

The experimental results demonstrate that while the ANN and CNN models individually provide useful insights into user interaction and UI visuals, the GRU–LSTM model was crucial for capturing long-term behavior trends. The highest performance, however, was achieved when all three models were combined within the DesignMind-ML hybrid framework. Integrating behavioral, visual, and interaction-based information allows the system to reduce ambiguity, improve decision reliability, and produce more meaningful UI adaptations. Consequently, the proposed approach enhances usability, reduces cognitive load, and increases overall user satisfaction in personalized UI/UX environments.

Figures 4, 7, and 10 present the training and validation curves for the ANN, CNN, and GRU–LSTM models used in the DesignMind-ML framework. These curves show that each model's performance improves steadily as training progresses. The ANN model exhibits stable convergence while learning user interaction preferences. The CNN model also learns steadily while extracting visual features from UI screenshots, although slight fluctuations occur due to the diversity of interface designs. Similarly, the GRU–LSTM model shows consistent improvement in capturing sequential user behavior patterns. Overall, the close agreement between the training and validation curves indicates that the models learn effectively and generalize well without significant overfitting.

Figures 5, 8, 12, and 13 illustrate the ROC curves for the ANN, CNN, GRU–LSTM, and the final hybrid model. These ROC curves assess how well each model distinguishes between different classes of user preferences and UI adaptation decisions. The ANN and GRU–LSTM models show strong separation between classes, indicating reliable prediction of user interaction and behavioral patterns. The CNN model also demonstrates the ability to identify visual interface characteristics, though its performance varies slightly due to complex UI layouts. The ROC curve for the combined model confirms that integrating multiple learning models enhances the overall robustness of the system for adaptive UI decision-making.

Figures 6, 9, 11, and 14 present the confusion matrices for the ANN, CNN, GRU–LSTM, and the final hybrid model. These matrices illustrate the accuracy of each model in classifying different categories of user preferences, UI components, and behavioral patterns. Most predictions are concentrated along the diagonal, indicating correct classifications in the majority of cases. Minor misclassifications primarily occur between visually similar interface elements or closely related interaction behaviors. Overall, the confusion matrices confirm that the proposed system reliably identifies user preferences and generates appropriate UI adaptation decisions.

The exceptionally high accuracy observed in the ANN component suggests that the interaction-based instruction mapping task may be less complex than multimodal fusion. However, the comparatively lower accuracy of the CNN (42%) indicates challenges in generalizing visual layout classification across diverse UI styles. When integrated into the hybrid framework, overall performance decreased to 76%, reflecting the increased complexity of addressing a multi-objective personalization problem. This result highlights that real-world adaptive UI systems inherently involve greater uncertainty compared to isolated predictive models.

Table 6. Performance summary across models

| Model | Purpose | Performance highlights |
| --- | --- | --- |
| ANN | Learns user interaction preferences | Generated adaptive color palettes accurately; captured feature usage trends; achieved high personalization accuracy |
| CNN | Analyzes UI screenshots | Extracted layout structure, color harmony, and visual complexity; improved visual adaptation decisions |
| GRU + LSTM | Models user behavior over time | Captured short-term and long-term interaction patterns; robust against variable-length sessions |
| Hybrid system | Combines ANN, CNN, and GRU–LSTM outputs | Achieved high overall accuracy; ROC curve shows strong true positive rate and low false positive rate; improved usability and user satisfaction |
| Overall | Adaptive UI/UX personalization | Dynamic UI adjustments reduced cognitive load, improved task efficiency, and enhanced user experience |

Table 6 presents a comparative performance overview of the proposed models within the adaptive UI/UX personalization framework. The ANN effectively learned user interaction preferences and generated adaptive color palettes with high personalization accuracy. The CNN successfully analyzed UI screenshots by extracting layout structure, color harmony, and visual complexity, thereby enhancing visual adaptation decisions. The GRU–LSTM model captured both short-term and long-term user behavior patterns, demonstrating robustness across variable-length sessions. The hybrid system, which integrates ANN, CNN, and GRU–LSTM outputs, achieved the highest overall accuracy, with strong ROC characteristics indicating a high True Positive Rate and a low False Positive Rate. Overall, the integrated framework enabled dynamic UI adjustments that reduced cognitive load, improved task efficiency, and enhanced the overall user experience.

The results of this study demonstrate that the proposed DesignMind-ML framework can effectively support adaptive UI/UX personalization by combining behavioral, visual, and interaction-based learning models. The strong performance of the GRU–LSTM model in capturing temporal user behavior aligns with previous studies that have reported the effectiveness of recurrent architectures for modeling sequential interaction patterns and predicting user activity. Similarly, the CNN model exhibited the capability to analyze visual interface structures, such as layout complexity and color distribution, consistent with earlier research emphasizing the importance of convolutional networks for UI visual feature extraction and interface evaluation. The integration of multimodal learning in the proposed framework also supports findings from recent studies that stress the value of combining behavioral data with visual interface analysis to improve adaptive interface systems. However, unlike many previous works that focus on single tasks such as color generation, usability evaluation, or behavior prediction, this study proposes an integrated decision-making architecture that simultaneously considers visual structure, user interaction behavior, and the evolution of temporal preferences. Although the overall system accuracy was lower than that of some single-task models reported in the literature, the proposed approach addresses the more complex problem of real-time UI personalization. These findings suggest that multimodal frameworks can provide a more realistic and practical foundation for adaptive UI/UX systems while highlighting the need for larger datasets and improved multimodal fusion strategies in future research.

5.8 Performance summary across models

Table 7 compares the proposed DesignMind-ML framework with existing approaches reported in the literature. Previous studies utilizing ANN, GRU–LSTM, and multimodal deep learning frameworks achieved performance levels ranging from 90% to 92%. In contrast, the proposed hybrid model (ANN + CNN + GRU–LSTM) attained an overall accuracy of 76%. Although this performance was comparatively lower, DesignMind-ML emphasizes comprehensive adaptive UI/UX personalization by integrating user preference learning, visual layout analysis, and temporal behavior modeling within a unified framework. This focus underscores the system’s broader applicability and real-world adaptability beyond mere benchmark accuracy comparisons.

While the overall accuracy of DesignMind-ML was lower than that of some task-specific models, this outcome reflects the increased complexity of addressing a multi-stage, multi-modal personalization problem. Unlike previous approaches that evaluate a single learning objective, the proposed framework jointly processes behavioral sequences, visual interface features, and user instructions, which introduces the realistic decision uncertainty inherent in adaptive UI systems.

Table 7. Comparative performance with existing methods.

| Citation | Primary method | Performance (%) |
| --- | --- | --- |
| Kristić et al. (2025) | ANN (artificial neural network) | 90 |
| Du et al. (2024) | GRU + LSTM | 91 |
| Alowidi (2025) | Multimodal deep learning framework | 92 |
| This study | ANN + CNN + GRU–LSTM (DesignMind-ML) | 76 |

6. Conclusions

This study developed a multimodal machine learning framework for adaptive UI and UX personalization that integrates artificial neural networks, convolutional neural networks, and GRU–LSTM models to analyze user interaction behavior, visual interface characteristics, and temporal usage patterns. The findings demonstrate that combining behavioral analysis, visual interface evaluation, and instruction-level learning enables more effective and context-aware interface adaptation than single-model approaches. The proposed framework supports intelligent UI adjustments, such as adaptive color selection and feature simplification, which can improve usability, reduce cognitive load, and enhance the overall user experience in digital systems. The results confirm that multimodal learning can provide a practical foundation for behavior-driven interface personalization. However, several challenges remain, including limited dataset diversity and the need for more advanced fusion strategies to strengthen cross-modal learning. Future studies should explore larger and more diverse user datasets, real-time deployment environments, and the integration of generative interface design models to enable fully automated and scalable adaptive UI systems.

Acknowledgements

We extend our sincere gratitude to the Information and Communication Technology Division of the Ministry of Posts, Telecommunications, and Information Technology of the People’s Republic of Bangladesh for their invaluable support and funding of our ICT fellowship. We would also like to thank our supervisor and co-authors for their guidance and contributions to this research.

Funding information

This research was supported by the University of Rajshahi under Grant Number 56.00.0000.052.33.005.21-7 (Tracking No: 22FS15306).

Ethical approval statement

Not applicable.

Data availability statement

The data generated in this study may be shared upon reasonable request to the corresponding author.

Informed consent statement

Not applicable.

Conflict of interest

The authors declare no conflict of interest.

Author contributions

Conceptualization: Subrata Kumer Paul and Md. Ekramul Hamid; Research design and methodology: Subrata Kumer Paul, Rakhi Rani Paul, and Sunjida Akter Shanu; Data collection and dataset preparation: Maria Islam and Nahian Fairuz; Model implementation, experimentation, and performance analysis: Subrata Kumer Paul, Mahfuzur Rahman, and Rakhi Rani Paul; Visualization, result interpretation, and figure preparation: Sunjida Akter Shanu and Maria Islam. All authors critically reviewed the manuscript and agreed to submit the final version.

References

Alowidi N, 2025. Multimodal deep learning framework for automated usability evaluation of fashion e-commerce sites. Journal of Theoretical and Applied Electronic Commerce Research, 20(4): 343. https://doi.org/10.3390/jtaer20040343

Barrett S, Begg S, Lawrence J, Barrett G, Nitschke J, O’Halloran P, Breckon J, Pinheiro MDB, Sherrington C, Doran C and Kingsley M, 2024. Behaviour change interventions to improve physical activity in adults: a systematic review of economic evaluations. International Journal of Behavioral Nutrition and Physical Activity, 21: 73. https://doi.org/10.1186/s12966-024-01614-6

Du S, Li T, Gong X and Horng SJ, 2018. A hybrid method for traffic flow forecasting using multimodal deep learning. International Journal of Computational Intelligence Systems, 13: 85–97. https://doi.org/10.2991/ijcis.d.200120.001

Fucs A, Juliana JF, Segura VCVB, de Paulo B, De Paula RA and Cerqueira R, 2020. Sketch-based video storytelling for UX validation in AI design for applied research. CHI EA ’20: Extended Abstracts of the 2020 CHI Conference on Human Factors in Computing Systems, 1-8. https://doi.org/10.1145/3334480.3375221

Gao Y, Song XN, Zhang N, Liu HH, Hu JZ, Du XZ, Song GH and Liu S, 2025. Exploring the diagnostic potential of IL1R1 in depression and its association with lipid metabolism. Frontiers in Pharmacology, 16: 1519287. https://doi.org/10.3389/fphar.2025.1519287

Guo L, Li Z, Qian K, Ding W and Chen Z, 2024. Bank credit risk early warning model based on machine learning decision trees. Journal of Economic Theory and Business Management, 1(3): 24–30. https://doi.org/10.5281/zenodo.11627011

He T, Stanković A, Niforatos E and Kortuem G, 2025. DesignMinds: Enhancing video-based design ideation with a vision-language model and a context-injected large language model. Proceedings of the 7th ACM Conference on Conversational User Interfaces (CUI ’25), 1–15. https://doi.org/10.1145/3719160.3736633

Kang Y, Wang C, Feng Y, Touya G and Kim J, 2025. Artificial intelligence for cartography and maps. In: GeoAI and Human Geography: The Dawn of a New Spatial Intelligence Era. Cham: Springer Nature Switzerland, pp. 219–237. https://doi.org/10.1007/978-3-031-87421-5_16

Kristić M, Zakarija I, Škopljanac-Mačina F and Car Ž, 2025. Machine learning for adaptive accessible user interfaces: Overview and applications. Applied Sciences, 15(23): 12538. https://doi.org/10.3390/app152312538

Li W, Zhou Y, Luo S and Dong Y, 2022. Design factors to improve the consistency and sustainable user experience of responsive interface design. Sustainability, 14(15): 9131. https://doi.org/10.3390/su14159131

Liu Y, Tan H, Cao G and Xu Y, 2024. Enhancing user engagement through adaptive UI/UX design: A study on personalized mobile app interfaces. Computer Science & IT Research Journal, 5(8): 1942–1962. https://doi.org/10.51594/csitrj.v5i8.1457

López-Galisteo AJ and Borrás-Gené O, 2025. The creation and evaluation of an AI assistant (GPT) for educational experience design. Information, 16(2): 117. https://doi.org/10.3390/info16020117

Paul RR, Paul SK and Hamid ME, 2022. A 2D convolution neural network based method for human emotion classification from speech signal. 25th International Conference on Computer and Information Technology (ICCIT), 72-77. https://doi.org/10.1109/ICCIT57492.2022.10054811

Paul SK, Miah ASM, Rahman MT, Hossain MM, Hossain MM, Rahim MA, Hamid MEH, Islam MS and Shin J, 2025. IoT-based real-time medical-related human activity recognition using skeletons and multi-stage deep learning for healthcare. Computers, Materials and Continua, 84(2): 2513–2530. https://doi.org/10.32604/cmc.2025.063563

Paul SK, Walid MA, Paul RR, Uddin MJ, Rana MS, Devnath MK, Dipu IR and Haque MM, 2024. An Adam-based CNN and LSTM approach for sign language recognition in real time for deaf people. Bulletin of Electrical Engineering and Informatics, 13: 499–509. https://doi.org/10.11591/eei.v13i1.6059

Paul SK, Zisa AA, Walid MAA, Zeem Y, Paul RR and Haque MM, 2023. Human fall detection system using long-term recurrent convolutional networks for next-generation healthcare: A study of human motion recognition. 14th International Conference on Computing Communication and Networking Technologies (ICCCNT), 1-7. https://doi.org/10.1109/ICCCNT56998.2023.10308247

Shokrizadeh A, Tadjuidje BB, Kumar S, Kamble S and Cheng J, 2025. Dancing with chains: Ideating under constraints with UIDEC in UI/UX design. Proceedings of the CHI Conference on Human Factors in Computing Systems (CHI 2025), 1106: 1-23. https://doi.org/10.1145/3706598.3713785

Sreedevi S, Rajashekar A, Basha SA and Prathibha L, 2025. User behavior prediction with deep learning: An evaluation of CNN, LSTM, RNN, and hybrid models. Proceedings of the 3rd International Conference on Integrated Circuits and Communication Systems, 1–7. https://doi.org/10.1109/icicacs65178.2025.10967848

Stefano DM, Rob M and Elif O, 2024. Soundstorm, a collaborative ideation game for sound-driven design. Proceedings of the 19th International Audio Mostly Conference: Explorations in Sonic Cultures, 479–486. https://doi.org/10.1145/3678299.3678348

Yanez F and Nobre C, 2024. User-adaptive visualizations: An exploration with GPT-4. MLVis: Machine Learning Methods in Visualisation for Big Data, 1–5. https://doi.org/10.2312/mlvis.20241126

Zhan X, Xu Y and Liu Y, 2024. Personalized UI layout generation using deep learning: An adaptive interface design approach for enhanced user experience. Journal of Artificial Intelligence General Science, 6: 463–478. https://doi.org/10.60087/jaigs.v6i1.270

Zhang XY, Lin H, Deng Z, Siegel M, Miller EK and Yan G, 2026. Data-driven ANN-based visual decoding enables unsupervised functional alignment. Communications Biology, 9: 210. https://doi.org/10.1038/s42003-025-09486-7
