Personalized UI/UX design has become increasingly important for modern digital systems, enhancing usability, accessibility, and user satisfaction across diverse user groups (Liu et al., 2024). Traditional static user interface designs often neglect individual differences in user preferences, visual comfort, and interaction behaviour, resulting in cognitive overload and reduced engagement. These challenges are particularly evident in complex applications where users interact with multiple features under varying conditions. To address these limitations, machine learning–based UI personalization has gained significant attention, as it allows interfaces to adapt dynamically based on user behaviour and preferences (Zhan et al., 2024). By analysing interaction patterns and visual interface properties, adaptive systems can modify visual elements and interface structures to produce user-centred designs suited to individual users. This paper introduces DesignMind-ML, a machine learning–driven framework that personalizes UI/UX by adapting color schemes and simplifying interface features. The system continuously learns from user interactions and visual UI data, improving usability, accessibility, and user satisfaction over time (Li et al., 2022).
The major contributions of this work include the development of a multi-model learning architecture that integrates artificial neural networks (ANN) for instruction processing and decision-making, convolutional neural networks (CNN) for UI screenshot and visual analysis, and gated recurrent units (GRU) with long short-term memory (LSTM) for modelling temporal user behaviour patterns. Additionally, this study introduces the DesignMind ML framework, a machine learning-based personalized UI/UX system that adapts interface design based on individual user behaviour and preferences. Furthermore, the framework features adaptive color and feature simplification mechanisms, allowing for dynamic adjustments of color schemes, contrast levels, and feature visibility to reduce cognitive load and enhance overall accessibility and user experience.
In contrast to existing studies that primarily optimize prediction accuracy for isolated tasks, this work focuses on creating an end-to-end adaptive UI/UX decision-making framework. The proposed DesignMind-ML model integrates instruction-level intent learning, visual interface analysis, and temporal user behaviour modelling to facilitate dynamic and personalized interface adaptation. Consequently, this study emphasizes holistic adaptability and decision relevance rather than merely achieving peak accuracy on a single predictive component (Guo et al., 2024).
Despite significant progress in machine learning–based UI optimization, most existing studies focus on isolated tasks such as color palette generation, layout prediction, or behavioural forecasting. However, real-world UI personalization requires the simultaneous interpretation of visual structure, evolving user interaction behaviour, and actionable adaptation decisions within a unified framework. The absence of such integrated systems limits the practical deployment of adaptive interfaces. Therefore, this study addresses the challenge of designing a comprehensive multimodal decision-making architecture capable of generating behaviour-aware, visually optimized, and context-sensitive UI adaptations in real time (López-Galisteo and Borrás-Gené, 2025).
The remainder of this paper is organized as follows. Section 1 introduces the motivation, background, and major contributions of the proposed DesignMind-ML framework for adaptive UI/UX personalization. Section 2 reviews relevant literature on machine learning–based UI design, color adaptation, behavioural modelling, and multimodal personalization approaches. Section 3 describes the proposed methodology, including the multi-model architecture that integrates ANN, CNN, and GRU–LSTM, along with pre-processing, augmentation strategies, and model fusion mechanisms. Section 4 presents a detailed description of the dataset, including the multimodal data structure, feature sets, and UI/UX personalization attributes used for training and evaluation. Section 5 reports the experimental setup, performance evaluation metrics, comparative analysis, and discussion of results across individual and hybrid models. Finally, Section 6 concludes the paper with key findings, limitations, and future research directions for extending the DesignMind-ML framework toward real-time and generative UI adaptation.
2. Literature review
Recent advances in machine learning have significantly influenced UI/UX design, particularly in personalized and adaptive interfaces. Several research directions are relevant to this work.
Kang et al. (2025) proposed an autoencoder-based method for generating color palettes from mixed color images, focusing on aesthetic combinations. They also reviewed color palette generation techniques from digital images, providing comprehensive guidelines for visual appeal. While these works highlight effective color generation using neural networks, they primarily address static visual aesthetics and do not incorporate user-specific interaction or behaviour-based personalization, leaving a gap in adaptive UI design.
Gao et al. (2025) explored accurate human behaviour simulation using fine-tuned large language models, capturing realistic sequential patterns. Similarly, Newline Research employed transformers for sequential user behaviour modelling, emphasizing the prediction of user actions over time. However, these studies focus on predictive modelling rather than direct integration with adaptive UI adjustments, which limits their practical application in dynamic interface personalization.
Zhang et al. (2024) proposed a deep learning-based interface generation tree algorithm for efficient and aesthetically pleasing UI design. They conducted a systematic review on predicted color output in UI/UX design using machine learning, highlighting the potential of CNNs for analysing visual interface elements. Despite this progress, these approaches do not fully link UI visual analysis with personalized, behaviour-driven adaptation, which is crucial for real-time UI modification.
Recent studies published in MDPI journals between 2024 and 2025 have explored multimodal deep learning frameworks for automated usability evaluation, integrating both visual and interaction data to enhance user interface (UI) analysis. While promising, these systems often lack the integration of multiple learning models to simultaneously analyse interface visuals, user behaviour, and decision instructions for adaptive personalization. Yanez et al. (2025) presented a comprehensive study on user-adaptive visualizations, demonstrating how machine learning techniques can dynamically adjust visual elements based on user interaction history, expertise level, and cognitive load. Their work highlights the importance of adaptive visual representations in improving usability and engagement; however, it primarily focuses on visualization adaptation and does not incorporate temporal user behaviour modelling or decision-level UI personalization.
Alowidi (2025) proposed a multimodal deep learning framework for automated usability evaluation by jointly analysing UI screenshots and user interaction logs. This study shows that combining visual and behavioural data can effectively assess interface usability. Nevertheless, the framework was designed for post-evaluation purposes and does not support continuous, real-time UI adaptation driven by evolving user preferences.
Sreedevi et al. (2025) conducted a comparative analysis of deep learning architectures, including CNNs, RNNs, LSTM, and hybrid models, for user behaviour prediction. Their findings indicate that hybrid sequential models outperform individual architectures in capturing complex interaction patterns. Despite strong predictive performance, the study does not explore how predicted behavioural insights can be translated into actionable UI/UX personalization strategies.
From the reviewed studies, it is evident that machine learning plays a critical role in advancing adaptive UI/UX systems through visual adaptation, multimodal usability analysis, and user behaviour modelling. Existing works demonstrate strong capabilities in visualization adaptation, usability evaluation, or behaviour prediction when considered independently. However, a clear research gap exists in unifying these dimensions into a single framework that simultaneously analyses visual interface characteristics, learns temporal user behaviour, and converts these insights into real-time UI adaptation decisions (Kristić et al., 2025). This observation motivates the proposed DesignMind-ML framework, which integrates visual analysis, behavioural modelling, and instruction-level decision learning to enable comprehensive and intelligent UI/UX personalization.
Moreover, existing studies address color generation, behaviour prediction, or visual analysis in isolation. Few works combine these dimensions into a unified framework for adaptive and personalized UI/UX. The proposed DesignMind-ML addresses this gap by integrating ANN for instruction mapping, CNN for UI screenshot analysis, and GRU–LSTM for user behaviour modelling. This integration enables dynamic UI adaptation, adaptive color schemes, and feature simplification based on real user interactions, a novel contribution in the field.
A structured analysis of existing literature reveals three dominant research directions: (1) visual aesthetic optimization using CNN-based color or layout generation, (2) sequential user behaviour modelling using RNN, LSTM, or transformer architectures, and (3) multimodal usability evaluation frameworks. However, these approaches are typically evaluated independently and rarely integrated into a unified personalization decision engine. Furthermore, most systems prioritize predictive accuracy over real-time UI adaptation capability. This fragmentation in the literature motivates the development of the proposed DesignMind-ML framework, which unifies visual, behavioural, and instruction-level learning into a single adaptive personalization pipeline.
3. Methodology
3.1 System overview
The proposed DesignMind-ML system utilizes a multi-model adaptive learning architecture that integrates three complementary machine learning models: Artificial Neural Network (ANN), Convolutional Neural Network (CNN), and GRU–LSTM (Paul et al., 2025). Its objective is to generate instruction-level user interface (UI) adaptation decisions by jointly analyzing user interaction behavior, visual interface properties, and temporal usage patterns. As illustrated in Figure 1, the overall workflow begins with the collection of user interaction logs, UI screenshots, and design preference inputs. The ANN generates instructions for color and UI controls, the CNN evaluates the visual layout and aesthetic quality, and the GRU–LSTM models behavior evolution over time. These outputs are combined to produce a final adaptive UI configuration, enabling real-time personalization.

Figure 1. Workflow diagram of DesignMind-ML.
3.2 Data preprocessing
3.2.1 Interaction extraction
Each dataset sample represents a complete user session, which includes click events, navigation paths, scrolling actions, feature usage, and color selections. Only valid and complete sessions were retained; corrupted, incomplete, or noisy logs were removed to preserve data integrity.
3.2.2 Sequence normalization
User sessions vary significantly in length. To facilitate batch-based deep learning, short interaction sequences were padded with zeros, while long sequences were uniformly down-sampled. This preprocessing step generated fixed-length tensors, ensuring compatibility with the GRU–LSTM network.
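A minimal sketch of this step is shown below; the target length of 128 time steps and the eight-feature session encoding are illustrative assumptions, not values reported in this paper:

```python
import numpy as np

def normalize_sequence(seq: np.ndarray, target_len: int = 128) -> np.ndarray:
    """Zero-pad short sequences or uniformly down-sample long ones
    so that every session becomes a fixed-length tensor."""
    n_steps, n_features = seq.shape
    if n_steps < target_len:
        # Zero-pad at the end to reach the target length.
        pad = np.zeros((target_len - n_steps, n_features), dtype=seq.dtype)
        return np.vstack([seq, pad])
    # Uniformly sample target_len indices across the full sequence.
    idx = np.linspace(0, n_steps - 1, target_len).astype(int)
    return seq[idx]

# Example: a 300-step session reduced to a (128, 8) tensor for the GRU-LSTM.
session = np.random.rand(300, 8)
fixed = normalize_sequence(session)
```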
3.2.3 Feature normalization
To reduce scale imbalance and inter-user variability, min-max scaling was applied to numeric interaction features, while user-baseline normalization was used to adjust for individual interaction styles. These preprocessing steps improved model convergence and enhanced generalization across diverse user groups.
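The two normalizations described above can be sketched as follows; applying the user-baseline step per user before pooling sessions is an assumption about the pipeline ordering:

```python
import numpy as np

def minmax_scale(x: np.ndarray) -> np.ndarray:
    """Min-max scale each numeric feature column to [0, 1]."""
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    return (x - x_min) / np.maximum(x_max - x_min, 1e-8)

def user_baseline_normalize(x: np.ndarray) -> np.ndarray:
    """Center one user's features on that user's own mean and scale by
    their standard deviation, reducing inter-user style variability."""
    mu, sigma = x.mean(axis=0), x.std(axis=0)
    return (x - mu) / np.maximum(sigma, 1e-8)
```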
3.2.4 Noise removal
Missing or inconsistent interaction values were corrected using linear interpolation to estimate missing time steps, along with moving average filtering to smooth out abrupt fluctuations. These preprocessing techniques resulted in cleaner and more reliable behavioral signals for analysis.
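A compact version of this cleaning step, assuming the interaction signals are stored as time-indexed pandas series (the five-step window is an illustrative choice):

```python
import pandas as pd

def clean_signal(series: pd.Series, window: int = 5) -> pd.Series:
    """Fill missing time steps by linear interpolation, then smooth
    abrupt fluctuations with a centered moving average."""
    filled = series.interpolate(method="linear", limit_direction="both")
    return filled.rolling(window=window, center=True, min_periods=1).mean()
```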
3.3 Data augmentation
To enhance model robustness and minimize overfitting, several augmentation strategies were implemented.
3.3.1 Temporal augmentation
User behavior sequences were augmented through random time warping, along with event duplication and removal. These techniques simulate natural variations in user speed and interaction style, thereby improving the model’s robustness.
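The following sketch illustrates both augmentations; the stretch range and the event drop/duplication probability are assumed hyperparameters:

```python
import numpy as np

rng = np.random.default_rng(42)

def time_warp(seq: np.ndarray, max_stretch: float = 0.2) -> np.ndarray:
    """Randomly stretch or compress a sequence in time by resampling,
    simulating faster or slower interaction speeds."""
    factor = 1.0 + rng.uniform(-max_stretch, max_stretch)
    new_len = max(2, int(len(seq) * factor))
    idx = np.linspace(0, len(seq) - 1, new_len).astype(int)
    return seq[idx]

def drop_or_duplicate_events(seq: np.ndarray, p: float = 0.05) -> np.ndarray:
    """Randomly remove some events and duplicate others in place to
    mimic natural variation in interaction style."""
    out = []
    for row in seq:
        r = rng.random()
        if r < p:            # drop this event
            continue
        out.append(row)
        if r > 1 - p:        # duplicate this event
            out.append(row)
    return np.asarray(out)
```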
3.3.2 Visual augmentation
UI screenshots were augmented through random adjustments in brightness and contrast, color jittering, and minor cropping and scaling. These transformations increase the CNN’s robustness to variations in themes and layout designs.
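These transformations correspond to standard image-augmentation layers; a sketch using tf.keras preprocessing layers is given below (the specific factors and the 224×224 crop size are assumptions, and RandomBrightness requires TensorFlow 2.9 or later):

```python
import tensorflow as tf

# Brightness/contrast jitter, zoom, and minor cropping applied to
# UI screenshots before they reach the CNN.
ui_augment = tf.keras.Sequential([
    tf.keras.layers.RandomBrightness(0.2),
    tf.keras.layers.RandomContrast(0.2),
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomCrop(224, 224),  # input images must be >= 224 px
])
```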
3.3.3 Behavioral sequence shuffling
Partial reordering of the navigation steps was implemented to simulate alternative user exploration paths, thereby enhancing the generalization of sequential behavior learning.
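One possible implementation of this partial reordering (the swap probability is an assumed parameter):

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_shuffle(steps: list, p: float = 0.1) -> list:
    """Swap adjacent navigation steps with probability p, simulating
    alternative exploration paths while preserving the overall order."""
    out = list(steps)
    for i in range(len(out) - 1):
        if rng.random() < p:
            out[i], out[i + 1] = out[i + 1], out[i]
    return out

# Example: partial_shuffle(["home", "search", "cart", "checkout"])
```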
3.4 ANN-Based instruction learning
The ANN learns direct mappings between user interaction features and UI adaptation decisions by modeling trends in color preferences, the frequency of feature usage, and the relationship between interaction density and UI complexity. Based on these learned patterns, the ANN generates instruction signals that control UI elements such as color themes, brightness, contrast, font size, and feature visibility, forming the system’s decision backbone.
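A minimal Keras sketch of such a network is shown below; the 32-dimensional interaction feature vector and the hidden layer sizes are assumptions, while the nine output classes follow the instruction categories reported in Table 2:

```python
import tensorflow as tf

ann = tf.keras.Sequential([
    tf.keras.Input(shape=(32,)),                      # interaction features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(32, activation="relu"),
    tf.keras.layers.Dense(9, activation="softmax"),   # instruction classes
])
ann.compile(optimizer="adam",
            loss="sparse_categorical_crossentropy",
            metrics=["accuracy"])
```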
3.5 CNN-based UI screenshot analysis
The CNN analyzes the visual structure and aesthetics of UI screenshots by learning color harmony, contrast quality, layout density and clutter, and the spatial organization of UI components. These visual features allow the system to identify issues such as poor contrast, overcrowding, or visual imbalance, ensuring that UI personalization is both functional and aesthetically optimized.
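A compact CNN of the kind described could look as follows; the input resolution and filter sizes are assumptions, and the four output classes mirror the component categories in Table 3:

```python
import tensorflow as tf

cnn = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),              # UI screenshot
    tf.keras.layers.Conv2D(32, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(4, activation="softmax"),   # text/image/rectangle/group
])
```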
3.6 GRU–LSTM behavior modeling
The hybrid GRU–LSTM network captures temporal user behavior by modeling short-term interaction changes with the GRU and long-term preference evolution with the LSTM. This approach enables the system to identify habit formation, prioritize features, and adapt to evolving usability needs, facilitating continuous and intelligent UI personalization.
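One plausible realization is a stacked recurrent network, sketched below; the sequence shape matches the fixed-length tensors from Section 3.2.2, the layer widths are assumptions, and the five output classes follow Table 4:

```python
import tensorflow as tf

gru_lstm = tf.keras.Sequential([
    tf.keras.Input(shape=(128, 8)),                   # fixed-length session tensor
    tf.keras.layers.GRU(64, return_sequences=True),   # short-term dynamics
    tf.keras.layers.LSTM(32),                         # long-term preference trends
    tf.keras.layers.Dense(5, activation="softmax"),   # behaviour classes
])
```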
3.7 Model fusion strategy
The final decision for UI adaptation is made through decision-level fusion of the three models. The ANN generates instruction signals, the CNN provides visual quality scores, and the GRU–LSTM supplies behavioral preference weights. These outputs are integrated using a weighted mechanism to produce the final UI control commands. This fusion ensures that UI personalization is behavior-aware, visually optimized, and context-sensitive, making DesignMind-ML a truly intelligent and adaptive UI/UX framework. Let A, C, and G represent the normalized output scores from the ANN, CNN, and GRU–LSTM models, respectively. The final UI adaptation decision D is computed using weighted fusion (Fucs et al., 2020):

D = αA + βC + γG,

where α + β + γ = 1. The weights were empirically optimized through validation experiments to balance instruction confidence, visual quality assessment, and behavioral consistency. This weighted integration ensures robust decision-level personalization in the face of multimodal uncertainty.
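In code, the fusion reduces to a weighted sum of the per-class score vectors; the example weights below are placeholders, since the paper tunes them on validation data:

```python
import numpy as np

def fuse_decisions(A: np.ndarray, C: np.ndarray, G: np.ndarray,
                   alpha: float = 0.4, beta: float = 0.3,
                   gamma: float = 0.3) -> np.ndarray:
    """Decision-level fusion D = alpha*A + beta*C + gamma*G over
    normalized per-class scores, with alpha + beta + gamma = 1."""
    assert abs(alpha + beta + gamma - 1.0) < 1e-9
    return alpha * A + beta * C + gamma * G

# The adapted UI configuration is the highest-scoring decision class:
# decision = np.argmax(fuse_decisions(A, C, G))
```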
4. Dataset description
4.1 Overview of the dataset
The DesignMind-ML framework was trained and evaluated using a self-constructed multimodal UI/UX personalization dataset comprising 3,000 samples, each representing a complete user interaction session. This dataset integrates visual interface data, user behavior logs, and individual color preference information (Figure 2). It was collected directly from participants actively using the proposed system under real interaction conditions. The authors curated and organized all sessions to create a unified benchmark for this study.
The dataset was designed to support three complementary learning tasks: color and instruction learning, visual UI analysis, and temporal user behavior modeling. This multimodal structure enables the system to jointly analyze how users interact with interfaces, perceive visual layouts, and respond to color schemes, facilitating accurate and adaptive UI personalization. As shown in Figure 2 and listed in Table 1, the dataset allows the system to map complex relationships between a user’s physical constraints and preferred UI settings, generating the “Instruction Signals” necessary for real-time interface adaptation.

Figure 2. UI personalization dataset (3,000 samples).
The dataset comprises 3,000 complete user interaction sessions collected from voluntary participants under controlled experimental conditions. Each session represents a unique UI interaction instance. The dataset was divided into training (70%), validation (15%), and testing (15%) subsets. Class distributions were monitored to ensure balanced representation across adaptation categories. While the dataset size is moderate, it allows for preliminary validation of multimodal personalization and underscores the necessity for larger-scale real-world deployment studies.
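The split described above corresponds to the following sketch; the placeholder arrays stand in for the real session features and adaptation labels:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(3000, 32)              # placeholder session features
y = np.random.randint(0, 9, size=3000)    # placeholder adaptation labels

# 70/15/15 split with stratification to keep class balance.
X_train, X_tmp, y_train, y_tmp = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.50, stratify=y_tmp, random_state=42)
```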
4.2 ANN dataset for color and instruction learning
The ANN dataset was designed to learn the relationship between visual design parameters and user preference responses. Each of the 3,000 samples includes UI color palettes, brightness and contrast levels, theme type (light or dark), readability indicators, and user preference labels. These features enable the ANN to model how different color combinations influence usability, comfort, and accessibility. Based on this learning, the network outputs instruction signals that guide real-time UI changes, such as color adaptation, contrast adjustment, and the activation or suppression of features. This allows the ANN to capture decision-level mappings between user preferences and UI modifications (Zhang et al., 2026).
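For illustration, a single ANN training record might be structured as follows; all field names and values here are invented examples, not entries from the dataset:

```python
sample = {
    "palette":     ["#1E88E5", "#FFFFFF", "#FFC107"],  # UI color palette
    "brightness":  0.72,
    "contrast":    0.55,
    "theme":       "dark",          # light or dark
    "readability": 0.81,            # readability indicator
    "preference":  4,               # user preference label
}
```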
4.3 CNN dataset for UI screenshot analysis
The CNN was trained on a labeled dataset of UI screenshots collected from various application types, with each screenshot annotated for layout type, visual complexity level, color harmony, component density, and contrast quality. The dataset encompasses a range of interface styles, from cluttered layouts to minimal designs and accessibility-optimized views. The CNN automatically extracts spatial and visual features, including alignment, color distribution, grouping of UI components, and visual balance. This capability enables the system to identify poorly designed or visually overloaded interfaces and facilitates intelligent UI refinement (Paul et al., 2022).
4.4 GRU–LSTM dataset for user behavior modeling
To model the evolution of user preferences over time, the GRU–LSTM module utilizes time-stamped interaction logs. Each behavioral sample includes click sequences, navigation paths, scrolling behavior, feature usage frequency, and session duration (Paul et al., 2023). The hybrid GRU–LSTM structure enables the system to capture short-term interaction patterns through the GRU while identifying long-term usage trends with the LSTM. This approach facilitates accurate modeling of habit formation, feature prioritization, and evolving user needs, which are essential for adaptive UI personalization.
4.5 UI/UX personalization feature set
To translate learned patterns into actionable UI changes, the dataset incorporates a structured UI/UX feature set (Table 1). These features represent the controllable elements of the interface used for personalization (Barrett et al., 2024).
Table 1. Selected UI/UX personalization features.
| Feature ID | UI/UX personalization feature |
| F01 | Adaptive color theme adjustment |
| F02 | Brightness and contrast optimization |
| F03 | Dark mode / light mode switching |
| F04 | Font size and readability adjustment |
| F05 | Simplified navigation menu |
| F06 | Feature usage–based interface pruning |
| F07 | Highlighting frequently used functions |
| F08 | Hiding rarely used features |
| F09 | Personalized layout arrangement |
| F10 | Accessibility-focused UI adaptation |
These features enable DesignMind-ML to dynamically adjust both the visual appearance and functional complexity of interfaces, ensuring that each interface adapts to the user’s behavioral patterns, visual comfort, and accessibility needs. Figure 3 illustrates the DesignMind-ML Pro system interface, which offers a machine learning–driven environment for adaptive UI/UX design. Users specify the application type, target age group, and the number of desired color palettes, then upload a UI screenshot for visual analysis by the CNN model. Based on this input, the system generates multiple personalized UI color variations by assigning optimized colors to components such as the header, background, call-to-action buttons, footer, and accent elements, using the ANN-based color adaptation model (Shokrizadeh et al., 2025). A feedback module collects user responses to these designs, and this information is processed by the GRU–LSTM model to continuously refine and improve future personalization decisions.

Figure 3. Sample implementation of DesignMind-ML.
5. Results and Discussion
5.1 Experimental setup
The experimental evaluation of the DesignMind-ML framework focused on three key learning components: the ANN-based interaction model, the CNN-based UI screenshot analysis model, and the GRU–LSTM-based user behavior model (He et al., 2025). The complete dataset, consisting of 3,000 samples, was divided into training, validation, and testing subsets to ensure an unbiased performance assessment. Each model was trained independently before being integrated into a hybrid system. The performance of each model and the final system was evaluated using accuracy, ROC curves, and confusion matrices to assess both classification reliability and personalization quality (Stefano et al., 2024).
5.2 Evaluation metrics
To evaluate both the individual models and the hybrid system, the following metrics were used (Paul et al., 2025). Accuracy reflects the overall proportion of correct predictions made by the model across all instances. Precision measures the reliability of the predicted UI adaptation decisions by determining the proportion of predicted positive cases that are actually correct. Recall evaluates the model’s ability to identify true user preferences by measuring the number of relevant instances that were correctly captured. The F1-score, defined as the harmonic mean of precision and recall, provides a balanced evaluation when both false positives and false negatives are important considerations. The ROC–AUC metric assesses the model’s ability to discriminate between classes, indicating how effectively the classifier separates positive and negative cases and reflecting the overall robustness of the classification performance.
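These metrics can be computed with scikit-learn as sketched below; macro averaging and one-vs-rest ROC–AUC are assumptions for the multi-class setting:

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_pred, y_score):
    """y_score holds per-class probabilities of shape (n, n_classes)."""
    return {
        "accuracy":  accuracy_score(y_true, y_pred),
        "precision": precision_score(y_true, y_pred, average="macro"),
        "recall":    recall_score(y_true, y_pred, average="macro"),
        "f1":        f1_score(y_true, y_pred, average="macro"),
        "roc_auc":   roc_auc_score(y_true, y_score, multi_class="ovr"),
    }
```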
Together, these metrics provide a comprehensive evaluation of both personalization accuracy and decision reliability. The proposed system was assessed using the ANN-based interaction model, the CNN-based UI screenshot analysis model, and the GRU–LSTM-based user behavior model. The quantitative performance of these models, along with the final hybrid system, is summarized in Tables 2–5.
Table 2. Classification report of ANN model.
| Class | Precision | Recall | F1-score | Support |
| 0 | 1.000 | 1.000 | 1.000 | 52 |
| 1 | 1.000 | 1.000 | 1.000 | 149 |
| 2 | 1.000 | 1.000 | 1.000 | 159 |
| 3 | 1.000 | 1.000 | 1.000 | 153 |
| 4 | 1.000 | 1.000 | 1.000 | 143 |
| 5 | 1.000 | 1.000 | 1.000 | 146 |
| 6 | 1.000 | 1.000 | 1.000 | 155 |
| 7 | 1.000 | 1.000 | 1.000 | 146 |
| 8 | 1.000 | 1.000 | 1.000 | 150 |
| Accuracy | | | 1.000 | 1253 |
| Macro avg | 1.000 | 1.000 | 1.000 | 1253 |
| Weighted avg | 1.000 | 1.000 | 1.000 | 1253 |
Table 3. Class-wise classification report of CNN model.
| Class | Precision | Recall | F1-score | Support |
| text | 1.00 | 0.07 | 0.14 | 94 |
| image | 0.29 | 1.00 | 0.46 | 91 |
| rectangle | 1.00 | 0.19 | 0.32 | 42 |
| group | 0.80 | 0.35 | 0.49 | 173 |
| Accuracy | | | 0.42 | 400 |
| Macro avg | 0.77 | 0.40 | 0.35 | 400 |
| Weighted avg | 0.75 | 0.42 | 0.38 | 400 |
Table 4. Classification report of GRU + LSTM model.
| Class | Precision | Recall | F1-score | Support |
| 0 | 0.981 | 0.986 | 0.984 | 212 |
| 1 | 0.981 | 0.981 | 0.981 | 156 |
| 2 | 0.961 | 1.000 | 0.980 | 74 |
| 3 | 0.985 | 0.992 | 0.989 | 133 |
| 4 | 0.995 | 0.973 | 0.984 | 225 |
| Accuracy | | | 0.984 | 800 |
| Macro avg | 0.981 | 0.986 | 0.983 | 800 |
| Weighted avg | 0.984 | 0.984 | 0.984 | 800 |
Table 5. Classification report of final model.
| Class | Precision | Recall | F1-score | Support |
| 0 | 0.64 | 0.36 | 0.80 | 50 |
| 1 | 0.54 | 0.80 | 0.72 | 50 |
| Accuracy | | | 0.78 | 100 |
| Macro avg | 0.60 | 0.58 | 0.78 | 100 |
| Weighted avg | 0.60 | 0.58 | 0.78 | 100 |
5.3 Performance of the ANN model
The ANN model learns user interaction preferences and generates instruction-level UI adaptation decisions, including color selection, contrast control, and feature prioritization. Figure 4 illustrates the training history of the ANN model, demonstrating stable convergence and increasing accuracy over epochs.

Figure 4. Training and validation curve for ANN model.
Some misclassification was observed among users with similar feature usage behavior, which was expected due to overlapping preference patterns. The ROC curves of the ANN model presented in Figure 5 demonstrate near-perfect class separation, with AUC values approaching 1.00. This confirms that the ANN effectively distinguishes between user interaction categories for adaptive UI instruction generation.

Figure 5. ROC curve of ANN model.
The ANN achieved an overall accuracy of 100% on the held-out test data (Table 2), suggesting that interaction-based features offer valuable signals for UI personalization. The confusion matrix of the ANN model, presented in Figure 6, exhibits strong diagonal dominance, indicating that most user interaction patterns were accurately classified.

Figure 6. Confusion matrix of the ANN model.
5.4 Performance of the CNN model
The CNN model analyzes UI screenshots and learns visual features such as layout structure, color harmony, contrast quality, and visual complexity. Figure 7 illustrates the training behavior of the CNN, showing that both training and validation accuracy steadily improve, indicating effective learning of visual patterns.

Figure 7. Training and validation curve for CNN model.
The ROC curves in Figure 8 further confirm the CNN’s ability to distinguish between UI layouts and visual styles, demonstrating high AUC values across most classes. These results validate the CNN’s effectiveness in extracting spatial and aesthetic features for adaptive UI refinement.

Figure 8. ROC curve of CNN model.
The confusion matrix in Figure 9 indicates that classification was most reliable for visually distinctive layouts, with considerable confusion between designs that share similar color schemes or layout densities.

Figure 9. Confusion matrix of the CNN model.
5.5 Performance of the GRU–LSTM model
The GRU–LSTM model effectively captures the evolution of sequential user behavior and preferences over time. The training history presented in Figure 10 demonstrates stable learning and strong convergence.

Figure 10. Training and validation curve for GRU+LSTM model.
The confusion matrix in Figure 11 shows that most interaction sequences are correctly classified, with only a few misclassifications occurring primarily in sessions that exhibit very similar navigation flows.

Figure 11. Confusion matrix of the GRU+LSTM model.
The ROC curves presented in Figure 12 show AUC values close to 1.0, indicating a high level of separability among different behavior classes. This confirms that temporal modeling significantly enhances the system’s ability to understand evolving user preferences and interaction habits.

Figure 12. ROC curve of GRU+LSTM model.
5.6 Performance of the hybrid DesignMind-ML system
The final DesignMind-ML system integrates the outputs of the ANN, CNN, and GRU–LSTM models to generate behavior-aware, visually optimized UI adaptation instructions. The ROC curve of the final model, presented in Figure 13, indicates a very high true positive rate alongside a low false positive rate, confirming its strong overall predictive performance. The integrated system benefits from multimodal learning, as interaction data, visual features, and temporal behavior patterns collectively contribute to more precise and stable personalization.

Figure 13. ROC curve of the combined model.
The confusion matrix of the hybrid system, shown in Figure 14, shows that most predictions fall along the diagonal, indicating reliable identification of user preferences and appropriate UI configurations.

Figure 14. Confusion matrix of the hybrid model.
5.7 Performance discussion
The experimental results demonstrate that while the ANN and CNN models individually provide useful insights into user interaction and UI visuals, the GRU–LSTM model is crucial for capturing long-term behavior trends. The most comprehensive adaptation behavior, however, emerges when all three models are combined within the DesignMind-ML hybrid framework. This integration of behavioral, visual, and interaction-based information allows the system to reduce ambiguity, improve decision reliability, and produce more meaningful UI adaptations. Consequently, the proposed approach enhances usability, reduces cognitive load, and increases overall user satisfaction in personalized UI/UX environments.
Figures 4, 7, and 10 present the training and validation curves for the ANN, CNN, and GRU–LSTM models used in the DesignMind-ML framework. These curves illustrate that the models gradually enhance their learning performance as training progresses. The ANN model exhibits stable convergence while learning user interaction preferences. The CNN model also demonstrates steady learning in extracting visual features from UI screenshots, although slight fluctuations occur due to the diversity of interface designs. Similarly, the GRU–LSTM model shows consistent improvement in capturing sequential user behavior patterns. Overall, the close relationship between the training and validation curves indicates that the models learn effectively and maintain good generalization without significant overfitting.
Figures 5, 8, 12, and 13 illustrate the ROC curves for the ANN, CNN, GRU–LSTM, and the final hybrid model. These ROC curves assess how well each model distinguishes between different classes of user preferences and UI adaptation decisions. The ANN and GRU–LSTM models show strong separation between classes, indicating reliable prediction of user interaction and behavioral patterns. The CNN model also demonstrates the ability to identify visual interface characteristics, though its performance varies slightly due to complex UI layouts. The ROC curve for the combined model confirms that integrating multiple learning models enhances the overall robustness of the system for adaptive UI decision-making.
Figures 6, 9, 11, and 14 present the confusion matrices for the ANN, CNN, GRU–LSTM, and the final hybrid model. These matrices illustrate the accuracy of each model in classifying different categories of user preferences, UI components, and behavioral patterns. Most predictions are concentrated along the diagonal, indicating correct classifications in the majority of cases. Minor misclassifications primarily occur between visually similar interface elements or closely related interaction behaviors. Overall, the confusion matrices confirm that the proposed system reliably identifies user preferences and generates appropriate UI adaptation decisions.
The exceptionally high accuracy observed in the ANN component suggests that the interaction-based instruction mapping task may be less complex than multimodal fusion. However, the comparatively lower accuracy of the CNN (42%) indicates challenges in generalizing visual layout classification across diverse UI styles. When integrated into the hybrid framework, overall performance decreased to 76%, reflecting the increased complexity of addressing a multi-objective personalization problem. This result highlights that real-world adaptive UI systems inherently involve greater uncertainty compared to isolated predictive models.
Table 6. Performance summary across models
| Model | Purpose | Performance highlights |
| ANN | Learns user interaction preferences | Generated adaptive color palettes accurately; captured feature usage trends; achieved high personalization accuracy |
| CNN | Analyzes UI screenshots | Extracted layout structure, color harmony, and visual complexity; improved visual adaptation decisions |
| GRU + LSTM | Models user behavior over time | Captured short-term and long-term interaction patterns; robust against variable-length sessions |
| Hybrid System | Combines ANN, CNN, and GRU–LSTM outputs | Produced robust end-to-end adaptation decisions; ROC curve shows strong True Positive Rate and low False Positive Rate; improved usability and user satisfaction |
| Overall | Adaptive UI/UX personalization | Dynamic UI adjustments reduced cognitive load, improved task efficiency, and enhanced user experience |
Table 6 presents a comparative performance overview of the proposed models within the adaptive UI/UX personalization framework. The ANN effectively learned user interaction preferences and generated adaptive color palettes with high personalization accuracy. The CNN analyzed UI screenshots by extracting layout structure, color harmony, and visual complexity, thereby enhancing visual adaptation decisions. The GRU–LSTM model captured both short-term and long-term user behavior patterns, demonstrating robustness across variable-length sessions. The hybrid system, which integrates ANN, CNN, and GRU–LSTM outputs, produced the most reliable end-to-end adaptation decisions, with ROC characteristics indicating a high True Positive Rate and a low False Positive Rate. Overall, the integrated framework enabled dynamic UI adjustments that reduced cognitive load, improved task efficiency, and enhanced the overall user experience.
The results of this study demonstrate that the proposed DesignMind-ML framework can effectively support adaptive UI/UX personalization by combining behavioral, visual, and interaction-based learning models. The strong performance of the GRU–LSTM model in capturing temporal user behavior aligns with previous studies that have reported the effectiveness of recurrent architectures for modeling sequential interaction patterns and predicting user activity. Similarly, the CNN model exhibited the capability to analyze visual interface structures, such as layout complexity and color distribution, consistent with earlier research emphasizing the importance of convolutional networks for UI visual feature extraction and interface evaluation. The integration of multimodal learning in the proposed framework also supports findings from recent studies that stress the value of combining behavioral data with visual interface analysis to improve adaptive interface systems. However, unlike many previous works that focus on single tasks such as color generation, usability evaluation, or behavior prediction, this study proposes an integrated decision-making architecture that simultaneously considers visual structure, user interaction behavior, and the evolution of temporal preferences. Although the overall system accuracy was lower than that of some single-task models reported in the literature, the proposed approach addresses the more complex problem of real-time UI personalization. These findings suggest that multimodal frameworks can provide a more realistic and practical foundation for adaptive UI/UX systems while highlighting the need for larger datasets and improved multimodal fusion strategies in future research.
5.8 Comparative analysis with existing methods
Table 7 compares the proposed DesignMind-ML framework with existing approaches reported in the literature. Previous studies utilizing ANN, GRU–LSTM, and multimodal deep learning frameworks achieved performance levels ranging from 90% to 92%. In contrast, the proposed hybrid model (ANN + CNN + GRU–LSTM) attained an overall accuracy of 76%. Although this performance was comparatively lower, DesignMind-ML emphasizes comprehensive adaptive UI/UX personalization by integrating user preference learning, visual layout analysis, and temporal behavior modeling within a unified framework. This focus underscores the system’s broader applicability and real-world adaptability beyond mere benchmark accuracy comparisons.
While the overall accuracy of DesignMind-ML was lower than that of some task-specific models, this outcome reflects the increased complexity of addressing a multi-stage, multi-modal personalization problem. Unlike previous approaches that evaluate a single learning objective, the proposed framework jointly processes behavioral sequences, visual interface features, and user instructions, which introduces the realistic decision uncertainty inherent in adaptive UI systems.
Table 7. Comparative performance with existing methods.
| Citations | Primary method | Performance (%) |
| Kristić et al. (2025) | ANN (artificial neural network) | 90% |
| Du et al. (2024) | GRU + LSTM | 91% |
| Alowidi (2025) | Multimodal deep learning framework | 92% |
| This study | ANN + CNN + GRU–LSTM (DesignMind-ML) | 76% |
6. Conclusions
This study developed a multimodal machine learning framework for adaptive UI and UX personalization that integrates artificial neural networks, convolutional neural networks, and GRU–LSTM models to analyze user interaction behavior, visual interface characteristics, and temporal usage patterns. The findings demonstrate that combining behavioral analysis, visual interface evaluation, and instruction-level learning enables more effective and context-aware interface adaptation than single-model approaches. The proposed framework supports intelligent UI adjustments, such as adaptive color selection and feature simplification, which can improve usability, reduce cognitive load, and enhance the overall user experience in digital systems. The results confirm that multimodal learning can provide a practical foundation for behavior-driven interface personalization. However, several challenges remain, including limited dataset diversity and the need for more advanced fusion strategies to strengthen cross-modal learning. Future studies should explore larger and more diverse user datasets, real-time deployment environments, and the integration of generative interface design models to enable fully automated and scalable adaptive UI systems.