- Systematic Review
- Open access
A review on exploration–exploitation trade-off in psychiatric disorders
BMC Psychiatry volume 25, Article number: 420 (2025)
Abstract
Balancing exploration and exploitation is a crucial aspect of adaptive decision-making. Psychiatric disorders can disrupt this balance in various ways, and characterizing these disruptions sheds light on their neurocognitive roots and can guide targeted interventions. In this systematic review, we aimed to delineate potential exploration–exploitation impairments across psychiatric disorders. Through a systematic search of PubMed, we identified forty-six relevant studies employing tasks that probe the exploration–exploitation balance, which we synthesized to reveal distinct patterns. We grouped these disorders into three categories: addiction-related patterns, emotional/cognitive disturbances, and neurological (neurodevelopmental and neurodegenerative) disorders. Our findings show that anxiety and mood disorders often enhance exploratory behavior, whereas depression affects decision stability and reward sensitivity. In contrast, schizophrenia, OCD (obsessive–compulsive disorder), and ADHD (attention-deficit/hyperactivity disorder) are characterized by excessive switching and difficulty balancing exploration and exploitation, leading to impaired learning and adaptability. Disorders with addictive-like features disrupt optimal decision-making strategies either by heightening exploration or by causing maladaptive persistence, skewing the balance away from effective decision-making. Individuals exhibiting addiction-like or compulsive behaviors often show an imbalanced explore-exploit trade-off, resulting in suboptimal decisions characterized by reduced exploration, flawed foraging strategies, and impulsive or perseverative choices despite adverse outcomes. This suggests that such disorders may stem from dysfunctional foraging processes applied to decision-making, in which uncertainty plays a central role. Dysfunctions in dopaminergic and noradrenergic pathways appear to disrupt the brain's representation of uncertainty, thereby altering exploratory behavior. In sum, the varying patterns of exploration–exploitation balance across disorders are critical for understanding the challenges in learning and decision-making associated with neuropsychiatric conditions.
Introduction
Throughout life, humans face countless decisions, ranging from simple choices about mundane matters to complex dilemmas with serious implications. To thrive in dynamic environments with fluctuating rewards, individuals must skillfully balance exploiting known options with exploring uncertain alternatives. This delicate trade-off between exploitation and exploration is a pivotal facet of mature decision-making [72]. In naturalistic foraging scenarios, rewards diminish as resources are consumed, necessitating decisions about when to leave depleted patches in search of new bounty. The omnipresent question is when to risk switching strategies to maximize one's rewards [23, 62]. Thus, properly balancing the exploitation of guaranteed resources and the exploration of uncertain options is critical for survival and success [7]. The exploration–exploitation trade-off is a fundamental decision problem: people must choose between staying with proven, exploitative opportunities and exploring new, untapped ones that promise higher future returns. Exploration means seeking new knowledge or experimenting with new behavior at the cost of short-term reward, whereas exploitation means applying learned knowledge or proven behavior to maximize current reward. This balance is a fundamental element of a wide range of decision-making problems, from animal foraging to human learning and artificial intelligence (e.g., reinforcement learning). Closely related is stay-leave behavior, the choice between staying with the current option and leaving it for another, typically studied in foraging or multi-armed bandit tasks. Stay-or-leave decisions depend on factors such as expected reward, risk aversion, and uncertainty, and they reflect the ability to calculate the opportunity cost of resources and time.
In the context of neuropsychiatric illness, decision-making impairments can lead to cognitive rigidity, which can promote maladaptive behaviors such as compulsive gambling, drug use disorders, or failure to adapt in dynamic situations. These decision-making processes depend on neural circuits involved in reward processing and cognitive control, including the dopamine system. Understanding the dynamics of the exploration–exploitation trade-off and stay-leave behavior is therefore crucial for exploring the cognitive and neural bases of heterogeneous psychiatric and neurological disorders [11, 23, 24, 27, 76]. Psychiatric disorders have been increasingly investigated for their potential impact on this delicate exploration–exploitation equilibrium, which manifests as deficits in flexibly switching between choices and optimizing behavior. Many studies have employed specialized decision-making tasks, such as n-armed bandit [5, 27, 37, 41, 42], foraging [24, 49, 53, 54, 70], reversal learning, and Iowa Gambling [12, 13, 14] tasks, to probe these facets across various mental disorders [18, 36, 71, 82]. Different disorders affect cognitive processes through distinct but sometimes overlapping patterns of impairment. In addiction-related disorders, such as pathological gambling (PG), alcohol dependence, and cocaine use, individuals typically exhibit deficits in decision-making, feedback processing, and the ability to make advantageous choices [25, 39]. These impairments often stem from difficulties in focusing attention, switching between cognitive sets, and controlling impulsive behaviors [65]. Neurodegenerative disorders such as Parkinson's disease (PD) and neurodevelopmental disorders such as autism spectrum disorder (ASD) present unique cognitive challenges. PD patients experience selective deficits in reward processing and novelty seeking, which are modulated by dopaminergic drugs.
These medications can enhance reward processing but often disrupt punishment processing, indicating a complex interplay between neurochemical treatment and cognitive function [17, 70]. Individuals with ASD show distinctive decision-making patterns, characterized by slower learning from positive feedback and a tendency to shift constantly between choices, pointing to a broader difficulty in integrating feedback over time [63, 89]. Among mood and psychiatric disorders, conditions such as depression, anxiety, and schizophrenia (SZ) further illustrate the diverse effects on cognitive processes. Depression is associated with impaired decision-making, particularly in dynamic environments, due to altered sensitivity to reward and punishment [16, 20]. High trait anxiety leads to an attentional bias toward aversive stimuli, affecting overall cognitive functioning [28]. Patients with schizophrenia exhibit reinforcement learning abnormalities, particularly in their ability to integrate positive decision outcomes over time [77, 78, 85]. ADHD, meanwhile, is characterized by deficiencies in reward prediction and decision-making, often linked to impaired learning processes [45]. The emerging field of computational psychiatry seeks to characterize the computations underlying brain function and to identify which components are altered in psychiatric disorders. One of the main components of decision-making and learning is the exploration–exploitation balance.
The present systematic review aimed to elucidate patterns of exploration–exploitation impairments across psychiatric disorders, examining:
1. the specific capacities disrupted in each disorder;
2. similarities and differences between disorders; and
3. how each disorder affects the balance between exploration and exploitation in decision-making.
Clarifying these effects will advance our comprehension of the neurocognitive bases of adaptive choice behavior. Elucidating common and distinct influences on the dynamics of exploring uncertain options versus exploiting known rewards provides fundamental insights into underlying decision processes. This knowledge can inform interventions to restore the delicate yet vital equilibrium between exploitation and exploration.
Method
Search strategy
We searched an electronic database (PubMed) for relevant journal articles published up to April 12, 2023, without limits on the year of publication. The query combined task-related keywords from two dimensions, 1) foraging task, patch leaving, and opportunity cost, or 2) explore-exploit and bandit task, with keywords from a third dimension, 3) mental and psychiatric disorders.
The full search string was:
(forag* OR "patch leaving" OR patch-leav* OR "Explore/Exploit" OR "explore-exploit" OR "bandit" OR "opportunity cost" OR "opportunity-cost" OR "exploration–exploitation" OR "exploration exploitation" OR "Reinforcement learning" OR "exploratory behavior" OR "IGT" OR "Iowa gambling" OR "Reversal learning") AND ("Mental Disorders"[Mesh] OR "mental" OR Tourette OR dyslexia OR psychiatr* OR "Anxiety" OR "Panic" OR MDD OR BMD OR "major depression" OR "major depressive" OR "Alzheimer" OR Parkinson OR "brain injury" OR stroke OR TBI OR "minor depression" OR "minor depressive" OR "Bipolar" OR Gambl* OR "Dissociative" OR "Eating Disorders" OR "Anorexia Nervosa" OR "Binge-Eating" OR "Depressive Disorder" OR "Amnesia" OR "Dementia" OR schizo* OR "ADHD" OR "OCD" OR "obsessive" OR "compulsive" OR "attention-deficit" OR "attention deficit" OR "hyper-activity" OR "hyperactivity" OR Addic* OR "Tic Disorder" OR "PTSD" OR "Post-traumatic stress" OR "Learning Disability" OR "Learning Disabilities" OR "Sadism" OR autis* OR abuse OR "personality disorder" OR borderline OR sleep OR phobia OR alcohol OR substanc* OR smoke OR cigarette OR "dependent personality" OR attachment)
Eligibility criteria
Selected studies had to fulfill the following inclusion criteria:
1. The study included human subjects.
2. The publication was in English and peer-reviewed.
3. The full text was available.
4. The study included at least one group with a psychiatric disorder and a healthy control group.
5. The study incorporated a task or paradigm assessing an aspect of the explore-exploit dilemma, such as a foraging, patch-leaving, or stay/switch task.
In this systematic review, the first two authors independently conducted the study selection process, screening titles and abstracts against a predefined set of inclusion criteria. Disagreements and uncertainties were resolved through discussion, and a third reviewer (the third author) resolved persistent disputes. This procedure helped ensure that studies were selected for full-text review fairly and independently.
We also manually screened the reference lists of all retrieved articles, including those that did not meet our inclusion criteria, to identify additional eligible primary studies that the initial database search may have missed.
The complete list of articles was exported to EndNote X7 and subsequently used for title and abstract screening. The titles and abstracts retrieved by the search were screened independently, after which each full text was assessed against the inclusion criteria. Reasons for exclusion were recorded, and additional articles were added when appropriate.
The inclusion criteria required studies with both a clinical population and an experimental task that included aspects of the exploration–exploitation trade-off. We used fairly flexible inclusion and exclusion criteria, keeping several studies that looked at disorder-related traits such as anxiety, stress, and externalizing behaviors, rather than diagnosed disorders themselves. These additional studies, shown in Table 1, were kept because they offered useful insights relevant to this review.
Results
Study identification and selection
A total of 17,878 unique abstracts were identified and screened for this review (Fig. 1). After title and abstract screening, 264 articles were identified as potentially relevant, and 13 additional records were identified through other sources. Of the 277 articles that underwent full-text screening, 231 were excluded for the following reasons: not being human studies, lacking appropriate tasks, including only healthy volunteers or no patient groups, not examining exploration–exploitation or foraging concepts, or being inconsistent with the study aim. Ultimately, 46 studies met all inclusion criteria and were selected. Notably, a considerable number of articles on reinforcement learning were not included here because they did not directly address the topic of interest or examined it only tangentially.
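The screening flow above can be checked with simple arithmetic. A minimal Python sketch; the variable names are illustrative labels, and the counts are exactly the ones reported in this section.

```python
# Screening-flow counts as reported in this section (variable names are illustrative).
screened_abstracts = 17_878
retained_after_title_abstract = 264
from_other_sources = 13
excluded_at_full_text = 231

full_text_screened = retained_after_title_abstract + from_other_sources
included = full_text_screened - excluded_at_full_text

print(full_text_screened, included)  # 277 46
```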
Study characteristics
The characteristics of each included study are presented in Table 1. Across the 46 studies, a total of 2468 patients were compared with 1881 healthy controls. Regarding task paradigms, 15 studies used the n-armed bandit task, 5 used a foraging task paradigm, 14 employed probabilistic reversal learning (PRL), 11 used the Iowa Gambling Task (IGT), one used a choice task, one used a risk task, and one used a continuous-performance Temporal Utility Integration Task. Notably, more than 456 articles examined PRL or IGT, but only those that also assessed exploration–exploitation dilemmas, patch-leaving/foraging concepts, or stay/switch behavior were eligible for inclusion. Consequently, only a limited number of PRL and IGT studies met the criteria.
Outcome measures and results
In this review, three categories of behavioral outcome measures were examined to assess the exploration–exploitation trade-off:
1. Stay-leave or stay-switch behavior: assessed using the foraging task paradigm, in which participants decide whether to stay at the current resource or search for new resources.
2. Exploration–exploitation patterns: evaluated using the n-armed bandit task, in which participants choose among options with probabilistic rewards and must balance exploring new options with exploiting known ones.
3. Shifting and switching: assessed with the IGT and PRL, in which participants must change their strategies and switch to new options in a timely manner.
This categorization was made due to the differences in the nature and characteristics of each of these tasks. Each task examined different aspects of the exploration–exploitation trade-off. The foraging task focused on the decision to stay or leave a resource. The n-armed bandit task focused on choosing between different options with probabilistic rewards. The IGT and probabilistic reversal learning assessed the ability to switch and change strategies.
It is worth noting that the stay-leave or switch behavior may have also been examined in other studies using reinforcement learning approaches, but in this systematic review, the stay-leave behavior was considered only from the perspective of the foraging task paradigm. The 4-armed bandit task and the patch-leaving task are both experimental paradigms used in psychology and behavioral economics to study decision-making, reinforcement learning, and exploration–exploitation trade-offs.
Although they share some similarities, the two tasks have distinct characteristics: the n-armed bandit task typically involves discrete choices among a small number of options with probabilistic rewards, whereas the patch-leaving task involves continuous decision-making in spatial environments with uncertain resource distributions.
Although both tasks involve exploring and exploiting options, the nature of exploration and exploitation differs. In the n-armed bandit task, exploration involves trying different options to learn their reward probabilities, whereas in the patch-leaving task, exploration involves searching for new resource patches with potentially higher reward rates. The patch-leaving task often involves spatial navigation and memory processes, as participants must remember the locations and qualities of different patches; this aspect is less prominent in the n-armed bandit task.
As indicators of the exploration–exploitation trade-off, we used measures such as stay-leave decisions, win-stay and lose-shift rates, choice consistency, the softmax inverse temperature in computational models, switching after reversals, late switching in the IGT after large punishments, and model-based dissociations between random and directed exploration. Some studies reported these indices directly; for others, we inferred the exploration–exploitation profile from the reported results and switching patterns.
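Two of these indices, win-stay and lose-shift rates, can be computed directly from trial-by-trial data. The following is a minimal Python sketch, assuming a simple encoding in which each trial records the chosen option and a binary outcome (1 = win, 0 = loss); the function name and encoding are hypothetical and not taken from any reviewed study.

```python
from typing import Sequence


def win_stay_lose_shift(choices: Sequence[int], rewards: Sequence[int]) -> tuple[float, float]:
    """Return (P(stay | previous win), P(shift | previous loss)).

    choices[t] is the option chosen on trial t; rewards[t] is 1 for a win, 0 for a loss.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    stays_after_win = wins = shifts_after_loss = losses = 0
    for t in range(1, len(choices)):
        stayed = choices[t] == choices[t - 1]
        if rewards[t - 1]:
            wins += 1
            stays_after_win += stayed
        else:
            losses += 1
            shifts_after_loss += not stayed
    win_stay = stays_after_win / wins if wins else float("nan")
    lose_shift = shifts_after_loss / losses if losses else float("nan")
    return win_stay, lose_shift


# Hypothetical six-trial sequence: win-stay = 2/3, lose-shift = 1/2.
ws, ls = win_stay_lose_shift([0, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(ws, ls)
```

A high lose-shift rate alongside a low win-stay rate is one simple behavioral signature of the excessive switching described for several disorders below.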
In summary, although all three tasks study decision-making under uncertainty, they differ in their specific setups, feedback mechanisms, temporal dynamics, and the cognitive processes involved. The 4-armed bandit task focuses on discrete choices with immediate feedback, the patch-leaving task involves spatial navigation and resource management, and the IGT examines decision-making in a risk-reward context with abstract options.
Tasks
Foraging task
In foraging tasks, individuals collect rewards by searching patches that vary in the number of rewards available (Fig. 2) [15, 88]. People decide based on the trade-off between exploiting the current patch and exploring to find a new patch with potentially higher available rewards. Consider, for instance, berry picking: an individual first harvests a particular blackberry plant. As time passes, most of the blackberries on that plant are picked, and finding the remaining ones becomes increasingly time-consuming. At this stage, the picker faces a choice: continue searching the same plant or move elsewhere, accepting the cost of travel time and uncertainty about the quality of the next plant. The exploration–exploitation dilemma in foraging has been studied extensively in various animal species [22, 46].
The foraging task involves two visual layers. The aerial view displays a game environment in which a movable character searches for hidden resource patches, which become partly visible when approached. The patch view provides a zoomed-in perspective after the character enters a patch, where resources (berries) are randomly distributed. The goal is to collect as many berries as possible within a time limit. This two-layered structure examines how individuals balance exploiting a patch against exploring the aerial environment to optimize resource collection [81].
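Stay-or-leave decisions in such tasks are often benchmarked against the marginal value theorem, a normative foraging rule not described in the studies above: leave a patch once its instantaneous intake rate falls to the long-run average rate, including travel time. The Python sketch below assumes a hypothetical patch whose intake rate decays exponentially; all parameter values are illustrative.

```python
import math


def patch_gain(t: float, r0: float, decay: float) -> float:
    """Cumulative reward after t seconds in a patch whose intake rate is r0 * exp(-decay * t)."""
    return r0 / decay * (1.0 - math.exp(-decay * t))


def optimal_leave_time(r0: float, decay: float, travel: float,
                       dt: float = 0.001, t_max: float = 60.0) -> tuple[float, float]:
    """Grid-search the residence time T maximizing the long-run rate G(T) / (T + travel)."""
    best_t, best_rate = 0.0, 0.0
    for i in range(1, int(t_max / dt) + 1):
        t = i * dt
        rate = patch_gain(t, r0, decay) / (t + travel)
        if rate > best_rate:
            best_t, best_rate = t, rate
    return best_t, best_rate


# Hypothetical patch: initial intake 10 berries/s, depletion constant 0.2/s, 5 s travel time.
t_star, long_run_rate = optimal_leave_time(r0=10.0, decay=0.2, travel=5.0)
marginal_rate = 10.0 * math.exp(-0.2 * t_star)  # intake rate at the moment of leaving
print(f"leave after ~{t_star:.2f} s; marginal {marginal_rate:.3f} vs average {long_run_rate:.3f}")
```

With these assumed parameters, the optimal residence time is roughly 5.7 s, and the marginal and average rates coincide there, as the theorem predicts. Leaving well before this point corresponds to premature patch leaving.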
The bandit task
Another task widely used to probe how people trade off exploration and exploitation is the n-armed bandit task, in which participants make repeated choices among multiple options (“bandits”) to acquire rewards. Exploitation involves estimating each bandit's expected value and selecting the best one. In contrast, exploration can be undirected, arising from stochastic selection among bandits (“random exploration”) [23, 27, 84]. Two common methods for modeling this trade-off are the epsilon-greedy and softmax rules. In epsilon-greedy selection, an option is chosen at random on a fraction epsilon of trials, and the highest-valued option is chosen otherwise. In softmax selection, each option is chosen with a probability that increases with its estimated value, obtained by passing the values through a Boltzmann (softmax) function.
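The two selection rules can be written in a few lines. A minimal Python sketch, assuming value estimates are already available; the values and parameter settings are hypothetical. Here beta is the softmax inverse temperature, so beta = 0 gives uniformly random choice and larger beta gives increasingly exploitative choice. The epsilon-greedy version samples among all options on exploratory trials, which is the textbook form; some variants sample only among the non-best options.

```python
import math
import random


def epsilon_greedy(values, epsilon, rng):
    """With probability epsilon pick an option uniformly at random; otherwise pick the highest-valued one."""
    if rng.random() < epsilon:
        return rng.randrange(len(values))
    return max(range(len(values)), key=lambda i: values[i])


def softmax_probs(values, beta):
    """Boltzmann choice probabilities; beta is the inverse temperature (higher = more exploitative)."""
    top = max(values)  # subtract the max for numerical stability
    exps = [math.exp(beta * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_choice(values, beta, rng):
    """Sample one option index according to the softmax probabilities."""
    return rng.choices(range(len(values)), weights=softmax_probs(values, beta), k=1)[0]


values = [1.0, 0.5, 0.2, 0.0]  # hypothetical estimated values of four bandits
rng = random.Random(0)
print(softmax_probs(values, beta=0.0))   # uniform: purely random exploration
print(softmax_probs(values, beta=10.0))  # nearly always the best option
print(epsilon_greedy(values, epsilon=0.1, rng=rng), softmax_choice(values, beta=3.0, rng=rng))
```

Unlike epsilon-greedy, softmax explores in a value-sensitive way: poor options are sampled less often than near-best ones.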
To illustrate, consider the "four-armed bandit" task, which consists of repeated choices among four slot machines (Fig. 3). Each machine pays out points that can be exchanged for money. Unlike traditional slot machines, the payoffs fluctuate noisily around four different means, and these means themselves change randomly and independently from trial to trial. Participants must therefore actively sample the machines to track the current value of each one and choose the machine they believe currently pays best. This design, coupled with model-based analysis, allows researchers to study both exploratory and exploitative decisions under uncertainty within a single task [27].
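The drifting payoffs described above can be simulated as independent Gaussian random walks. A minimal Python sketch; the decay, drift, and noise parameters are illustrative assumptions rather than the settings of [27] or any reviewed study. In the real task, participants observe only the payoff of the machine they choose; the simulation returns all payoffs for convenience.

```python
import random


def simulate_restless_bandit(n_trials, n_arms=4, drift_sd=2.8, payoff_sd=4.0,
                             decay=0.98, center=50.0, seed=0):
    """Simulate drifting mean payoffs and noisy observed payoffs for a restless n-armed bandit.

    Each arm's mean follows an independent, mean-reverting Gaussian random walk;
    the payoff on a trial is that mean plus Gaussian noise. Parameter values are assumptions.
    """
    rng = random.Random(seed)
    means = [rng.uniform(20.0, 80.0) for _ in range(n_arms)]
    mean_history, payoff_history = [], []
    for _ in range(n_trials):
        mean_history.append(list(means))
        payoff_history.append([rng.gauss(m, payoff_sd) for m in means])
        # Each mean drifts independently toward `center`, plus fresh Gaussian noise.
        means = [decay * m + (1 - decay) * center + rng.gauss(0.0, drift_sd) for m in means]
    return mean_history, payoff_history


means, payoffs = simulate_restless_bandit(n_trials=300)
best_arm = [max(range(4), key=lambda a: row[a]) for row in means]
n_changes = sum(a != b for a, b in zip(best_arm, best_arm[1:]))
print(f"the best arm changed {n_changes} times in 300 trials")
```

Because the identity of the best machine changes over time, continued sampling of apparently inferior machines can be adaptive rather than a mistake.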
Probabilistic reversal learning (PRL)
Probabilistic reversal learning (PRL) is a robust behavioral task used to evaluate the balance between cognitive flexibility, impulsivity, and compulsivity. It enables investigation of how positive and negative feedback affect learning in patients with various neurological, neurodevelopmental, and psychiatric conditions, including Huntington's disease, ASD, and schizophrenia [51, 61].
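A typical PRL design can be simulated with a simple learner. The Python sketch below assumes a two-option task in which one option is rewarded with probability p_good until a fixed reversal trial, and a Rescorla-Wagner learner with a softmax choice rule; the design, learner, and parameter values are illustrative assumptions, not a model from [51, 61].

```python
import math
import random


def simulate_prl(alpha, beta, n_trials=80, reversal_at=40, p_good=0.8, seed=0):
    """Simulate a Rescorla-Wagner learner with softmax choice on a two-option PRL task.

    Option 0 is rewarded with probability p_good before the reversal, option 1 after it.
    Returns the lists of choices and binary rewards.
    """
    rng = random.Random(seed)
    q = [0.5, 0.5]
    choices, rewards = [], []
    for t in range(n_trials):
        good = 0 if t < reversal_at else 1
        # Two-option softmax reduces to a logistic function of the value difference.
        p_choose_1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        c = 1 if rng.random() < p_choose_1 else 0
        p_reward = p_good if c == good else 1.0 - p_good
        r = 1 if rng.random() < p_reward else 0
        q[c] += alpha * (r - q[c])  # prediction-error update for the chosen option only
        choices.append(c)
        rewards.append(r)
    return choices, rewards


choices, rewards = simulate_prl(alpha=0.3, beta=5.0)
post = choices[60:]
print("P(choose new correct option, last 20 trials):", sum(post) / len(post))
```

Lowering alpha slows adaptation after the reversal, whereas lowering beta increases random switching; separating these two parameters is one way to distinguish slow learning from excessive exploration.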
The Iowa Gambling Task (IGT)
The task involves four decks of cards from which participants draw cards to maximize their winnings over a series of trials. Some decks are advantageous in the long run (yielding overall positive outcomes despite occasional losses), whereas others are disadvantageous (yielding larger immediate gains but long-term losses). The challenge for participants is to figure out which decks are more beneficial and to adjust their choices accordingly [12].
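The deck structure can be made concrete with a payoff schedule. The Python sketch below assumes the commonly described schedule of the original IGT per block of 10 cards (gains of 100 per card in decks A and B and 50 in decks C and D, with losses that make A and B net losers and C and D net winners); loss sequences differ across task versions, so the exact numbers are illustrative. It also computes the widely used net score, (C + D) - (A + B).

```python
# Assumed per-block schedule: (gain per card, losses incurred within a block of 10 cards).
DECKS = {
    "A": (100, [150, 200, 250, 300, 350]),  # disadvantageous, frequent losses
    "B": (100, [1250]),                     # disadvantageous, one large loss
    "C": (50, [50, 50, 50, 50, 50]),        # advantageous, frequent small losses
    "D": (50, [250]),                       # advantageous, one moderate loss
}


def net_per_block(deck: str, block_size: int = 10) -> int:
    """Net outcome of drawing one block of cards from a deck under the assumed schedule."""
    gain_per_card, losses = DECKS[deck]
    return gain_per_card * block_size - sum(losses)


def net_score(choices: str) -> int:
    """IGT net score: advantageous (C, D) minus disadvantageous (A, B) selections."""
    good = sum(choices.count(d) for d in "CD")
    bad = sum(choices.count(d) for d in "AB")
    return good - bad


for deck in DECKS:
    print(deck, net_per_block(deck))  # A -250, B -250, C 250, D 250
print(net_score("AABCDDC"))           # 1
```

Decks A and B pay more per card, so a choice strategy driven by immediate gains, or one that fails to register infrequent large losses, drifts toward the net-losing decks.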
Findings
We categorized the disorders observed in the reviewed studies into three proposed groups to aid in better conceptual organization: A) Addiction-Related and Compulsive Conditions, B) Mood and Psychiatric Conditions, C) Neurological (Neurodevelopmental and neurodegenerative) Conditions.
Addiction-related and compulsive conditions
This category encompasses disorders characterized by dysregulated reward circuits and impaired behavioral inhibition, in which decision-making processes are fundamentally altered by maladaptive reward sensitivity and compulsive behavioral patterns. Gambling disorder (GD) is an addiction characterized by difficulties in value-based decision-making and behavioral adaptability.
Wiehler and colleagues administered a four-armed bandit task during functional magnetic resonance imaging (fMRI) to quantify and compare exploration behavior between individuals with gambling disorder and healthy controls. Participants' choices were analyzed using reinforcement learning models of varying complexity. In this study, directed exploration, which purposefully seeks new information by choosing options that improve the information gained, was dissociated from random exploration, and the modeling revealed reduced directed exploration in gamblers. The fMRI analysis showed no significant differences in brain activity between gamblers and controls, except for decreased activation during directed exploration in the substantia nigra/ventral tegmental area, a dopaminergic region. This finding suggests a link between directed exploration and dopaminergic activity [87].
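Many bandit models formalize this dissociation by adding an uncertainty bonus to each option's value (directed exploration) and controlling choice stochasticity with a softmax inverse temperature (random exploration). The Python sketch below assumes a Kalman-filter learner with Gaussian observation noise; the parameter names (phi, beta, obs_var) and values are illustrative and are not claimed to match the model fitted in [87]. A full model of a restless bandit would also inflate each arm's variance between trials to reflect drift; that step is omitted here.

```python
import math


def kalman_update(mean, var, payoff, obs_var=16.0):
    """Update the chosen arm's posterior mean and variance after observing a payoff."""
    gain = var / (var + obs_var)  # how much to trust the new observation
    return mean + gain * (payoff - mean), (1.0 - gain) * var


def choice_probs(means, variances, beta, phi):
    """Softmax over value + phi * posterior SD.

    phi > 0 adds a directed-exploration bonus for uncertain arms;
    beta (inverse temperature) controls random exploration.
    """
    utilities = [m + phi * math.sqrt(v) for m, v in zip(means, variances)]
    top = max(utilities)
    exps = [math.exp(beta * (u - top)) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


means, variances = [50.0, 50.0], [4.0, 100.0]  # equal estimates, arm 1 far more uncertain
print(choice_probs(means, variances, beta=0.2, phi=0.0))  # no bonus: [0.5, 0.5]
print(choice_probs(means, variances, beta=0.2, phi=1.0))  # bonus: uncertain arm 1 favoured
print(kalman_update(50.0, 100.0, 60.0))  # mean moves toward 60, variance shrinks
```

In this framing, reduced directed exploration corresponds to a smaller phi, whereas changes in random exploration show up in beta.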
In contrast, Addicott and colleagues hypothesized that gambling may engage reward circuitry similar to that engaged by addictive substances. To test this, they employed four-armed bandit and patch-foraging tasks. Their findings revealed a positive correlation between gambling frequency/beliefs and exploratory decision-making. On both tasks, individuals who gambled weekly performed worse than those who gambled yearly. Additionally, stronger gambling perceptions were associated with leaving patches prematurely in the foraging task. Overall, the authors concluded that frequent gamblers show a suboptimal ability to maximize rewards, suggesting that they may search for and forage natural resources ineffectively.
Meanwhile, using a probabilistic reinforcement learning paradigm, Perandrés-Gómez and colleagues found that individuals with gambling disorder were prone to shifting choices after single instances of reward or punishment, yet persisted with the same choice despite multiple consecutive penalties. This implies a greater tendency to persevere with choices in the face of accumulated negative feedback [66], consistent with the reduced directed exploration reported in [87].
In another study, Abram et al. found that attractive rewards or incentives associated with taking high risks and potentially incurring losses reliably predicted an individual's subsequent decision to pursue additional rewards. In other words, the choices people make after being enticed by high-risk but potentially rewarding situations can be anticipated from the initial appealing incentives. Conversely, in the absence of such losses, there was no discernible effect on the processing of reward value or on reward pursuit. Notably, individuals with higher externalizing traits, who are at higher risk of developing addiction, showed no decline in their propensity to select subsequent risky options. These findings strongly suggest that an inability to learn from errors may be a significant contributor to addiction risk [3].
Furthermore, using a clock task in which an arm rotates once every 5 s, Morris et al. found that patients with alcohol use disorder (AUD) exhibited reduced exploration in both the gain and loss domains, which reduced the efficiency of their exploitative choices. Interestingly, obese participants with and without binge eating disorder (BED) were not significantly different. However, when compared with each other or with AUD participants, individuals with BED displayed heightened exploratory behavior, particularly in the loss domain [62].
Lastly, Goudriaan et al. investigated decision-making deficits in pathological gambling (PG) and alcohol dependence (AD) groups, comparing them with controls. Both the PG and AD groups exhibited decision-making deficits, attributed to deficient feedback processing after losses; interestingly, this pattern did not extend to a Tourette syndrome (TS) group. Subgroup analyses revealed larger decision-making deficits in pathological slot-machine gamblers than in pathological casino gamblers. Both the PG and AD groups selected fewer advantageous decks than controls, but only the PG group made decisions faster than controls. Deficits in decision-making strategies, persistence in seeking rewards, and shortcomings in discriminating between reward and loss scenarios each have unique yet cumulative effects on the emergence and persistence of gambling problems. Further analyses indicated that healthy controls shifted more after losses than after rewards, whereas this effect was almost absent in the PG group. In conclusion, individuals with PG show an exploitative bias, persisting in suboptimal choices due to poor adaptation to feedback [39].
Individuals with substance use disorders show a disrupted explore-exploit balance due to impaired learning and feedback processing. Smith et al. used computational modeling, multiple regression, and hierarchical Bayesian group analyses to investigate how individuals with substance use disorders (SUDs) differ from controls in resolving uncertainty in a gambling task. Although SUD participants generally won less often than controls, both groups exhibited similar reward sensitivity and insensitivity to information, which influence exploration strategies. However, SUD participants showed distinct characteristics, including low action precision, high learning rates for rewards, and low learning rates for non-rewarding outcomes. This suggests that poor explore/exploit performance in SUDs arises not only from inconsistent choices, especially after positive outcomes, but also from suboptimal learning rates for rewarding versus non-rewarding outcomes [74].
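Separate learning rates for rewarding and non-rewarding outcomes, combined with a precision (inverse temperature) parameter for choice, can be illustrated with a delta-rule learner. This Python sketch is a simplified reinforcement-learning analogue, not the model fitted by Smith et al. [74]; the function names and parameter values are hypothetical.

```python
import math


def update_value(value, reward, lr_reward, lr_no_reward):
    """Delta-rule update with separate learning rates for rewarded (1) and unrewarded (0) outcomes."""
    lr = lr_reward if reward else lr_no_reward
    return value + lr * (reward - value)


def p_choose(values, precision):
    """Softmax choice rule; `precision` plays the role of an inverse temperature."""
    top = max(values)
    exps = [math.exp(precision * (v - top)) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical learner that updates strongly after rewards and weakly after non-rewards.
v = 0.5
for outcome in [1, 0, 0, 0]:
    v = update_value(v, outcome, lr_reward=0.6, lr_no_reward=0.1)
print(round(v, 3))  # 0.583

# Low precision yields less consistent choices between the same two values.
print(p_choose([0.6, 0.4], precision=1.0), p_choose([0.6, 0.4], precision=10.0))
```

With a high reward learning rate and a low non-reward learning rate, a single win outweighs several subsequent non-wins, so the value estimate stays inflated; low precision adds inconsistency on top of this biased learning.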
In another study, Harle et al. used a two-armed bandit task to compare learning and decision-making in individuals with methamphetamine dependence (MDI) and healthy volunteers (HV). They found significant differences between the groups in the rate of reward learning and in how that information was subsequently used to make decisions. HV participants learned from feedback but gave disproportionate weight to recent outcomes; their decision strategies were best fit by a modified softmax model. In contrast, MDI individuals appeared to follow a simple, independent learning policy akin to a win-stay/lose-shift (WSLS) strategy, and their choices had a low probability of aligning with the most rewarding option [42].
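Model comparison of this kind rests on asking which strategy assigns higher likelihood to the observed choices. The Python sketch below computes negative log-likelihoods for a noisy win-stay/lose-shift rule and for a Rescorla-Wagner learner with a softmax rule on a two-option task. The noise level, learning rate, and inverse temperature are fixed illustrative values, whereas in practice they are fitted per participant, and the modified softmax model of [42] is not reproduced here.

```python
import math


def nll_wsls(choices, rewards, epsilon):
    """Negative log-likelihood of two-option choices under a noisy win-stay/lose-shift rule."""
    nll = math.log(2)  # first choice treated as a coin flip
    for t in range(1, len(choices)):
        prev, prev_r = choices[t - 1], rewards[t - 1]
        predicted = prev if prev_r else 1 - prev  # stay after a win, shift after a loss
        p = 1 - epsilon if choices[t] == predicted else epsilon
        nll -= math.log(p)
    return nll


def nll_rw_softmax(choices, rewards, alpha, beta):
    """Negative log-likelihood under a Rescorla-Wagner learner with a two-option softmax."""
    q = [0.5, 0.5]
    nll = 0.0
    for c, r in zip(choices, rewards):
        p1 = 1.0 / (1.0 + math.exp(-beta * (q[1] - q[0])))
        nll -= math.log(p1 if c == 1 else 1.0 - p1)
        q[c] += alpha * (r - q[c])
    return nll


# Hypothetical data generated by a strict WSLS agent.
choices = [0, 0, 1, 0, 0, 0, 1, 1]
rewards = [1, 0, 0, 1, 1, 0, 1, 0]
print(nll_wsls(choices, rewards, epsilon=0.05), nll_rw_softmax(choices, rewards, alpha=0.3, beta=3.0))
```

The lower negative log-likelihood identifies the better-fitting strategy; with real data, models with different numbers of free parameters would also be penalized (for example, with AIC or BIC).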
Additionally, Addicott et al. investigated explore/exploit behavior in cigarette smokers and its neural correlates, emphasizing the cognitive effort required for exploratory decision-making: greater self-reported smoking correlated with stronger brain activation during exploratory decisions, suggesting that exploratory decision-making demands more cognitive effort [4]. In another study, Addicott et al. found that smokers performing a bandit task explored less than non-smokers and switched less often between bandit arms [8].
Patzelt et al., using a reversal-learning task, showed that individuals addicted to cocaine switched more frequently in response to false feedback and also made more spontaneous shifts [65].
Robinson et al. showed that people with methamphetamine use disorder (PwMUD) experience a breakdown in resolving the exploration–exploitation dilemma that stems from deficits in reinforcement learning rather than from inflexibility. Their difficulty in capitalizing on stable reward contingencies reflects poor learning performance and weak action-reward associations. Instead of committing to beneficial choices, PwMUD explore excessively, frequently switching responses based on recent feedback rather than on accumulated experience. The absence of heightened perseveration suggests that these deficits arise from unstable learning processes rather than compulsive behavior. These findings indicate that treatment strategies should focus on enhancing reinforcement learning processes rather than solely targeting compulsivity [68].
Cordovil et al. investigated the impact of alcohol withdrawal on affective selective attention and executive functions (EFs) in alcohol-dependent patients. Compared with control subjects, alcohol-dependent patients showed a reduced ability to focus on relevant information and poor switching performance when making advantageous choices [25].
Collectively, these studies show that individuals with substance use disorders display varied explore-exploit behaviors: some show excessive exploration due to unstable learning (e.g., methamphetamine users), whereas others favor rigid exploitation (e.g., smokers). These deficits stem from impaired reinforcement learning rather than pure compulsivity, underscoring the need for treatment strategies that enhance adaptive decision-making.
Eating disorders, like other compulsive behaviors and addictions, disrupt decision-making by altering the balance between exploring new options and rigidly sticking to familiar choices. Kristjánsson and colleagues utilized a foraging task paradigm to reveal differences in switch costs between an eating disorder symptom group and a control group. The symptom group exhibited higher switch costs [49].
Relatedly, Verharen and colleagues computationally modeled IGT data from anorexia nervosa patients and controls using a model based on prospect utility theory. Compared to controls, anorexia nervosa patients showed intact reward sensitivity, value learning, and exploration–exploitation balance. However, they showed less of the loss aversion typically seen in healthy populations [82].
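For readers unfamiliar with prospect-utility models of the IGT, the sketch below shows one common utility function of the kind used in Prospect Valence Learning (PVL)-family models. It is an illustrative assumption, not necessarily the exact parameterization Verharen and colleagues fitted: `alpha` controls diminishing sensitivity to outcome magnitude, and `loss_aversion` (often written λ) scales losses relative to gains, so a value of 1 means that gains and losses of equal size weigh equally.

```python
def prospect_utility(outcome: float, alpha: float = 0.5, loss_aversion: float = 2.0) -> float:
    """Subjective utility of a net outcome under a PVL-style prospect-utility function.

    alpha: shape parameter (0 < alpha <= 1); smaller values mean more strongly
        diminishing sensitivity to larger gains and losses.
    loss_aversion: weight applied to losses (lambda); 1.0 means no loss aversion.
    """
    if outcome >= 0:
        return outcome ** alpha
    # Losses are evaluated on their magnitude, then scaled by loss_aversion and negated.
    return -loss_aversion * (-outcome) ** alpha


# A hypothetical IGT trial with a net loss of 100 points:
print(prospect_utility(-100, alpha=1.0, loss_aversion=2.0))  # -200.0
print(prospect_utility(-100, alpha=1.0, loss_aversion=1.0))  # -100.0
```

In this parameterization, "reduced loss aversion" corresponds to a smaller fitted `loss_aversion`, which makes the same loss feel less costly.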
Giannunzio and colleagues investigated decision-making in a large sample of adolescent and adult anorexia nervosa patients. They analyzed the data with two models: one assessing exploratory ability and reward-punishment sensitivity, the other assessing learning from feedback to optimize long-term gains once the task rules are inferred. On average, adult anorexia nervosa patients made poorer decisions, with deficits in exploration and reversal learning alongside a bias toward disadvantageous choices. Both age groups showed heightened loss avoidance compared to controls [36]. To sum up, individuals with eating disorders exhibit impaired explore-exploit trade-offs, with anorexia nervosa patients showing deficits in exploration, heightened loss avoidance, and a bias toward disadvantageous choices. These impairments highlight the need for interventions that enhance cognitive flexibility and adaptive learning.
Mood and psychiatric disorders
In this section we investigate the mood and psychiatric disorders that fundamentally disrupt cognitive adaptability. We reveal how emotional and cognitive variations modulate exploration–exploitation dynamics through altered uncertainty perception and adaptive learning mechanisms. Anxiety, like other mood disorders, disrupts decision-making by altering how individuals balance exploring new possibilities and committing to familiar choices, often leading to maladaptive behaviors.
Marzuki and colleagues utilized a PRL task to reveal increased reward pursuit and exploratory choices in young OCD patients regardless of medication status, alongside decreased sensitivity to punishments [58].
Similar to Marzuki’s findings, Aberg and colleagues employed a multi-armed bandit task, discovering increased exploratory choices in those with higher trait anxiety, regardless of gain or loss contexts. They also found anxiety reduces exploratory drives arising from uncertainty [1].
In line with these findings, Aylward and colleagues computationally investigated learning under uncertainty in both healthy and mood/anxiety disorder individuals. Their findings suggested high anxiety and mood symptoms shift the exploration–exploitation balance toward heightened exploration [10].
In their findings, Visser and colleagues linked trait anxiety to gender differences in complex decision-making, assessing healthy volunteers with the Iowa Gambling Task. Anxiety disrupted men's early exploratory performance but impaired women in later exploitative stages [28].
Finally, Lamba and colleagues hypothesized that anxious individuals may have more difficulty learning in uncertain social settings. In their bandit task, control participants quickly identified when to stop investing in exploitative partners under social uncertainty, whereas anxious participants over-invested because they discounted learning from negative social outcomes [50]. Overall, anxiety tends to shift the explore-exploit trade-off toward increased exploration, with individuals making more exploratory choices and having difficulty learning from negative outcomes. However, uncertainty can also drive more exploitative behavior, suggesting that anxiety's impact on decision-making depends on context and individual differences.
While anxiety pushes individuals toward constant exploration, depression often locks them into a cycle of indecision, reducing the drive to explore new opportunities and amplifying the tendency to avoid change. Harle and colleagues investigated how variation in affective measures relates to different facets of reward-based learning and decision-making in depression. Using a two-armed bandit task, they found increased choice stochasticity and exploratory tendencies in depressed individuals with high anhedonia, which may reflect a reduced valuation of immediate rewards. Depressed individuals with anhedonia were also less inclined toward reward maximization, whereas anxious participants relied more on independent win-stay/lose-shift learning strategies [41].
In a comparable finding, Cella and colleagues assessed and compared flexibility in decision-making between medicated unipolar major depressive disorder patients and controls using the Iowa Gambling Task. Their findings revealed deficits in depressive patients during both standard and contingency-shift phases, implying greater dependence on previous positive emotional outcomes and reduced adaptability in transition toward new advantageous behaviors [20].
In contrast, Dombrovski and colleagues examined reversal learning in elderly individuals with depression and a history of suicide attempts, and found no significant differences in reward learning rates or exploratory drive between these participants and controls [30]. In conclusion, individuals with depression, particularly those with high anhedonia, show increased exploration but reduced reward maximization and adaptability, highlighting difficulties in adjusting decision-making strategies and learning from past outcomes.
Additionally, Urošević and colleagues studied reward and threat sensitivity correlates of probabilistic decision-making in adolescents with and without bipolar disorder. Relative to controls, adolescents with bipolar disorder were less likely to persist with previously rewarded options. Group comparisons suggested that a higher win-stay rate was the more optimal learning strategy [80].
While anxiety and depression disrupt decision-making by skewing the explore-exploit balance, schizophrenia exacerbates this imbalance, causing unpredictable shifts and an inability to consistently leverage past rewards. Saperia and colleagues utilized the IGT to examine win-stay/lose-shift behavior and evaluate the effects of rewards and penalties guiding real-world decisions in schizophrenia and depression. They revealed unaltered lose-shift yet reduced win-stay rates in schizophrenia patients compared to controls, significantly linked to motivational and cognitive deficits. In contrast, no differences emerged in depressed patients [71]. In agreement with Saperia’s results, Culbreth and colleagues uncovered inefficient win-stay/lose-shift strategies and reduced feedback learning in schizophrenia patients using a PRL task [26]. Moreover, Cathomas and colleagues found performance reductions in schizophrenia patients on a bandit task attributable to over-switching from exploitation to random exploration relative to controls [19].
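Win-stay and lose-shift rates, the behavioral indices used by Saperia, Culbreth, and colleagues, can be computed directly from a trial-by-trial record. The sketch below assumes a hypothetical data layout (a list of chosen option indices and a parallel list of booleans marking rewarded trials); conventions differ between studies, for example in how probabilistic or misleading feedback is treated.

```python
from typing import Sequence, Tuple


def win_stay_lose_shift(choices: Sequence[int], rewards: Sequence[bool]) -> Tuple[float, float]:
    """Return (win-stay rate, lose-shift rate) for one participant.

    Each trial t (except the last) is classified by its outcome and by whether
    the choice on trial t + 1 repeats the choice on trial t.
    """
    if len(choices) != len(rewards):
        raise ValueError("choices and rewards must have the same length")
    wins = win_stays = losses = lose_shifts = 0
    for t in range(len(choices) - 1):
        stayed = choices[t + 1] == choices[t]
        if rewards[t]:
            wins += 1
            win_stays += int(stayed)
        else:
            losses += 1
            lose_shifts += int(not stayed)
    # NaN signals that no rewarded (or unrewarded) trials were available.
    win_stay = win_stays / wins if wins else float("nan")
    lose_shift = lose_shifts / losses if losses else float("nan")
    return win_stay, lose_shift


ws, ls = win_stay_lose_shift([0, 1, 1, 0, 0, 0], [True, True, False, False, True, False])
print(round(ws, 3), round(ls, 3))  # 0.667 0.5
```

A reduced win-stay rate with an unchanged lose-shift rate, as Saperia and colleagues report in schizophrenia, would show up here as a lower first value with a similar second value.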
Similarly, Martinelli and colleagues revealed schizophrenia patients were attracted to new options over learned high-value choices in a multi-armed bandit task, potentially disrupting balanced exploration–exploitation tradeoffs [57]. Additionally, Reddy and colleagues identified more frequent switching in response to both positive and negative feedback among schizophrenia outpatients compared to control subjects on a PRL task, reflecting indiscriminate shifting [67].
Waltz and colleagues also demonstrated early learning stage instability in schizophrenia patients, with excessive switching regardless of feedback on a similar PRL task [86]. Furthermore, Matsuzawa and colleagues revealed failures to shift from disadvantageous to beneficial decks in schizophrenia patients despite large penalties on the Iowa Gambling Task [60].
Moreover, Strauss and colleagues found that schizophrenia (SZ) patients were reluctant to explore potentially superior options: they showed a consistent reduction in their tendency to make exploratory changes toward responses that might yield higher expected value than maintaining the current response [77, 78].
Finally, Vinckier and colleagues modeled psychosis with low-dose ketamine and proposed that uncertainty in psychosis explains failures to exploit environmental regularities, consistent with evidence of poor reversal learning. They investigated the confidence-driven transition toward exploiting learned contingencies and showed that ketamine, as a proxy for psychosis, reduced this confidence-dependent weighting and thereby increased exploration [83].
In summary, schizophrenia patients exhibit significant deficits in decision-making, including excessive switching between options and a reduced ability to exploit learned choices, reflecting both cognitive and motivational impairments. These patterns of erratic exploration and poor feedback learning indicate a disruption of the explore-exploit trade-off in schizophrenia that distinguishes it from mood disorders.
Neurological (Neurodevelopmental and Neurodegenerative) Disorders
In this section we review neurologically rooted disorders that systematically impair social learning, adaptive decision-making, and behavioral regulation, highlighting the neurocognitive mechanisms underlying exploration–exploitation trade-offs. Autism spectrum disorder (ASD) is a neurodevelopmental condition marked by persistent difficulties in social communication and interaction, together with restricted, repetitive behaviors, interests, or activities.
Addicott and colleagues investigated how ADHD status and methylphenidate, a common ADHD medication, affect exploration–exploitation decision-making using a 6-armed bandit task. People with ADHD explored more, but their exploration was suboptimal: they repeatedly selected low-value options at the expense of maximizing reward, ultimately earning less overall [6].
In a similar vein, Hauser and colleagues reported increased yet relatively simplistic exploration among individuals diagnosed with ADHD compared with controls in a probabilistic reinforcement learning task [44].
In another study, Hauser and colleagues highlighted the heightened behavioral variability within the ADHD population, which may indicate an altered balance between exploration and exploitation. Using a reversal-learning task, they found that only a minority of individuals with ADHD learned effectively through exploration [45]. In sum, individuals with ADHD tend to engage in excessive and often ineffective exploration, selecting low-value options and showing high variability in their decisions. This disrupted balance between exploration and exploitation hampers reward optimization, underscoring the need for strategies to improve decision-making efficiency.
Individuals with ASD face distinct challenges in decision-making, often displaying rigid behaviors that affect their ability to balance exploration and exploitation. Carlisi and colleagues compared decision-making in individuals with ASD and obsessive–compulsive disorder (OCD). Both the ASD and OCD groups showed reduced choice consistency and weaker reinforcement learning than controls. Additionally, individuals with ASD showed specific abnormalities in choice perseveration, and adolescents with ASD switched more often. Interestingly, OCD patients showed heightened exploration similar to that observed in ASD, regardless of outcome sensitivity [18].
Building on this, Mussey and colleagues used the IGT and reported that patients with ASD switched decks more frequently than controls, leading to suboptimal decisions. These patients were also slower to identify the advantageous decks, consistent with reduced reinforcement learning in ASD [63].
In contrast to these patterns, Yechiam et al. utilized the IGT to reveal a unique adaptive learning style among certain individuals with ASD. This style was characterized by reduced sensitivity to immediate motivational structures and an intensive exploratory search for available alternatives. It's noteworthy that this distinct learning approach, advantageous in specific contexts, may be perceived as abnormal in social settings [89].
Finally, Solomon et al., using a probabilistic reinforcement learning task, examined the performance of adults with ASD. Their results corroborated earlier findings: individuals with ASD had difficulty using positive feedback to exploit rewarding choices, resulting in slower learning. This pattern reinforces the overarching theme observed in individuals with ASD across decision-making tasks [75].
It can be concluded that in individuals with ASD the balance between exploration and exploitation is affected, frequently favoring increased exploration or diminished exploitation. This is reflected in heightened switching among choices, diminished responsiveness to rewards, and difficulty using feedback to improve decisions. These patterns suggest that people with ASD may prioritize acquiring information over optimizing known rewards, possibly at the expense of effective learning and adaptive outcomes.
Parkinson’s disease (PD) is a neurodegenerative disorder characterized by dopaminergic deficiency, so studies of the exploration–exploitation balance in PD can reveal the role of the dopaminergic system in exploration. Constantino et al. investigated the role of tonic dopamine (DA) replacement medication in PD patients, testing them both ON and OFF medication on a patch-foraging task, to understand how dopaminergic opportunity-cost signals influence sequential decisions. Their results indicated that DA depletion reduced the perceived opportunity cost of time. Consistent with this, PD patients OFF medication exhibited a lower reward threshold than the control group, and tonic DA replacement medication alleviated this deficit.
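The opportunity-cost logic of patch foraging is often formalized with the marginal value theorem: leave a depleting patch once the reward expected from the next harvest falls below the average reward rate of the environment, which is the opportunity cost of the time spent harvesting. The sketch below is a simplified illustration under assumptions of our own (geometric depletion, one time unit per harvest, no explicit travel time); it is not the model Constantino et al. fitted. A lower `avg_rate`, that is, a reduced opportunity cost of time, makes the agent stay longer and accept smaller rewards before leaving.

```python
def patch_leave_index(initial_reward: float, decay: float, avg_rate: float,
                      max_harvests: int = 1000) -> int:
    """Harvests taken before leaving a depleting patch (MVT-style rule).

    Assumptions: each harvest takes one time unit, the patch yields
    initial_reward, then initial_reward * decay, and so on, and the agent
    keeps harvesting while the next expected reward is at least avg_rate,
    the environment's average reward per time unit (the opportunity cost).
    """
    reward = initial_reward
    harvests = 0
    while harvests < max_harvests and reward >= avg_rate:
        harvests += 1
        reward *= decay  # geometric depletion of the patch
    return harvests


# A lower opportunity cost of time means staying longer in the patch.
print(patch_leave_index(10.0, 0.5, avg_rate=2.0))  # 3
print(patch_leave_index(10.0, 0.5, avg_rate=1.0))  # 4
```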
Expanding upon this research, Rutledge et al. [70] explored the impact of dopaminergic drugs on learning rates and perseveration in PD patients, employing a foraging task. Their investigation found that dopaminergic medication increased learning rates in PD patients. Interestingly, learning rates in PD patients off dopamine medication resembled those of the aged control group. Furthermore, a comparative analysis of learning rates in PD patients with individuals experiencing milder disease progression or the elderly demonstrated that PD patients exhibited diminished learning rates. Notably, the medication appeared to selectively enhance learning rates for positive outcomes while having no such effect on negative outcomes.
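Rutledge et al.'s finding that medication selectively boosted learning from positive outcomes can be expressed with a delta-rule update that uses separate learning rates for positive and negative prediction errors. The function below is a minimal sketch of that idea; the names `alpha_pos` and `alpha_neg` are illustrative, and the published analysis also modeled perseveration, which is omitted here.

```python
def update_value(q: float, reward: float, alpha_pos: float, alpha_neg: float) -> float:
    """Delta-rule value update with separate learning rates for the sign of the prediction error."""
    prediction_error = reward - q
    # Better-than-expected outcomes use alpha_pos; worse-than-expected (or zero) use alpha_neg.
    alpha = alpha_pos if prediction_error > 0 else alpha_neg
    return q + alpha * prediction_error


# Hypothetical medication effect: a larger alpha_pos speeds learning from gains only.
print(update_value(0.5, 1.0, alpha_pos=0.5, alpha_neg=0.25))  # 0.75
print(update_value(0.5, 0.0, alpha_pos=0.5, alpha_neg=0.25))  # 0.375
```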
Bódi et al. [17] studied the effect of dopamine agonists on reward and punishment processing in young, never-medicated PD patients. Never-medicated PD patients showed decreased novelty seeking and reward processing. Dopamine agonists (pramipexole and ropinirole) increased novelty seeking and reward processing, strengthened the interaction between the two, and interfered with punishment learning without affecting harm avoidance. Most importantly, the study highlighted that PD patients, owing to dopaminergic dysfunction, have difficulty balancing exploration and exploitation. In the unmedicated state they showed reduced novelty seeking (exploration) and inefficient reward processing, limiting their ability to exploit current rewards. Dopamine agonists corrected this imbalance by restoring novelty seeking and improving reward-based learning, thus facilitating a more effective exploration–exploitation strategy.
In a distinct context, Djamshidian et al. [29] assessed novelty-seeking behavior in PD patients using a three-armed bandit task. They categorized patients into non-impulsive and impulsive-compulsive behavior (ICB) groups and assessed their behavior in both ON and OFF dopaminergic medication states, comparing it to a control group. The findings revealed that PD patients with ICB displayed a significantly stronger inclination toward choosing novel options, irrespective of medication status, in contrast to non-impulsive PD patients or controls.
In the same work, Bódi et al. [17] employed a probabilistic classification task to assess reward and punishment processing, confirming that in young, never-medicated PD patients novelty seeking and reward processing were both significantly reduced and interrelated.
In conclusion, Parkinson's disease disrupts the balance between exploration and exploitation through dopaminergic dysfunction, reducing novelty seeking and impairing reward processing, while patients with impulsive-compulsive behaviors show a heightened preference for novel options. Dopaminergic treatment can alleviate the reward-processing deficits, increasing learning rates and improving the ability to balance exploration and exploitation.
Table 2 summarizes the impact of various disorders on decision-making, particularly whether people tend to favor exploitation (staying with familiar rewards) or exploration (pursuing new alternatives). Where data were available, Cohen’s d effect sizes indicate the strength and direction of these patterns. Although the measures and tasks vary across studies, we standardized the results by computing effect sizes whenever feasible. Table 2 reports the Cohen’s d values together with the direction of the observed differences between individuals with each disorder and control groups. This approach allows relative comparisons among disorders and helps identify broader patterns in decision-making. A consistent trend emerges for specific conditions: ADHD, autism spectrum disorder (ASD), and schizophrenia show a stronger inclination toward over-exploration, indicating that people with these conditions tend to pursue new alternatives rather than stay with familiar options. This behavior may be associated with problems in cognitive flexibility, increased sensitivity to uncertainty, or variations in dopamine regulation that affect reinforcement learning.
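For two independent groups, Cohen’s d is the difference in group means divided by the pooled standard deviation. The sketch below uses that standard pooled-SD form; the sign convention (positive when the patient group scores higher, e.g. on a switch-rate measure of exploration) is an assumption for illustration, and the example numbers are invented, not data from Table 2.

```python
import math
from statistics import mean, stdev


def cohens_d(patients: list, controls: list) -> float:
    """Cohen's d for two independent samples using the pooled standard deviation.

    Positive values mean the patient group scored higher than controls.
    """
    n1, n2 = len(patients), len(controls)
    s1, s2 = stdev(patients), stdev(controls)  # sample SDs (n - 1 denominator)
    pooled_sd = math.sqrt(((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / (n1 + n2 - 2))
    return (mean(patients) - mean(controls)) / pooled_sd


# Invented switch counts per block, for illustration only.
print(cohens_d([4, 5, 6], [2, 3, 4]))  # 2.0
```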
In contrast, Parkinson’s disease, pathological gambling, and specific addictions reveal a tendency for over-exploitation, as people often continue to pursue choices that were previously rewarding instead of seeking out new options. This might indicate a diminished capacity to adapt to evolving surroundings, possibly caused by deficiencies in reward learning, habit development, or compromised cognitive control functions. For anxiety, eating disorders, and certain types of addiction or substance use disorders, the results are less consistent and dependent on context. Certain research indicates a propensity for greater exploration, possibly fueled by an increased sensitivity to uncertainty or avoidance behaviors, whereas other studies reveal a greater dependence on exploitation, which may stem from inflexible thought processes or compulsive behaviors. These varied outcomes underscore the intricacy of these conditions and indicate that elements like task structure, severity of symptoms, and personal differences could greatly impact decision-making.
Discussion
This systematic review aimed to clarify how psychiatric conditions affect the interplay between exploration and exploitation in decision-making and reinforcement learning. The results highlight that different mental health conditions alter this balance in distinct ways, often reflecting underlying neurocognitive and neurochemical disruptions. A useful framework organizes these conditions into three broad categories: addiction-related conditions, mood and psychiatric conditions, and neurological (neurodevelopmental and neurodegenerative) conditions. These patterns appear to be driven by heightened sensitivity to uncertainty, impaired reward processing, and dysfunctional cognitive control, particularly involving dopamine and frontostriatal circuitry. Together, the findings underscore the role of these mechanisms in modulating the exploration–exploitation trade-off and offer insights for targeted interventions.
1. Addiction-Related Conditions: Encompassing disorders defined by compulsive tendencies and reduced control over reward-oriented actions, this group shows a disrupted equilibrium between exploratory and exploitative tendencies, with poor coordination of these decision-making components.
2. Mood and Psychiatric Conditions: This category includes disorders marked by irregularities in emotional regulation, cognitive functioning, and thought processes, which shape exploration and exploitation through influences such as gender differences, acute stress responses, and motivational factors.
3. Neurological (Neurodevelopmental and Neurodegenerative) Conditions: Neurodevelopmental disorders, which stem from atypical neurological maturation, are characterized by excessive yet frequently unproductive exploratory behavior that impedes learning and the capacity to maximize beneficial outcomes. Neurodegenerative disorders such as Parkinson's disease, on the other hand, reveal the role of neural structures, especially dopaminergic pathways, in exploratory behavior.
1. Addiction-Related Conditions
Disorders exhibiting addiction-like traits, coupled with weakened control over reward-focused behaviors, often present a skewed balance between exploratory and exploitative inclinations. Examples include gambling disorder, substance dependency, and behavioral compulsions. These conditions are typified by an intense focus on exploiting rewards, where persistent pursuit of gratification overshadows flexible exploration. This overexploitation is linked to a dysregulated dopaminergic system that reinforces habitual behaviors while diminishing uncertainty-driven exploration, as evidenced by impaired dopamine-related neural circuits affecting decision-making in risky contexts [4, 73, 86]. The ineffective reconciliation of exploration and exploitation in these cases is closely tied to compromised dopaminergic pathways and an amplified response to reward signals. For instance, gambling disorder exemplifies excessive exploitative tendencies fueled by flawed reward-learning mechanisms. Notably, interventions targeting neural structures such as the striatum and amygdala have demonstrated promise in recalibrating decision-making equilibrium [34, 35].
2. Mood and Psychiatric Conditions
Conditions characterized by disruptions in emotional stability, cognitive processes, and thought organization—such as depression, anxiety, obsessive–compulsive disorder (OCD), and schizophrenia—demonstrate a sophisticated relationship among emotional regulation, motivational influences, and decision-making dynamics. These disorders alter the balance between exploration and exploitation in subtle ways, shaped by factors including response to chronic or acute stress, gender distinctions, and sensitivity to rewards. Psychiatric disorders that increase exploration, such as anxiety disorders, enhance exploratory behavior due to an aversion to uncertainty, linked to hyperactivity in the anterior insula (aINS) and dorsal anterior cingulate cortex (dACC), which modulate exploratory decisions [2]. Depression is associated with increased exploration, as diminished reward sensitivity leads individuals to undervalue potential gains, prompting a shift toward seeking new information [16]. Similarly, ADHD promotes heightened, yet often random, exploration due to reduced dopamine transporter availability and impaired mesocorticolimbic function, resulting in impulsive, inconsistent decision-making [44]. Schizophrenia is characterized by deficits in uncertainty-driven directed exploration, linked to dopamine dysfunction and altered novelty processing in prefrontal and striatal regions [19, 57]. OCD and ASD also share increased exploratory behavior, possibly due to deficits in cognitive flexibility and reinforcement learning, contributing to excessive switching and poor adaptation to changing contingencies [18].
Conversely, some emotional and cognitive conditions increase exploitation by biasing individuals toward known choices. In schizophrenia, individuals with negative symptoms exhibit decreased exploration, potentially due to motivational deficits linked to impaired ventromedial prefrontal cortex (vmPFC) and striatal function [71]. In OCD, malfunctioning fronto-striatal networks contribute to repetitive actions and a pronounced dependence on exploitative approaches [59]. These findings highlight the diverse neural bases influencing the exploration–exploitation trade-off in this category.
3. Neurological (Neurodevelopmental and Neurodegenerative) Conditions
Neurodevelopmental disorders such as ADHD and ASD frequently involve dopaminergic dysfunction, including reduced dopamine transporter availability and impaired mesocorticolimbic function. These conditions are distinguished by abundant but often inefficient exploratory activity, which obstructs effective learning and results in impulsive, inconsistent decision-making [44]. In ADHD, for example, individuals explore overactively, which manifests as impulsiveness and difficulty staying with tasks that require sustained exploitation. ASD, by contrast, may be characterized by repetitive patterns or narrowly defined interests that restrict flexible exploration, although increased exploratory behavior has also been reported in ASD, possibly due to deficits in cognitive flexibility and reinforcement learning that contribute to excessive switching and poor adaptation to changing contingencies [18]. Age plays a modulating role in these patterns: younger individuals with neurodevelopmental conditions often display pronounced novelty seeking linked to ongoing brain maturation, whereas older adults experiencing cognitive decline may lean heavily on exploitative strategies [55]. This contrast points to age as an important potential confounder. Although most studies used age-matched control groups, conclusions should always be drawn with age in mind [see theoretical integration section below].
Among neurodegenerative disorders, PD has been investigated most. Consistent with their low dopaminergic activity, PD patients show more exploitative behavior and reduced novelty seeking, and dopaminergic treatment compensates for this deficit. This underscores the central role of dopamine in exploratory behavior.
Systematic task comparisons and goals: conceptual framework
Central to the exploration–exploitation dilemma is the inherent conflict between pursuing new possibilities and capitalizing on established rewards—a delicate equilibrium frequently disturbed in psychiatric conditions. Experimental approaches, such as bandit tasks, reversal learning exercises, and foraging scenarios, provide effective means to investigate these dynamics, particularly when paired with computational frameworks like Markov decision processes (MDPs). Future studies could leverage advanced computational tools such as Bayesian hierarchical modeling and deep reinforcement learning to capture individual variability in exploration–exploitation patterns, offering greater precision over traditional reinforcement learning models [9, 91].
These models illuminate how rising uncertainty tends to steer behavior toward exploration, whereas shrinking timeframes favor exploitation, a pattern well-documented in reinforcement learning studies [9]. Such theoretical developments anchor experimental observations within a unified decision-making perspective, underscoring the pivotal influence of altered dopamine signaling across diverse psychiatric conditions [52, 70].
The experimental tasks employed—bandit tasks, reversal learning, and foraging exercises—each probe distinct dimensions of this equilibrium. Bandit tasks are tailored to evaluate decision-making amid uncertainty, requiring individuals to balance the benefits of exploiting familiar rewards against the potential of exploring uncharted options, rendering them highly responsive to fluctuations in reward predictability. Reversal learning tasks, by contrast, emphasize cognitive adaptability, as participants must adjust their choices in response to changing reward patterns, thereby assessing their capacity to draw on prior knowledge while remaining flexible in new situations. Foraging scenarios, with their evolving environmental demands, similarly gauge the ability to modulate exploration and exploitation in real time as circumstances shift. These paradigms place differing demands on the exploration–exploitation interplay, and their varying susceptibility to disruptions in decision-making—particularly in psychiatric contexts—highlights the importance of comparative task analysis in pinpointing specific impairments [9, 52, 70].
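To make the bandit logic concrete, the sketch below simulates a two-armed Bernoulli bandit played by a delta-rule learner with a softmax choice rule. The inverse temperature `beta` is one common way to parameterize random exploration: at `beta = 0` choices are uniformly random, while large values make the learner exploit the option it currently values most. The reward probabilities, learning rate, and trial counts are arbitrary illustrative choices, not parameters taken from the reviewed studies.

```python
import math
import random


def softmax_probs(values, beta):
    """Softmax choice probabilities; beta is the inverse temperature."""
    top = max(values)
    exps = [math.exp(beta * (v - top)) for v in values]  # shift by max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]


def simulate_bandit(reward_probs, alpha, beta, n_trials, seed=0):
    """Simulate a delta-rule learner on a Bernoulli multi-armed bandit."""
    rng = random.Random(seed)
    q = [0.0] * len(reward_probs)  # learned value of each arm
    choices, rewards = [], []
    for _ in range(n_trials):
        arm = rng.choices(range(len(q)), weights=softmax_probs(q, beta))[0]
        reward = 1.0 if rng.random() < reward_probs[arm] else 0.0
        q[arm] += alpha * (reward - q[arm])  # update only the chosen arm
        choices.append(arm)
        rewards.append(reward)
    return choices, rewards


for beta in (0.0, 10.0):
    choices, _ = simulate_bandit([0.8, 0.2], alpha=0.2, beta=beta, n_trials=1000, seed=1)
    print(f"beta={beta}: better arm chosen on {choices.count(0) / len(choices):.0%} of trials")
```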
A methodical comparison of these tasks enables researchers to identify which elements of the exploration–exploitation continuum are most prone to disturbance across different disorders. For instance, individuals with Parkinson’s disease may exhibit shortcomings in reinforcement learning, yet they can adapt to fluctuating reward environments when dopaminergic function is appropriately adjusted [21, 70].
In all of these tasks, the underlying mechanisms and modules of decision-making are intertwined: reward sensitivity, information seeking, memory of past rewards, risk management, loss aversion, and other processes work together to produce a decision, and dissociating them is not always possible. Studies also differ in how they analyze participants' behavior. Some report only behavioral indices such as win-stay or lose-shift rates; others report model-based parameters such as the learning rate and inverse temperature; and some reported measures conflate several processes.
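Model-based parameters such as the learning rate and inverse temperature are typically estimated by maximizing the likelihood of a participant's observed choices. The sketch below illustrates the idea with a coarse grid search over a simple delta-rule/softmax model; real studies often use gradient-based optimizers or hierarchical Bayesian estimation, and the grid values and example data here are arbitrary.

```python
import math
from itertools import product


def neg_log_likelihood(choices, rewards, alpha, beta, n_arms):
    """Negative log-likelihood of observed choices under a delta-rule + softmax model."""
    q = [0.0] * n_arms
    nll = 0.0
    for arm, reward in zip(choices, rewards):
        top = max(q)
        exps = [math.exp(beta * (v - top)) for v in q]
        nll -= math.log(exps[arm] / sum(exps))  # probability the model assigns to the observed choice
        q[arm] += alpha * (reward - q[arm])  # learn from the observed outcome
    return nll


def fit_grid(choices, rewards, n_arms, alphas, betas):
    """Return the (alpha, beta) pair on the grid with the lowest negative log-likelihood."""
    return min(
        product(alphas, betas),
        key=lambda ab: neg_log_likelihood(choices, rewards, ab[0], ab[1], n_arms),
    )


# Hypothetical participant who always picks option 0 and is always rewarded.
best = fit_grid([0] * 20, [1.0] * 20, n_arms=2, alphas=[0.1, 0.5, 0.9], betas=[0.0, 1.0, 5.0, 10.0])
print(best)  # (0.9, 10.0)
```

Because learning rate and inverse temperature can trade off against each other, such fits illustrate why reported parameters may mix several processes.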
By comparing task paradigms, we can assess whether deficits are more pronounced under uncertainty (e.g., bandit tasks) than in situations requiring cognitive flexibility (e.g., reversal learning), yielding information about the underlying mechanisms and enabling more targeted treatments. Standardization of measures and tasks is important for improving comparability across studies and the translational utility of the research.
Theoretical integration: neural mechanisms
In decision-making and reinforcement learning tasks, participants must choose between relying on their current knowledge of some options and reducing their uncertainty about others. Seeking information to reduce uncertainty is one of the main drivers of exploration, so the relative confidence a person has in the options' values plays a crucial role in the exploration–exploitation balance. The distinction between directed exploration (deliberately sampling the options whose values are most uncertain) and random exploration (stochastic deviation from the currently best-valued option) further highlights that the type of information sought matters.
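One common way to formalize directed versus random exploration, assumed here for illustration rather than taken from the reviewed studies, is a softmax over values augmented by an uncertainty bonus: a weight on each option's uncertainty captures directed exploration, and the inverse temperature captures how noisy (randomly exploratory) choices are. Parameter names are hypothetical.

```python
import math
from typing import Sequence


def choice_probabilities(values: Sequence[float], uncertainties: Sequence[float],
                         info_bonus: float, inverse_temperature: float) -> list[float]:
    """Softmax over uncertainty-augmented values.

    info_bonus (phi): weight on each option's uncertainty (directed exploration).
    inverse_temperature (beta): lower values make choices noisier (random exploration).
    """
    utilities = [inverse_temperature * (q + info_bonus * s)
                 for q, s in zip(values, uncertainties)]
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    values, sds = [0.6, 0.5], [0.05, 0.3]  # option 1 looks worse but is more uncertain
    print(choice_probabilities(values, sds, info_bonus=0.0, inverse_temperature=10.0))
    print(choice_probabilities(values, sds, info_bonus=1.0, inverse_temperature=10.0))
    print(choice_probabilities(values, sds, info_bonus=0.0, inverse_temperature=1.0))
```

With no bonus and low noise, the better-valued option dominates; adding the bonus shifts choice toward the uncertain option (directed exploration), while lowering the inverse temperature pushes both probabilities toward 0.5 regardless of uncertainty (random exploration).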
On the other hand, diffuse neuromodulatory systems such as dopamine, norepinephrine, and acetylcholine represent uncertainty in the brain [32]. Because dopamine and norepinephrine share a synthesis pathway (norepinephrine is synthesized from dopamine) and play similar roles in representing uncertainty, an integrated picture of the exploration–exploitation patterns observed across disorders can emerge. Norepinephrine, which represents unexpected uncertainty [90], is important for alertness and is implicated in anxiety-related disorders; the effects of stress and anxiety on exploration can therefore be linked to uncertainty via norepinephrine. Dopamine also represents uncertainty and, like norepinephrine, contributes to attention and learning. In ADHD, ASD, Parkinson's disease, substance use disorders, eating disorders, and gambling disorder, altered dopaminergic activity plays an important role in shifting the exploration–exploitation trade-off.
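A simple computational handle on unexpected uncertainty is surprise: a large prediction error after a period of stable outcomes. A Pearce-Hall-style associability rule, sketched below, raises the effective learning rate after such surprises and lowers it as outcomes become predictable again. It is included only as a generic illustration of an uncertainty-sensitive learning rule; it is not a model of noradrenergic signaling and not the model used in [90], and the parameter values are arbitrary.

```python
from typing import Sequence


def pearce_hall(rewards: Sequence[float], kappa: float = 0.5, eta: float = 0.3,
                v0: float = 0.0, alpha0: float = 1.0) -> tuple[list[float], list[float]]:
    """Simulate a Pearce-Hall-style learner on a sequence of outcomes.

    kappa: fixed scaling of the value update.
    eta:   how quickly associability (alpha) tracks recent surprise.
    Returns the value estimate and associability after each trial.
    """
    v, alpha = v0, alpha0
    values, alphas = [], []
    for r in rewards:
        delta = r - v                                  # prediction error
        v += kappa * alpha * delta                     # update scaled by current alpha
        alpha = eta * abs(delta) + (1 - eta) * alpha   # surprise raises alpha
        values.append(v)
        alphas.append(alpha)
    return values, alphas


if __name__ == "__main__":
    # 50 rewarded trials, then an unsignalled reversal to 50 unrewarded trials.
    values, alphas = pearce_hall([1.0] * 50 + [0.0] * 50)
    print(f"alpha before reversal: {alphas[49]:.3f}, just after: {alphas[50]:.3f}")
```

During the stable block the associability decays toward zero; the unsignalled reversal produces a large prediction error and associability jumps, so the learner updates faster exactly when the environment has changed unexpectedly.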
Computational models of the trade-off show that rising uncertainty tends to steer behavior toward exploration, whereas shrinking time horizons favor exploitation, a pattern well documented in reinforcement learning studies [9]. Such theoretical accounts anchor experimental observations within a unified decision-making perspective and underscore the pivotal influence of altered norepinephrine and dopamine signaling across diverse psychiatric conditions [52, 70].
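The horizon effect can be made concrete with a textbook toy problem, assumed here for illustration and not taken from [9] or the reviewed studies: one option pays off with a known probability, the other with an unknown probability described by a Beta prior, and the Bayes-optimal choice is computed by dynamic programming over the number of trials remaining. Function names are hypothetical.

```python
from functools import lru_cache


def make_policy(known_rate: float):
    """Bayes-optimal choice rule for a finite-horizon one-armed Bernoulli bandit.

    Option K pays 1 with known probability `known_rate`. Option U pays 1 with an
    unknown probability, believed to be Beta(a, b) distributed. Returns a function
    should_explore(a, b, trials_left) saying whether sampling U is optimal now.
    """

    @lru_cache(maxsize=None)
    def value(a: int, b: int, trials_left: int) -> float:
        """Expected total reward under optimal play from this belief state."""
        if trials_left == 0:
            return 0.0
        return max(exploit_value(a, b, trials_left), explore_value(a, b, trials_left))

    def exploit_value(a: int, b: int, trials_left: int) -> float:
        # Choosing K teaches nothing about U, so the belief (a, b) is unchanged.
        return known_rate + value(a, b, trials_left - 1)

    def explore_value(a: int, b: int, trials_left: int) -> float:
        p = a / (a + b)  # posterior mean payoff of U
        return (p * (1.0 + value(a + 1, b, trials_left - 1))
                + (1.0 - p) * value(a, b + 1, trials_left - 1))

    def should_explore(a: int, b: int, trials_left: int) -> bool:
        return explore_value(a, b, trials_left) > exploit_value(a, b, trials_left)

    return should_explore


if __name__ == "__main__":
    should_explore = make_policy(known_rate=0.55)
    for horizon in (1, 2, 10):
        # Uniform Beta(1, 1) prior: U looks slightly worse on average (0.5 < 0.55).
        print(horizon, should_explore(1, 1, horizon))
```

Sampling the uncertain option is optimal with 2 or 10 trials remaining but not with 1: its expected immediate payoff is lower, so it is worth choosing only when enough trials remain to use what is learned.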
Dopamine and balancing exploration and exploitation
Psychiatric disorders that increase exploration often involve heightened uncertainty sensitivity, impaired reward processing, or dysfunctional cognitive control, all of which influence dopamine-mediated decision-making. Anxiety disorders, for instance, enhance exploration due to an aversion to uncertainty, linked to hyperactivity in the anterior insula (aINS) and dorsal anterior cingulate cortex (dACC), which modulate exploratory decisions [1]. Similarly, depression is associated with increased exploration, as diminished reward sensitivity leads individuals to undervalue potential gains, prompting a shift toward seeking new information [16]. Additionally, schizophrenia is characterized by deficits in uncertainty-driven directed exploration, linked to dopamine dysfunction and altered novelty processing in prefrontal and striatal regions [19, 57].
Conversely, other psychiatric conditions increase exploitation by biasing individuals toward known choices. Addiction, particularly substance use disorders, is linked to a dysregulated dopaminergic system that reinforces habitual behaviors while diminishing the ability to shift strategies [4, 73]. Gambling disorder similarly reduces uncertainty-driven exploration due to impaired dopamine-related neural circuits, affecting decision-making in risky contexts [86]. In Parkinson’s disease, dopamine depletion leads to an excessive reliance on exploitation, as patients struggle with flexible reinforcement learning and exhibit perseverative choices despite changing rewards [70]. Lastly, individuals with schizophrenia who display negative symptoms exhibit decreased exploration, potentially due to motivational deficits linked to impaired ventromedial prefrontal cortex (vmPFC) and striatal function [71]. These findings underscore the critical role of dopamine and frontostriatal circuits in modulating the exploration–exploitation trade-off across psychiatric disorders.
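Perseveration of the kind described for Parkinson's disease is often quantified by adding a choice "stickiness" term to a softmax Q-learning model, which also shows where the learning rate and inverse temperature mentioned earlier enter. The sketch below is a generic version under our own assumptions (two options, delta-rule updates of the chosen option only, hypothetical parameter names); it is not the model fitted in [70]. Fitting would minimize the negative of this log-likelihood, for example with an optimizer such as `scipy.optimize.minimize`, which is not shown.

```python
import math
from typing import Optional, Sequence


def sticky_softmax(q_values: Sequence[float], previous_choice: Optional[int],
                   inverse_temperature: float, stickiness: float) -> list[float]:
    """Softmax choice probabilities with a bonus for repeating the last choice."""
    utilities = [inverse_temperature * q + (stickiness if i == previous_choice else 0.0)
                 for i, q in enumerate(q_values)]
    m = max(utilities)  # subtract the maximum for numerical stability
    weights = [math.exp(u - m) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def log_likelihood(choices: Sequence[int], rewards: Sequence[float], learning_rate: float,
                   inverse_temperature: float, stickiness: float,
                   n_options: int = 2) -> float:
    """Log-likelihood of a choice sequence under Q-learning with perseveration."""
    q = [0.0] * n_options
    previous: Optional[int] = None
    total = 0.0
    for c, r in zip(choices, rewards):
        probs = sticky_softmax(q, previous, inverse_temperature, stickiness)
        total += math.log(probs[c])
        q[c] += learning_rate * (r - q[c])  # delta-rule update of the chosen option
        previous = c
    return total
```

A participant who keeps choosing the same option despite receiving no reward is better described by a positive stickiness parameter than by value learning alone, which is how perseveration can be separated from learning-rate effects.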
Effect of age on exploration–exploitation trade-off
Although most studies are age-matched, a crucial factor to consider is how age influences the exploration–exploitation balance. Adolescents, for example, tend to favor novelty seeking and exploration-driven behavior because of ongoing brain development, especially in areas such as the prefrontal cortex and striatum [38, 43, 47]. This age-related sensitivity contributes to the exploration-heavy patterns seen in disorders such as anorexia nervosa, which often emerges during adolescence [48, 64].
Older adults, on the other hand, typically lean toward exploitative choices, reflecting changes in dopamine function and a preference for stability over new experiences. These patterns align with observations in conditions such as Parkinson's disease, in which individuals often show a strong tendency toward exploitative decision-making. Although the reviewed studies included age-matched comparison groups, findings about age-related trends must be interpreted with care. For example, people with neurodevelopmental disorders such as ADHD and ASD often display heightened exploratory behavior [6, 33], whereas older individuals with these conditions may experience cognitive decline that increases their reliance on exploitative strategies, so results from younger samples may not generalize across the lifespan.
Clinical implications and personalized therapies
The exploration–exploitation framework has important implications for designing tailored treatments for disorders linked to dopamine dysfunction and related illnesses. In gambling disorder, for instance, which is marked by flawed risk-taking and reward processing, therapies aimed at improving striatal and amygdala activity could help restore balanced decision-making [34, 35]. Likewise, neurodevelopmental conditions such as ADHD may benefit from personalized cognitive training designed to reduce the excessive and ineffective exploration attributed to corticostriatal dysfunction [69].
Interventions of any kind, whether pharmacological, behavioral or game-based, or brain stimulation, can be tailored by considering their effects on dopamine, norepinephrine, confidence and uncertainty, and brain plasticity. Recent progress in neuromodulation techniques such as transcranial magnetic stimulation (TMS) and transcranial direct current stimulation (tDCS) offers the potential to target brain areas tied to decision-making difficulties, including the dorsolateral prefrontal cortex (dlPFC) and anterior cingulate cortex (ACC) [31, 40, 79].
The framework also points to treatments that target specific decision-related circuits. In obsessive–compulsive disorder (OCD), where faulty fronto-striatal pathways lead to rigid thinking and repetitive actions, therapies targeting these circuits could prove helpful [59]. The dlPFC and ACC, both essential for arbitrating between exploration and exploitation, stand out as key targets for neuromodulation aimed at improving decision-making [23].
Similarly, for addictive disorders such as gambling disorder, treatments that adjust activity in the amygdala and insula could strengthen learning and reward processing [34]. Longitudinal studies are needed to track how exploration–exploitation patterns evolve over time and to clarify how age shapes these dynamics. Such work could uncover links between age, disorder-specific challenges, and decision-making tendencies, guiding the design of future treatments.
Addressing inconsistencies
Inconsistencies in research on psychiatric disorders and the exploration–exploitation balance stem from differences in study methods, sample characteristics, biological factors, and reporting biases. Experimental setups, such as bandit tasks and reinforcement learning models, vary in design and reward structure, making direct comparisons difficult. Because some papers report exploration–exploitation measures only implicitly, other relevant articles may not have been captured by our initial screening, a risk compounded by our restriction to a single database, PubMed. Participant differences, such as medication use, symptom severity, and co-occurring conditions, also contribute to inconsistent findings. Biological factors, such as dopamine function, and external factors, such as stress or sleep, further influence decision-making. To broaden the evidence base, we also included some studies of disorder-related traits; although this made our exclusion criteria less stringent, it added relevant information to the review. Moreover, publication bias may tilt the literature toward notable results. Practical steps for addressing these issues include standardizing study designs, using well-defined participant groups, accounting for biological influences, and increasing transparency through pre-registration and open data sharing.
Limitations and future directions
Although our review focused on age-appropriate comparisons within the selected studies, we acknowledge the need for a more nuanced examination of how age interacts with exploration/exploitation behavior across the lifespan. Future research should incorporate broader age ranges and longitudinal methodologies to explore these dynamics in greater depth and to understand their implications for clinical interventions. Additionally, computational models that integrate reinforcement learning principles with ecological validity could offer a promising direction for strengthening the robustness of findings in this area. Such models could provide more precise and dynamic representations of how exploration and exploitation behaviors evolve across different ages.
Furthermore, neuroimaging meta-analyses have the potential to clarify the neural substrates underlying exploration–exploitation dynamics, refining both diagnostic criteria and therapeutic approaches [56]. While PubMed served as the primary database for this review, we recognize that it may not fully capture the interdisciplinary nature of this topic. Due to time constraints and the scope of the review, we focused on PubMed because it provides comprehensive coverage of studies central to our research question. However, other databases might have yielded additional relevant studies, and including a broader range of interdisciplinary sources in future updates would likely enrich the findings and provide a more holistic understanding of the subject.
Several additional limitations warrant explicit discussion. Publication bias is a critical concern, as studies reporting null or inconclusive findings may be underrepresented in the literature, potentially skewing our understanding of task sensitivity and disorder-specific deficits. This bias limits the generalizability of conclusions and underscores the need for greater transparency and accessibility of all research findings, regardless of their outcomes.
Moreover, heterogeneity in task designs, including differences in paradigms, outcome metrics, and experimental protocols, presents challenges for cross-study comparisons. Such variability may obscure task-specific sensitivities and hinder efforts to synthesize findings across diverse studies. Additionally, the limited number of studies addressing specific psychiatric disorders restricts our ability to comprehensively map exploration–exploitation imbalances to particular conditions. Disorders that are less frequently studied may lack adequate representation, resulting in gaps in our understanding of disorder-specific mechanisms.
To overcome these limitations, we propose the development and adoption of standardized experimental designs in future research. This includes consistently using well-defined metrics, harmonizing task paradigms, and ensuring uniform reporting standards. Standardization would enhance comparability across studies, improve reproducibility, and strengthen the translational value of research findings. Furthermore, by systematically examining the sensitivity of different tasks to exploration–exploitation deficits, standardized approaches can help clarify which paradigms are most effective in capturing specific behavioral disruptions in various disorders. Incorporating these measures will allow future research to address publication bias, reduce variability in study designs, and better represent under-studied populations. Such advancements would provide a more reliable foundation for identifying disorder-specific mechanisms underlying exploration and exploitation behaviors and support the development of targeted clinical interventions.
Conclusion
Psychiatric disorders affect the exploration–exploitation balance in subtle yet related ways, reflecting wider cognitive and brain-based challenges. By grouping these conditions into addiction-like, emotional/cognitive, and neurological categories, this review offers a cohesive model that ties together varied findings and addresses inconsistencies in the research. We highlighted how the brain handles uncertainty as a key to understanding the exploration–exploitation balance in these disorders: uncertainty is represented by dopamine and norepinephrine, and modulation of these systems affects exploratory behavior. This common theme across studies opens the door to refining personalized therapies, connecting basic research with clinical practice, and deepening our understanding of decision-making in mental health.
Data availability
No datasets were generated or analysed during the current study.
References
Aberg KC, Toren I, Paz R. A neural and behavioral trade-off between value and uncertainty underlies exploratory decisions in normative anxiety. Mol Psychiatry. 2021. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41380-021-01363-z.
Aberg KC, Toren I, Paz R. A neural and behavioral trade-off between value and uncertainty underlies exploratory decisions in normative anxiety. Mol Psychiatry. 2022;27(3):1573–87. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41380-021-01363-z.
Abram SV, Redish AD, MacDonald AW 3rd. Learning From Loss After Risk: Dissociating Reward Pursuit and Reward Valuation in a Naturalistic Foraging Task. Front Psych. 2019;10:359–359. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyt.2019.00359.
Addicott MA, Pearson JM, Froeliger B, Platt ML, McClernon FJ. Smoking automaticity and tolerance moderate brain activation during explore-exploit behavior. Psychiatry Res. 2014;224(3):254–61. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.pscychresns.2014.10.014.
Addicott MA, Pearson JM, Kaiser N, Platt ML, McClernon FJ. Suboptimal foraging behavior: a new perspective on gambling. Behav Neurosci. 2015;129(5):656–65. https://doiorg.publicaciones.saludcastillayleon.es/10.1037/bne0000082.
Addicott MA, Pearson JM, Schechter JC, Sapyta JJ, Weiss MD, Kollins SH. Attention-deficit/hyperactivity disorder and the explore/exploit trade-off. Neuropsychopharmacology. 2021;46(3):614–21. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41386-020-00881-8.
Addicott MA, Pearson JM, Sweitzer MM, Barack DL, Platt ML. A Primer on Foraging and the Explore/Exploit Trade-Off for Psychiatry Research. Neuropsychopharmacology. 2017;42(10):1931–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/npp.2017.108.
Addicott MA, Pearson JM, Wilson J, Platt ML, McClernon FJ. Smoking and the bandit: a preliminary study of smoker and nonsmoker differences in exploratory behavior measured with a multiarmed bandit task. Exp Clin Psychopharmacol. 2013;21(1):66–73. https://doiorg.publicaciones.saludcastillayleon.es/10.1037/a0030843.
Averbeck BB. Theory of choice in bandit, information sampling and foraging tasks. PLoS Comput Biol. 2015;11(3): e1004164.
Aylward J, Valton V, Ahn W-Y, Bond RL, Dayan P, Roiser JP, Robinson OJ. Altered learning under uncertainty in unmedicated mood and anxiety disorders. Nat Hum Behav. 2019;3(10):1116–23. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41562-019-0628-0.
Barto, R. S. S. a. A. G. (2018). Reinforcement Learning: An Introduction. (The MIT Press)
Bechara A, Damasio AR, Damasio H, Anderson SW. Insensitivity to future consequences following damage to human prefrontal cortex. Cognition. 1994;50(1–3):7–15. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/0010-0277(94)90018-3.
Bechara A, Damasio H, Tranel D, Damasio AR. Deciding advantageously before knowing the advantageous strategy. Science. 1997;275(5304):1293–5. https://doiorg.publicaciones.saludcastillayleon.es/10.1126/science.275.5304.1293.
Bechara A, Tranel D, Damasio H, Damasio AR. Failure to respond autonomically to anticipated future outcomes following damage to prefrontal cortex. Cereb Cortex. 1996;6(2):215–25. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cercor/6.2.215.
Bell, W. J. (1991). Searching Behaviour: The behavioural ecology of finding resources (Chapman & Hall Animal Behaviour Series) 1990th Edition. 370 pages. ( Springer; 1990th edition (January 31, 1991))
Blanco NJ, Otto AR, Maddox WT, Beevers CG, Love BC. The influence of depression symptoms on exploratory decision-making. Cognition. 2013;129(3):563–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.cognition.2013.08.018.
Bódi N, Kéri S, Nagy H, Moustafa A, Myers CE, Daw N, Dibó G, Takáts A, Bereczki D, Gluck MA. Reward-learning and the novelty-seeking personality: a between- and within-subjects study of the effects of dopamine agonists on young Parkinson’s patients. Brain. 2009;132(Pt 9):2385–95. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/brain/awp094.
Carlisi CO, Norman L, Murphy CM, Christakou A, Chantiluke K, Giampietro V, Simmons A, Brammer M, Murphy DG, Mataix-Cols D, Rubia K. Shared and Disorder-Specific Neurocomputational Mechanisms of Decision-Making in Autism Spectrum Disorder and Obsessive-Compulsive Disorder. Cereb Cortex. 2017;27(12):5804–16. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/cercor/bhx265.
Cathomas F, Klaus F, Guetter K, Chung HK, Raja Beharelle A, Spiller TR, Schlegel R, Seifritz E, Hartmann-Riemer MN, Tobler PN, Kaiser S. Increased random exploration in schizophrenia is associated with inflammation. NPJ Schizophr. 2021;7(1):6. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41537-020-00133-0.
Cella M, Dymond S, Cooper A. Impaired flexible decision-making in major depressive disorder. Journal of Affective Disorders. 2010;124(1):207–10. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jad.2009.11.013.
Chakroun K, Wiehler A, Wagner B, Mathar D, Ganzer F, van Eimeren T, Sommer T, Peters J. Dopamine regulates decision thresholds in human reinforcement learning in males. Nat Commun. 2023;14(1):5369.
CogliatiDezza I, Cleeremans A, Alexander W. Should we control? The interplay between cognitive control and information integration in the resolution of the exploration-exploitation dilemma. J Exp Psychol Gen. 2019;148(6):977–93. https://doiorg.publicaciones.saludcastillayleon.es/10.1037/xge0000546.
Cohen JD, McClure SM, Yu AJ. Should I stay or should I go? How the human brain manages the trade-off between exploitation and exploration. Philosophical transactions of the Royal Society of London. Series B, Biological sciences. 2007;362(1481):933–42. https://doiorg.publicaciones.saludcastillayleon.es/10.1098/rstb.2007.2098.
Constantino SM, Daw ND. Learning the opportunity cost of time in a patch-foraging task. Cogn Affect Behav Neurosci. 2015;15(4):837–53. https://doiorg.publicaciones.saludcastillayleon.es/10.3758/s13415-015-0350-y.
De Sousa C, Uva M, Luminet O, Cortesi M, Constant E, Derely M, De Timary P. Distinct effects of protracted withdrawal on affect, craving, selective attention and executive functions among alcohol-dependent patients. Alcohol Alcohol. 2010;45(3):241–6. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/alcalc/agq012.
Culbreth AJ, Gold JM, Cools R, Barch DM. Impaired Activation in Cognitive Control Regions Predicts Reversal Learning in Schizophrenia. Schizophr Bull. 2015;42(2):484–93. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/schbul/sbv075.
Daw ND, O’Doherty JP, Dayan P, Seymour B, Dolan RJ. Cortical substrates for exploratory decisions in humans. Nature. 2006;441(7095):876–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nature04766.
de Visser L, van der Knaap LJ, van de Loo AJ, van der Weerd CM, Ohl F, van den Bos R. Trait anxiety affects decision-making differently in healthy men and women: towards gender-specific endophenotypes of anxiety. Neuropsychologia. 2010;48(6):1598–606. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.neuropsychologia.2010.01.027.
Djamshidian A, O’Sullivan SS, Wittmann BC, Lees AJ, Averbeck BB. Novelty seeking behaviour in Parkinson’s disease. Neuropsychologia. 2011;49(9):2483–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.neuropsychologia.2011.04.026.
Dombrovski AY, Clark L, Siegle GJ, Butters MA, Ichikawa N, Sahakian BJ, Szanto K. Reward/Punishment reversal learning in older suicide attempters. Am J Psychiatry. 2010;167(6):699–707. https://doiorg.publicaciones.saludcastillayleon.es/10.1176/appi.ajp.2009.09030407.
Estaji R, Hosseinzadeh M, Arabgol F, Nejati V. Transcranial direct current stimulation (tDCS) improves emotion regulation in children with attention-deficit hyperactivity disorder (ADHD). Sci Rep. 2024;14(1):13889. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/s41598-024-64886-9.
Friston K. The free-energy principle: a unified brain theory? Nat Rev Neurosci. 2010;11(2):127–38. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/nrn2787.
Furukawa E, Alsop B, Alves H, Vorderstrasse V, Carrasco KD, Chuang C-C, Tripp G. Disrupted waiting behavior in ADHD: exploring the impact of reward availability and predictive cues. Child Neuropsychol. 2023;29(1):76–95. https://doiorg.publicaciones.saludcastillayleon.es/10.1080/09297049.2022.2068518.
García-Castro J, Cancela A, Cárdaba MA. Neural cue-reactivity in pathological gambling as evidence for behavioral addiction: a systematic review. Curr Psychol. 2023;42(32):28026–37.
Genauck A, Matthis C, Andrejevic M, Ballon L, Chiarello F, Duecker K, Heinz A, Kathmann N, Romanczuk-Seiferth N. Neural correlates of cue-induced changes in decision-making distinguish subjects with gambling disorder from healthy controls. Addict Biol. 2021;26(3): e12951.
Giannunzio V, Degortes D, Tenconi E, Collantoni E, Solmi M, Santonastaso P, Favaro A. Decision-making impairment in anorexia nervosa: New insights into the role of age and decision-making style. Eur Eat Disord Rev. 2018;26(4):302–14. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/erv.2595.
Gittins, J. C., & Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments Progress in statistics 1 ; 1. - Amsterdam [u.a.] : North-Holland Publ, 241–266.
Gopnik A. Childhood as a solution to explore–exploit tensions. Philosophical Transactions of the Royal Society B: Biological Sciences. 2020;375(1803):20190502. https://doiorg.publicaciones.saludcastillayleon.es/10.1098/rstb.2019.0502.
Goudriaan AE, Oosterlaan J, de Beurs E, van den Brink W. Decision making in pathological gambling: a comparison between pathological gamblers, alcohol dependents, persons with Tourette syndrome, and normal controls. Brain Res Cogn Brain Res. 2005;23(1):137–51. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.cogbrainres.2005.01.017.
Harika-Germaneau G, Gosez J, Bokam P, Guillevin R, Doolub D, Thirioux B, Wassouf I, Germaneau A, Langbour N, Jaafari N. Investigating brain structure and tDCS response in obsessive-compulsive disorder. Journal of Psychiatric Research. 2024;177:39–45. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.jpsychires.2024.06.053.
Harlé KM, Guo D, Zhang S, Paulus MP, Yu AJ. Anhedonia and anxiety underlying depressive symptomatology have distinct effects on reward-based decision-making. PLoS ONE. 2017;12(10):e0186473. https://doiorg.publicaciones.saludcastillayleon.es/10.1371/journal.pone.0186473.
Harlé KM, Zhang S, Schiff M, Mackey S, Paulus MP, Yu AJ. Altered statistical learning and decision-making in methamphetamine dependence: Evidence from a two-armed bandit task [Article]. Frontiers in Psychology. 2015;6(DEC):Article 01910. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyg.2015.01910.
Harms MB, Xu Y, Green CS, Woodard K, Wilson R, Pollak SD. The structure and development of explore-exploit decision making. Cognitive Psychology. 2024;150:101650. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.cogpsych.2024.101650.
Hauser TU, Fiore VG, Moutoussis M, Dolan RJ. Computational Psychiatry of ADHD: Neural Gain Impairments across Marrian Levels of Analysis. Trends Neurosci. 2016;39(2):63–73. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.tins.2015.12.009.
Hauser TU, Iannaccone R, Ball J, Mathys C, Brandeis D, Walitza S, Brem S. Role of the medial prefrontal cortex in impaired decision making in juvenile attention-deficit/hyperactivity disorder. JAMA Psychiat. 2014;71(10):1165–73. https://doiorg.publicaciones.saludcastillayleon.es/10.1001/jamapsychiatry.2014.1093.
Kembro JM, Lihoreau M, Garriga J, Raposo EP, Bartumeus F. Bumblebees learn foraging routes through exploitation-exploration cycles. J R Soc Interface. 2019;16(156):20190103. https://doiorg.publicaciones.saludcastillayleon.es/10.1098/rsif.2019.0103.
Kim S, Carlson SM. Understanding explore-exploit dynamics in child development: Current insights and future directions. Frontiers in Developmental Psychology. 2024;2:1467880. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fdpys.2024.1467880.
Kohn M, Golden NH. Eating Disorders in Children and Adolescents. Paediatr Drugs. 2001;3(2):91–9. https://doiorg.publicaciones.saludcastillayleon.es/10.2165/00128072-200103020-00002.
Kristjánsson Á, Helgadóttir A, Kristjánsson T. Eating disorder symptoms and foraging for food related items. J Eat Disord. 2021;9(1):18. https://doiorg.publicaciones.saludcastillayleon.es/10.1186/s40337-021-00373-0.
Lamba A, Frank MJ, FeldmanHall O. Anxiety Impedes Adaptive Social Learning Under Uncertainty. Psychol Sci. 2020;31(5):592–603. https://doiorg.publicaciones.saludcastillayleon.es/10.1177/0956797620910993.
Lawrence AD, Sahakian BJ, Rogers RD, Hodge JR, Robbins TW. Discrimination, reversal, and shift learning in Huntington’s disease: mechanisms of impaired response selection. Neuropsychologia. 1999;37(12):1359–74. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/s0028-3932(99)00035-4.
Le Heron C, Kolling N, Plant O, Kienast A, Janska R, Ang Y-S, Fallon S, Husain M, Apps MA. Dopamine modulates dynamic decision-making during foraging. J Neurosci. 2020;40(27):5273–82.
Lenow JK, Constantino SM, Daw ND, Phelps EA. Chronic and Acute Stress Promote Overexploitation in Serial Decision Making. J Neurosci. 2017;37(23):5681–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1523/jneurosci.3618-16.2017.
Lloyd A, McKay R, Sebastian CL, Balsters JH. Are adolescents more optimal decision-makers in novel environments? Examining the benefits of heightened exploration in a patch foraging paradigm. Dev Sci. 2021;24(4):e13075. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/desc.13075.
MacDonald HJ, Kleppe R, Szigetvari PD, Haavik J. The dopamine hypothesis for ADHD: An evaluation of evidence accumulated from human studies and animal models. Frontiers in Psychiatry. 2024;15:1492126. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fpsyt.2024.1492126.
Marković D, Goschke T, Kiebel SJ. Meta-control of the exploration-exploitation dilemma emerges from probabilistic inference over a hierarchy of time scales. Cogn Affect Behav Neurosci. 2021;21:509–33.
Martinelli C, Rigoli F, Averbeck B, Shergill SS. The value of novelty in schizophrenia. Schizophr Res. 2018;192:287–93. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.schres.2017.05.007.
Marzuki AA, Tomic I, Ip SHY, Gottwald J, Kanen JW, Kaser M, Sule A, Conway-Morris A, Sahakian BJ, Robbins TW. Association of Environmental Uncertainty With Altered Decision-making and Learning Mechanisms in Youths With Obsessive-Compulsive Disorder. JAMA Netw Open. 2021;4(11):e2136195. https://doiorg.publicaciones.saludcastillayleon.es/10.1001/jamanetworkopen.2021.36195.
Mataix-Cols D, van den Heuvel OA. Common and Distinct Neural Correlates of Obsessive-Compulsive and Related Disorders. Psychiatric Clinics. 2006;29(2):391–410. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.psc.2006.02.006.
Matsuzawa D, Shirayama Y, Niitsu T, Hashimoto K, Iyo M. Deficits in emotion based decision-making in schizophrenia; a new insight based on the Iowa Gambling Task. Prog Neuropsychopharmacol Biol Psychiatry. 2015;57:52–9. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.pnpbp.2014.10.007.
Metha JA, Brian ML, Oberrauch S, Barnes SA, Featherby TJ, Bossaerts P, Murawski C, Hoyer D, Jacobson LH. Separating Probability and Reversal Learning in a Novel Probabilistic Reversal Learning Task for Mice. Front Behav Neurosci. 2019;13:270. https://doiorg.publicaciones.saludcastillayleon.es/10.3389/fnbeh.2019.00270.
Morris LS, Baek K, Kundu P, Harrison NA, Frank MJ, Voon V. Biases in the Explore-Exploit Tradeoff in Addictions: The Role of Avoidance of Uncertainty. Neuropsychopharmacology. 2016;41(4):940–8. https://doiorg.publicaciones.saludcastillayleon.es/10.1038/npp.2015.208.
Mussey JL, Travers BG, Klinger LG, Klinger MR. Decision-making skills in ASD: performance on the Iowa Gambling Task. Autism Res. 2015;8(1):105–14. https://doiorg.publicaciones.saludcastillayleon.es/10.1002/aur.1429.
Nagl M, Jacobi C, Paul M, Beesdo-Baum K, Höfler M, Lieb R, Wittchen H-U. Prevalence, incidence, and natural course of anorexia and bulimia nervosa among adolescents and young adults. Eur Child Adolesc Psychiatry. 2016;25(8):903–18. https://doiorg.publicaciones.saludcastillayleon.es/10.1007/s00787-015-0808-z.
Patzelt EH, Kurth-Nelson Z, Lim KO, MacDonald AW 3rd. Excessive state switching underlies reversal learning deficits in cocaine users. Drug Alcohol Depend. 2014;134:211–7. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.drugalcdep.2013.09.029.
Perandrés-Gómez A, Navas JF, van Timmeren T, Perales JC. Decision-making (in)flexibility in gambling disorder. Addict Behav. 2021;112:106534. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.addbeh.2020.106534.
Reddy LF, Waltz JA, Green MF, Wynn JK, Horan WP. Probabilistic Reversal Learning in Schizophrenia: Stability of Deficits and Potential Causal Mechanisms. Schizophr Bull. 2016;42(4):942–51. https://doiorg.publicaciones.saludcastillayleon.es/10.1093/schbul/sbv226.
Robinson AH, Perales JC, Volpe I, Chong TT, Verdejo-Garcia A. Are methamphetamine users compulsive? Faulty reinforcement learning, not inflexibility, underlies decision making in people with methamphetamine use disorder. Addict Biol. 2021;26(4):e12999. https://doiorg.publicaciones.saludcastillayleon.es/10.1111/adb.12999.
Rosa VDO, Schmitz M, Moreira-Maia CR, Wagner F, Londero I, Bassotto CDF, Moritz G, Souza CDSD, Rohde LAP. Computerized cognitive training in children and adolescents with attention deficit/hyperactivity disorder as add-on treatment to stimulants: feasibility study and protocol description. Trends in psychiatry and psychotherapy. 2017;39:65–76.
Rutledge RB, Lazzaro SC, Lau B, Myers CE, Gluck MA, Glimcher PW. Dopaminergic drugs modulate learning rates and perseveration in Parkinson’s patients in a dynamic foraging task. J Neurosci. 2009;29(48):15104–14. https://doiorg.publicaciones.saludcastillayleon.es/10.1523/jneurosci.3524-09.2009.
Saperia S, Da Silva S, Siddiqui I, Agid O, Daskalakis ZJ, Ravindran A, Voineskos AN, Zakzanis KK, Remington G, Foussias G. Reward-driven decision-making impairments in schizophrenia. Schizophr Res. 2019;206:277–83. https://doiorg.publicaciones.saludcastillayleon.es/10.1016/j.schres.2018.11.004.
Schmid-Hempel P. Stephens DW, Krebs JR. Foraging Theory. Princeton: Princeton University Press; 1986. 247 pp. [book review]. J Evol Biol. 1988;1(1):86–8. https://doi.org/10.1046/j.1420-9101.1988.1010086.x.
Schultz W. Multiple dopamine functions at different time courses. Annu Rev Neurosci. 2007;30:259–88. https://doi.org/10.1146/annurev.neuro.28.061604.135722.
Smith R, Schwartenbeck P, Stewart JL, Kuplicki R, Ekhtiari H, Paulus MP, Tulsa I. Imprecise action selection in substance use disorder: Evidence for active learning impairments when solving the explore-exploit dilemma. Drug Alcohol Depend. 2020;215:108208. https://doi.org/10.1016/j.drugalcdep.2020.108208.
Solomon M, Smith AC, Frank MJ, Ly S, Carter CS. Probabilistic reinforcement learning in adults with autism spectrum disorders. Autism Res. 2011;4(2):109–20. https://doi.org/10.1002/aur.177.
Stephens DW, Krebs JR. Foraging theory. Vol. 6. Princeton: Princeton University Press; 1986.
Strauss GP, Frank MJ, Waltz JA, Kasanova Z, Herbener ES, Gold JM. Deficits in positive reinforcement learning and uncertainty-driven exploration are associated with distinct aspects of negative symptoms in schizophrenia. Biol Psychiatry. 2011;69(5):424–31. https://doi.org/10.1016/j.biopsych.2010.10.015.
Strauss GP, Robinson BM, Waltz JA, Frank MJ, Kasanova Z, Herbener ES, Gold JM. Patients with schizophrenia demonstrate inconsistent preference judgments for affective and nonaffective stimuli. Schizophr Bull. 2011;37(6):1295–304. https://doi.org/10.1093/schbul/sbq047.
Summerell E, Xiao W, Huang C, Terranova J, Gilam G, Riva P, Denson TF. The effects of transcranial direct current stimulation over the ventromedial prefrontal cortex on reactive aggression in intoxicated and sober individuals. Biol Psychol. 2024;193:108899. https://doi.org/10.1016/j.biopsycho.2024.108899.
Urošević S, Halverson T, Youngstrom EA, Luciana M. Probabilistic reinforcement learning abnormalities and their correlates in adolescent bipolar disorders. J Abnorm Psychol. 2018;127(8):807–17. https://doi.org/10.1037/abn0000388.
van Dooren R, de Kleijn R, Hommel B, Sjoerds Z. The exploration-exploitation trade-off in a foraging task is affected by mood-related arousal and valence. Cogn Affect Behav Neurosci. 2021;21(3):549–60. https://doi.org/10.3758/s13415-021-00917-6.
Verharen JPH, Danner UN, Schröder S, Aarts E, van Elburg AA, Adan RAH. Insensitivity to Losses: A Core Feature in Patients With Anorexia Nervosa? Biol Psychiatry Cogn Neurosci Neuroimaging. 2019;4(11):995–1003. https://doi.org/10.1016/j.bpsc.2019.05.001.
Vinckier F, Gaillard R, Palminteri S, Rigoux L, Salvador A, Fornito A, Adapa R, Krebs MO, Pessiglione M, Fletcher PC. Confidence and psychosis: a neuro-computational account of contingency learning disruption by NMDA blockade. Mol Psychiatry. 2016;21(7):946–55. https://doi.org/10.1038/mp.2015.73.
von Helversen B, Mata R, Samanez-Larkin GR, Wilke A. Foraging, exploration, or search? On the (lack of) convergent validity between three behavioral paradigms. Evol Behav Sci. 2018;12(3):152–62. https://doi.org/10.1037/ebs0000121.
Waltz JA, Frank MJ, Wiecki TV, Gold JM. Altered probabilistic learning and response biases in schizophrenia: behavioral evidence and neurocomputational modeling. Neuropsychology. 2011;25(1):86–97. https://doi.org/10.1037/a0020882.
Waltz JA, Kasanova Z, Ross TJ, Salmeron BJ, McMahon RP, Gold JM, Stein EA. The roles of reward, default, and executive control networks in set-shifting impairments in schizophrenia. PLoS ONE. 2013;8(2):e57257. https://doi.org/10.1371/journal.pone.0057257.
Wiehler A, Chakroun K, Peters J. Attenuated directed exploration during reinforcement learning in gambling disorder. J Neurosci. 2021;41(11):2512–22. https://doi.org/10.1523/JNEUROSCI.1607-20.2021.
Winterhalder B, Smith E. Hunter-Gatherer Foraging Strategies: Ethnographic and Archeological Analyses. Vol. 18. 1981. https://doi.org/10.2307/2801481.
Yechiam E, Arshavsky O, Shamay-Tsoory SG, Yaniv S, Aharon J. Adapted to explore: reinforcement learning in Autistic Spectrum Conditions. Brain Cogn. 2010;72(2):317–24. https://doi.org/10.1016/j.bandc.2009.10.005.
Yu AJ, Dayan P. Uncertainty, Neuromodulation, and Attention. Neuron. 2005;46(4):681–92. https://doi.org/10.1016/j.neuron.2005.04.026.
Zhu C, Zhou K, Tang F, Tang Y, Li X, Si B. A Hierarchical Bayesian Model for Inferring and Decision Making in Multi-Dimensional Volatile Binary Environments. Mathematics. 2022;10(24):4775. https://www.mdpi.com/2227-7390/10/24/4775.
Acknowledgements
We want to extend our gratitude to all those who contributed to this systematic review. First and foremost, we thank our colleagues and peers who provided valuable feedback and guidance throughout the process. Your insights and constructive criticism were instrumental in refining this review. We are also grateful to the librarians and information specialists who assisted in developing our search strategy and accessing the necessary resources and databases. Your expertise was invaluable. Additionally, we acknowledge the efforts of the researchers and authors of the studies included in this review. Your work forms the foundation of our analysis and conclusions. Lastly, we thank our families and friends for their unwavering support and understanding during the time spent on this project. Your encouragement has been greatly appreciated. This systematic review received no funding.
Clinical trial number
Not applicable.
Funding
No funding was received for this study.
Author information
Contributions
Ali Jami performed the main searches, screening, and data extraction, and drafted the manuscript. Sajjad Abbaszade assisted with screening and data extraction and edited the draft. Abdol-Hossein Vahabie supervised all steps and edited the draft.
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
About this article
Cite this article
Jami, A., Abbaszade, S. & Vahabie, AH. A review on exploration–exploitation trade-off in psychiatric disorders. BMC Psychiatry 25, 420 (2025). https://doi.org/10.1186/s12888-025-06837-w
DOI: https://doi.org/10.1186/s12888-025-06837-w