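In 2017, which findings in psychology overturned classic theories?

In 2017, psychology did see a number of studies that challenged, and to some degree "overturned", long-established theories. These findings do not negate everything that came before; they offer a finer-grained and more humane view of how people think and behave. A few representative examples follow, described in as much detail and in as plain language as I can manage: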

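1. The myth of self-control: does repeated practice weaken it rather than strengthen it?

It has long been believed that self-control works like a muscle: the more you exercise it, the stronger it gets. The famous marshmallow experiment is often cited as the classic illustration: children who could resist eating a marshmallow right away received a larger reward. This idea has shaped countless parenting, education, and self-help books.

However, some work published in 2017, in particular meta-analyses of self-control (a meta-analysis pools the results of many studies), began to question this. These analyses found that the effect of "training" self-control may be less stable and smaller than commonly assumed.

More specifically, some earlier experimental designs may have been biased. People who do well on self-control tasks may not have "trained" a stronger will; they may simply have started out with better executive functions, such as planning, emotion regulation, and impulse control. A naturally fast runner who runs extra laps will still be faster than others, but that does not show that the extra laps made him faster.

More surprisingly, some studies even suggest that performing self-control tasks repeatedly may drain limited mental resources and lower subsequent self-control. It is like carrying box after box: you may feel you have been "training", but by the end you are too tired to lift another one.

This prompts a rethink of what persistence and perseverance really are. Rather than stressing willpower training, it may be better to shape the environment, reduce temptation, and build more effective coping strategies. For example, instead of forcing yourself to resist food cravings, put unhealthy food where it is hard to reach.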

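2. Cognitive dissonance revisited: do we really distort facts to rationalize our behavior?

Cognitive dissonance theory is one of the central ideas in psychology: when our beliefs and our behavior conflict, we feel discomfort and tend to change one of them to restore consistency. The classic demonstration is Festinger and Carlsmith's (1959) experiment, in which participants completed a very boring task and were then asked to tell the next participant that it was interesting. Those paid a small amount ($1) later rated the task as more enjoyable than those paid a large amount ($20): with too little external justification for the lie, they resolved the inconsistency by changing their own attitude.

However, research in 2017, especially reanalyses of many earlier experiments and studies with more rigorous designs, began to challenge how universal and how strong cognitive dissonance is. Some researchers argue that people do not always change their beliefs in order to reduce dissonance.

These studies found that people often prefer to recall and attend to information that supports what they already believe, or to selectively ignore information that contradicts it, rather than actively "distorting" facts. This confirmation bias may play a larger role in how dissonance arises and is resolved.

In other words, we may not be "forced" to believe something because we did something foolish; we are simply good at "finding evidence" that our choice was sensible, even if that means ignoring inconvenient truths. This does not mean cognitive dissonance does not exist, only that it may be less powerful than previously thought and closely intertwined with other cognitive mechanisms.

It makes us ask whether, when facing a bad decision, we genuinely change our minds or merely deceive ourselves, and whether that self-deception looks more like selective forgetting and evidence filtering than a wholesale reshaping of belief.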

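3. A new angle on social cognition: we are not always swayed by others' opinions; sometimes we trust what we "see" ourselves.

A core claim of social psychology is that we are highly susceptible to other people's opinions and behavior, and that this social influence is everywhere. The bystander effect, for example, says that in an emergency, the more people are present, the less likely anyone is to help, because everyone assumes someone else will.

In 2017, however, some studies began to emphasize the individual's agency and independence in social interaction, especially in receiving and judging information. They found that in some situations people rely on their own direct observation and experience more than on the opinions or hints of others.

For example, in a series of experiments on information cascades, researchers found that when people could directly observe the quality or value of an item, their reliance on others' evaluations (especially strangers') dropped markedly. Even when the signals coming from other people were mixed, individuals still tended to trust their own judgment.

This challenges the simplistic picture of people merely echoing the crowd. We are not passive recipients of information; we actively process and evaluate it. When outside information conflicts with what we perceive directly, we may lean toward our own first-hand evidence.

This gives a more nuanced account of the relationship between conformity and independent thinking. We may not simply follow the crowd; within a complex flow of social information we selectively accept and judge.

In short, the 2017 findings do not overthrow earlier theories outright; they paint a more complex, more layered picture of the human mind. They remind us that science advances by continual challenge and revision, and by using more rigorous experiments and deeper analysis to get closer to the truth. These "reversals" show where the classic theories reach their limits in different settings, and they encourage us to keep exploring how the mind works.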

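Community answers

@Manolo's answer introduces Tyler W. Watts's new conceptual replication of the marshmallow experiment. Because it is written as popular science, it leaves out many details. The comments under it show that some readers have misread or over-interpreted Watts's study: some now think Mischel's marshmallow experiment was feel-good "chicken soup for the soul" that misled the public, and that Watts's study overturns it completely. Readers who rely on a popular-science summary without reading the original paper, and then extrapolate from their own understanding, can easily reach wrong conclusions.

Zhihu is not a venue for academic exchange, but since the comment section under the top-voted answer contains misunderstandings of both the new study and the original marshmallow experiment, I wrote this answer to supply the missing information. It is all science and no popularization: I quote many passages from the original sources verbatim, so that simplification does not again distort the conclusions. It is meant for readers who want to go further into the literature on the marshmallow experiment, and it tries to clear up some common myths.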

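Main conclusions of Tyler Watts's study

1. Tyler Watts's study is a conceptual replication, not a direct replication of Mischel's marshmallow experiment. The methods differ in several respects, for example the waiting time (7 min vs. 15-20 min) and the measure of academic achievement (WJ-R vs. SAT). The correlations and effect sizes from a conceptual replication may therefore differ from those of the original study.

Note: what is a conceptual replication?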
There is no substitute for direct replication – if you cannot reproduce the same result using the same methods then you cannot have a cumulative science. But conceptual replication also has a very important role to play in psychological science. What is conceptual replication? It’s when instead of replicating the exact same experiment in exactly the same way, we test the experiment’s underlying hypothesis using different methods. [1]

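2. Watts's conceptual replication tests two predictions from Mischel's delay-of-gratification work. For the subgroup of children whose mothers had not completed college, the results were as follows: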

(1) academic achievement

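Overall, this part can be called a successful replication: delay of gratification is correlated with achievement. The details fall into three parts:

I. No Controls: the association between delay of gratification and achievement is about half the size of the effect in the original study, and it is statistically significant.
II. Child Background and HOME Controls Only: the association shrinks by about two thirds relative to the No Controls estimate. It is weaker after the controls are added but still statistically significant.
III. Child background and HOME + Concurrent 54-Month Controls: with more variables controlled, almost none of the differences are statistically significant (the exception is the group that waited 2-7 min), and the association between waiting time and achievement disappears.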

(2) behavioral outcomes

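This part failed to replicate: delay of gratification was not associated with behavioral outcomes (for example antisocial behavior or depression). See the Behavior composite part of the figure from [9].

In that figure, which covers the subgroup of children whose mothers did not have a college degree, the left half reports academic achievement and the right half reports behavioral problems.

Columns 1, 4, 7, 10: No Controls
Columns 2, 5, 8, 11: Child Background and HOME Controls
Columns 3, 6, 9, 12: Child background and HOME + Concurrent 54-Month Controls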

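3. In Watts's study, socioeconomic status (SES) and early cognitive ability may both be important factors, and the study cannot say which of the two matters more. In other words, in this study it is more likely that SES and early cognitive ability, rather than delay of gratification, drive achievement; but the study does not show that SES is the only or the main factor.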
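To make the "correlation shrinks once controls are added" pattern concrete, here is a minimal simulation sketch. It is not Watts's data or model: the variable names, the coefficient 0.6, and the sample size are all invented for illustration, and it assumes the simplest possible case in which a single background factor drives both delay time and achievement. Under that assumption the raw correlation is clearly positive while the partial correlation (holding the background factor fixed) is close to zero.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(xs, ys, zs):
    """Correlation between xs and ys after removing the linear effect of zs."""
    rxy = pearson(xs, ys)
    rxz = pearson(xs, zs)
    ryz = pearson(ys, zs)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=20000, seed=0):
    """Hypothetical data: one background factor feeds both other variables."""
    rng = random.Random(seed)
    background = [rng.gauss(0, 1) for _ in range(n)]
    # Assumption: delay time has NO direct effect on achievement here.
    delay = [0.6 * b + rng.gauss(0, 1) for b in background]
    achievement = [0.6 * b + rng.gauss(0, 1) for b in background]
    return background, delay, achievement


background, delay, achievement = simulate()
raw = pearson(delay, achievement)                         # "No Controls"
adjusted = partial_corr(delay, achievement, background)   # background held fixed

print(f"raw correlation:      {raw:.3f}")
print(f"adjusted correlation: {adjusted:.3f}")
```

The sketch shows only that such a pattern is what pure confounding would produce. It cannot tell us which background factor matters in the real study, which is exactly the limitation stated in point 3.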

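Those are the main conclusions of Watts's study. Readers who have come away with any of the following beliefs have probably misunderstood the details of the delay-of-gratification research:

1. Watts proved that Mischel's marshmallow experiment is toxic "chicken soup" (or some similar claim that it was simply wrong).
2. In the original marshmallow experiment, Mischel did not mention other explanatory factors.
3. Delay of gratification is a panacea and a predictor of future success.
4. The design of the delay-of-gratification task has no merit.
5. "I don't like sweets": the belief that the marshmallow experiment offered children only marshmallows.
6. The marshmallow experiment measures only self-control.
7. Delay of gratification = self-control, so Watts's study shows that self-control does not matter.
8. Cognitive ability (intelligence) tests are the only way to gauge the link between self-control and academic achievement.
9. Watts's study demonstrates causation, with SES as the only or main cause of good or poor grades.
10. The Child Background and HOME Controls in Watts's study cover only the family environment and not the child's cognitive ability.

Readers who conclude from the above that the marshmallow experiment cannot be replicated at all, that it is toxic chicken soup, that SES is the main factor, or that self-control does not matter at all have probably read too many success-literature articles. They overlook the other explanations Mischel gave in the original research, or even read into Watts's study conclusions that are not in the paper.

Myth 1: Watts proved that Mischel's marshmallow experiment is toxic "chicken soup" (or was simply wrong)

Watts's study has been described as "overturning" the marshmallow experiment. In fact, part of Mischel's findings replicated and part did not, and the study never says that the marshmallow experiment was entirely wrong.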

When our new results are interpreted, they should be viewed alongside the older studies, and should be seen as adding shades of complexity to what we already know — not interpreted as definitive proof that the original work was false. In many cases, treating a single study with the word “replication” in the title as definitive proof of anything falls prey to the same error that led us to over-interpret the results from the original work. [2]

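Myth 2: In the original marshmallow experiment, Mischel did not mention other explanatory factors

Some readers' impressions were probably formed by success-literature articles that take a study's conclusion at face value and never mention other explanations. The Inside Higher Ed article "Softening Claims of the Marshmallow Test" gives Mischel's view of Watts's study. He discusses the other abilities that explain marshmallow-test performance and what the test was designed to look for.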

In an email, Mischel noted that the data set used by Watts et al. found "a significant positive correlation between delay time and academic achievement" that strengthens earlier findings.

He also said years of research by him and his colleagues, as well as by others, have found that "a child's ability to wait in the 'marshmallow test' situation reflects that child’s ability to engage various cognitive and emotion-regulation strategies and skills that make the waiting situation less frustrating. Therefore, it is expected and predictable, as the Watts paper shows, that once these cognitive and emotion-regulation skills, which are the skills that are essential for waiting, are statistically 'controlled out,' the correlation is indeed diminished."

He added, "The 'marshmallow test' was developed to find a method -- a window -- that allows us to see how people, especially children, manage to deal with the frustration of waiting for something they really want to have. This window opened the way for many experiments that identified the conditions and mental-emotional strategies and skills that make this challenge manageable or not. The long list includes, for example, trust in the promise-maker, and diverse cognitive and 'cooling' strategies to make the waiting easier." [3]

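Myth 3: Delay of gratification is a panacea and a predictor of future success

Success-literature articles sometimes use hyped, misleading headlines and over-generalize research conclusions. As soon as Watts's study came out, some people branded the marshmallow experiment toxic chicken soup. Yet the same people tend to ignore the original study's design, its limitations, and other possible influences, and simply take its conclusion at face value. Mischel points out in the article that the marshmallow test is not a crystal ball that predicts the future, and that training children to wait for marshmallows is not a panacea.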

But Mischel said his research and subsequent work based on it "does not suggest that the method is a crystal ball that predicts our future, or that training children to wait for marshmallows is a panacea. A close reading of the Watts et al. paper adds to this understanding. Unfortunately, our 1990 paper’s own cautions to resist sweeping over-generalizations, and the volume of research exploring the conditions and skills underlying the ability to wait, have been put aside for more exciting but very misleading headline stories over many years." [3]

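There can be a large gap between the conclusions of the original research and the way pop-psychology and success-literature articles explain the marshmallow experiment. The Vox article "The “marshmallow test” said patience was a key to success. A new replication tells us s’more." reports how Walter Mischel and Yuichi Shoda interpret the research.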

In fairness to Mischel and his colleagues, their findings, as written in 1990, were not so sweeping. In the study linking delay of gratification to SAT scores, the researchers acknowledged the possibility that with a bigger sample size, the magnitude of their correlation could decrease. They also mentioned that the stability of the home environment may play a more important role than their test was designed to reveal. It also wasn’t an experiment. The results also didn’t necessarily mean that teaching kids to delay their gratification would cause these benefits later on.

“The findings of that study were never intended to be prescriptions for an application,” Yuichi Shoda, a co-author on the 1990 paper linking delay of gratification to SAT scores, says in an email. “Our paper does not mention anything about interventions or policies.” And they readily admit that the delay task is the result of a whole host of factors in a child’s life. “‘Controlling out’ those variables, which contribute to the diagnostic value of the delay measure, would be expected to reduce their correlations,” Mischel, who says he welcomes the new paper, writes. In an interview with PBS in 2015, he said “the idea that your child is doomed if she chooses not to wait for her marshmallows is really a serious misinterpretation.” [4]

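Myth 4: The design of the delay-of-gratification task has no merit

The design of the marshmallow experiment has its limitations, but compared with questionnaires the delay task still has advantages in certain respects.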

Long assumed central to successful development, self-control has only within the last half century become the object of productive scientific inquiry. While not the only valid measure of self-control available to researchers, the delay of gratification task has crucial advantages. Most notably, the delay task obviates the well-known limitations of questionnaire measures (e.g., faking, social desirability bias, acquiescence bias, and reference bias). [5]

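Myth 5: "I don't like sweets": the belief that the marshmallow experiment offered children only marshmallows

The marshmallow experiment offers several kinds of snack and lets the child pick the one they want. If you don't like sweets, you can choose something salty or whatever else you prefer. Many factors influence individual differences in delay time, and the design addresses several of them.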

What makes the delay of gratification task so exquisitely sensitive to individual differences in self-control? We can only speculate, but several features of the paradigm seem worth highlighting. First, the child is presented with a range of treats from which they choose their favorite. Temptation is thus maximized by using a treat the child really likes, but the very trivial amount of snack likely precludes hunger impulses to swamp self-regulatory processes, as evidenced by a near-zero correlation between self-reported hunger ratings at the start of the task and delay time in Study 1. Second, the task is administered in a quiet, empty room in which the child is left alone to ponder, continuously, his or her choice—shall I continue to wait or shall I gobble up this smaller treat right now? In the absence of external distractions, with temptation lying within easy reach and in plain sight, children rely on self-regulatory strategies of varying effectiveness (Carlson & Beck, 2009). Third, before leaving, the experimenter emphasizes to the child that she doesn’t care much what the child ultimately decides to do. This minimizes the possibility that children wait to comply with authority, as seems to be the case in other tasks (e.g., the gift delay task in Funder et al., 1983). Finally, unlike more easily administered measures in which individuals make discreet (and irrevocable) choices between smaller, sooner and larger, later rewards, the delay task begins with the (universal) election for larger, later treats and then tests the ability to sustain the decision to wait. [5]

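Myth 6: The marshmallow experiment measures only self-control

What does the marshmallow experiment measure? Readers who have read a lot of success literature and introductory psychology books may answer "self-control", but delay of gratification may reflect other factors too. For example, in an answer to "Which famous or interesting longitudinal studies are there in psychology?", @Mon1st gives a popular account of the study by Kidd et al., which explains how environmental reliability affects marshmallow-test behavior. Later work went on to show a causal link between social trust and delay of gratification, and other researchers have explained the marshmallow experiment within a cost-benefit framework. The main conclusions of the three studies: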

(1)
Thus, wait-times on sustained delay-of-gratification tasks (e.g., the marshmallow task) may not only reflect differences in self-control abilities, but also beliefs about the stability of the world. [6]

(2)
These findings provide the first demonstration of a causal role for social trust in willingness to delay gratification, independent of other relevant factors, such as self-control or reward history. Thus, delaying gratification requires choosing not only a later reward, but a reward that is potentially less likely to be delivered, when there is doubt about the person promising it. Implications of this work include the need to revise prominent theories of delay of gratification, and new directions for interventions with populations characterized by impulsivity. [7]

(3)
We show empirically that people’s explicit predictions of remaining delay lengths indeed increase as a function of elapsed time in several relevant domains, implying that temporal judgments offer a rational basis for limiting persistence. We then develop our framework into a simple working model and show how it accounts for individual differences in a laboratory task (the well-known “marshmallow test”). We conclude that delay-of-gratification failure, generally viewed as a manifestation of limited self-control capacity, can instead arise as an adaptive response to the perceived statistics of one’s environment. [8]

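Myth 7: Delay of gratification = self-control, so Watts's study shows that self-control does not matter

The marshmallow experiment measures more than self-control; it may also reflect the participant's trust in the environment and in other people, among other things. Moreover, delay of gratification ≠ self-control: both intelligence and self-control correlate with delay of gratification, and both affect how well delay of gratification predicts academic achievement.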

We observed that delay of gratification was strongly correlated with concurrent measures of cognitive ability, and controlling for a composite measure of self-control explained only about 25% of our reported effects on achievement. These results suggest that the marshmallow test may capture something rather distinct from self-control. Indeed, Duckworth and colleagues (2013) also investigated the relations among delay of gratification, self-control, and intelligence using the data employed here, and they found that both self-control and intelligence mediated the relation between early delay ability and later outcomes. Our results further suggest that simply viewing delay of gratification as a component of self-control may oversimplify how it operates in young children.[9]

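The figure from Duckworth et al. [5] helps explain this (their data, like those in Watts's new marshmallow study, come from the NICHD-SECCYD). It shows the relations among delay of gratification, self-control, intelligence, and adolescent outcomes. Delay of gratification correlates with both self-control and intelligence; once these two variables are controlled, the association between delay of gratification and adolescent outcomes weakens and is no longer statistically significant. Like Watts's study, Duckworth et al. show that after controlling for cognitive ability and related variables, the association between delay of gratification and achievement disappears.

This means that teaching children to delay gratification may have little effect on their grades, but it does not mean self-control is unimportant for children. To repeat: delay of gratification ≠ self-control. Delay of gratification involves self-control, intelligence, and other factors. To pin down the effect of self-control on achievement, one needs other experiments that target self-control itself, rather than relying on the marshmallow delay task alone.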

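Myth 8: Cognitive ability tests are the only way to gauge the link between self-control and academic achievement

In Watts's study, academic achievement is measured with the Woodcock-Johnson Psycho-Educational Battery Revised (WJ-R), which is, put simply, a test of cognitive ability or intelligence. Mischel's study used the SAT, which is also a cognitive ability test.

Woodcock-Johnson Psycho-Educational Battery Revised (WJ-R) test (Woodcock, McGrew, & Mather, 2001), a commonly used measure of cognitive ability and achievement (e.g., Watts, Duncan, Siegler, & Davis-Kean, 2014). [9]

As noted earlier, delay of gratification involves both self-control and intelligence (a cognitive ability), and Watts's study measures academic achievement with the WJ-R, itself a cognitive test. If we want to gauge how much the self-control component of delay of gratification contributes to achievement, we need to ask how much self-control affects a one-off cognitive test, and how much it affects outcomes that reflect sustained effort over time. The figure from Duckworth et al. covers both standardized achievement test scores (measured with the WJ-R) and GPA, showing how delay of gratification relates to different kinds of academic achievement. Intelligence is positively and significantly related to standardized achievement test scores (WJ-R), but its relation to GPA is not statistically significant. Self-control, by contrast, is positively and significantly related to both standardized achievement test scores and GPA, and its association with GPA is stronger than its association with the WJ-R. Put differently, the contrast between standardized test scores (WJ-R) and GPA is a contrast between a one-off test and long-term cumulative effort. A measure such as GPA may therefore better capture the effect of self-control on academic achievement.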

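Myth 9: Watts's study demonstrates causation, with SES as the only or main cause of good or poor grades

As summarized above, in Watts's study both socioeconomic status (SES) and early cognitive ability may be important factors, and the study cannot say which matters more. What Watts's study finds is association, not causation.

For readers who want to know whether SES (here, whether the child's mother has a college degree) changes the link between early delay of gratification and later achievement, the technical details are in the original paper: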

Because we found little evidence supporting associations between early delay ability and later outcomes for the higher-SES sample, we next tested whether the different pattern of results observed between the higher- and lower-SES samples constituted a statistically significant difference. In Table 6, we present models that included interaction terms between the various measures of delay of gratification (i.e., the continuous and categorical measures) and the indicator for whether the participant’s mother completed college. None of the interactions tested were statistically significant, and our series of joint F tests indicated that the set of interactions for the categorical measures of delay of gratification did not statistically significantly contribute to any of the models (ps = .342–.968). However, as with the models that were run solely on the sample of children with college-educated mothers, standard errors were quite large for the interaction terms, indicating a substantial level of statistical imprecision. Unfortunately, the wide confidence intervals on many of the interaction terms render it impossible to provide a definitive answer to whether the relation between early delay ability and later achievement differs by SES. [9]

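Myth 10: The Child Background and HOME Controls in Watts's study cover only the family environment and not the child's cognitive ability

The controls fall into two main sets: Child background and HOME controls, and Concurrent 54-month controls. The Child background and HOME controls set also includes two cognitive measures, the Bayley MDI and the BBCS standard score.

We also included early indicators of child cognitive functioning, as measured at age 24 months by the Bayley Mental Development Index (MDI; Bayley, 1991) and at age 36 months by the Bracken Basic Concept Scale (BBCS; Bracken, 1984). The MDI measured children’s sensory-perceptual abilities, as well as their memory, problem solving, and verbal communication skills. The BBCS was an early measure of school readiness skills, and it required students to identify basic letters and numbers. [9]

In Manolo's answer, the long-dashed line in the figure is explained as "...after controlling for family and other variables, the above result abruptly disappears." The author of the top-voted answer was clear: "family and other variables" means that the Child background and HOME controls set covers both family and child background (which partly includes early cognitive ability). The two control sets in Watts's study are (a) HOME controls + Child background (partly including early cognitive ability) and (b) early cognitive ability. Readers in the comment section may have taken them to be (a) HOME controls and (b) early cognitive ability. Besides over-reading a causal link between family environment and achievement, readers may also have reduced "family and other variables" (SES, HOME controls + Child background, early cognitive ability) to "family variables", overlooking that the Child background and HOME controls set includes early cognitive ability.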

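Other views on the new marshmallow study

A typical commentary is Jessica Calarco's article in The Atlantic, "Why Rich Kids Are So Good at the Marshmallow Test", which interprets the new study in terms of family affluence and poverty. [10]

Tyler Watts does not disagree with this view: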

I don’t disagree with your take. I think our evidence suggests that SES and early cognitive ability were probably the big drivers of performance on the task, and the later predictive association. [11]

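In the comments under the top-voted answer, @司马懿 treats patience (delay of gratification) as a mediator and likewise reasons from the family environment; this is one possible explanation of the new study. Although Calarco's article cites several studies in support of the socioeconomic status (SES) view, it bears repeating that Watts's study only establishes possible associations: SES is one possible explanation, not the sole or main cause.

Calarco explains the new study mainly through family poverty, but the same results can also be interpreted in other ways, some of which support the original marshmallow findings.

Robert VerBruggen's article for the Institute for Family Studies, "Did the Marshmallow Test Fail to Replicate?", offers several possible explanations [12]:

1. The study controls for many variables at once and cannot identify a single main factor

Watts's study uses two control sets: the first covers family environment and the child's own background, the second only the child's cognitive ability. No variable is controlled on its own; each set bundles several variables. One interpretation is that SES drives achievement. But when the first set is controlled and the marshmallow effect shrinks, one cannot tell whether the child's own background or the family environment is responsible.

2. The study does not undercut the conclusion that self-control can improve children's achievement

Even if we assume that SES (here, mother's education, family income, and similar factors) is the key to children's achievement, that still does not negate the importance of willpower and self-control. If lower SES leads to poorer grades and higher SES to better grades, then controlling for the first set of variables controls for the quality of the family environment, and grades are driven mainly by SES. Children in the lower-SES group will naturally have lower average grades, and children in the higher-SES group higher ones. Controlling for SES accounts for the difference in average grades, but that does not rule out improving a child's grades through self-control.

3. Delay of gratification should not be treated purely as a self-control task

As noted above, the delay task correlates not only with attention, impulsivity, and self-control but also with cognitive ability, and delay time correlates more strongly with the WJ-R than with self-control measures. In short, delay of gratification ≠ self-control. The marshmallow test cannot show that self-control is unimportant, because it reflects cognitive ability as well as self-control. To assess the importance of self-control, it may be better to replace the delay task with other self-control tasks.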

Applied Problems subtest of the WJ-R, r(916) = .37, p < .001; and correlations with measures of attention, impulsivity, and self-control were lower in magnitude (rs = .22–.30, p < .001). [9]

4. Genetics may matter more for children than the shared environment

Still another interpretation is that genes are in play, a possibility buttressed by research in the field of behavioral genetics, which generally finds the entire “shared environment” to be a very small factor in how kids turn out. If kids get their impulsivity (not to mention other academically relevant traits) from their parents, and then you extensively control for what kind of parents the kids have, this too can generate the same result pattern: The Marshmallow Test is predictive when used in isolation, but its power fades in the presence of other variables that indirectly measure the same thing.

Genetic influences are not insurmountable obstacles; the clichéd example is myopia, partly genetic in cause yet easily remedied with eyeglasses. But then again eyeglasses for impulse control are hard to come by. [12]

In addition, Calarco's article cites Sendhil Mullainathan and Eldar Shafir's book Scarcity: Why Having Too Little Means So Much (published in Chinese as 《稀缺：我们是如何陷入贫穷和忙碌的》). Their research methods have also been strongly criticized; for the technical details see:

Wicherts, J. & Scholten, A. Comment on "Poverty Impedes Cognitive Function." Science 342, 1169-1169 (2013).

Mani et al. (Research Articles, 30 August, p. 976) presented laboratory experiments that aimed to show that poverty-related worries impede cognitive functioning. A reanalysis without dichotomization of income fails to corroborate their findings and highlights spurious interactions between income and experimental manipulation due to ceiling effects caused by short and easy tests. This suggests that effects of financial worries are not limited to the poor. [13]

McClelland, G., Lynch, J., Irwin, J., Spiller, S. & Fitzsimons, G. Median splits, Type II errors, and false–positive consumer psychology: Don't fight the power. Journal of Consumer Psychology 25, 679-689 (2015).

The next issue of Science printed a criticism of those findings by Wicherts and Scholten (2013). They reported that when the dichotomized indicators were replaced by the original continuous variables, the critical interactions were not significant at p < .05 in any of the three core studies: p values were .084, .323, and .164. In a reply to Wicherts and Scholten, Mani, Mullainathan, Shafir, and Zhao (2013b) justified their use of median splits by citing papers published in Science and other prestigious journals that also used median splits. This “Officer, other drivers were speeding too” defense is often tried but rarely persuasive, especially here when the results of the (nonsignificant) continuous analyses were known. Though Mani et al. further noted their effect reached the .05 level if one pooled the three studies, we would guess that the editor poured himself or herself a stiff drink the night after reading Wicherts and Scholten's critique and the Mani et al. reply. It is hard to imagine that Science or many less prestigious journals would have published the paper had the authors initially reported the correct analyses with a string of three nonsignificant findings conventionally significant only by meta-analysis at the end of the paper. The reader considering the use of median splits should consider living through a similarly deflating experience. Splitting the data at the median resulted in an inaccurate sense of the magnitude of the fragile and small interaction effect (in this case, an interaction that required the goosing of a meta-analysis to reach significance), and a publication that was unfortunately subject to easy criticism. [14]
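For readers unfamiliar with why median splits are criticized, the following sketch illustrates the most basic problem: replacing a continuous predictor with a high/low indicator throws information away. It does not reproduce the analysis in Mani et al. or in the critiques quoted above (those concern an interaction term); the variable names, the slope of 0.3, and the sample size are assumptions made purely for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Dichotomize: 1.0 if strictly above the sample median, else 0.0."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    median = ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2
    return [1.0 if v > median else 0.0 for v in values]


rng = random.Random(1)
n = 20000
# Hypothetical continuous predictor and an outcome that depends on it linearly.
income = [rng.gauss(0, 1) for _ in range(n)]
score = [0.3 * x + rng.gauss(0, 1) for x in income]

r_continuous = pearson(income, score)          # uses the full variable
r_split = pearson(median_split(income), score)  # uses only "high" vs "low"

print(f"continuous predictor:   r = {r_continuous:.3f}")
print(f"median-split predictor: r = {r_split:.3f}")
```

With a normally distributed predictor, the dichotomized version recovers only about four fifths of the original correlation, which is one reason the continuous analysis is generally preferred when the continuous variable is available.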

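The top-voted popular-science answer is easy to follow: it condenses a systematic body of work into a few key points. But ordinary readers cannot necessarily reconstruct the full picture from those few points. When the new marshmallow results came out, some people rushed to label the Marshmallow Test as toxic chicken soup; they may have overlooked other details mentioned in the popular-science piece, or even stretched its simplified conclusions and over-interpreted the new study.

Success-literature articles even more often report only a study's results, without its premises, limitations, or alternative explanations. Some theories are indeed useful, but perhaps not as useful as researchers, WeChat public accounts, and self-help books claim. Deliberate practice, grit, and growth mindset are examples: this research has spread widely through success literature and TED talks. Yet there is often a gap between academia and the public, and between what is available in foreign-language and in Chinese sources.

Deliberate practice, for instance, appears to have been oversold in some success-literature articles. See @沉默的马大爷's answer to "What are the shortcomings of deliberate practice?"

Anders Ericsson's emphasis on the importance of deliberate practice is at odds with the views of other researchers [15].

In March of this year, a growth-mindset experiment in which Carol S. Dweck took part replicated successfully [18]. A recent pair of meta-analyses of growth mindset reports [16]: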

Correlation of growth mindset with achievement is tiny, r = .10, 95% confidence interval (CI) = [.08, .13], p < .001.
Effect of growth mindset interventions on achievement is tiny, d = .08

we examined the effectiveness of mind-set interventions on academic achievement and potential moderating factors. Overall effects were weak for both meta-analyses. However, some results supported specific tenets of the theory, namely, that students with low socioeconomic status or who are academically at risk might benefit from mind-set interventions. [16]
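To give a feel for how small the effect sizes quoted above are, the short calculation below converts them into more familiar terms. The values r = .10 and d = .08 are taken from the quoted meta-analysis; the helper names are my own, and the d-to-r conversion uses the standard formula that assumes two groups of equal size, which is an assumption and not something stated in the source.

```python
import math


def variance_explained(r):
    """Share of variance in one variable accounted for by the other (r squared)."""
    return r ** 2


def d_to_r(d):
    """Convert Cohen's d to a correlation, assuming two equal-sized groups."""
    return d / math.sqrt(d ** 2 + 4)


r_mindset = 0.10        # correlation of growth mindset with achievement [16]
d_intervention = 0.08   # effect of mindset interventions on achievement [16]

share = variance_explained(r_mindset)
r_equivalent = d_to_r(d_intervention)

print(f"variance explained by mindset: {share:.1%}")
print(f"intervention effect expressed as r: {r_equivalent:.3f}")
```

In other words, a correlation of .10 corresponds to about 1% of the variance in achievement, and an intervention effect of d = .08 corresponds to a correlation of roughly .04.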

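Although the growth-mindset study replicated and the meta-analyses also lend the theory some support, teachers and individuals should watch for over-investing in growth-mindset applications at the expense of other issues. The Wired article "Everyone's favourite psychology theory isn't all it's cracked up to be" sets out differing views side by side, letting readers weigh from several angles whether growth mindset deserves the promotion and emphasis it receives.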

Ritchie agrees with the paper's authors, however, saying there has been too much emphasis put on growth mindsets.

"The results of this study should make teachers -- many of whom are very interested in the topic of mindset, and have changed their teaching practice because of it -- seriously reconsider the amount of time, effort, and resources their schools invest in promoting 'growth mindsets' in their students", he says. "The benefits appear to have been substantially oversold."

When it comes to how to educate children, McNamara wishes to leave those decisions to the people involved. "There may be a small net benefit to these interventions," she says. "Conversely, there may not be an overall effect."

But in the context of a situation in which many other techniques are not proven to work at all, Yeager thinks it's best to persevere. You may say he has a growth mindset about growth mindsets.

"The fact that such light touch interventions can ever have any effect on important, multiply-determined outcomes is somewhat amazing," says Yeager, "especially when you consider that many, or even most very extensive and expensive educational programs have no effect at all." [17]

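Deliberate practice, grit, and growth mindset have all spawned popular-science books. The Chinese translations usually appear some time after the originals, but publishers import only the book itself, along with unanimous praise from experts and non-experts at home and abroad. Many differing views (supportive, neutral, critical) are scattered across Western media, and the academic literature can also be searched directly. For readers who want to apply the theories in these books, telling apart what current research actually says from what is over-hyped success literature may be the better approach. Original papers and popular-science articles still differ, and reading the original paper is a good way to understand the marshmallow conceptual replication more fully. If one ignores a study's premises and limitations and rushes to apply a new finding taken from a popular-science article, labeling it either a panacea or toxic chicken soup goes too far.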

References

1. Nussbaum, D. Conceptual Replication. David Nussbaum (2012). at <http://davenussbaum.com/blog/conceptual-replication-part-i>

2. Watts, T. “Replication” is in the Eye of the Beholder. NYU Steinhardt At a Glance (2018). at <https://steinhardt.nyu.edu/site/ihdscblog/2018/06/12/replication-is-in-the-eye-of-the-beholder/>

3. Toppo, G. New findings cast doubt on 'marshmallow test' success claims. Insidehighered.com (2018). at <https://www.insidehighered.com/news/2018/06/06/new-findings-cast-doubt-marshmallow-test-success-claims?platform=hootsuite>

4. Resnick, B. The "marshmallow test" said patience was a key to success. A new replication tells us s’more. Vox (2018). at <https://www.vox.com/science-and-health/2018/6/6/17413000/marshmallow-test-replication-mischel-psychology>

5. Duckworth, A., Tsukayama, E. & Kirby, T. Is It Really Self-Control? Examining the Predictive Power of the Delay of Gratification Task. Personality and Social Psychology Bulletin 39, 843-855 (2013).

6. Kidd, C., Palmeri, H. & Aslin, R. Rational snacking: Young children’s decision-making on the marshmallow task is moderated by beliefs about environmental reliability. Cognition 126, 109-114 (2013).

7. Michaelson, L., de la Vega, A., Chatham, C. & Munakata, Y. Delaying gratification depends on social trust. Frontiers in Psychology 4, (2013).

8. McGuire, J. & Kable, J. Rational temporal predictions can underlie apparent failures to delay gratification. Psychological Review 120, 395-410 (2013).

9. Watts, T., Duncan, G. & Quan, H. Revisiting the Marshmallow Test: A Conceptual Replication Investigating Links Between Early Delay of Gratification and Later Outcomes. Psychological Science 095679761876166 (2018). doi:10.1177/0956797618761661

10. Calarco, J. Why Rich Kids Are So Good at the Marshmallow Test. The Atlantic (2018). at <https://www.theatlantic.com/family/archive/2018/06/marshmallow-test/561779/>

11. Watts, T. Twitter. Twitter.com (2018). at <https://twitter.com/tw_watts/status/1001125114778476544>

12. VerBruggen, R. Did the Marshmallow Test Fail to Replicate?. Institute for Family Studies (2018). at <https://ifstudies.org/blog/did-the-marshmallow-test-fail-to-replicate>

13. Wicherts, J. & Scholten, A. Comment on "Poverty Impedes Cognitive Function." Science 342, 1169-1169 (2013).

14. McClelland, G., Lynch, J., Irwin, J., Spiller, S. & Fitzsimons, G. Median splits, Type II errors, and false–positive consumer psychology: Don't fight the power. Journal of Consumer Psychology 25, 679-689 (2015).

15. Hambrick, D., Burgoyne, A., Macnamara, B. & Ullén, F. Toward a multifactorial model of expertise: beyond born versus made. Annals of the New York Academy of Sciences (2018). doi:10.1111/nyas.13586

16. Sisk, V., Burgoyne, A., Sun, J., Butler, J. & Macnamara, B. To What Extent and Under Which Circumstances Are Growth Mind-Sets Important to Academic Achievement? Two Meta-Analyses. Psychological Science 29, 549-571 (2018).

17. Beall, A. Everyone’s favourite psychology theory isn’t all it’s cracked up to be. Wired.co.uk (2018). at <http://www.wired.co.uk/article/growth-mindset-education-psychological-theory-children-mirage>

18. Yeager, D. S. et al. MANUSCRIPT UNDER REVISION: Where and For Whom Can a Brief, Scalable Mindset Intervention Improve Adolescents’ Educational Trajectories? (2018). Available at: http://osf.io/r82dw.

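This "reversal" surely deserves a place: after drawing a more representative sample and controlling for family background, cognitive ability, and other variables, the results of the famous marshmallow experiment could not be reproduced. You have certainly read about the experiment in countless psychology books, self-help books, and WeChat public-account articles: children are shown a selection of snacks and told that the experimenter will leave for a while. If they can hold off eating until the experimenter returns, they get more treats. The designers of the experiment, Mischel and colleagues, found that children who were able to resist went on to get better grades and have smoother careers.

Striking as the result was, the original design was not perfect. First, the sample was quite homogeneous, consisting almost entirely of children of Stanford University faculty and staff. Second, when the researchers followed up more than ten years later, they reached only about a quarter of the participants. Third, there were problems with control variables. In the new replication, the three authors drew a sample from across the United States that was more representative in economic status, ethnic composition, and family education. They also controlled for many variables: on the family side, mother's education, socioeconomic status, and a home environment index; on the child side, memory, literacy, reading, and other individual factors.

After repeating the experiment with these children at age four and a half, the results were as follows. First, with no controls, the length of time a child resisted temptation was indeed closely related to test scores at age 15. However, once family and other variables were controlled, that result abruptly disappeared. The relation between waiting time and later achievement changed from a positive correlation to a nonlinear one: compared with children who waited 2-4 minutes, those who waited longer than 4 minutes may even have done slightly worse. Finally, after also controlling for child-level factors, the relation remained nonlinear, and almost none of the differences between the waiting-time groups were significant. So rather than patience producing success, it is more likely that the quality of the family environment is positively related to both.

References: Mischel, W., Shoda, Y., and Peake, P. K. "The nature of adolescent competencies predicted by preschool delay of gratification." Journal of Personality and Social Psychology 54.4 (1988): 687-696.

Watts, T. W., Duncan, G., and Quan, H. "Revisiting the marshmallow test: A conceptual replication investigating links between early delay of gratification and later outcomes." Forthcoming, Psychological Science.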
