Which research findings in psychology in 2017 overturned classic theories?

A related answer from user Taniyama-11-17:

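@Manolo's answer introduced Tyler W. Watts's new conceptual replication of the marshmallow experiment, but because it was written for a general audience it left out many details. The comment section below it shows that some people have misread and over-interpreted Watts's study. Some readers may now think that Mischel's marshmallow experiment is feel-good "chicken soup for the soul" that misled the public, and that Watts's study completely overturned it. Readers who rely only on popular-science summaries, skip the original literature, and over-interpret the new study on their own can easily reach wrong conclusions.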

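Zhihu is not a venue for academic discussion, but because the comments under the top-voted answer contain misunderstandings about both the new study and the marshmallow experiment, I wrote this answer to fill in the missing information. It is all "science" and no "popularization": it quotes many original passages verbatim instead of paraphrasing them, so that simplified details do not lead readers to distort the conclusions yet again. It is meant for readers who want to dig further into the marshmallow-test literature, and it tries to clear up some common myths.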

Tyler Watts's main conclusions

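1. Tyler Watts's study is a conceptual replication, not a direct replication of Mischel's marshmallow experiment. The methods differ in several respects, for example the maximum waiting time (7 min vs. 15–20 min) and the measure of academic achievement (WJ-R vs. SAT). The correlations and effect sizes found in a conceptual replication can therefore differ from those of the original experiment.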

Note: What is conceptual replication?
There is no substitute for direct replication – if you cannot reproduce the same result using the same methods then you cannot have a cumulative science. But conceptual replication also has a very important role to play in psychological science. What is conceptual replication? It’s when instead of replicating the exact same experiment in exactly the same way, we test the experiment’s underlying hypothesis using different methods. [1]

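2. Watts's conceptual replication mainly tests two outcomes that Mischel's delay-of-gratification work predicted. Looking at the subsample of children whose mothers did not complete college: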

(1) academic achievement

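Overall, this part can be called a successful replication: delay of gratification is correlated with achievement. In detail, the results fall into three tiers: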

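I. With no controls (No Controls): the correlation between delay of gratification and achievement is about half the size of the original study's effect, and it is statistically significant.
II. Controlling for child background and family variables (Child Background and HOME Controls Only): the association shrinks by about two thirds relative to the No Controls estimate. It is weaker after these controls but still statistically significant.
III. Controlling for child background and family variables plus concurrent cognitive ability (Child background and HOME + Concurrent 54-Month Controls): with these additional controls, nearly all differences become non-significant (only the 2–7 min delay group remains statistically significant), and the association between how long a child waited and later achievement disappears.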

(2) behavioral outcomes

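The replication failed: delay of gratification is not associated with behavioral outcomes (e.g., antisocial behavior, depression, and so on). See the Behavior composite part of the figure [9].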

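The figure shows the group of children whose mothers do not have a college degree; the left half shows academic achievement and the right half shows behavior problems.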

Columns 1, 4, 7, 10: no controls (No Controls)
Columns 2, 5, 8, 11: child background and family controls (Child Background and HOME Controls)
Columns 3, 6, 9, 12: child background and family controls plus early cognitive ability (Child background and HOME + Concurrent 54-Month Controls)

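3. In Watts's study, socioeconomic status (SES) and early cognitive ability may both be important factors, and the study cannot tell which of the two matters more. In other words, rather than delay of gratification driving achievement, in this study it is more likely that SES and early cognitive ability drive it. The study also cannot conclude that SES is the only or the main factor behind achievement.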
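Why can a correlation that is clearly present without controls nearly vanish once background variables are added? Below is a minimal sketch under a purely hypothetical data-generating process: one background factor `z` (a stand-in for SES and early cognitive ability together) drives both delay time `x` and later achievement `y`, and `x` has no direct effect on `y`. This is not Watts et al.'s model or data; it only illustrates the logic using the standard first-order partial correlation formula. All function names here are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def simulate(n=5000, seed=1):
    """Hypothetical data: background factor z drives both delay time x and
    later achievement y; x has NO direct effect on y in this toy model."""
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]  # delay time (standardized units)
    y = [zi + rng.gauss(0, 1) for zi in z]  # later achievement
    return x, y, z


if __name__ == "__main__":
    x, y, z = simulate()
    print(f"raw r(delay, achievement)    = {pearson(x, y):.2f}")
    print(f"partial r, controlling for z = {partial_corr(x, y, z):.2f}")
```

Under these assumptions the raw correlation comes out near 0.5 and the partial correlation near 0. The shrinkage is consistent with confounding, but on its own it cannot say which background variable is responsible, which is exactly why the study cannot rank SES against early cognitive ability.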

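Those are the main conclusions of Watts's study. If, at this point, you hold any of the following beliefs, you have probably misunderstood the details of the delay-of-gratification research.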

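1. Watts proved that Mischel's marshmallow experiment is toxic "chicken soup for the soul" (or some other version of "it was a flawed study")
2. In the original marshmallow experiment, Mischel did not mention other explanatory factors
3. Delay of gratification is a panacea and an indicator that predicts future success
4. The delay-of-gratification task has nothing to recommend it as a design
5. "I don't like sweets," on the assumption that the marshmallow experiment offers children nothing but marshmallows
6. The marshmallow experiment measures only self-control
7. Delay of gratification = self-control, so Watts's study shows that self-control doesn't matter
8. Cognitive ability tests (intelligence tests) are the only representative way to measure the link between self-control and academic achievement
9. Watts's study shows causation, with SES as the only or main cause of good or poor achievement
10. In Watts's study, the Child Background and HOME Controls variables cover only the family environment and not the child's cognitive ability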

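If, based on these myths, you conclude that the marshmallow experiment cannot be replicated at all, that it is toxic chicken soup, that socioeconomic status is the main factor, or that self-control does not matter at all, you have probably read too many self-help articles. You have overlooked the other explanations Mischel gave in the original research, and may even have read into Watts's study conclusions that appear nowhere in the literature.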

Myth 1: Watts proved that Mischel's marshmallow experiment is toxic chicken soup for the soul (or a flawed study)

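As for Watts's study "overturning" the marshmallow experiment: Mischel's findings were partly replicated and partly not, but the study certainly does not say the marshmallow experiment was simply wrong.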

When our new results are interpreted, they should be viewed alongside the older studies, and should be seen as adding shades of complexity to what we already know — not interpreted as definitive proof that the original work was false. In many cases, treating a single study with the word “replication” in the title as definitive proof of anything falls prey to the same error that led us to over-interpret the results from the original work. [2]

Myth 2: In the original marshmallow experiment, Mischel did not mention other explanatory factors

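Some readers' preconceptions probably come from self-help articles that apply research conclusions directly and never mention other explanatory factors. An Inside Higher Ed article, "Softening Claims of the Marshmallow Test", includes Mischel's view of Watts's study. Mischel describes the other abilities that explain performance on the marshmallow test, and what the test was designed to reveal.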

In an email, Mischel noted that the data set used by Watts et al. found "a significant positive correlation between delay time and academic achievement" that strengthens earlier findings.

He also said years of research by him and his colleagues, as well as by others, have found that "a child's ability to wait in the 'marshmallow test' situation reflects that child’s ability to engage various cognitive and emotion-regulation strategies and skills that make the waiting situation less frustrating. Therefore, it is expected and predictable, as the Watts paper shows, that once these cognitive and emotion-regulation skills, which are the skills that are essential for waiting, are statistically 'controlled out,' the correlation is indeed diminished."

He added, "The 'marshmallow test' was developed to find a method -- a window -- that allows us to see how people, especially children, manage to deal with the frustration of waiting for something they really want to have. This window opened the way for many experiments that identified the conditions and mental-emotional strategies and skills that make this challenge manageable or not. The long list includes, for example, trust in the promise-maker, and diverse cognitive and 'cooling' strategies to make the waiting easier." [3]

Myth 3: Delay of gratification is a panacea and an indicator that predicts future success

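Self-help articles sometimes rely on hyped, misleading headlines and over-interpret and over-generalize research findings. As soon as Watts's study was published, some people branded the marshmallow experiment toxic chicken soup, yet these same people often ignore the original study's design, its limitations, and other possible influences, and simply take its conclusions at face value. Mischel points out in the article that the marshmallow test is not a crystal ball that predicts the future, and that training children to wait for marshmallows is not a panacea.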

But Mischel said his research and subsequent work based on it "does not suggest that the method is a crystal ball that predicts our future, or that training children to wait for marshmallows is a panacea. A close reading of the Watts et al. paper adds to this understanding. Unfortunately, our 1990 paper’s own cautions to resist sweeping over-generalizations, and the volume of research exploring the conditions and skills underlying the ability to wait, have been put aside for more exciting but very misleading headline stories over many years." [3]

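There can be a huge gap between what the original research concluded and how pop-psychology and self-help articles describe the marshmallow experiment. A Vox article, "The 'marshmallow test' said patience was a key to success. A new replication tells us s'more.", covers how Walter Mischel and Yuichi Shoda interpret their own research.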

In fairness to Mischel and his colleagues, their findings, as written in 1990, were not so sweeping. In the study linking delay of gratification to SAT scores, the researchers acknowledged the possibility that with a bigger sample size, the magnitude of their correlation could decrease. They also mentioned that the stability of the home environment may play a more important role than their test was designed to reveal. It also wasn’t an experiment. The results also didn’t necessarily mean that teaching kids to delay their gratification would cause these benefits later on.

“The findings of that study were never intended to be prescriptions for an application,” Yuichi Shoda, a co-author on the 1990 paper linking delay of gratification to SAT scores, says in an email. “Our paper does not mention anything about interventions or policies.” And they readily admit that the delay task is the result of a whole host of factors in a child’s life. “‘Controlling out’ those variables, which contribute to the diagnostic value of the delay measure, would be expected to reduce their correlations,” Mischel, who says he welcomes the new paper, writes. In an interview with PBS in 2015, he said “the idea that your child is doomed if she chooses not to wait for her marshmallows is really a serious misinterpretation.” [4]

Myth 4: The delay-of-gratification task has nothing to recommend it as a design

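The marshmallow experiment's design has certain limitations, but compared with questionnaires, the delay-of-gratification task still has advantages in some respects.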

Long assumed central to successful development, self-control has only within the last half century become the object of productive scientific inquiry. While not the only valid measure of self-control available to researchers, the delay of gratification task has crucial advantages. Most notably, the delay task obviates the well-known limitations of questionnaire measures (e.g., faking, social desirability bias, acquiescence bias, and reference bias). [5]

Myth 5: "I don't like sweets," on the assumption that the marshmallow experiment offers children nothing but marshmallows

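The marshmallow experiment actually offers several kinds of treats and lets the child pick the one they want. If you don't like sweets, you can pick something salty or whatever else you'd like to eat. Many factors drive individual differences in how long children wait, and the task's design includes features that address several of them.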

What makes the delay of gratification task so exquisitely sensitive to individual differences in self-control? We can only speculate, but several features of the paradigm seem worth highlighting. First, the child is presented with a range of treats from which they choose their favorite. Temptation is thus maximized by using a treat the child really likes, but the very trivial amount of snack likely precludes hunger impulses to swamp self-regulatory processes, as evidenced by a near-zero correlation between self-reported hunger ratings at the start of the task and delay time in Study 1. Second, the task is administered in a quiet, empty room in which the child is left alone to ponder, continuously, his or her choice—shall I continue to wait or shall I gobble up this smaller treat right now? In the absence of external distractions, with temptation lying within easy reach and in plain sight, children rely on self-regulatory strategies of varying effectiveness (Carlson & Beck, 2009). Third, before leaving, the experimenter emphasizes to the child that she doesn’t care much what the child ultimately decides to do. This minimizes the possibility that children wait to comply with authority, as seems to be the case in other tasks (e.g., the gift delay task in Funder et al., 1983). Finally, unlike more easily administered measures in which individuals make discreet (and irrevocable) choices between smaller, sooner and larger, later rewards, the delay task begins with the (universal) election for larger, later treats and then tests the ability to sustain the decision to wait. [5]

Myth 6: The marshmallow experiment measures only self-control

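What does the marshmallow experiment measure? Readers steeped in self-help and intro-psychology books may answer "self-control", but delay of gratification may reflect other factors as well. For example, in an answer to "What are some famous or interesting longitudinal studies in psychology?", @Mon1st introduced Kidd et al.'s study, which shows how environmental reliability affects behavior in the marshmallow task. Later research went on to demonstrate a causal role for social trust in delay of gratification, and other researchers have explained the task within a cost-benefit framework. The main conclusions of these three studies: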

(1)
Thus, wait-times on sustained delay-of-gratification tasks (e.g., the marshmallow task) may not only reflect differences in self-control abilities, but also beliefs about the stability of the world. [6]

(2)
These findings provide the first demonstration of a causal role for social trust in willingness to delay gratification, independent of other relevant factors, such as self-control or reward history. Thus, delaying gratification requires choosing not only a later reward, but a reward that is potentially less likely to be delivered, when there is doubt about the person promising it. Implications of this work include the need to revise prominent theories of delay of gratification, and new directions for interventions with populations characterized by impulsivity. [7]

(3)
We show empirically that people’s explicit predictions of remaining delay lengths indeed increase as a function of elapsed time in several relevant domains, implying that temporal judgments offer a rational basis for limiting persistence. We then develop our framework into a simple working model and show how it accounts for individual differences in a laboratory task (the well-known “marshmallow test”). We conclude that delay-of-gratification failure, generally viewed as a manifestation of limited self-control capacity, can instead arise as an adaptive response to the perceived statistics of one’s environment. [8]

Myth 7: Delay of gratification = self-control, so Watts's study shows that self-control doesn't matter

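The marshmallow experiment measures more than self-control; it may also capture a child's trust in the environment and in other people. Moreover, delay of gratification ≠ self-control: both intelligence and self-control correlate with delay of gratification, and both affect how well delay of gratification predicts academic achievement.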

We observed that delay of gratification was strongly correlated with concurrent measures of cognitive ability, and controlling for a composite measure of self-control explained only about 25% of our reported effects on achievement. These results suggest that the marshmallow test may capture something rather distinct from self-control. Indeed, Duckworth and colleagues (2013) also investigated the relations among delay of gratification, self-control, and intelligence using the data employed here, and they found that both self-control and intelligence mediated the relation between early delay ability and later outcomes. Our results further suggest that simply viewing delay of gratification as a component of self-control may oversimplify how it operates in young children.[9]

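The figure above [5] comes from Duckworth et al.'s study, which uses data from the same source as Watts's new marshmallow study (the NICHD-SECCYD). It shows the relations among delay of gratification, self-control, intelligence, and adolescent outcomes. Delay of gratification is correlated with both self-control and intelligence; once these two variables are controlled, the association between delay of gratification and adolescent outcomes weakens and is no longer statistically significant. Like Watts's study, Duckworth et al. show that after controlling for cognitive ability and related factors, the correlation between delay of gratification and achievement disappears.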

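This means that teaching children to delay gratification may do little for their achievement, but it does not mean self-control is unimportant for children. To repeat: delay of gratification ≠ self-control. Delay of gratification involves self-control, intelligence, and other factors. To find out how self-control itself affects achievement, one would need experiments designed to target self-control, rather than relying on the marshmallow test's delay-of-gratification task alone.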

Myth 8: Cognitive ability tests are the only representative way to measure the link between self-control and academic achievement

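Watts's study measured academic achievement with the Woodcock-Johnson Psycho-Educational Battery Revised (WJ-R), which is, simply put, a cognitive ability and intelligence test. Mischel's work used the SAT, which is also a cognitive ability test.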

Woodcock-Johnson Psycho-Educational Battery Revised (WJ-R) test (Woodcock, McGrew, & Mather, 2001), a commonly used measure of cognitive ability and achievement (e.g., Watts, Duncan, Siegler, & Davis-Kean, 2014). [9]

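As noted earlier, delay of gratification involves both self-control and intelligence, a cognitive ability. Watts's study measured academic achievement with the WJ-R, an intelligence-type test. If we want to know how the self-control component of delay of gratification affects achievement, we should ask how much self-control matters for a short cognitive test, and how much it matters for achievement built up through sustained effort over the long term.

In the figure, Duckworth et al.'s study includes not only standardized achievement test scores (measured by the WJ-R) but also GPA, showing how delay of gratification relates to different kinds of academic achievement. Intelligence is positively and significantly related to standardized achievement test scores (WJ-R), but its relation to GPA is not statistically significant. Self-control, by contrast, is positively and significantly related to both standardized achievement test scores and GPA, and its relation to GPA is stronger than its relation to the WJ-R. Put differently, comparing the WJ-R with GPA contrasts short-term test performance with long-term accumulated effort, and measuring achievement with something like GPA may better capture the influence of self-control.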

Myth 9: Watts's study shows causation, with SES as the only or main cause of good or poor achievement

As summarized above, in Watts's study socioeconomic status (SES) and early cognitive ability may both be important factors, and the study cannot tell which matters more. What the study establishes is association, not causation.

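For whether the relation between early delay of gratification and later academic achievement differs by SES (here, whether the child's mother has a college degree), see the technical details in Watts's original paper: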

Because we found little evidence supporting associations between early delay ability and later outcomes for the higher-SES sample, we next tested whether the different pattern of results observed between the higher- and lower-SES samples constituted a statistically significant difference. In Table 6, we present models that included interaction terms between the various measures of delay of gratification (i.e., the continuous and categorical measures) and the indicator for whether the participant’s mother completed college. None of the interactions tested were statistically significant, and our series of joint F tests indicated that the set of interactions for the categorical measures of delay of gratification did not statistically significantly contribute to any of the models (ps = .342–.968). However, as with the models that were run solely on the sample of children with college-educated mothers, standard errors were quite large for the interaction terms, indicating a substantial level of statistical imprecision. Unfortunately, the wide confidence intervals on many of the interaction terms render it impossible to provide a definitive answer to whether the relation between early delay ability and later achievement differs by SES. [9]

Myth 10: In Watts's study, the Child Background and HOME Controls variables cover only the family environment and not the child's cognitive ability

The controls fall into two main sets: Child background and HOME controls, and Concurrent 54-month controls. The Child background and HOME controls set itself includes two measures of cognitive ability, the Bayley MDI and the BBCS standard score.

We also included early indicators of child cognitive functioning, as measured at age 24 months by the Bayley Mental Development Index (MDI; Bayley, 1991) and at age 36 months by the Bracken Basic Concept Scale (BBCS; Bracken, 1984). The MDI measured children’s sensory-perceptual abilities, as well as their memory, problem solving, and verbal communication skills. The BBCS was an early measure of school readiness skills, and it required students to identify basic letters and numbers. [9]

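Manolo's answer explains the long dashed line in the figure as "…after controlling for family and other variables, the above result abruptly disappears." The author of that top-voted answer was careful to say family *and other* variables: the Child background and HOME controls set includes both family variables and child background, and part of the child background is early cognitive ability. Watts's study therefore has two main control sets: one is HOME controls + Child background (partly including early cognitive ability), and the other is early cognitive ability. Readers in the comment section may have taken them to be one set of HOME controls and one set of early cognitive ability. Besides over-reading a causal link between family environment and achievement, such readers may also have reduced "family and other variables" (SES, HOME controls + Child background, early cognitive ability) to family variables alone, missing that the Child background and HOME controls set already includes early cognitive ability.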

Other perspectives on the new marshmallow study

A common line of commentary, such as Jessica Calarco's article in The Atlantic, "Why Rich Kids Are So Good at the Marshmallow Test", explains the new study in terms of family wealth and poverty. [10]

Tyler Watts does not disagree with this view:

I don’t disagree with your take. I think our evidence suggests that SES and early cognitive ability were probably the big drivers of performance on the task, and the later predictive association. [11]

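In the comments under the top-voted answer, @司马懿 treated patience (delay of gratification) as a mediator, again analyzing it through the family environment; this is one possible explanation of the new study. Calarco's article cites several studies supporting the socioeconomic status (SES) view, but it bears repeating that Watts's study found only associations: SES is one possible explanation, not an established sole or main cause.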

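Calarco interprets the new study mainly through family poverty, but the same results can also be read in other ways that are consistent with the original marshmallow findings.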

Robert VerBruggen's "Did the Marshmallow Test Fail to Replicate?" at the Institute for Family Studies offers several possible explanations [12]:

1. Many variables are controlled at once, so no single main driver can be identified

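Watts's study uses two sets of control variables: the first covers the family environment and the child's own background, and the second covers only the child's cognitive ability. The study never controls for a single variable on its own; each set contains several variables. One reading is that SES affects achievement: when the first set is controlled, the marshmallow effect weakens, but the study cannot tell whether the child's own background or the family environment is responsible.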

2. The results do not show that self-control is useless for improving children's achievement

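Even if we assume that SES (here, factors such as the mother's education and household income) is the key to children's achievement, that does not rule out the importance of willpower and self-control. Suppose lower SES leads to poorer achievement and higher SES to better achievement. Controlling for the first set of variables holds the quality of the family environment constant, and so removes the part of achievement driven by SES: lower-SES children naturally score lower on average and higher-SES children higher. Controlling for SES accounts for these average differences, but it does not show that children's achievement cannot be improved through self-control.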

3. Delay of gratification should not be treated as purely a self-control task

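As noted earlier, the delay-of-gratification task correlates not only with attention, impulsivity, and self-control but also with cognitive ability, and delay time correlates more strongly with the WJ-R Applied Problems subtest than with measures of attention, impulsivity, and self-control. Simply put, delay of gratification ≠ self-control. The marshmallow experiment cannot show that self-control is unimportant, because the test reflects cognitive ability as well as self-control. To gauge how much self-control matters, other self-control tasks are probably better suited than the delay-of-gratification task.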

Applied Problems subtest of the WJ-R, r(916) = .37, p < .001; and correlations with measures of attention, impulsivity, and self-control were lower in magnitude (rs = .22–.30, p < .001). [9]
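To make the comparison in this quote concrete, here is a small sketch that converts the reported correlations into a t statistic (the usual test of a Pearson correlation, with df = n − 2) and into shared variance (r²). These are textbook conversions applied to the numbers quoted above, not calculations taken from the paper; the function names are illustrative.

```python
import math


def t_from_r(r, df):
    """t statistic for H0: rho = 0, given Pearson r and df = n - 2."""
    return r * math.sqrt(df) / math.sqrt(1 - r ** 2)


def variance_explained(r):
    """Proportion of variance shared by two variables (r squared)."""
    return r ** 2


if __name__ == "__main__":
    # Numbers quoted from Watts et al. [9]
    r_applied, df = 0.37, 916
    print(f"t({df}) = {t_from_r(r_applied, df):.1f}")
    print(f"r^2 (WJ-R Applied Problems) = {variance_explained(r_applied):.3f}")
    for r in (0.22, 0.30):  # range reported for attention/impulsivity/self-control
        print(f"r = {r:.2f} -> r^2 = {variance_explained(r):.3f}")
```

With r = .37 and df = 916, t comes out at roughly 12, and delay time shares about 14% of its variance with the Applied Problems score, versus roughly 5% to 9% for the attention, impulsivity, and self-control measures (rs = .22–.30).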

4. Genes matter more for how children turn out than the shared environment does

Still another interpretation is that genes are in play, a possibility buttressed by research in the field of behavioral genetics, which generally finds the entire “shared environment” to be a very small factor in how kids turn out. If kids get their impulsivity (not to mention other academically relevant traits) from their parents, and then you extensively control for what kind of parents the kids have, this too can generate the same result pattern: The Marshmallow Test is predictive when used in isolation, but its power fades in the presence of other variables that indirectly measure the same thing.

Genetic influences are not insurmountable obstacles; the clichéd example is myopia, partly genetic in cause yet easily remedied with eyeglasses. But then again eyeglasses for impulse control are hard to come by. [12]

In addition, Calarco's article cites Sendhil Mullainathan and Eldar Shafir's book Scarcity: Why Having Too Little Means So Much (Chinese edition: 《稀缺：我们是如何陷入贫穷和忙碌的》). Their research methods have also drawn strong criticism; for the technical details, see:

Wicherts, J. & Scholten, A. Comment on "Poverty Impedes Cognitive Function." Science 342, 1169 (2013).

Mani et al. (Research Articles, 30 August, p. 976) presented laboratory experiments that aimed to show that poverty-related worries impede cognitive functioning. A reanalysis without dichotomization of income fails to corroborate their findings and highlights spurious interactions between income and experimental manipulation due to ceiling effects caused by short and easy tests. This suggests that effects of financial worries are not limited to the poor. [13]

McClelland, G., Lynch, J., Irwin, J., Spiller, S. & Fitzsimons, G. Median splits, Type II errors, and false-positive consumer psychology: Don't fight the power. Journal of Consumer Psychology 25, 679-689 (2015).

The next issue of Science printed a criticism of those findings by Wicherts and Scholten (2013). They reported that when the dichotomized indicators were replaced by the original continuous variables, the critical interactions were not significant at p < .05 in any of the three core studies: p values were .084, .323, and .164. In a reply to Wicherts and Scholten, Mani, Mullainathan, Shafir, and Zhao (2013b) justified their use of median splits by citing papers published in Science and other prestigious journals that also used median splits. This “Officer, other drivers were speeding too” defense is often tried but rarely persuasive, especially here when the results of the (nonsignificant) continuous analyses were known. Though Mani et al. further noted their effect reached the .05 level if one pooled the three studies, we would guess that the editor poured himself or herself a stiff drink the night after reading Wicherts and Scholten's critique and the Mani et al. reply. It is hard to imagine that Science or many less prestigious journals would have published the paper had the authors initially reported the correct analyses with a string of three nonsignificant findings conventionally significant only by meta-analysis at the end of the paper. The reader considering the use of median splits should consider living through a similarly deflating experience. Splitting the data at the median resulted in an inaccurate sense of the magnitude of the fragile and small interaction effect (in this case, an interaction that required the goosing of a meta-analysis to reach significance), and a publication that was unfortunately subject to easy criticism. [14]
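One well-known cost of median splits, the Type II error side of McClelland et al.'s title, is easy to show: splitting a normally distributed predictor at its median attenuates its correlation with an outcome by a factor of about sqrt(2/π) ≈ 0.80, which throws away statistical power. The sketch below uses made-up data, not the Mani et al. data, and shows only this attenuation for a simple correlation; it does not reproduce the spurious interactions from ceiling effects that Wicherts and Scholten describe. The function names and parameters are illustrative.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def median_split(values):
    """Dichotomize at the sample median: 1.0 if above it, else 0.0."""
    s = sorted(values)
    n = len(s)
    med = s[n // 2] if n % 2 else (s[n // 2 - 1] + s[n // 2]) / 2
    return [1.0 if v > med else 0.0 for v in values]


def demo(n=20000, slope=0.3, seed=7):
    """Toy data (hypothetical): y depends linearly on a normal predictor x.
    Returns (r with continuous x, r with median-split x)."""
    rng = random.Random(seed)
    x = [rng.gauss(0, 1) for _ in range(n)]
    y = [slope * xi + rng.gauss(0, 1) for xi in x]
    return pearson(x, y), pearson(median_split(x), y)


if __name__ == "__main__":
    r_cont, r_split = demo()
    print(f"continuous r = {r_cont:.3f}, median-split r = {r_split:.3f}")
    print(f"ratio = {r_split / r_cont:.2f}, theory sqrt(2/pi) = {math.sqrt(2 / math.pi):.2f}")
```

The ratio should land close to 0.80: the same data carry a visibly weaker signal once the predictor is reduced to "above" versus "below" the median.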

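The top-voted answer is written simply and clearly, condensing a systematic body of knowledge into a few key points, but ordinary readers cannot necessarily rebuild the whole system from a few key points. When the new marshmallow results came out, some people rushed to label the Marshmallow Test toxic chicken soup. They may have skipped other details the popular-science answer did mention, and even stretched and oversimplified its conclusions, over-interpreting the new study.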

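Self-help articles often report only the results of a study, not its premises, its limitations, or alternative explanations. Some theories are genuinely useful, but perhaps not as useful as researchers, WeChat public accounts, or self-help books claim. Examples include deliberate practice, grit, and growth mindset, lines of psychological research that have spread widely through self-help books and TED talks. Yet there is often a gap between academia and the public, and between views in English-language and Chinese-language sources.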

Deliberate practice, for instance, seems to have been oversold in some self-help articles. See @沉默的马大爷's answer to "What are the shortcomings of deliberate practice?"

Anders Ericsson's emphasis on deliberate practice diverges from other research perspectives [15].

In March this year, a growth mindset experiment involving Carol S. Dweck replicated successfully [18]. Recent growth mindset meta-analyses show [16]:

Correlation of growth mindset with achievement is tiny, r = .10, 95% confidence interval (CI) = [.08, .13], p < .001.
Effect of growth mindset interventions on achievement is tiny, d = .08

we examined the effectiveness of mind-set interventions on academic achievement and potential moderating factors. Overall effects were weak for both meta-analyses. However, some results supported specific tenets of the theory, namely, that students with low socioeconomic status or who are academically at risk might benefit from mind-set interventions. [16]
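To get a feel for how small these meta-analytic effects are, the sketch below applies standard effect-size conversions to the two numbers above: r² for the correlation, d-to-r for the intervention effect, and the "probability of superiority" (common-language effect size). The equal-group-size and normality assumptions belong to these formulas, not to the meta-analysis itself, and the function names are illustrative.

```python
import math


def r_squared(r):
    """Proportion of variance explained by a correlation r."""
    return r * r


def d_to_r(d):
    """Cohen's d to point-biserial r, assuming two equal-sized groups."""
    return d / math.sqrt(d * d + 4)


def prob_superiority(d):
    """Common-language effect size: chance that a random treated score exceeds
    a random control score, assuming normal outcomes with equal variances."""
    z = d / math.sqrt(2)
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))  # standard normal CDF at z


if __name__ == "__main__":
    print(f"r = .10 -> r^2 = {r_squared(0.10):.3f}")
    print(f"d = .08 -> r = {d_to_r(0.08):.3f}")
    print(f"d = .08 -> P(superiority) = {prob_superiority(0.08):.3f}")
```

Under these assumptions, r = .10 means about 1% of the variance in achievement, and d = .08 corresponds to r ≈ .04, or roughly a 52% chance (versus 50% by coin flip) that a randomly chosen student who received the intervention outscores one who did not.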

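Although the growth mindset study replicated and the meta-analyses lend the theory some support, teachers and individuals should still watch for overemphasizing growth mindset at the expense of other problems. For example, the Wired article "Everyone's favourite psychology theory isn't all it's cracked up to be" sets several views side by side, letting readers weigh from different angles whether growth mindset deserves the promotion and attention it gets.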

Ritchie agrees with the paper's authors, however, saying there has been too much emphasis put on growth mindsets.

"The results of this study should make teachers -- many of whom are very interested in the topic of mindset, and have changed their teaching practice because of it -- seriously reconsider the amount of time, effort, and resources their schools invest in promoting 'growth mindsets' in their students", he says. "The benefits appear to have been substantially oversold."

When it comes to how to educate children, McNamara wishes to leave those decisions to the people involved. "There may be a small net benefit to these interventions," she says. "Conversely, there may not be an overall effect."

But in the context of a situation in which many other techniques are not proven to work at all, Yeager thinks it's best to persevere. You may say he has a growth mindset about growth mindsets.

"The fact that such light touch interventions can ever have any effect on important, multiply-determined outcomes is somewhat amazing," says Yeager, "especially when you consider that many, or even most very extensive and expensive educational programs have no effect at all." [17]

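Deliberate practice, grit, and growth mindset have all spawned popular-science books. Chinese translations usually appear after some delay, and publishers import the books together with uniformly glowing praise from experts and non-experts at home and abroad. Many other views (supportive, neutral, critical) are scattered across Western media, and the academic literature can also be searched directly. For readers who want to apply a book's theories, learning to tell which views reflect current research and which are overhyped self-help may be the better approach. Original papers and popular-science articles still differ, and reading the original paper is a good way to understand the marshmallow conceptual replication more deeply. If one ignores a study's premises and limitations and rushes to apply a new finding from a popular-science article, then labeling it either a panacea or toxic chicken soup goes too far.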

References

1. Nussbaum, D. Conceptual Replication. David Nussbaum (2012). at <http://davenussbaum.com/blog/conceptual-replication-part-i>

2. Watts, T. “Replication” is in the Eye of the Beholder. NYU Steinhardt At a Glance (2018). at <https://steinhardt.nyu.edu/site/ihdscblog/2018/06/12/replication-is-in-the-eye-of-the-beholder/>

3. Toppo, G. New findings cast doubt on 'marshmallow test' success claims. Inside Higher Ed (2018). at <https://www.insidehighered.com/news/2018/06/06/new-findings-cast-doubt-marshmallow-test-success-claims?platform=hootsuite>

4. Resnick, B. The "marshmallow test" said patience was a key to success. A new replication tells us s’more. Vox (2018). at <https://www.vox.com/science-and-health/2018/6/6/17413000/marshmallow-test-replication-mischel-psychology>

5. Duckworth, A., Tsukayama, E. & Kirby, T. Is It Really Self-Control? Examining the Predictive Power of the Delay of Gratification Task. Personality and Social Psychology Bulletin 39, 843-855 (2013).

6. Kidd, C., Palmeri, H. & Aslin, R. Rational snacking: Young children’s decision-making on the marshmallow task is moderated by beliefs about environmental reliability. Cognition 126, 109-114 (2013).

7. Michaelson, L., de la Vega, A., Chatham, C. & Munakata, Y. Delaying gratification depends on social trust. Frontiers in Psychology 4 (2013).

8. McGuire, J. & Kable, J. Rational temporal predictions can underlie apparent failures to delay gratification. Psychological Review 120, 395-410 (2013).

9. Watts, T., Duncan, G. & Quan, H. Revisiting the Marshmallow Test: A Conceptual Replication Investigating Links Between Early Delay of Gratification and Later Outcomes. Psychological Science (2018). doi:10.1177/0956797618761661

10. Calarco, J. Why Rich Kids Are So Good at the Marshmallow Test. The Atlantic (2018). at <https://www.theatlantic.com/family/archive/2018/06/marshmallow-test/561779/>

11. Watts, T. Twitter (2018). at <https://twitter.com/tw_watts/status/1001125114778476544>

12. VerBruggen, R. Did the Marshmallow Test Fail to Replicate? Institute for Family Studies (2018). at <https://ifstudies.org/blog/did-the-marshmallow-test-fail-to-replicate>

13. Wicherts, J. & Scholten, A. Comment on "Poverty Impedes Cognitive Function." Science 342, 1169 (2013).

14. McClelland, G., Lynch, J., Irwin, J., Spiller, S. & Fitzsimons, G. Median splits, Type II errors, and false-positive consumer psychology: Don't fight the power. Journal of Consumer Psychology 25, 679-689 (2015).

15. Hambrick, D., Burgoyne, A., Macnamara, B. & Ullén, F. Toward a multifactorial model of expertise: beyond born versus made. Annals of the New York Academy of Sciences (2018). doi:10.1111/nyas.13586

16. Sisk, V., Burgoyne, A., Sun, J., Butler, J. & Macnamara, B. To What Extent and Under Which Circumstances Are Growth Mind-Sets Important to Academic Achievement? Two Meta-Analyses. Psychological Science 29, 549-571 (2018).

17. Beall, A. Everyone’s favourite psychology theory isn’t all it’s cracked up to be. Wired UK (2018). at <http://www.wired.co.uk/article/growth-mindset-education-psychological-theory-children-mirage>

18. Yeager, D. S. et al. Where and For Whom Can a Brief, Scalable Mindset Intervention Improve Adolescents’ Educational Trajectories? Manuscript under revision (2018). at <http://osf.io/r82dw>

A related answer from another user:

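This "reversal" certainly deserves a place: with a more representative sample, and after controlling for family background, cognitive ability, and other variables, the results of the famous "marshmallow experiment" could not be replicated. You have surely read about the experiment in countless psychology books, self-help books, and WeChat posts: children are shown a selection of treats and told that the experimenter will leave for a while, and that if they can hold off eating until the experimenter returns, they will get more. The experiment's designers, Mischel and colleagues, found that children who managed to resist the treats later did better in school and had smoother careers.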

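Striking as the results were, the original design was not perfect. First, the sample was quite homogeneous, mostly children of Stanford University faculty and staff. Second, at the follow-up more than a decade later, the researchers reached only about a quarter of the participants. Finally, there were problems with control variables. In the new replication, the three authors drew a sample from across the United States that is more representative in economic status, racial composition, and family education. They also controlled for many variables: on the family side, mother's education, socioeconomic status, a home environment index, and other factors; on the child side, memory, letter recognition, reading, and other individual factors.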

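Running the experiment with these four-and-a-half-year-olds gave the results shown in the figure above. First, with no controls, how long a child resisted temptation is indeed closely related to test scores at age 15. However, after controlling for family and other variables, the above result abruptly disappears: the relation between waiting time and later achievement turns from positive to nonlinear, and children who waited more than 4 minutes may even score somewhat lower than those who waited 2–4 minutes. Finally, after also controlling for child-level factors, the pattern stays nonlinear, and nearly all differences between the waiting-time groups are non-significant. So rather than patience producing success, it is more likely that the quality of the family environment is positively correlated with both.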

References: Mischel, W., Y. Shoda, and P. K. Peake. "The nature of adolescent competencies predicted by preschool delay of gratification." Journal of Personality and Social Psychology 54.4 (1988): 687-696.

Watts, T. W., G. Duncan, and H. Quan. "Revisiting the marshmallow test: A conceptual replication investigating links between early delay of gratification and later outcomes." Forthcoming, Psychological Science.
