
==================================================================
The full text of the review from Neuroscience, with my responses indented.
==================================================================

General points: The point that I make is not challenged at all. In the first paragraph the reviewer pays lip service to the importance of replication, but then he engages in a demagogic effort to denigrate the paper.

NEUROSCIENCE MS No. A98138

Author: Y. Harpaz

Title: Replicability of cognitive imaging...

This paper addresses an extremely important issue, and the author is to be congratulated for undertaking the endeavor. If reproducibility across studies is as bad as he claims, no real progress will be made until the neuroscience community is made fully aware of the problem and furnishes a remedy or explanation. The author is correct that lack of replication has not been discussed adequately by the large majority of researchers, that many seem to be ignoring the problem, and that claims of replication are often greatly exaggerated.

Unfortunately, not many readers will be convinced by the mostly qualitative observations presented here.

The question of how 'convincing' an article is is totally irrelevant to a scientific paper, and this kind of comment should never appear in a scientific review. Using this argument shows that the reviewer wants to block the publication of this paper, but cannot base the rejection on scientific grounds.

It should also be noted that the reviewer claims that the readers will not be convinced, not that he himself is not convinced. If he had made the latter claim, he would have had to explain why the paper is not convincing, so he avoids making it (because the paper is convincing).

In general, there is nothing wrong with qualitative observations, provided they are evaluated reasonably. There are questions that can be answered only by qualitative observations, and the question of the existence of replication studies in the literature is one of them.

While objecting to the unsubstantiated, subjective claims of investigators, the author frequently resorts to equally qualitative claims regarding lack of replication.
Typical demagoguery. The reviewer implies that my claims are 'unsubstantiated and subjective', without actually saying so. If he had claimed directly that my claims are unsubstantiated, he would have had to support it somehow, so he doesn't.

In addition, the claim that I object to unsubstantiated and subjective claims is wrong. I object to claims that are clearly in disagreement with the data they are based on.

For example, on pgs. 10-11, he describes data that "seems (sic) to be randomly distributed," published figures from which "it is clear that these regions are only small part of the activation," and data that show activations that are "different" or "vary a lot." There are almost no quantitative analyses presented, and it gradually dawns on the reader that these personal judgements of the published figures constitute the majority of the critique.
An example of blatant nonsense. The whole point of publishing figures and tables is that the quantitative tendencies in the data are easily visible to everybody. The reviewer decides to object to my using these figures and tables, but gives no grounds for it.

This statement contains other demagogic tricks:

  1. "gradually dawns on the reader" - gives the reader of the review the impression that I claim at the beginning of the article to have something in it which I don't. This is obviously false, as I make clear right from the title of the article what I am doing.
  2. "personal judgements" - These 'judgements' are obvious to any reader, so the reviewer does not challenge them directly. Instead he just refers to them with denigrating terms.
  3. "constitute the majority of the critique" - The paper is about lack of replication, and I demonstrate it convincingly. The reviewer tries to make the reader forget this point.
This paper would be considerably stronger if the author used published stereotaxic coordinates as data, defined a maximal separation distance criterion for replication, and confined the analysis to comparisons between very similar experiments.
Another piece of demagoguery, here disguised as 'constructive criticism'. Confining the analysis to 'very similar experiments' would eliminate the main point of this paper, which is that it is an almost complete survey of the literature. I discuss this point explicitly in the introduction of my paper.
Demonstrating, for example, that less than 50% of activation foci are replicated across similar experiments (I suspect this would be the outcome) would be a truly alarming and forceful finding, and one not sullied by subjective interpretation.
Another example of blatant nonsense. As the reviewer himself says later in the review, these kinds of variations are always explained away by small variations in the experimental setting, so nobody is worried about them. In addition, the reviewer again implies that I am using 'subjective interpretation', without actually saying it.

Another weakness of the paper is that the author makes no clear distinction between variation between subjects and variation between studies,

A blunt lie. I distinguish between these throughout the article, and in section 2 I even have a separate subsection for each kind of variation. The reviewer seems to be quite desperate by now.
and even seems to imply that the former may be the main cause of the latter.
This contradicts the first part of the sentence (you cannot imply that A causes B unless you distinguish between A and B), and contains another demagogic trick: I don't 'imply' that variation between subjects causes variation between studies. I suggest using the former as a hint about the latter, and in section 3 I explicitly hypothesise the causal relation.
There are, however, several reasons to distinguish these phenomena. First, most researchers already accept individual variability as likely (at least for functions that are more learned and less hard-wired), while ignoring or denying variability across studies.
An irrelevant comment, presented as if it were a refutation of something that I said.
Second, even great variation at the subject level does not preclude replication at the study level, as long as there is some central tendency across subjects and a large enough sample is taken (remember the central limit theorem?).
Another piece of demagoguery. The reviewer misleads the reader into believing that I claim that variation at the subject level precludes replication, which is false. I only use it as a hint about the likelihood of replication of the results.

Another interpretation is that the reviewer refers to my hypothesis in section 3, but this has no bearing on the evidence or the main conclusion of the paper. In addition, he misrepresents it. The hypothesis is that there is no central tendency (because the underlying cognitive architecture differs across individuals).

Another demagogic trick is the comment in parentheses, which is intended to create the impression that I don't understand mathematics.
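The disagreement here is a statistical one, and it can be made concrete with a toy simulation (my own sketch; the numbers and the one-dimensional setup are made up for illustration, not taken from the paper or the review). If subjects share a central tendency, study-level means converge as the reviewer's central limit theorem argument says; if subjects cluster around different foci, as my section-3 hypothesis suggests, the between-study spread stays large.

```python
# Toy illustration (hypothetical numbers): between-study variation of mean
# activation coordinates (1-D, mm) with and without a central tendency
# across subjects.
import random

random.seed(0)

def study_mean(subject_foci_mm, n_subjects=12):
    """Mean coordinate of one simulated study: each subject's focus is one
    of the given cluster centres plus 5 mm of individual scatter."""
    return sum(random.choice(subject_foci_mm) + random.gauss(0, 5)
               for _ in range(n_subjects)) / n_subjects

# Case A: a common central tendency at 0 mm, with individual scatter.
means_a = [study_mean([0.0]) for _ in range(20)]

# Case B: no common tendency -- subjects cluster at -20 mm or +20 mm.
means_b = [study_mean([-20.0, 20.0]) for _ in range(20)]

spread = lambda xs: max(xs) - min(xs)
print("spread with central tendency:", round(spread(means_a), 1), "mm")
print("spread without:", round(spread(means_b), 1), "mm")
```

In case A the twenty simulated study means cluster tightly, so studies replicate each other; in case B they vary several times more, even with a reasonable sample size per study, which is the situation my hypothesis describes.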

Third, the statistical analyses used in these studies are intended to identify regions that are very likely to be activated in common across all or most subjects in the study. Thus, such analyses are already, in a sense, replication tests.
Another example of blatant nonsense. Identifying regions that are common to the subjects in a study tells us nothing about the replicability of the results. For that, the researchers would need to analyse the probability of these common regions arising by chance, taking into account the ranges of all the parameters that can be manipulated in the search for common regions. This analysis is never done.
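The kind of analysis I mean can be sketched with a toy calculation (all numbers here are made up for illustration; the function and parameters are hypothetical, not from any published study). The point is that a "common" region is not impressive by itself: its chance probability depends on how many candidate regions there are, and grows further with the freedom to tune analysis parameters.

```python
# Toy calculation (hypothetical numbers): probability of at least one region
# appearing "common to most subjects" purely by chance.
from math import comb

def p_common(n_subjects, n_regions, p_active, min_shared):
    """P(some region among n_regions is active in >= min_shared subjects),
    assuming each region activates independently with probability p_active."""
    p_region = sum(comb(n_subjects, k)
                   * p_active**k * (1 - p_active)**(n_subjects - k)
                   for k in range(min_shared, n_subjects + 1))
    return 1 - (1 - p_region) ** n_regions

# One fixed analysis: 10 subjects, 50 candidate regions, 10% chance a
# region activates in a subject, "common" = shared by at least 5 subjects.
single = p_common(n_subjects=10, n_regions=50, p_active=0.1, min_shared=5)

# Trying 20 parameter settings (thresholds, smoothing, ROI choices) and
# keeping the best multiplies the opportunities for a chance hit.
searched = 1 - (1 - single) ** 20

print("one analysis:", round(single, 3), " after searching:", round(searched, 3))
```

Under these made-up but not unreasonable numbers, a single analysis already has a non-trivial chance of a spurious common region, and a modest search over parameter settings makes such a hit more likely than not; without computing something like this, "common across subjects" is not a replication test.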
Because most such studies randomly select normal, healthy subjects from the same human subject pool, there is no compelling reason to think that studies with reasonable sample sizes (i.e. >10) should produce very different results merely because of normal variation among individuals.
This repeats the claim in his second point above.
In contrast, small, systematic differences between experiment design factors in different studies could cause vastly different results, but the author devotes almost no attention to discussing this problem.
So what? The point of this article is the lack of replication, not discussion of the reasons for it. That there are possible explanations for the lack of reproducibility does not affect the main conclusions of the paper.

In addition, I do mention this point in section 3, and explain why it seems unlikely to explain all the results. The reviewer does not actually argue against what I write. He ignores it, and hopes that the reader will miss it too.

In particular, there is no discussion of the role played by variations in activation and control tasks across supposedly similar studies, even though many investigators have stressed the probable importance of this factor.
The logic of this sentence is: other investigators stress this point, so you have to stress it too. That is reversed logic. If other investigators have already stressed this point, there is no reason why I should repeat it. It is enough if I quote them, which I do.
Poeppel, for example, has exhorted investigators to more closely analyze the cognitive demands of the tasks they use, attributing the lack of replication across phonological processing studies to a lack of uniformity of the tasks used to engage such processing.
Subtle demagoguery: by quoting Poeppel, the reviewer wants to give the impression that I ignored this paper, which obviously I didn't, as I discuss it in the introduction.

A side point: Poeppel attributes the lack of replication to a lack of uniformity of the tasks, but he does not give any evidence for this, i.e. he does not show that using uniform tasks does yield replication.

By ignoring this factor, the author perpetuates the tendency to generically label tasks and studies without looking at the details, even though this is likely to be the place where, as usual, the devil is.
Another example of blatant nonsense. My paper certainly does not perpetuate any existing tendency in the field.

The English used in this paper is not good. Almost every paragraph contains several major errors, such as incorrect use of or omission of articles, or incorrect noun-verb agreement.

This reviewer is so hostile that he even calls incorrect noun-verb agreement a 'major error'.
The words "replicable" and "replicability" do not exist in English, but "reproducible/reproducibility" or "reliable/reliability" could be substituted.