[ last updated 8 dec 2003 ]
The texts in the rest of this site are written for readers who are quite familiar with scientific practices and the ideas behind them. However, I found that many of the people who read this site are quite remote from scientific practice. This page provides explanations for these readers.
The pages about the Blatant Nonsense Effect and the Irrefutability of Nonsense-Arguments are also useful and do not require any familiarity with scientific practice. The discussion of "what is evidence" may also be useful.
Reproducibility (or its synonym, replicability) is the property of an experiment that if you repeat it under essentially the same conditions, you get essentially the same results. The qualifier 'essentially' adds some blur to the definition, but in practice, in most cases there is no problem agreeing on what is essential and what isn't. Clearly, reproducibility is applicable only to experiments that can be repeated (laboratory experiments). In cases of observations that are based on non-repeatable conditions (e.g. observing a supernova), the question of reproducibility is not applicable.
Reproducibility is an extremely important concept in scientific practice, and may be the most important one. The importance of reproducibility is based on the observation that laboratory experiments tend to go wrong (i.e. give wrong results) in an infinite number of ways. Therefore, there must be a way to filter out wrong results; otherwise they will swamp the real results. The only known way to filter out wrong results is to try to replicate the experiment. In general, it is so easy to go wrong without tests for replicability that, for laboratory experiments, if a result is not reproducible, it is not science.
Experience has shown that an important factor in producing wrong results is experimenter bias. Even if the experimenters do not knowingly affect the results, there are many ways in which they can affect them without being aware of doing it. Therefore, replication by the same group of experimenters is always somewhat suspect, and only replication by another group of experimenters (independent replication) is regarded as a real replication.
In general, it is not so interesting to repeat exactly the same experiment that somebody else has already done, and scientific journals normally would not publish it. Therefore, other groups do not normally repeat the experiment exactly, unless the result is extremely significant. In most cases, they repeat an experiment with a small change in the conditions, and expect a small change in the results. Only if the results are grossly different from the original results (in this situation it is common in the scientific literature to say that 'the results are controversial') will they try to repeat the original experiment more accurately. Thus you normally don't find exact replications, but close replications.
For example, if one group has found the 3D-structure of a protein, other groups are not going to publish papers showing the same 3D-structure. They may find the 3D-structure of a similar protein, or maybe of the same protein with some mutations, and try to deduce some conclusions from the differences between the structures. In the process, they effectively validate the original result by showing that a similar-sequence protein gives a similar 3D-structure.
The point of the cognitive brain imaging replicability paper is that it shows that replication, of this kind or otherwise, does not happen at all in the sub-field of cognitive brain imaging. In other fields of research, this would cause researchers in the rest of the field to stop regarding the sub-field as a serious science, and the researchers in the sub-field itself to put all their efforts into making sure that they can produce replicable results. In cognitive science, however, the response is different. They simply reject my paper, and continue to accept irreproducible results.
It should be pointed out that while my paper establishes the lack of replication in cognitive brain imaging, this lack is obvious to researchers in cognitive science, and can easily be deduced from reading even a single paper. This is because these papers cannot tie their results to previous results. In general, when a scientific paper shows graphical results like the active pixels in brain images, the authors are able to point to features in the data that correspond to known features. In cognitive brain imaging they can't. This point is difficult to see for a non-expert, but it is obvious to any cognitive scientist.
That the lack of reproducibility is known to cognitive scientists is clear from the various reviews and responses from editors that this paper got (links at the top of the paper). None of the referees or editors expressed any surprise at the finding; they all knew it already.
Scientific papers are (in general) reviewed (or refereed), and only reviewed papers are regarded as 'real' scientific papers. This is because if people are allowed to publish anything they feel like, the garbage will swamp the serious stuff. Thus the review process is an essential part of scientific practice, and has the role of filtering the garbage out. The importance of the review process is almost universally accepted by scientists.
There are no universal rules about how the review process should work, and there are large variations. What I describe here is the 'central tendency' of the process.
Once researchers have written a paper, they send it to a journal (submit it for publication). In principle, the 'researchers' in the previous sentence can be anybody, not only people who are formally scientists. Most journals agree to consider a paper only if it is not under consideration by other journals at the same time, so the researchers can submit to only one journal at any given time.
Once the journal receives the paper, it is handled by the editor. In small journals, there is only one editor, and he is 'the editor'. In large journals, there is more than one editor, and one of them is assigned to deal with the paper. The actual title of this person is not necessarily 'editor'; however, I will refer to this person as 'the editor'.
The editor deals with the review of the paper, and is the one who ultimately decides, on scientific grounds, whether the paper is worth publishing. Therefore, he must be an expert in the relevant field. In general, editors are senior scientists in the field, even in journals that are published by commercial bodies. In the CV of a scientist, being an editor is an important highlight, especially if it is a well-known journal.
In most cases, each journal has guidelines for how the review process should be done, but the editor has quite a large margin for variation.
Typically, the editor will send the paper for review by two or three referees. The selection of the referees is the responsibility of the editor. In general, they will have expertise in the field, but it is not regarded as important that they be senior figures in the field. The editor finds referees using his/her knowledge of the field. They may be people that the editor knows personally, but they can also be people that the editor knows only from their publications in the field.
The referees are in general not paid for doing the review; they do it because scientists know that reviewing other people's papers is part of their job. In addition, it is a useful way of getting into the net of personal connections in the field: once you have reviewed a paper for an editor, this editor, who, as mentioned above, is normally a quite senior person, 'owes you one' (even if quite a small one). One of the reasons that editors need to be senior people is that the more senior the editor is, the easier it is for him to find referees.
The informal 'norm' for the number of referees is three, but because it is not always easy to find referees, editors in many cases use only two reviews or even only one. The latter, however, is regarded as highly undesirable. That is because it is generally agreed that the quality of reviews is not very high, so a single review is not a reliable indication.
The review process is normally a confidential review. This means two things: the reviews themselves are not published, and the names of the referees are kept secret from the authors. The idea of the latter is that the referee can give a negative review without incurring personal hostility from the authors. This is especially important if the authors include a senior scientist. Keeping the names of referees secret has many associated problems, and therefore it is not universally done. However, protecting the referees is in most cases regarded as more important than the associated problems.
In some cases the editor also keeps the names of the authors secret from the referees, but this is quite rare.
In general, the editor needs to find out from the referees whether the paper is worth publishing as it is, worth publishing after some revision, or not worth publishing at all. Most journals ask the referees some specific questions, but the referees also need to write a review of the paper on which to base their opinion. The format of this review is very variable.
What should the referees base their opinion on? This is a problematic question, and my experience is that most scientists cannot actually answer it immediately. After some thinking, most of them will reach a conclusion similar to: the referee needs to check that the paper is sound and interesting. 'Interesting' means it has something new in it, in the field that the journal specializes in. 'Sound' is a more open question. In general, it means that the methods used in the paper are known to be reliable (unless the paper introduces a new method, in which case the method needs to be tested by reliable means), the logic of the arguments is valid, and relevant work by other researchers is not ignored.
An important point to note is what the referee should not judge: the correctness of the conclusions of the paper. The latter has to be verified by later work in the field, through a process of convergence of opinions based on accumulation of data. That is because science works at the boundaries of knowledge, and there is no way anybody can reliably judge the correctness or otherwise of a novel statement which is based on what look like sound methods and arguments.
Once the referees have written their reviews, they send them to the editor. The editor then decides how to respond to the authors. In general, if all the referees agree, the editor would be expected to accept their judgement, but in reality there is nothing that compels him/her to do so. If the referees do not agree, the editor has to make up his/her own mind. In some cases the editor will send the paper to more referees, but since finding referees is a problem, this is quite rare.
The response that the editor sends to the authors includes the reviews written by the referees, normally without the names of the referees (confidential review, as explained above). In addition, the editor will state his/her decision, which is usually one of three possibilities: accepted for publication, rejected, or accepted conditional on revision. In the latter case, the required revision will normally be based on suggestions in the reviews written by the referees.
If the editor 'accepts with revision', the authors normally revise as appropriate, and send the revised version to the editor. The editor may give it to the referee(s) who requested the revision, or may accept the revised version without further review.
If the paper is rejected, which is quite a common event, the normal action of the authors is to submit it to another journal. This is standard behaviour, and it is quite common for papers to be serially submitted to several journals. In some cases, a paper that was serially submitted and rejected is later found to be an important advance. The logic of "serial submissions" is that the quality of the review process is not that high, and negative reviews may be bad luck: the paper may have reached referees with a bias against its conclusions, or without enough expertise to appreciate its novel points, or with personal animosity towards some of the authors, or other irrelevant factors may have intervened. The hope is that in the next journal the paper will have better luck, and the general assumption is that a good paper will be 'lucky' reasonably quickly.
It is worth noting the somewhat weird structure of the process: the referees are the ones who actually read and evaluate the paper, but it is the editor who actually makes the decision. In addition, the names of the referees are normally secret. Thus it is not obvious who is actually responsible for the quality of the reviews. The referees cannot be held responsible, because their names are kept secret and they don't actually make the decision. The editor may be regarded as responsible, but he didn't actually write the reviews, and probably hasn't read the paper.
The editor is supposed to at least read the reviews and understand what they say, and hence form a judgement of the paper. In reality, however, in many cases (maybe most) the editor simply checks whether the referees recommend publication or not, and passes the decision on. Some journals actually have a system for the editor to rate the reviews, but I think this is pretty rare. Most journals leave it to the editor to decide how much effort to put into it.
It is clear from the discussion above that the review process has many flaws. However, its flaws are actually a reflection of the limitations of humans, and nobody has yet come up with a better idea of how to filter the garbage out of real science. This filtering is vital, and therefore the review process continues to be used with all its flaws. That is also the main reason it is so variable: different journals try different ways of getting over the flaws.
What makes the cognitive brain imaging replicability paper, the associated reviews, and the reviews of the brain-symbols paper very significant is that they show that in cognitive science, the review process does not work. This is particularly evident in the case of the cognitive imaging paper itself: it shows that cognitive brain imaging, the way it is done now, is not a sound method (because it cannot generate reproducible results), and hence papers that are based on it must be rejected, but they are not. The reviews in both cases also show a broken system, because none of them can be regarded as a fair review. As I wrote above, 2-3 bad reviews may be regarded as bad luck, but consistent junk of the kind I got for these two papers shows a strong bias in the field, one which overrides a basic level of sensibility and honesty.
The terms that are used to describe the publication process of scientific papers ('article', 'letter', 'journal') suggest similarity to other forms of publication (newspapers and journals), but this is very misleading.
Most scientific papers are read by a very small number of people (tens or fewer) when they are published. A substantial number of them are not read by anybody at the time they are published. The reason for this is that most papers contain a relatively small piece of new data or ideas, which, to be useful, needs to be integrated with other pieces of ideas and data. This integration is difficult to do at the time of publication.
Most readings of papers happen later, when researchers search the literature for previous work. When they do that, they find many papers that are related to their work, and, together with their own expertise, are (hopefully) able to integrate it all into a better personal understanding of the system under investigation. If that is successful, they are in a better position to design new experiments and come up with new ideas. Thus a scientific paper is not like an article in a newspaper, and is actually better described as a record in a database, describing what researchers have done.
Because scientific papers are mainly used when found by a search, being able to search them is very important. Until a short time ago, this meant going to the library and manually searching the abstracts of all potentially interesting papers. By now, it is possible to do it electronically. However, the commercial structure of the publication process hinders this. See here for an extended discussion of this point.
The mode of usage of scientific papers also means that evaluating their value is difficult: except for some outstanding papers, most papers are just records in a database, and it is the value of the database as a whole that is important. Thus generating a large number of papers, most of which are read by very few people if at all, is not necessarily bad, provided it increases the probability that when researchers search, they find relevant and useful papers. This probability obviously depends on other factors (quality of the papers, search tools, accessibility of the papers), so this is not an argument for an unbounded increase in the number of published papers, but it does mean that papers with a small number of readers are not necessarily useless.
Most lay people learn about various research areas by reading popular science books, i.e. books that explain science in a simple and easy-to-read manner. In fields in which there is already a substantial well-established understanding that the public, in general, is not aware of, this is quite a reasonable way of learning.
In fields where there isn't such a well-established understanding, however, it is actually a bad idea to try to learn anything from popular science books. The reason is that to be popular, a book needs to draw a clear and easy-to-grasp picture. In a field where a clear picture doesn't exist, the author needs to invent such a picture. The only way of doing this is to use distortion: ignore conflicting data, use invalid lines of argument, and various other maneuvers. The distorted result makes it more difficult for the reader to understand the real situation.
Most people have a problem with the argument above, because they find it difficult to imagine that popular science books may be using these kinds of maneuvers. However, this is quite common (two extreme examples: Kauffman, Carter). In some sense, this lack of belief in bad writing is a kind of Blatant Nonsense effect.
A common defense of a distorted representation is that the author needs to simplify the picture. That is simply an invalid argument: simplification is not the same as distortion, and the distinction is important. In many cases, people use the "simplification" argument as an implicit way of saying that the book is not a distortion, but because they don't make the latter claim explicitly, they don't need to defend it.
Another reason lay people find it difficult to believe that popular science books show a distorted picture is that they believe other scientists will criticize books that present one. That is simply false. In general, scientists don't criticize other scientists' books unless they have a direct interest, and even then, they will try to give an overall positive review of the book, even while criticizing some ideas in it. For example, the two books above by Carter and Kauffman are, as my reviews show, serious garbage, but you will be hard pressed to find any harsh criticism of either.
Scientists don't criticize other scientists' books because they want their own books to be positively reviewed as well, and to some extent because they, and lay people, don't realize how damaging a distorted picture is. The problem with a distorted picture is that once readers have accepted it, they will tend to ignore and dismiss any contradictory evidence, and to misinterpret ambiguous data as fitting the distorted picture. This is a serious problem even for professional scientists, but at least they know that part of their job is to carefully evaluate the data in their field, so they have some (small) chance of overcoming their bias. Lay people do not have this obligation, so for them accepting a distorted picture is, in most cases, catastrophic: they will probably never evaluate the data carefully enough to change their minds.
Another point that most scientists miss is that experts and lay people understand a distorted text differently. The experts already know the field, so they will tend to interpret the text in a way that corresponds to their knowledge. Lay people don't know the field, so they interpret the text the way it is written, taking in all the misleading statements and their implications. For example, when Kauffman claims that cell fusion may "unleash a cataclysmic supracritical explosion", an expert knows that this is rubbish and ignores it. A lay person will get the impression that cell fusion is a dangerous operation that has not yet been explored. The latter implication is not actually written in the text, but every reader who is not an expert will draw it.
Journalistic reports in newspapers etc. suffer from the same problem, and are actually even worse, because authors of books at least try to give some coherent picture, while journalistic reports do not. However, journalistic reports carry much less authority with the public, and are therefore probably less of a problem.
Cognitive Science is an area that is still very confused, and there are no established global theories. As a result, popular science books about cognitive science are currently always a distortion of the real field. At the moment, the only way to get a better understanding of cognitive science is to read textbooks, because these are less dependent on popularity, and are written with more attention to reliability.