Nature reports that computer-generated fake papers are flooding academia
(From the SCIgen site) SCIgen is a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers. Our aim here is to maximize amusement, rather than coherence.
One useful purpose for such a program is to auto-generate submissions to conferences that you suspect might have very low submission standards. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (check out the very broad conference description on the WMSCI 2005 website). There’s also a list of known bogus conferences. Using SCIgen to generate submissions for conferences like this gives us pleasure to no end. In fact, one of our papers was accepted to SCI 2005! See Examples for more details.
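For readers who have not met the term, here is a rough sketch of what "a hand-written context-free grammar" means in practice. Everything in it is my own illustration: the rules in GRAMMAR, the names expand and generate_sentence, and the vocabulary are all made up, and none of it is taken from SCIgen's actual grammar, which is far larger. The idea is only that each symbol is replaced at random by one of its listed expansions until nothing but plain words remains.

```python
import random

# Toy rules, invented for illustration only; this is NOT SCIgen's grammar.
# Keys are symbols to expand; anything that is not a key is a literal word.
GRAMMAR = {
    "SENTENCE": [["We", "VERB", "that", "NOUN_PHRASE", "is", "ADJ", "."]],
    "VERB": [["demonstrate"], ["argue"], ["confirm"]],
    "NOUN_PHRASE": [["the", "ADJ", "NOUN"], ["the", "NOUN"]],
    "ADJ": [["scalable"], ["robust"], ["probabilistic"]],
    "NOUN": [["framework"], ["methodology"], ["algorithm"]],
}


def expand(symbol, rng):
    """Recursively replace a symbol with a randomly chosen expansion."""
    if symbol not in GRAMMAR:
        return [symbol]  # a literal word: nothing left to expand
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Build one grammatical but meaningless sentence."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    # Attach the final period to the last word instead of spacing it out.
    return " ".join(words[:-1]) + words[-1]


if __name__ == "__main__":
    for _ in range(3):
        print(generate_sentence())
```

Each run prints three sentences that parse perfectly well and say nothing. Scale the rule table up to cover titles, abstracts, figures, and citations and you have the general shape of the trick.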
According to Nature, scientists are now using SCIgen to submit papers. French researcher Cyril Labbé revealed that 16 gobbledegook papers created by SCIgen had been published by the German academic publisher Springer. More than 100 additional fake SCIgen papers were published by the US-based Institute of Electrical and Electronics Engineers (IEEE). Both organizations have now taken steps to remove the papers.
Labbé has written a program to spot robotic papers and has made it publicly available.
It is not difficult to imagine that something like SCIgen may now be widespread in my own field. Recently I got into a fight with a colleague after he gave a talk to young biologists. My colleague was outlining what he saw as the path to success in the future. The path he described required teams of scientists. The job of these teams was to fill papers with so much data and so much expertise that probably no one reading the papers would be able to understand, much less criticize, all of the data.
While my colleague obviously was not setting out a standard for scientific fraud, the effect is not that different. As biological knowledge has expanded, the share of any paper that an individual scientist can understand has been shrinking rapidly. It is not at all difficult to imagine a computer program that could take a so-so paper and add analyses that are in fact meaningless.
A good example in my own field is the current use of computer software to analyze large data sets. It has now become almost de rigueur for biology papers to include lists of all the genes showing differences in expression. The numbers of such genes can run into the thousands. So, what do you do? Sadly, the answer is not that different from SCIgen. We have created computer programs whose job it is to look at these lists and offer us hunches as to what they mean. These hunches can appear very elaborate, often backed up by impressive-sounding statistical tests that few if any biologists understand. Worse yet, these papers are often sanctified by people with supposed expertise in “Bioinformatics.” The bioinformatics methods may or may not be valid, but they are usually well beyond the expertise of the biologists writing or reviewing the paper. I doubt very much that any reviewer could tell whether the coauthor contributing much of this information is a robot or a human.
There are many other forms of data embroidery. Another colleague of mine founded a company to do large-scale analyses of certain proteins. Being a conscientious sort, my friend vetted something like 10,000 different antibodies. Of these antibodies, only about 10% met his criteria. Nonetheless, biology papers often rely on large numbers of antibodies with little evidence, beyond the manufacturer’s claims, that the antibodies are valid. What is the likelihood that any reviewer is going to check the validity of hundreds, or even tens, of antibodies?
The combination of modern biology’s ability to embroider a story with so many different kinds of data and the creation of programs like SCIgen is frightening. Maxwell Krohn, one of the authors of SCIgen, sees computers churning out ever more convincing papers, while other computers (or profit-making firms) sell software designed to sniff out the fraud. Krohn said: “I’m psyched, it’s so great. These papers are so funny, you read them and can’t help but laugh. They are total bullshit. And I don’t see this going away.”
I wonder if Maxwell Krohn is smart enough to see all that bullshit?