
Are Robots Writing Papers? Embroidering Facts Leads to Publication

Nature reports that computer-generated fake papers are flooding academia

Jeremy Stribling, Dan Aguayo and Maxwell Krohn, three MIT graduate students, created SCIgen, a computer program that writes academic papers.
In 2005, they submitted one of the papers to a conference, and it was accepted.

(From the SCIgen site) SCIgen is a program that generates random Computer Science research papers, including graphs, figures, and citations. It uses a hand-written context-free grammar to form all elements of the papers. Our aim here is to maximize amusement, rather than coherence. One useful purpose for such a program is to auto-generate submissions to conferences that you suspect might have very low submission standards. A prime example, which you may recognize from spam in your inbox, is SCI/IIIS and its dozens of co-located conferences (check out the very broad conference description on the WMSCI 2005 website). There’s also a list of known bogus conferences. Using SCIgen to generate submissions for conferences like this gives us pleasure to no end. In fact, one of our papers was accepted to SCI 2005! See Examples for more details.

SCIGEN must be an attractive name. I found at least three companies using the name. This one seems eerily close to the MIT program.
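For readers who have never seen one, the context-free grammar idea that the SCIgen site describes can be sketched in a few lines of Python. This is only a toy illustration: the grammar rules, the vocabulary, and the function names `expand` and `generate_sentence` are invented here and are not taken from SCIgen, whose real grammar is far larger.

```python
import random

# Hypothetical toy grammar, NOT SCIgen's actual rules.
# Each key is a non-terminal; each value lists its possible productions.
GRAMMAR = {
    "SENTENCE": [["We", "VERB", "that", "NOUN_PHRASE", "is", "ADJ", "."]],
    "VERB": [["demonstrate"], ["argue"], ["confirm"]],
    "NOUN_PHRASE": [["the", "NOUN"], ["the", "ADJ", "NOUN"]],
    "NOUN": [["lookaside buffer"], ["Turing machine"], ["hash table"]],
    "ADJ": [["scalable"], ["stochastic"], ["omniscient"]],
}


def expand(symbol, rng):
    """Recursively expand a symbol into a list of terminal words."""
    if symbol not in GRAMMAR:
        # Anything that is not a key of GRAMMAR is a terminal word.
        return [symbol]
    production = rng.choice(GRAMMAR[symbol])
    words = []
    for part in production:
        words.extend(expand(part, rng))
    return words


def generate_sentence(seed=None):
    """Produce one grammatical but meaningless sentence."""
    rng = random.Random(seed)
    words = expand("SENTENCE", rng)
    # Join the words and attach the final period to the last word.
    return " ".join(words).replace(" .", ".")


if __name__ == "__main__":
    for _ in range(3):
        print(generate_sentence())
```

Every sentence this produces is well formed, and none of it means anything, which is the whole trick: the grammar guarantees the form of a paper without any content behind it.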

According to Nature, scientists are now using SCIgen to submit papers. French researcher Cyril Labbé revealed that 16 gobbledegook papers created by SCIgen had been published by the German academic publisher Springer. More than 100 more fake SCIgen papers were published by the US Institute of Electrical and Electronics Engineers (IEEE). Both organizations have now taken steps to remove the papers.

Labbé has written a counter-program that spots robotic papers and has made it available.

It is not difficult to imagine that something like SCIgen may now be widespread in my own field. Recently I got into a fight with a colleague after he gave a talk to young biologists. My colleague was outlining what he saw as the path to success in the future. The path he described required teams of scientists. The job of these teams was to fill papers with so much data and so much expertise that it was probable that no one reading the papers would be able to understand, much less criticize, all of the data.

While my colleague obviously was not setting out a standard for scientific fraud, the effect is not that different. As biology has expanded its knowledge, the share of any paper's content that an individual scientist can understand is shrinking rapidly. It is not at all difficult to imagine a computer program that could take a so-so paper and add analyses that would in fact be meaningless.

A good example in my own field is the current use of computer software to analyze large data sets. It has now become almost de rigueur for biology papers to include lists of all the genes showing differences in expression. The numbers of such genes can run into the thousands. So, what do you do? Sadly, the answer is not that different from SCIgen. We have created computer programs whose job it is to look at these lists and offer us hunches as to what they mean. These hunches can appear very elaborate, and they are often backed up by impressive-sounding statistical tests that few if any biologists understand. Worse yet, these papers are often sanctified by people with supposed expertise in “Bioinformatics.” The bioinformatics methods may or may not be valid, but they are usually well beyond the expertise of the biologists writing or reviewing the paper. I doubt very much that any reviewer could tell whether the coauthor contributing much of this information is a robot or a human.
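To make the "impressive-sounding statistical tests" a little less mysterious, here is a minimal sketch of one test commonly used for this kind of gene-list analysis, a hypergeometric over-representation test. It is an assumption on my part that this is representative; the programs alluded to above are not named, and real tools differ in their details. The function name `hypergeom_enrichment_p` and all the numbers in the example are hypothetical.

```python
from math import comb  # requires Python 3.8+


def hypergeom_enrichment_p(population, annotated, drawn, overlap):
    """Probability of seeing at least `overlap` annotated genes by chance.

    population: total number of genes measured
    annotated:  genes in the population that belong to the category
    drawn:      size of the differentially expressed gene list
    overlap:    genes in the list that belong to the category
    """
    max_overlap = min(annotated, drawn)
    total = comb(population, drawn)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(annotated, k) * comb(population - annotated, drawn - k)
        for k in range(overlap, max_overlap + 1)
    )
    return tail / total


if __name__ == "__main__":
    # Hypothetical numbers: 2,000 genes measured, 40 in some category,
    # 100 genes on the list, 6 of which fall in that category.
    p = hypergeom_enrichment_p(2000, 40, 100, 6)
    print(f"p-value for this category: {p:.4g}")
```

The arithmetic is simple; the trouble is in the use. A typical analysis runs a test like this against thousands of categories at once, so some will look "significant" by chance alone unless the results are corrected for multiple testing, and that is exactly the kind of detail a busy reviewer is unlikely to check.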

There are many other forms of data embroidery. Another colleague of mine founded a company to do large-scale analyses of certain proteins. Being a conscientious sort, my friend vetted something like 10,000 different antibodies. Of these antibodies, only about 10% met his criteria. Nonetheless, biology papers are often based on large numbers of antibodies, with little evidence beyond the manufacturer’s claims that the antibodies are valid. What is the possibility that any reviewer is going to check the validity of hundreds, or even tens, of antibodies?

The combination of modern biology's ability to embroider a story with so many different kinds of data and the creation of programs like SCIgen is frightening. Maxwell Krohn, one of the authors of SCIgen, sees computers churning out ever more convincing papers, while other computers (or profit-making firms) sell software designed to sniff out the fraud. Krohn said: “I’m psyched, it’s so great. These papers are so funny, you read them and can’t help but laugh. They are total bullshit. And I don’t see this going away.”

I wonder if Maxwell Krohn is smart enough to see all that bullshit?

 

