Research repository ArXiv will ban authors for one year if they let AI do all their work.

ArXiv, a popular open repository for preprint research, is stepping up its efforts to curb the careless use of large language models in scientific papers.

Although papers are posted to the platform before peer review, arXiv (pronounced “archive”) has become one of the main ways research circulates in fields such as computer science and mathematics, and the platform has itself become a source of data on trends in scientific research.

ArXiv has already taken steps to counter the growing number of low-quality, AI-generated papers, such as requiring first-time submitters to obtain an endorsement from an established author. And after being operated by Cornell for more than two decades, the organization is becoming an independent nonprofit, which is expected to help it raise funding to address problems like AI-generated inaccuracies.

In announcing the latest policy on Thursday, Thomas Dietterich, the head of arXiv’s computer science division, said that “if a submission presents undeniable evidence that the authors failed to verify the results of LLM output, this indicates we cannot rely on any aspect of the paper.”

Such evidence could include “hallucinated references” and interactions with the LLM left in the text, according to Dietterich. If it is found, the paper’s authors will face “a 1-year suspension from arXiv, followed by a condition that subsequent submissions must first be accepted by a credible peer-reviewed venue.”

This is not an outright ban on using LLMs. Rather, as Dietterich put it, authors must assume “full responsibility” for the material, “regardless of the method of content generation.” So if researchers copy and paste “inappropriate language, plagiarized material, biased information, errors, inaccuracies, incorrect references, or misleading content” directly from an LLM, they remain accountable for it.

Dietterich told 404 Media that the policy will follow a “one-strike” rule, but moderators must flag the issue and section chairs must confirm the evidence before the penalty is imposed. Authors will also be able to appeal the decision.

Recent peer-reviewed studies have found an increase in fabricated citations in biomedical research, likely attributable to LLMs, although scientists aren’t the only ones who have been caught using citations invented by AI.
