ArXiv Enforces One-Year Ban for Unvetted AI-Generated Submissions
ArXiv, the widely used open-access repository for preprint research in computer science, mathematics, and physics, has introduced a strict new penalty: authors who submit papers containing obvious signs of unchecked AI generation will be banned for one year. The policy was announced by Thomas Dietterich, chair of arXiv's computer science section, on Thursday. He explained that submissions with "incontrovertible evidence" of unvetted large language model output leave the platform unable to trust any part of the paper.
The rule is not a blanket prohibition on AI tools. Researchers may still use language models for drafting, editing, or analysis. What triggers the penalty is evidence that an author pasted LLM output into a paper without reviewing it. This includes hallucinated references, placeholder instructions from the chatbot, or fabricated data tables with notes like "fill in with the real numbers from your experiments." If moderators find such evidence and a section chair confirms it, the author faces a one-year ban. After the ban ends, all subsequent submissions must first be accepted by a peer-reviewed journal before they can appear on the platform.
Why the Policy Matters
ArXiv is not a peer-reviewed journal, but it has become the primary way research circulates in fast-moving fields like machine learning and artificial intelligence. Papers posted to arXiv are read, cited, and built upon long before formal publication. This makes the platform's quality standards unusually important: a hallucinated citation on arXiv can propagate through the research literature as quickly as one in a peer-reviewed journal, often faster.
The scale of the problem is significant. A study published in The Lancet in May 2026 by Columbia University researchers audited 2.5 million biomedical papers and 126 million references indexed on PubMed Central. It found that fabricated citations have risen twelvefold since 2023. In 2023, roughly one in 2,828 papers contained at least one fake reference. By 2025, the rate had climbed to one in 458. In the first seven weeks of 2026, it was one in 277. The researchers attributed the surge to the proliferation of AI writing tools, noting that previous studies estimate 30 to 69 percent of LLM-generated references in biomedical contexts are fabricated.
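The one-in-N figures above are easier to compare once they are put on a common scale. The short sketch below restates them as rates per 10,000 papers and as multiples of the 2023 rate. The function names are illustrative, and the inputs are only the per-paper figures quoted in this article, not data taken from the study itself.

```python
# Per-paper figures as quoted in the article: one in N papers
# contained at least one fabricated reference.
one_in = {
    "2023": 2828,
    "2025": 458,
    "2026 (first seven weeks)": 277,
}

def per_10k(n):
    """Convert a one-in-N rate to papers per 10,000."""
    return 10_000 / n

def fold_change(base_n, later_n):
    """How many times higher the later rate is than the base rate."""
    return base_n / later_n

for period, n in one_in.items():
    rate = per_10k(n)
    multiple = fold_change(one_in["2023"], n)
    print(f"{period}: {rate:.1f} per 10,000 papers, {multiple:.1f}x the 2023 rate")
```

On these per-paper figures the early-2026 rate works out to roughly ten times the 2023 rate. The study's twelvefold figure may be computed on a different basis, for example per reference rather than per paper, which this sketch does not attempt to reproduce.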
ArXiv has reason to take the threat seriously. The platform receives thousands of submissions each month, and its volunteer moderation system was not designed to screen for machine-generated content at scale. Dietterich's announcement described the new penalty as a "one-strike" rule, though decisions are subject to appeal and require confirmation by a section chair before being imposed.
What Counts as Evidence
The policy is deliberately narrow. Dietterich listed specific examples of "incontrovertible evidence": hallucinated references that do not correspond to any real publication, meta-comments from the language model left in the text (such as "here is a 200-word summary; would you like me to make any changes?"), and placeholder data with instructions to the author that were never removed. These are not subtle quality failures. They are signs that the author did not read the paper before submitting it.
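To show how mechanical this kind of evidence is, here is a minimal sketch of a scan for leftover chatbot text. It is an illustration only, not arXiv's moderation process: the patterns, the labels, and the function name flag_residue are all hypothetical, built from the examples quoted above. It does not address hallucinated references, which would have to be checked against a bibliographic database.

```python
import re

# Hypothetical patterns for illustration; not arXiv's actual criteria.
PATTERNS = {
    "chatbot meta-comment": re.compile(
        r"would you like me to|here is a \d+-word summary|as an ai language model",
        re.IGNORECASE,
    ),
    "unfilled placeholder": re.compile(
        r"fill in with the real|insert (?:your|the) (?:data|results|numbers) here",
        re.IGNORECASE,
    ),
}

def flag_residue(text):
    """Return (line_number, label, line) for every line matching a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, label, line.strip()))
    return hits

sample = (
    "We evaluate the method on three benchmarks.\n"
    "Here is a 200-word summary; would you like me to make any changes?\n"
    "Table 1: fill in with the real numbers from your experiments.\n"
    "We conclude with directions for future work."
)

for lineno, label, line in flag_residue(sample):
    print(f"line {lineno}: {label}: {line}")
```

A scan like this can only surface candidates for a human to confirm. It says nothing about a paper that was generated by a model and then lightly cleaned up.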
This distinction sidesteps the harder question of whether AI-assisted writing should be permitted at all. ArXiv's existing policy already states that authors bear "full responsibility" for their content "irrespective of how the contents are generated." The new penalty enforces that principle by targeting the most egregious violations: cases where the author's failure to exercise any oversight is provable from the text itself.
This approach has practical advantages. Current detection tools cannot reliably tell whether a well-edited paper was drafted with the help of an LLM. Enforcing a broader ban would be technically difficult and potentially punitive toward researchers who use AI tools responsibly. By focusing on obvious slop, arXiv can enforce the rule without needing to build or buy an AI-detection system, a technology that remains prone to its own errors.
A Broader Problem in Academia
ArXiv is not alone in struggling with this issue. Academic conferences in computer science, including NeurIPS and ICML, have reported surges in submissions that appear to be generated with minimal human oversight. Nature published a feature in late 2025 describing how AI slop is creating a crisis in computer science, overwhelming reviewers and diluting the field's signal-to-noise ratio.
Peer-reviewed journals face the same problem. The Lancet study found that fabricated citations appeared in papers that had already passed peer review, suggesting that reviewers are either not checking references or are unable to identify fabrications at the rate they now appear. Lead author Maxim Topaz, of Columbia University's School of Nursing, warned that clinicians and guideline developers have no way of knowing when the evidence they rely on does not exist. This gap persists despite efforts to reduce AI hallucinations in scientific research.
ArXiv itself is undergoing structural changes that may help address the challenge. After more than 20 years as a project hosted by Cornell University, the platform is becoming an independent nonprofit, a move that should give it greater autonomy over moderation policies and the ability to raise funds specifically to combat quality problems. It has also introduced a requirement for first-time submitters to obtain an endorsement from an established author, a gatekeeping measure aimed at reducing the volume of submissions from accounts created solely to publish AI-generated material.
The Limits of Enforcement
The new rule will catch the most careless offenders: researchers who submit papers they have not read. It will not catch researchers who use language models to generate plausible but incorrect claims, fabricate data, or produce papers that are fluent but scientifically vacuous. Those problems require peer review, institutional oversight, and a willingness within the research community to treat AI-assisted misconduct with the same seriousness as traditional forms of fabrication.
What arXiv's policy establishes is a principle: if you submit a paper, you are responsible for every word in it. That has always been true in theory. The difference now is that language models have made it trivially easy to produce text that reads like science but contains nothing of substance. ArXiv's one-year ban is a modest penalty for a serious offense, but it is also the first formal acknowledgment by a major research platform that the problem is no longer one of occasional carelessness. It is structural, growing, and requires dedicated infrastructure to combat.