Wikipedia talk:WikiProject AI Cleanup#WP:LLMN?

    Usage of AI for a large edit


I don't know where else to say this. I noticed this revision and this article, The Impact of Oil Spills on Aquatic Animal Life in the United States, which were clearly created with the assistance of an LLM.

I am a newcomer and I don't know how these are handled. What should be done about this? I genuinely don't think the article is a good fit for an encyclopedia, and checking/reworking everything that was included in the linked revision is a huge chore. I couldn't verify most of the sources used. I don't know if they're real, though I managed to find at least one of them. MeowsyCat99 (talk) 13:14, 13 June 2025 (UTC)[reply]

    Heavily LLM-generated and clearly not verified. I'm willing to put it up for AfD and advocate for TNT as I don't think attempting to salvage that level of generated content is worth any editor's time, not to mention other fundamental issues with the topic. Will wait a short time to see if any editors have a better suggestion. fifteen thousand two hundred twenty four (talk) 14:13, 13 June 2025 (UTC)[reply]
    My general approach now is to strip out made up sources and those that don't corroborate the sentence or paragraph they're attached to, and then send the article to draft with a reason of LLM-written text. I have also tried nominating for speedy deletion as {{db-g3}} (hoax/vandalism) if it is particularly bad. In this case I'd probably try the draftify approach: I note that the creating editor is part of this WikiEd course so would probably notify the course leader / WikiEd person too. Cheers, SunloungerFrog (talk) 14:34, 13 June 2025 (UTC)[reply]
    Draftified
I would normally still advocate deletion to avoid other editors unknowingly getting caught up in LLM-cruft when trying to improve others' drafts, but will give draftifying a try this time.
    The creator has been warned with {{uw-ai1}} and is now aware that using generated content like that is problematic, if they persist then I agree that contacting a course leader would be appropriate. fifteen thousand two hundred twenty four (talk) 02:52, 15 June 2025 (UTC)[reply]

    Good faith use of Gemini


    [1] Not sure what to do. Doug Weller talk 16:50, 19 June 2025 (UTC)[reply]

    I've told the editor. Doug Weller talk 16:51, 19 June 2025 (UTC)[reply]
    Hi Doug,
    FWIW, here are the three prompts I used from Gemini 2.5 Flash:
    1) Can you generate an updated economic summary using 2024 data for Guyana using the format below, and providing referenced sources for each data point that could be integrated into the Wikipedia page for it located at
    https://en.wikipedia.org/wiki/Guyana
    2) Can you also provide in Wikipedia format the list of references in your prior answer, also including verified working http links to webpages for each one?
    3) Can you
    1) find an alternative source than the website tradingeconomics.com for that reference, and if you cannot, remove that data and reference as it is blacklisted by Wikipedia
2) and then provide a combination of the last two answers as a single body of Wikipedia text markup, modeled on the format below, but integrating the data you have just collated in the past two answers. Please double check that both the data and coding for Wikipedia markup are accurate.
    And then I made hand-tweaks of a few things that weren't perfect.
    Is there a Wikipedia good-faith-AI crew collating efforts like this?
    It makes no sense to have the world's data centers regenerating the same kinds of outputs afresh when efforts could be strategically coordinated to flow the data to Wikipedia (among those inclined to do so).... Vikramsurya (talk) 17:02, 19 June 2025 (UTC)[reply]
    The problem is this, from your edit summary Data needs full verification but preliminary suggests it's accurate. You should only make edits that you have already fully verified are borne out by the sources, not just a vague suggestion that they're probably accurate. There are also three random inline citations on a line by themselves after the Imports bullet, and there's something wrong with the formatting of ref 57. Cheers, SunloungerFrog (talk) 17:25, 19 June 2025 (UTC)[reply]
    PPP sources are broken, the sites list the data as being both for Guyana and Chad. Under "arable land" the hectare claim is not found in the source. Under "labor force participation" the rate in the source is 49.6%, not 56.4%. Under "industrial production" neither source mentions crude petroleum, gold, timber, or textiles.
The model's output can be characterized as "subtly wrong"; this is par for the course. fifteen thousand two hundred twenty four (talk) 19:15, 19 June 2025 (UTC)[reply]
    AI hallucinating? Doug Weller talk 19:39, 19 June 2025 (UTC)[reply]
    Possibly some hallucination, but sourcing misattribution has certainly occurred, which can be viewed as better or worse. The arable land claim of 420,000 hectares (but not "more than") is the exact figure in Wolfram's database, but the prompt requested "working http links to webpages", so the model's pattern contained a link, even if wrong. fifteen thousand two hundred twenty four (talk) 04:39, 20 June 2025 (UTC)[reply]
    Misattribution and hallucination are really the same issue, the AI is finding words and numbers that fit the pattern it develops. CMD (talk) 05:31, 20 June 2025 (UTC)[reply]
    I have a question - when did you think the verification by other editors would occur? If I was watching the page and started to check and found more than a couple of errors, I would just revert the whole edit with a request not to submit error-strewn material. Why? Because I would judge that the edit overall could not be trusted if there were already this many faults and I wasn't going to waste my time looking further. This is something that happens all the time: we are all volunteers who shouldn't be making work for each other like this. That doesn't mean using an LLM is bad. It's saved you time doing some of the formatting. That frees you up to do what the LLMs are bad at, which is fine-grained fact-checking of reliable sources. OsFish (talk) 05:44, 20 June 2025 (UTC)[reply]

    Royal Gardens of Monza


    I'm not super familiar with the process here, but Royal Gardens of Monza seems like it might be AI generated to me - two of the books it cites have ISBNs with invalid checksums, the third doesn't seem to resolve to an actual book anyways, it cites dead URLs despite an access date of yesterday, and uses some invalid formatting in the "Design and features" heading. The author has also had a draft declined at AFC for being LLM-generated before. ScalarFactor (talk) 23:07, 21 June 2025 (UTC)[reply]

    You are correct. I've draftified and tagged the article, left notices on the draft and creator's talk pages, and notified the editor who accepted the draft at AfC. I think Fazzoacqua100's other articles should be reviewed for similar issues. fifteen thousand two hundred twenty four (talk) 01:22, 22 June 2025 (UTC)[reply]
    Their other submissions and drafts have now been reviewed, draftified, and had notices posted where appropriate. Thank you @ScalarFactor for posting here. fifteen thousand two hundred twenty four (talk) 04:40, 22 June 2025 (UTC)[reply]
    No problem - thanks for dealing with the cleanup. ScalarFactor (talk) 05:15, 22 June 2025 (UTC)[reply]

    More signs of LLM use from my recent AfC patrolling


    For the past month I've been participating in the WP:AFCJUN25 backlog drive, and oh man, I've been finding a LOT of AI slop in the submission queue. I've found a few more telltale signs of LLM use that should probably be added to WP:AICATCH:

    (oh god, these bulleted lists are exactly the sort of thing ChatGPT does...)

    • Red links in the See also section — often these are for generic terms that sound like they could be articles. Makes me wonder if an actually practical use of ChatGPT would be to suggest new article titles... as long as you write the article in your own words. I'm just spitballing here.
    • Fake categories, i.e. red links that sound plausible, but don't currently exist in our category system.
    • Thin spaces? Maybe? I've been encountering a surprisingly high number of Unicode thin space characters, and I'm wondering if there's some chatbot that tends to use them in their output, because I don't know of any common keyboard layouts that let you type them (aside from custom layouts like the one I use, but it seems vanishingly unlikely that some random user with 2 edits is using one of those).

    Anyone got any more insights on any of these? pythoncoder (talk | contribs) 21:05, 30 June 2025 (UTC)[reply]

    Forgot to link a thin space example: Draft:Independent National Electoral and Boundaries Commission (Somalia)
    Another sign I just found: Draft:Opaleak has a bunch of text like :contentReference[oaicite:3]{index=3} in place of references. pythoncoder (talk | contribs) 21:11, 30 June 2025 (UTC)[reply]
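That `:contentReference[oaicite:N]{index=N}` string is leftover ChatGPT citation markup that survives copy-paste. For anyone who wants to sweep drafts for it, a minimal sketch (the regex is my guess at the observed shape, not an official cleanup tool):

```python
import re

# Matches ChatGPT citation residue such as :contentReference[oaicite:3]{index=3}
OAICITE = re.compile(r":contentReference\[oaicite:\d+\]\{index=\d+\}")

def strip_oaicite(wikitext: str) -> str:
    """Remove leftover oaicite markers; the surrounding text is untouched."""
    return OAICITE.sub("", wikitext)

print(strip_oaicite("Some claim.:contentReference[oaicite:3]{index=3} More text."))
```

Searching for just the literal string `oaicite` would also work for detection; the regex is only needed if you want to remove the markers automatically.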
@Pythoncoder Could you note where the thin spaces are in that example? CMD (talk) 02:35, 1 July 2025 (UTC)[reply]
    Just double-checked and it looks like they're actually narrow nonbreaking spaces (U+202F) — copy and paste into your find-and-replace dialog: > <
    They appear twice here: "On 15 April 2025, INEBC rolled out..." and "unanimously adopted Law No. 26 on 16 November 2024." pythoncoder (talk | contribs) 02:50, 1 July 2025 (UTC)[reply]
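For anyone hunting these in other drafts, a quick sketch that flags space-like characters that rarely come from an ordinary keyboard. The character selection is my own assumption about likely LLM output, not a vetted list:

```python
# Space-like characters that rarely come from a standard keyboard layout;
# this selection is a guess at likely chatbot output, not an official list.
SUSPECT = {
    "\u2009": "THIN SPACE",
    "\u202f": "NARROW NO-BREAK SPACE",
    "\u200b": "ZERO WIDTH SPACE",
    "\u00a0": "NO-BREAK SPACE",  # sometimes legitimate, so review hits by hand
}

def flag_spaces(text: str):
    """Yield (line_no, column, character_name) for each suspect character."""
    for ln, line in enumerate(text.splitlines(), 1):
        for col, ch in enumerate(line, 1):
            if ch in SUSPECT:
                yield ln, col, SUSPECT[ch]

sample = "On 15\u202fApril 2025, INEBC rolled out..."
print(list(flag_spaces(sample)))
```

Pasting wikitext into something like this would locate the exact positions, which is easier than squinting at a find-and-replace dialog.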
Another one: excessive use of parentheses any time a term with an acronym shows up, even if the acronym in the parentheses is never used again in the article. Sometimes it even does it twice: Draft:Saetbyol-4 pythoncoder (talk | contribs) 19:15, 8 July 2025 (UTC)[reply]
    ChatGPT likes to generate malformed AfC templates (which breaks the submission and automatically creates a broken Decline template).
An example of this:
    {{Draft topics|biography|south-asia}} :{{AfC topic|other}} :{{AfC submission|||ts=20250708193354|u=RsnirobKhan|ns=2}} :{{AFC submission|d|ts=2025-06-07T00:00:00Z}} :{{AFC submission|d|ts=19:32, 8 July 2025 (UTC)}} qcne (talk) 19:40, 8 July 2025 (UTC)[reply]

    LLM-translated articles in need of review


    By https://oka.wiki - an organisation that is open and working in good faith, but also extremely into its LLMs. List here - David Gerard (talk) 21:48, 30 June 2025 (UTC)[reply]

Can you point to an example? It's a lot of articles. Sohom (talk) 03:02, 1 July 2025 (UTC)[reply]
    These are, as far as I am aware, translated by editors with dual fluency. All go through AfC and are tagged as necessary by AfC reviewers. @David Gerard, do you have any specific problems with any of them? If so, please do raise them (and maybe also with the AfC reviewer), but in general I believe these aren't any more of an issue than any other translated article. -- asilvering (talk) 03:45, 1 July 2025 (UTC)[reply]

    User:Jessephu consistently creating LLM articles


    hello, Jessephu has already made articles flagged as AI, which is how i spotted this- see Childbirth in Nigeria and Draft:Olanrewaju Samuel. however, this exact same unusual bullet-point style is seen in many of the articles he created, including but not limited to Cancer care in Nigeria, this revision and Neglected tropical diseases in Nigeria, this revision. he's been doing this for a while now for a lot of articles. ceruleanwarbler2 (talk) 13:33, 1 July 2025 (UTC)[reply]

    For the sake of transparency, this editor asked me on Tumblr what should be done about this situation, and I told her that she could report it to this noticeboard (and clarified that the report would not be seen as casting aspersions). Chaotic Enby (talk · contribs) 13:52, 1 July 2025 (UTC)[reply]
@Ceruleanwarbler2 Jessephu (talk) 17:54, 1 July 2025 (UTC)[reply]
Alright, duly noted, and thanks for bringing this up.
I understand the concern regarding the formatting style and the tagged AI-related article. I acknowledge that in some of my previous articles I used the bullet-point format as a way of organising my article clearly, but after this review I will surely work on that.
If there is any area where my edits have fallen short, I sincerely apologise and will make necessary corrections. I appreciate your feedback Jessephu (talk) 18:01, 1 July 2025 (UTC)[reply]
    The bullet-point format, while not ideal, is not the main issue at hand – your response doesn't answer the question of whether you were using AI or not. While that is not explicitly disallowed either, it is something that you should ideally be transparent about, especially given the editorializing and verifiability issues in some of your articles. Chaotic Enby (talk · contribs) 18:30, 1 July 2025 (UTC)[reply]
    Thank you for the feedback. Yes, I use AI to sometimes assist with drafting, but I do make sure to review and edit the content to ensure accuracy. Jessephu (talk) 03:07, 2 July 2025 (UTC)[reply]
You created National Association of Kwara State Students on 21 April. The "Voice of Nigeria" source 404s, the "KSSB:::History" source is repeated twice for separate claims and fails to support either, and the "Ibrahim Wakeel Lekan 'Hon. Minister' Emerges as NAKSS President" source also does not support the accompanying text. Neither of the two provided sources supports the subject's notability. The article is unencyclopedic in tone and substance, and is written like an essay. I have serious doubts concerning your claim that you review content for accuracy and have draftified that article. fifteen thousand two hundred twenty four (talk) 07:57, 2 July 2025 (UTC)[reply]
    i do make sure to review....... But the ones mentioned here could be a mistake from my end, currently going through articles listed here to correct errors. Will do well to strictly cross check thoroughly. Jessephu (talk) 08:10, 2 July 2025 (UTC)[reply]
I admit I might have done some things wrongly..... I sincerely apologise and will work on them now Jessephu (talk) 08:12, 2 July 2025 (UTC)[reply]
I checked now, and one of the reasons I used the "KSSB:::History" source is to cite the association's role in advocating for Kwara State student affairs.
Regardless, I am sorry; I am still going through the other articles to make necessary adjustments. Jessephu (talk) 08:31, 2 July 2025 (UTC)[reply]

    Discussion about CzechJournal at RSN


    There's a discussion about the reliability of CzechJournal at RSN that could use additional opinions from editors with LLM knowledge. See WP:RSN#CzechJournal in articles about AI (or in general). -- LCU ActivelyDisinterested «@» °∆t° 20:10, 3 July 2025 (UTC)[reply]

    Yaswanthgadu.21 - stub expansion using LLM


I came across a supposed stub expansion to an article on my watchlist, Formby Lighthouse. It seemed to be largely LLM-generated, with all the accompanying problems (flowery text, content not supported by sources, etc.), so I reverted it.

    It seems that the user in question, Yaswanthgadu.21 may have done this for other stub articles, as part of Wikipedia:The World Destubathon. I don't have the time at present to look into this further, but if others had the opportunity, that would be helpful. On the face of it, their additions to Three Cups, Harwich look similarly dubious, and they have destubbed a bunch of other articles. Cheers, SunloungerFrog (talk) 05:59, 6 July 2025 (UTC)[reply]

    Hey SunloungerFrog,
Just wanted to quickly explain the process I’ve been following: I usually start by Googling for sources based on the requirement. I read through them once, pick out key points or keywords, and then rewrite the content in my own words. After that, I use ChatGPT or another LLM to help refine what I’ve written and organize it the way I want. I also provide the source links at that stage. Once the content is cleaned up, I move it over to Wikipedia.
    Since everything was based on the links I gave, I assumed nothing unrelated or unsourced was getting in. But after your observation, I decided to test it. I asked GPT, “Where did this particular sentence come from? Is it from the data I gave you?” and it replied, “No, it’s not from the data you provided.” So clearly, GPT can sometimes introduce its own info beyond what I input.
    Thanks again for pointing this out. I’ll go back and review the articles I’ve worked on. If I find anything that doesn’t have a solid source, I’ll either add one or remove the sentence. I’d appreciate it if I could have two weeks to go through everything properly. Yaswanthgadu.21 (talk) 07:52, 6 July 2025 (UTC)[reply]
    I'll be blunt: it would be far preferable if you self-reverted all the edits you've made in this way, and started from scratch, because then you know you can be confident in the content, language and sourcing. Please do that instead. Cheers, SunloungerFrog (talk) 08:47, 6 July 2025 (UTC)[reply]
    I agree. Reverting all of the edits you made in this way and redoing them by hand would be preferable on every level. If you want to organize your writing the way you want, organize it yourself. Stepwise Continuous Dysfunction (talk) 16:35, 6 July 2025 (UTC)[reply]

    ISBN checksum


    I just found what appears to be an LLM-falsified reference which came to my attention because it raised the citation error "Check |isbn= value: checksum", added in Special:Diff/1298078281. Searching shows some 300 instances of this error string; it may be worth checking whether others are equally bogus. —David Eppstein (talk) 06:43, 6 July 2025 (UTC)[reply]
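For context on why this error is such a useful tell: both ISBN-10 and ISBN-13 carry a check digit computed from the other digits, so an ISBN invented by an LLM usually fails validation. A minimal sketch of the two checksum formulas (not the CS1 module's actual code):

```python
def valid_isbn(isbn: str) -> bool:
    """Check the check digit of an ISBN-10 or ISBN-13 string."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 10:
        # ISBN-10: digits weighted 10 down to 2, plus the check digit
        # ('X' counts as 10); the total must be divisible by 11.
        if not (digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X")):
            return False
        total = sum((10 - i) * int(d) for i, d in enumerate(digits[:9]))
        total += 10 if digits[9] == "X" else int(digits[9])
        return total % 11 == 0
    if len(digits) == 13 and digits.isdigit():
        # ISBN-13: alternating weights 1 and 3; total must be divisible by 10.
        total = sum((1 if i % 2 == 0 else 3) * int(d) for i, d in enumerate(digits))
        return total % 10 == 0
    return False
```

Note the converse doesn't hold: a fabricated ISBN can pass the checksum by chance (roughly a 1-in-10 hit rate for ISBN-13), so a valid checksum is no guarantee the book exists.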

    Could be added to Wikipedia:WikiProject AI Cleanup/AI catchphrases. CX Zoom[he/him] (let's talk • {CX}) 19:27, 10 July 2025 (UTC)[reply]

     You are invited to join the discussion at WP:Village pump (idea lab) § Finding sources fabricated by AI, which is within the scope of this WikiProject. SunloungerFrog (talk) 16:58, 6 July 2025 (UTC)[reply]

    User:Yunus_Abdullatif has been expanding dozens of stub articles for the last few weeks obviously using AI. For example, their edits include capitalization and quoting that does not follow the style guideline, duplicate references, and invalid syntax. 2001:4DD4:17D:0:DA74:25C:8189:4830 (talk) 07:35, 7 July 2025 (UTC)[reply]

    Possible new idea for WP:AITELLS: non-breaking spaces in dates


    Over the past few weeks, I've been noticing a ton of pages showing up in Category:CS1 errors: invisible characters with non-breaking spaces in reference dates (also causing CS1 date errors). I've been trying to figure out where these are coming from, and I'm leaning towards it being another AI thing -- see this draft, which has various other AI hallmarks. Jay8g [VTE] 20:36, 7 July 2025 (UTC)[reply]
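For anyone triaging that category, a sketch of a scan that pulls date-type parameters out of citation templates and flags values containing invisible characters. The parameter names are common CS1 ones, but the overall approach and character set are my own illustration, not what the CS1 module does:

```python
import re

# Invisible/space characters that trigger CS1 date errors when they appear
# inside date values (the selection is an assumption for illustration).
INVISIBLE = "\u00a0\u202f\u2009\u200b"

# Capture |date=, |access-date=, and |archive-date= values inside templates.
DATE_PARAM = re.compile(r"\|\s*(date|access-date|archive-date)\s*=\s*([^|}]*)")

def dates_with_invisibles(wikitext: str):
    """Yield (parameter, value) pairs whose value contains an invisible char."""
    for m in DATE_PARAM.finditer(wikitext):
        if any(ch in INVISIBLE for ch in m.group(2)):
            yield m.group(1), m.group(2)

cite = "{{cite web |title=Example |date=20\u00a0June 2025 |url=https://example.org}}"
print(list(dates_with_invisibles(cite)))
```

One gotcha if you adapt this: Python's `\s` already matches U+00A0 and friends under Unicode matching, so a date value that *starts* with an invisible character would have it silently consumed by `\s*=\s*`; anchoring on the raw value is safer for a thorough sweep.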

    For the interested


A German newspaper [2] had an AI/human team check articles on the German WP, and found that many WP articles contain errors or outdated information, and that there are not that many editors. Apparently this didn't use to be the case, unclear when it changed.[sarcasm]

    Anyway, this was interesting:

"Can artificial intelligence replace the online encyclopedia? Not at the moment. The FAS study also shows this: When Wikipedia and artificial intelligence disagreed, the AI wasn't more often right than Wikipedia. Sometimes, the AI even correctly criticized a sentence, but also provided false facts itself. That's why human review was so important. At the same time, most AI models are also trained on Wikipedia articles. The AI has therefore very likely overlooked some errors because it learned inaccurate information from Wikipedia." Gråbergs Gråa Sång (talk) 09:47, 8 July 2025 (UTC)[reply]

This discussion wasn't very conclusive, but it seems clear this page is the closest to an LLM noticeboard we have atm. So, I made a couple of redirects, WP:LLMN and Wikipedia:Large language models/Noticeboard, and added this page to Template:Noticeboard links. We'll see what happens. Gråbergs Gråa Sång (talk) 14:18, 8 July 2025 (UTC)[reply]

    Looks good to me. Thanks for adding the link and redirects. — Newslinger talk 15:14, 8 July 2025 (UTC)[reply]

    Possible disruptive LLM usage by User:Pseudopolybius


    I'm not sure if this is the right place to report this kind of thing.

    I started working on a section of Long Peace until I realized the whole article has been totally transformed in the last few months, mostly by one extremely fast editor, User:Pseudopolybius. Their contributions to the article include the following nonsense: "The Coming War with Japan will be followed by The Coming Conflict with China who are locked in the Thucydides Trap and The Jungle Grows Back, While America Sleeps."

    Looks like the work of an LLM to me. Also, this user has been warned three times for using copyrighted content. Apfelmaische (talk) 19:42, 8 July 2025 (UTC)[reply]

    I've just reverted the article. Apfelmaische (talk) 20:13, 8 July 2025 (UTC)[reply]

The "nonsense" makes perfect sense; see Talk:Long Peace. For this misunderstanding, Apfelmaische reverted the article.--Pseudopolybius (talk) 22:40, 8 July 2025 (UTC)[reply]

    I was mistaken. Sorry! Apfelmaische (talk) 23:44, 8 July 2025 (UTC)[reply]

    I filled in all the incomplete entries, added some new ones, and expanded explanations. After a year and a half, I marked our core guidance page Wikipedia:WikiProject AI Cleanup/AI catchphrases as complete. Feel free to expand it with new entries if you notice new characteristics of AI writing. Ca talk to me! 13:00, 10 July 2025 (UTC)[reply]

    @Ca I've added a couple of examples I've come across in my AfC work. A thought: the drafts linked as examples will be deleted under G13 in six months- should we take a copy as a subpage under this project? qcne (talk) 15:32, 12 July 2025 (UTC)[reply]
    I think that's a good idea! It would be useful to have a corpus of LLM text examples. Ca talk to me! 15:46, 12 July 2025 (UTC)[reply]

    Move proposal


– The word "Catchphrases" insinuates that the page contains specific phrases or words that can catch AI writing, which was true at the essay's inception but is no longer true in its current form; the entries are too broad and wide-reaching to fit that definition. Ca talk to me! 13:11, 10 July 2025 (UTC)[reply]

Support. I prefer "LLM" over "AI", but with a project name of "AI Cleanup" it's not something I'm going to get hung up on. If the move is accepted I suggest that the displayed shortcut WP:AICATCH be switched for a new WP:LLMSIGNS or WP:LLMTELLS shortcut, and WP:AIC/C should be switched for WP:AIC/S as well. fifteen thousand two hundred twenty four (talk) 13:26, 10 July 2025 (UTC)[reply]
    Support. I also prefer LLM but the AfC template already uses "AI" and I think it's the more common phrasing. qcne (talk) 13:30, 10 July 2025 (UTC)[reply]
    Support, and thanks a lot for your work on it! Chaotic Enby (talk · contribs) 15:41, 10 July 2025 (UTC)[reply]
You're welcome! I want to also credit User:MrPersonHumanGuy and User:Newslinger, who have done tremendous work expanding the initial essay. Ca talk to me! 17:18, 10 July 2025 (UTC)[reply]
    Support and thanks. -- LWG talk 15:47, 10 July 2025 (UTC)[reply]
    Support as the page also lists punctuation and broken formatting. The current title presumably intends catchphrase as "a signature phrase spoken regularly by an individual", though, rather than "a phrase with which to catch someone". Belbury (talk) 16:01, 10 July 2025 (UTC)[reply]
    Support. I'm glad to see this essay graduate from the development stage. I have a weak preference for "LLM" in the title, as it would be more specific than "AI". — Newslinger talk 17:29, 10 July 2025 (UTC)[reply]
    Support per nom. Paprikaiser (talk) 20:18, 10 July 2025 (UTC)[reply]
Support - I don't know that we need to specify "LLM", since "AI writing" is synonymous with LLM output and probably more recognizable to editors who are not familiar with technical terminology surrounding generative AI. - ZLEA T\C 20:24, 10 July 2025 (UTC)[reply]
    I hate to be contrarian, because obviously moving the page is correct, but I am opposing over the "AI" vs "LLM" split. While referring to them as AI is indeed commonplace in journalism, scholarly sources tend to prefer referring to generative tools by the underlying technology,[1][2][3] meaning in a technical discussion of their behavior it's perhaps better to use the latter phrase.
    This has less to do with any Wikipedian rationale, but I want to point out that we are unfortunately colluding with the marketing of these things by referring to them with such a high-prestige term. People come to this site every day and in good faith make use of LLMs on the understanding that they are intelligent and potentially smarter than them, when they are not. The language we use on the site should reflect the fact that we address these things as tools, and agree with the scholarly (and Wikipedian) consensus that these things are generally unreliable when not deeply scrutinized.
    Obviously the fate of the universe doesn't rest on the name of this one Wikipedia page. I just want everyone who feels apathetic about the name change to understand the subtext and how we're deviating from academic terminology and replacing it with a trendier term born out of a speculative market, which may in time become seen ubiquitously as inaccurate. Altoids0 (talk) 04:24, 12 July 2025 (UTC)[reply]
    Although I agree with changing the page's title to something else, I also think Wikipedia:Signs of LLM use would be a better title than Wikipedia:Signs of AI writing. – MrPersonHumanGuy (talk) 10:57, 12 July 2025 (UTC)[reply]

     You are invited to join the discussion at Wikipedia:Edit filter noticeboard § Edit filters related to logging and blocking AI edits. –Novem Linguae (talk) 05:34, 11 July 2025 (UTC)[reply]

    1. ^ "Pay Less Attention to Deceptive Artifacts: Robust Detection of Compressed Deepfakes on Online Social Networks". arXiv. doi:10.48550/arXiv.2506.20548.
    2. ^ "LLM-based NLG Evaluation: Current Status and Challenges". Computational Linguistics. doi:10.1162/coli_a_00561.
    3. ^ "A Survey on LLM-Generated Text Detection: Necessity, Methods, and Future Directions". Computational Linguistics. doi:10.1162/coli_a_00549.