Wikipedia:Bot requests#Bot request

This is a page for requesting tasks to be done by bots per the bot policy. This is an appropriate place to put ideas for uncontroversial bot tasks, to get early feedback on ideas for bot tasks (controversial or not), and to seek bot operators for bot tasks. Consensus-building discussions requiring large community input (such as requests for comment) should normally be held at WP:VPPROP or other relevant pages (such as a WikiProject's talk page).

You can check the "Commonly Requested Bots" box above to see if a suitable bot already exists for the task you have in mind. If you have a question about a particular bot, contact the bot operator directly via their talk page or the bot's talk page. If a bot is acting improperly, follow the guidance outlined in WP:BOTISSUE. For broader issues and general discussion about bots, see the bot noticeboard.

Before making a request, please see the list of frequently denied bots: tasks that are either too complicated to program or lack consensus from the Wikipedia community. If you are requesting that a template (such as a WikiProject banner) be added to all pages in a particular category, please be careful to check the category tree for any unwanted subcategories. It is best to give a complete list of categories to be worked through individually, rather than one category to be analyzed recursively (see example difference).

Alternatives to bot requests

Note to bot operators: The {{BOTREQ}} template can be used to give common responses, and make it easier to keep track of the task's current status. If you complete a request, note that you did with {{BOTREQ|done}}, and archive the request after a few days (WP:1CA is useful here).


Please add your bot requests to the bottom of this page.
Make a new request
# Bot request Status 💬 👥 🙋 Last editor 🕒 (UTC) 🤖 Last botop editor 🕒 (UTC)
1 Decap "External Links" Undone 31 9 Lee Vilenski 2026-02-17 16:33 Qwerfjkl 2026-02-16 19:51
2 Citation source replacement with {{Cite Köppen-Geiger cc 2007}} 5 4 ActivelyDisinterested 2026-01-17 20:26 Usernamekiran 2025-11-21 13:40
3 Web scraping 9 3 Headbomb 2026-02-13 12:24 Headbomb 2026-02-13 12:24
4 Redirects related to those nominated at RfD BRFA filed 17 3 Thryduulf 2026-02-24 15:51 Vanderwaalforces 2026-02-09 12:58
5 Update broken Svenska Dagbladet archive URLs 3 2 Saftgurka 2026-01-09 08:22 DreamRimmer 2026-01-08 15:33
6 List of Wikipedians by non-automated/manual edits 1 1 HKLionel 2026-01-09 12:58
7 Website to article mass redirect creation 13 6 Some helpful person 2026-02-09 20:24 Primefac 2026-01-17 11:48
8 Removing links to copyvio site 15 7 LaundryPizza03 2026-01-24 20:55 Tenshi Hinanawi 2026-01-15 01:19
9 Broadcasting magazine citation cleanup request 9 4 Nathan Obral 2026-02-04 20:47 Headbomb 2026-01-21 03:27
10 Move links clean up request 3 3 HurricaneZeta 2026-02-01 20:20 Anomie 2026-01-19 00:01
11 Unnecessary disambiguations 11 6 TenPoundHammer 2026-01-21 00:46 Qwerfjkl 2026-01-20 22:28
12 Automatically add Template:AI-retrieved source 2 2 Chrs 2026-01-20 02:46
13 Infobox links to football statistics BRFA filed 22 2 Vanderwaalforces 2026-02-07 08:33 Vanderwaalforces 2026-02-07 08:33
14 UTM Bot Request 15 4 SeaDragon1 2026-02-24 14:33 Qwerfjkl 2026-02-21 10:50
15 Infobox settlement country label easter egg/overlink delinking BRFA filed 4 3 Canterbury Tail 2026-01-31 23:30 Vanderwaalforces 2026-01-29 16:23
16 Hip-hop hyphens 2 2 Anomie 2026-01-31 19:30 Anomie 2026-01-31 19:30
17 Request for bot to perform automated task related to MOS:POSLINK 19 9 Primefac 2026-02-11 11:44 Primefac 2026-02-11 11:44
18 Mass-delinking of word "population" in articles about Japanese populated places, etc. 3 3 Phuzion 2026-02-02 17:01 Phuzion 2026-02-02 17:01
19 Request for Co-operator for User:FireflyBot 5 4 Legoktm 2026-02-06 01:05 Legoktm 2026-02-06 01:05
20 Useless non-free no reduce tags 5 4 Anomie 2026-02-21 19:55 Anomie 2026-02-21 19:55
21 archive.today cleanup Template adjusted. 51 12 WhatamIdoing 2026-02-26 03:28 Sammi Brie 2026-02-24 04:55
22 Reference spam detector 1 1 Mathglot 2026-02-24 01:34
23 Iranian village capitalization BRFA filed 8 4 Bunnypranav 2026-02-28 15:36 Bunnypranav 2026-02-28 15:36
24 Fix disambiguate links after page move  Done 3 2 8rz 2026-02-27 00:09 Phuzion 2026-02-26 14:19

Decap "External Links"


Decap "External Links" to "External links". Here is the search code (insource:/==External Links==/). I was just going to fix them with JWB, but there are quite a lot: at least 7,900, possibly more. Here is the code with different spacing as well (insource:/== External Links ==/). That will generate a different set of results that also need fixing. ~WikiOriginal-9~ (talk) 00:29, 14 October 2025 (UTC)[reply]
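For reference, the replacement the request describes can be sketched as a one-line regex substitution. This is an illustrative sketch only (not the actual tool used), tolerating optional spaces inside the == markers as in the two searches above:

```python
import re

# Normalise "External Links" headings to MOS:HEAD-compliant "External links",
# matching both "==External Links==" and "== External Links ==" variants.
HEADING_RE = re.compile(r'==\s*External\s+Links\s*==')

def decap_heading(wikitext):
    return HEADING_RE.sub('==External links==', wikitext)
```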

WikiOriginal-9, 16,000 (regex times out). — Qwerfjkltalk 08:05, 14 October 2025 (UTC)[reply]
Though this seems fairly trivial, so maybe just add to RegExTypoFix and forget about it? — Qwerfjkltalk 08:06, 14 October 2025 (UTC)[reply]
Undone at WP:AWB/T. phuzion (talk) 02:24, 12 November 2025 (UTC)[reply]
@Phuzion: This new rule will not get the job done. As explained at Wikipedia:AutoWikiBrowser/Typos#Usage, most of the tools that use the typo list do not run the rules on section headings. -- John of Reading (talk) 07:47, 12 November 2025 (UTC)[reply]
Good catch! Given that AWB and other tools won't actually utilize this rule, do you think it's worth removing from the RegExTypoFix list? I've marked the task as undone for now, as well. phuzion (talk) 12:38, 12 November 2025 (UTC)[reply]
@Phuzion: Yes, I've removed it. -- John of Reading (talk) 13:35, 12 November 2025 (UTC)[reply]
Thanks! phuzion (talk) 13:38, 12 November 2025 (UTC)[reply]
A couple of points:
  1. If we limit the search to the article space, i.e. [1], the haul is much less impressive.
  2. I regularly use a prefix in the search and cycle through the alphabet to correct these errors, and the other MOS:HEAD errors that are frequently present on those pages, i.e. use a search like this.
So by all means do something to make it easier to fix MOS:HEAD errors in general, but please don't remove this handy signpost for where they can be found. William Avery (talk) 10:19, 14 January 2026 (UTC)[reply]
I wanted to mention that the same thing applies to "See Also", which should be "See also". What is the update regarding this? I have started working on this since no one has gotten to it yet, and since AWB can't do it. I just need to know where things stand. Vanderwaalforces (talk) 11:17, 4 February 2026 (UTC)[reply]
It's probably changing == *External +Links *== to ==External links==. For that reason, AWB could probably do it. Steel1943 (talk) 04:54, 5 February 2026 (UTC)[reply]
@Steel1943 I thought someone said Given that AWB and other tools won't actually utilize this rule, do you think it's worth removing from the RegExTypoFix list? :) Vanderwaalforces (talk) 08:31, 5 February 2026 (UTC)[reply]
That AWB cannot work on section headers...? Vanderwaalforces (talk) 08:38, 5 February 2026 (UTC)[reply]
I'm pretty sure AWB currently fixes these two specific titles. I know for sure I've recently had a "See Also" change to "See also" and I'm pretty sure EL is also being fixed. Gonnym (talk) 16:33, 9 February 2026 (UTC)[reply]
@Gonnym, yeah. I’ve actually abandoned this task. Vanderwaalforces (talk) 18:14, 9 February 2026 (UTC)[reply]
For what it's worth, a more proper regex is insource:/== *External +Links *==/. (@Vanderwaalforces: Not sure if this was considered, but pinging you in case.) Steel1943 (talk) 04:49, 5 February 2026 (UTC)[reply]
Noted. Vanderwaalforces (talk) 08:38, 5 February 2026 (UTC)[reply]
I may try to pick up this task as a one-time WP:AWB run. I don't know how to create a bot, but I may be able to perform this request as is with the regex string I determined. Steel1943 (talk) 21:44, 11 February 2026 (UTC)[reply]
I was just using AWB, and it found only 33 such issues in the article space as referenced in this thread ... I'm not sure how 16,000 were found. Either way, I corrected the 33 that AWB found. Steel1943 (talk) 03:11, 12 February 2026 (UTC)[reply]
Putting == *External +Links *== into AWB and running AWBtypos does indeed fix this. See this edit.
FWIW, AWB found just over 5000 such articles, which is a lot, but could probably be chipped away at with AWB. Lee Vilenski (talkcontribs) 22:12, 13 February 2026 (UTC)[reply]
Honestly, I think the regex isn't right. Most of what I found didn't have this on even using the regex. insource:/== *See Also *==/ and insource:/== *External Links *==/ basically found less than 100 between them in article space. Lee Vilenski (talkcontribs) 23:17, 13 February 2026 (UTC)[reply]
(edit conflict) @Lee Vilenski: Odd since when I used AWB to look for that regex string, I received only 33 results (which I have since corrected). Is there some setting or something that needs to be changed to find these pages? Steel1943 (talk) 23:18, 13 February 2026 (UTC)[reply]
I fear we may have edit conflicted, see above. Lee Vilenski (talkcontribs) 23:19, 13 February 2026 (UTC)[reply]
I did find a few more just searching for "External Links", but that is mostly articles saying "look at the section below", which also needs fixing, but even that was only like 150 Lee Vilenski (talkcontribs) 23:20, 13 February 2026 (UTC)[reply]
Yeah, it's odd ... I received more results with "_" instead of "_+" between "External" and "Links". I guess Wikipedia's search function tends to time out quicker/frequently if a more greedy regex string is used, even if the greedy string technically has more applicable results. Steel1943 (talk) 00:58, 14 February 2026 (UTC)[reply]
I see 14,301 pages in the original search link that Qwerfjkl posted. Gonnym (talk) 10:47, 16 February 2026 (UTC)[reply]
Yeah, but how much of that is in article space? I found only a handful which were fixed. If we can find a list of articles, we can go from there. Lee Vilenski (talkcontribs) 12:48, 16 February 2026 (UTC)[reply]
@Lee Vilenski, this more optimised search gives 150 in mainspace. Qwerfjkltalk 19:51, 16 February 2026 (UTC)[reply]
Thanks for that regex. I'll work my way through them. Lee Vilenski (talkcontribs) 19:54, 16 February 2026 (UTC)[reply]
I think these ~1,000 non-talk or user pages (articles, drafts, portal, template, module, help, category, file) should all be fixed. The MoS applies the same as well. Gonnym (talk) 16:13, 17 February 2026 (UTC)[reply]
As much as 1,000 is a lot, I don't think we need a dedicated bot run to fix it. I've been doing some AWB runs recently and can add this regex to them. Lee Vilenski (talkcontribs) 16:33, 17 February 2026 (UTC)[reply]

Citation source replacement with {{Cite Köppen-Geiger cc 2007}}


Hundreds (thousands?) of mountain articles use as a reference a paper titled "Updated world map of the Köppen-Geiger climate classification" that was published in 2007 in Hydrology and Earth System Sciences, Volume 11, Issue 5. Typically these use {{Cite journal}} passing in the appropriate values. However, the values were not consistently applied and so we have generated references that do not provide all the necessary information each time it's used nor is the information as complete as it could be. Thus, I created the template to provide complete information about the source reference so that it's consistent across Wikipedia. I then searched for "Updated world map of the Köppen-Geiger" to find articles using this reference source and started replacing the citation source text with <ref name=KGcc2007>{{Cite Köppen-Geiger cc 2007}}</ref>. The search found other articles on rivers and human settlements also using this source reference. At this point, I have manually edited over 360 pages to make this change, the high majority being mountain articles but also a few articles about rivers and populated places. The search currently returns over 2800 results. So at this point I think it would be good if a bot could automate this edit to articles using this source reference. Typically in mountain articles, the citation is in a "Climate" section which usually begins with the sentence "Based on the [[Koppen climate classification]],". Other times it's in the lead section. Often the citation is using a named reference, typically "Peel" for the first author, although the paper does have three authors, which is why I chose to use KGcc2007 rather than Peel. Perhaps the reference could be named a bit different to denote it was a bot edit, e.g. <ref name=KGcc2007be>. I have been using "{{Cite Köppen-Geiger cc 2007}}" as the edit summary. RedWolf (talk) 21:00, 12 November 2025 (UTC)[reply]
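The substitution described above might be sketched as follows. This is a deliberately naive illustration, not a finished bot: it assumes no nested templates inside the citation, and it does not handle reuses such as <ref name=Peel />, which a real run would also need to rename or preserve.

```python
import re

# Swap {{Cite journal}} citations of the 2007 Köppen-Geiger paper for the
# dedicated template, keeping a named <ref> wrapper as the request suggests.
# Naive on purpose: assumes the citation contains no nested {{...}} templates.
PAPER_RE = re.compile(
    r'<ref[^>]*>\s*\{\{[Cc]ite journal[^{}]*Köppen-Geiger[^{}]*\}\}\s*</ref>')

def replace_citation(wikitext, ref_name='KGcc2007'):
    replacement = ('<ref name=%s>{{Cite Köppen-Geiger cc 2007}}</ref>'
                   % ref_name)
    return PAPER_RE.sub(replacement, wikitext)
```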

What is the need to mass replace all of these citations with this template? Tenshi! (Talk page) 13:52, 19 November 2025 (UTC)[reply]
pinging RedWolf —usernamekiran (talk) 13:40, 21 November 2025 (UTC)[reply]
In addition to what I already stated above, many of the raw source references are just using ISSN with a large range which if clicked would return a page with over 13,000 results (i.e. ISSN 1027-4606); I don't understand why editors gave a huge range. The template adds the DOI and BIBCODE as well as an archive link so there are direct links to the source paper. All uses of this source will have consistent and basically complete information. As well it provides all the other benefits with using a template (e.g. what links here for usage count). RedWolf (talk) 17:09, 21 November 2025 (UTC)[reply]
Using this template causes a no-target error (see Category:Harv and Sfn no-target errors) when used in conjunction with short-form references. Lake Nipisso as an example[2]. This could be avoided by whitelisting the template (which can be requested at Module talk:Footnotes), but that won't work as long as you're using #invoke for the cite (which short-form refs just don't support). -- LCU ActivelyDisinterested «@» °∆t° 20:26, 17 January 2026 (UTC)[reply]

Web scraping


Not sure a bot is best suited for this, but I'd like some form of web scraper that would crawl this list: https://www.elsevier.com/products/journals?query=&page=1&accessType=open-access&sortBy=relevance

And fetch the list of OA journals from Elsevier

  • AACE Clinical Case Reports
  • AACE Endocrinology and Diabetes
  • AI Open
  • AI Thermal Fluids

alongside a sample DOI from the journal, e.g.

  • AACE Clinical Case Reports (10.1016/j.aace.2024.11.008)
  • AACE Endocrinology and Diabetes (10.1016/j.aed.2025.12.012)
  • AI Open (10.1016/j.aiopen.2025.01.002)
  • AI Thermal Fluids (10.1016/j.aitf.2025.100024)

....


Headbomb {t · c · p · b} 18:09, 23 December 2025 (UTC)[reply]

This website loads its content on the client-side using JavaScript, which means scraping would require a browser automation tool such as Selenium, and would likely fail on many pages due to Cloudflare and other anti-bot protections. – DreamRimmer 15:07, 31 December 2025 (UTC)[reply]
@Headbomb,
Qwerfjkltalk 20:33, 31 December 2025 (UTC)[reply]
Thanks this will be very helpful. Headbomb {t · c · p · b} 05:00, 3 January 2026 (UTC)[reply]

@Qwerfjkl: could something similar be done with these: https://freejournals.org/current-member-journals/ ? Headbomb {t · c · p · b} 18:19, 12 February 2026 (UTC)[reply]

@Headbomb, not much to work off here; the site doesn't give the ISSNs. But with a naive lookup on Crossref:
No guarantee of correctness here. Qwerfjkltalk 21:47, 12 February 2026 (UTC)[reply]
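A naive Crossref lookup of the kind mentioned above could be sketched like this. The /works endpoint and its query.container-title filter are part of Crossref's public REST API; no request is made here, and the parsing runs on a canned response of the same shape (DOI value illustrative), so correctness of real results is not guaranteed, as noted above.

```python
from urllib.parse import quote

# Build a Crossref /works query for a journal title, asking for one result.
def crossref_query_url(journal_title, rows=1):
    return ('https://api.crossref.org/works?query.container-title=%s&rows=%d'
            % (quote(journal_title), rows))

# Pull one sample DOI out of a /works response, if any item came back.
def sample_doi(response):
    items = response.get('message', {}).get('items', [])
    return items[0].get('DOI') if items else None

# Canned response in the shape Crossref returns (value illustrative).
canned = {'message': {'items': [{'DOI': '10.1016/j.aiopen.2025.01.002'}]}}
```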
I was thinking you could load up each journal page, select a random article, and get the DOI root from there? Headbomb {t · c · p · b} 01:45, 13 February 2026 (UTC)[reply]
@Headbomb, that's not feasible here, all of the journals have different websites. Qwerfjkltalk 12:20, 13 February 2026 (UTC)[reply]
AH well, bummer. Headbomb {t · c · p · b} 12:24, 13 February 2026 (UTC)[reply]
Redirects related to those nominated at RfD

Per the initial discussion at Wikipedia talk:Redirects for discussion#Avoided double redirects of nominated redirects I believe there is consensus for an ongoing bot task that does the following:

  • Looks at each redirect nominated at RfD
  • Determines whether there are any other redirects, in any namespace, that meet one or more of the following criteria:
    • Are marked as an avoided-double redirect of a nominated redirect
    • Are redirects to the nominated redirect
    • Redirect to the same target as the nominated redirect and
      • Differ only in the presence or absence of diacritics, and/or
      • Differ only in case
  • If the bot finds any redirects that match and which are not currently nominated at RfD, then it should post a message in the discussion along the lines of:
    • Bot note: {{noredirect|Foo Smith}} (talk · links · history · stats) is an avoided double redirect of "Foo Jones"
    • Bot note: {{noredirect|Foo smith}} (talk · links · history · stats) is a redirect to the same target as "Foo Smith"
The bot should not take any actions other than leaving the note, the goal is simply to make human editors aware that these redirects exist.

I don't know how frequently the bot should run, but it should probably wait at least 15 minutes after a nomination before checking or editing so as not to get into edit conflicts or complications as discussions of multiple redirects are often nominated individually and then the discussions manually combined. Thryduulf (talk) 13:11, 17 June 2025 (UTC)[reply]

There is a strong consensus; if there are no objections in the next day or so, I'll file a BRFA. In the meantime I'll code up the bot. GalStar (talk) 17:56, 17 June 2025 (UTC)[reply]
I've just thought of a third case to check for: differences only in hyphenation/dashes. Thryduulf (talk) 21:38, 17 June 2025 (UTC)[reply]
Actually that's generalisable to differences only in punctuation. Thryduulf (talk) 03:40, 18 June 2025 (UTC)[reply]
@GalStar is there any update on this? Thryduulf (talk) 20:01, 25 June 2025 (UTC)[reply]
I'm still working on it. I'm still getting some of the underlying tooling working, but I should be done soon. GalStar (talk) 16:40, 26 June 2025 (UTC)[reply]
Thank you. Thryduulf (talk) 16:50, 26 June 2025 (UTC)[reply]
If anyone is wondering, I'm currently porting my code to toolforge, so it can run continuously, and without the unreliability of my home network. This is taking longer than I expected however. GalStar (talk) 17:17, 26 June 2025 (UTC)[reply]
BRFA filed GalStar (talk) (contribs) 20:56, 2 July 2025 (UTC)[reply]

Restored from Archive 87. The BRFA mentioned above (Wikipedia:Bots/Requests for approval/GraphBot 2) was abandoned before a working bot was written so the task remains outstanding. Wikipedia talk:Redirects for discussion#Avoided double redirects of nominated redirects received no objections to my reinstating this request. Thryduulf (talk) 20:05, 31 December 2025 (UTC)[reply]

@Thryduulf Is there a current discussion that falls into this category that I can test on? Vanderwaalforces (talk) 17:31, 5 February 2026 (UTC)[reply]
I don't know of any off the top of my head. I can't think of an easy way to find any other than by doing what this requests asks a bot to do (i.e. look through all nominated redirects and check for similar ones). Thryduulf (talk) 19:46, 5 February 2026 (UTC)[reply]
@Thryduulf You're right, I just put some logic together and tried it; I couldn't find any either. Maybe there currently isn't any, but I have coded this task btw; maybe I should file a BRFA? Vanderwaalforces (talk) 19:59, 5 February 2026 (UTC)[reply]
@Thryduulf Take a look at my test at testwiki. I intentionally created a redirect drama. What do you think? Vanderwaalforces (talk) 09:42, 9 February 2026 (UTC)[reply]
BRFA filed. Vanderwaalforces (talk) 12:58, 9 February 2026 (UTC)[reply]
I'm not going to get a chance to look until this evening, sorry. Thryduulf (talk) 13:36, 9 February 2026 (UTC)[reply]

The BRFA has been withdrawn by the operator (Vanderwaalforces) if any other operator wants to take on the task. Thryduulf (talk) 15:51, 24 February 2026 (UTC)[reply]

Update broken Svenska Dagbladet archive URLs


Svenska Dagbladet has recently changed the URL structure of its historical newspaper archive, causing many archive links on Wikipedia to break.

Old format (broken):

  • https://www.svd.se/arkiv/YYYY-MM-DD/PAGE/SVD - for example: https://www.svd.se/arkiv/1960-08-17/9/SVD

New format:

  • https://www.svd.se/arkiv/YYYY-MM-DD/SVNY/PAGE - for example: https://www.svd.se/arkiv/1960-08-17/SVNY/9

I request a bot run to automatically update all affected links in articles to the new format so that the archive references become functional again. Saftgurka (talk) 15:17, 8 January 2026 (UTC)[reply]
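The rewrite described above amounts to moving the page number behind the new edition segment. A sketch of the substitution: "SVNY" is taken from the single example given and may differ for other editions, so a real run should verify each rewritten link resolves.

```python
import re

# Rewrite https://www.svd.se/arkiv/YYYY-MM-DD/PAGE/SVD
# to      https://www.svd.se/arkiv/YYYY-MM-DD/SVNY/PAGE
OLD_SVD = re.compile(
    r'(https?://www\.svd\.se/arkiv/\d{4}-\d{2}-\d{2})/(\d+)/SVD\b')

def fix_svd_url(text):
    return OLD_SVD.sub(r'\1/SVNY/\2', text)
```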

WP:URLREQ can help with this. – DreamRimmer 15:33, 8 January 2026 (UTC)[reply]
Thanks. I'll try asking there. Saftgurka (talk) 08:22, 9 January 2026 (UTC)[reply]

List of Wikipedians by non-automated/manual edits

List of Wikipedians by non-automated edits
There are various lists of Wikipedians by edit count and related stats. One that I'd be interested to see would be Wikipedia:List of Wikipedians by non-automated edit count. I think the presence of such a list might be a small help re Editcountitis, as it'd recognize/incentivize editors whose edit count has not been juiced through tons of (typically low-value) automated edits.

Would anyone be interested in coding a bot to populate and maintain such a page? Courtesy pinging Legoktm and 0xDeadbeef, who run the bot that updates the overall edit count list, in case either of you might be up for it. {{u|Sdkb}}talk 20:48, 19 July 2023 (UTC)[reply]

How is non-automated edit vs automated edit defined? Legoktm (talk) 22:40, 19 July 2023 (UTC)[reply]
This thing knows. Folly Mox (talk) 22:54, 19 July 2023 (UTC)[reply]
Interesting, it's controlled by MediaWiki:XTools-AutoEdits.json. Legoktm (talk) 23:07, 19 July 2023 (UTC)[reply]

Sounds like we need three lists: total, manual and (semi-)automated. Or possibly two: manual and (semi-)automated. The negative/subtraction name "by non-automated edit count" rather should be a positive name "by manual edit count". I'm not convinced automated edits are typically low-value, anyway, manual edits often have the same characteristics. Plus it's so hard to tell since automation can take many forms that are impossible to track. -- GreenC 03:58, 20 July 2023 (UTC)[reply]

XTools doesn't distinguish between semi-automated and automated edits, so if we're using their definitions, we won't either, at least to start. I'm guessing that there may be some blurriness that would make such a distinction tricky.
"manual edit count" sounds fine for the page name.
Regarding your other points, we're never going to come up with a perfect measure of editor contributions. This list will need a caveat lector just like the other — the goal is just to provide another angle that perhaps incrementally reduces the incentive to game the system. {{u|Sdkb}}talk 05:03, 20 July 2023 (UTC)[reply]
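The split the thread converges on (classify an edit as automated if its summary matches a tool pattern, count the rest as manual) could be sketched like this. The patterns below are illustrative stand-ins only; a real bot would load the rule list from MediaWiki:XTools-AutoEdits.json rather than hard-coding it.

```python
import re

# Illustrative tool-detection patterns; the real list lives in
# MediaWiki:XTools-AutoEdits.json and is much longer.
AUTO_EDIT_PATTERNS = [re.compile(p) for p in (
    r'\bAWB\b',      # AutoWikiBrowser
    r'\bJWB\b',      # JavaScript Wiki Browser
    r'\bHuggle\b',
)]

def is_automated(summary):
    return any(p.search(summary) for p in AUTO_EDIT_PATTERNS)

def manual_edit_count(summaries):
    return sum(1 for s in summaries if not is_automated(s))
```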

The above request, transcluded from WP:Bot requests/Archive 85#List of Wikipedians by non-automated edits, expired after a day in July 2023. Sdkb has agreed that it would be a good idea to request this again. List of Wikipedians by manual edits could be an alternative title and is the one currently linked to in TM:Wikipediholism, which is how I discovered this potentially interesting statistical compilation. Cheers, HKLionel TALK 12:58, 9 January 2026 (UTC)[reply]

Website to article mass redirect creation


Many articles on Wikipedia have a Website listed in their infobox and/or as an external link. I was thinking it would be good to have redirects from websites which do not have their own pages to articles on subjects that are hosted on said websites. Many popular articles already have such redirects, such as google.com -> Google Search, however there are a plethora more that do not. Due to the massive scope of this, would it at least be possible to:

  • Go through all articles with a value for Website in its infobox
    • There are different types of infoboxes which all have Website fields; I don't know if that would be a problem or not...
  • Obtain the domain/URL from the listed Website, removing the www and HTTP protocol (e.g. https://www.example.com would become example.com), stopping if it is invalid
    • Some infoboxes link the Website (e.g. Example Official Website), so it would have to get the URL rather than the text somehow (Sorry that I don't know how this works )
  • Create a redirect from that domain to the article (if a page named the domain doesn't already exist) with these redirect categories:
    {{Redirect category shell|
    {{R from domain name}}
    {{R unprintworthy}}
    }}
    
    • For pages with paths (e.g. example.com/page; or would the inclusion of these be excessive?), {{R from URL}} would be used in place of {{R from domain name}}

I initially attempted to do this manually, but quickly realized the sheer scale at which this would need to be done. I don't know if this is a task better suited for AutoWikiBrowser, userscripts, related WikiProjects, or manual editing (though I risk catching editcountitis)... but I think it would be helpful to have these redirects, as it seems searching for some websites on Wikipedia will not even show their relevant articles in the search results. I don't have a readymade list, but I have a feeling this would involve a lot of pages. I'll attempt to compile a list of these pages if needed. Does anyone know how to best do this? Thanks! Some helpful person (talk) 20:54, 11 January 2026 (UTC)[reply]
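The domain-extraction step in the list above could be sketched as follows: strip the protocol and a leading "www.", and bail out on anything that is not an http(s) URL. A sketch only; resolving linked Website fields to their underlying URL is a separate problem.

```python
from urllib.parse import urlparse

def domain_for_redirect(website):
    parsed = urlparse(website.strip())
    if parsed.scheme not in ('http', 'https') or not parsed.netloc:
        return None  # invalid or non-web value in the infobox field
    host = parsed.netloc.lower()
    if host.startswith('www.'):
        host = host[len('www.'):]
    return host or None
```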

Seems like something that should be raised at WP:VPR first, to see if the community in general thinks that having a redirect for every domain name is something we even want. Anomie 23:24, 11 January 2026 (UTC)[reply]
Yeah... I don't see this happening, but I could be wrong; demonstrating consensus is a must for this sort of ask/task. Primefac (talk) 00:10, 12 January 2026 (UTC)[reply]
I could see this being done for current domains redirects, but i doubt the mass creation of domain redirects would be supported, unless strongly curtailed by something like 'top 100 most popular websites' or something. Headbomb {t · c · p · b} 00:19, 12 January 2026 (UTC)[reply]
Ah, okay. Didn't realize this would require consensus. I only skimmed through the policies on mass editing, but the mass page creation section of the editing policy gave me the impression that wasn't required for redirects... though I suppose creating redirects for every single website is a little different than creating redirects for certain alternative names. I just thought it made sense to be able to search for a website domain and immediately find the relevant article. I guess I'll stick to adding them on a case-by-case basis for now, as I don't know if I could convincingly pitch it to others in the face of more important things. Some helpful person (talk) 00:38, 12 January 2026 (UTC)[reply]
Well, redirects aren't covered by MASSCREATION specifically, but they are still subject to BOTPOL overall, which does require consensus for any bot task. Headbomb {t · c · p · b} 00:43, 12 January 2026 (UTC)[reply]
Yeah, I probably should have read the policies a little closer before proposing this... but I don't suppose this means I could sit down and create all of these redirects manually either? Some helpful person (talk) 00:56, 12 January 2026 (UTC)[reply]
Some helpful person, that would be a bad idea. — Qwerfjkltalk 16:06, 12 January 2026 (UTC)[reply]
Okay then. I guess I got a little carried away... Some helpful person (talk) 04:13, 13 January 2026 (UTC)[reply]
Nothing wrong with that, it's good to see folk interested in Wikipedia, and new ideas are always welcome (even if they aren't implemented). Primefac (talk) 11:48, 17 January 2026 (UTC)[reply]
This is a useful idea. For example, many citations have |website=google.com but the recommendation and best practice is to use the actual name Google. Currently there is no database the matches "domain = full name". This could be useful for tool makers that carefully use the data. -- GreenC 18:18, 9 February 2026 (UTC)[reply]
User:Some helpful person: if you want a list of the largest domains I can provide that ie. nytimes.com has 370,000 URLs etc.. sorted from largest to smallest. Then choose the top 1,000 for this project. You could probably cover a large percentage of all links with just 1,000 redirects - which I doubt anyone would notice or even care much. -- GreenC 18:23, 9 February 2026 (UTC)[reply]
Oh wow, I didn't even consider that use case. This sounds like a good approach, as it's not every website ever but the ones most referenced on Wikipedia. That list would be super helpful as well if it isn't too much trouble. I assume the redirects must be created manually? Thanks in advance! Some helpful person (talk) 20:24, 9 February 2026 (UTC)[reply]
Removing links to copyvio site

worldradiohistory.com
Originally requested by LaundryPizza03 at WP:AWBREQ, but Sophisticatedevening suggested it would be easier for a bot. There are 28,012 links to this site in sources in mainspace [3]. This also turns up 1,963 more. I'm not fully sure about the feasibility of this with a bot as I don't know much on the technical side, but I thought it was worth forwarding this here if a bot could knock out at least some of these. In line with the previous chakoteya cite removal, if this is the only citation it should be removed and replaced with a {{cn}}. Courtesy ping to LuniZunie and also see m:Talk:Spam blacklist#worldradiohistory.com. HurricaneZetaC 01:03, 13 January 2026 (UTC)[reply]

Given I am an editor in the topic fields in question (WP:TVS and WP:WPRS) I can provide some assistance.
The vast majority of these links will inevitably be going to Broadcasting magazine (which underwent multiple name changes over the years and is now the website Broadcasting & Cable) or to the Broadcasting yearbooks. Sammi Brie has been utilizing a ProQuest media database she has access to (but regrettably is not in TWL, else I would be using the database myself) to insert replacement tags in the id= fields. My thought is such a bot command would have to be staged and deal with those instances right off the bat. Also of note @HurricaneZeta there's also americanradiohistory.com here and here and, to a much lesser extent, davidgleason.com here and here. Nathan Obral • he/him/🦝 • tc12:47, 14 January 2026 (UTC)[reply]
One thing I would strongly suggest (and I know Sammi Brie would agree) is that the refs should not be junked altogether but to simply have the URLs stripped. The refs can be converted into ones that either have a ProQuest ID tag or are offline. Very rarely are they just bare urls. I do not know if that makes the bot command more complex or simple. Nathan Obral • he/him/🦝 • tc13:10, 14 January 2026 (UTC)[reply]
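The "strip the URLs, keep the refs" approach suggested above could be sketched like this. A hypothetical helper, naive on purpose: it assumes each parameter value runs to the next pipe or closing brace, and a real bot would need to handle archive-url/url-status pairs and templated values more carefully.

```python
import re

# Remove only |url= (and |archive-url=) parameters pointing at
# worldradiohistory.com, leaving the rest of the citation intact.
WRH_PARAM = re.compile(
    r'\|\s*(?:archive-)?url\s*=\s*'
    r'https?://(?:www\.)?worldradiohistory\.com[^|}]*')

def strip_wrh_urls(cite):
    return WRH_PARAM.sub('', cite)
```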
Hi, I will note that...
  • There are some cases in which the publication is out of copyright, e.g. early years of Television Digest pre-1963 without a copyright notice, or where the author/publisher has contributed the work to WRH, e.g. Duncan's American Radio. These are probably legitimate.
  • Several years ago, in my older GAs and FAs, I overbuilt ProQuest IDs onto WRH references to Broadcasting except in a handful of cases. (I recall some issues in the 1970s that ProQuest did not have.) This was done very deliberately to allow the WRH URLs to be removed at a later time.
    • Unlike Chakoteya, the problem is not reliability of the publications (headlined by a series of broadcasting and music industry trades — Broadcasting (& Cable), Radio & Records, etc.) but their availability online.
  • Some of these sources are unfortunately not available via other means. And even ProQuest poses a problem for all editors I know of but two. I strongly encourage getting TWL to subscribe to the ProQuest collection Entertainment Industry Magazine Archive. It is not in TWL's ProQuest offerings at present (I have different, institutional access), and obtaining it would help me point editors to legitimate ways to link to Broadcasting and, in some cases (their year range is unfortunately fairly limited), Radio & Records. As a result, there are only two people I know of who can access the ProQuest IDs included in many of my articles (the other is Flip Format), and probably hundreds of ProQuest IDs that most Wikipedia editors cannot even open.
What I don't want is for innocent editors—who right now have no other way of accessing many of these sources—and their pages to be caught up in this by removing references wholesale from articles. I'd much prefer something like Special:Diff/1158839954 where the references remain intact without WRH links. Sammi Brie (she/her · t · c) 17:14, 14 January 2026 (UTC)[reply]
To second Sammi Brie, I object to having a bot simply purge any and all refs tied to WRH outright. That type of request is incredibly punitive and potentially unnecessary. Broadcasting magazine itself is not a copyvio (and is actually a rather important trade magazine that covered much of the history of radio and television), but the links to it hosted by WRH are a gray area. A case can be made for removing the URLs on all Broadcasting refs linked to WRH (and also those of Radio & Records and Billboard) and converting the refs to {{Cite magazine}} format, not dissimilar to what Mikeblas was doing in the above example. At the time, I objected to Mike's rationale, but over two years later I can understand the nuance of it now; consider this also a sincere apology to Mike on my end.
Given how different my request is from the OP, I would be willing to make a bot request of my own if that would be appropriate. Nathan Obral • he/him/🦝 • tc19:38, 14 January 2026 (UTC)[reply]
Thank you for pinging me, and thank you for your apology.
I'd love to see a bot take care of these links once and for all. Removing the URLs while leaving behind the rest of the "offline" reference information is completely feasible.
I'm made uncomfortable by some of the generalizations here, though. There are lots of different magazines scanned at these sites, and I don't think there's any majority -- particularly not a "vast" one.
I see there's talk about blacklisting the site at m:Talk:Spam_blacklist#worldradiohistory.com. Is there a reason that it shouldn't be blacklisted at Wikipedia:Spam blacklist instead (or as well)?
Finally, the claims that certain copies of material are or are not copyvio have to be made very carefully and usually can't be easily done. Linking to copyvio material makes the Wikipedia project vulnerable, and linking to external material needs to be done very carefully. (This is all made clear in Wikipedia policy.) -- mikeblas (talk) 01:09, 15 January 2026 (UTC)[reply]
If it is listed at the meta Spam blacklist, it would apply to every Wikimedia wiki, making listing at the local page redundant. Tenshi! (Talk page) 01:19, 15 January 2026 (UTC)[reply]
@Mikeblas: did you create a bot at all for those tasks? I'm trying to remember whether you had. I'm sorry, brain fail on my end... it has been over two years and I thought you had constructed something. If so, I would like to talk with you, because I have been mulling multiple proposals on maintenance-related things. Nathan Obral • he/him/🦝 • tc02:19, 15 January 2026 (UTC)[reply]
I don't have a Wikipedia bot. But I do have some C# code that parses things up and removes the links. It converts {{cite web}} to {{cite magazine}}, then renames the parameters that have to be fixed up. -- mikeblas (talk) 02:57, 15 January 2026 (UTC)[reply]
Ahh, I gotcha; that makes sense. I based my proposal below on what you had done prior. :) One of these days I need to write some C# code on my end, if only to make some tasks easier. Nathan Obral • he/him/🦝 • tc04:03, 15 January 2026 (UTC)[reply]
One reason I link to worldradiohistory.com is because sometimes they're the only source of information for chart positions, specifically those of Radio & Records and RPM. Our previous source for RPM positions was flaky and incomplete, meaning about half of 1989 and a few positions from 1993 were missing entirely. I feel like wholesale removing links to worldradiohistory would do more harm than good. If there are concerns about copyvio, then they should be handled case by case. Ten Pound Hammer(What did I screw up now?) 17:19, 19 January 2026 (UTC)[reply]
I'm confused by your comment. Radio & Records and RPM are sources, but worldradiohistory is not. -- mikeblas (talk) 23:48, 20 January 2026 (UTC)[reply]
What I mean is that in at least a few cases, worldradiohistory is the only place I can find that information in the first place. The Radio & Records Canada Country charts used to be on WP:BADCHARTS due to a lack of an accessible archive, but the existence of worldradiohistory was enough to get them taken off WP:BADCHARTS. How else can the positions be cited if not through worldradiohistory? Ten Pound Hammer(What did I screw up now?) 01:27, 21 January 2026 (UTC)[reply]
The positions can be cited using the source publication itself. -- mikeblas (talk) 17:53, 24 January 2026 (UTC)[reply]
I think the point here is that you can use a pirate site to access a citation you couldn't obtain otherwise, but you shouldn't link to it in the citation. We don't link to Sci-Hub in references to scientific papers. –LaundryPizza03 (d) 20:55, 24 January 2026 (UTC)[reply]

Broadcasting magazine citation cleanup request


This is a counterproposal to the above bot request, and one that is likely more judicious and practical. It would also be the first of several that are based on Special:Diff/1158839954:

  • All refs and citations to Broadcasting magazine in its various iterations (including Broadcasting-Telecasting and Broadcasting & Cable) that have URLs pointing to worldradiohistory.com, americanradiohistory.com, or davidgleason.com have those URLs removed. This should not be complicated, as the base URL is worldradiohistory.com/Archive-BC/* or americanradiohistory.com/Archive-BC/*
  • If the citation template is cite web or cite news, convert it to cite magazine and rename the website= or work= field to magazine=.
  • If there is no id = field or if that field is empty, add the following invisible comment: |id = <!-- needs ProQuest ID tag -->. This can be a good way to track what refs need to be rebuilt and tended to.
  • I'm not sure if this part is possible, but can the refs, if they do not have a name, be given one based on the date field? If the source date is, for example, "July 1, 1932" or "1 July 1932", then tag the ref as <ref name="BC19320701">?

If this is doable and functional, I would like to extend this to other publications used as citations that are currently linked to worldradiohistory. Broadcasting is cited the most from said website so this would be the largest such task, by far. Tagging Sammi Brie and Mikeblas. Courtesy ping to LunieZunie and HurricaneZeta. Nathan Obral • he/him/🦝 • tc03:20, 15 January 2026 (UTC)[reply]
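For what it's worth, the per-citation transformation described above can be sketched like so (a rough illustration only, in the spirit of Mikeblas's C# parser; the function names and regexes are mine and simplified, and a real bot would use a proper template parser):

```javascript
// Illustrative sketch of the proposed cleanup, applied to one citation
// template at a time. Regexes are simplified placeholders.
function cleanBroadcastingRef(wikitext) {
    // 1. Strip |url= values pointing at the WRH/ARH/davidgleason hosts.
    wikitext = wikitext.replace(
        /\|\s*url\s*=\s*https?:\/\/(?:www\.)?(?:worldradiohistory|americanradiohistory|davidgleason)\.com\/[^|}\s]*/g,
        "");
    // 2. Convert cite web / cite news to cite magazine; website=/work= to magazine=.
    wikitext = wikitext.replace(/\{\{\s*[Cc]ite (?:web|news)/g, "{{Cite magazine");
    wikitext = wikitext.replace(/\|\s*(?:website|work)\s*=/g, "|magazine=");
    // 3. If there is no non-empty |id= field, add the tracking comment.
    if (!/\|\s*id\s*=\s*[^|}\s]/.test(wikitext)) {
        wikitext = wikitext.replace(/\}\}\s*$/, "|id=<!-- needs ProQuest ID tag -->}}");
    }
    return wikitext;
}

// 4. Derive a ref name like BC19320701 from "July 1, 1932" or "1 July 1932".
function refNameFromDate(dateStr) {
    var d = new Date(dateStr);
    if (isNaN(d)) return null;
    var pad = function (n) { return String(n).padStart(2, "0"); };
    return "BC" + d.getFullYear() + pad(d.getMonth() + 1) + pad(d.getDate());
}
```

For example, refNameFromDate("July 1, 1932") yields "BC19320701", matching the proposed <ref name="BC19320701"> scheme.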

This is fine and better than removing it outright since it retains the citation itself - the objective is just to remove the link. HurricaneZetaC 03:22, 15 January 2026 (UTC)[reply]
Removing the link is a big no-no. That's how you find it on webarchive, etc. The solution, if a new main site isn't available, is to have |archive-url= added to these templates. I also fail to see how these violate COPYVIO. If they did, these magazines would have long since issued DMCA takedowns of WRH, and the site wouldn't be winning awards for digital preservation. Headbomb {t · c · p · b} 20:33, 17 January 2026 (UTC)[reply]
I know bots can find archive links - can one be programmed to do that? As for the copyright infringement, the notice linked in the original local spam blacklist request was https://www.worldradiohistory.com/Radio_and_Hobbies.htm. HurricaneZetaC 20:42, 17 January 2026 (UTC)[reply]
Yes, and for those, you get pointed to https://www.worldradiohistory.com/Radio_and_Hobbies.htm, which has been subject to a DMCA takedown. Headbomb {t · c · p · b} 20:48, 17 January 2026 (UTC)[reply]
Fair point, but just from searching I can find a few scans of magazines that are pretty clearly still in copyright and seem to have been reproduced without permission. While some of them are out of copyright, one scan I found included the barcode printed on the front and was from 2000, with a copyright notice. HurricaneZetaC 21:01, 17 January 2026 (UTC)[reply]
Fair use allows archival copies to be made and distributed. We should not presume copyright violation when none has been shown, especially from an award-winning archival site that has a proven history of respecting DMCA takedown requests. Headbomb {t · c · p · b} 03:27, 21 January 2026 (UTC)[reply]
@Headbomb Fair use does not cover scanning the entirety of a magazine and uploading PDFs for all to see. Internet Archive learned this lesson to the tune of millions of dollars when it lent out unlimited ebooks per copy, and it is why Anthropic didn't pay for scans of books it destroyed. The large majority of links to WRH are for scans of copyrighted magazines, in clear violation of WP:COPYLINK. Mach61 07:03, 22 January 2026 (UTC)[reply]
I would like to circle back on this request, if possible. And to clarify, this request would be the first of multiple bot requests to rectify the issue with WorldRadioHistory.com and to untangle tens of thousands of savable citations. Other bot requests would clean up and sanitize citations from Billboard and Radio and Records, followed by all Broadcasting Yearbook citations and, finally, all citations generated from search inquiries (which I did not include here only because the URL strings are very different). From there, the remainder of specialty magazines and trades linked to WRH can be appraised.
I do want to stress that having the invisible comment in the id field can go a long way to essentially repairing tens of thousands of citations and resolving substantial technical debt. Nathan Obral • he/him/🦝 • tc20:47, 4 February 2026 (UTC)[reply]

I think that cleanup bots should replace links that haven't been edited once the target page is moved somewhere else shane (talk to me if you want!) 23:30, 18 January 2026 (UTC)[reply]

Probably a bad idea for a bot task. See WP:NOTBROKEN. Anomie 00:01, 19 January 2026 (UTC)[reply]
 Not done for the template at the top so this will archive. (do you need to be a botop to mark it?) HurricaneZetaC 20:20, 1 February 2026 (UTC)[reply]

Unnecessary disambiguations


Is there a way a bot could find instances of unnecessary disambiguations? Specifically, instances of articles named "Title (parenthetical)" where there isn't currently an article at just "Title". (In other words, something like Floofy (band) existing where Floofy is still a redlink.)

I ask this because sometimes I see people stick (film) or (band) at the end of article names unnecessarily, or sometimes the non-parenthetical gets deleted via AFD or PROD and the parenthetical version is never moved to reclaim the title. Ten Pound Hammer(What did I screw up now?) 17:15, 19 January 2026 (UTC)[reply]
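For what it's worth, the detection half of this is a simple pattern match (a hypothetical sketch; a real report would also query the MediaWiki API to check whether the base title actually exists):

```javascript
// Return the base title if the page title ends in a parenthetical
// disambiguator, else null. A bot would then check whether the base
// title is a redlink (e.g. via the MediaWiki API) before reporting it.
function baseTitle(title) {
    var m = title.match(/^(.+?) \(([^()]+)\)$/);
    return m ? m[1] : null;
}
```

baseTitle("Floofy (band)") returns "Floofy", but note it also "matches" genuine names like Barugh (Great and Little), which is why the raw results need human review.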

I’ve also seen several instances of this during NPP, I will try coming up with something. Vanderwaalforces (talk) 17:42, 20 January 2026 (UTC)[reply]
I did a quick database query and found a total of 37,887 of these, which is far too large to make any kind of useful report. * Pppery * it has begun... 17:50, 20 January 2026 (UTC)[reply]
Can these be moved by a bot? If better information is needed before a bot run, then maybe sort by disambiguation and move the ones we know for sure can be moved (pages with (film), (country film), etc.). Gonnym (talk) 18:06, 20 January 2026 (UTC)[reply]
Yeah, that might narrow it down. Start with ones that are "Name (film)" in cases where "Name" doesn't exist, then maybe the same with (band), as those are the two I see most often. Ten Pound Hammer(What did I screw up now?) 18:20, 20 January 2026 (UTC)[reply]
All 37,000 can't be moved by a bot because sometimes the actual name of a proper noun includes parentheses, like Barugh (Great and Little) (okay, Barugh technically exists, but I'm not convinced there aren't any like that where the base name is red). Specific parenthetical disambiguators can probably be botted; see Wikipedia:Database reports/Specific unnecessary disambiguations. * Pppery * it has begun... 18:27, 20 January 2026 (UTC)[reply]
TenPoundHammer, from my work on Wikipedia:Missing redirects project I obtained User:Qwerfjkl/sandbox/55, which BD2412 organized. — Qwerfjkltalk 20:57, 20 January 2026 (UTC)[reply]
Well, this is quite it then. Vanderwaalforces (talk) 21:01, 20 January 2026 (UTC)[reply]
@Qwerfjkl: I have been meaning to ask your permission to subdivide that page, as it is of rather unwieldy length. Cheers! BD2412 T 21:49, 20 January 2026 (UTC)[reply]
BD2412, by all means. — Qwerfjkltalk 22:28, 20 January 2026 (UTC)[reply]
@Qwerfjkl: sweet, that's a big help Ten Pound Hammer(What did I screw up now?) 00:46, 21 January 2026 (UTC)[reply]

Automatically add Template:AI-retrieved source


Since we have Template:AI-retrieved source, it would be nice if a bot could add this template to refs based on whether they have the utm_source parameter set to a LLM value. (See User:Headbomb/unreliable for a list of these utm_source values).

Sources that were manually verified by someone can simply be marked as "good" by removing the utm_source parameter. Laura240406 (talk) 21:56, 19 January 2026 (UTC)[reply]
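A minimal sketch of the check such a bot would run per reference URL (the utm_source values below are placeholders of my own; the real list is at User:Headbomb/unreliable):

```javascript
// Hypothetical list of LLM utm_source values -- see User:Headbomb/unreliable
// for the actual ones.
var LLM_SOURCES = ["chatgpt.com", "openai", "perplexity"];

// True if the URL carries a utm_source identifying an LLM referral, in
// which case the bot would tag the ref with {{AI-retrieved source}}.
function isLlmRetrieved(urlString) {
    var source;
    try {
        source = new URL(urlString).searchParams.get("utm_source");
    } catch (e) {
        return false; // not a parseable absolute URL
    }
    return source !== null && LLM_SOURCES.includes(source.toLowerCase());
}
```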

I would be interested in developing this (n.b. I see that {{AI-retrieved source}} suggests either adding a |checked= parameter or commenting out the template rather than modifying the source URL). — chrs || talk 02:46, 20 January 2026 (UTC)[reply]

For those players that appear in lists such as List of men's footballers with 1,000 or more official appearances, List of men's footballers with 100 or more international caps, List of women's footballers with 100 or more international caps, List of footballers with 500 or more goals or List of women footballers with 300 or more goals, create a link that would lead to that respective page (or its list subsection) directly from the relevant number in the infobox, as is already the case in the Cristiano Ronaldo article when it comes to the international caps statistic, for example.

For lists such as List of men's footballers with 50 or more international goals, List of women's footballers with 100 or more international goals, List of footballers with 500 or more goals or List of women footballers with 300 or more goals, the same may be done, with the caveat that many of their statistics already link to player-specific lists (like in the case of the Barbra Banda article, for example), which is of course preferable and should not be changed.

I have not managed to find examples of such lists in other sports, otherwise they could be included as well.

Thank you a lot for consideration! BasicWriting (talk) 01:18, 25 January 2026 (UTC)[reply]

@BasicWriting To be honest, your request isn't clear, at least to me. Mind explaining better? Vanderwaalforces (talk) 14:50, 29 January 2026 (UTC)[reply]
I'm not sure how better to explain this. This is a further example of an infobox that does have a link and this of one that doesn't. You may also see some of my mechanical contributions to this end, like the last one here, before I realized the task was too broad. Thanks a lot! BasicWriting (talk) 15:37, 30 January 2026 (UTC)[reply]
@BasicWriting So, I took my time to look at this, and I think it is more technical than it seems; we do not have clear if–then logic yet. I wanted to ask "Which infobox fields are in scope?" but then I realised that the field is not consistent. For example, in Ivan Perišić, the international caps field that needs to be linked is | nationalcaps4 =, but in another article, like Barbra Banda, the field is | nationalcaps2 =, and for Robert Lewandowski, it is | nationalcaps3 =; in all, that is pretty inconsistent if you ask me. The inconsistency itself is not a problem, but it makes the field hard for a bot to work with efficiently and correctly. Although, I can see that the equivalent | nationalteam# = for these articles usually mentions only the "country", which is something we could use to streamline.
Can you provide a fixed whitelist of list pages the bot may link to? Must the player be explicitly present on the list page? Is linking to the list page sufficient, or must it be player-anchored?
The request is conceptually valid, but currently underspecified. Vanderwaalforces (talk) 16:09, 2 February 2026 (UTC)[reply]
Thank you for looking into this! The way this particular issue could be streamlined is that, if we take the article about international caps as an example, the international caps that ought, according to my proposal, to link to that respective list are by definition all the international caps above 100. So the bot could consider the number itself and link any number of 100 and above to said list. If only the player articles that the list links to are considered, then those players will very likely be those explicitly present on the list page (and the same goes for other lists), so the whitelist could simply be all the articles linked to by the nine lists I have listed. However, the important caveat is that, again, some of those statistics do already have links to player-specific lists, as in my List of international goals scored by Barbra Banda case, so I'd envision the bot checking for those instances too and not changing them. As for anchoring, the infobox links that are present are not anchored so far, but perhaps they could be, to make it easier for viewers.
The only issue with this strategy would be those players who have completed the milestone playing for several teams. But this seems not to be an issue with, say, those players who competed for both West Germany and later Germany (like Lothar Matthäus), as their infobox statistics are taken together, and it is probably the case in other instances too (say, the goals in the infobox are listed separately for various teams, but the infobox does provide a total sum).
These are my thoughts, but obviously I am not as gifted in coding as to foresee all possible issues with taking this route. BasicWriting (talk) 19:06, 2 February 2026 (UTC)[reply]
@BasicWriting I am still exploring other ways to correctly identify which | nationalteam# = we can correctly look at. Is it true that most of these Country names are almost always linked to “COUNTRY national football team”? Vanderwaalforces (talk) 19:50, 2 February 2026 (UTC)[reply]
I would think so (in case of the international lists, anyway). BasicWriting (talk) 22:33, 2 February 2026 (UTC)[reply]
@BasicWriting For this list and this one too, take anybody listed there as an example, what value in the infobox should be linked to these lists? Please do the same for the last 4 lists you mentioned. Vanderwaalforces (talk) 22:54, 3 February 2026 (UTC)[reply]
Of the lists you've mentioned: for the first one, | totalcaps =; for the latter, | totalgoals =, at least that is my base understanding. In the latter case, the link would look better if it did not take in the parentheses the goals are shown in, but I think that does indeed happen automatically already. Not every player has the sum listed in the infobox, but that should not be an issue, I imagine. (Or we might write a different bot that first sums the infobox statistics, but that is a different task altogether.)
For the list you've mentioned earlier, it would indeed show a country name after | nationalteam# =, but some of those might be names of non-existing countries. It might be practical to make it navigate to the largest number among the national teams.
In case of women footballers, much of the same holds with the further caveat that their infobox statistics seem to be oftentimes lacking, so the bot won't link many instances.
In any case, I would, as a precaution, make the bot first look at the number it is linking, and check whether it is truly equal or greater than 1000, 500, 300, 100 etc. in those respective situations. And, again, not apply any of the above in the case there already is a link present in the infobox field, which is going to be the case mostly with | nationalgoals# =.
In some ways, I have changed my mind and I don't think it would be a good idea to work on the club statistics at all and just keep our focus on the international statistics. The reason for that is that the club statistics do indeed follow various methodologies and the numbers seem way more inconsistent within the lists and the infoboxes. Possibly, they could know more about that in the respective portal.
Secondly, and I am even less certain of this: given that some of the international goals fields in the infobox already link to player-specific lists, providing a link to the general list in the cases where they don't could possibly border on WP:EASTEREGG, under some interpretations of that rule.
The cases where, I think, we should definitely go forward without any qualms are thus List of men's footballers with 100 or more international caps and List of women's footballers with 100 or more international caps, even though those might be the hardest to program.
Thank you a lot for your help! BasicWriting (talk) 11:12, 4 February 2026 (UTC)[reply]
@BasicWriting As a matter of fact, those are the only two lists I think we can correctly work on with a bot. Because I looked at List of men's footballers with 1,000 or more official appearances and found Tommy Hutchison, for example; in the list table, it says "1,178+", but in his article we only have "983". The other lists cannot work because they would very much require human editorial judgment, which would not be appropriate for a bot. The same applies to the 300-500 or more goals lists.
I think we can also work on/with the XX or more international goals since they usually accompany the international caps lists. But just to be clear, did you say you do not want the international goals to be linked if the international caps is linked? Vanderwaalforces (talk) 12:38, 4 February 2026 (UTC)[reply]
No, that's not what I said. What I was saying in the original request was that the international goals themselves already link to player-specific lists, and changing those should be avoided (so the bot should just look for numbers above a certain threshold that are not already part of a link). BasicWriting (talk) 13:32, 4 February 2026 (UTC)[reply]
Ah, got it! I also want to say that, the figures in some of the articles are different from the ones in the list. Take Eseosa Aigbogun for example and the List of women's footballers with 100 or more international caps. Vanderwaalforces (talk) 13:59, 4 February 2026 (UTC)[reply]
It does seem that her particular case might be one of a missing update. But this is again why I said the bot ought to first check the actual number in the infobox. If it is above 100 (in the case of the two lists we ought to go through with), and it is a player whose article is being linked by the list, there is a high probability they did appear more than a hundred times and thus the number ought to be linked through. The men's list does mention some of the cases where the number differs and this is due to whether a country's national football team is a member of FIFA or not and different approaches in counting the appearances against them. Something like that might be the case here as well. So, to conclude, in cases like hers, the bot should not link the article to the list. BasicWriting (talk) 15:13, 4 February 2026 (UTC)[reply]
Coding... Vanderwaalforces (talk) 16:02, 4 February 2026 (UTC)[reply]
@BasicWriting Check these diffs, it worked like magic; Special:Diff/1336581922, Special:Diff/1336581949, Special:Diff/1336581978. Vanderwaalforces (talk) 16:28, 4 February 2026 (UTC)[reply]
These are truly epic! Thank you for your work!
The handballers might indeed be a similar case, but if we take a look at List of female handballers with 1000 or more international goals, I do notice some eventualities, given for example how Jasna Kolar-Merdan played for multiple national teams and her infobox doesn't provide a sum for the appearances. But that is fine if we forgo the cases where the sum is not shown. The male list, which, by the way, has to be moved to "List of men's handballers with 1000 or more international goals" (which I will go on to request myself), seems to have some fringe cases too, like those of Frank-Michael Wahl or Talant Dujshebaev, where the sums are missing (but these instances are few and far between and may be addressed manually). BasicWriting (talk) 22:15, 4 February 2026 (UTC)[reply]
If this is okay, then I can go ahead and file a BRFA for this task. Vanderwaalforces (talk) 16:29, 4 February 2026 (UTC)[reply]
I also observed that List of handballers with 1000 or more international goals might just be in the same situation. Vanderwaalforces (talk) 21:37, 4 February 2026 (UTC)[reply]
BRFA filed. Vanderwaalforces (talk) 02:35, 5 February 2026 (UTC)[reply]
@BasicWriting I will not be working with the handballers (men's and women's); I think those are a small number of entries altogether and can be worked on manually. Vanderwaalforces (talk) 08:08, 5 February 2026 (UTC)[reply]
Dear Vanderwaalforces, I have finished working on the handballers, and it led me to another (perhaps less complicated) suggestion: we could also try to link all players in these lists to them by virtue of the "See also" section, where we would list the corresponding lists they're in (again, excluding those links already present). This way we could capture even the lists that, for the various reasons we've talked about above, are not applicable for the infoboxes. BasicWriting (talk) 11:37, 6 February 2026 (UTC)[reply]
@BasicWriting That makes sense. I have added that functionality now. Vanderwaalforces (talk) 08:33, 7 February 2026 (UTC)[reply]

UTM Bot Request


Hi! I have a bot idea, and since I don't have the knowledge of how to code it, I'll share my idea:

var links = document.getElementsByTagName("a");
for (var link of links) {
    var href = link.getAttribute("href");
    if (!href) continue;
    // Resolve fragment-only ("#...") and relative ("/...") hrefs against the
    // current page; building them by hand from location parts drops the host.
    var url = new URL(href, location.href);
    var source = url.searchParams.get("utm_source");
    if (source !== null) {
        // Add page to Category "Pages with utm_source={source}"
    }
}

Iterate through all namespaces, then through all pages in that namespace. Please share any questions with me. Thanks! SeaDragon1 (talk) — Happy new year! 14:01, 27 January 2026 (UTC)[reply]

@SeaDragon1 Does that category exist? What problem are we solving with this? Vanderwaalforces (talk) 12:55, 29 January 2026 (UTC)[reply]
It makes sorting easier. SeaDragon1 (talk) — Happy new year! 14:20, 29 January 2026 (UTC)[reply]
Besides, I'm pretty sure there are a lot of non-existent categories that pages still put themselves in. SeaDragon1 (talk) — Happy new year! 14:24, 29 January 2026 (UTC)[reply]
I'm not entirely sure what you're sorting. I have a bot task that removes utm_source tags from URLs, is that what you're wanting? Primefac (talk) 20:15, 1 February 2026 (UTC)[reply]
Well, I mean we can know WHICH pages have WHICH utm_source tags, so we don't have to scour every single page. SeaDragon1 (talk) 20:17, 1 February 2026 (UTC)[reply]
Hello? SeaDragon1 (talk, contribs, happy birthday!) 17:24, 19 February 2026 (UTC)
HELLO? SeaDragon1 (talk, contributions) 16:09, 20 February 2026 (UTC)[reply]
@Primefac? SeaDragon1 (talk, contributions) 16:09, 20 February 2026 (UTC)[reply]
I personally don’t understand this request tbh, it’s not well put together at least to me. Vanderwaalforces (talk) 17:44, 20 February 2026 (UTC)[reply]
What I mean is:
We can organize pages based on what UTM source the links in the article have.
For example, this would be in Category:Pages with utm_source=chatgpt.com:
[https://example.com/page?utm_source=chatgpt.com] SeaDragon1 (talk, contributions) 18:43, 20 February 2026 (UTC)[reply]
Primefac, is there any worry of it removing a useful flag of an AI-generated reference? — Qwerfjkltalk 19:57, 20 February 2026 (UTC)[reply]
I'm not particularly concerned about the content of the utm values, I suspect a large number have been removed but utm tracking doesn't necessarily mean the site itself is AI-generated. Primefac (talk) 23:11, 20 February 2026 (UTC)[reply]
Primefac, no, but it could suggest that the source may not verify the text. — Qwerfjkltalk 10:50, 21 February 2026 (UTC)[reply]
Do we have a consensus, or..? SeaDragon1 (talk, contributions) 14:33, 24 February 2026 (UTC)[reply]

Way back in the early days of the project, Template:Infobox settlement was created, and all was good. This template was inserted into many thousands of pages before someone realized "hey, maybe we shouldn't include the link subdivision_type = [[List of sovereign states|Country]]; it should just be subdivision_type = Country", for the reasons that A) it's an WP:EASTEREGG link and B) people know what a country is, it doesn't need linking. Discussions were had on the template talk page, and the link was swiftly removed. However, this was not before the template had been copied onto many thousands of settlement article pages, and this link persists there to this day on many. I've been removing it when I see it, but this seems like a very simple and uncontroversial job for a bot to perform. It appears simple (it's only in the Template:Infobox settlement section of an article) and it's always the same field. Thoughts? Canterbury Tail talk 18:12, 28 January 2026 (UTC)[reply]

Canterbury Tail, search gives 150,000 articles. I can take care of this, but I've been a bit busy, so it would have to wait until the weekend. If anyone else wants to have a go before then, feel free. — Qwerfjkltalk 22:40, 28 January 2026 (UTC)[reply]
BRFA filed. Vanderwaalforces (talk) 16:23, 29 January 2026 (UTC)[reply]
Thank you all kindly. Canterbury Tail talk 23:30, 31 January 2026 (UTC)[reply]

Hip-hop hyphens


Although Wikipedia's categories for the genre of music use the spelling hip-hop with a hyphen, the redlinked category report sees a constant infusion of pages where somebody has either erroneously used the unhyphenated "hip hop" on a new article instead of the existing categories, or tried to flip longstanding articles that were in the existing categories back to the unhyphenated spelling for some reason. There are always at least one or two, and sometimes several more, "hip hop" categories for which the exact same category already exists at the "hip-hop" form, on that report pretty much every single time it updates with new redlinked categories.

So I wanted to ask if there would be any support for ensuring that any category with a hyphenated hip-hop spelling also has a categoryredirect from the unhyphenated hip hop, so that this stops being my problem to fix and can be left to the bots that fix categoryredirect errors instead? And if so, then is there a bot that could be set loose on creating the redirects? Bearcat (talk) 15:54, 31 January 2026 (UTC)[reply]

Establishing support for these 800-some category soft-redirects to be created would probably be better done at WP:Village pump (proposals), or maybe WT:WikiProject Hip-hop. P.S. You may want to rename Category:Comedy hip hop musical groups. Anomie 19:30, 31 January 2026 (UTC)[reply]

I'm requesting that a task be created for a bot to perform the following action per MOS:POSLINK that has very little chance of having any false positives:

Replace [[Foo|Foo's]] with [[Foo]]'s.

This can probably be a task that runs an indefinite amount of times, probably once every month or so. Steel1943 (talk) 21:38, 31 January 2026 (UTC)[reply]
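If this goes forward, the core substitution is essentially a one-line regex. A minimal sketch in Python follows; the pattern and function name are illustrative rather than any existing bot's code, and a real run would also need to handle curly apostrophes and other MOS:POSLINK variants:

```python
import re

# Match [[Foo|Foo's]] where the link text is exactly the target plus "'s";
# the backreference \1 keeps the bot from touching unrelated piped links.
POSLINK_RE = re.compile(r"\[\[([^|\[\]]+)\|\1's\]\]")

def fix_poslink(wikitext: str) -> str:
    """Move a possessive "'s" outside the link, per MOS:POSLINK."""
    return POSLINK_RE.sub(r"[[\1]]'s", wikitext)

print(fix_poslink("born in [[Foo|Foo's]] capital"))
# → born in [[Foo]]'s capital
```

Because of the backreference, a piped link like [[Foo|Bar's]] is left alone, which is what keeps the false-positive rate near zero.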

Is there any consensus for a bot to mass replace these for MOS:POSLINK though? Tenshi! (Talk page) 21:43, 31 January 2026 (UTC)[reply]
WP:BOTREQUIRE doesn't require that, and the content of MOS:POSLINK was formed by an RfC. But, if that route needs to be taken, by all means, bureaucracy away! Steel1943 (talk) 21:48, 31 January 2026 (UTC)[reply]
This seems like a WP:COSMETICBOT to me. * Pppery * it has begun... 21:52, 31 January 2026 (UTC)[reply]
(edit conflict) Interesting, I didn't know that existed. (But, then again, I don't read the bot policy page that much.) Whatever happens here happens, but either way, this policy was recently formed via consensus and this type of change has almost no chance of false positives. Steel1943 (talk) 21:55, 31 January 2026 (UTC)[reply]
Technically not WP:COSMETICBOT, as it does make a difference to the rendering of the link. Anomie 00:22, 1 February 2026 (UTC)[reply]
It does not make a difference in the rendering of the link. Vanderwaalforces (talk) 05:38, 1 February 2026 (UTC)[reply]
It does, though. [[Foo|Foo's]] is different from [[Foo]]'s. —Myceteae🍄‍🟫 (talk) 06:06, 1 February 2026 (UTC)[reply]
@Vanderwaalforces: It may be difficult to see, but notice the differences between how the "'s" is linked in both the links Myceteae made above. Steel1943 (talk) 20:18, 1 February 2026 (UTC)[reply]
Maybe I was understanding "rendering" differently. The reason I said there's no difference is that both of them have the same href= and title= values; only the display text is different. Vanderwaalforces (talk) 21:30, 1 February 2026 (UTC)[reply]
COSMETICBOT rules are primarily about whether the rendered HTML differs between the two edits. Removing whitespace does not affect how the page is rendered. Changing a link from [[Foo|Foo's]] to [[Foo]]'s does affect the HTML output of the page. In my opinion, this does not count as a cosmetic edit. phuzion (talk) 17:07, 2 February 2026 (UTC)[reply]
(edit conflict) I'm not objecting to this, was just wondering if there was or not. Tenshi! (Talk page) 21:53, 31 January 2026 (UTC)[reply]
Oh, I was thinking that your question was based on it being some kind of prerequisite for performing such a task (as I would've expected from a place whose editors are probably rather technical). My apologies for the misunderstanding. Steel1943 (talk) 21:57, 31 January 2026 (UTC)[reply]
As I understand, there is no requirement that bot requests must have consensus beforehand, although for contentious tasks (e.g. an adminbot that deletes G13'd drafts) it would likely need a consensus to run such a bot. Tenshi! (Talk page) 22:09, 31 January 2026 (UTC)[reply]
I think for a task that is borderline COSMETICBOT like this one, even if not strictly required, it would be a good idea to get consensus beforehand. — Qwerfjkltalk 14:10, 1 February 2026 (UTC)[reply]
It looks like there was an RFC that created a subsection of the MOS that indicates that we shouldn't have the 's inside of links. I'd say that's a pretty good consensus. I'll see if I can come up with something. Primefac (talk) 20:18, 1 February 2026 (UTC)[reply]
I'd say that's a pretty good consensus. Agreed. This bot would straightforwardly correct clear-cut violations of MOS:POSLINK, a recent addition to the MOS which was made following a well-attended and near-unanimous RFC. Thanks @Primefac for working on this and thank you to @Steel1943 for suggesting this fix. —Myceteae🍄‍🟫 (talk) 20:38, 1 February 2026 (UTC)[reply]
Any updates on this? Steel1943 (talk) 18:52, 10 February 2026 (UTC)[reply]
No. It's still on my list. Primefac (talk) 11:44, 11 February 2026 (UTC)[reply]

Mass-delinking of word "population" in articles about Japanese populated places, etc.


Please see Wikipedia:AutoWikiBrowser/Tasks § Mass-delinking of word "population" in articles about Japanese populated places. I was sent here but there are so many odd things going on that this is just a pointer, for now. Graham87 (talk) 08:56, 2 February 2026 (UTC)[reply]

@Graham87 Tbh, I think the move request discussion makes sense, and that what's needed now is for the fate of the current population and population (human biology) articles to be determined; I would support the move.
But then, if the move happens, articles linking to the current population that are not related to human biology would still need to be updated, which makes the whole thing really complicated, as you've mentioned. Vanderwaalforces (talk) 09:45, 2 February 2026 (UTC)[reply]
This will likely be a very simple bot run. I'd be happy to knock this out once we have determined the fate of those two articles. Please feel free to ping me if/when we come to a conclusion. phuzion (talk) 17:01, 2 February 2026 (UTC)[reply]

Request for Co-operator for User:FireflyBot


Hello, Bot aficionados,

I regularly handle stale drafts for the project, and for the past year I have gotten nervous about FireflyBot's operation. I check every six months to make sure that it is operating daily because, in the past, when it has stopped working, it can take months before anyone notices its inactivity. As long as I have been around, its malfunctioning, at least regarding CSD G13 notices, has been rare, maybe 3 times over the past five years, but it performs an essential and rarely noticed task: informing content creators when their drafts and sandboxes are hitting the 5-month mark for inactivity. When the calendar hits 6 months, the draft becomes eligible for speedy deletion under CSD G13. My one issue with FireflyBot is that it only sends the 5-month notification once per draft; I wish it would do so every time the inactivity level hits 5 months, but let's leave that matter for another discussion. These 5-month notices are very useful: they help content creators keep track of their work and lessen the number of restoration requests at WP:REFUND from editors who want their expired drafts restored after a CSD G13 deletion.

The reason I am posting this here is that the bot operator, User:Firefly, has not been active as an editor in over a year and also no longer has admin privileges. The last time I emailed them, back in early 2025, I didn't get a reply, and I think they have moved on from working on the project. Even though FireflyBot is very reliable, I was wondering if any knowledgeable editor would be willing to sign on as a co-operator of this bot in case there are any problems in the future. I don't anticipate any trouble, but it would put me at ease if I knew there was a backup bot expert I could turn to in an emergency. I don't know if this requires approval from Firefly or whether we could just make this an informal agreement, but I thought I would be proactive and make the request now instead of waiting until a problem emerges at some point in the next few years. Thanks, in advance, for any assistance offered. Liz Read! Talk! 00:11, 6 February 2026 (UTC)[reply]

I wouldn't mind being a maintainer, since I also look after one of Firefly's tools; however, the major problem is that the Toolforge Standards Committee will in most cases refuse to give other editors the bot passwords or OAuth consumer keys in adoption-request transfers, which in essence is what is needed to log in as FireflyBot. Essentially, if FireflyBot stopped working and Firefly isn't around, a new bot would need to fully take over its tasks. Tenshi! (Talk page) 00:23, 6 February 2026 (UTC)[reply]
DreamRimmer bot III is already approved as a backup bot. – DreamRimmer 00:31, 6 February 2026 (UTC)[reply]
(edit conflict) It's useful to know that there's a backup already. Have the G13 notifications stopped at all since that BRFA? Tenshi! (Talk page) 00:40, 6 February 2026 (UTC)[reply]
It's probably worth setting up notifications via Wikipedia:Bot activity monitor if you/we are worried that the bot might silently stop working. Legoktm (talk) 01:05, 6 February 2026 (UTC)[reply]

Useless non-free no reduce tags


{{Non-free no reduce}} is intended to keep a bot (I think it is DatBot (talk · contribs)) from downsizing large non-free images to no more than 100,000 square pixels. The tag is therefore redundant and pointless on any image whose size is already ≤100,000 px². I recently found an image, File:Gangbusters title.png, that had this tag at 418×239 px (99,902 px²). –LaundryPizza03 (d) 21:16, 10 February 2026 (UTC)[reply]
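The check itself is trivial arithmetic, so a hypothetical bot pass could flag tagged files at or below the threshold like this (the constant and function names are invented for illustration; fetching dimensions and removing the tag are left out):

```python
# {{Non-free no reduce}} only matters for images *above* the reduction
# limit, so the tag is redundant when width * height is at or under it.
NON_FREE_PIXEL_LIMIT = 100_000  # 100k square pixels

def tag_is_redundant(width: int, height: int) -> bool:
    """True if the image is already small enough that the tag does nothing."""
    return width * height <= NON_FREE_PIXEL_LIMIT

# The example from the request: 418 × 239 = 99,902 px², under the limit.
print(tag_is_redundant(418, 239))  # → True
print(tag_is_redundant(500, 300))  # 150,000 px² → False
```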

This page is for bot requests. Are you requesting a bot? SeaDragon1 (talk, contributions) 16:11, 20 February 2026 (UTC)[reply]
Yes, because this is an issue that is trivial to handle and likely to recur in the future. –LaundryPizza03 (d) 04:40, 21 February 2026 (UTC)[reply]
So from a bot perspective, how often (generally speaking) does this sort of thing happen? Primefac (talk) 12:56, 21 February 2026 (UTC)[reply]
Looks like there are currently 204 such images, 41 with the latest file revision in 2025 and one so far in 2026. There are also four audio files with the tag: File:J Dilla - Don't Cry.ogg, File:J Dilla - Time, The Donut of the Heart.ogg, File:Jane Remover - Dreamflasher.ogg, and File:Kanye West - Blood on the Leaves.ogg.
I note 57 of the 204 are in Category:Sports uniforms, 54 of them last uploaded in 2024. Many appear to be templated images showing two variations of a uniform, in contrast to others (over 100000px2) that have three variations (e.g. File:ECM-Uniform-PHI.png vs File:ECA-Uniform-DET.png), which makes me suspect they started tagging all new template images with {{Non-free no reduce}}. Anomie 19:55, 21 February 2026 (UTC)[reply]

archive.today cleanup


Cross-posting Wikipedia:Link rot/URL change requests § Migration away from archive.today; as a non-standard URL replacement, I think this would be more appropriate as a new bot task.

After an RFC, links to archive.today have been deprecated and should be removed as expeditiously as possible. The current instructions are at WP:ATODAY. It would be amazing if a bot could loop through all links to dot-today and:

  1. Get the page being archived (chopping off the https://archive.(fo|is|li|md|ph|today|vn)/<numbers>/ prefix from archive URLs, like https://archive.today/20120710094053/http://freespace.virgin.net/howard.anderson/loospreparations.htm)
  2. If the URL is available at archive.org, replace the dot-today link with a dot-org link. Dot-org has an API for fetching already-archived URLs, and it also returns the appropriate date for the |archive-date= (from {{cite xxx}}) or |date= (from {{webarchive}}) parameters.
  3. If the page is not archived at archive.org:
    1. If the original URL is still live, try to archive it at archive.org (after a lot of digging, I found the API instructions for saving a page). If archiving works, replace the dot-today link with dot-org; otherwise, remove the WP:EARLYARCHIVE.
    2. If the original URL is dead, tag the link with {{New archival link needed|date={{subst:monthyear}}|bot=insert your bot's name here}}

Thanks a million, in advance. Best, HouseBlaster (talk • he/they) 21:50, 22 February 2026 (UTC)[reply]

This is not an appropriate task for a bot. sapphaline (talk) 22:08, 22 February 2026 (UTC)[reply]
I disagree; if we manage to run things like IAbot and WaybackMedic, which have to deal with the same hurdles, we can find a way to make it work. Best, HouseBlaster (talk • he/they) 22:11, 22 February 2026 (UTC)[reply]
IABot and WaybackMedic archive live URLs; of course doing so is 1000 times easier than trying to archive dead URLs. sapphaline (talk) 22:14, 22 February 2026 (UTC)[reply]
They... do in fact archive dead URLs? Unless you select Add archives to all non-dead references (Optional), IABot only adds archives to dead URLs. WaybackMedic task 3 and 4 do likewise. Best, HouseBlaster (talk • he/they) 22:23, 22 February 2026 (UTC)[reply]
You have to verify that the content relevant to the citation is actually available at the live url and/or archive.org. This is not always true, and it is not possible to reliably do this by bot. A few days ago I found an instance where the URL was dead and the archive link was of a 404 page, fortunately an older archive snapshot actually had the content saved but this required my abilities as a human to determine. Thryduulf (talk) 22:51, 22 February 2026 (UTC)[reply]
It should be possible to automatically extract and compare content, and replace the archive.today link with the alternative link if the match percentage is very high, but this would certainly be more resource-intensive than a search-and-replace bot. Bit of brainstorming here: Wikipedia talk:Archive.today guidance#Bot for checking for identical text. Dreamyshade (talk) 23:26, 22 February 2026 (UTC)[reply]
To build off of this, Wikipedia talk:Archive.today guidance#Most common links suggests that there are 600+ instances that can be fixed by a bot swapping from archive.today to the Internet Archive along with 1,000+ instances where a page changed urls and could be updated by changing the link with a bot. --Super Goku V (talk) 10:24, 23 February 2026 (UTC)[reply]
  • I think a bot is premature. I'm feverishly trying to edit pages from my watchlist (that contain archive.today) in the hopes I can get them all done before "some bot" comes along and yanks them out from under me. I'm occasionally using the archived archive.today link (on a browser with uBlock Origin) to see the page. Sometimes it's worthless (like a 404 page copied) and I can just remove it from the Wikipedia article, but other times I've been able to use other URLs and title wording found on that archive.today page that have led to me successfully finding (a) actual live pages with newer URLs, and (b) archive.org pages which weren't available to me with only the information found in the Wikipedia article. To resolve about 10% (estim.) of the archive.today links, I've needed to view the archive.today copy.   ▶ I am Grorp ◀ 23:14, 22 February 2026 (UTC)[reply]
    This proposed bot would not entirely remove any links. It would replace links to dot-org ones, and tag ones that can't be replaced. Best, HouseBlaster (talk • he/they) 23:29, 22 February 2026 (UTC)[reply]
    Okay. That's acceptable.   ▶ I am Grorp ◀ 23:55, 22 February 2026 (UTC)[reply]
  • A couple of big points: We need a bot - the task is too large to do manually, and the RfC close is unambiguous that these links need to go. The bot is technically difficult - we have plenty of comments from people who know what they're talking about indicating that the full task is not possible to automate without many errors. I definitely think we want a bot that gets used in a staged approach. As a first step, peeling the archive off of links we believe are alive, leaving the live link, solves a large fraction of the problem. After that we can start working on the rest website by website to figure out whether other archive sites are consistently good, consistently impossible, or messy, and handle the first two automatically while leaving the third group for manual cleanup. Tazerdadog (talk) 00:09, 23 February 2026 (UTC)[reply]
    Remember that while removing archives from a live link may solve today's hot-button issue it is simultaneously creating a different problem for the future if the link is not archived somewhere else. Thryduulf (talk) 01:18, 23 February 2026 (UTC)[reply]
    Yeah, a run from IAbot over the affected articles after it stops linking to archive.today would probably be in order. That said, the archive.today issue is absolutely the urgent one. Tazerdadog (talk) 01:56, 23 February 2026 (UTC)[reply]
    I strongly disagree that we should wilfully create future problems without regard for how to solve them. The urgency of replacing archive.today is entirely artificial. Thryduulf (talk) 02:01, 23 February 2026 (UTC)[reply]
    There is consensus that it should be removed as quickly as possible because it is an unreliable source containing malware. That sounds pretty urgent to me. Best, HouseBlaster (talk • he/they) 02:21, 23 February 2026 (UTC)[reply]
    No, there was consensus to remove links "as soon as practicable" due to alleged verifiability issues. Even if there was consensus that the code was malware and/or that it required action (and there was no consensus on that matter alone) the "urgency" is entirely chosen by some of the participants in the RFC. A "solution" that creates new problems without regard to solving them is not a practicable one. Thryduulf (talk) 03:03, 23 February 2026 (UTC)[reply]

    There is a strong consensus that Wikipedia should not direct its readers towards a website that hijacks users' computers to run a DDoS attack, and then linking to our guideline about malware, sounds like there is consensus that the link is malware. I haven't heard a single argument (other than proof by assertion) to rebut the arguments based on the literal definition of malware. To reiterate, we define malware as any software intentionally designed to cause disruption to a computer, server, client, or computer network. It is software (obviously), it causes disruption to a computer network (that's what a DDOS is), and it was intentional. It is malware.

    Many edits might create future problems. But we are worried about the present. I don't see why a solution which might cause isolated problems at some indeterminate point in the future is not practicable. Best, HouseBlaster (talk • he/they) 03:12, 23 February 2026 (UTC)[reply]
    Regardless of the technicalities of whether archive.today's code is or isn't malware, simply removing links is more than "might" cause "isolated problems" it's will cause problems of unknown magnitude, potentially tomorrow. WP:V is a core, non-negotiable policy that we absolutely must have regard for because there is a big difference between accidentally creating problems on a small scale through good-faith ignorance and wilfully and recklessly causing known problems on an unknown scale through haste. Thryduulf (talk) 03:23, 23 February 2026 (UTC)[reply]
    Correct, and that's why removing links to an unreliable source is urgent. HouseBlaster (talk • he/they) 03:26, 23 February 2026 (UTC)[reply]
    Except removing the links without replacement isn't urgent for any reason other than the desire of a few self-selecting Wikipedians, and it isn't beneficial or desirable for any reason at all. Further, if you believe that any of the problems being discussed here is links to an unreliable source then you have grossly misunderstood the issues. The issue at hand is ensuring that WP:V will continue to be met in the future, long after the present moral panic has blown over and cooler heads once again prevail. This is an issue that we have no choice but to address, ideally before, but certainly no later than, the time we address the removal of the existing archive links. Thryduulf (talk) 03:42, 23 February 2026 (UTC)[reply]
    Houseblaster responded to my same concerns above: This proposed bot would not entirely remove any links. It would replace links to dot-org ones, and tag ones that can't be replaced.   ▶ I am Grorp ◀ 03:44, 23 February 2026 (UTC)[reply]
    That's good, but it isn't part of this proposal by Tazerdog, As a first step, peeling the archive off of links we believe are alive leaving the live link solves a large fraction of the problem. Thryduulf (talk) 05:19, 23 February 2026 (UTC)[reply]
    @Thryduulf Having material sourced to a website that we know has altered screenshots (you can interact with the evidence yourself: compare [4] with the archive.today website, if you need to) is a WP:V issue. We need to ensure WP:V is met, but just because I can verify a claim against a screenshot of a newspaper somebody put on Reddit, alongside a "trust me bro, this is legit", doesn't mean it's guaranteed protection under the verification policy. In fact, it isn't: if the claim is related to living people, and I believe it to be contentious, then it's 100% not guaranteed, and policy dictates that we remove or better source the material without delay. Now, whether a bot is the best way to do that, I'll leave to people who actually have experience with dealing with bots on Wikipedia. But to say something to the effect of "we have to keep a website we know has forged material around because they're the only way I can verify these sensitive BLP claims (one of the few things for which we actually require an inline citation)" is, quite frankly, ludicrous. GreenLipstickLesbian💌🧸 05:01, 23 February 2026 (UTC)[reply]
    Almost none of that is related to the actual reality of the situation, as opposed to unsubstantiated and unverified claims about what archive.today might be doing. When all the hyperbole and exaggeration is put to one side, ensuring that WP:V is met tomorrow as well as today is equally as important as removing archive.today links that have a tiny possibility of being inaccurate (in none of the cases has anyone demonstrated any changes relevant to any material being verified; I don't even recall any allegations of such, though there are so many accusations in so many parallel discussions that I may have missed some). Your final sentence also makes it clear you haven't actually read and/or understood much of what I'm saying. Thryduulf (talk) 05:17, 23 February 2026 (UTC)[reply]
    @Thryduulf, if you looked through the list at WP:NOTGOODSOURCE, and compared it to what we know about this collection of archives, do you think you'd be left feeling like it's a reliable archive?
    • "It has a reputation for fact-checking and accuracy" – Um, nope. In fact, the owner appears to be going out of their way to ruin their reputation.
    • "It is published by a reputable publishing house" – Heading determinedly into the "disreputable" territory, wouldn't you say?
    • "It has a professional structure in place for deciding whether to publish something, such as editorial oversight or peer review processes" – Professionalism is a good word to describe what's missing here.
    Remember Lugnuts' parting claim that he'd deliberately falsified information in some articles, and how that has haunted some editors for years, even though there's no evidence that Lugnuts ever did that to any article? This situation is worse than that. We have proof that this website is doing that, and the risk is enough to turn most editors against it. We don't meet WP:V's requirements by having an unreliable archived copy of a now-dead website.
    Also, have you noticed the pattern? We say "at least they're not doing X" on one day, and the next day, they do X. We say, "well, they might be doing X, but at least they're not doing Y", and the next day, they do Y. So please think about WP:BEANS before you post any more comments about how they haven't transgressed some further bright line yet. It's already bad enough. WhatamIdoing (talk) 05:44, 23 February 2026 (UTC)[reply]
    WAID has, as expected, said this far more eloquently than I could. One additional point, though, @Thryduulf -- I've stuck to what archive.today has done and threatened to do, and taken those threats seriously. I have made more speculative posts, but over all, not that many.. But I could, if you'd like? For example, let's take the statement we got from the WMF:

    We know that WMF intervention is a big deal, but we also have not ruled it out, given the seriousness of the security concern for people who click the links that appear across many wikis.

    [5]
    Hopefully now we, as a few self-selecting Wikipedians, have taken steps to limit the spread of links, the WMF won't need to take further action. But they might. Forcibly removing all links is something they 100% have the right - some may argue an obligation - to do. I'd rather that we, as a community, deal with the links first, making efforts to switch the links to alternative archives or alternative sources when possible. On our own terms. Because the WMF has made it clear that they haven't ruled out acting, and I think we'll do a better job. GreenLipstickLesbian💌🧸 06:48, 23 February 2026 (UTC)[reply]
    I agree with your interpretation of the WMF's statement. However, it's not the only way this decision could be taken out of our hands. Global blacklisting was proposed today at m:Requests for comment/Deprecate archive.today. WhatamIdoing (talk) 06:59, 23 February 2026 (UTC)[reply]
    Stupid question - I know the enWiki blacklist (and the edit filter, in this particular case) ideally doesn't prevent us from editing pages with the blacklisted link, only from adding the link (which I've gotten around when requesting cv-revdels by spelling out "www.example.com" as "www dot example dot com"). I think that gets a bit messed up when it comes to reverts. Does anybody know if the global blacklist works similarly, or is the only difference that it impacts all Wikimedia projects? Does it impact all projects, actually? I'm assuming it blocks all namespaces - I think our filter has exceptions for archive.is/.today links in project space (which incidentally facilitates cleanup). Have I made a stupid mistake in that assumption? GreenLipstickLesbian💌🧸 08:05, 23 February 2026 (UTC)[reply]
    Special:AbuseFilter allows per-namespace options. MediaWiki:Spam-blacklist unfortunately does not. I believe the only thing that's different about the global blacklist is that ours only affects us, and the global one affects all the wikis. WhatamIdoing (talk) 08:19, 23 February 2026 (UTC)[reply]
I am currently workshopping a related proposal at the village pump idea lab. tl;dr: the tasks of GreenC_bot/WaybackMedic and IABot can probably do some work on the second part, but it's somewhere between unlikely and unclear whether full automation is technically feasible and desirable. Adding the new cleanup tag specific to this problem (its name may be changed to Template:Deprecated archive per the VPI discussion) and removing problematic links should be the bare minimum and both of those tasks can be automated. mdm.bla 01:13, 23 February 2026 (UTC)[reply]
Also @HouseBlaster: Currently the tag does not have a |bot= parameter. Can that just be added into the template? mdm.bla 01:15, 23 February 2026 (UTC)[reply]
It can. It (probably) wouldn't change the output; it would just alert editors viewing the page source code that the tag was placed by a bot. Best, HouseBlaster (talk • he/they) 01:17, 23 February 2026 (UTC)[reply]
 Template adjusted. [new archival link needed] now displays properly with the bot parameter. mdm.bla 01:34, 23 February 2026 (UTC)[reply]
I think you misunderstood the way it normally works: usually, the bot parameter has zero effect on the template. It just exists, and doesn't trip the unknown parameter check. Best, HouseBlaster (talk • he/they) 02:31, 23 February 2026 (UTC)[reply]
@HouseBlaster: Thanks for the adjustments; I'm not a prolific template creator and overthought that whole thing wayyyyyy too much. mdm.bla 05:17, 23 February 2026 (UTC)[reply]

Three domain names that we can remove


As Dreamyshade mentioned, ~12K of these links go to three websites:

  • nytimes.com
  • newspapers.com
  • washingtonpost.com

The first and third are newspapers that maintain their own archives (and so do major public libraries); the middle one is an archive itself. I suggest that unwanted archive links to all three of these could simply be removed by bot. It's only ~2% of the links, but since 100% of the archive links to these domain names are unnecessary for WP:V purposes, nothing more than a simple removal is needed, and we should get that 2% of the job done. WhatamIdoing (talk) 05:24, 23 February 2026 (UTC)[reply]
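A rough sketch of what that per-domain removal could look like, assuming {{cite}}-style parameters. The domain list comes from the suggestion above, but everything else (the function, the sample citation, the regex approach) is illustrative only; a production bot would parse templates properly (e.g. with mwparserfromhell) and would likely also set |url-status=:

```python
import re

# Domains whose archive.today archive links are proposed for plain removal.
SAFE_DOMAINS = ("nytimes.com", "newspapers.com", "washingtonpost.com")

def strip_redundant_archive(citation: str) -> str:
    """Drop |archive-url=/|archive-date= when the archived page is a safe domain."""
    m = re.search(r"\|\s*archive-url\s*=\s*([^|}]+)", citation)
    if not m:
        return citation
    archive_url = m.group(1).strip()
    # Long-form archive.today URLs embed the original address, so a simple
    # substring check finds the wrapped domain.
    if not any(domain in archive_url for domain in SAFE_DOMAINS):
        return citation
    return re.sub(r"\s*\|\s*archive-(?:url|date)\s*=\s*[^|}]*", "", citation)

cite = ("{{cite news |url=https://www.nytimes.com/2020/01/01/example.html "
        "|title=Example |archive-url=https://archive.today/20200102000000/"
        "https://www.nytimes.com/2020/01/01/example.html "
        "|archive-date=2 January 2020}}")
print(strip_redundant_archive(cite))
# → {{cite news |url=https://www.nytimes.com/2020/01/01/example.html |title=Example}}
```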

If a bot removes archive-urls from articles in my watchlist purview, I'll revert them to restore the archived links.   ▶ I am Grorp ◀ 05:43, 23 February 2026 (UTC)[reply]
@Grorp, Newspapers.com is an archive. The source is the newspaper. Newspapers.com is an archive of the source. Archive.today is an archive of an archive of the source. Why do you need an archive of an archive of the source? WhatamIdoing (talk) 05:46, 23 February 2026 (UTC)[reply]
You don't. This subsection and your post above was ambiguous. It seemed to suggest a bot should remove ALL |archive-urls of things like nytimes.com... not just remove archive.today links for nytimes. My bad if I read it as the wrong fork of the ambiguity.   ▶ I am Grorp ◀ 05:51, 23 February 2026 (UTC)[reply]
We are only talking about removing "unwanted archive links" (and in my case, only ones that are 100% unnecessary for WP:V purposes). That includes archive.today but also "archive dot lots of other things", since that same owner has multiple domain names. WhatamIdoing (talk) 06:08, 23 February 2026 (UTC)[reply]
Who has conclusively decided what archive links are unnecessary/unwanted? There are multiple Wikipedia pages dedicated to combating link rot, many of which are even encouraging users to archive everything to help preserve verifiability. And evidently according to WP:PLRT the pervasive threat of link rot has become such a concern that all new links added to Wikipedia are automatically archived. This messaging from WP:DEADREF and WP:MDLI is very confusing. --skarz (talk) 18:12, 25 February 2026 (UTC)[reply]
@Skarz, do we need someone to "conclusively" decide every single thing? Or do you think that editors could use their own best judgement to make decisions? Imagine someone saying "Gee, that's a common university textbook. I remember lugging it around in my backpack. Amazon still sells hardback copies. Okay, we probably don't need a link to an online 'archive' for that book". WhatamIdoing (talk) 20:58, 25 February 2026 (UTC)[reply]
You completely sidestepped my legitimate question and gave me some bogus strawman argument. Nice. Reminder that your own words are that you are removing archives that are "100% unnecessary." Nevertheless, I am not aware of entire textbooks being archived on Wikipedia, nor do I support that endeavor. I am, however, aware that Wikipedia policy encourages webpage archival to prevent link rot, which in turn protects WP:V. Thanks, and have a great night. --skarz (talk) 01:09, 26 February 2026 (UTC)[reply]
Well, unlike you, I have seen whole books get archive links spammed into the citation (and yes, I did complain to the bot op at the time), and I do think those are 100% unnecessary.
Link rot is a potential problem for websites. Link rot is not a problem for dead-tree media. For example, every source archived in Newspapers.com is a scanned copy of a physical newspaper page. Those sources exist on paper. A URL that allows you to read a dead-tree source online is a Wikipedia:Convenience link. It is not a necessity. WhatamIdoing (talk) 03:01, 26 February 2026 (UTC)[reply]
Here's an example of me removing an archive link for a whole text book. The book's available on Amazon for about $25, and Wikipedia:Reliable sources/Cost applies. WhatamIdoing (talk) 03:28, 26 February 2026 (UTC)[reply]
We also had a formally closed, well-attended RFC whose close requires removing all of the archive.today links. We're trying to be courteous and do that in a way that minimizes future link rot, but we do have to do it. In this case, the live links are expected to be stable for these domains. If you want to run behind the bot with a follow-up archiving bot, great! But standing in the way because removing some links that absolutely need to be removed might cause some issues in the future is unpersuasive when we have the strongest consensus you can get on Wikipedia that these links, as is, are causing major issues right now. Frankly, how/if/with what priority these links get rearchived is out of scope here - we have to remove these, and the discussion of how to get them rearchived can be separated out. Tazerdadog (talk) 21:47, 25 February 2026 (UTC)[reply]
I am hardly "standing in the way" of anything; I was asking for clarification on which archive links are considered unnecessary per WP:PLRT. --skarz (talk) 01:18, 26 February 2026 (UTC)[reply]
Some Newspapers.com archive links are necessary because the publication is no longer available there. For instance, WSWG extensively cites a newspaper that Newspapers.com only had available for a few days. It is not even searchable now. Sammi Brie (she/her · t · c) 04:55, 24 February 2026 (UTC)[reply]
So? We're not citing the newspapers.com archive. We're citing the newspaper itself. If you're citing The Mulberry Advance, and it happened to be available |via= the Newspapers.com archive, and now it's not available through that archive, then – who cares? It's still a valid printed-on-paper source, even if you can't see it anywhere online. WhatamIdoing (talk) 21:57, 25 February 2026 (UTC)[reply]
Yes please! This is a good start - it's a meaningful chunk of sources, and it should help us dial in the process. Tazerdadog (talk) 06:02, 23 February 2026 (UTC)[reply]

Reference spam detector


I would like to see a bot that would detect likely Reference spam, and generate a confidence score internally, the way User:Cluebot NG does. Ideally, at high levels of confidence, perhaps it could just revert as Cluebot does, but in any case, it ought to generate a project page with a table or log of rated edits so that humans could review the results, comment, perhaps define a confidence threshold for auto-reverts, and of course, provide data for refining and tuning the algorithm.

I seem to be spending more and more time analyzing and reverting WP:REFSPAM, and many cases are very obvious and really should not need human intervention. If someone is a new editor, adds substantially the same citation to multiple articles, with no added content (or brief, near-identical content), and has few or no edits outside one topic area (i.e., a WP:SPA), the odds are very high that they are a ref spammer. Mathglot (talk) 01:34, 24 February 2026 (UTC)[reply]
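For illustration only, the signals listed above could be combined into a crude confidence score along these lines. The feature names, weights, and thresholds here are hypothetical placeholders, not a trained classifier like ClueBot NG's; a real bot would tune these against reviewed examples:

```python
# Rough sketch of the heuristic described above. All weights and
# cutoffs are hypothetical illustrations, not tuned values.

def refspam_score(editor_edit_count, same_citation_article_count,
                  added_prose_chars, distinct_topic_areas):
    """Return a confidence score in [0, 1] that an edit batch is ref spam."""
    score = 0.0
    if editor_edit_count < 50:              # new editor
        score += 0.25
    if same_citation_article_count >= 3:    # same citation across many articles
        score += 0.35
    if added_prose_chars < 100:             # little or no accompanying content
        score += 0.25
    if distinct_topic_areas <= 1:           # single-purpose account pattern
        score += 0.15
    return min(score, 1.0)

# Example: a 10-edit account adding one citation to 5 articles with no prose
print(refspam_score(10, 5, 0, 1))  # → 1.0
```

Edits scoring above a community-agreed threshold could be auto-reverted, with everything else logged to a project page for human review, as the request suggests.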

Iranian village capitalization


Trivial, I know, but the exact string of:

insource:/"|settlement_type        = village"/

returns at minimum 10,000 incorrectly capitalized infoboxes. They all pretty much seem to be Iranian village stubs created by User:Carlossuarez46. The lowercased village should be changed to Village. Example: [6] ~WikiOriginal-9~ (talk) 21:24, 24 February 2026 (UTC)[reply]
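As a sketch of the replacement a bot (AWB or otherwise) might apply, assuming the infoboxes use the `|settlement_type = village` formatting shown in the search above; the whitespace handling in this regex is an assumption, since real infoboxes vary:

```python
import re

# Capitalize "village" in the |settlement_type= infobox parameter.
# The flexible whitespace matching is an assumption about formatting.
PATTERN = re.compile(r'(\|\s*settlement_type\s*=\s*)village\b')

def fix_settlement_type(wikitext):
    return PATTERN.sub(r'\1Village', wikitext)

print(fix_settlement_type('|settlement_type        = village'))
# → |settlement_type        = Village
```

Anchoring on the parameter name (rather than replacing the bare word "village") avoids touching prose or other parameters elsewhere in the article.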

This should be super easy for someone with an AWB bot. I'm busy for about a month, but you can ping me at the end of March if you have trouble finding a bot op and I can start the process then. (The first step in the process is a BRFA.) –Novem Linguae (talk) 05:10, 25 February 2026 (UTC)[reply]
I can do a BRFA. Novem, do you think something like this will be approved without many hiccups? Wondering if maybe concerns about cosmetic edits or similar would be raised. ~/Bunnypranav:<ping> 10:06, 25 February 2026 (UTC)[reply]
I don't think WP:COSMETICBOT will be a problem. That policy is for edits that don't change the visual appearance of the page. This bot would be capitalizing a letter, which is a visual change. For reasons unrelated to COSMETICBOT, I'd estimate there's a 50% chance the BRFA gets approved easily, and a 50% chance that they ask for a consensus somewhere such as a village pump first. –Novem Linguae (talk) 10:27, 25 February 2026 (UTC)[reply]
Tip: If you can point to recently approved BRFAs for similar replacements (in both type and number of articles affected) that were completed without running into community complaints, that can help avoid being asked to find consensus elsewhere. If you can point to a reasonably well-attended WikiProject discussion, that can help too. Anomie 13:33, 25 February 2026 (UTC)[reply]
Thanks. I will file a brfa in a day, after I get a bit free from real life. ~/Bunnypranav:<ping> 15:42, 25 February 2026 (UTC)[reply]
MOS:HEADCAPS is clear: "Capitalize the first character of the first element if it is a letter" ~WikiOriginal-9~ (talk) 14:47, 25 February 2026 (UTC)[reply]
BRFA filed, please do tell if I missed something, filed it just before I sleep. ~/Bunnypranav:<ping> 15:36, 28 February 2026 (UTC)[reply]

After a recent page move, several pages link to a disambiguation page. Requesting link change from Tennis performance timeline comparison to Tennis performance timeline comparison (women) (1978–present) for pages listed and sorted here. Note: this is my first time performing a request of this nature, so it may not be properly worded. 8rz (talk) 11:40, 26 February 2026 (UTC)[reply]

 Done with AutoWikiBrowser and JWB. phuzion (talk) 14:19, 26 February 2026 (UTC)[reply]
Many thanks. 8rz (talk) 00:09, 27 February 2026 (UTC)[reply]