Wikipedia talk:Database reports

From Wikipedia the free encyclopedia

Requests: Please list any requests for reports below in a new section. Be as specific as possible, including how often you would like the report run.

Database reports from searches[edit]

I don't know if there's any way to do this efficiently, but there are a couple searches I have devised that reliably turn up a lot of busted formatting. They are not obtained by querying the database directly, but is there any way to get them on a page such as these? Here are a couple examples:

  • [1], which is insource:/\[1\]\[2\]/ in mainspace, i.e. the string "[1][2]" appearing in the page's source. This almost always means that someone has messed something up and copypasted a sentence from their browser into the edit window, destroying references.
  • insource:/\<sup\>\{\{.itation needed/ in mainspace. This detects when someone has used {{citation needed}} in superscript tags.
  • The big daddy of them all: insource:"citation needed" -insource:"needed|date" -insource:"needed|reason" -insource:/\{\{.itation .eeded\}\}/ -insource:"needed span" -insource:"needed lead" -insource:"needed paragraph" -insource:"needed section" -insource:/on-ne/ -insource:/ded \(Wi/ in mainspace. This gives busted {{cn}} attempts, where somebody just typed "[citation needed]" or "(citation needed)" etc into an article instead of invoking the template. I have a huge regex to fix a few dozen of the most common types of this error in my JWB settings.

Et cetera, et cetera. Usually I fix these myself from JWB but I feel like others would enjoy helping with this as well. Is there a way to set up a bot to do search reports for stuff like these? jp×g🗯️ 22:08, 29 October 2023 (UTC)[reply]

Hi @JPxG: I think we can just have a page that is a collection of these search links and maybe have a bot that updates the hit count daily (to track the approximate number of pages)? The search function gives instant results, which is probably preferable over a page updated periodically by bots. 0xDeadbeef→∞ (talk to me) 10:09, 30 October 2023 (UTC)[reply]
We can query the search index replicas via Toolforge (see See wikitech:Help:CirrusSearch elasticsearch replicas). It exposes some features which are not available form web UI search. toolforge:global-search is one of the few tools that use it, but doesn't seem to expose the extra features, and doesn't provide a way to restrict results to enwiki. Would it be useful to have a {{search report}} template analogous to {{database report}}? – SD0001 (talk) 11:09, 30 October 2023 (UTC)[reply]
Oooh, given that elasticsearch replicas exist (TIL!) it would be nice if we can make use of the extra features. Though if the web UI search is sufficient in some cases I still don't think bot reports would be necessarily beneficial? 0xDeadbeef→∞ (talk to me) 11:37, 30 October 2023 (UTC)[reply]
Rethinking this, its actually probably quite beneficial to have a community maintained list of search queries where a bot would come by and update periodically. Its better at tracking stuff and makes it better for editors to navigate. 0xDeadbeef→∞ (talk to me) 08:59, 23 November 2023 (UTC)[reply]

Top new article reviewers report code needs to be updated[edit]

There was a recent change to PageTriage, where the logging of reviews is split based on whether the target is an article or a redirect. This is causing the Wikipedia:Database reports/Top new article reviewers report to give wrong results. Please change any queries in the code to replace instances of log_action = 'reviewed' with log_action in ('reviewed', 'reviewed-article', 'reviewed-redirect') This should fix the problem. -MPGuy2824 (talk) 03:30, 9 November 2023 (UTC)[reply]

cc @MusikAnimal, who is probably the maintainer of Community Tech bot, which generates that report. –Novem Linguae (talk) 06:01, 9 November 2023 (UTC)[reply]
@MPGuy2824 @Novem Linguae Thanks for the ping! Partially fixed with 80b6552, but I think the counting of redirects is still sort of broken. My understanding (please correct me if I'm wrong): For historical data, we need to still go by page.page_is_redirect, but for where data is available, we should sum where log_action = 'reviewed-redirect'. Is that correct? MusikAnimal talk 15:49, 10 November 2023 (UTC)[reply]
Yes, that is correct. Since this report calculates data over the previous 365 days, we can remove the code that takes care of historical data only after that time. I've set a reminder for myself via W-Ping. Thanks for the quick fix, btw. -MPGuy2824 (talk) 03:11, 11 November 2023 (UTC)[reply]

/Blocked users in user group[edit]

Good evening fellow Wikipedians, so the database report above is no longer updated since October 6 of last year. The not who was updating it, BernsteinBot (talk · contribs), hasn't edited since October 12, 2022. Should we archive the report or get another not to take over the updating? Toadette (Happy Thanksgiving!) 18:34, 22 November 2023 (UTC)[reply]

I suppose I can finally get around to looking into how {{database report}} works (no interest in running a bot ever again after the way mine was treated). The query in the configuration is severely out of date - besides the schema changes, it doesn't cull the extendedconfirmed group, currently at 5867 blocks, and I'm sure it was close to that when BernsteinBot was still running - but that's easy enough to fix. —Cryptic 19:07, 22 November 2023 (UTC)[reply]

If you look at the history of Wikipedia:Database reports/Unused templates (filtered)/1, you can see the size of the report jumping up and down from day to day, starting on 15 November. It should be much more steady. Pages appear on and disappear from the report for no apparent reason. Clues or fixes are welcome. – Jonesey95 (talk) 15:05, 23 November 2023 (UTC)[reply]

Anyone? Pinging Legoktm and 0xDeadbeef, the listed operators of HaleBot, the user that updates this report. – Jonesey95 (talk) 16:32, 28 November 2023 (UTC)[reply]
And now the report has stopped updating. No updates since 2 December. Help? – Jonesey95 (talk) 16:44, 5 December 2023 (UTC)[reply]
Ugh, that's wild. It might be a few days before I can look in depth. I wonder if one of the DB replicas is out of sync with the others...or maybe something changed and our query is just busted now. Legoktm (talk) 07:29, 6 December 2023 (UTC)[reply]
This still isn't working properly. Any idea what the issue is? Gonnym (talk) 09:07, 18 December 2023 (UTC)[reply]
OK so I tracked down phab:T354089, which seems to be that the replica has fallen out of sync with production, causing some weirdness, but there's more to the story, I'm still debugging. Legoktm (talk) 04:45, 29 December 2023 (UTC)[reply]
@Jonesey95, @Gonnym: I've applied a fix to the query logic, I'm not sure if this will fully address the issue but it should surface some more unused templates. Legoktm (talk) 04:53, 29 December 2023 (UTC)[reply]
There are 1,675 templates listed on the report at this writing, which is probably about the right number. We'll see if it fluctuates into the 200–300 range, as it has been doing, or if it stays relatively stable. Thanks for continuing to track down this strange problem. It's challenging to debug a problem when you are not convinced that you have found the actual cause of the problem. – Jonesey95 (talk) 14:13, 29 December 2023 (UTC)[reply]
@Jonesey95, et al: how do the updates over the past few days look - are we OK to call this resolved? Legoktm (talk) 06:32, 3 January 2024 (UTC)[reply]
So far seems good. Thanks! Gonnym (talk) 09:12, 3 January 2024 (UTC)[reply]
Yes, the updates appear to be working correctly. I check them daily. Thanks! – Jonesey95 (talk) 14:57, 3 January 2024 (UTC)[reply]

Wikipedia:Database reports/Uncategorized templates typically updates once a week on Monday. It is now 25 hours overdue. HaleBot's talk page redirects to this page. Pinging Legoktm and 0xDeadbeef, the listed operators. – Jonesey95 (talk) 17:20, 5 December 2023 (UTC)[reply]

Sorry, this is my fault. Should be fixed now and I just kicked off a run. I'll be back online in like 6 hours in case it didn't work to debug further. Legoktm (talk) 21:06, 5 December 2023 (UTC)[reply]
Legoktm, There are numerous weekly reports that haven't updated since November 26/27. --DB1729talk 01:15, 6 December 2023 (UTC)[reply]
Yes, I believe he fixed it for all reports. 0xDeadbeef→∞ (talk to me) 01:21, 6 December 2023 (UTC)[reply]
Great, and so does someone need to "kick off a run" for each of them now? DB1729talk 01:25, 6 December 2023 (UTC)[reply]
I believe when he said "kicked off a run" he meant for all reports. 0xDeadbeef→∞ (talk to me) 01:38, 6 December 2023 (UTC)[reply]
Ok thanks. I can be patient:) I only mentioned it because the one discussed above, Wikipedia:Database reports/Uncategorized templates, updated several hours ago. While the others have not yet updated. DB1729talk 01:47, 6 December 2023 (UTC)[reply]
And some daily reports have not updated since Dec. 2. Hopefully the same fix will have them back on track. Dicklyon (talk) 03:10, 6 December 2023 (UTC)[reply]
There are still some issues I'm debugging, but more reports should be updating now... Legoktm (talk) 06:27, 6 December 2023 (UTC)[reply]
OK, I think all the reports are up to date, except the article streak ones. If anything did not get an update, please let me know and I can look again when I wake up in a few hours. Legoktm (talk) 07:31, 6 December 2023 (UTC)[reply]
Thank you!:) DB1729talk 11:13, 6 December 2023 (UTC)[reply]
@Legoktm - Wikipedia:Database reports/Orphans with incoming links has stopped running, and last ran 03:00, 9 December 2023. JoeNMLC (talk) 19:53, 16 December 2023 (UTC)[reply]
JoeNMLC, it's up to date now. That report's done by DannyS712 bot, run by DannyS712. If it happens again, he's probably the best person to contact. BlackcurrantTea (talk) 08:54, 17 December 2023 (UTC)[reply]

Wikipedia:Database reports/Unused templates (filtered) update related to Module:Pagetype[edit]

A recent change to Module:Pagetype has caused some pages to register a self transclusion (but they are still unused). Can Wikipedia:Database reports/Unused templates (filtered) be modified to now check if the template's only transclusion is itself and if so keep it on the report? Gonnym (talk) 12:39, 6 February 2024 (UTC)[reply]

Looking for a template with no transclusions is much easier than just looking for one that happens to be a self-transclusion...I'm thinking of how to restructure the SQL query to accommodate this, if anyone wants to propose a better query that handles this, please do. Legoktm (talk) 04:02, 16 February 2024 (UTC)[reply]
I don't see why it would need it? Just add a clause to the templatelinks join; you already have the template page's page_id. quarry:query/80586. Also note the backslashes in the LIKEs; underscore is a metacharacter. —Cryptic 06:17, 22 February 2024 (UTC)[reply]
And quarry:query/80588 lets you get rid of the postprocessing and all those secondary queries. —Cryptic 07:01, 22 February 2024 (UTC)[reply]
@Cryptic: awesome, I'm glad you're better at SQL than me :) Would you like to submit a PR with your improved query? Otherwise I'll get to it shortly. Legoktm (talk) 05:11, 23 February 2024 (UTC)[reply]
I'd rather not deal with another identity at github. —Cryptic 05:53, 23 February 2024 (UTC)[reply]
Can we switch that page to use {{Database report}}? I set up a sample page at User:Jonesey95/self-transcluded-templates and it seems to work OK. I probably don't understand the larger implications though. – Jonesey95 (talk) 06:25, 23 February 2024 (UTC)[reply]
We have another issue which could be related to this change. Template:Anarchism US shows a transclusion at its talk page but it's not used there. So the updated code should also check if the self transclusion is from its own talk page. Gonnym (talk) 07:49, 13 March 2024 (UTC)[reply]
That's a strange one. I'm guessing that one of the "new pages" lists causes this check somehow. I wonder if the problem will resolve itself after the new template page (created March 11) falls off of the list eight days after its creation. – Jonesey95 (talk) 16:29, 14 March 2024 (UTC)[reply]

HaleBot has not edited for a couple of days[edit]

I'm not panicking yet, but HaleBot has not edited for a couple of days. Over 48 hours, if my math is right. It averages about 45 edits per day, so a two-day break is unusual. – Jonesey95 (talk) 05:01, 22 February 2024 (UTC)[reply]

See T358175. It's trivial to restart, but I've left it in a broken state in case it makes it easier for Toolforge admins to diagnose the underlying root cause. Legoktm (talk) 05:23, 22 February 2024 (UTC)[reply]

Polluted categories[edit]

I wanted to ask if it's possible to generate an earlier-than-usual update on a report. I hadn't personally done a runthrough on Wikipedia:Database reports/Polluted categories in about a month or two while assuming that other people were staying on top of it, but it turns out they weren't — so when I went back to it this morning there were 1,000 categories on it, which is its generation limit, and that limit had only gotten it to the letter P, meaning that there are potentially dozens or hundreds more categories hiding on the other side of the wall.

So I've trudged my way through cleaning up what was there (pity me), but wanted to ask if it's possible to run an early update to catch the post-1,000 stuff instead of having to wait three more days for the regular weekly update. Bearcat (talk) 16:32, 26 March 2024 (UTC)[reply]

Weekly potential U5s database report not updated for nearly a year[edit]

This seems to be broken somehow.

"Potential U5s; data as of 06:56, 25 May 2023 (UTC). This report is updated every 7 days" Wikipedia:Database reports/Potential U5s/1 Flounder fillet (talk) 19:28, 14 April 2024 (UTC)[reply]

I'm looking at the query and I don't see how it ever worked. (Besides doing things very inefficiently, it can't see user pages created after late July 2018 - intentionally, though I can't fathom why - nor users who have any deleted edits, which is probably accidental.) I'll see if I can't come up with something that does what I think it was trying to. —Cryptic 19:50, 14 April 2024 (UTC)[reply]
(quarry:query/81990 if you want some pages to look at in the meantime. It currently sorts alphabetically and cuts off in the B's, which is... not ideal.) —Cryptic 20:19, 14 April 2024 (UTC)[reply]
Now sorting by whether it's a redirect first then by page length, which, while not as good as user creation time, is more likely to be useful than alphabetical. (Sorting by redirect is needed to make it reasonably fast, and there's only a handful of user page redirects that meet the other criteria. And they're likely all problematic anyway.) @Legoktm: This query can be dropped into /dbreps2/src/enwiki/webhostpages.rs without other changes, or I can take over this report with SDZeroBot's {{database report}} if you prefer. —Cryptic 20:40, 14 April 2024 (UTC)[reply]
(Also pinging User:0xDeadbeef, since Lego hasn't edited in a month and I'd, uh, forgotten you also maintain HaleBot. Oops.) —Cryptic 21:00, 14 April 2024 (UTC)[reply]
I have been quite busy these weeks. Feel free to open a GitHub issue/pull request, or ping me here again on the weekends to nudge me.. 0xDeadbeef→∞ (talk to me) 13:57, 18 April 2024 (UTC)[reply]
Is there a reason to prefer pinging someone here with instructions over a GitHub pull request? --MZMcBride (talk) 08:10, 15 April 2024 (UTC)[reply]
Based on what I encountered, the deleted edit thing was probably supposed to make it not list user pages of users whose only contributions are creations of deleted pages and who have warnings and stuff on their User: instead of User talk: for some reason (example: User:ISpeakTruth). Doesn't seem to be that many of those, and these situations can no longer occur, so it's probably not needed. Flounder fillet (talk) 21:40, 14 April 2024 (UTC)[reply]
That's possible, but I really don't think it is. The way it was programmed makes it look like it was accidental - it checks the current total of non-deleted edits in user: and user talk: and compares them against the user_editcount field. If it were intentional, directly checking for deleted edits in the archive table would be a more natural way to do it, more accurate, (much) faster, and could be made to only exclude users with deleted edits in non-user/usertalk namespaces besides. —Cryptic 21:52, 14 April 2024 (UTC)[reply]
I wrote some versions of this query. Of course the query worked previously, the proof is in the page history. My off-hand guess for why it broke is that some query planner got worse or some index got changed and the query is now taking too much time or CPU to generate, but who knows. Sometimes it's a database field that's been renamed, sometimes it's something else entirely.
Cryptic, you should have access to look at the logs yourself, but if you don't for some reason, that seems like the real issue here. I don't know why you'd need to ping Legoktm and others, that seem very silly.
This query made some heuristic choices for finding these types of potentially problematic user pages. These choices obviously have trade-offs. In particular, I happened to be focused on older and longer pages, which is why I added restrictions on page.page_len and page.page_id. I personally also wanted to only find cases where the user had only edited in two specific namespaces, at least to start. However, there are lots of cases that won't be included as a result of making these choices. If a user made a single spam edit to a real article as well as spamming their user page, they wouldn't be included in this report as-written. In cases where user.user_editcount is wrong, this report could omit some pages. In cases where the page length is 498 bytes and still promotional spam that should be deleted, it wouldn't be included here. And so on.
Improvements to this and any other database report are always welcome. I thought the archive table was no longer available in database replicas, but I may be mistaken. Let's see you all do better. Please. :-) --MZMcBride (talk) 07:48, 15 April 2024 (UTC)[reply]
Quarry says my show tables; query against enwiki_p has been queued for 21 minutes now, but I was able to run this query against a database in a different cluster and archive and friends are still available. I guess I was thinking of something else. I'm doubtful it will be efficient or quick to use the archive_userindex table or similar, but I'm very interested to see what you all come up with to uncover more pages to be reviewed and potentially deleted. --MZMcBride (talk) 08:08, 15 April 2024 (UTC)[reply]