A while back, in another thread, I made this offer:
I've now done a comprehensive, if somewhat simplistic, audit of the Wiki page titles. Here's what I found, and where I need some help moving forward.
The numbers
Out of 788 Wiki pages audited:
- 204 pages (26%) have titles that match the standard format
- 223 pages (28%) are missing any title at all
- 361 pages (46%) have titles that don't match the standard format and will need manual review
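As a quick check, the three groups together cover all 788 audited pages, and the rounded percentages match the ones above:

```python
# Counts reported by the audit
counts = {"OK": 204, "MISSING": 223, "ERR": 361}
total = sum(counts.values())  # 788

for status, n in counts.items():
    # Round to the nearest whole percent, as in the summary above
    print(f"{status}: {n} ({round(100 * n / total)}%)")
```

This prints 26%, 28% and 46% for OK, MISSING and ERR respectively.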
The standard I was working from is that a file like .../action/Comment should have a title like "Comment Action", derived from the filename and its namespace.
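Here is a minimal Python sketch of that rule. It assumes the page URL ends in `<namespace>/<PageName>` and, hypothetically, that underscores or percent-escapes in the page name stand for spaces. `standard_title` is an illustrative name, not something taken from the actual audit scripts.

```python
from urllib.parse import unquote, urlparse

def standard_title(url: str) -> str:
    """Build the expected title from the last two path segments of a page URL.

    Assumption: the path ends in <namespace>/<PageName>, e.g. .../action/Comment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    if len(segments) < 2:
        raise ValueError(f"expected .../<namespace>/<page> in {url!r}")
    namespace, page = segments[-2], segments[-1]
    # Assumed: underscores and %20 in the filename both mean spaces
    page_words = unquote(page).replace("_", " ")
    return f"{page_words} {namespace.capitalize()}"

print(standard_title("https://wiki.example.com/action/Comment"))  # Comment Action
```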
It turns out that the largest group (46%) consists of pages whose titles exist but don't follow that standard in one way or another. Each of those will need a human decision, perhaps made in bulk: should the title conform to the standard, is the existing title fine as it is, or does the difference from the "standard" reveal that the current title needs some other tweak? Since these pages at least have a title, they are a lower priority for now.
As for the missing titles, I haven't fixed any yet. I expected maybe 20 or 30 and thought I'd just work through them. 223 is a different proposition, and reviewing 361 pages is a project that needs input from others, not a simple afternoon's exercise.
How I did the audit
The audit runs as two scripts. The first uses curl to run a Wiki search for all pages in a given namespace and generates a list of URLs from the results page. The second loads each URL in a given list, one at a time via curl, and greps for the page's title heading (the <H1> tag). It derives the standard title from the filename and compares that with the H1 text it found.
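The comparison step of the second script might look something like this in Python. This sketch swaps Python's standard `html.parser` in for grep, works on HTML that has already been downloaded (for example with curl), and assumes the first `<h1>` on a page is its title. `FirstH1`, `find_h1` and `classify` are hypothetical names.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple

class FirstH1(HTMLParser):
    """Collect the text of the first <h1> element (HTMLParser lowercases tag names)."""

    def __init__(self) -> None:
        super().__init__()
        self.inside = False   # currently within the first <h1>
        self.done = False     # first <h1> has already closed
        self.parts: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "h1" and not self.done:
            self.inside = True

    def handle_endtag(self, tag):
        if tag == "h1" and self.inside:
            self.inside = False
            self.done = True

    def handle_data(self, data):
        if self.inside:
            self.parts.append(data)

def find_h1(html: str) -> Optional[str]:
    """Return the first <h1> text with whitespace collapsed, or None if absent or empty."""
    parser = FirstH1()
    parser.feed(html)
    parser.close()
    text = " ".join("".join(parser.parts).split())
    return text or None

def classify(html: str, expected: str) -> Tuple[str, Optional[str]]:
    """Return (status, found_title) using the OK / MISSING / ERR scheme."""
    found = find_h1(html)
    if found is None:
        return "MISSING", None
    return ("OK" if found == expected else "ERR"), found

print(classify("<h1>Comment Action</h1>", "Comment Action"))  # ('OK', 'Comment Action')
```

A real parser tolerates markup inside the heading (links, entities, line breaks), which a line-based grep can miss; the trade-off is an extra dependency on Python instead of plain shell tools.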
The output is a Markdown table, one row per URL. The first column is a clickable link to the page. The second column shows the status: OK, MISSING, or ERR (with the title actually found shown for ERR entries). For MISSING and ERR rows, the third column contains a correctly formatted standard title, ready to paste directly into the Wiki editor.
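A sketch of that table-rendering step, assuming each audit result is a hypothetical `(url, status, found_title, standard_title)` tuple. Pipe characters in titles are escaped so they can't split a table cell.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical row shape: (url, status, found_title, standard_title)
Row = Tuple[str, str, Optional[str], str]

def md_escape(text: str) -> str:
    """Escape pipes so a title cannot split a Markdown table cell."""
    return text.replace("|", r"\|")

def markdown_table(rows: Iterable[Row]) -> str:
    lines = ["| Page | Status | Standard title |", "| --- | --- | --- |"]
    for url, status, found, standard in rows:
        # ERR rows show the title that was actually found on the page
        status_cell = f"ERR: {md_escape(found or '')}" if status == "ERR" else status
        # Only MISSING and ERR rows get a ready-to-paste standard title
        title_cell = "" if status == "OK" else md_escape(standard)
        lines.append(f"| [{url}]({url}) | {status_cell} | {title_cell} |")
    return "\n".join(lines)

print(markdown_table([
    ("https://wiki.example.com/action/Comment", "OK", "Comment Action", "Comment Action"),
    ("https://wiki.example.com/action/Pause", "MISSING", None, "Pause Action"),
]))
```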
I'll be sharing both scripts in this thread in later posts. They're well commented and would be a reasonable starting point for anyone doing similar audits here or elsewhere.
What I'm asking for
For the MISSING pages, the workflow is straightforward enough that it might be worth building a KM macro to handle the repetitive steps: copy the formatted new title, open the link to the Wiki page, open the Wiki editor, paste in the title, and save. I haven't built that yet. If someone could volunteer to build the macro, and someone (else?) could volunteer to use it to fill in even part of the 223 missing titles, that would help complete this first pass pretty quickly.
The ERR pages are a different matter. Those need someone to look at each one and decide whether the existing title is fine, needs a tweak, or should be replaced with the standard format. That's judgment work, and I'd welcome thoughts on how to organize it. I expect some large chunks, possibly whole namespaces, can be identified as "just fine as is".
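One way to spot those chunks might be to tally statuses per namespace and start with the namespaces that have the most ERR rows. This sketch assumes the namespace is the second-to-last path segment of the URL, as in .../action/Comment; `namespace_of` and `err_summary` are hypothetical helpers, not part of the audit scripts.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple
from urllib.parse import urlparse

def namespace_of(url: str) -> str:
    """Second-to-last path segment, e.g. 'action' for .../action/Comment (assumed layout)."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-2] if len(segments) >= 2 else ""

def err_summary(rows: Iterable[Tuple[str, str]]) -> Dict[str, Counter]:
    """Count statuses per namespace, namespaces with the most ERR rows first."""
    by_ns: Dict[str, Counter] = defaultdict(Counter)
    for url, status in rows:
        by_ns[namespace_of(url)][status] += 1
    return dict(sorted(by_ns.items(), key=lambda item: -item[1]["ERR"]))

summary = err_summary([
    ("https://wiki.example.com/action/Comment", "ERR"),
    ("https://wiki.example.com/action/Pause", "ERR"),
    ("https://wiki.example.com/token/Date", "OK"),
])
for ns, counts in summary.items():
    print(f"{ns}: {counts['ERR']} ERR / {sum(counts.values())} total")
```

A namespace where nearly every page is ERR, and the found titles share one consistent pattern, is a good candidate for a single bulk decision.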
I will share the full tables here so that you can see the ERR pages in the context of the other page titles. I'd rather share them somewhere they can be jointly edited, and now that @peternlewis has helped me make Wiki posts here (thanks!), they will be. That way the status can be updated as the MISSING entries are fixed and as the ERR entries are resolved.
Does the Wiki support hosted files that editors can update? If not, I'm thinking of a shared Google Sheet with edit access for anyone with the link. Suggestions are welcome.
[Update:] As discussed in the next few posts, the Forum can convert a post into a Wiki that anyone can edit. That should make it easy to keep track of what's been edited and what is still to do.
