Script to pull pub date and author(s) from URL in front window of Safari?

An advantage of querySelectorAll is that it returns a NodeList of matches. Where there are no matches the list is simply empty, and mapping a function over an empty array just returns another empty array, so error handling doesn't need to arise.

I suppose it is a matter of preference. I prefer a positive indicator that the data was NOT found, vs an empty result which may make the user wonder if anything happened.
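
The two preferences can be combined. Below is a minimal sketch, not either poster's actual code: the helper names `metaPairs` and `report` and the wording of the "not found" message are made up for illustration, and plain objects stand in for the DOM nodes so the snippet runs outside a browser.

```javascript
// metaPairs :: ArrayLike {name, content} -> [{key, value}]
// Works on a NodeList or on any array-like of objects with name/content.
const metaPairs = nodes =>
    Array.from(nodes).map(x => ({ key: x.name, value: x.content }));

// report :: [{key, value}] -> String
// An empty list becomes an explicit message instead of a silent "[]".
const report = pairs =>
    pairs.length > 0
        ? JSON.stringify(pairs, null, 2)
        : 'No author or publication-date meta tags found.';

// In Safari the argument would be something like:
//   document.querySelectorAll('meta[name*="author"], meta[name*="published"]')
// Plain objects are used here as stand-ins (an assumption for the demo).
const sample = [{ name: 'author', content: 'Annie Gasparro' }];
console.log(report(metaPairs(sample)));
console.log(report(metaPairs([])));
```

The mapping still needs no error handling; only the final reporting step decides how an empty result is shown to the user.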

Another variant (probably more useful if you want to do some JS post-processing) might be something like:

(() => {
    'use strict';

    // show :: a -> String
    const show = x => JSON.stringify(x, null, 2);

    return show(
        Array.from(document.querySelectorAll(
            'meta[name*="author"], meta[name*="published"]'
        ))
        .map(x => ({
            key: x.name,
            value: x.content
        }))
    );
})();

Hey Chuck,

Just to add some more mud to the mix I thought I’d take the opportunity to fool with the JSON in the given WSJ page.

Parsed the JSON string out with JavaScript.

Parsed the JSON itself with AppleScriptObjC.

Run the script from the Applescript Editor to see it work.

-Chris

------------------------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2017/04/20 18:00
# dMod: 2017/04/20 18:30 
# Appl: Safari
# Task: Parse Wall Street Journal Page for JSON elements.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @ASObjC, @Safari, @JavaScript, @JSON, @Parse, @WSJ
# URLs: https://www.wsj.com/articles/how-to-be-the-best-deputy-when-second-best-is-best-1492529374
------------------------------------------------------------------------------
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
------------------------------------------------------------------------------

set jsCMD to "
var docSrc = document.body.parentNode.outerHTML;
var regEx = /<script type=\"application\\/ld\\+json[^•]+?<\\/script>/ig;
var reMatchArray = docSrc.match(regEx);
reMatchArray[0];
"
set jsonStr to doJavaScriptInSafari(jsCMD)
set AppleScript's text item delimiters to linefeed
set jsonStr to (paragraphs 2 thru -2 of jsonStr) as text
set nsDict to its convertJSONToDictionary:jsonStr

set thePublisher to (nsDict's valueForKeyPath:"publisher.name")
set thePublisher to item 1 of ((current application's NSArray's arrayWithObject:thePublisher) as list)

set theHeadline to (nsDict's valueForKeyPath:"headline")
set theHeadline to item 1 of ((current application's NSArray's arrayWithObject:theHeadline) as list)

set theAuthor to (nsDict's valueForKeyPath:"author.name")
set theAuthor to item 1 of ((current application's NSArray's arrayWithObject:theAuthor) as list)

set datePublished to (nsDict's valueForKeyPath:"datePublished")
set datePublished to item 1 of ((current application's NSArray's arrayWithObject:datePublished) as list)

{thePublisher, theHeadline, theAuthor, datePublished}

------------------------------------------------------------------------------
--» HANDLERS
------------------------------------------------------------------------------
on convertJSONToDictionary:jsonString
   set aString to current application's NSString's stringWithString:jsonString
   set theData to aString's dataUsingEncoding:(current application's NSUTF8StringEncoding)
   set {theDict, theError} to current application's NSJSONSerialization's JSONObjectWithData:theData options:0 |error|:(reference)
   if theDict is missing value then error (theError's localizedDescription() as text) number -10000
   return theDict
end convertJSONToDictionary:
------------------------------------------------------------------------------
on doJavaScriptInSafari(jsCMD)
   try
      tell application "Safari" to do JavaScript jsCMD in front document
   on error e
      error "Error in handler doJavaScriptInSafari() of library NLb!" & return & return & e
   end try
end doJavaScriptInSafari
------------------------------------------------------------------------------
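
As a possible alternative to trimming the first and last paragraphs of the matched text in AppleScript, the JSON can be captured and parsed on the JavaScript side. This is only a sketch under assumptions: the function names `extractJsonLd` and `citationFields` are hypothetical, the sample HTML is invented, and the field layout (`publisher.name`, `author.name`, and so on) is simply the one the script above reads from the WSJ page.

```javascript
// extractJsonLd :: String -> [Object]
// Returns every application/ld+json block in the HTML that parses cleanly.
const extractJsonLd = html => {
    const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
    const found = [];
    let m;
    while ((m = re.exec(html)) !== null) {
        try {
            found.push(JSON.parse(m[1]));
        } catch (e) {
            // Skip blocks that are not valid JSON rather than failing.
        }
    }
    return found;
};

// citationFields :: Object -> Object
// Missing keys come back as undefined instead of throwing.
const citationFields = ld => ({
    publisher: ld.publisher && ld.publisher.name,
    headline: ld.headline,
    author: ld.author && ld.author.name,
    datePublished: ld.datePublished
});

// Invented sample page; in Safari the input would be
// document.documentElement.outerHTML.
const sampleHtml = `<html><head>
<script type="application/ld+json">
{"headline": "Sample Headline",
 "author": {"name": "Jane Doe"},
 "publisher": {"name": "Example Times"},
 "datePublished": "2017-04-18T14:00:00.000Z"}
</script></head><body></body></html>`;

console.log(citationFields(extractJsonLd(sampleHtml)[0]));
```

Capturing only the text between the script tags avoids depending on where the line breaks fall in the page source.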

Chris, thanks!

For this WSJ article, Kroger Seeks to Cut Costs With Voluntary Retirement Plan - WSJ, https://www.wsj.com/articles/kroger-seeks-to-cut-costs-with-voluntary-retirement-plan-1481840238?tesla=y, here’s what I get with your script.

{“Wall Street Journal”, “Kroger Seeks to Cut Costs With Voluntary Retirement Plan”, “Annie Gasparro”, “2016-12-15T22:17:00.000Z”}

So, it works!

But, I couldn’t get it to run in script editor. I received this message in a dialogue box:

“The document “Untitled.scpt” could not be autosaved. C and Objective-C pointers cannot be saved in scripts. Compiling the script will reset property values and may resolve this issue.”

Compiling did NOT resolve the issue.

Also could NOT run it using “Execute AppleScript” in KM.

Got this result:

“/var/folders/t4/prgynlv57zl1qw5s9w14lfm40000gn/T/Keyboard-Maestro-Script-4B62C60B-8EFC-4313-B15F-78B551838455:2327:2368: execution error: The data couldn’t be read because it isn’t in the correct format. (-10000)”

That said, let me share - briefly - what my KM macro can do all cobbled together. I trigger the macro and get this, in this order, formatted like this in 4 seconds.

Annie Gasparro, “Kroger Seeks to Cut Costs With Voluntary Retirement Plan,” Wall Street Journal, 2016-12-15, accessed April 20, 2017, https://www.wsj.com/articles/kroger-seeks-to-cut-costs-with-voluntary-retirement-plan-1481840238?tesla=y.

By hand, I shorten the author’s first name to an initial with a period (A. Gasparro). Also by hand, I turn the YYYY-MM-DD date into a long date, December 15, 2016. (I’m still figuring out how to convert the ISO 8601 date format into long format; for now, I use KM to clean up the date by deleting unneeded information.) Final form is:

A. Gasparro, “Kroger Seeks to Cut Costs With Voluntary Retirement Plan,” Wall Street Journal, December 15, 2016, accessed April 20, 2017, https://www.wsj.com/articles/kroger-seeks-to-cut-costs-with-voluntary-retirement-plan-1481840238?tesla=y.
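
Both manual steps can be scripted. Here is a minimal JavaScript sketch, assuming a single author given as "First Last" and an ISO 8601 timestamp like the one returned above; the helper names `longDate` and `initialed` are made up for this example.

```javascript
// longDate :: String -> String
// Formats an ISO 8601 timestamp as a long US-style date, read in UTC.
const longDate = iso =>
    new Date(iso).toLocaleDateString('en-US', {
        year: 'numeric', month: 'long', day: 'numeric', timeZone: 'UTC'
    });

// initialed :: String -> String
// Replaces the first name with its initial; single-word names are unchanged.
const initialed = fullName => {
    const parts = fullName.trim().split(/\s+/);
    return parts.length < 2
        ? fullName.trim()
        : `${parts[0][0]}. ${parts.slice(1).join(' ')}`;
};

console.log(longDate('2016-12-15T22:17:00.000Z')); // December 15, 2016
console.log(initialed('Annie Gasparro'));          // A. Gasparro
```

Multiple authors or names with middle initials would need extra handling that this sketch does not attempt.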

I’m on deadline, so it will be a few weeks before I can post my cobbled together KM Macro, but I’m surprised and gratified at how much interest this has generated. Way beyond my skill set to grab the date and the author from web pages, so thanks to you, JMichaelTX and ComplexPoint for digging into this.

On writing breaks, I’m searching the forum and compiling my to-dos, reading lists, etc. I started hacking hex code in WordPerfect on a CP/M machine in graduate school in 1983, and have had the power user bug to do stuff like this since. Switched to Mac 5 years ago (ah the wasted years!), and immediately found KM. By far, my most used, beneficial and loved tool.

Thanks!

Hey Chuck,

4 seconds runtime plus manual intervention doesn't really cut it in an automated process for this sort of thing.

I've got the citation working the way you want.

The only thing I'm having trouble with is that Zulu time is getting converted to local.

See if it works for you, and let me know if you want me to fix the time issue.

-Chris


WSJ ⇢ Create Article Reference v1.01.kmmacros (8.8 KB)
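
To illustrate why Zulu-to-local conversion matters for a citation date, here is a small JavaScript sketch (not the macro's AppleScriptObjC code; the helper name `dateIn` and the choice of Tokyo as the example zone are assumptions). The same timestamp lands on different calendar days depending on the time zone used for formatting.

```javascript
// dateIn :: String -> String -> String
// Formats an ISO 8601 timestamp as a long date in the given IANA time zone.
const dateIn = (iso, timeZone) =>
    new Date(iso).toLocaleDateString('en-US', {
        year: 'numeric', month: 'long', day: 'numeric', timeZone
    });

const stamp = '2016-12-15T22:17:00.000Z';
console.log(dateIn(stamp, 'UTC'));        // December 15, 2016
console.log(dateIn(stamp, 'Asia/Tokyo')); // December 16, 2016
```

Pinning the formatter to UTC keeps the published date as the timestamp states it, whatever the local zone of the machine running the script.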

Chris, tons faster! I’m assuming that’s so b/c it gathers all of the data in one pass, as opposed to my cobbled together method of accessing the browser for 1 bit of the citation, pasting it in the editor, then back to the browser for more, then pasting, wash, rinse, repeat.

Very cool. I can follow AppleScript, but Objective-C, well, I just don’t know it. That said, I can see - I think - the logic of what you’re doing with your script. Elegant!

Date - I’m getting the correct date from WSJ articles, but you’ve coded the date to “y-MM-dd” format. Can you change it to MMMM d, yyyy format for “April 14, 2017,” instead of “2017-04-14?” I looked at your code to see if I could do it, but there’s too much that you’re doing that’s too far above my skill set.

Italics? Is it possible to format Wall Street Journal as Wall Street Journal in italics?

What can I contribute here?

Well, thanks to perusing Dr. Drang’s LeanCrew.com where he blogs extensively about KM and dates, here is a link to a book, Calendrical Calculations, by Dershowitz and Reingold, which is described on Amazon as follows:

A valuable resource for working programmers, as well as a fount of useful algorithmic tools for computer scientists, this new edition of the popular calendars book expands the treatment of the previous edition to new calendar variants: generic cyclical calendars and astronomical lunar calendars as well as the Korean, Vietnamese, Aztec, and Tibetan calendars. The authors frame the calendars of the world in a completely algorithmic form, allowing easy conversion among these calendars and the determination of secular and religious holidays. LISP code for all the algorithms are available on the Web.

Given the number of questions and detailed responses on this forum regarding dates, date formats and date calculations, this might prove to be a good resource for someone.

Thanks Chris!

Hey Chuck,

Sure. Re-download the macro (now v1.01).

I had the date set up that way to begin with, but then I misread your post and changed it back to 2017-04-14.

The v1.01 macro corrects this.

Yes.

Everybody has that problem to start. The learning curve is pretty fierce.

Yes.

This can be done with AppleScriptObjC, although I don't know how.

The easiest way is to take the text – reformat it as HTML – and then run it through the textutil command-line-tool to convert to RTF on the clipboard.

I took the approach I used with this macro to learn a little more about JSON handling in AppleScriptObjC – and because I figured it likely the WSJ would standardize the JSON citations in all their articles. (The latter remains to be seen but will hopefully prove to be true.)

-Chris

Hey Chuck,

Here’s an example of turning text to HTML to RTF:

------------------------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2017/04/25 19:50
# dMod: 2017/04/25 20:04
# Appl: ASObjC & the Shell
# Task: Convert Text to HTML – then Convert to RTF – Place Result on Clipboard.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @ASObjC, @Convert, @Text, @HTML, @RTF, @Italicize
------------------------------------------------------------------------------
use AppleScript version "2.4"
use framework "Foundation"
use scripting additions
------------------------------------------------------------------------------
set theHTML to text 2 thru -1 of "
<!DOCTYPE html>
<html lang=\"en\">
  <head>
    <meta charset=\"utf-8\" /> 
    <title>
      HTML-to-RTF
    </title>
  </head>
  <body>
    <font face=\"Menlo\" size=\"4\">
      --stub-- 
    </font>
  </body>
</html>
"
set theCitation to "Sue Shellenbarger, \"How to Be the Best Deputy: When Second Best Is Best\", Wall Street Journal, April 18, 2017, accessed April 25, 2017, https://www.wsj.com/articles/how-to-be-the-best-deputy-when-second-best-is-best-1492529374"

set theCitation to its cngStr:"(\", )" intoString:"$1<i>" inString:theCitation
set theCitation to its cngStr:"(<i>.+?)(?=,)" intoString:"$1</i>" inString:theCitation
set theHTML to its cngStr:"--stub--" intoString:theCitation inString:theHTML

set shCMD to "echo " & quoted form of theHTML & " | textutil -format html -convert rtf -inputencoding UTF-8 -stdin -stdout | pbcopy -Prefer rtf"
do shell script shCMD

------------------------------------------------------------------------------
--» HANDLERS
------------------------------------------------------------------------------
on cngStr:findString intoString:replaceString inString:dataString
   set anNSString to current application's NSString's stringWithString:dataString
   set dataString to (anNSString's stringByReplacingOccurrencesOfString:findString withString:replaceString ¬
      options:(current application's NSRegularExpressionSearch) range:{0, length of dataString}) as text
end cngStr:intoString:inString:
------------------------------------------------------------------------------
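
If the citation parts are still available as separate fields, the italic tags can be added while the string is assembled rather than located afterwards with a regular expression. A minimal JavaScript sketch, assuming a field object shaped as below; `escapeHtml`, `citationHtml`, and the field names are hypothetical.

```javascript
// escapeHtml :: String -> String
// Escapes the characters that would otherwise be read as HTML markup.
const escapeHtml = s =>
    s.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// citationHtml :: Object -> String
// Builds the citation as an HTML fragment with the publisher in italics.
const citationHtml = c =>
    `${escapeHtml(c.author)}, "${escapeHtml(c.headline)}," ` +
    `<i>${escapeHtml(c.publisher)}</i>, ${escapeHtml(c.date)}, ` +
    `accessed ${escapeHtml(c.accessed)}, ${escapeHtml(c.url)}.`;

const fragment = citationHtml({
    author: 'Sue Shellenbarger',
    headline: 'How to Be the Best Deputy: When Second Best Is Best',
    publisher: 'Wall Street Journal',
    date: 'April 18, 2017',
    accessed: 'April 25, 2017',
    url: 'https://www.wsj.com/articles/how-to-be-the-best-deputy-when-second-best-is-best-1492529374'
});
console.log(fragment);
```

The resulting fragment could take the place of the --stub-- marker in the HTML template before it is piped through textutil.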

-Chris

Chris, thanks, worked great. And thank you for the AppleScript HTML to RTF example. Again, many thanks! I’m learning a ton.