Need Macro/JavaScript to Extract Contents of Code Block

Hey Rob, if you're up for some more fun, I've been thinking about this, and doing a bit of research. Nothing urgent. What you already did is working well enough for most cases.

If I knew the web DOM better, I'd tackle this myself. But I'm still on a steep learning curve there.

My thought is this: Can we design a script that considers a number of possibilities, and then executes on the one that fits?

From my research, a lot of sites use the <pre> tag to start the code block. So maybe that is a common usage worth coding for.

The interesting thing is that when I use the "Inspect Elements" tool I can almost always spot by eye where the code block starts. If I can do it by eye, surely we (well, really you) can come up with a design to find the code.

Here are the results of my research. I hope they help:

EN forum:
https://discussion.evernote.com/topic/4046-importing-from-apple-mailapps-notes/?p=197286
<pre class="prettyprint prettyprinted">

Veritrope:
http://veritrope.com/code/evernote-list-of-note-titles-exporter/
<div class="codecolorer-container applescript default" style="/* overflow:auto; *//* white-space:nowrap; *//* height:300px; */">

MacScripter:
http://macscripter.net/viewtopic.php?id=44101
<blockquote><div class="incqbox">

StackOverflow.com
http://stackoverflow.com/questions/31759322/applescript-new-file-shortest-way
<pre><code>

http://macosxautomation.com/
http://macosxautomation.com/applescript/linktrigger/index.html
<pre>                           for some
<p class="code">          for others


github.com
https://github.com/RobTrew/tree-tools/blob/master/FoldingText%20scripts/Expand%20collapse/ExpandFT-ToLevelN-008.applescript
<table class="highlight tab-size js-file-line-container" data-tab-size="8">

Well, there's no need to learn any more about the DOM. All you need is a rough sense of XPATH, which is a very rewarding (and actually rather small) search language.

A learning exercise ?

In the Chrome Inspect Element view of the HTML source, you can experiment directly:

  • ⌘F
  • type a very simple XPATH like //a (all links, at any level of nesting in the HTML)
  • see the matches instantly highlighted
  • ( refine and repeat )
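The same experiment can also be run from the Console tab. What follows is only a minimal sketch: `countXPathMatches` is a hypothetical helper name (not part of any library), and all it does is wrap the standard `document.evaluate` call, asking for a snapshot result so the matches can be counted.

```javascript
// XPathResult.ORDERED_NODE_SNAPSHOT_TYPE has the numeric value 7 in the DOM spec.
var ORDERED_NODE_SNAPSHOT_TYPE = 7;

// countXPathMatches is a hypothetical helper name used for illustration.
// doc:   a Document (pass `document` when running in the browser console)
// xpath: an XPath expression such as '//a' or '//pre'
function countXPathMatches(doc, xpath) {
  var xr = doc.evaluate(xpath, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  return xr.snapshotLength;
}
```

In the console, `countXPathMatches(document, '//pre')` would report how many `<pre>` elements the current page contains, which is a quick way to check a path before refining it.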

Thanks for all your help, Rob. I clearly need to go do my homework so I can better work on these kinds of problems. I've bought a couple of Kindle JS books, so I need to dig into one of them soon.

Hey JM,

Let's speed that up a bit.

This is only for Safari — other browser support is left as an exercise for the reader.

** I have verified that the script will copy and compile properly, and I even tried the script on itself.

-Chris


set _date to short date string of (current date)

tell application "Safari"
	tell front document
		
		set selectedText to do JavaScript "
(function () {
  var oSeln = window.getSelection(),
    nodeTable = document.evaluate(
      './ancestor-or-self::code',
      oSeln.anchorNode,
      null, 0, null
    ).iterateNext(),
    rngDoc = nodeTable ?
      document.createRange() : null;

  if (nodeTable) {
    oSeln.removeAllRanges();
    rngDoc.selectNode(nodeTable);
    oSeln.addRange(rngDoc);
  }
})();

window.getSelection()+'';

"
		set safariTitle to name
		set safariURL to URL
		
	end tell
	
end tell


set scriptHeader to "(*
====================================================
	TITLE
====================================================

DATE: " & _date & "
AUTH: 
REFR: " & safariTitle & "
    : " & safariURL & "
*)
"

tell application "Script Editor"
	activate
	set newScript to make new document with properties {contents:scriptHeader & return & return & selectedText}
	tell newScript to check syntax
end tell

Hi Chris.

Thanks for the code.

I'm not quite sure what your intent is here. Is your code supposed to work on pages that Rob's JS doesn't work on?

The macro I posted is working fine on the sites where Rob's JS is able to expand the selection. It opens the code up in AppleScript Editor.

Sorry I'm so confused. :dizzy_face:

While we are in mid-August optimisation and tweaking territory, glancing again at my JS code I notice a few things that could be adjusted or tightened up.

  1. If we're only looking for one match, no need to heat the CPU in boreal summer by searching for more. Instead of leaving it to the default XPathResult.ANY_TYPE and using .iterateNext() to collect just the first (if any) match, we can look up the Result Type constants, specify XPathResult.FIRST_ORDERED_NODE_TYPE, and read any match from the .singleNodeValue property (a property, not a method).
  2. XPATH expressions are chains of alternating 'Steps' and 'Filters'. We can broaden out the filter here with as many 'or' operators as we need. The simplest example might be: ./ancestor-or-self::*[self::code or self::pre]
  3. We can include text collection inside the function with return oSeln.toString().

(and then perhaps, drop it into an Execute JS in Safari (and/or Chrome) action, directing the output to the clipboard).

Select and copy code.kmmacros (19.2 KB)

(function () {
  var oSeln = window.getSelection(),
    nodeCode = document.evaluate(
      './ancestor-or-self::*[self::code or self::pre]',
      oSeln.anchorNode,
      null, XPathResult.FIRST_ORDERED_NODE_TYPE, null
    ).singleNodeValue,
    rngDoc = nodeCode ?
      document.createRange() : null;

  if (nodeCode) {
    oSeln.removeAllRanges();
    rngDoc.selectNode(nodeCode);
    oSeln.addRange(rngDoc);
  }
  
  return oSeln.toString();
})();

Hmmm, I don't know about you, but I'm avoiding the hot, humid dog days of August by hunkering down in my cold basement with lots of cold beer on ice. I'm fully optimized on being cool :sunglasses:

But thanks for the update. It looks good as it adds more coverage.
I knew you could do it. :wink:

Same code seems to work in Chrome as well.
Or am I missing something?

I just tried it on the Evernote forum, and it worked well. :+1:
I'll test it on my other candidate sites.

Yes – that kind of code should always work unchanged in either browser.

OK, this is working for:

Fails for:

Sorry I can't be more technical help right now -- but I'm glad to be the gopher, researcher, & tester. :smile:

If you need anything other than coding JS, let me know.

OK, I have updated my macro, which opens the code in AS Editor, with your latest JS:

BRW Open Web Page Code Block in Apple Script Editor.kmmacros (27.6 KB)

The cases really look a bit too diverse for a single XPATH

MacScripter

//blockquote/div/p

Veritrope:

//span[@class='coMULTI']
//div[@id="content"]/*[self::pre or (self::p and @class='code')]

Github is a bit special. You might be better off clicking the RAW button:

(The raw page uses a <pre>)

On the pretty pages, extend-select would capture all the line numbers, so if you really wanted to copy from there you would have to write a slightly different, row-by-row (code cell but not number cell) function. Perhaps something roughly like:

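(function () {
  // Collect the code cells (not the line-number cells), in document order.
  var xrLines = document.evaluate(
      '//td[contains(@class, "blob-code")]',
      document,
      null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null
    ),
    oLine = xrLines.iterateNext(),
    lstLines = [];

  // Gather the text of each code cell, one line per cell.
  while (oLine) {
    lstLines.push(oLine.textContent);
    oLine = xrLines.iterateNext();
  }

  return lstLines.join('\n');
})();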

So, is it possible to build a SWITCH statement that includes these cases?

Yes, I think you should be able to switch/branch on the .URL of the front tab of the active browser …

That’s too specific.

I’d rather switch based on what’s available on the page in question.
Is this possible?

Not sure I’m following you. As far as I can see the XPATH patterns vary by website …

You can know the paths on which particular sites hold their code, but I don’t think there’s any way of travelling in the opposite direction – looking at the myriad branches and pathways of an HTML tree, and intuiting that some locations are holding code.

But I’m probably misunderstanding you : - )

What I’d like to do, if possible, is develop a number of switch cases that capture the majority of popular web sites that offer code snippets.

It would be great that as we (or the user of this macro) identify more cases of popular sites, we can easily add to the JS cases.

Does this make sense?

For the cases where extend-select works, and all that needs to change is the XPATH you could use the pattern of this macro:

http://forum.keyboardmaestro.com/uploads/default/original/2X/5/516deefedadf787efaf3c2436fcabc18ecdd1fd5.kmmacros

to place the XPath which matches a particular site or set of sites in a KM variable, and execute a standard function in the relevant browser, with only the XPath changing.

I think you'd have to do a lookup on the URL of the site you were visiting to see whether it kept code on a known XPATH, or was a member of a group of sites whose code sections were on a particular path.
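A rough sketch of that kind of lookup, under stated assumptions: `SITE_XPATHS` and `xpathForUrl` are hypothetical names, the two paths are the ones suggested earlier in this thread for MacScripter and Veritrope, and matching is done on the bare host name only. Adding a site means adding one line to the table.

```javascript
// Hypothetical lookup table: host name -> XPath for that site's code blocks.
// The paths are the ones suggested earlier in this thread; they are
// assumptions and may need adjusting if a site changes its markup.
var SITE_XPATHS = {
  'macscripter.net': '//blockquote/div/p',
  'veritrope.com': "//span[@class='coMULTI']"
};

// xpathForUrl is a hypothetical helper name.
// Returns the XPath registered for the URL's host, or null if the site is unknown.
function xpathForUrl(url) {
  // Capture the host, skipping the scheme and an optional leading 'www.'.
  var m = /^[a-z]+:\/\/(?:www\.)?([^\/:?#]+)/i.exec(url);
  var host = m ? m[1].toLowerCase() : '';
  return Object.prototype.hasOwnProperty.call(SITE_XPATHS, host)
    ? SITE_XPATHS[host]
    : null;
}
```

The returned string is what would go into the KM variable; a `null` result could fall back to the generic `./ancestor-or-self::*[self::code or self::pre]` path.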

Thanks. I think we're getting closer. :smile:

But I would like to avoid tracking on a URL basis, unless there is no other choice. IOW, I'd like to examine the structure of the current page, and determine if it fits one of a number of "common" setups for displaying code snippets.

I'm sure there will be some sites that will never fit, but hopefully those are in the minority of sites.

So, what do you think? Can we do something like this?

Alas no : - )

Page structure analysis and automatic identification of code would be a very big project.

Another approach, assuming that you have selected (or are hovering over) a node in the code that interests you, would simply be to cycle through a list of XPATHs, reaping a harvest if a hit is found, and reporting [perplexity|novelty] if no harvest is forthcoming.
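One way that cycling might look, as a sketch only: `firstXPathHit` and `CANDIDATE_XPATHS` are hypothetical names, and apart from the first entry (taken from the function earlier in this thread) the candidate paths are assumptions based on the markup surveyed in the first post, not tested against those sites.

```javascript
// XPathResult.FIRST_ORDERED_NODE_TYPE has the numeric value 9 in the DOM spec.
var FIRST_ORDERED_NODE_TYPE = 9;

// Candidate paths, tried in order, relative to the node the selection starts in.
// Illustrative list: only the first entry comes from this thread as-is.
var CANDIDATE_XPATHS = [
  './ancestor-or-self::*[self::code or self::pre]',
  "./ancestor-or-self::p[@class='code']",
  "./ancestor-or-self::div[contains(@class, 'codecolorer-container')]",
  './ancestor-or-self::blockquote'
];

// firstXPathHit is a hypothetical helper name.
// Returns { xpath, node } for the first path that matches, or null if none do.
function firstXPathHit(doc, contextNode, xpaths) {
  for (var i = 0; i < xpaths.length; i++) {
    var node = doc.evaluate(
      xpaths[i], contextNode, null, FIRST_ORDERED_NODE_TYPE, null
    ).singleNodeValue;
    if (node) {
      return { xpath: xpaths[i], node: node };
    }
  }
  return null;
}
```

In the browser this would be called as `firstXPathHit(document, window.getSelection().anchorNode, CANDIDATE_XPATHS)`. A hit's `node` can be handed to `Range.selectNode` as in the extend-select function above, and a `null` return is the "no harvest" case to report.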