Need Macro/JavaScript to Extract Contents of Code Block

Since I’m not yet on Yosemite, and don’t have a JS development setup yet, could some kind person (hint, hint @ComplexPoint :wink: ) help me out with what I think is a simple JS function to extract the contents of a code block on a web page?

By code block I mean like the one used in this forum, and many other web pages. Short code blocks aren’t a problem to copy, but long code blocks that scroll are more of a challenge.

Example:

    repeat with currentNotebook in EVNotebooks
        set currentNotebookName to (the name of currentNotebook)
        copy currentNotebookName to the end of listOfNotebooks
    end repeat

For a real example of a target page, see:
Evernote Note Title Exporter

Many, many, TIA.

Well, that depends on the discipline within which the page is written – less predictable than one might hope, alas.

This site happens to be rather standards-compliant, and simply wraps code in a <code> block. That means that if we click somewhere in the code, we should then be able to use an extend select macro like this:

Select whole code block on web page.kmmacros (20.8 KB)

But nothing guarantees that other sites will use <code> blocks to wrap what they present as code.

If you open the veritrope site which you link to above in Chrome, and ctrl-click somewhere in that syntax coloured block, choosing Inspect Element from the drop down menu, you will find that it is simply a lot of formatting markup, without a code block in sight.

This macro won't help in that kind of case, and you would find yourself having to write a different macro for every site that adopted its own particular approach …

Good luck !

Wow! Many thanks for the quick response, Rob. As always, you are outstanding.

Thanks for the tip on using the "Inspect Element" tool. Very cool!
I found that I could change how the page is displayed just by unchecking the boxes in the Style column:

This allowed me to turn off the scrolling and turn on word-wrapping, making it much easier to select/copy.
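
The same change could probably be scripted instead of toggled by hand. A minimal sketch, assuming the scrolling comes from the three properties the Inspector lists for this block (overflow, white-space, height). The helper name `unscrollBlock` and the replacement values are illustrative choices, not anything the site prescribes:

```javascript
// Hypothetical helper: relax the styles that make a code container scroll.
function unscrollBlock(el) {
  el.style.overflow = 'visible';    // no scroll bars
  el.style.whiteSpace = 'pre-wrap'; // keep line breaks, allow wrapping
  el.style.height = 'auto';         // grow to fit the content
  return el;
}

// In a browser console (class name as seen on the Veritrope page):
// unscrollBlock(document.querySelector('div.codecolorer-container'));
```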

The Veritrope site seems to use this for code blocks:

<div class="codecolorer-container applescript default" style="/* overflow:auto; *//* white-space:nowrap; *//* height:300px; */">

How hard would it be to use that to extract the code?
I'm wondering if it is reasonable to design a macro where the HTML tags denoting start of "code block" could be input/changed?

Sorry, I haven't looked at your macro/code yet. Should have done that first. Going to do that right now. :blush:

Rob, thanks again for your help.
I can make good use of what you've already provided.
Please don't put any more time on this.

I feel like an idiot. I completely missed this button/link on the Veritrope site, which opens the code in the AS editor: :dizzy_face:

Rob, I've added a wrapper around your JS code to open the Code Block in Apple Script Editor. Thanks again.

BRW Open Web Page Code Block in Apple Script Editor.kmmacros (27.6 KB)

Hey Rob, if you’re up for some more fun, I’ve been thinking about this, and doing a bit of research. Nothing urgent. What you already did is working well enough for most cases.

If I knew the web DOM better, I’d tackle this myself. But I’m still on a steep learning curve there.

My thought is this: Can we design a script that considers a number of possibilities, and then executes on the one that fits?

From my research, a lot of sites use the <pre> tag to start the code block. So maybe that is a common usage worth coding for.

The interesting thing is that when I use the “Inspect Elements” tool I can almost always spot by eye where the code block starts. If I can do it by eye, surely we (well, really you) can come up with a design to find the code.

Here’s the results of my research. I hope it helps:

EN forum:
https://discussion.evernote.com/topic/4046-importing-from-apple-mailapps-notes/?p=197286
<pre class="prettyprint prettyprinted">

Veritrope:
http://veritrope.com/code/evernote-list-of-note-titles-exporter/
<div class="codecolorer-container applescript default" style="/* overflow:auto; *//* white-space:nowrap; *//* height:300px; */">

MacScripter:
http://macscripter.net/viewtopic.php?id=44101
<blockquote><div class="incqbox">

StackOverflow.com
http://stackoverflow.com/questions/31759322/applescript-new-file-shortest-way
<pre><code>

http://macosxautomation.com/
http://macosxautomation.com/applescript/linktrigger/index.html
<pre>                           for some
<p class="code">          for others


github.com
https://github.com/RobTrew/tree-tools/blob/master/FoldingText%20scripts/Expand%20collapse/ExpandFT-ToLevelN-008.applescript
<table class="highlight tab-size js-file-line-container" data-tab-size="8">
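
One way to read the list above as a script is to try each container in turn and take the first one that exists on the page. This is only a rough sketch: the selector strings are my translation of the tags listed above into CSS selectors, and the names `CODE_SELECTORS` and `firstCodeBlockText` are made up for illustration.

```javascript
// Candidate selectors, more specific ones first, derived from the list above.
var CODE_SELECTORS = [
  'pre.prettyprint',           // Evernote forum
  'div.codecolorer-container', // Veritrope
  'blockquote > div.incqbox',  // MacScripter
  'pre > code',                // Stack Overflow
  'p.code',                    // macosxautomation.com (some pages)
  'table.highlight',           // GitHub
  'pre',                       // generic fallbacks
  'code'
];

// Return the text of the first element matching any candidate, or null.
// `doc` is the page's document object.
function firstCodeBlockText(doc, selectors) {
  for (var i = 0; i < selectors.length; i++) {
    var node = doc.querySelector(selectors[i]);
    if (node) {
      return node.textContent;
    }
  }
  return null;
}

// In a browser: firstCodeBlockText(document, CODE_SELECTORS);
```

Note that this picks the first match on the page, not the block you clicked in, so a page with several code blocks would still need the click-then-extend approach.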

Well, no need to learn any more about the DOM, all you need to get a rough sense of is XPATH, which is a very rewarding (and actually rather small) search language.

A learning exercise ?

In the Chrome Inspect Element view of the HTML source, you can experiment directly:

  • ⌘F
  • type a very simple XPATH like //a (all links, at any level of nesting in the HTML)
  • see the matches instantly highlighted
  • ( refine and repeat )
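
The same experiments can also be run from the JavaScript console (Chrome's console offers a built-in `$x('//a')` shortcut for this). A minimal sketch of a reusable equivalent, assuming a helper name `xpathAll` that is purely illustrative:

```javascript
// Hypothetical helper: collect every node matching an XPath as an array.
function xpathAll(doc, expr) {
  // Same value as XPathResult.ORDERED_NODE_SNAPSHOT_TYPE in a browser.
  var ORDERED_NODE_SNAPSHOT_TYPE = 7;
  var snapshot = doc.evaluate(expr, doc, null, ORDERED_NODE_SNAPSHOT_TYPE, null);
  var nodes = [];
  for (var i = 0; i < snapshot.snapshotLength; i++) {
    nodes.push(snapshot.snapshotItem(i));
  }
  return nodes;
}

// In a browser console:
// xpathAll(document, '//a').length;     // number of links on the page
// xpathAll(document, '//pre | //code'); // candidate code containers
```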

Thanks for all your help, Rob. I clearly need to go do my homework so I can better work these kinds of problems. I’ve bought a couple of Kindle JS books, so I need to dig into one of them soon.

Hey JM,

Let’s speed that up a bit.

This is only for Safari — other browser support is left as an exercise for the reader.

** I have verified that the script will copy and compile properly, and I even tried the script on itself.

-Chris


set _date to short date string of (current date)

tell application "Safari"
	tell front document
		
		set selectedText to do JavaScript "
(function () {
  var oSeln = window.getSelection(),
    nodeTable = document.evaluate(
      './ancestor-or-self::code',
      oSeln.anchorNode,
      null, XPathResult.ANY_TYPE, null
    ).iterateNext(),
    rngDoc = nodeTable ?
      document.createRange() : null;

  if (nodeTable) {
    oSeln.removeAllRanges();
    rngDoc.selectNode(nodeTable);
    oSeln.addRange(rngDoc);
  }
})();

window.getSelection()+'';

"
		set safariTitle to name
		set safariURL to URL
		
	end tell
	
end tell


set scriptHeader to "(*
====================================================
	TITLE
====================================================

DATE: " & _date & "
AUTH: 
REFR: " & safariTitle & "
    : " & safariURL & "
*)
"

tell application "Script Editor"
	activate
	set newScript to make new document with properties {contents:scriptHeader & return & return & selectedText}
	tell newScript to check syntax
end tell

Hi Chris.

Thanks for the code.

I’m not quite sure what your intent is here. Is your code supposed to work on pages that Rob’s JS doesn’t work on?

The macro I posted is working fine on the sites where Rob’s JS is able to expand the selection. It opens the code up in AppleScript Editor.

Sorry I’m so confused. :dizzy_face:

While we are in mid-August optimisation and tweaking territory, glancing again at my JS code I notice a few things that could be adjusted or tightened up.

  1. If we're only looking for one match, no need to heat the CPU in boreal summer by searching for more. Instead of leaving it to the default XPathResult.ANY_TYPE and using .iterateNext() to collect just the first (if any) match, we can look up the Result Type constants, specify XPathResult.FIRST_ORDERED_NODE_TYPE, and collect any match from the .singleNodeValue property.
  2. XPATH expressions are chains of alternating 'Steps' and 'Filters'. We can broaden out the filter here with as many 'or' operators as we need. The simplest example might be: ./ancestor-or-self::*[self::code or self::pre]
  3. We can include text collection inside the function with return oSeln.toString().

(and then perhaps, drop it into an Execute JS in Safari (and/or Chrome) action, directing the output to the clipboard).

Select and copy code.kmmacros (19.2 KB)

(function () {
  var oSeln = window.getSelection(),
    nodeCode = document.evaluate(
      './ancestor-or-self::*[self::code or self::pre]',
      oSeln.anchorNode,
      null, XPathResult.FIRST_ORDERED_NODE_TYPE, null
    ).singleNodeValue,
    rngDoc = nodeCode ?
      document.createRange() : null;

  if (nodeCode) {
    oSeln.removeAllRanges();
    rngDoc.selectNode(nodeCode);
    oSeln.addRange(rngDoc);
  }
  
  return oSeln.toString();
})();

Hmmm, I don’t know about you, but I’m avoiding the hot, humid dog days of August by hunkering down in my cold basement with lots of cold beer on ice. I’m fully optimized on being cool :sunglasses:

But thanks for the update. It looks good as it adds more coverage.
I knew you could do it. :wink:

Same code seems to work in Chrome as well.
Or am I missing something?

I just tried it on the Evernote forum, and it worked well. :+1:
I’ll test it on my other candidate sites.

Yes – that kind of code should always work unchanged in either browser.

OK, this is working for:

Fails for:

Sorry I can’t be more technical help right now – but I’m glad to be the gofer, researcher, & tester. :smile:

If you need anything other than coding JS, let me know.

OK, I have updated my macro, which opens the code in AS Editor, with your latest JS:

BRW Open Web Page Code Block in Apple Script Editor.kmmacros (27.6 KB)

The cases really look a bit too diverse for a single XPATH

MacScripter

//blockquote/div/p

Veritrope:

//span[@class='coMULTI']

MacOSXautomation.com

//div[@id="content"]/*[self::pre or (self::p and @class='code')]

Github is a bit special. You might be better off clicking the RAW button:

(The raw page uses a <pre>)

On the pretty pages extend select would capture all the line numbers, so if you really wanted to copy from there you would have to write a slightly different, row-by-row (code cell but not number cell) function. Perhaps something roughly like:

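(function () {
  // Collect only the code cells, skipping the line-number cells.
  var xrLines = document.evaluate(
      '//td[contains(@class, "blob-code")]',
      document,
      // Ordered iterator, so rows come back in document order.
      null, XPathResult.ORDERED_NODE_ITERATOR_TYPE, null
    ),
    oLine = xrLines.iterateNext(),
    lstLines = [];

  while (oLine) {
    lstLines.push(oLine.textContent);
    oLine = xrLines.iterateNext();
  }

  // One line of code per table row.
  return lstLines.join('\n');
})();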

So, is it possible to build a SWITCH statement that includes these cases?

Yes, I think you should be able to switch/branch on the .URL of the front tab of the active browser …
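
A rough sketch of what that branching might look like, keyed on the host name of the page URL. The XPath strings are the ones from the posts above; the names `XPATH_BY_HOST`, `DEFAULT_XPATH` and `xpathForUrl`, and the document-wide fallback pattern, are assumptions made for illustration:

```javascript
// XPath pattern per host, copied from the patterns listed earlier.
var XPATH_BY_HOST = {
  'macscripter.net': '//blockquote/div/p',
  'veritrope.com': "//span[@class='coMULTI']",
  'macosxautomation.com':
    "//div[@id=\"content\"]/*[self::pre or (self::p and @class='code')]",
  'github.com': '//td[contains(@class, "blob-code")]'
};

// Assumed fallback for sites not in the table: any <code> or <pre>.
var DEFAULT_XPATH = '//*[self::code or self::pre]';

// Pick the pattern for a page URL, ignoring a leading "www.".
function xpathForUrl(url) {
  var host = new URL(url).hostname.replace(/^www\./, '');
  return Object.prototype.hasOwnProperty.call(XPATH_BY_HOST, host)
    ? XPATH_BY_HOST[host]
    : DEFAULT_XPATH;
}

// In a browser: xpathForUrl(window.location.href);
```

Each new site then costs one more line in the table rather than a separate macro.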

That’s too specific.

I’d rather switch based on what’s available on the page in question.
Is this possible?

Not sure I’m following you. As far as I can see the XPATH patterns vary by website …

You can know the paths in which particular sites hold their code, but I don’t think there’s any way of travelling in the opposite direction – looking at the myriad branches and pathways of an HTML tree, and intuiting that some locations are holding code.

But I’m probably misunderstanding you : - )