Macro Request: Get table from web page and save as CSV file

Hey JM,

I can't get that to work with those instructions. Can you?


Yeah but it's a bit flaky.

For some dumb reason, Excel will Not read any type of text file for the web query except the old MS-DOS encoding. So you have to use MS Word (I couldn't get TextEdit or TextWrangler to save properly) to save the URL as a MS-DOS text file.

Then it will open in Excel.

Where did it fail with you?

It was a bit slow, but it finally pulled the table from

The table at the site linked by the OP pastes really nicely – no drama – no Microsoft weirdness – to Numbers (3.5), and can be exported at once to CSV. Even the PDF links work in the Numbers sheet.

Hey JM,

Ah, I missed the MS-DOS text step — so it completely failed to do the query.

Hey Ed,

Oh, nice! I've never liked Numbers, but I do appreciate when things just work.

I did it the hard way:

I ran the source of the page through html2text from Macports to get the table nice and pretty, and then I scarfed the links with AppleScript/JavaScript.

curl -L --user-agent 'Opera/9.70 (Linux ppc64 ; U; en) Presto/2.2.1' \
--url '' \
| html2text -style pretty
# Extract Links from front Safari Window
set regexStr to "display=pdf"
set linkList to safari_links(regexStr, "*", "href")

on safari_links(regexStr, tagName, tagType)
  set js to "function in_array (array, item) {
  for (var i=0; i < array.length; i++) {
    if ( array[i] == item ) {
      return true;}}
  return false;}
  var a_tags = document.getElementsByTagName('" & tagName & "');
  var href_array = new Array();
  var reg = new RegExp(/" & regexStr & "/i);
  for (var i=0; i < a_tags.length; i++) {
    var href = a_tags[i]." & tagType & ";
    if ( reg.test(href)) {
      if ( !in_array(href_array, href)) {
    tell application "Safari" to set linkList to do JavaScript js in document 1
    if linkList = missing value then set linkList to {}
  on error
    set linkList to {}
  end try
  return linkList
end safari_links

All of this is fairly fast, but there's cleanup and assembly work to do.

I like the Excel and Numbers methods better in this case.


1 Like

The GUI route to getting a copyable selection can take a little bit of slightly viscous dragging and selection boundary fiddling on a large webpage table.

Doing it with this table prompted me to sketch a simple 'extend selection to whole table' macro for Safari and Chrome:

1 Like

Thanks to everyone who contributed to this thread!
The route that CP takes is probably the way I’ll go.
Curiously if I cut and paste that into TextEdit (in a rich textbook) it keeps the links.
CP, I wondered whether there was a way to grab the table in javascript from the table tag. The problem with that would be if there are more than one table.
I can automate the process by locating the window in a certain position and then selecting a cell based on that. Is there a cleaner way to do that?
Extracting the pdf links to put then in a database maybe more difficult but thanks very much for the kickstart.

If you, or your office, are happy to do a little JavaScript writing, you could adapt the ‘select whole table’ script:

  1. Use the same process to get a reference to the whole of a selected table, and then
  2. walk through the TR rows of the table, harvesting either the .textContent, or the .innerHTML, of each TD cell, or the .href and .text of each enclosed <A> link element.

Hey Rob,

Not so curious. This feature has been around a long time. TextEdit and other Cocoa RTF-aware editors like Jedit X do a very respectable job of converting data copied and pasted from Web Browsers

I'm sure there's a way to find and select a specific cell with JavaScript, but there's a very simple way to manage for the moment:

A) Set the FIND PASTEBOARD with Keyboard Maestro and then type Cmd-G to find next.
      • In the case of your trading table a unique item to find is "Headline".

B) Activate CP's select-table macro.

As for getting the links to the PDFs that's really easy to do just by parsing the HTML. It's cleaner to do with JavaScript, but you have to know how. :smile:

It would take me about 15-20 minutes to write an AppleScript to parse all the elements out of that page, but I would use the Satimage.osax AppleScript Extension to provide the regular expression support.


Thanks Chris, Awesome!
This gives me lots to think about. For the moment I’m pottering through using KM macros and mixing them with excel macros step by step. I would say it was going well except for Excel’s unexpected behaviour especially with the clipboard It’s a PITA. If I select a cell and paste, it pastes one something. If I place the cursor in the cell and paste it pastes something else.
ATM I’m using Excel for a UDF on extracting those links and for arranging the file name in the format ASXAnns150807 and then saving as csv. Each of those steps would be easy if it weren’t for the peculiarities of Excel . But I’m getting there.
My problem is that because I don’t do it everyday I have to refamiliarise myself with each of the coding. KM, Excel macros, Javascript, Applescript and Filemaker when I use that. But I’m loving the journey.
When I get this done I’ll start again and use Javascript and Applescript and see if I can get rid of Excel in the process.
Right now I’m going to take a break from it and look at Smile. Thanks to you! Cheers and thanks Rob

If you are thinking of learning more about one of these, and you often need to manipulate data from a web page, then JavaScript is probably a good choice.


Thanks for that JM. I’m waiting on Javascript - the Good Parts to come from the Book Depository. I ordered it a few days ago after reading about it on another thread on here.
I’m keen to learn (Ha, I’m in my 60s) but life or work will probably get in the way. And next time I need it I’ll need to famil myself with it all over again.
I will try to keep at it every week because I enjoy it.
We’ll see.

Rob, consider using Kindle. It is almost always cheaper than the printed edition, and it makes the book instantly available on every device you own.

I'm just getting started reading this, but I am very impressed with the unique approach of this book:
A Smarter Way to Learn JavaScript -- by Mark Myers

Hey Rob,

Here's why it's generally better to do things as directly as possible.

I used a fairly brute-force approach here, because I find the table-rows I want and then systemically strip off the junk I don't want.

If I were doing this for myself I would eventually streamline the script, but this way I was able to do the job in about 20 minutes or less.

You open the page in Safari and run the script.

A fraction of a second later you have a tab-delimited, date-stamped report on your Desktop ready to be imported or pasted into Excel (and that could be automated too).

Note that you don't need Smile — just the Satimage.osax — not that I'm discouraging anyone from using Smile — it's a very powerful if difficult to master automation tool.

The Satimage.osax installer installs 1 osax file:


It's safe. I've been using it since 2003.


Edit: 2015/08/09 15:03 — I am highly P.0.'d to find that the forum is rendering text in a code-block and mangling it. Therefore I am posting a text script (.applescript) file for download. (1.5 KB)

-- Create Tab-Delimted Report on Desktop from:
set _text to safariOuterHTML()
set foundList to fnd("(?m)^<tr class=\"(?:altrow)?\">[\\n\\r]<td>(?-i)[A-Z]{3}(?i)</td>.+?</tr>", _text, true, true) of me
set foundList to cng("\\r", "\\n", foundList) of me
set foundList to cng("<tr class=\"(?:altrow)?\">\\n", "", foundList) of me
set foundList to cng("</?td>", "", foundList) of me
set foundList to cng("</tr>", "", foundList) of me
set foundList to cng("<td class=\"pricesens\">.+title=[[:punct:]](price sensitive)[[:punct:]]>", "\\1", foundList) of me
set foundList to cng("<a onclick.+?href=[[:punct:]]", "", foundList) of me
set foundList to cng("[[:punct:]]>PDF</a>", "", foundList) of me
set foundList to cng("(/asx/)", "\\1", foundList) of me
set foundList to cng("&amp;", "&", foundList) of me
set foundList to cng("^[[:blank:]]*$\\n", "", foundList) of me
set foundList to cng("\\A\\s+|\\s+\\Z", "", foundList) of me
set foundList to cng("\\n", "\\t", foundList) of me
set foundList to join foundList using return
set reportName to "ASX Announcements " & (strftime (current date) into "%Y.%m.%d · %H.%M.%S")
writetext foundList to ("~/Desktop/" & reportName) & ".txt"

on cng(_find, _replace, _data)
  change _find into _replace in _data with regexp without case sensitive
end cng
on fnd(_find, _data, _all, strRslt)
	find text _find in _data all occurrences _all string result strRslt with regexp without case sensitive
  on error
	return false
  end try
end fnd
on safariOuterHTML()
  tell application "Safari"
	tell front document to return do JavaScript "window.document.documentElement.outerHTML"
  end tell
end safariOuterHTML
1 Like

Thanks Chris.
I’m having trouble running the script. Does it matter which editor I am are using? In both the Apple Script editor and Smile I get syntax errors
Found unknown token
Syntax error
expected “,” or “)” but found identifier

…and so on.

Feeling quite dumb that I don’t know this.
Thanks again for your help.

Hey Rob,

Don't feel dumb. Apparently the KM Forum software (Discourse) is rendering text in a code-block — which should never happen. I will be posting a bug report to them shortly.

That's what's causing this unknown token error.

To compensate I have posted a downloadable text-script in the original post right above the script.


Yes, a curious bug, it replaces slosh-slosh-1 with slosh-backquote. Almost certainly it is doing a regex substitution for slosh-1.

I edited your post to use applescript Fenced code blocks to see if that would work, but it suffers the same problem. And I had to change the comments character from # to – or it does not highlight the comments correctly.

I left the change, but it didn’t help, although the syntax coloring is nice.

Hmm, the applescript Fenced code blocks do support # comments, just not that particular # comment.

Reported to Discourse meta.

Peter, exactly how did you do that?
I have tried this:

but I don't see any difference with the "applescript".

I read the Fenced code ref, but it didn't help. :frowning:

--this uses the "applescript" keyword
set strTest to the clipboard
--this does NOT use the "applescript" keyword
set strTest to the clipboard

Here are three examples, with info strings: xml, ruby, applescript. They are all rendered slightly differently.

set strTest to the clipboard
test( "hello" );
set strTest to the clipboard
test( "hello" );
set strTest to the clipboard
test( "hello" );