Macro Request: Get table from web page and save as CSV file

The GUI route to getting a copyable selection can take a little bit of slightly viscous dragging and selection boundary fiddling on a large webpage table.

Doing it with this table prompted me to sketch a simple 'extend selection to whole table' macro for Safari and Chrome:

1 Like

Thanks to everyone who contributed to this thread!
The route that CP takes is probably the way I’ll go.
Curiously if I cut and paste that into TextEdit (in a rich textbook) it keeps the links.
CP, I wondered whether there was a way to grab the table in javascript from the table tag. The problem with that would be if there are more than one table.
I can automate the process by locating the window in a certain position and then selecting a cell based on that. Is there a cleaner way to do that?
Extracting the pdf links to put then in a database maybe more difficult but thanks very much for the kickstart.
Cheers
Rob

If you, or your office, are happy to do a little JavaScript writing, you could adapt the ‘select whole table’ script:

  1. Use the same process to get a reference to the whole of a selected table, and then
  2. walk through the TR rows of the table, harvesting either the .textContent, or the .innerHTML, of each TD cell, or the .href and .text of each enclosed <A> link element.

Hey Rob,

Not so curious. This feature has been around a long time. TextEdit and other Cocoa RTF-aware editors like Jedit X do a very respectable job of converting data copied and pasted from Web Browsers

I'm sure there's a way to find and select a specific cell with JavaScript, but there's a very simple way to manage for the moment:

A) Set the FIND PASTEBOARD with Keyboard Maestro and then type Cmd-G to find next.
      • In the case of your trading table a unique item to find is "Headline".

B) Activate CP's select-table macro.

As for getting the links to the PDFs that's really easy to do just by parsing the HTML. It's cleaner to do with JavaScript, but you have to know how. :smile:

It would take me about 15-20 minutes to write an AppleScript to parse all the elements out of that page, but I would use the Satimage.osax AppleScript Extension to provide the regular expression support.

-Chris

Thanks Chris, Awesome!
This gives me lots to think about. For the moment I’m pottering through using KM macros and mixing them with excel macros step by step. I would say it was going well except for Excel’s unexpected behaviour especially with the clipboard It’s a PITA. If I select a cell and paste, it pastes one something. If I place the cursor in the cell and paste it pastes something else.
ATM I’m using Excel for a UDF on extracting those links and for arranging the file name in the format ASXAnns150807 and then saving as csv. Each of those steps would be easy if it weren’t for the peculiarities of Excel . But I’m getting there.
My problem is that because I don’t do it everyday I have to refamiliarise myself with each of the coding. KM, Excel macros, Javascript, Applescript and Filemaker when I use that. But I’m loving the journey.
When I get this done I’ll start again and use Javascript and Applescript and see if I can get rid of Excel in the process.
Right now I’m going to take a break from it and look at Smile. Thanks to you! Cheers and thanks Rob

If you are thinking of learning more about one of these, and you often need to manipulate data from a web page, then JavaScript is probably a good choice.

See

Thanks for that JM. I’m waiting on Javascript - the Good Parts to come from the Book Depository. I ordered it a few days ago after reading about it on another thread on here.
I’m keen to learn (Ha, I’m in my 60s) but life or work will probably get in the way. And next time I need it I’ll need to famil myself with it all over again.
I will try to keep at it every week because I enjoy it.
We’ll see.
Cheers
Rob

Rob, consider using Kindle. It is almost always cheaper than the printed edition, and it makes the book instantly available on every device you own.

I'm just getting started reading this, but I am very impressed with the unique approach of this book:
A Smarter Way to Learn JavaScript -- by Mark Myers

Hey Rob,

Here's why it's generally better to do things as directly as possible.

I used a fairly brute-force approach here, because I find the table-rows I want and then systemically strip off the junk I don't want.

If I were doing this for myself I would eventually streamline the script, but this way I was able to do the job in about 20 minutes or less.

You open the page in Safari and run the script.

A fraction of a second later you have a tab-delimited, date-stamped report on your Desktop ready to be imported or pasted into Excel (and that could be automated too).

Note that you don't need Smile — just the Satimage.osax — not that I'm discouraging anyone from using Smile — it's a very powerful if difficult to master automation tool.

The Satimage.osax installer installs 1 osax file:

     /Library/ScriptingAdditions/Satimage.osax

It's safe. I've been using it since 2003.

-Chris


Edit: 2015/08/09 15:03 — I am highly P.0.'d to find that the forum is rendering text in a code-block and mangling it. Therefore I am posting a text script (.applescript) file for download.


Tab-Delimted-Report.applescript.zip (1.5 KB)


------------------------------------------------------------
-- Create Tab-Delimted Report on Desktop from:
-- http://www.asx.com.au/asx/statistics/prevBusDayAnns.do
-- REQUIRES SATIMAGE.OSAX
-- http://tinyurl.com/dc3soh
-- USE THE DOWNLOADABLE FILE ABOVE AND NOT THIS TEXT!!!!
------------------------------------------------------------
set _text to safariOuterHTML()
set foundList to fnd("(?m)^<tr class=\"(?:altrow)?\">[\\n\\r]<td>(?-i)[A-Z]{3}(?i)</td>.+?</tr>", _text, true, true) of me
set foundList to cng("\\r", "\\n", foundList) of me
set foundList to cng("<tr class=\"(?:altrow)?\">\\n", "", foundList) of me
set foundList to cng("</?td>", "", foundList) of me
set foundList to cng("</tr>", "", foundList) of me
set foundList to cng("<td class=\"pricesens\">.+title=[[:punct:]](price sensitive)[[:punct:]]>", "\\1", foundList) of me
set foundList to cng("<a onclick.+?href=[[:punct:]]", "", foundList) of me
set foundList to cng("[[:punct:]]>PDF</a>", "", foundList) of me
set foundList to cng("(/asx/)", "http://www.asx.com.au\\1", foundList) of me
set foundList to cng("&amp;", "&", foundList) of me
set foundList to cng("^[[:blank:]]*$\\n", "", foundList) of me
set foundList to cng("\\A\\s+|\\s+\\Z", "", foundList) of me
set foundList to cng("\\n", "\\t", foundList) of me
set foundList to join foundList using return
set reportName to "ASX Announcements " & (strftime (current date) into "%Y.%m.%d · %H.%M.%S")
writetext foundList to ("~/Desktop/" & reportName) & ".txt"

------------------------------------------------------------
--» HANDLERS
------------------------------------------------------------
on cng(_find, _replace, _data)
  change _find into _replace in _data with regexp without case sensitive
end cng
------------------------------------------------------------
on fnd(_find, _data, _all, strRslt)
  try
	find text _find in _data all occurrences _all string result strRslt with regexp without case sensitive
  on error
	return false
  end try
end fnd
------------------------------------------------------------
on safariOuterHTML()
  tell application "Safari"
	tell front document to return do JavaScript "window.document.documentElement.outerHTML"
  end tell
end safariOuterHTML
------------------------------------------------------------
1 Like

Thanks Chris.
I’m having trouble running the script. Does it matter which editor I am are using? In both the Apple Script editor and Smile I get syntax errors
"`"
Found unknown token
onclick
Syntax error
expected “,” or “)” but found identifier

…and so on.

Feeling quite dumb that I don’t know this.
Thanks again for your help.

Hey Rob,

Don't feel dumb. Apparently the KM Forum software (Discourse) is rendering text in a code-block — which should never happen. I will be posting a bug report to them shortly.

That's what's causing this unknown token error.

To compensate I have posted a downloadable text-script in the original post right above the script.

-Chris

Yes, a curious bug, it replaces slosh-slosh-1 with slosh-backquote. Almost certainly it is doing a regex substitution for slosh-1.

I edited your post to use applescript Fenced code blocks to see if that would work, but it suffers the same problem. And I had to change the comments character from # to – or it does not highlight the comments correctly.

I left the change, but it didn’t help, although the syntax coloring is nice.

Hmm, the applescript Fenced code blocks do support # comments, just not that particular # comment.

Reported to Discourse meta.

Peter, exactly how did you do that?
I have tried this:

but I don't see any difference with the "applescript".

I read the Fenced code ref, but it didn't help. :frowning:

--this uses the "applescript" keyword
set strTest to the clipboard
--this does NOT use the "applescript" keyword
set strTest to the clipboard

Here are three examples, with info strings: xml, ruby, applescript. They are all rendered slightly differently.

set strTest to the clipboard
test( "hello" );
set strTest to the clipboard
test( "hello" );
set strTest to the clipboard
test( "hello" );

OK, so apparently applescript code highlighting was not turned on by default.

I’ve turned it on and restarted the server, and now it highlights @ccstone’s original comments (# line comments) correctly.

It does not fix the rendering bug though, that is a separate issue.

There are a bunch of languages turned on by default, but the only two I think we commonly use on here are applescript and javascript, both of which are now on (javascript was on originally).

1 Like

Thanks! It's nice to know that I was neither crazy nor blind! :sunglasses:
Well, at least not totally so. :wink:

Thanks for everyone’s help. And, now you have the hiccups sorted out I’m feeling less dumb.

I realize this is very old thread but If anyone is still interested and you happen to have an old PC laying around or can access one there is a very simple way to get tables from a web site with MS excel by using it's get external data/web query main menu option. I have an old pc running windows xp and excel 2002 that I use to create web query's and then I export the workbooks to my MacBook running excel for Mac 2011. The query's will run just fine on the Mac even though excel for Mac will not create or edit them. I use a VB macro to refresh all of the query's on a sheet at once. Now if I just need to figure out how to create a KM macro that would run the refresh macro at a certain time each day.

hey @edischy good idea!
I got Windows Excel 2007 running inside Crossover, but have yet to try the extra capabilities such as the web query.

For the refresh, there's the Time of Day trigger: