Extracting Hi-Res JPEGs from a Website Given a URL

Whew! This was a lot harder than I expected when I started the project, but I got it working. You may need to customize it for yourself, however. The macro downloads high-resolution JPEG images from a given URL. I've only tested it on the weekly photo pages from The Atlantic magazine (I have a subscription, in case you're wondering), and I use the images as desktop wallpaper. It generates a shell script that uses wget to download the images, then renames and numbers the files. It could probably be improved, but it works for me as is.
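
To show the shape of the generated download script, here is a minimal shell sketch of the same idea: number a list of image URLs and emit one wget line per URL. The helper name make_download_script and the file urls.txt are assumptions for illustration, not part of the macro. The sketch single-quotes each URL so that characters such as & and ? survive the shell.

```shell
#!/bin/bash
# Sketch: turn image URLs (one per line on stdin) into a wget download script.
# make_download_script is a hypothetical helper name, not part of the macro.
make_download_script() {
  local i=0 url
  printf '#!/bin/bash\n'
  printf '## run this from the directory in which the images will be stored\n'
  # "|| [ -n ... ]" also handles a last line that lacks a trailing newline
  while IFS= read -r url || [ -n "$url" ]; do
    [ -n "$url" ] || continue   # skip blank lines
    i=$((i + 1))
    # Assumes the URL itself contains no single quote
    printf "wget --output-document=original-%d.jpg '%s'\n" "$i" "$url"
  done
  printf 'exit\n'
}
```

Something like `make_download_script < urls.txt > MyScript.sh` would then produce a script to run from the directory where the images should land.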

This is an updated working version (3/20/2023).
Added support for discarding JPEGs that are 600 pixels wide or narrower.
Added support for discarding Dropbox thumbnails.
This should work on simple sites that only host JPEG repositories, but beware: every site is different.
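
The width filter could be sketched in shell roughly as follows. This is not necessarily how the macro does it; it assumes the macOS sips tool is available, and remove_narrow_jpegs is a hypothetical helper name. Files whose width cannot be determined are left alone.

```shell
#!/bin/bash
# Sketch: delete JPEGs whose width is at or below a threshold (default 600 px).
# remove_narrow_jpegs is a hypothetical helper; it assumes the macOS "sips" tool.
remove_narrow_jpegs() {
  local dir="$1" min_width="${2:-600}" f width
  for f in "$dir"/*.jpg; do
    [ -e "$f" ] || continue     # no matches: the glob pattern stays literal
    # sips prints the file path, then a line such as "  pixelWidth: 1500"
    width="$(sips -g pixelWidth "$f" 2>/dev/null | awk '/pixelWidth/ {print $2}')"
    case "$width" in
      ''|*[!0-9]*) continue ;;  # width unknown: leave the file alone
    esac
    if [ "$width" -le "$min_width" ]; then
      rm -- "$f"
    fi
  done
}
```

Running `remove_narrow_jpegs "$HOME/Desktop/New JPEGs"` after the download would then drop the small images; a second argument overrides the 600-pixel default.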

Extract JPEGs from URL.kmmacros (26.3 KB)


Here is a working version, though it still has some rough edges:

use framework "Foundation"
use scripting additions

--IRN - 2023-03-10 (with help from the LateNightSW website)
--This script extracts the links to the large jpegs at a URL, builds a wget download script
--on the clipboard, and saves that script as MyScript.sh in the "New JPEGs" folder on the desktop

tell application "Keyboard Maestro Engine"
	set thePage to value of variable "thePage"
end tell

set myList to {}
set i to 0

set the clipboard to "#!/bin/bash" & linefeed
set the clipboard to (get (the clipboard) & "## run this from the directory in which the images will be stored" & linefeed & "##" & linefeed)

set myList to its searchFor:"jpg" inURL:thePage

repeat with anItem in myList
	set i to i + 1 -- to enumerate the image names
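	-- append a wget command for this URL to the script on the clipboard
	-- (the URL is shell-quoted so that characters such as & and ? do not break the command)
	set the clipboard to (get (the clipboard) & "wget --output-document=original-" & (i as text) & ".jpg " & quoted form of (contents of anItem) & linefeed)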
end repeat

--display dialog (i as text) & " images to be downloaded, script is in the clipboard"
set fileName to "MyScript.sh"
try
	tell application "Finder" to make new folder at desktop with properties {name:"New JPEGs"}
end try

set thePath to POSIX path of (path to desktop)
set myFile to thePath & "New JPEGs/" & fileName

--display dialog "myFile is " & myFile

set myScript to the clipboard -- copy the clipboard into a variable

-- if the file already exists, no problem. This will overwrite it
try
	set fileDescriptor to open for access myFile with write permission
	set eof of fileDescriptor to 0 -- Delete current contents of the file
	write myScript to fileDescriptor starting at eof
	write linefeed & "exit" & linefeed to fileDescriptor
	close access fileDescriptor -- Close the file using the descriptor returned by open for access
end try

-- display dialog "Done"

return i

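-- all the major work is done here:
-- fetch the page source, let NSDataDetector find every link in it,
-- then keep only the links that contain searchTag
--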
on searchFor:searchTag inURL:URLString
	set theURL to current application's |NSURL|'s URLWithString:URLString
	set theSource to current application's NSString's stringWithContentsOfURL:theURL encoding:(current application's NSUTF8StringEncoding) |error|:(missing value)
	set dataDetector to current application's NSDataDetector's dataDetectorWithTypes:(current application's NSTextCheckingTypeLink) |error|:(missing value)
	set linkArray to dataDetector's matchesInString:theSource options:0 range:{location:0, |length|:theSource's |length|()}
	set URLList to (linkArray's valueForKeyPath:"URL.absoluteString") as list
	set resultList to {}
	
	tell application "Keyboard Maestro Engine"
		set startingAt to value of variable "startingAt"
		set endingAt to value of variable "endingAt"
	end tell
	
	repeat with anItem in URLList
		if searchTag is in anItem then
			if (contents of anItem) contains "theatlantic.com" then
				-- specialized for theAtlantic
				-- could specialize for others as well
				if ((contents of anItem) contains "1500") and ((contents of anItem) contains "thumbor") then
					-- display dialog (contents of anItem)
					set end of resultList to (contents of anItem)
				end if
			else
				-- more specialization could go here
				set end of resultList to (contents of anItem)
			end if
		end if
	end repeat
	
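	-- next, only return the sublist from startingAt to endingAt, pruning all the rest
	-- Keyboard Maestro variables arrive as text, so coerce them and clamp to the list bounds
	set prunedList to {}
	set firstIndex to startingAt as integer
	set lastIndex to endingAt as integer
	if firstIndex < 1 then set firstIndex to 1
	if lastIndex > (length of resultList) then set lastIndex to length of resultList
	if firstIndex ≤ lastIndex then
		set prunedList to (items firstIndex thru lastIndex) of resultList
	end if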
	
	return prunedList
	
end searchFor:inURL:
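
For comparison, the link extraction and pruning in the handler above can be approximated in shell. This is a rough sketch, not a replacement for NSDataDetector: it uses a simple regular expression to find http(s) URLs, and the names extract_jpeg_links and slice_lines are hypothetical.

```shell
#!/bin/bash
# Sketch: pull JPEG links out of HTML read from stdin (hypothetical helper names).
extract_jpeg_links() {
  # Grab anything that looks like an http(s) URL, then keep those mentioning "jpg"
  grep -Eo "https?://[^\"'<> )]+" | grep -i 'jpg'
}

# Keep only lines start..end (1-based, inclusive), like startingAt/endingAt
slice_lines() {
  sed -n "${1},${2}p"
}
```

For The Atlantic, the handler's extra filter would correspond to something like `curl -s "$url" | extract_jpeg_links | grep thumbor | grep 1500 | slice_lines 1 10`, where `$url` holds the page address.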

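-- Second script (run separately): renames the files selected in Finder,
-- replacing "original" in each name with the Keyboard Maestro variable "imageSet"
use AppleScript version "2.4" -- Yosemite (10.10) or later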
use scripting additions

set my_imageSet to ""

tell application "Keyboard Maestro Engine"
	set my_imageSet to value of variable "imageSet"
	-- display dialog my_imageSet
end tell

tell application "Finder"
	set _files to selection as alias list
	if _files is {} then return display alert "Nothing selected!" as warning giving up after 5
	repeat with f in _files
		set theoldstring to name of f
		set thenewstring to findAndReplaceInText(theoldstring, "original", my_imageSet) of me
		set name of f to thenewstring
		delay 0.5
	end repeat
end tell

on findAndReplaceInText(theText, theSearchString, theReplacementString)
	set AppleScript's text item delimiters to theSearchString
	set theTextItems to every text item of theText
	set AppleScript's text item delimiters to theReplacementString
	set theText to theTextItems as string
	set AppleScript's text item delimiters to ""
	return theText
end findAndReplaceInText


return
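
The same rename can also be done directly in the shell, without selecting files in Finder. A minimal sketch, assuming the downloaded files are named original-N.jpg; rename_image_set is a hypothetical helper name. Unlike the AppleScript, which replaces "original" anywhere in a name, this only swaps the leading prefix.

```shell
#!/bin/bash
# Sketch: rename original-N.jpg to <image_set>-N.jpg inside a directory.
# rename_image_set is a hypothetical helper name.
rename_image_set() {
  local dir="$1" image_set="$2" f base new
  for f in "$dir"/original-*.jpg; do
    [ -e "$f" ] || continue              # no matches: the glob pattern stays literal
    base="${f##*/}"                      # file name without the directory part
    new="${image_set}${base#original}"   # swap the leading "original" prefix
    mv -n -- "$f" "$dir/$new"            # -n: never overwrite an existing file
  done
}
```

For example, `rename_image_set "$HOME/Desktop/New JPEGs" atlantic` would turn original-1.jpg into atlantic-1.jpg.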
