So here's the first use for this new knowledge that I thought I would share with the community.
How many times have you been on an Amazon.com page and wanted to share the name of the product and URL with someone?
For me, that happens a lot.
The first part is easy: the product name is basically the title of the web page (although, as you know, many of these titles are absurdly long and filled with keywords for SEO).
The second part is trickier. While you could just take the URL from your browser, most of the time it would be filled with a LOT of cruft that you neither want nor need. BUT! In the HTML source for every Amazon page is a line like this:
<link rel="canonical" href="https://www.amazon.com/MacBook-Release-Kuzy-Version-Display/dp/B07K8ZC6Y3" />
That <link rel="canonical" part means that the URL that Amazon considers to be the “official” URL for this page is https://www.amazon.com/MacBook-Release-Kuzy-Version-Display/dp/B07K8ZC6Y3
Although, as many people know but many others do not, the part between amazon.com and /dp is merely descriptive. You can put anything in there, or nothing at all. For example, this URL will work:
https://www.amazon.com/i-like-bananas-and-puppies-and-sunsets-and-silly-examples/dp/B07K8ZC6Y3
as will this one:
https://www.amazon.com/dp/B07K8ZC6Y3
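To make that concrete, here is a minimal sketch of trimming any of those URLs down to the short form. The function name short_amazon_url is hypothetical (it is not part of my script), and it assumes that the URL contains /dp/ followed by a product ID made up only of letters and digits, which is the only pattern shown in this post.

```shell
# hypothetical helper: reduce an Amazon product URL to <scheme>://<host>/dp/<ID>
# assumption: the URL contains "/dp/" followed by an ID of letters and digits
short_amazon_url () {
  printf '%s\n' "$1" \
  | sed -E 's#^(https?://[^/]+).*(/dp/[A-Za-z0-9]+).*$#\1\2#'
}

# both of these print https://www.amazon.com/dp/B07K8ZC6Y3
short_amazon_url 'https://www.amazon.com/MacBook-Release-Kuzy-Version-Display/dp/B07K8ZC6Y3'
short_amazon_url 'https://www.amazon.com/i-like-bananas-and-puppies-and-sunsets-and-silly-examples/dp/B07K8ZC6Y3'
```

If the URL does not match that pattern, sed leaves it unchanged, so the worst case is that you get back what you put in.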
So, I wanted to get the canonical URL. But how?
Previously, I used a shell script: I would press my Keyboard Maestro macro key, and the shell script would take the URL and send it to curl, which would try to fetch the same page that I was already looking at.
Obviously that's slow and inefficient, since I already have the information in Safari right now, but what was even worse is that (as you might expect) Amazon really, really, really does not want you doing any sort of automated “scraping” of their website (which is, in effect, what I was doing, although not for nefarious reasons). So my script would fail, often.
Now that I can use the HTML that is already in Safari, this is what I can do instead:
#!/usr/bin/env zsh -f
# this just helps the shell find utilities
PATH="/usr/local/bin:/usr/bin:/usr/sbin:/sbin:/bin"
# this is AppleScript inside a shell script using `osascript`
# each item inside "-e 'single quotes'" is the
# same as if they were on separate lines in an
# AppleScript script
SOURCE=$(osascript -e 'tell application "Safari"' \
-e 'set oTab to current tab of window 1' \
-e 'set HTMLStr to source of oTab' \
-e 'end tell')
# Now the variable '$SOURCE' has all of the HTML from Safari
# This part says "use the variable '$SOURCE'"
# narrow it down to just the line that matches '<link rel="canonical" href="'
# then replace everything up to and including "https://www.amazon" with
# "https://smile.amazon" (if you don't use "smile.amazon.com", you can
# just replace the word "smile" with "www")
URL=$(echo "$SOURCE" \
| fgrep '<link rel="canonical" href="' \
| sed 's#.*https://www.amazon#https://smile.amazon#g ; s#" />##g')
# This part says "use the variable '$SOURCE'"
# narrow it down to just the line that matches '<title>'
# then do the following:
# remove the '<title>'
# remove the '</title>'
# remove "Amazon.com: "
# replace any '&amp;' with '&'
TITLE=$(echo "$SOURCE" \
| fgrep '<title>' \
| sed -e 's#<title>##g' \
-e 's#</title>##g' \
-e 's#Amazon.com: ##g' \
-e 's#&amp;#\&#g')
# at this point, the variables we have are:
# $SOURCE = the entire HTML of the page
# $TITLE = the full title of the page
# $URL = the official / canonical URL
# Now, what do you want to do with those things?
# for me, I want a Markdown link, which means that I want the
# title in [brackets] and the URL in (parentheses)
# and then I want that copied to the clipboard / pasteboard, so
# I would use this:
echo -n "[$TITLE]($URL)" | pbcopy
# the '-n' after 'echo' just says 'do not add a "newline" at the end'
# if you wanted the script to output the $TITLE on one line
# and the $URL on another, you could use this:
# the '\n' says "add a line-break here"
# echo "${TITLE}\n${URL}"
# technically the {braces} are not required, but I find it makes
# it easier to read the two variables separated by the '\n'
# compared to this
# echo "$TITLE\n$URL"
# but, functionally, they are the same
# again, we don't technically need this to end the script
# but I find it makes a nice marker for "this is the end"
exit 0
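If you want to try the extraction steps without Safari in the loop, one option is to wrap them in a function that reads HTML from standard input. This is only a sketch of the same idea, not a replacement for the script above: the function name markdown_link_from_html is hypothetical, the product title in the sample HTML is a made-up placeholder, and it assumes the canonical link and the title each sit on a single line, as they do in the example earlier in this post.

```shell
# hypothetical helper: read HTML on standard input, print a Markdown link
# assumption: '<link rel="canonical" ...>' and '<title>...</title>' are each on one line
markdown_link_from_html () {
  local html url title
  html=$(cat)

  # keep the first canonical line, then strip everything around the href value
  url=$(printf '%s\n' "$html" \
    | grep -F '<link rel="canonical" href="' \
    | head -n 1 \
    | sed -e 's#.*href="##' -e 's#".*##')

  # keep the first title line, strip the tags, the "Amazon.com: " prefix,
  # and turn '&amp;' back into '&'
  title=$(printf '%s\n' "$html" \
    | grep -F '<title>' \
    | head -n 1 \
    | sed -e 's#.*<title>##' -e 's#</title>.*##' \
          -e 's#Amazon\.com: ##' -e 's#&amp;#\&#g')

  # no trailing newline, same as 'echo -n' above
  printf '[%s](%s)' "$title" "$url"
}

# sample input: the title here is a placeholder, not a real product title
sample_html='<html><head>
<title>Amazon.com: Example Case &amp; Cover</title>
<link rel="canonical" href="https://www.amazon.com/MacBook-Release-Kuzy-Version-Display/dp/B07K8ZC6Y3" />
</head></html>'

printf '%s\n' "$sample_html" | markdown_link_from_html
# prints: [Example Case & Cover](https://www.amazon.com/MacBook-Release-Kuzy-Version-Display/dp/B07K8ZC6Y3)
```

To use it with Safari, you could pipe the output of the osascript command into the function and then pipe the result to pbcopy. I used printf here instead of echo only because printf does not try to interpret backslashes in the title.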
Anyway, I hope someone might find that useful, or at least interesting.