Stripping Tracker Portions From a URL (Clean URL Parameters)

I'd like some help, please.

I am trying to create a macro that automatically strips away the tracker info that is placed on the end of URLs so that I can add the trimmed web link to a Facebook or Twitter feed or send the link to somebody else without the tracking. For example, I'm testing the macro on this sample URL:

http://singularityhub.com/2016/06/03/facial-recognition-tech-will-soon-end-your-anonymity-in-public/?utm_content=buffer1cf00&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer

All the content from the ? to the end of the URL is tracker code. I want to end up with the same URL, trimmed so that it looks like this:

http://singularityhub.com/2016/06/03/facial-recognition-tech-will-soon-end-your-anonymity-in-public/

The problem I'm having is that I don't know the RegEx that does that. I've tried variations on a Search and Replace Clipboard action that works on the system clipboard with a regular expression (ignoring case), using the regex ?$ and replacing it with nothing. But that doesn't seem to work.

Here's what I have in the macro so far:

Select the URL in the Safari URL field using Command A
Copy the url to the clipboard
Pause for 0.5 seconds to give the clipboard a chance to get the data ready
Search the system clipboard for ?$ and replace what I think is the chunk starting with the ? with nothing
Pause for 0.5 seconds to give the clipboard a chance to process
Paste the 'trimmed' URL back into the Safari field.

Except it doesn't work; I just get the original URL pasted back untrimmed.

Don't know where I'm going wrong, but I assume it's got something to do with the regex.

Thanks.

Here's your regex expression:

([^?]*)

You can see it explained here: https://regex101.com/r/fW8mM1/1 (look for the "Explanation" section).

Here's one way to use it:


Dan, that's a very clever, concise, expression. :thumbsup:
I would not have thought of it.

For the benefit of all, including me, that expression will capture everything from the beginning of the string up to, but not including, the specified character, a "?" in this case.
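For anyone who wants to check that behaviour outside Keyboard Maestro, here is a minimal sketch in Python. Python's `re` module is only a stand-in here (it is not the regex engine KM uses), and `base_of` is a made-up helper name.

```python
import re

def base_of(url: str) -> str:
    """Return everything before the first '?' (the whole string if there is none)."""
    # [^?]* means "zero or more characters that are not '?'". Inside a
    # character class the '?' is a literal and needs no escaping.
    return re.match(r"([^?]*)", url).group(1)

url = ("http://singularityhub.com/2016/06/03/"
       "facial-recognition-tech-will-soon-end-your-anonymity-in-public/"
       "?utm_content=buffer1cf00&utm_medium=social"
       "&utm_source=twitter.com&utm_campaign=buffer")

print(base_of(url))
```

Because the pattern can also match an empty string, it never fails; a URL with no "?" simply comes back unchanged.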

BTW, one minor alternative is to use the KM Variable Search and Replace action, so that you do not have to create another KM variable. It needs a slight change to your RegEx:

([^\?]*).*$

This pattern matches the entire original URL, but replaces it with the first capture group, which is prior to the "?".
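The same search-and-replace idea can be tried in Python as a rough sketch. The assumptions: Python's `re.sub` stands in for the KM action, `\1` is Python's way of writing "first capture group" in the replacement, and `strip_query` is a hypothetical name.

```python
import re

def strip_query(url: str) -> str:
    """Replace the whole URL with the part captured before the first '?'."""
    # ([^\?]*) captures everything before the first '?'; .*$ then consumes
    # the rest of the line, so the match spans the entire URL.
    # count=1 stops Python from making a second, empty match at the very end.
    return re.sub(r"([^\?]*).*$", r"\1", url, count=1)

url = ("http://singularityhub.com/2016/06/03/"
       "facial-recognition-tech-will-soon-end-your-anonymity-in-public/"
       "?utm_content=buffer1cf00&utm_medium=social"
       "&utm_source=twitter.com&utm_campaign=buffer")

print(strip_query(url))
```

Since the match covers the whole string and the replacement is only the captured prefix, the "?" and everything after it are dropped in a single step.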

I love RegEx. There are so many different ways to do things, and it is so flexible, powerful, and concise.

The bad news is that I only understand about 5% (if that much) of RegEx.

Ooh, I like yours better. Don't know why I didn't think of it - apparently I only understand about 5% of KM! :stuck_out_tongue:

I mentioned the URL that explains the regex, but here's the info anyway:

Love regex101.com. Easy to test regex expressions, they give good explanations, and you can save your examples and share the link. I forget who initially pointed me to this site, it might have been you! In any case, Love. It.

And yeah, I don't know if anyone knows more than 5% of regex. :stuck_out_tongue:

One other approach can be to read the pre-tracking address straight from a web page element.


What I love about this forum is that various responses teach, explain and innovate -- and we get to use the best bits and pieces from everybody to create wonderful macros!

In my macro, I've used DanThomas's RegEx (and I agree with MichaelTX that it's a great algorithm), I've used MichaelTX's tweak and approach, for the exact reasons he's given and because I didn't want the ? left over. And the combination works like a charm (It's literally magic!!)

I've also bookmarked the Explanation link offered by DanThomas to help me with future RegEx (so I can get beyond 0.001% of my RegEx knowledge and head for the giddy 5% heights). I've also absconded with ComplexPoint's approach (and macro), which eliminates 'referrer noise' (a wonderful phrase) by not having it sully my Safari in the first place.

In addition, I've used Peter Lewis's thoughtful macro step, from his useful Safari group of actions, of obtaining the Safari URL itself for the macro without the need to mouse and click into the URL field to then select and copy it to the clipboard.

Here's the macro I came up with.

Keyboard Maestro “Safari - Delete URL tracker info” Macro

Safari - Delete URL tracker info.kmmacros (3.4 KB)

By the way, I've yet to decide whether to leave it as it is, or add a Return keypress at the end to force Safari to load the website with just the clean version of the URL, as well as leaving the clean URL in the Clipboard for other uses. I'll see which way works better for me as I use my (our) spiffy new macro!

Thanks, everyone!


@Peter_Morgan, just so you know, you don't need to do a COPY.
You can set the Clipboard directly to the Safari URL:

Good luck, and let us know how it goes.


If you just want to delete everything from the ? onwards, then you can just do:

That is, search for a literal ? followed by zero or more of any character (except end-of-line characters), and replace the match with nothing.

Note that if the variable might contain multiple lines, this will not work, because "." in regex does not match end-of-line characters unless you turn on the (?s) flag, with either (?s)\?.* or \?(?s:.*).
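Here is a small Python sketch of the single-line versus multi-line difference, with the question mark escaped as `\?` so that it is treated as a literal. Python's `re` is a stand-in for KM's engine, and the example.com strings are invented test data.

```python
import re

one_line = "http://example.com/page?utm_source=twitter.com"
two_lines = "http://example.com/page?utm_source=twitter.com\nsecond line"

# Without the (?s) flag, "." stops at the line break, so only the rest of
# the first line is removed and the second line survives.
plain = re.sub(r"\?.*", "", two_lines)

# With (?s), "." also matches line breaks, so everything after "?" goes.
flag_up_front = re.sub(r"(?s)\?.*", "", two_lines)
flag_scoped = re.sub(r"\?(?s:.*)", "", two_lines)

print(repr(re.sub(r"\?.*", "", one_line)))
print(repr(plain))
print(repr(flag_up_front))
print(repr(flag_scoped))
```

The two flagged spellings behave the same here; the scoped form just limits the flag to the `.*` part.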


Great! Better still. So many RegEx solutions.

Hey Peter,

If you’re always dealing with a ‘?’ as the query character then AppleScript makes this task very simple.

set theURL to "http://singularityhub.com/2016/06/03/facial-recognition-tech-will-soon-end-your-anonymity-in-public/?utm_content=buffer1cf00&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer"
set AppleScript's text item delimiters to "?"
set baseURL to text item 1 of theURL

On my own system I’d probably employ the Satimage.osax AppleScript Extension, and do something like this:

# The Satimage.osax MUST be INSTALLED for this to function.
set theURL to "http://singularityhub.com/2016/06/03/facial-recognition-tech-will-soon-end-your-anonymity-in-public/?utm_content=buffer1cf00&utm_medium=social&utm_source=twitter.com&utm_campaign=buffer"
using terms from scripting additions
   tell (get URL info for theURL) to return scheme & "://" & host & path
end using terms from
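For comparison, both ideas above (splitting on "?" and rebuilding the URL from its parts) can be sketched with Python's standard library. This is only an analogue of the AppleScript, not a translation of Satimage.osax, and the function names are made up for illustration.

```python
from urllib.parse import urlsplit

def before_question_mark(url: str) -> str:
    """Like 'text item 1' with '?' as the delimiter."""
    return url.partition("?")[0]

def scheme_host_path(url: str) -> str:
    """Rebuild the URL from scheme, host and path only."""
    parts = urlsplit(url)
    # netloc also carries any port or user info, not just the host name.
    return f"{parts.scheme}://{parts.netloc}{parts.path}"

url = ("http://singularityhub.com/2016/06/03/"
       "facial-recognition-tech-will-soon-end-your-anonymity-in-public/"
       "?utm_content=buffer1cf00&utm_medium=social"
       "&utm_source=twitter.com&utm_campaign=buffer")

print(before_question_mark(url))
print(scheme_host_path(url))
```

The two differ when the URL has a "#fragment" but no "?": the split keeps the fragment, while the rebuild drops it.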

Now let’s turn the first bit of code into a bona fide macro:

Run this from an Execute an AppleScript action.

set AppleScript's text item delimiters to "?"
tell application "Safari" to set theURL to text item 1 of (get URL of front document)
set the clipboard to theURL

It will get your URL from Safari, massage it, and put it on the clipboard.

-Chris

Another method... I use this AppleScript to expand shortened URLs and then remove Google tracking codes from them. I have it in a KM macro that then pastes the results. This lets important non-tracking-related parameters in the URL survive the experience, rather than just killing all parameters.

Expand and Clean URL.scpt.zip (5.4 KB)
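To illustrate the "keep the useful parameters" idea, here is a rough Python sketch. It is not the attached script: it does not expand shortened URLs, and it assumes that "Google tracking codes" means parameters whose names start with `utm_`. The names `TRACKING_PREFIXES` and `remove_tracking` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Assumption: only utm_* parameters count as tracking codes.
TRACKING_PREFIXES = ("utm_",)

def remove_tracking(url: str) -> str:
    """Drop tracking parameters but keep every other query parameter."""
    parts = urlsplit(url)
    kept = [(name, value)
            for name, value in parse_qsl(parts.query, keep_blank_values=True)
            if not name.lower().startswith(TRACKING_PREFIXES)]
    # urlencode re-encodes the surviving values, so their escaping may be
    # normalised; an empty result leaves no "?" on the rebuilt URL.
    return urlunsplit(parts._replace(query=urlencode(kept)))

print(remove_tracking(
    "http://example.com/watch?v=abc123&utm_source=twitter.com&utm_medium=social"))
```

In this invented example the `v=abc123` parameter survives, which is the difference from the cut-at-the-? approaches earlier in the thread.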

An alternative (that's less fun) is the StretchLink app, which does all of this quite nicely. (It was the inspiration for my script.)

http://stretchlinkapp.com/


Hey,
I made this:
[g] OnClipboardChange.kmmacros (9.3 KB)

I bet it can be optimised, but there we go.

In the case of an Amazon URL it still contains a reference ID, for example "ref=sr_1_5"

How do I remove that as well?

Pre-Processed URL
https://smile.amazon.com/Heloideo-10000mAh-Portable-External-Lightning/dp/B07B8J6N2L/ref=sr_1_5?s=wireless&ie=UTF8&qid=1527891027&sr=1-5&keywords=power+bank+portable+charger+with+built+in+cable&dpID=41iWej4r4HL&preST=SY300_QL70&dpSrc=srch

Processed URL
https://smile.amazon.com/Heloideo-10000mAh-Portable-External-Lightning/dp/B07B8J6N2L/ref=sr_1_5

Desired URL
https://smile.amazon.com/Heloideo-10000mAh-Portable-External-Lightning/dp/B07B8J6N2L/

Use a Search and Replace action.

Search Clipboard with RegEx:
ref=.+

Replace with empty string.

For a bit broader search, you could use:
\p{L}[\p{L}\d]*=.+

That would match any query parameter whose name starts with a Unicode letter followed by zero or more letters or digits, along with everything after it.

For more info, see Unicode Regular Expressions.
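Both patterns can be tried against the Amazon URL from the question with a short Python sketch. Two assumptions: Python's `re` stands in for KM's engine, and because `re` has no `\p{L}`, the broader pattern is approximated with `[^\W\d_]` (a letter) and `[^\W_]` (a letter or digit).

```python
import re

amazon_url = (
    "https://smile.amazon.com/Heloideo-10000mAh-Portable-External-Lightning"
    "/dp/B07B8J6N2L/ref=sr_1_5?s=wireless&ie=UTF8&qid=1527891027&sr=1-5"
    "&keywords=power+bank+portable+charger+with+built+in+cable"
    "&dpID=41iWej4r4HL&preST=SY300_QL70&dpSrc=srch"
)

# Narrow version: remove "ref=" and everything after it.
narrow = re.sub(r"ref=.+", "", amazon_url)

# Broader version: remove from the first name=value pair onwards.
# Approximation of \p{L}[\p{L}\d]*=.+ for Python's re module.
broad = re.sub(r"[^\W\d_][^\W_]*=.+", "", amazon_url)

print(narrow)
print(broad)
```

For this URL the first "name=" in the string happens to be `ref=`, so both patterns cut at the same place.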

Thank you — I also utilized the regex logic provided by @peternlewis


I just noticed that when I use this, it adds a line break after the URL when pasting into a Google Doc.

How do I resolve this?

The KM Search and Replace did NOT add anything. It removed all text from "ref=" to the end of the string.

Sounds like there was already a CR or LF in the string, which should not have been there if you just copied the URL.

To allow for an end-of-line character, you could use this:
ref=.+\R?

\R is the same as (?:\r?\n|\r)
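A quick way to see what the optional line-break part changes is the Python sketch below. It spells out the expansion given above instead of relying on `\R`, uses Python's `re` as a stand-in for KM's engine, and runs on a shortened, invented version of the Amazon URL with a trailing line break.

```python
import re

# The expansion quoted above for \R, written out explicitly.
LINE_BREAK = r"(?:\r?\n|\r)"

text = "https://smile.amazon.com/dp/B07B8J6N2L/ref=sr_1_5?s=wireless\n"

# Without the optional line break, the trailing "\n" is left behind,
# because "." does not match it.
without_break = re.sub(r"ref=.+", "", text)

# With it, the line break right after the match is removed as well.
with_break = re.sub(r"ref=.+" + LINE_BREAK + "?", "", text)

print(repr(without_break))
print(repr(with_break))
```

This only shows what the pattern does to a string that already ends in a line break; it says nothing about where such a line break might come from in a given macro.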

When I remove the last action (the one that removes the reference number), the extra line issue goes away.

Note: I tried the regex you just provided, but to no avail.

Please post your exact macro, image and file. Can't diagnose what I can't see.
