An Automatic URL Decrufter for Defined Hosts

A few years back, a friend and I were discussing the amount of cruft in many URLs. As an example, here's a URL from a recent Monoprice email newsletter. Note: I changed a few of the characters so these URLs will not work; they're for demonstration purposes only.

http://enews.emails.monoprice.com/q/LLMxJBmP1xo0XEpHXEE5PmLbqHbNxpVsuheZcOJcm9iZ0Bnc3lmZnN3ZWIuY29tw4gZyWuu1KtfpSL58gax3zGue2fhqQ

If you load that in your browser, here's what you'll see in the URL bar:

https://www.monoprice.com/product?p_id=11297&trk_msg=8OLJJ72Q87O4PCIC9HC71NL7VS&trk_contact=T3KPNYKTNJTD6NU53HNJCDD7T4&trk_sid=FGHVB9JP6L3MVLERLES7DLEGEG&trk_link=9HBDB7KBCAEKT949P8P8JQG5J4&cl=res&utm_source=email&utm_medium=email&utm_term=View+product+recommended+for+you&utm_campaign=210902_thursday

But everything after the "p_id=11297" bit is simply tracking information; this is the actual URL:

https://www.monoprice.com/product?p_id=11297

You may not wish to share all that tracking information with companies every time you click one of their URLs. Enter the URL Decrufter…

The Decrufter 8.6 Macros.kmmacros (403 KB)

The Decrufter works whenever you copy a URL in one of the apps it's active in. It first checks the domain to see if it's one you want to decruft, and if so, cleans it up and optionally opens it in your browser (and puts it on the clipboard and saves it for future lookups).
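The domain check at the start can be pictured as a lookup against two word lists: domains to decruft, and domains that bypass curl (both lists come up later in this thread). This Python sketch is hypothetical. The list contents, the function names, and the rule of matching a word against the host's dot-separated labels are assumptions for illustration, not necessarily how the macro does it.

```python
from typing import List, Optional
from urllib.parse import urlsplit

# Hypothetical example lists; the macro keeps its own domain words in set-up macros.
DECRUFT_DOMAINS = ["monoprice", "aliexpress", "youtube"]
BYPASS_CURL_DOMAINS = ["aliexpress", "youtube"]


def matching_word(url: str, words: List[str]) -> Optional[str]:
    """Return the first listed word that equals one of the host's labels."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for word in words:
        if word.lower() in labels:
            return word
    return None


def plan(url: str) -> str:
    """Decide what to do with a copied URL (assumed decision order)."""
    if matching_word(url, DECRUFT_DOMAINS) is None:
        return "ignore"               # not a domain we decruft
    if matching_word(url, BYPASS_CURL_DOMAINS) is not None:
        return "filter"               # final URL is already in hand; skip curl
    return "resolve-then-filter"      # follow redirects first, then filter
```

For example, the enews.emails.monoprice.com link from the top of the post would be planned as "resolve-then-filter", since its host contains the label "monoprice" and Monoprice isn't in the bypass list.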

Cleaning the URL is tricky, as the source URL (the enews.emails.monoprice... one above) doesn't even contain the final domain or the tracking information. So The Decrufter first uses curl to find the actual final destination, which is the second URL above, with all the tracking information attached. It then runs that URL through a series of regex filters, such as the one for Monoprice, to try to strip out the cruft.
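Here's a rough Python sketch of those two steps. It assumes that following HTTP redirects with urllib reaches the same final URL that curl reports (roughly what `curl -sL -o /dev/null -w '%{url_effective}' URL` prints), and the Monoprice regex is a hypothetical stand-in written for the example URLs above, not the macro's actual filter.

```python
import re
import urllib.request


def resolve_final_url(url: str, timeout: float = 10.0) -> str:
    """Follow HTTP redirects and return the last URL reached (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # urlopen follows redirects by default; geturl() reports where it ended up.
        return resp.geturl()


# Hypothetical filter: keep everything up to and including the numeric p_id value.
MONOPRICE_FILTER = re.compile(r"^(https://www\.monoprice\.com/product\?p_id=[0-9]+)")


def clean_monoprice(url: str) -> str:
    """Return capture group 1 if the filter matches, else the URL unchanged."""
    m = MONOPRICE_FILTER.match(url)
    return m.group(1) if m else url
```

Chained together, `clean_monoprice(resolve_final_url(source_url))` would go from the email link to the short product URL, assuming the redirect chain lands on the long tracking URL shown above.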

After cleaning, The Decrufter puts the clean URL on the clipboard, optionally opens it in a browser, and saves it to a history database. If you ever try to decruft the same URL again, it'll load instantly from the database.
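The history database works like a cache keyed on the source URL. This minimal sketch uses Python's built-in sqlite3 module; the table name, the columns, and the use of SQLite at all are assumptions for illustration, since the post doesn't describe the macro's actual database format.

```python
import sqlite3
from typing import Callable, Optional


def open_history(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a hypothetical table mapping source URL -> clean URL."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        " source TEXT PRIMARY KEY,"
        " clean TEXT NOT NULL)"
    )
    return conn


def lookup(conn: sqlite3.Connection, source: str) -> Optional[str]:
    """Return the stored clean URL for `source`, or None if it's new."""
    row = conn.execute("SELECT clean FROM history WHERE source = ?", (source,)).fetchone()
    return row[0] if row else None


def decruft_with_history(conn: sqlite3.Connection, source: str,
                         decruft: Callable[[str], str]) -> str:
    """Use the cached result if present; otherwise decruft once and store it."""
    cached = lookup(conn, source)
    if cached is not None:
        return cached                 # the "loads instantly" case: no curl, no regex
    clean = decruft(source)
    with conn:                        # commits the insert when the block succeeds
        conn.execute(
            "INSERT OR REPLACE INTO history (source, clean) VALUES (?, ?)",
            (source, clean),
        )
    return clean
```

The expensive work (curl plus the regex filters) only runs on the first sighting of a source URL; every later request for the same URL is a single indexed lookup.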

Please read the instructions in the very first macro in the group, usefully named ━━ The Decrufter 8.6 ━━, for more details on using the macro. If you have questions, please ask!

Latest Release

8.6 (Dec 9 2023): This is a huge update that removes the use of a large global variable for tracking decrufted URLs. Instead, there's a new database, which is faster, safer, and much easier to use. (I removed about 10 regex search/replace functions thanks to the database.) There's also a new curl "progress bar" to let you know that the macro is still waiting on curl for a response. I also fixed a lot of other little things to make it run faster and fail more nicely.

Older Releases

8.5 (Oct 9 2023): A minor update that changes some variable names to match the convention I use for my other public macros, and updates the flying.com decrufting routine to handle a new URL.

8.4 (May 5 2023): This version has the Facebook typo fix, a simple YouTube decrufter, and some improvements in logic in a few routines.

8.3 (Apr 11 2023): Though I fixed the custom URL feature in 8.2, that fix broke everything else. Whoops. Should be all better now.

8.2 (Apr 11 2023): This version fixed an issue with custom domains for both filtering and non-curling, added aliexpress as a decrufting domain, and updated the in-macro help comments.

I'd love your feedback, and if you have domains that you'd like to see filtered, feel free to contact me with the copied URL (the source URL from the email, etc., not the final URL!) and I'll try to get them into the macro.

-rob.


Tags: @clean, @sanitize, @url, @privacy, @email, @strip, @tracking, @shorten


Important update today! (No need to add "robservatory" to the domain list. Not yet, anyway. 🙂)

I forgot I had posted this here :). The blog post that Laine linked to has the current-version download, or just click here to get the latest version.

The version here is quite out of date now; it still works fine, but the updated one is much faster and smarter about a number of things.

-rob.

Since the last post here, the Decrufter has been revised with a couple of key changes: one in how copied URLs are opened (it's no longer automatic), and one in its structure, which has been modified to make updating easier for users who customize the macro. (There will be some one-time work with this update to enable those easier future updates.)

-rob.

Would you link your URL Decrufter toggler macro, please? Thanks.

It exists anecdotally in the comments to the blog post, but maybe we could add it here. I can, if you like. Thanks.

Sure. If you want to toggle the Decrufter off and on, make a macro with this single step, and assign it to your trigger of choice:

If: macro The Decrufter is enabled
Then: Disable macro The Decrufter
Else: Enable macro The Decrufter

Basically toggles it from whatever state it's in to the other state. (Mine also displays a notification so I know which state it's presently in.)

-rob.

I'm having a little trouble adding YouTube. If I add youtube to the custom list, copy and rename a "dn-..." macro, then clear the d_finalURL in the copied macro (no matter which one I copy), the resulting copied URL stays the same as it was before I tried to customize anything. The final URL is always "YouTube".

Is it the login that complicates it? Something else? Can you make it work? Thanks.

I used the "most common" method at https://regex101.com, and got the result I wanted. Why doesn't that work for the Decrufter?

I've never seen a "crufty" YouTube URL, so I have no idea what you're working with. Can you provide an example?

-rob.

Thanks!

pre-decrufted:
https://www.youtube.com/watch?v=i7AoBLgKyns&list=PLi4l3wxwkqyy9EDOH4VzhhEL-mjOUoaa8&index=12
or
https://www.youtube.com/watch?v=inWLsKbbTIk&list=PLi4l3wxwkqyykJWG7gUjw0nh-Mv_Xy_00&index=3

aiming for:
https://www.youtube.com/watch?v=i7AoBLgKyns
or
https://www.youtube.com/watch?v=inWLsKbbTIk

I have a few customized "dn-..." macros; what confuses me about this one is that the default ([^&]*) works as expected when I test it against that pre-decrufted string at regex101.com.

Thanks again.

Lantro

I can't get some curl-bypassing URLs to be successfully decrufted. I deleted my existing macro group and reinstalled URL Decrufter from the link provided. The example Monoprice URL you included in your original post was decrufted as described, but the following example from my email wasn't:

https://www.monoprice.com/products/product.asp?c_id=303&cp_id=30307&cs_id=3030727&p_id=5384&seq=1&format=2

Same problem for the YouTube URLs included in my first reply, even though I've successfully added YouTube to the list for decrufting, added it to the list for curl bypassing, and given YouTube a renamed and reconfigured copy of the "dn-Woot" macro. According to rubular.com, the following regex appeared to give me the URL I need for the YouTube example, but it doesn't seem to work in URL Decrufter:

^[^&]*

But even that doesn't give me the right URL for the link from my Monoprice email, either in URL Decrufter or on rubular.com. That URL would apparently need a regex that trims off only the last two "&" segments.

I'm not too worried about the Monoprice URL, but getting that Youtube URL to decruft would help a lot.

Thanks again.

Lantro

Not quite done with the Monoprice process, but I think I've finally got YouTube taken care of. It appears that the "Domain <-> filter matching" macro wasn't quite right. Thanks again.

The problem is that companies change their cruft structure all the time, so it becomes a game of whack-a-mole, and one I'm not finding the time to keep up with very well.

For YouTube, everything after the "v" (video ID) parameter is just the playlist the video is on. It's not really carrying personal information about you the way a regular crufty URL does, and handling those crufty URLs (from mailing lists) is what I wrote the thing for in the first place. But it would be trivial to chop it, as you just need to drop everything at and after the first ampersand. This regex does just that:

^(.*?)&.*

With The Decrufter, you always need a capture group in the regex, because the captured text is what gets passed along to the next step; it's stored in the d_trimmedURL variable. Your example above doesn't have a capture group, so nothing is captured.
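To see the capture-group point in action, here's a small Python sketch using the two patterns from this exchange. Python's re module stands in for Keyboard Maestro's regex engine here; these simple patterns should behave the same way in both, but that's an assumption. The `captured` helper is hypothetical and only mimics "store group 1 in d_trimmedURL".

```python
import re
from typing import Optional

# rob's YouTube filter: capture everything before the first "&".
WITH_GROUP = r"^(.*?)&.*"
# The pattern from the question: it matches the same text, but captures nothing.
WITHOUT_GROUP = r"^[^&]*"


def captured(url: str, pattern: str) -> Optional[str]:
    """Mimic storing capture group 1 in a variable such as d_trimmedURL.

    Returns None when the pattern doesn't match or has no capture group.
    """
    m = re.match(pattern, url)
    if m is None or m.re.groups == 0:
        return None
    return m.group(1)
```

Both patterns match "https://www.youtube.com/watch?v=i7AoBLgKyns" as the leading text of the playlist URL, which is why they look identical on regex101.com, but only the first one hands anything to the next step. Note also that `^(.*?)&.*` doesn't match a URL with no "&" at all, so in this sketch a plain video link comes back as None and should be left as-is.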

As for Monoprice, that one's tricky because you're going to need to build a new URL on the fly, grabbing bits of the original (after bypassing curl). The actual product URL for the item in your example appears to be this:

https://www.monoprice.com/product?p_id=5384

Using that as the target (and remember that it's early and I only took one shot at this, so I doubt it's the simplest version possible), this regex should work:

^(.*product)s\/product.asp(\?).*&(p_id=[0-9]+).*

That turns your long URL into the actual product URL. Note: In the search/replace action, you'd want to set the "All:" field to d_trimmedURL, not the "1:" (or "2:" or "3:") fields. The regex creates three separate capture groups to build the URL, and using "All:" combines them into one string that's stored in d_trimmedURL.
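Here's the same regex run in Python, joining the three capture groups the way the "All:" field is described as doing. The `rebuild` helper is hypothetical, and Python's re module stands in for Keyboard Maestro's regex engine (assumed to behave the same for this pattern).

```python
import re
from typing import Optional

# rob's regex from the post above, unchanged.
OLD_PRODUCT_URL = re.compile(r"^(.*product)s\/product.asp(\?).*&(p_id=[0-9]+).*")


def rebuild(url: str) -> Optional[str]:
    """Concatenate all capture groups, mirroring the "All:" field described above."""
    m = OLD_PRODUCT_URL.match(url)
    if m is None:
        return None
    # For the example URL, groups() is
    # ("https://www.monoprice.com/product", "?", "p_id=5384").
    return "".join(m.groups())
```

Because the pattern needs an "&" right before p_id, it assumes p_id isn't the first query parameter, which holds for the old-style product.asp URL from the email.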

Hope that helps;
-rob.

Yes, of course, it helps very much! Thanks so much again.

Sincerely,
Lantro

Hi Rob,

Could you help with URLs starting at aliexpress.com? The resulting URLs are always redirected to aliexpress.us, I think. If you browse and select any link, I think you'll see what I mean. Thanks.

There's good news and bad news about that one. The good news is that the final URL is right there in the crufty URL, which makes it easy to process. The bad news is that I found a slight glitch in 8.1 that means any user-added domains and non-curl domains will not work (doh!).
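When the destination is already embedded in the crufty URL, there's nothing for curl to resolve; the job is just to pull it out and decode it. This Python sketch assumes the destination sits, percent-encoded, in one of the query-string parameters. The thread doesn't say which parameter AliExpress uses, so the helper scans them all, and the example URL in the note below is made up.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlsplit


def embedded_url(url: str) -> Optional[str]:
    """Return the first query-string value that is itself an http(s) URL, or None."""
    # parse_qsl splits on "&" and "=" and percent-decodes each key and value.
    for _key, value in parse_qsl(urlsplit(url).query):
        if value.startswith(("http://", "https://")):
            return value
    return None
```

For a hypothetical link like https://click.example.com/track?id=42&dest=https%3A%2F%2Fwww.aliexpress.us%2Fitem%2F123.html, this returns https://www.aliexpress.us/item/123.html, which a regex filter could then trim like any other final URL.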

I'll ship 8.2 soon with a fix for custom domains. For now, if you want to make aliexpress work, just add it (only the word aliexpress) to the domain lists in the macros "12] Set up domains to decruft" and "14] Set up domains that bypass curl". When 8.2 comes out, aliexpress will already be in those lists, so it will continue to work.

-rob.

And 8.2 is live now; checking for updates in version 8.1 should find and download it.

-rob.

And 8.3 is now live, because I do stupid things like breaking everything else when fixing some other thing. Should all be better now.

-rob.

Thanks a million, Rob!