How to trim URL to root domain?

Basically, I need to check the age of a domain. When I copy any URL, it should be converted from http://www.google.com/xyz to google.com

Then I can tell Keyboard Maestro to launch Terminal, type in whois %CurrentClipboard%, pause a second, press CMD+F, and search for “Creation Date” to get the domain age quickly.

PS: There’s no good, reliable Google Chrome extension for checking domain age.

I’m struggling to convert a URL to its root domain. Maybe via a regex search and replace on the clipboard?
I've got a regex, (\w+)\.(com|net|gov|edu|co), that matches the root domain, but I'm not sure how to proceed.

Maybe someone has an even more elegant way to find the domain age, perhaps in a format like "1 Years, 45 days old", which is easier on my eyes than mm-dd-yyyy?

If you are on the web page of interest, then the easiest is to use this simple JavaScript:

document.domain;

If you really need to get the domain from a URL, just do a google search for "regex extract domain from url" and I'm sure you'll get many hits.
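If the URL is only available as a string (for example from the clipboard), the standard `URL` class can pull out the host without a regex. This is a minimal sketch, assuming an environment that provides `URL` as a global (current browsers, Node.js); `hostOf` is a hypothetical helper name. Like document.domain, it returns the full host name, so a leading "www." is still there and the result is not yet the root domain.

```javascript
// Hypothetical helper: return the host name of a URL string.
// Assumes the standard URL class is available as a global.
function hostOf(urlString) {
  // new URL() needs a scheme, so add one if the input has none.
  const hasScheme = /^[a-z][a-z0-9+.-]*:\/\//i.test(urlString);
  const withScheme = hasScheme ? urlString : "http://" + urlString;
  return new URL(withScheme).hostname;
}

console.log(hostOf("http://www.google.com/xyz")); // www.google.com
console.log(hostOf("files.google.com/xyz"));      // files.google.com
```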

Yeah, that's the tricky part. For easy stuff like the provided sample (http://www.google.com/xyz) something like this should work:

^(?:.*://)?(?:www\.)?([^:/]*).*$

This should cover URLs like:

It will not strip subdomains other than "www", as in "files.google.com/xyz". To make a regex work with all kinds of URLs, it seems you need a complete list of TLDs (because of TLDs like "co.uk"). See also this post on Stack Overflow.
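To see what this pattern does, here is a small JavaScript sketch. The regex is the one from the post, only with the slashes escaped for a regex literal; `hostWithoutWww` is a hypothetical helper name.

```javascript
// The pattern from the post, written as a JavaScript regex literal
// (slashes have to be escaped inside /.../).
const stripWww = /^(?:.*:\/\/)?(?:www\.)?([^:\/]*).*$/;

// Hypothetical helper: return capture group 1, or null if nothing matched.
function hostWithoutWww(url) {
  const m = stripWww.exec(url);
  return m ? m[1] : null;
}

console.log(hostWithoutWww("http://www.google.com/xyz")); // google.com
console.log(hostWithoutWww("files.google.com/xyz"));      // files.google.com
```

The second call shows the limitation: a subdomain other than "www" is kept.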

The best bet would probably be using perl with the URI module, or something similar.


Well, here is a macro with the aforementioned regex. It also does the Whois query and the conversion of the creation date to age:

[test] Get Domain Age.kmmacros (4.8 KB)

To test it paste your URL into the green action. To use the content of your clipboard paste %CurrentClipboard% into the field.
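For readers who want the date-to-age step outside the macro, here is a rough JavaScript sketch. It is not what the macro does internally. It assumes the whois output contains a line of the form "Creation Date: <ISO date>", which many registries print but not all; `creationDate` and `formatAge` are hypothetical helper names.

```javascript
// Hypothetical helper: find the first "Creation Date:" line in whois output
// and return it as a Date, or null if there is none or it cannot be parsed.
function creationDate(whoisText) {
  const m = /^\s*Creation Date:\s*(\S+)/im.exec(whoisText);
  if (!m) return null;
  const d = new Date(m[1]);
  return Number.isNaN(d.getTime()) ? null : d;
}

// Format the span between two dates as "N years, M days" (computed in UTC).
function formatAge(created, now) {
  let years = now.getUTCFullYear() - created.getUTCFullYear();
  // Most recent anniversary of the creation date that is not after "now".
  let anniversary = new Date(created.getTime());
  anniversary.setUTCFullYear(created.getUTCFullYear() + years);
  if (anniversary > now) {
    years -= 1;
    anniversary = new Date(created.getTime());
    anniversary.setUTCFullYear(created.getUTCFullYear() + years);
  }
  const days = Math.floor((now - anniversary) / 86400000);
  return `${years} years, ${days} days`;
}

// Sample line in the shape assumed above (an assumption, not real output).
const sample = "   Creation Date: 1997-09-15T04:00:00Z\n";
console.log(formatAge(creationDate(sample), new Date()));
```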


Edit/PS:

If you do have to deal with subdomains other than "www" and you do not have TLDs consisting of two parts (co.uk etc.) then try this regex:

^(?:.*://)?(?:.*?\.)?([^:/]*?\.[^:/]*).*$

This handles URLs like

but it will fail with
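A quick way to check this second pattern is to run it over a few inputs in JavaScript. The regex below is the one from the post with the slashes escaped; `dropLeadingLabel` is a hypothetical helper name. As the outputs suggest, the pattern in effect drops one leading label when the host has three or more, which is why a bare two-part TLD host goes wrong.

```javascript
// The second pattern from the post as a JavaScript regex literal.
const secondPattern = /^(?:.*:\/\/)?(?:.*?\.)?([^:\/]*?\.[^:\/]*).*$/;

// Hypothetical helper: return capture group 1, or null if nothing matched.
function dropLeadingLabel(url) {
  const m = secondPattern.exec(url);
  return m ? m[1] : null;
}

console.log(dropLeadingLabel("http://files.google.com/xyz")); // google.com
console.log(dropLeadingLabel("http://google.com/xyz"));       // google.com
console.log(dropLeadingLabel("google.co.uk"));                // co.uk (wrong)
```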

@AkshayHallur, since @Tom stepped out with a full solution, he inspired me to work on the RegEx.

I found a two-step solution that should work with ANY TLD, and any server name (like "www" or "test", or none).

First RegEx to Extract Server Name:
(I started with @Tom's RegEx)
^(?:.*:\/\/)?([^:\/]*).*$

See regex101: build, test, and debug regex

Second RegEx to Extract Domain Name from Server Name:
[^\.]+\.[^\.]+$

See regex101: build, test, and debug regex

Example Output

Given this URL: https://server1.google.somelongTDL/xyz
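The two steps can be sketched in JavaScript as follows. The two regexes are the ones given above; `extractDomain` is a hypothetical helper name, and this is only an illustration of the idea, not the macro itself.

```javascript
// Step 1: capture the server (host) name, dropping scheme, port and path.
const hostPattern = /^(?:.*:\/\/)?([^:\/]*).*$/;
// Step 2: keep only the last two dot-separated labels of the host.
const domainPattern = /[^\.]+\.[^\.]+$/;

// Hypothetical helper combining both steps; returns null if either step fails.
function extractDomain(url) {
  const hostMatch = hostPattern.exec(url);
  if (!hostMatch) return null;
  const domainMatch = domainPattern.exec(hostMatch[1]);
  return domainMatch ? domainMatch[0] : null;
}

console.log(extractDomain("https://server1.google.somelongTDL/xyz")); // google.somelongTDL
```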

Here's the macro:

## Macro Library   @RegEx Extract Domain Name [Example]


#### DOWNLOAD:
<a class="attachment" href="/uploads/default/original/2X/2/2837417161ccf5c0db822e7a9ed602419a281562.kmmacros">@RegEx Extract Domain Name [Example].kmmacros</a> (3.3 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**

---


<img src="/uploads/default/original/2X/1/1b012163fa9c81df2bf4e913d944c9452902bf34.png" width="459" height="873">

You probably haven't seen the edit to my post.

Your regex seems to work similarly: it reliably strips all subdomains ("servers"), but it fails with 2-part TLDs like "co.uk".

Nope, didn't see your edit until now.

So, would all 2-part TLDs end with a 2-character string?
Or would they all be preceded by "co."?

So, I go back to my OP: If you are on the web page of interest, just use the JavaScript:
document.domain

rather than copying or getting the URL via %ChromeURL%, then trying to parse it.

I like this php function from your SO link:

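function parse_url_all($url){
    // Prepend a scheme if missing so that parse_url() can find the host.
    $url = substr($url,0,4)=='http'? $url: 'http://'.$url;
    $d = parse_url($url);
    // Split the host into its dot-separated labels.
    $tmp = explode('.',$d['host']);
    $n = count($tmp);
    if ($n>=2){
        // Heuristic: 4 labels, or 3 labels with a short (<= 3 chars) second-to-last
        // label such as "co", are treated as a 2-part TLD; keep the last 3 labels.
        if ($n==4 || ($n==3 && strlen($tmp[($n-2)])<=3)){
            $d['domain'] = $tmp[($n-3)].".".$tmp[($n-2)].".".$tmp[($n-1)];
            $d['domainX'] = $tmp[($n-3)];
        } else {
            // Otherwise keep the last 2 labels.
            $d['domain'] = $tmp[($n-2)].".".$tmp[($n-1)];
            $d['domainX'] = $tmp[($n-2)];
        }
    }
    return $d;
}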

I think I’ll try to convert it to JavaScript.
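A possible JavaScript port might look like the sketch below. It assumes the standard `URL` class is available as a global, and it only carries over the host and path instead of everything PHP's parse_url returns; `parseUrlAll` is a hypothetical name. The length-based guess is the same as in the PHP version, so it shares its weak spots.

```javascript
// Sketch of a JavaScript port of the PHP function above.
function parseUrlAll(url) {
  // Like the PHP version, prepend a scheme when the input does not start with "http".
  const full = url.startsWith("http") ? url : "http://" + url;
  const parsed = new URL(full);
  const result = { host: parsed.hostname, path: parsed.pathname };

  const labels = parsed.hostname.split(".");
  const n = labels.length;
  if (n >= 2) {
    // Same heuristic as the PHP code: 4 labels, or 3 labels with a short
    // second-to-last label (e.g. "co"), are treated as a 2-part TLD.
    const threePart = n === 4 || (n === 3 && labels[n - 2].length <= 3);
    if (threePart) {
      result.domain = labels.slice(n - 3).join(".");
      result.domainX = labels[n - 3];
    } else {
      result.domain = labels.slice(n - 2).join(".");
      result.domainX = labels[n - 2];
    }
  }
  return result;
}

console.log(parseUrlAll("http://www.google.com/xyz").domain); // google.com
console.log(parseUrlAll("google.co.uk").domain);              // google.co.uk
console.log(parseUrlAll("www.db.de").domain);                 // www.db.de (heuristic misfires)
```

The last call shows where the guess breaks down: a short registered name such as "db" looks like the first half of a 2-part TLD.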

Yes, looks promising :slight_smile:

However, I still have doubts about how to correctly handle, for example,

files.google.de
vs
google.co.uk

without having lexical knowledge of the possible TLD combinations. (?)

Here's an idea:

If you're not on the web page, open the URL, and then run the JavaScript.
You might be able to do this with CURL?

You mean to “pipe” the curl output into document.domain; ?

I don’t know how to use the shell in JavaScript. You are the JS guru :wink:

This perl script seems to work fine:

#!/usr/bin/env perl

use 5.010;
use strict;
use warnings;

use Domain::PublicSuffix;

my $suffix = Domain::PublicSuffix->new({'data_file' => '/tmp/effective_tld_names.dat'});
my $root = $suffix->get_root_domain('server1.db.de');
# my $root = $suffix->get_root_domain('db.de');
# my $root = $suffix->get_root_domain('server1.db.co.uk');


print $root;

Requires the Domain::PublicSuffix module. It makes use of a list with TLDs. Explanation here.
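To illustrate the list-based idea in JavaScript, here is a minimal sketch. It is not how Domain::PublicSuffix is implemented. The `SUFFIXES` set is a tiny hand-picked subset chosen for this example, and the real public suffix list also has wildcard and exception rules that this sketch ignores; `rootDomain` is a hypothetical helper name.

```javascript
// Tiny illustrative subset of public suffixes (an assumption for this example).
const SUFFIXES = new Set(["com", "de", "uk", "co.uk"]);

// Hypothetical helper: return the registrable domain (suffix plus one label),
// or null if the suffix is unknown or the host is itself a bare suffix.
function rootDomain(host) {
  const name = host.toLowerCase();
  if (SUFFIXES.has(name)) return null;
  const labels = name.split(".");
  // Try the longest candidate suffix first, keeping one label in front of it.
  for (let i = 1; i < labels.length; i++) {
    const suffix = labels.slice(i).join(".");
    if (SUFFIXES.has(suffix)) {
      return labels.slice(i - 1).join(".");
    }
  }
  return null;
}

console.log(rootDomain("server1.db.de"));    // db.de
console.log(rootDomain("db.de"));            // db.de
console.log(rootDomain("server1.db.co.uk")); // db.co.uk
```

Because the longest suffix is tried first, "co.uk" wins over "uk", which is what the regex-only approaches cannot decide.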

How do we get/install this module?
I read the cpan.org article, but it says:

This module will attempt to search etc directories in /usr/share/publicsuffix, /usr, /usr/local, and /opt/local for the effective_tld_names.dat file. If a file is not found, a default file is loaded from Domain::PublicSuffix::Default, which is current at the time of the module's release. You can override the data file path by giving the new() method a 'data_file' argument.

does your script do that?

Seems so, yes. I haven't done big testing, but it works fine with the examples I've put in the script above.

To install a Perl module, use cpan or, more conveniently, cpanminus. You can install cpanminus with Homebrew:

brew install cpanm

Before that, if you have not already done so, install perl with brew install perl (instead of using the system perl).

OK, I installed both in the order you said:
brew install perl
brew install cpanm

but when I run from a KM Execute Shell Script, I get this:

Can't locate Domain/PublicSuffix.pm in @INC (you may need to install the Domain::PublicSuffix module) (@INC contains: /Library/Perl/5.18/darwin-thread-multi-2level /Library/Perl/5.18 /Network/Library/Perl/5.18/darwin-thread-multi-2level /Network/Library/Perl/5.18 /Library/Perl/Updates/5.18.2 /System/Library/Perl/5.18/darwin-thread-multi-2level /System/Library/Perl/5.18 /System/Library/Perl/Extras/5.18/darwin-thread-multi-2level /System/Library/Perl/Extras/5.18 .) at /var/folders/hb/6xgg0y8j4g530m81rd1f9mpc0000gn/T/Keyboard-Maestro-Script-51EF52D5-FB9D-48E7-B9B0-BF516C979CFF line 7.
BEGIN failed--compilation aborted at /var/folders/hb/6xgg0y8j4g530m81rd1f9mpc0000gn/T/Keyboard-Maestro-Script-51EF52D5-FB9D-48E7-B9B0-BF516C979CFF line 7.

I'm a total shell script dummy, so I have no idea what this means, except that it could not find Domain/PublicSuffix.pm.

Any ideas on how to fix?

Does the script run outside KM?

(Copy it to a BBEdit document, save it as foo.pl, then hit ⌥⌘R)

Nope. Same basic error:

/Users/Shared/Dropbox/SW/DEV/Projects/[KM] Extract Domain Name/Get-Domain.pl:7: Can't locate Domain/PublicSuffix.pm in @INC (you may need to install the Domain::PublicSuffix module) (@INC contains: /usr/local/Cellar/perl/5.26.0/lib/perl5/site_perl/5.26.0/darwin-thread-multi-2level /usr/local/Cellar/perl/5.26.0/lib/perl5/site_perl/5.26.0 /usr/local/Cellar/perl/5.26.0/lib/perl5/5.26.0/darwin-thread-multi-2level /usr/local/Cellar/perl/5.26.0/lib/perl5/5.26.0 /usr/local/lib/perl5/site_perl/5.26.0)

To install "Domain::PublicSuffix module", do I need to do it from any particular dir?
Last time I did it from my home dir.

I just tried installing it again, and got this:

iMac-27-JMU:~ jimunderwood$ brew install cpanm
Warning: cpanminus 1.7043 is already installed
iMac-27-JMU:~ jimunderwood$ 

Any ideas?

When installing perl via Homebrew, did you see and follow the instructions that said something like this?

PERL_MM_OPT="INSTALL_BASE=$HOME/perl5" cpan local::lib
echo 'eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"' >> ~/.bash_profile

Or: do you have the line

eval "$(perl -I$HOME/perl5/lib/perl5 -Mlocal::lib)"

in your .bash_profile ?

Don't remember that.
I don't seem to have a bash profile:

iMac-27-JMU:~ jimunderwood$ open -a BBEdit.app .bash_profile
The file /Users/jimunderwood/.bash_profile does not exist.

How do I create it, and what should be in it?

Then is it in your ~/.bashrc ?