Extract Domain from URL With RegEx

Not sure of the exact terminology, but by "clean" I just mean walmart.com as opposed to https://walmart.com/

Here are some examples I'd be working with:

http://careers.walmart.com/?codes=Linkedin&utm_source=Linkedin&utm_campaign=ALL&utm_medium=SocialOrganic&utm_term=AboutUs&utm_content=Branding

http://publix.jobs/

https://jobs.gapinc.com/

https://forum.keyboardmaestro.com/t/simple-search-with-regex-question/15733/5

\.(.*)/
^ This worked 90% of the time, capturing everything between the first . and the last /, but it didn't work when a URL didn't fit those parameters.
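This small JavaScript sketch runs that first pattern over the four example URLs so you can see where it breaks. It is not from the thread. The names `firstAttemptCapture` and `questionSamples` are made up for illustration.

```javascript
// The question's first attempt: capture everything between the first "."
// and the last "/". (.*) is greedy, so it runs all the way to the LAST slash.
const firstAttempt = /\.(.*)\//;

// Hypothetical helper: return the captured group, or null if no match.
function firstAttemptCapture(url) {
  const m = url.match(firstAttempt);
  return m ? m[1] : null;
}

const questionSamples = [
  "http://careers.walmart.com/?codes=Linkedin&utm_source=Linkedin&utm_campaign=ALL&utm_medium=SocialOrganic&utm_term=AboutUs&utm_content=Branding",
  "http://publix.jobs/",
  "https://jobs.gapinc.com/",
  "https://forum.keyboardmaestro.com/t/simple-search-with-regex-question/15733/5",
];

for (const url of questionSamples) {
  console.log(firstAttemptCapture(url));
}
```

The walmart and gapinc URLs come out as walmart.com and gapinc.com. publix.jobs has no subdomain, so the first dot sits inside the domain and you get only `jobs`. The forum URL has slashes in its path, so the capture runs on to `keyboardmaestro.com/t/simple-search-with-regex-question/15733`.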

This should do the trick:
(?m)http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?)(?:\/|$)

For details, see regex101: build, test, and debug regex
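Here is a minimal sketch of the suggested pattern ported to a JavaScript regex literal. JavaScript does not accept the inline `(?m)` modifier, so multiline mode goes in the trailing `m` flag instead. Otherwise the pattern is unchanged. `extractDomain` and `sampleUrls` are hypothetical names.

```javascript
// The suggested pattern; "m" replaces the inline (?m), so $ matches at line ends.
// The lazy .*? skips as little as possible, and the capture group must then be
// "label.label" followed by "/" or the end of the line.
const domainPattern = /http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?)(?:\/|$)/m;

// Hypothetical helper: return the captured domain, or null if no match.
function extractDomain(url) {
  const m = url.match(domainPattern);
  return m ? m[1] : null;
}

const sampleUrls = [
  "http://careers.walmart.com/?codes=Linkedin&utm_source=Linkedin&utm_campaign=ALL&utm_medium=SocialOrganic&utm_term=AboutUs&utm_content=Branding",
  "http://publix.jobs/",
  "https://jobs.gapinc.com/",
  "https://forum.keyboardmaestro.com/t/simple-search-with-regex-question/15733/5",
];

for (const url of sampleUrls) {
  console.log(extractDomain(url));
}
```

For the four sample URLs this should print walmart.com, publix.jobs, gapinc.com and keyboardmaestro.com.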


@JMichaelTX

Do you know what alteration could be made to account for domain endings like .co.uk or .bc.ca, etc.?

Well, this is tough. Provided these extensions always end in two characters, this should work:
(?m)http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?(?:\.\w{2})?)(?:\/|$)

For details, see regex101: build, test, and debug regex

This is NOT perfect, and is just my best guess. Again, the top-level domain MUST be exactly two characters for this to work, as in:
https://www.somedomain.co.uk/

This will work with and without the two-character extension:
https://forum.keyboardmaestro.com/
https://www.somedomain.co.uk/
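As a quick check, this sketch runs the two-character version against those two URLs. It uses the same JavaScript port as before: the `m` flag stands in for `(?m)`. The helper name `extractDomainTwoLetter` is hypothetical.

```javascript
// Same pattern plus an optional two-letter extension: (?:\.\w{2})?
const twoLetterPattern = /http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?(?:\.\w{2})?)(?:\/|$)/m;

// Hypothetical helper: return the captured domain, or null if no match.
function extractDomainTwoLetter(url) {
  const m = url.match(twoLetterPattern);
  return m ? m[1] : null;
}

console.log(extractDomainTwoLetter("https://forum.keyboardmaestro.com/")); // keyboardmaestro.com
console.log(extractDomainTwoLetter("https://www.somedomain.co.uk/"));      // somedomain.co.uk
```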


Thanks! I'm only concerned with .co.uk and all of the Canadian ones, which are all two characters as well, so hopefully this works just fine.

@JMichaelTX

It's working great, except that if the domain ends in just two characters (e.g. just .ca or just .co), the www. is not stripped from the URL.
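The sketch below reproduces that problem. It also tries one possible tweak that is not from the thread: consume an optional `www.` right after the scheme, so it can never land inside the capture group. The URL `https://www.example.ca/` and the helper `capture` are hypothetical.

```javascript
// The two-letter pattern as posted, ported to JavaScript.
const twoLetter = /http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?(?:\.\w{2})?)(?:\/|$)/m;

// Hypothetical tweak: (?:www\.)? eats a leading "www." before the lazy .*?
const twoLetterNoWww = /http(?:s?):\/\/(?:www\.)?.*?([^\.\/]+?\.[^\.]+?(?:\.\w{2})?)(?:\/|$)/m;

// Hypothetical helper: return group 1 for the given pattern, or null.
function capture(pattern, url) {
  const m = url.match(pattern);
  return m ? m[1] : null;
}

const caUrl = "https://www.example.ca/"; // hypothetical example URL
console.log(capture(twoLetter, caUrl));      // www.example.ca (the reported problem)
console.log(capture(twoLetterNoWww, caUrl)); // example.ca
console.log(capture(twoLetterNoWww, "https://www.somedomain.co.uk/")); // somedomain.co.uk
```

This only special-cases `www.`. Any other subdomain in front of a two-letter country ending (for example careers.walmart.ca) is still kept, because the regex alone cannot tell `walmart.ca` from `co.uk`.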

Because of the complexity added by all of the new top-level domains in recent years, it becomes much more challenging to extract parts of a URL. I'd suggest that you post your full request at stackoverflow.com so you can reach a much broader audience of RegEx users.
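A common non-regex approach is to take the host and compare its ending against a list of known multi-part suffixes. Mozilla's Public Suffix List is the usual source for that list. This is a minimal sketch under two assumptions. First, it needs a standard JavaScript runtime with the WHATWG `URL` class, such as Node.js or a browser; JXA may not provide it. Second, `MULTI_LABEL_SUFFIXES` is a tiny hypothetical stand-in for the real list, and `registrableDomain` is a made-up name.

```javascript
// Hypothetical, deliberately tiny suffix set. A real solution would load the
// full Public Suffix List instead of hard-coding a few entries.
const MULTI_LABEL_SUFFIXES = new Set(["co.uk", "bc.ca", "on.ca", "qc.ca"]);

function registrableDomain(urlString) {
  // Let the URL parser isolate the host (drops scheme, port, path, query).
  const host = new URL(urlString).hostname.toLowerCase();
  const labels = host.split(".");
  if (labels.length < 2) return host;

  // Keep three labels when the last two form a known multi-part suffix,
  // otherwise keep the last two (this also drops "www." and other subdomains).
  const lastTwo = labels.slice(-2).join(".");
  const keep = MULTI_LABEL_SUFFIXES.has(lastTwo) ? 3 : 2;
  return labels.slice(-keep).join(".");
}

console.log(registrableDomain("https://www.somedomain.co.uk/")); // somedomain.co.uk
console.log(registrableDomain("https://www.example.ca/"));       // example.ca
console.log(registrableDomain("https://jobs.gapinc.com/"));      // gapinc.com
```

The list does the work that the regex cannot. Any ending missing from the set falls back to "last two labels", so the result is only as good as the suffix list you load.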


@rayk Do you have to use a regex to do this? It would be quite easy with a JXA action and NSURL.

Typing this from memory, so hopefully I’ve got it right (I’ll clean it up later if not, once I am back on my Mac)...

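(() => {
	'use strict';

	// Read the URL from the Keyboard Maestro variable named "YourURL"
	const yourURL = Application('Keyboard Maestro Engine').getvariable('YourURL');

	// Let Foundation's NSURL parse it (Foundation is bridged by default in JXA)
	const url = $.NSURL.URLWithString(yourURL);

	// .host is the full host name (e.g. www.somedomain.co.uk); .js unwraps it to a JS string
	const domain = url.host.js;

	return domain;
})();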

Put that in an Execute JavaScript for Automation action (changing the YourURL variable name as appropriate) and it should get you what you want.

NSURL has the ability to pull out other components from a URL as well.
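NSURL is only available on macOS, through JXA or AppleScriptObjC. In a standard JavaScript runtime (Node.js or a browser), the WHATWG `URL` class gives a similar breakdown. This sketch shows that analog, not NSURL itself. `urlParts` and the field names in the returned object are hypothetical. Note that `hostname` keeps any `www.` prefix, just as NSURL's `host` does.

```javascript
// Hypothetical helper: split a URL into a few common components.
function urlParts(urlString) {
  const u = new URL(urlString);
  return {
    scheme: u.protocol.replace(/:$/, ""),     // "https:" -> "https"
    host: u.hostname,                         // full host, including any "www."
    path: u.pathname,                         // e.g. "/t/.../15733/5"
    query: Object.fromEntries(u.searchParams) // query string as key/value pairs
  };
}

const parts = urlParts("http://careers.walmart.com/?codes=Linkedin&utm_source=Linkedin&utm_campaign=ALL");
console.log(parts.scheme);           // http
console.log(parts.host);             // careers.walmart.com
console.log(parts.query.utm_source); // Linkedin
```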

That is NOT working for me.
For "YourURL" of:
https://www.somedomain.co.uk/

it returns:
www.somedomain.co.uk

I believe what the OP wants here is:
somedomain.co.uk

BTW, just for clarity, in the script the parameter:
'YourURL'

in the statement
.getvariable('YourURL')

refers to the KM Variable that contains the actual URL.

This is improved for the next major version.

  • Added Filter URL components such as scheme, host and path.