rayk
October 16, 2019, 8:50pm
1
Not sure of the exact terminology, but by "clean" I just mean walmart.com as opposed to https://walmart.com/
Here are some examples I'd be working with:
http://careers.walmart.com/?codes=Linkedin&utm_source=Linkedin&utm_campaign=ALL&utm_medium=SocialOrganic&utm_term=AboutUs&utm_content=Branding
http://publix.jobs/
https://jobs.gapinc.com/
https://forum.keyboardmaestro.com/t/simple-search-with-regex-question/15733/5
\.(.*)/
^ This worked 90% of the time, capturing everything between the first . and the last /, but didn't work when the URL didn't match those parameters.
This should do the trick:
(?m)http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?)(?:\/|$)
For details, see regex101: build, test, and debug regex
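As a quick check, the pattern can be run against the sample URLs from the first post. This is only a minimal sketch in plain JavaScript: the helper name `cleanDomain` is made up for illustration, the long Walmart query string is shortened, and because JavaScript does not accept a leading `(?m)`, the multiline option is passed as the `m` flag instead.

```javascript
// Pattern from the post above; the inline (?m) becomes the `m` flag in JavaScript.
const DOMAIN_RE = /http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?)(?:\/|$)/m;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function cleanDomain(url) {
    const match = DOMAIN_RE.exec(url);
    return match ? match[1] : null;
}

const samples = [
    'http://careers.walmart.com/?codes=Linkedin&utm_source=Linkedin',
    'http://publix.jobs/',
    'https://jobs.gapinc.com/',
    'https://forum.keyboardmaestro.com/t/simple-search-with-regex-question/15733/5',
];

// Logs: walmart.com, publix.jobs, gapinc.com, keyboardmaestro.com
samples.forEach((url) => console.log(cleanDomain(url)));
```

In Keyboard Maestro itself the pattern would go into a search-with-regex action as posted; the script is just a way to try it outside the macro.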
rayk
November 4, 2019, 9:03pm
3
@JMichaelTX
Do you know what alteration could be made to account for domain endings like .co.uk or .bc.ca, etc.?
Well, this is tough. Provided that these extensions always end in two characters, then this should work:
(?m)http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?(?:\.\w{2})?)(?:\/|$)
For details, see regex101: build, test, and debug regex
This is NOT perfect, and is just my best guess. Again, the top-level domain MUST be exactly two characters for this to work, as in:
https://www.somedomain.co.uk/
This will work with and without the two character extension:
https://forum.keyboardmaestro.com/
https://www.somedomain.co.uk/
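To see the extended pattern handle both of those URLs, here is a small sketch in plain JavaScript. The function name `cleanDomainWithCountryCode` is hypothetical, and the `(?m)` prefix is again replaced by the `m` flag, since JavaScript does not accept it inline at the start of a pattern.

```javascript
// Pattern from the post above, including the optional two-character ending.
const DOMAIN_RE_2 = /http(?:s?):\/\/.*?([^\.\/]+?\.[^\.]+?(?:\.\w{2})?)(?:\/|$)/m;

// Hypothetical helper: returns capture group 1, or null when nothing matches.
function cleanDomainWithCountryCode(url) {
    const match = DOMAIN_RE_2.exec(url);
    return match ? match[1] : null;
}

console.log(cleanDomainWithCountryCode('https://forum.keyboardmaestro.com/')); // keyboardmaestro.com
console.log(cleanDomainWithCountryCode('https://www.somedomain.co.uk/'));      // somedomain.co.uk
```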
rayk
November 4, 2019, 11:49pm
5
Thanks! I'm only concerned with .co.uk and all of the Canadian ones, which are all two characters as well, so hopefully this will work just fine.
rayk
November 5, 2019, 8:23pm
6
@JMichaelTX
It's working great, with the exception that if the domain ends in just two characters (e.g. just .ca or just .co), the www. is not stripped from the URL.
Because of the complexity added by all of the new top-level domains in recent years, it becomes much more challenging to extract parts of a URL. I'd suggest that you post your full request at stackoverflow.com so you can reach a much broader audience of RegEx users.
@rayk Do you have to use a regex to do this? It would be quite easy with a JXA action and NSURL.
Typing this from memory, so hopefully I’ve got it right (I’ll clean it up later if not, once I am back on my Mac)...
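(() => {
    'use strict';
    const
        // 'YourURL' is the name of the KM variable that holds the URL
        yourURL = Application('Keyboard Maestro Engine').getvariable('YourURL'),
        url = $.NSURL.URLWithString(yourURL),
        // host is the full host name, including any subdomain such as www.
        domain = url.host.js;
    return domain;
})();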
Put that in an Execute JavaScript for Automation action (changing the YourURL variable name as appropriate) and it should get you what you want.
NSURL has the ability to pull out other components from a URL as well.
roosterboy:
domain = url.host.js
That is NOT working for me.
For "YourURL" of:
https://www.somedomain.co.uk/
it returns:
www.somedomain.co.uk
I believe what the OP wants here is:
somedomain.co.uk
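One way to get from the full host to that shorter form is to keep only the last two labels, or the last three when the host ends in a known two-part ending. The sketch below is plain JavaScript with no regex. `TWO_PART_SUFFIXES` and `registrableDomain` are hypothetical names, and the suffix list is a tiny hand-picked sample (just the endings mentioned in this thread), not a complete list.

```javascript
// Hand-picked sample of two-part endings; an assumption, NOT a complete list.
const TWO_PART_SUFFIXES = new Set(['co.uk', 'bc.ca']);

// Hypothetical helper: reduce a host such as "www.somedomain.co.uk"
// to the name plus its ending ("somedomain.co.uk").
function registrableDomain(host) {
    const labels = host.toLowerCase().split('.');
    const lastTwo = labels.slice(-2).join('.');
    // Keep three labels when the ending is a listed two-part suffix, else two.
    const keep = TWO_PART_SUFFIXES.has(lastTwo) ? 3 : 2;
    return labels.slice(-keep).join('.');
}

console.log(registrableDomain('www.somedomain.co.uk')); // somedomain.co.uk
console.log(registrableDomain('www.somedomain.ca'));    // somedomain.ca
console.log(registrableDomain('jobs.gapinc.com'));      // gapinc.com
```

Since this uses only standard string and `Set` operations, it should also be usable inside the JXA action on the value of `url.host.js`, though I have not tested it there. Any two-part ending missing from the list would be cut down to two labels, so the list needs to cover every ending you expect to see.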
BTW, just for clarity, in the script the parameter:
'YourURL'
in the statement
.getvariable('YourURL')
refers to the KM Variable that contains the actual URL.
This is improved for the next major version.
Added Filter URL components such as scheme, host and path.