Macro: Extract Domain from URL [Example] (v9.0.6d1)

Use Case

  • Use to extract just the Domain Name from a valid URL
  • This is a very tough problem because of the wide variety of URL formats
  • It has been discussed in the KM Forum here:
  • There are solutions offered in both of those, and in other places on the Internet
  • Having studied and tested all of that, I think I may have a highly reliable solution.
  • But, I need YOUR HELP to test and make sure, and to adjust as needed.
  • So please download and test with your most unusual URLs -- try to break the macro! :wink:

@peternlewis, since you're a Regex guru, and we've had a number of discussions about URLs, maybe you could review/test this if you have time.

MACRO:   Extract Domain from URL [Example]

-~~~ VER: 1.0    2020-07-12 ~~~
Requires: KM 8.2.4+   macOS 10.11 (El Capitan)+
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

DOWNLOAD Macro File:

Extract Domain from URL [Example].kmmacros
Note: This Macro was uploaded in a DISABLED state. You must enable it before it can be triggered.


Example Output


Method

In the large majority of cases this Regex will return the correct Domain Name in the Local__Domain variable.

The only issue with the above RegEx is when the TLD (Top Level Domain) is a country-code TLD, i.e., a two-character name such as "uk".

  • When this occurs, then this Regex will miss the main part of the SLD (Second Level Domain) IF there is no actual Server Name (like "www") provided.

  • For Example: https://controldesign.co.uk

  • The RegEx returns:

  * Local__Server: "controldesign" -- which is incorrect
  * Local__Domain: "co.uk"  -- which is incorrect
  * Local__SLD: "co"
  * Local__TLD: "uk"
  • In this case, the Regex thinks "controldesign" is the Server name, which is incorrect.
  • When that happens, the Local__SLD value will be ≤ 3 chars because it contains only part of the country-code domain suffix (here "co"), not the actual domain name.
  • So, this IF/THEN works like this:
  IF ((Length(TLD) = 2) AND (Length(SLD) ≤ 3))
  THEN prepend the Local__Server to Local__Domain

Note: The KM function for Length is "CHARACTERS()"
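For anyone who wants to try the IF/THEN idea outside of KM, here is a minimal Python sketch. It is not the macro's regex: it assumes the host name has already been isolated and simply splits it on dots, and the function name `fix_domain` is hypothetical. When there are several server labels (like "www.controldesign"), this sketch prepends only the last one, which is an assumption on my part.

```python
def fix_domain(host: str) -> str:
    """Return the domain for a bare host name, using the length heuristic.

    Assumption: `host` is already stripped of scheme, path, user info and port.
    """
    labels = host.lower().split(".")
    if len(labels) < 2:
        return host.lower()          # nothing to split, e.g. "localhost"

    tld = labels[-1]                 # e.g. "uk"
    sld = labels[-2]                 # e.g. "co"
    server_labels = labels[:-2]      # e.g. ["controldesign"] or ["www"]
    domain = f"{sld}.{tld}"

    # IF ((Length(TLD) = 2) AND (Length(SLD) <= 3))
    # THEN prepend the server label to the domain
    if len(tld) == 2 and len(sld) <= 3 and server_labels:
        domain = f"{server_labels[-1]}.{domain}"
    return domain


print(fix_domain("controldesign.co.uk"))   # controldesign.co.uk
print(fix_domain("www.example.com"))       # example.com
```

One limitation of this sketch: a genuinely short domain under a two-character TLD (for example "www.t.co") also satisfies the length test, so it would come back with the server label attached.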


ReleaseNotes

Author: @JMichaelTX

PURPOSE:

  • Provide a Method to Extract the Domain Name from a URL

NOTICE: This macro/script is just an Example

  • It is provided only for educational purposes, and may not be suitable for any specific purpose.
  • It has had very limited testing.
  • You need to test further before using in a production environment.
  • It does not have extensive error checking/handling.
  • It may not be complete. It is provided as an example to show you one approach to solving a problem.

HOW TO USE

  • Run as is to see the results of the test URLs that have been provided.
  • ADD All of Your URL Test Cases to the Below Action in Magenta color.

REQUIRES:

  1. KM 8.0.2+
  • But it can be written in KM 7.3.1+
  • It is KM8 specific just because some of the Actions have changed to make things simpler, but equivalent Actions are available in KM 7.3.1.
  2. macOS 10.11.6 (El Capitan)
  • KM 8 Requires Yosemite or later, so this macro will probably run on Yosemite, but I make no guarantees. :wink:

MACRO SETUP

  • Carefully review the Release Notes and the Macro Actions
    • Make sure you understand what the Macro will do.
    • You are responsible for running the Macro, not me. :wink:
  • Assign a Trigger to this macro.
  • Move this macro to a Macro Group that is only Active when you need this Macro.
  • ENABLE this Macro.
  • REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:
    • ALL Actions that are shown in the magenta color
    • ADD your URL Test Cases to this Action:
      • ADD All of Your URL Test Cases to the Below

USE AT YOUR OWN RISK

  • While I have given this limited testing, and to the best of my knowledge it will do no harm, I cannot guarantee it.
  • If you have any doubts or questions:
    • Ask first
    • Turn on the KM Debugger from the KM Status Menu, and step through the macro, making sure you understand what it is doing with each Action.


Your regex looks fine to me. For this sort of thing I'd typically just extract bits at a time:

  • Delete everything before :\/\/+
  • Delete everything after \/
  • Delete everything before @ (remove any username)
  • Delete everything from : onward (remove any port)

And then try to figure out which part of the domain name was desired, although it seems difficult to ensure that step is correct.
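The bit-at-a-time approach could look roughly like this in Python. This is only a sketch under my own assumptions: the function name `extract_host` is hypothetical, the scheme pattern is tightened so it only matches at the start of the string, `?` and `#` are treated like `/` as the end of the host, and the last step strips a trailing `:port` (the user-info step already takes care of any password).

```python
import re


def extract_host(url: str) -> str:
    """Peel a URL down to its host name, one piece at a time."""
    # 1. Delete the scheme and the "://" that follows it
    host = re.sub(r"^[A-Za-z][A-Za-z0-9+.-]*://+", "", url.strip())
    # 2. Delete everything from the first "/", "?" or "#" onward
    host = re.sub(r"[/?#].*$", "", host)
    # 3. Delete everything up to and including "@" (user name and password)
    host = re.sub(r"^.*@", "", host)
    # 4. Delete a trailing ":port" (assumption, see the note above)
    host = re.sub(r":\d*$", "", host)
    return host


print(extract_host("https://user:pw@www.example.co.uk:8080/path?q=1"))
# www.example.co.uk
```

Each step can be checked on its own by printing `host` after it, which is the main appeal of doing it in parts.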

The only reason I'd do it in parts is it is easier to debug, but you already have a working solution so unless you have a case that doesn't work, I wouldn’t be messing with it.

Thanks for the review, Peter.
