Macro: Extract Domain from URL [Example] (v9.0.6d1)

Use Case

  • Use to extract just the Domain Name from a valid URL
  • This is a very tough problem because of the wide variety of URL formats
  • It has been discussed in the KM Forum here:
  • There are solutions offered in both of those, and in other places on the Internet
  • Having studied and tested all that, I think I may have a highly reliable solution.
  • But, I need YOUR HELP to test and make sure, and to adjust as needed.
  • So please download and test with your most unusual URLs -- try to break the macro! :wink:

@peternlewis, since you're a Regex guru, and we've had a number of discussions about URLs, maybe you could review/test this if you have time.

MACRO:   Extract Domain from URL [Example]

-~~~ VER: 1.0    2020-07-12 ~~~
Requires: KM 8.2.4+   macOS 10.11 (El Capitan)+
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

DOWNLOAD Macro File:

Extract Domain from URL [Example].kmmacros
Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.


Example Output


Method

In the large majority of cases this Regex will return the correct Domain Name in the Local__Domain variable.

The only issue with the above RegEx is when the TLD (Top Level Domain) is non-US, i.e., a two-character country code such as "uk".

  • When this occurs, the Regex will miss the main part of the SLD (Second Level Domain) IF no actual Server Name (like "www") is provided.

  • For Example: https://controldesign.co.uk

  • The RegEx returns:

  * Local__Server: "controldesign" -- which is incorrect
  * Local__Domain: "co.uk"  -- which is incorrect
  * Local__SLD: "co"
  * Local__TLD: "uk"
  • In this case, the Regex thinks "controldesign" is the Server name, which is incorrect.
  • When that happens, the Local__SLD value will be ≤ 3 chars because it contains only part of the non-US domain.
  • So, this IF/THEN works like this:
  IF ((Length(TLD) = 2) AND (Length(SLD) ≤ 3))
  THEN prepend the Local__Server to Local__Domain

Note: The KM function for Length is "CHARACTERS()"
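The IF/THEN correction above can be sketched outside of KM. This is only a rough Python illustration, not the macro itself: the function name `fix_domain` and its three parameters are hypothetical stand-ins for the Local__Server, Local__SLD and Local__TLD variables, `len()` plays the role of CHARACTERS(), and the extra check that a server part actually exists is my own assumption.

```python
def fix_domain(server: str, sld: str, tld: str) -> str:
    """Return the domain, applying the two-character-TLD correction.

    server, sld and tld are assumed to be the pieces already captured
    by the regex (hypothetical stand-ins for the Local__ variables).
    """
    domain = f"{sld}.{tld}"
    # IF ((Length(TLD) = 2) AND (Length(SLD) <= 3)) THEN prepend the server.
    # The "and server" guard is an assumption: with nothing captured as
    # the server there is nothing to prepend.
    if len(tld) == 2 and len(sld) <= 3 and server:
        domain = f"{server}.{domain}"
    return domain


print(fix_domain("controldesign", "co", "uk"))  # the co.uk example above
print(fix_domain("www", "example", "com"))      # three-character TLD, unchanged
```

One thing the sketch makes visible: the rule is a length heuristic, so any short (three characters or fewer) name directly under a two-character TLD will also trigger the prepend. That is worth including in your test URLs.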


ReleaseNotes

Author.@JMichaelTX

PURPOSE:

  • Provide a Method to Extract the Domain Name from a URL

NOTICE: This macro/script is just an Example

  • It is provided only for educational purposes, and may not be suitable for any specific purpose.
  • It has had very limited testing.
  • You need to test further before using in a production environment.
  • It does not have extensive error checking/handling.
  • It may not be complete. It is provided as an example to show you one approach to solving a problem.

HOW TO USE

  • Run as is to see the results of the test URLs that have been provided.
  • ADD All of Your URL Test Cases to the Below Action in Magenta color.

REQUIRES:

  1. KM 8.0.2+
  • But it can be written in KM 7.3.1+
  • It is KM8 specific just because some of the Actions have changed to make things simpler, but equivalent Actions are available in KM 7.3.1.
  2. macOS 10.11.6 (El Capitan)
  • KM 8 Requires Yosemite or later, so this macro will probably run on Yosemite, but I make no guarantees. :wink:

MACRO SETUP

  • Carefully review the Release Notes and the Macro Actions
    • Make sure you understand what the Macro will do.
    • You are responsible for running the Macro, not me. :wink:
  • Assign a Trigger to this macro.
  • Move this macro to a Macro Group that is only Active when you need this Macro.
  • ENABLE this Macro.
  • REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:
    • ALL Actions that are shown in the magenta color
    • ADD your URL Test Cases to this Action:
      • ADD All of Your URL Test Cases to the Below

USE AT YOUR OWN RISK

  • While I have given this limited testing, and to the best of my knowledge it will do no harm, I cannot guarantee it.
  • If you have any doubts or questions:
    • Ask first
    • Turn on the KM Debugger from the KM Status Menu, and step through the macro, making sure you understand what it is doing with each Action.

Your regex looks fine to me. For this sort of thing I'd typically just extract bits at a time:

  • Delete everything before :\/\/+
  • Delete everything after \/
  • Delete everything before @ (remove any username)
  • Delete everything before : (remove any password)

And then try to figure out the part of the domain name that was desired, although this seems a difficult task to ensure correctness.

The only reason I'd do it in parts is it is easier to debug, but you already have a working solution so unless you have a case that doesn't work, I wouldn’t be messing with it.
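For anyone who wants to try the bit-at-a-time approach listed above, here is a minimal Python sketch of it. The function name `extract_host` is hypothetical, and the last step is read here as stripping a trailing ":port", since the username and password are already gone once everything before the @ is deleted. That reading is an assumption, not something stated in the steps.

```python
import re


def extract_host(url: str) -> str:
    """Reduce a URL to its host name, one small step at a time."""
    host = re.sub(r"^.*?://+", "", url)  # delete everything up to and including ://
    host = re.sub(r"/.*$", "", host)     # delete the first / and everything after it
    host = re.sub(r"^.*@", "", host)     # delete any username:password@ prefix
    host = re.sub(r":\d*$", "", host)    # delete a trailing :port (assumed meaning)
    return host


print(extract_host("https://user:pw@www.example.com:8080/path/page.html?q=1"))
print(extract_host("https://controldesign.co.uk"))
```

Each step can be printed or inspected on its own, which is the debugging advantage mentioned above. The result is still the full host name, so deciding which part of it counts as the domain remains a separate step.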

Thanks for the review, Peter.
