List of URLs found in a text (subroutine)

A subroutine which lists all the links (URLs) found in a given sample of plain text.

The list is formatted as a JSON array.

To count the links in the list, and get at individual items by index, see the Keyboard Maestro %JSONValue% token.

You can also work through the resulting list of URLs with a Keyboard Maestro For Each action, using its JSON keys collection option.
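If you would rather handle the list in a further Execute JavaScript action than with Keyboard Maestro tokens, the same counting and indexing can be sketched in plain JavaScript. The sample URLs below are hypothetical, standing in for whatever the subroutine returns. Note that JavaScript arrays are indexed from 0, whereas the %JSONValue% token (as in the example output further down) uses [0] for the count and [1] for the first item.

```javascript
// Hypothetical sample of the subroutine's output: a JSON array of URL strings.
const linkListJSON = JSON.stringify([
    "https://www.keyboardmaestro.com/",
    "https://forum.keyboardmaestro.com/",
    "https://developer.apple.com/"
], null, 2);

// Parse the JSON text back into an ordinary JavaScript array.
const links = JSON.parse(linkListJSON);

// Count of links, and the first link (index 0 in JavaScript).
const linkCount = links.length;
const firstLink = links[0];

// A numbered listing, one link per line, numbered from 1.
const report = links
    .map((url, i) => `${i + 1}: ${url}`)
    .join("\n");
```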

The subroutine, with a macro illustrating its use:

List of URLs found in a text Macros.kmmacros (8.6 KB)

Expand disclosure triangle to view JS source of subroutine
const
    uw = ObjC.unwrap,
    e = $(),
    source = kmvar.local_Source_text,
    maybeDetector = $.NSDataDetector
    .dataDetectorWithTypesError(
        $.NSTextCheckingTypeLink,
        e
    );

return maybeDetector.isNil()
    ? uw(e.localizedDescription)
    : JSON.stringify(
        uw(
            maybeDetector
            .matchesInStringOptionsRange(
                $(source), 0,
                $.NSMakeRange(0, source.length)
            )
        )
        .map(
            x => uw(x.URL.absoluteString)
        ),
        null, 2
    );

will produce the following from the text below:


Where the output text displayed is:

%Variable%local_LinkListJSON%

Number of links found: %JSONValue%local_LinkListJSON[0]%

     First link: %JSONValue%local_LinkListJSON[1]%
    Second link: %JSONValue%local_LinkListJSON[2]%
     Third link: %JSONValue%local_LinkListJSON[3]%

BACKGROUND

In this thread on obtaining the first URL in the notes attached to a Things3 Todo,

Things 3 Get URL of notes - Questions & Suggestions - Keyboard Maestro Discourse

it would have been helpful to have a subroutine which extracted a list of any links found in the note text.

Instead, we ended up using a useful but slightly baroque regular expression, contributed in a Stack Overflow discussion.

javascript - What is a good regular expression to match a URL? - Stack Overflow

A useful rule of thumb is that if your regular expression is more than about
10 characters long (perhaps something simple like \r\n|\n|\r) then
you are probably on the wrong track. Opinion varies on this, of course;
see: Jeffrey Friedl's Blog » Source of the famous “Now you have two problems” quote

What we can safely say, however, is that regular expressions:

  1. can't cope with recursive patterns (think of outlines, grammars), and
  2. are less readable than writable - worryingly time-consuming to debug and maintain.
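To illustrate the first point: checking that brackets nest correctly needs a running depth count, which a classical regular expression has no way to keep. A minimal sketch in plain JavaScript (the name isBalanced is just for illustration):

```javascript
// True if every "(" in s is closed by a later ")" and none closes too early.
const isBalanced = s => {
    let depth = 0;

    for (const c of s) {
        if (c === "(") {
            depth += 1;
        } else if (c === ")") {
            depth -= 1;
            // A closing bracket with nothing open: fail at once.
            if (depth < 0) {
                return false;
            }
        }
    }

    // Balanced only if nothing is left open at the end.
    return depth === 0;
};
```

A few lines of ordinary code with a counter are easier to read back later than a pattern that tries to approximate the same check.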

On recent builds of macOS, the problem is largely solved for us by a very useful
Foundation library function (NSDataDetector dataDetectorWithTypes)

which has an NSTextCheckingTypeLink option, and does all the work for us –
building a list of the URLs in a given text.

We can wrap this in a Keyboard Maestro subroutine (as above, packaged with an example of its use) to make it easier to reach for.


What would be the equivalent subroutine to look for phone numbers or dates instead of URLs?

That's an amazing post, Rob.

Rob would know better, but I think you should pass the option:

NSTextCheckingTypePhoneNumber

to the NSDataDetector dataDetectorWithTypes Foundation library function, and then access the phoneNumber property:

const
    e = $(),
    detector = $.NSDataDetector.dataDetectorWithTypesError(
        $.NSTextCheckingTypePhoneNumber,
        e
    ),
    source = kmvar.local_Source_text;

return JSON.stringify(
    ObjC.unwrap(
        detector.matchesInStringOptionsRange(
            $(source), 0, $.NSMakeRange(0, source.length)
        )
    )
    .map(
        x => ObjC.unwrap(x.phoneNumber)
    ),
    null,
    2
);

If you want to obtain dates from the input, you could pass the option

NSTextCheckingTypeDate

Accessing the date property gives you an object of type NSDate. For basic debugging purposes, you could access the description property to obtain a string representation of that object. Probably something like this:

const
    e = $(),
    detector = $.NSDataDetector.dataDetectorWithTypesError(
        $.NSTextCheckingTypeDate,
        e
    ),
    source = kmvar.local_Source_text;

return JSON.stringify(
    ObjC.unwrap(
        detector.matchesInStringOptionsRange(
            $(source), 0, $.NSMakeRange(0, source.length)
        )
    )
    .map(
        x => ObjC.unwrap(x.date.description)
    ),
    null,
    2
);

P.S.: About the description property:

The representation is useful for debugging only.

There are a number of options to acquire a formatted string for a date including: date formatters (see NSDateFormatter and Data Formatting Guide), and the NSDate methods descriptionWithLocale:, dateWithCalendarFormat:timeZone:, and descriptionWithCalendarFormat:timeZone:locale:

Do note that

dateWithCalendarFormat:timeZone:
descriptionWithCalendarFormat:timeZone:locale:

are deprecated.
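One route that avoids the description string altogether is to format on the JavaScript side. This assumes that ObjC.unwrap applied to an NSDate yields an ordinary JavaScript Date (so that x => ObjC.unwrap(x.date) could replace the description call in the map above); that bridging behaviour is worth checking on your own system before relying on it. The formatting half, sketched in plain JavaScript with a made-up sample date:

```javascript
// Convert a list of JavaScript Date values to ISO 8601 strings (UTC).
const isoDates = dates => dates.map(d => d.toISOString());

// Made-up sample standing in for dates recovered from a text.
const sampleDates = [
    new Date(Date.UTC(2024, 0, 15, 9, 30))
];

// Same JSON shape as the other subroutines: an indented array of strings.
const formatted = JSON.stringify(isoDates(sampleDates), null, 2);
```

ISO 8601 strings sort correctly as plain text and are unambiguous across locales, which the debugging description is not guaranteed to be.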


As @unlocked2412 points out, the trick is to experiment with the other NSTextCheckingType values, which are listed here:

NSTextCheckingType | Apple Developer Documentation

You can use them directly in JavaScript for Automation by prepending an ObjC bridge '$.' prefix, as in:

  • $.NSTextCheckingTypeLink
  • $.NSTextCheckingTypePhoneNumber
  • $.NSTextCheckingTypeDate

The subroutine for dates works but I can't get the subroutine for phone numbers to return anything but nils.

I added 1 xxx-xxx-xxxx and (xxx) xxx-xxxx phone numbers (where x is a digit) to the source code, so there are a couple in there.

Please post your actual text sample. I will test.

This is what I was looking for! Thank you very much!! :grinning:


Here's the set of three subroutines and a modified example text and macro. Just swap the name of the subroutine to test the other data types. Set to phone numbers now:

List of data found in a text Macros.kmmacros (17 KB)

The first problem that jumps to the eye in this rewrite is this:

source = `kmvar.local_Source_text`;

which is not binding the name source to the value of a Keyboard Maestro variable, but simply to the name of that variable.

Should be simply:

source = kmvar.local_Source_text;

or (redundantly, if we want the backticks)

source = `${kmvar.local_Source_text}`;
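The difference between the three forms can be seen in plain JavaScript. Here the kmvar object is a hypothetical stand-in for the one Keyboard Maestro supplies, with a made-up value:

```javascript
// Hypothetical stand-in for the kmvar object that Keyboard Maestro provides.
const kmvar = { local_Source_text: "Call (555) 010-9999 today" };

// Backticks alone: a string containing the characters of the name itself.
const literalName = `kmvar.local_Source_text`;

// No quoting: the value held by the variable.
const direct = kmvar.local_Source_text;

// Backticks with ${...}: the same value, obtained by interpolation.
const interpolated = `${kmvar.local_Source_text}`;
```

With the first form, a detector is only ever shown the text "kmvar.local_Source_text", which contains no phone numbers, dates or links to find.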

Exactly what Rob just said. That's the issue in your macro.

For reference, see:

I just pasted @unlocked2412's code into that function. I only rewrote the main macro to work with all three subroutines, adding some phone numbers and dates to the sample text.

But that was the issue. Removing the backticks solves the problem.


Sorry about that. Not sure why those backticks appeared there. A typo, perhaps.


It looks like we've fixed that puzzle, but more generally, for the purposes of experimentation with other NSTextCheckingType values, it may be helpful to trap any nil conditions which do arise (I personally haven't managed to provoke any yet), and read any localizedDescription string which the NSError object has to offer. For example:

Expand disclosure triangle to view JS source for a subroutine
const
    uw = ObjC.unwrap,
    e = $(),
    maybeDetector = $.NSDataDetector
    .dataDetectorWithTypesError(
        $.NSTextCheckingTypePhoneNumber,
        e
    );

return maybeDetector.isNil()
    ? uw(e.localizedDescription)
    : (() => {
        const source = kmvar.local_Source_text;

        return JSON.stringify(
            uw(
                maybeDetector
                .matchesInStringOptionsRange(
                    source, 0,
                    $.NSMakeRange(0, source.length)
                )
            )
            .map(x => uw(x.phoneNumber)),
            null, 2
        );
    })();

No problem. I should have seen that but I was obsessively looking at the stringify function for clues.
