Extract number Macro (v11.0.2)

I am trying to extract numbers from some text and put each extracted number on a new line.

This is my text that is copied to clipboard:

Several species are present in reasonably high densities including linnet (12), skylark (14), whitethroat (16) and yellowhammer (17). The densities of skylark are a result of the amount of arable and pasture the fields in the site. The densities of the other three species are a result of the hedgerows that bound almost the entirety of each field.

Output should be:

12
14
16
17

How can I do this in Keyboard Maestro?

extract number Macro (v11.0.2)

extract number.kmmacros (1.6 KB)

There are different ways to do this (most of them using regex). If you change your macro's search term to \D+ and replace with \n, you should have a functioning version of your macro. The result will often have a leading and trailing newline, though; if that is a problem, you could use the Filter action to Trim Whitespace.
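For anyone who wants to try the idea outside Keyboard Maestro, here is a minimal JavaScript sketch of the same search and replace. The function name `digitsOnLines` is made up for illustration, and `trim()` stands in for the Filter action's whitespace trimming.

```javascript
// Hypothetical helper: replace every run of non-digits with a newline,
// then trim the leading/trailing newline that the replace leaves behind.
function digitsOnLines(text) {
    return text.replace(/\D+/g, "\n").trim();
}

const sample =
    "including linnet (12), skylark (14), whitethroat (16) and yellowhammer (17). The densities";
console.log(digitsOnLines(sample));
// 12
// 14
// 16
// 17
```

Note that \D also treats the decimal point as a separator, so this only suits whole numbers.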


That's great, thanks. The Filter action is new to me and looks very useful.

Another query, please. Is there a regular expression that can extract numbers with a decimal point, for example 12.1?


I may have another solution, with no regex, after I get back from vacation.


One approach:

Each Number in Text on Separate Line.kmmacros (3.7 KB)


Or another, in terms of .filter:

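JS source:
// Split on runs of non-word characters and keep only the numeric tokens.
// The empty-string check matters: isNaN("") is false, so text that starts
// or ends with punctuation would otherwise leave a blank line.
return kmvar.local_Source
    .split(/\W+/gu)
    .filter(k => k !== "" && !isNaN(k))
    .join("\n");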

The positive search would be something like \d+(\.\d+)?, and I am actually not sure how I could negate it, so here's an approach using that pattern:

Extract numbers.kmmacros (19 KB)
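The macro itself is in the attachment. As a rough JavaScript sketch of the same positive-search idea (the function name `extractNumbers` is hypothetical, and this is not a transcription of the macro), one can collect every match of the pattern and join the matches with newlines:

```javascript
// Hypothetical helper: find every match of \d+(\.\d+)? and put each on its own line.
// The group is written as non-capturing (?: ) because only the whole match is needed.
function extractNumbers(text) {
    const pattern = /\d+(?:\.\d+)?/g;
    return Array.from(text.matchAll(pattern), m => m[0]).join("\n");
}

console.log(extractNumbers("linnet (12), skylark (14.5) and field789."));
// 12
// 14.5
// 789
```

A trailing full stop is left out because the optional part only matches a point that is followed by at least one digit.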

(By the way, if someone knows how one would go about negating a regex pattern like the above, I'd be all ears; it would be interesting to know.)


Probably the easiest -- and the most readable! -- way of doing this is via a "For Each..." action working on a Collection of substrings. Here I'm assuming that, as in your text, the numbers are always inside ( ) and can include decimals (although that isn't strict -- DUCY?):

Extract Numbers.kmmacros (4.5 KB)

Image

Because it is easier to read, it's also easier to adjust to your precise requirements -- but do shout if anything isn't clear.
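The macro's actual pattern lives in the attachment and isn't reproduced here, so the following is only a sketch under assumptions: it guesses at a pattern, \(([\d.]+)\), and uses a JavaScript for...of loop in place of the "For Each" action. The name `numbersInParens` is hypothetical.

```javascript
// Hypothetical helper: loop over every "(...)" whose content is digits and points.
// The pattern is an assumption for illustration, not taken from the macro.
function numbersInParens(text) {
    const pattern = /\(([\d.]+)\)/g;
    const lines = [];
    for (const match of text.matchAll(pattern)) {
        lines.push(match[1]); // capture group 1 is the text between the brackets
    }
    return lines.join("\n");
}

console.log(numbersInParens("linnet (12), skylark (14.5) and others (see above)"));
// 12
// 14.5
```

Like the description above, this isn't strict: [\d.]+ would also accept something like (1.2.3), since it only checks that the characters are digits or points.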


Negate how? Numbers that aren't decimals -- just miss out the decimal point as you did before. If you want only decimals, insist on a single decimal point followed by at least 1 digit: \d*\.\d+
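A quick way to see what the decimals-only pattern picks up, sketched in JavaScript (the helper name `decimalsOnly` is made up):

```javascript
// Hypothetical helper: return only the tokens that contain a decimal point
// followed by at least one digit, using the pattern \d*\.\d+
function decimalsOnly(text) {
    return text.match(/\d*\.\d+/g) || [];
}

console.log(decimalsOnly("12 birds, 12.1 ha, .5 km, end."));
// [ '12.1', '.5' ]
```

The whole number 12 and the sentence-ending full stop are both skipped, because neither has a point followed by a digit.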

I am probably not using the right term. By 'negation' I meant a pattern that captures the opposite of \d+(\.\d+)?, that is, one matching every character that isn’t part of a consecutive string of digits, possibly followed by a period and another string of consecutive digits. After searching around a bit, I am beginning to think that this might not be possible with regex alone.
But there are good workarounds, and I think doing a positive search, iterating through and capturing all matches as you also did, makes the most sense here. So my question was mostly in the hope of relieving my itching curiosity.

(still itching though)

But I'm not sure what you mean by the opposite of that -- particularly the optional part. As always with regex, some examples of matching and non-matching strings would help.

I think it is probably not possible, but the pattern I am thinking of is one that could be used to return exactly the same as my uploaded macro above, only following OP’s original approach with a simple search and replace: searching for anything that isn’t a (decimal) number and replacing it with \n, instead of the For Each iteration.

Example input:

1Several 2.2 species123.456are present in reasonably high densities including linnet (12), skylark (14), whitethroat (16) and yellowhammer (17). The densities of skylark are a result of the amount of arable and pasture the fields in the site. The densities of the other three species are a result of the hedgerows that bound almost the entirety of each field789.

Output should be:

1
2.2
123.456
12
14
16
17
789

As kind of an extreme simplification, I was thinking of a pattern that would 'negate' \d+(\.\d+)? in the same way that \D 'negates' \d, although I of course see that negating a composed pattern is not the same as negating a simple character class.

I think you'll need to look at lookaheads -- my brain starts to melt at that point. And they'd be a lot less efficient than your "For Each" (or that may just be my excuse for not trying harder :wink: ).
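For the curious, here is one way the lookaround idea might look. It is a sketch, not something posted in this thread, and it assumes a regex engine with lookbehind support (modern JavaScript has it). The name `replaceNonNumbers` is hypothetical.

```javascript
// A run of characters that are not part of a (decimal) number:
//   [^\d.]      any character that is neither a digit nor a point
//   \.(?!\d)    a point that is not followed by a digit
//   (?<!\d)\.   a point that is not preceded by a digit
const notNumber = /(?:[^\d.]|\.(?!\d)|(?<!\d)\.)+/g;

// Hypothetical helper: single search and replace, then trim stray newlines.
function replaceNonNumbers(text) {
    return text.replace(notNumber, "\n").trim();
}

console.log(replaceNonNumbers("1Several 2.2 species123.456are (12), (17). The site. field789."));
// 1
// 2.2
// 123.456
// 12
// 17
// 789
```

It is not an exact negation of \d+(\.\d+)? in every edge case: a string such as 1.2.3 is kept whole here, whereas the positive pattern would split it into 1.2 and 3.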

What you could do is split it into two ops:

  1. Search for [^\d\.]+ and replace with \n to get rid of everything but digits and points.
  2. Search for (?m)(^\.\n|\.$) and replace with nothing to get rid of the old full stops, now on lines by themselves, and any period at the end of a line.
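The two ops above can be sketched in JavaScript as follows. The function name `twoStepExtract` is hypothetical, and the inline (?m) is written as the `m` flag here, which is the usual way to switch on multiline mode in JavaScript.

```javascript
// Hypothetical helper mirroring the two search-and-replace steps.
function twoStepExtract(text) {
    // Step 1: everything except digits and full stops becomes a newline.
    const step1 = text.replace(/[^\d.]+/g, "\n");
    // Step 2: drop lines holding only a full stop, and any full stop ending a line.
    return step1.replace(/^\.\n|\.$/gm, "");
}

console.log(twoStepExtract("1Several 2.2 species123.456are (12), (17). The site. field789."));
// 1
// 2.2
// 123.456
// 12
// 17
// 789
```

If the text starts or ends with words rather than a number, a leading or trailing newline can still be left over, so a final trim may be wanted.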

I have, and mine too …
So I think I’ll let my quest rest, and I am very happy doing my searches positively with the For each!

I'm back from vacation and I wanted to offer an alternate solution.

image

That produces the requested output. Then you subsequently added a new request for including decimals as part of the numbers. My solution ALMOST solves that if you include a decimal point after the 9 in my solution (see below).

The problem with decimals is that now you have to start worrying about context. For example, using the alternate sample text, my code with a decimal will produce:

image

1
2.2
123.456
12
14
16
17
.
.
789.

You can see a couple of problems in that output. First, there are a couple of lines containing just ".", which are the periods from some sentences. (That's easy to fix.) The second problem is that the final line is "789.", which comes with a period. Since "789" and "789." are both potentially valid results, it's unclear why the "789" in the example provided above should NOT include the period. Is the code supposed to understand the difference between a period used in an English sentence and a period that is not part of English punctuation? "Understanding English" is not something any script can do, even with regex and lookahead.


I'm having trouble replicating your macro:
Extract Numbers 2.kmmacros (2.6 KB)
Can you see my error?

Yes, I can see your error. You are using a capital letter "I" (the letter between H and J) instead of a vertical bar, which we call the pipe character (which is usually above the RETURN key.)

What the script (and macro) does is replace any character that is not a numeric character with a newline, then remove all empty lines using the grep command. You need the pipe character to make it work.
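The shell command itself was only posted as a screenshot, so it isn't reproduced here. As a rough JavaScript analogue of the two steps just described, also without regex (the name `digitsNoRegex` is made up for illustration):

```javascript
// Hypothetical helper following the two described steps, with no regex.
function digitsNoRegex(text) {
    // Step 1: replace each character that is not 0-9 with a newline.
    const replaced = Array.from(text, ch => (ch >= "0" && ch <= "9" ? ch : "\n")).join("");
    // Step 2: drop the empty lines, which is the job grep does after the pipe.
    return replaced.split("\n").filter(line => line.length > 0).join("\n");
}

console.log(digitsNoRegex("linnet (12), skylark (14), whitethroat (16) and yellowhammer (17)."));
// 12
// 14
// 16
// 17
```

In the shell version the pipe passes the output of the first step to the second; here the intermediate string `replaced` plays that role.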

Thank you. I had to OCR your macro :slight_smile:

I should take this as a lesson to always upload the code. Not everyone is familiar with shell commands.
