Recursive `offset` handler (AppleScript)

This is a pure AppleScript question, although it stemmed from a Keyboard Maestro forum post I was working on.

I recall one or two members here are particularly apt at formulating and/or debugging recursive functions. If anyone can tag an appropriate person for their attention, that would be helpful.

Objective: AppleScript's offset command returns the index (character position) of the first occurrence (left-to-right) of a substring within a string (or 0 if not found). I'm redefining it to perform this function recursively, so that it returns a list of indices of all occurrences of a substring within a string.

The syntax for offset is:

offset of %substring% in %string%

This is a specific objective, outwith any other methods of achieving the same action performed on a string (I'm sure most or all of us here can throw together a simple handler that indexes substrings):

`IndexOf` AppleScript handlers (iterative & recursive)
script iterative
	on indexOf(x, L)
		local x, L
		
		if L = {} or (x is not in L) then return 0
		if L's class = text then set L to L's characters
		
		script
			property array : L
		end script
		
		tell the result's array
			repeat with i from 1 to its length
				if x ≠ its item i then
					set its item i to null
				else
					set its item i to i
				end if
			end repeat
			
			its numbers
		end tell
	end indexOf
end script


script recursive
	on indexOf(x, L)
		local x, L
		
		if L = {} or (x is not in L) then return {}
		if L's class = text then set L to L's characters
		
		script
			property array : L
		end script
		
		tell the result's array
			set [x0, xN] to its [last item, ¬
				reverse of rest of reverse]
			
			if x = x0 then return my indexOf(x, xN) ¬
				& (its length)
		end tell
		
		indexOf(x, xN)
	end indexOf
end script

iterative's indexOf(8, {1, 2, 4, 8, 2, 4, 6, 8, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1})
--> {4, 8, 11}
recursive's indexOf(8, {1, 2, 4, 8, 2, 4, 6, 8, 10, 9, 8, 7, 6, 5, 4, 3, 2, 1})
--> {4, 8, 11}

iterative's indexOf("e", "Keyboard Maestro") --> {2, 12}
recursive's indexOf("e", "Keyboard Maestro") --> {2, 12}

Why? I did have a reason and then forgot what that reason was, and merely became fixated on the objective. It's now more about solving the problem than producing anything intrinsically of value.


Initial iterative implementation

After some failed attempts to construct a recursive routine, I implemented an iterative one to visualise the function's actions:

on offset of t in str
	local t, str
	
	set [i, j] to [1, 0]
	set array to {j}
	
	repeat
		set i to current application's (offset of t ¬
			in (text (j + 1) thru -1 of str))
		
		if i = 0 then exit repeat
		
		set j to (the array's last item) + i
		set the end of the array to j
	end repeat
	
	rest of the array
end offset


offset of "*" in "12*45678*0*BC*EF"
--> {3, 9, 11, 14}

Failed recursive attempt (final of many)

on offset of t in str
    local t, str

	set i to (current application's (offset of t in str))
	if i = 0 then return {str's length}
	set j to offset of t in (text (i + 1) thru -1 of str)
	
	j & (i + (j's item -1))
end offset


offset of "*" in "12*45678*0*BC*EF"
--> {2, 5, 7, 13, 16}

I don't like admitting defeat with things like this, but I am.

1 Like

Hey @CJK,

Recursion gives me headaches, so you're not alone...  :sunglasses:

-Chris

The basic pattern would be:

on run
    set strText to "The implications of imperious imperialism include an impious imperviousness and general impermeability"
    
    offset of "imp" in strText --> 5
    
    offsets("imp", strText) --> {5, 16, 10, 23, 8, 27}
end run


-- offsets :: String -> String -> [Int]
on offsets(needle, haystack)
    set i to offset of needle in haystack
    if 0 ≠ i then
        if i < length of haystack then
            {i} & offsets(needle, text (1 + i) thru -1 of haystack)
        else
            {i}
        end if
    else
        {}
    end if
end offsets

Tho it's traditional to use an internal helper function to:

  1. Reduce the number of arguments needed in the recursion
  2. Limit the search for the recursive function name to the local name space

i.e. to apply a 'worker-wrapper' transform to the version above.

The worker-wrapper version would look something like this:

-- offsets :: String -> String -> [Int]
on offsets(needle, haystack)
    script go
        on |λ|(hay)
            set i to offset of needle in hay
            if 0 ≠ i then
                if i < length of hay then
                    {i} & |λ|(text (1 + i) thru -1 of hay)
                else
                    {i}
                end if
            else
                {}
            end if
        end |λ|
    end script
    
    go's |λ|(haystack)
end offsets
2 Likes

PS a recursive version of a function is essentially just a formal definition of the output value.

The offsets of a needle in a haystack are a list of integers containing :

  1. (the absolute case) the first (if any) offset of the needle in the haystack, followed by:
  2. (the relative case) the offsets of the needle in the remaining hay.

Thank you for answering this post, @ComplexPoint. Your particular brand of insight is always richly enlightening.

I did, at one point, have a recursive handler that returned the results you're seeing from yours (in a much less graceful manner, and at least 40% by serendipity). However, is there a way to obtain the absolute values of each offset rather than the values relative to the one preceding each ?

Taking my example from the original post:

offset of "*" in "12*45678*0*BC*EF"

should yield:

{3, 9, 11, 14}

rather than:

{3, 6, 2, 3}

That is, each value reports a location in the haystack at which one can find a needle, independent of any of the other values surrounding it.

Good catch – Thanks ! Perhaps in too much of a hurry to reduce arguments : - )
We don't need the needle argument (we get it from the closure), but we do, as you point out, need the new start index.

So something like:

-- offsets :: String -> String -> [Int]
on offsets(needle, haystack)
    script go
        on |λ|(hay, iFrom)
            set i to offset of needle in hay
            if 0 ≠ i then
                set iMatch to iFrom + i
                if i < length of hay then
                    {iMatch} & |λ|(text (1 + i) thru -1 of hay, iMatch)
                else
                    {iMatch}
                end if
            else
                {}
            end if
        end |λ|
    end script
    
    go's |λ|(haystack, 0)
end offsets
1 Like

And, of course, that illustrates a further value of the worker-wrapper pattern – quite apart from any issues of efficiency (scope of name search, size of frame pushed onto the stack) it also allows for arguments which are needed for the recursion, but would be redundant or clumsy in the inputs to the wrapper function.

Thank you for this. All of this is invaluable, especially coming from an objective that I initially said had no intrinsic value beyond simply solving the problem. But seeing is doing is learning, and that's the point, I feel.

It should have clicked that I needed a tracker variable to accumulate the relative offsets to an absolute sum.

1 Like

In contexts where it might be useful to have either the relative or absolute offsets, I guess it might also be worth considering:

  1. just getting the relatives from a simpler recursion, and then
  2. deriving the absolutes from the relatives

e.g. something like:

on run
    set strText to "The implications of imperious imperialism include an impious imperviousness and general impermeability"
    
    offset of "imp" in strText --> 5
    
    offsets("imp", strText) --> {5, 21, 31, 54, 62, 89}
end run


-- offsets :: String -> String -> [Int]
on offsets(needle, haystack)
    script go
        on |λ|(hay)
            set i to offset of needle in hay
            if 0 ≠ i then
                
                if i < length of hay then
                    {i} & |λ|(text (1 + i) thru -1 of hay)
                else
                    {i}
                end if
            else
                {}
            end if
        end |λ|
    end script
    
    -- Relative increments,
    set xs to go's |λ|(haystack)
    
    -- as absolute offsets.
    if 0 < length of xs then
        scanl1(my add, xs)
    else
        xs
    end if
end offsets


-- add :: Int -> Int -> Int
on add(a, b)
    a + b
end add


-- Lift 2nd class handler function into 1st class script wrapper 
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn

-- scanl1 :: (a -> a -> a) -> [a] -> [a]
on scanl1(f, xs)
    if length of xs > 0 then
        scanl(f, item 1 of xs, tail(xs))
    else
        {}
    end if
end scanl1

-- scanl :: (b -> a -> b) -> b -> [a] -> [b]
on scanl(f, startValue, xs)
    tell mReturn(f)
        set v to startValue
        set lng to length of xs
        set lst to {startValue}
        repeat with i from 1 to lng
            set v to |λ|(v, item i of xs, i, xs)
            set end of lst to v
        end repeat
        return lst
    end tell
end scanl

-- tail :: [a] -> [a]
on tail(xs)
    if xs = {} then
        missing value
    else
        rest of xs
    end if
end tail

I did consider this, but I feel it's less graceful. I like code to be beautiful, in so much as it can be depending on the task. And, often in beauty, there's efficiency. But I suppose beauty is a subjective measure, whilst efficiency is quantifiable. I recall reading a post on the Late Nights forum where either Mark or Shane had commented that your methods seem unnecessarily complex; reading between the words, I can assume he also didn't find that code very beautiful, though I would argue to the contrary. But you very rightly highlighted that complexity can be confused with lack of familiarity; and he did admit that your code was possibly a little faster.

I'm sorry, I've lost my train of thought and my point, beyond how I like beauty and efficiency. I'm actually going to look over this handler some more, because I wonder if there's actually a way to eliminate the need for the tracker variable too. That's not a request for you to ponder this too; it'll be today's obsession for me.

I look forward to hearing the result of your experiment : -)

My guess is that the newOffset argument may turn out to be hard to drop – its not so much a variable (the same reference, successively pointing to different/mutating values) as a fresh name bound in the new local context of each nested recursion. None of the inner iFrom ⇄ Integer bindings is aware of or affected by the outer ones. There's a whole spiral of nested iFroms as we circle down towards the absolute case where the last small string contains no match.

If you're right, it could probably be proven mathematically that such a solution like the one I am pondering does not exist, and your brief explanation hints at this. In my indexOf handlers posted in my original post, the recursive one, at first, appeared to need a tracking parameter. But recursing through the list in reverse negated the need for one, as the sub-list's length property was ideal for that purpose instead.

1 Like

Flexing this kind of thing in a different way, here is one approach to getting indices fairly flexibly from Strings or Lists, by supplying a predicate function that returns a Bool either from an individual item / Char, or from a String or sequence of items.

-- https://github.com/RobTrew/prelude-applescript


-- findIndices :: (a -> Bool) -> [a] -> [Int]
-- findIndices :: (String -> Bool) -> String -> [Int]
on findIndices(p, xs)
    script
        property f : mReturn(p)
        on |λ|(x, i, xs)
            if f's |λ|(x, i, xs) then
                {i}
            else
                {}
            end if
        end |λ|
    end script
    concatMap(result, xs)
end findIndices

on run
    
    findIndices(even, {7, 8, 9, 10, 11, 12})
    --> positions {2, 4, 6}
    
    findIndices(matching([2, 3]), {1, 2, 3, 4, 1, 2, 3, 4})
    --> positions {2, 6}
    
    findIndices(matching("en"), "patience and a sense of humour")
    --> positions {5, 17}
    
end run

-- concatMap :: (a -> [b]) -> [a] -> [b]
on concatMap(f, xs)
    set lng to length of xs
    set acc to {}
    tell mReturn(f)
        repeat with i from 1 to lng
            set acc to acc & |λ|(item i of xs, i, xs)
        end repeat
    end tell
    return acc
end concatMap

-- even :: Int -> Bool
on even(x)
    x mod 2 = 0
end even

-- Returns a sequence-matching function for findIndices etc
-- matching :: [a] -> (a -> Int -> [a] -> Bool)
-- matching :: String -> (Char -> Int -> String -> Bool)
on matching(pat)
    if class of pat is text then
        set xs to characters of pat
    else
        set xs to pat
    end if
    set lng to length of xs
    set bln to 0 < lng
    if bln then
        set h to item 1 of xs
    else
        set h to missing value
    end if
    script
        on |λ|(x, i, src)
            (h = x) and xs = ¬
                (items i thru min(length of src, -1 + lng + i) of src)
        end |λ|
    end script
end matching

-- min :: Ord a => a -> a -> a
on min(x, y)
    if y < x then
        y
    else
        x
    end if
end min

-- Lift 2nd class handler function into 1st class script wrapper 
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn
1 Like

Or for an Execute a Javascript for Automation action:

(See parallel AS and JS versions)

(() => {
    'use strict';
    const main = () => {

        findIndices(even, [7, 8, 9, 10, 11, 12]);
        //-> positions [1, 2 ,3]

        return findIndices(
            matching('en'),
            "needs patience and a sense of humour"
        );
        //-> positions [10, 22]
    };

    // GENERIC FUNCTIONS ----------------------------

    // https://github.com/RobTrew/prelude-jxa

    // concatMap :: (a -> [b]) -> [a] -> [b]
    const concatMap = (f, xs) =>
        0 < xs.length ? (
            [].concat.apply([], (
                'string' !== typeof xs ? (
                    xs
                ) : xs.split('')
            ).map(f))
        ) : [];

    // eq (==) :: Eq a => a -> a -> Bool
    const eq = (a, b) => {
        const t = typeof a;
        return t !== typeof b ? (
            false
        ) : 'object' !== t ? (
            'function' !== t ? (
                a === b
            ) : a.toString() === b.toString()
        ) : (() => {
            const aks = Object.keys(a);
            return aks.length !== Object.keys(b).length ? (
                false
            ) : aks.every(k => eq(a[k], b[k]));
        })();
    };

    // even :: Int -> Bool
    const even = n => 0 === n % 2;

    // findIndices(matching([2, 3]), [1, 2, 3, 1, 2, 3])
    //-> {2, 5}

    // findIndices :: (a -> Bool) -> [a] -> [Int]
    // findIndices :: (String -> Bool) -> String -> [Int]
    const findIndices = (p, xs) =>
        concatMap((x, i) => p(x, i, xs) ? (
            [i]
        ) : [], xs);

    // Returns a sequence-matching function for findIndices etc
    // findIndices(matching([2, 3]), [1, 2, 3, 1, 2, 3])
    // -> [1, 4]

    // matching :: [a] -> (a -> Int -> [a] -> Bool)
    const matching = pat => {
        const
            lng = pat.length,
            bln = 0 < lng,
            h = bln ? pat[0] : undefined;
        return (x, i, src) =>
            bln && h == x &&
            eq(pat, src.slice(i, lng + i));
    };

    // showJSON :: a -> String
    const showJSON = x => JSON.stringify(x, null, 2);

    // MAIN ---
    return main();
})();

The aesthetics are very interesting and expressive – in addition to Knuth's formulation that premature optimization is the root of all evil (or at least most of it) in programming, there is also his take that things will go better if you try to produce something beautiful:

programming is an art, because it applies accumulated knowledge to the world, because it requires skill and ingenuity, and especially because it produces objects of beauty. A programmer who subconsciously views himself as an artist will enjoy what he does and will do it better. Therefore we can be glad that people who lecture at computer conferences speak of the state of the Art.

On recursion, there is still some mystery about exactly why it can be hard to teach well, and, at least initially, hard to digest.

One practical solution is to use 'recursion schemes' like folds, unfolds, scans which encode standard recursion patterns in ways that are easily implemented with linear code.

Deriving a compound value (a list of integers here) from a single value (a string, in this case), could be seen as an 'anamorphism' or 'unfold' (a building up of structure, whereas a fold is a recursion which collapses structure)

No matter which scripting language you are using, ff you have an unfoldr and a scanl (or scanr) (left or right) in your snippet list, you can assemble the equivalent of an 'anamorphic' recursion using a pattern along these lines:


-- offsetDeltas :: String -> String -> [Int]
on offsetDeltas(needle, hay)
    script
        on |λ|(hay)
            set x to offset of needle in hay
            if 0 < x then
                Just(Tuple(x, text (1 + x) thru -1 of hay))
            else
                Nothing()
            end if
        end |λ|
    end script
    
    unfoldr(result, hay)
end offsetDeltas

-- offsets :: String -> String -> [Int]
on offsets(needle, hay)
    tail(scanl(my add, 0, offsetDeltas(needle, hay)))
end offsets


on run
    
    offsets("pol", "subpolar polygonal acropolis")
    
    --> {4, 10, 24}
end run


-- GENERIC FUNCTIONS -------------------------------------------

-- https://github.com/RobTrew/prelude-applescript

-- > unfoldr (\b -> if b == 0 then Nothing else Just (b, b-1)) 10
-- > [10,9,8,7,6,5,4,3,2,1] 

-- unfoldr :: (b -> Maybe (a, b)) -> b -> [a]
on unfoldr(f, v)
    set xr to Tuple(v, v) -- (value, remainder)
    set xs to {}
    tell mReturn(f)
        repeat -- Function applied to remainder.
            set mb to |λ|(|2| of xr)
            if Nothing of mb then
                exit repeat
            else -- New (value, remainder) tuple,
                set xr to Just of mb
                -- and value appended to output list.
                set end of xs to |1| of xr
            end if
        end repeat
    end tell
    return xs
end unfoldr

-- scanl :: (b -> a -> b) -> b -> [a] -> [b]
on scanl(f, startValue, xs)
    tell mReturn(f)
        set v to startValue
        set lng to length of xs
        set lst to {startValue}
        repeat with i from 1 to lng
            set v to |λ|(v, item i of xs, i, xs)
            set end of lst to v
        end repeat
        return lst
    end tell
end scanl

-- add :: Int -> Int
on add(a, b)
    a + b
end add

-- Just :: a -> Just a
on Just(x)
    {type:"Maybe", Nothing:false, Just:x}
end Just

-- Nothing :: () -> Nothing
on Nothing()
    {type:"Maybe", Nothing:true}
end Nothing

-- Tuple (,) :: a -> b -> (a, b)
on Tuple(a, b)
    {type:"Tuple", |1|:a, |2|:b, length:2}
end Tuple

-- Lift 2nd class handler function into 1st class script wrapper 
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn

-- tail :: [a] -> [a]
on tail(xs)
    if xs = {} then
        missing value
    else
        rest of xs
    end if
end tail

After a bit of thinking and playing around:

on offset of t in str
	local t, str
	
	if t is not in str then return {}
	
	script
		property s : str's id as list
	end script
	
	set rev to character id (reverse of result's s)
	set i to 1 + (str's length) - (current application's ¬
		(offset of t in rev))
	
	my (offset of t in str's text 1 thru (i - 1)) & i
end offset


offset of "*" in "12*45678*0*BC*EF*"
--> {3, 9, 11, 14, 17}
1 Like

Nice solution – reverse is a fairly expensive operation to repeat with every recursion, but it gives a very pleasing declarative definition for the single character case there.

Is it also intended to work with multi-character patterns like:

offset of "pen" in "try to pencil in some penguins"

?

True, which is partly why I put str's id inside a script object. I don't think recursion is by any means the ideal method for this sort of thing either, but it became more about solving a problem than ending up with something I need to use.

I might do a couple of timed tests using as much text as the stack can handle, and compare it with my current iterative handler to evaluate.

1 Like

I guess not. I'm fine with that, though. I think I'm done.

Actually, as soon as I replied to say I was done, I had an immediate realisation as to why it didn't want to work with multi-character patterns. The solution was fairly simple:

on offset of t in str
	local t, str
	
	if t is not in str then return {}
	
	script
		property s : str's id as list
		property r : t's id as list
	end script
	
	tell the result to set [revS, revT] to [¬
		character id (reverse of its s), ¬
		character id (reverse of its r)]
	
	set i to 1 + (str's length) - (current application's ¬
		(offset of revT in revS))
	
	my (offset of t in str's text 1 thru (i - 1)) & (i - (t's length) + 1)
end offset

offset of "pen" in "try to pencil in some penguins"
--> {8, 23}
1 Like