How to determine the position of a case switch?

ALYB · May 6, 2024, 10:37am

I want to determine the position where the case switches to uppercase in strings like:

HandmatigHandmatige stilstandsinvoer

(In this example it's the 'H' at position 10.)

What would be the most elegant (easiest) way to achieve this?

ComplexPoint · May 6, 2024, 11:23am

Not sure about elegance, but here's one approach.

(If no upper-case characters are found after the initial character, the value -1 is returned.)

1-based Index of first upper case character after start.kmmacros (2.2 KB)

ALYB · May 6, 2024, 11:41am

Thank you!

I guess I should get the value via %SystemClipboard%? I'll test this tomorrow; too tired now.

Nige_S · May 6, 2024, 12:18pm

Questions: Does the string always start with an uppercase letter? Could you have, for example, "aString goes here"?

Assuming "yes" and "no" respectively, and that the example given is a good representation of input, a KM-native solution could be:

First Change to Uppercase.kmmacros (3.0 KB)

Unlike @ComplexPoint's, this will return 0 if no case-change is found.

Using those character classes should account for accented characters. The only other trick is turning off abort/notify on failure for the regex action, so the macro continues with the Local_foundString variable empty. That is an easy way of getting 0 when there's no case change.

ComplexPoint · May 6, 2024, 12:35pm

Yes, you could bind the name local_Source there to the value of Keyboard Maestro's %SystemClipboard% token.

Incidentally, if you need to detect capitals that may carry diacritics, rather than just the Anglo A-Z set, it would be better to specify upper case in terms of the Unicode character class:

const 
    i = [...kmvar.local_Source]
    .slice(1)
    .findIndex(c => (/\p{Lu}/u).test(c));

return -1 !== i
    ? i + 2
    : i;
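For anyone working in Python, here is a rough equivalent of the snippet above. It is a sketch, not part of the original macro. `unicodedata.category(c) == "Lu"` performs the same test as `/\p{Lu}/u`. Iterating over a Python `str` yields code points, much as `[...str]` does in JavaScript. The function name `first_upper_after_start` is hypothetical.

```python
import unicodedata


def first_upper_after_start(s: str) -> int:
    """1-based position of the first Lu character after the first
    character of s, or -1 if there is none (same contract as the JS)."""
    # enumerate from 2: s[1:] skips one character, and positions are 1-based
    for pos, c in enumerate(s[1:], start=2):
        if unicodedata.category(c) == "Lu":  # equivalent of /\p{Lu}/u
            return pos
    return -1


print(first_upper_after_start("HandmatigHandmatige stilstandsinvoer"))  # 10
```

Because the test is based on the Unicode category, accented capitals such as "É" count as upper case.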

And if you want to ignore cases where an upper-case character is preceded by a space (looking only for direct transitions from lower to upper case), you can still apply .findIndex over zipped pairs of adjacent characters.


// isLower :: Char -> Bool
const isLower = c =>
    // True if c is a lower case character.
    (/\p{Ll}/u).test(c);

// isUpper :: Char -> Bool
const isUpper = c =>
    // True if c is an upper case character.
    (/\p{Lu}/u).test(c);

// zip :: [a] -> [b] -> [(a, b)]
const zip = xs =>
    // The paired members of xs and ys, up to
    // the length of the shorter of the two lists.
    ys => Array.from({
        length: Math.min(xs.length, ys.length)
    }, (_, i) => [xs[i], ys[i]]);

const
    // Spread into an array of code points, as in the shorter snippet above.
    cs = [...kmvar.local_Source],
    pairs = zip(cs)(cs.slice(1)),
    i = pairs.findIndex(
        ([a, b]) => isLower(a) && isUpper(b)
    );

return -1 !== i
    ? i + 2
    : i;
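The same zipped-pairs idea can be sketched in Python, where `zip(s, s[1:])` pairs each character with its successor. This is an illustrative port, not code from the thread. The function name is hypothetical, and `str.islower()`/`str.isupper()` stand in for `\p{Ll}`/`\p{Lu}`, which is close but not an exact match for every Unicode character.

```python
def first_lower_to_upper(s: str) -> int:
    """1-based position of the first upper-case character that directly
    follows a lower-case one, or -1 if there is no such transition."""
    # Pair i is (s[i], s[i+1]); its second member is at 1-based position i + 2.
    for i, (a, b) in enumerate(zip(s, s[1:])):
        if a.islower() and b.isupper():
            return i + 2
    return -1


print(first_lower_to_upper("HandmatigHandmatige stilstandsinvoer"))  # 10
```

Unlike the plain "first capital after the start" test, a capital that follows a space is skipped. So "abc HandmatigHandmatige" gives 14, not 5.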

Nige_S · May 6, 2024, 2:34pm

Thinking further, this may be a better regex pattern (depending on your actual requirements, obviously):

^([^[:lower:]]+[^[:upper:]]+[:upper:]|[^[:upper:]]+[:upper:])

That'll cope with numbers at the start, punctuation in the string, etc.

HandmatigHandmatige
--> 10
Handmatig, Handmatige
--> 12
123, HandmatigHandmatige
--> 15
abc HandmatigHandmatige
--> 5
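Python's built-in `re` module does not support POSIX classes like `[:upper:]` or `\p{Lu}`, so the sketch below emulates the pattern by hand. It reproduces the regex's left-to-right alternation and greedy backtracking, using `str.islower()`/`str.isupper()` as an approximation of ICU's classes. It returns the length of the match, which is the 1-based position of the capital. It returns 0 when nothing matches, mirroring the "empty variable gives 0" trick. All function names are hypothetical.

```python
def run_length(s: str, start: int, pred) -> int:
    """Count consecutive characters from s[start] onward that satisfy pred."""
    n = 0
    while start + n < len(s) and pred(s[start + n]):
        n += 1
    return n


def not_lower(c: str) -> bool:
    return not c.islower()


def not_upper(c: str) -> bool:
    return not c.isupper()


def case_change_position(s: str) -> int:
    """Length of the match of
    ^([^[:lower:]]+[^[:upper:]]+[:upper:]|[^[:upper:]]+[:upper:])
    emulated without a regex engine; 0 when the pattern does not match."""
    # Alternative 1: the greedy [^[:lower:]]+ backtracks one character at a time.
    # For a given length a, only the maximal [^[:upper:]]+ run can be
    # followed by an upper-case character, so b is always the full run.
    for a in range(run_length(s, 0, not_lower), 0, -1):
        b = run_length(s, a, not_upper)
        if b > 0 and a + b < len(s) and s[a + b].isupper():
            return a + b + 1
    # Alternative 2: [^[:upper:]]+ immediately followed by an upper-case character.
    b = run_length(s, 0, not_upper)
    if 0 < b < len(s) and s[b].isupper():
        return b + 1
    return 0


for text in ["HandmatigHandmatige", "Handmatig, Handmatige",
             "123, HandmatigHandmatige", "abc HandmatigHandmatige"]:
    print(text, "-->", case_change_position(text))
```

Run on the four examples above, it prints 10, 12, 15 and 5. It gives 0 for strings such as "handmatig" or "HANDMATIG".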

Airy · May 6, 2024, 4:02pm

The gist of my alternative solution is:

Use the command "cut -c 2-" to remove the first character from your input line.
Use the command "grep -aob '[A-Z]'", which prints the 0-based byte offset of each uppercase character it finds (as offset:letter); the first line of that output is the one you want.
Extract the number and add 2: one for the character removed by cut, and one because UNIX counts starting with 0, while I suspect you prefer to count starting at 1.

I'm assuming that your data is limited to a single line of text, and that there can be zero or one (not more) uppercase letters at the beginning. If this isn't true, my solution will have problems.
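The offset arithmetic of that pipeline can be modelled in Python as a sanity check. This is a hypothetical sketch, not Airy's macro. It assumes grep reports byte offsets, as GNU grep does with `-o -b`. It also assumes a plain ASCII first character, because GNU `cut -c` currently behaves like `-b`. As with `[A-Z]` in the C locale, only ASCII capitals count.

```python
def grep_style_position(line: str):
    """Rough model of: cut -c 2- | grep -aob '[A-Z]' (first hit only).

    Returns (byte_offset, position): byte_offset is what grep would print,
    position is the 1-based character position in the original line.
    Returns None when no A-Z letter follows the first character."""
    rest = line[1:]                     # what cut -c 2- keeps
    for index, c in enumerate(rest):
        if "A" <= c <= "Z":             # ASCII-only, like [A-Z] in the C locale
            byte_offset = len(rest[:index].encode("utf-8"))  # grep -b counts bytes
            return byte_offset, index + 2  # +1 for the removed char, +1 for 1-based
    return None


print(grep_style_position("HandmatigHandmatige"))   # (8, 10)
print(grep_style_position("HändmatigHandmatige"))   # (9, 10)
```

The second call shows the caveat with non-ASCII input. An accented letter before the match takes two bytes in UTF-8, so the byte offset no longer maps directly to the character position. Accented capitals such as "É" are not found at all.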

_jims · May 6, 2024, 5:45pm

Hi, @ALYB. I know that you didn't ask about detecting a case change in either direction, but someone who finds this thread in the future might have that requirement. I'm sure it would be an easy change for @ComplexPoint, but here's another method using Python.

#!/bin/bash

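find_case_change() {
    # The string is passed to Python as an argument (sys.argv[1]) rather than
    # pasted into the source, so quotes or backslashes in it can't break the code.
    python3 -c "
import sys
s = sys.argv[1][1:]          # ignore the first character
if s:
    prev_char = s[0]
    for i in range(1, len(s)):
        if s[i].isalpha():
            if (prev_char.islower() and s[i].isupper()) or (prev_char.isupper() and s[i].islower()):
                print(i + 2)     # +1 for the skipped first character, +1 for 1-based counting
                break
            prev_char = s[i]
" "$1"
}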

find_case_change 'AbcdEfg'
find_case_change 'Abcd1EFG'
find_case_change 'Abcd1$Efg'
find_case_change 'Abcd   Efg'
find_case_change 'ABcd   EFg'
find_case_change 'A bcd   EfG'
find_case_change 'ABCDefgfg'
find_case_change 'ABCD  efg'
find_case_change 'AbCD  efg'

Edit: 2024-05-07 09:1235 EDT: enumerate makes the above a bit cleaner...

find_case_change() {
    python3 -c "
import sys
s = sys.argv[1][1:]          # ignore the first character
prev_char = ''
for i, char in enumerate(s):
    if char.isalpha():
        if prev_char and ((prev_char.islower() and char.isupper()) or (prev_char.isupper() and char.islower())):
            print(i + 2)
            break
        prev_char = char
" "$1"
}

ComplexPoint · May 7, 2024, 7:50am

(In Python terms you could also, of course, reach for itertools.groupby, grouping on case and counting the lengths of the groups.)

Perhaps starting with something like:

from itertools import groupby


def caseGroups(s):
    # Group consecutive characters by whether they are upper case.
    return [
        (k, "".join(m))
        for k, m in groupby(s, key=lambda c: c.isupper())
    ]


print(
    caseGroups("HandmatigHandmatige stilstandsinvoer")
)

→

[(True, 'H'), (False, 'andmatig'), (True, 'H'), (False, 'andmatige stilstandsinvoer')]
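One way to finish the groupby idea is to keep a running total of group lengths. That total gives the 1-based start of every upper-case run, and the answer to the original question is the first such start after position 1. This is a minimal sketch; the function names are hypothetical.

```python
from itertools import groupby


def upper_run_starts(s: str) -> list:
    """1-based start positions of every run of upper-case characters."""
    starts, pos = [], 1
    for is_upper, run in groupby(s, key=str.isupper):
        if is_upper:
            starts.append(pos)
        pos += sum(1 for _ in run)      # advance by the length of this group
    return starts


def first_switch_to_upper(s: str) -> int:
    """First upper-case run that starts after position 1, or -1 if none."""
    later = [p for p in upper_run_starts(s) if p > 1]
    return later[0] if later else -1


print(upper_run_starts("HandmatigHandmatige stilstandsinvoer"))       # [1, 10]
print(first_switch_to_upper("HandmatigHandmatige stilstandsinvoer"))  # 10
```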
