Find all words with special characters [SOLVED]

alltiagocom · April 6, 2024, 2:13pm

It doesn't seem to work, and it seems to break the sorting completely.

It's not a big deal. I can always add the Filter.

alltiagocom · April 6, 2024, 2:28pm

Thank you both, @Nige_S and @ComplexPoint, for your valuable contributions.
You saved me so much work and improved my workflow a lot.

ComplexPoint · April 6, 2024, 3:02pm

I've made one more update in the original post above.

Portuguese sort order should, I think, be better now.

const collator = new Intl.Collator("pt");

const ptComparison = collator.compare;
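
For anyone who wants to see the difference this makes, here is a minimal sketch comparing the default sort with the Portuguese collator. The word list is made up for illustration and is not taken from the macro.

```javascript
// Hypothetical sample words, not from the original macro.
const words = ["zebra", "água", "casa", "árvore", "Bola"];

// Default Array sort compares UTF-16 code units, so capitals come first
// and accented letters are pushed to the end.
const byCodeUnit = [...words].sort();
// → ["Bola", "casa", "zebra", "água", "árvore"]

// Intl.Collator applies locale-aware rules: accents and case only
// break ties between otherwise equal words.
const collator = new Intl.Collator("pt");
const ptComparison = collator.compare;
const byPortuguese = [...words].sort(ptComparison);
// → ["água", "árvore", "Bola", "casa", "zebra"]
```

The compare function returned by the collator is already bound, so it can be passed straight to sort without a wrapper.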

Nige_S · April 6, 2024, 3:07pm

sort is logical in that it orders lines by the underlying numerical code of each character -- some punctuation marks, then upper case A-Z, then lower case a-z, then accented and other characters. The Finder does "clever things" to make its sorting case-insensitive. If KM's sort filter does that as well -- excellent!

The thing with uniq is that it works by comparing each line with the next -- that's why you have to sort your input before using it. But a quick test suggests you can use the KM "Filter" action to do a case- and accent-insensitive sort and then use uniq -i, which is smart enough to consider upper- and lower-cased accented characters to be the same.

But it's random as to whether it keeps the upper- or lower-case version of a line. Since you only want lower-case I suggest you put the text through another filter before sorting:

...giving us:

List Accented Words v2.kmmacros (4.9 KB)
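
The same lower-case, sort, then de-duplicate idea can be sketched in JavaScript. This is only an illustration of why the sort has to come first, not what the macro itself runs: the input lines and the dedupeAdjacent helper are hypothetical, and the helper just mimics the compare-with-previous-line behaviour of uniq.

```javascript
// Hypothetical input lines, not taken from the macro.
const lines = ["Maçã", "ação", "maçã", "Ação", "pão"];

// Like uniq: keep a line only if it differs from the line before it.
const dedupeAdjacent = list =>
    list.filter((line, i) => i === 0 || line !== list[i - 1]);

// Step 1: lower-case first, so it no longer matters which variant survives.
const lowered = lines.map(line => line.toLocaleLowerCase("pt"));

// Without sorting, the duplicates are not adjacent and nothing is removed.
const withoutSort = dedupeAdjacent(lowered);
// → ["maçã", "ação", "maçã", "ação", "pão"]

// Step 2: sort so identical lines sit next to each other.
const collator = new Intl.Collator("pt");
const sorted = [...lowered].sort(collator.compare);

// Step 3: now the adjacent comparison catches every duplicate.
const unique = dedupeAdjacent(sorted);
// → ["ação", "maçã", "pão"]
```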


ComplexPoint · April 6, 2024, 3:34pm

@alltiagocom

I've since delegated segmentation into word tokens to Intl.Segmenter (in another update above).

segmenter = new Intl.Segmenter(
    "pt", {granularity: "word"}
);
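
As a rough sketch of how that segmenter can be used (the sample sentence and the non-ASCII test are my own assumptions here, not the code in the updated post):

```javascript
const segmenter = new Intl.Segmenter(
    "pt", {granularity: "word"}
);

// Hypothetical sample sentence.
const text = "A ação começou às três.";

// segment() yields every piece of the string, including spaces and
// punctuation; isWordLike is true only for the actual words.
const words = Array.from(segmenter.segment(text))
    .filter(seg => seg.isWordLike)
    .map(seg => seg.segment);
// → ["A", "ação", "começou", "às", "três"]

// Assumption: "special characters" means anything outside plain ASCII.
const accented = words.filter(word => /[^\x00-\x7F]/.test(word));
// → ["ação", "começou", "às", "três"]
```

Letting the segmenter decide where words begin and end avoids hand-written regular expressions that tend to split on accented letters.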
