Replace accented and special characters in a string


#1

Hello,

I'm looking for the best (and simplest) way (Shell, JavaScript, AppleScript, RegEx…) to replace accented and special characters (diacritics) in a string (Clipboard, Variable,…) with their plain equivalents.

Examples:

  • Replace ā, á, ǎ, and à with a.
  • Replace ē, é, ě, and è with e.
  • Replace ī, í, ǐ, and ì with i.
  • Replace ō, ó, ǒ, and ò with o.
  • Replace ū, ú, ǔ, and ù with u.
    Etc.

For now, I'm using a sequence of "Search and Replace" actions with RegEx, but I would like to simplify the process.
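For comparison, a chain of RegEx search-and-replace steps can be collapsed into a single function. This is a minimal JavaScript sketch, not tied to any particular Keyboard Maestro action. It covers only the lower-case letters listed above and assumes the text uses precomposed characters (e.g. "á" as one code point). `replacements` and `replaceAccents` are hypothetical names.

```javascript
// One character class per base letter, taken from the examples above.
// Extend this table with further letters (and upper-case forms) as needed.
const replacements = [
  [/[āáǎà]/g, 'a'],
  [/[ēéěè]/g, 'e'],
  [/[īíǐì]/g, 'i'],
  [/[ōóǒò]/g, 'o'],
  [/[ūúǔù]/g, 'u'],
];

// Apply every pattern/replacement pair in turn, like a chain of
// "Search and Replace" actions collapsed into one step.
function replaceAccents(text) {
  return replacements.reduce(
    (result, [pattern, base]) => result.replace(pattern, base),
    text
  );
}

console.log(replaceAccents('Pīnyīn: mā má mǎ mà'));
// → "Pinyin: ma ma ma ma"
```

The table still has to be written by hand, so it only moves the maintenance work into one place rather than removing it.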


#2

The structure of the problem is, of course, given by the fact that there is no formally definable relationship between the incoming characters with diacritics and the outgoing similar characters without diacritics.

That points to two quite large sets of regexes: one for the lower-case targets and one for the upper-case ones.
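One way to avoid maintaining the upper-case set separately is to keep a single lower-case lookup table and let case conversion derive the upper-case forms. This is a sketch under that assumption. The table (`baseLetter`) and the function (`stripWithTable`) are hypothetical and only contain the vowels from the question.

```javascript
// Hypothetical lookup table: lower-case accented letters only.
const baseLetter = {
  'ā': 'a', 'á': 'a', 'ǎ': 'a', 'à': 'a',
  'ē': 'e', 'é': 'e', 'ě': 'e', 'è': 'e',
  'ī': 'i', 'í': 'i', 'ǐ': 'i', 'ì': 'i',
  'ō': 'o', 'ó': 'o', 'ǒ': 'o', 'ò': 'o',
  'ū': 'u', 'ú': 'u', 'ǔ': 'u', 'ù': 'u',
};

function stripWithTable(text) {
  // Array.from walks the string by code point rather than UTF-16 unit.
  return Array.from(text, (ch) => {
    const lower = ch.toLowerCase();
    const base = baseLetter[lower];
    if (base === undefined) return ch; // not in the table: keep as-is
    // If lower-casing changed the character, it was upper-case,
    // so return the upper-case form of the base letter.
    return lower === ch ? base : base.toUpperCase();
  }).join('');
}

console.log(stripWithTable('Ǎ ě Ò ū!'));
// → "A e O u!"
```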

A lot of work ... and someone is bound to have done it before. On macOS you should, I think, find that you have command-line access to iconv, which bundles such conversions.

In Terminal.app or iTerm.app etc., you could start by entering:

man iconv

See, for example, under the //TRANSLIT option here:


#3

Apple's Foundation framework (normally used from Objective-C) has a couple of useful string transforms for this, so I wrote a quick JavaScript for Automation (JXA) script to use them.

Remove Diacritics.kmmacros (29.7 KB)

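```javascript
ObjC.import('Foundation');

// Handle to the Keyboard Maestro Engine, used to read KM variables
var kme = Application('Keyboard Maestro Engine');

(() => {
	// NSStringTransform constants
	const StripDiacritics     = $.NSStringTransformStripDiacritics;
	// Alternative transform (unused here): strips combining marks
	const StripCombiningMarks = $.NSStringTransformStripCombiningMarks;

	// User input via the Keyboard Maestro variable "Input"
	const Input   = kme.getvariable('Input');
	const nsInput = $.NSString.stringWithString(Input);

	// Apply the transform in the forward direction (reverse: false)
	const nsOutput = nsInput.stringByApplyingTransformReverse(
		StripDiacritics,
		false
	);

	// Convert the NSString back to a JavaScript string (the script's result)
	return ObjC.unwrap(nsOutput);
})();
```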

Output (note that some letters, such as Ⱥ, Ɇ, Ɵ, Ø and Ʉ, come through unchanged):

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAȺ
EEEEEEEEEEEEEEEEEEEEEEEEEƎɆ
OOOOOOOOOOOOOOOOƟØOOOOOOOOOOOOOOOOOØ
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUɄ
aaaaaaaaaaaaaaaaaʾaaaaaaaaaaaaaⱥ
eeeeeeeeeeeeeeeeeeeeeeeeeɇ
ooooooooooooooooɵøooooooooooooooooooø
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuʉ

#4

Hey Cary,

JavaScript has a nice function for this, normalize() (I believe ES6 is required, though).

-Chris


Remove Accents from Unicode Text v1.00.kmmacros (5.1 KB)
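The macro's contents aren't reproduced here. The commonly used form of the normalize('NFD') technique mentioned in #7 looks like the following sketch, which is not necessarily identical to what the macro does. `removeAccents` is a hypothetical name. NFD splits each accented character into its base letter followed by combining marks, and the regex then deletes marks in the Combining Diacritical Marks block (U+0300 to U+036F).

```javascript
// Decompose accented characters into base letter + combining marks (NFD),
// then delete the marks in the range U+0300–U+036F.
function removeAccents(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

console.log(removeAccents('Crème Brûlée, Ångström, Pīnyīn'));
// → "Creme Brulee, Angstrom, Pinyin"
```

Letters that have no canonical decomposition (for example ø or ł) are not split by NFD and therefore pass through unchanged.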


#5

Thanks to everybody!

The three solutions seem to work perfectly. But, sorry to @ComplexPoint and @CJK, I'll probably use the one from @ccstone. This JavaScript function is perfect for me. Short and understandable! I will bookmark the other two, though.


#6

Me too 🙂


#7

Hey Folks,

I dug my solution up on StackOverflow several months ago.

(I had to dig a bit to find the page again.)

There's a long discussion there, if anyone's interested, but I think the normalize('NFD') method in post #4 above is the neatest and cleanest.

-Chris
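This is a sketch of two possible refinements of the normalize('NFD') method. It assumes a JavaScript engine with ES2018 Unicode property escapes. First, `\p{M}` removes any combining mark, not just those in U+0300 to U+036F. Second, a small hand-written map covers letters that NFD does not decompose. The map (`noDecomposition`) and the function name (`removeAccentsUnicode`) are hypothetical and deliberately incomplete.

```javascript
// Letters with no canonical decomposition survive NFD untouched,
// so they need an explicit mapping (illustrative, not exhaustive).
const noDecomposition = { 'ø': 'o', 'Ø': 'O', 'ł': 'l', 'Ł': 'L', 'đ': 'd', 'Đ': 'D' };

function removeAccentsUnicode(text) {
  return text
    .normalize('NFD')
    // \p{M} matches any combining mark; it requires the u flag (ES2018).
    .replace(/\p{M}/gu, '')
    // Map the remaining undecomposable letters explicitly.
    .replace(/[øØłŁđĐ]/g, (ch) => noDecomposition[ch]);
}

console.log(removeAccentsUnicode('Łódź, Søren, Đorđe'));
// → "Lodz, Soren, Dorde"
```

The trade-off is that `\p{M}` also removes marks that are integral to some non-Latin scripts (Devanagari vowel signs, for instance). For Latin-only text, the narrower U+0300 to U+036F range is the more conservative choice.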


#8

It is a good solution. Remember to mark the solution so that it is easy for others to find when searching the forum.


#9

Done.


#10

I would have used the 'Translate Text to HTML' functionality in BBEdit via a Text Factory.


I wrote a quick blog post about using a Text Factory in Keyboard Maestro:

http://www.cryan.com/daily/20180626.jsp
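If the translated text uses named HTML entities such as `&eacute;` (an assumption, since the exact BBEdit output isn't shown here), each entity name begins with the base letter. A follow-up regex can then reduce it to that letter. This is a hypothetical sketch: the accent-name list covers only some common entity suffixes, and `entitiesToBaseLetters` is an illustrative name.

```javascript
// Named accent entities look like &<letter><accent name>;
// e.g. &eacute; or &Ccedil;. The accent-name list is illustrative, not complete.
const ACCENT_ENTITY = /&([A-Za-z])(acute|grave|circ|uml|tilde|ring|cedil|caron|macr);/g;

// Replace each matching entity with its base letter; other entities
// (such as &amp;) are left alone.
function entitiesToBaseLetters(html) {
  return html.replace(ACCENT_ENTITY, (_match, letter) => letter);
}

console.log(entitiesToBaseLetters('Cr&egrave;me br&ucirc;l&eacute;e, Fran&ccedil;ais'));
// → "Creme brulee, Francais"
```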