Replace accented and special characters in a string

Hello,

I'm looking for the best (and simplest) way (Shell, JavaScript, AppleScript, RegEx…) to replace accented and special characters (diacritics) in a string (clipboard, variable…) with their unaccented equivalents.

Examples:

  • Replace ā, á, ǎ, and à with a.
  • Replace ē, é, ě, and è with e.
  • Replace ī, í, ǐ, and ì with i.
  • Replace ō, ó, ǒ, and ò with o.
  • Replace ū, ú, ǔ, and ù with u.
    Etc.

For now, I'm using a sequence of "Search and Replace" actions with RegEx, but I would like to simplify the process.


The difficulty, of course, is that there is no simple arithmetic relationship between the code points of the incoming characters with diacritics and those of the similar outgoing characters without them.

That points to two quite large sets of regexes: one for the lower-case targets and one for the upper-case ones.
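As a rough sketch of that table-driven approach, assuming a hand-maintained table, the whole set of regexes can collapse into a single character class built from the table, so the string is scanned once. The `VARIANTS` table and `replaceAccents` function are hypothetical names; the table lists only the vowels from the original question, and the upper-case entries are derived from the lower-case ones with `toUpperCase()`:

```javascript
// Hypothetical, deliberately incomplete table: base letter -> accented variants.
// Only lower-case rows are listed; upper case is derived automatically below.
const VARIANTS = {
  a: 'āáǎà',
  e: 'ēéěè',
  i: 'īíǐì',
  o: 'ōóǒò',
  u: 'ūúǔù',
};

// Build a character -> replacement map covering both lower and upper case.
function buildMap(variants) {
  const map = new Map();
  for (const [base, chars] of Object.entries(variants)) {
    for (const ch of chars) {
      map.set(ch, base);
      map.set(ch.toUpperCase(), base.toUpperCase());
    }
  }
  return map;
}

const MAP = buildMap(VARIANTS);

// One character class containing every key, so the input is scanned once.
const PATTERN = new RegExp('[' + [...MAP.keys()].join('') + ']', 'gu');

function replaceAccents(text) {
  return text.replace(PATTERN, (ch) => MAP.get(ch));
}

console.log(replaceAccents('Pīnyīn: mā má mǎ mà, Ǎ')); // prints "Pinyin: ma ma ma ma, A"
```

The catch is that the table only matches precomposed characters; a plain "a" followed by a separate combining accent would slip through, and every new character has to be added by hand.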

That's a lot of work, and someone is bound to have done it before. On macOS you should, I think, have command-line access to iconv, which bundles such conversions.

In Terminal.app, iTerm.app, etc., you could start by entering:

man iconv

See, for example, under the //TRANSLIT option here:


Apple's Foundation framework (Objective-C) has a couple of useful string transforms for this, so I wrote a quick JavaScript for Automation (JXA) script to use them.

Remove Diacritics.kmmacros (29.7 KB)

JavaScript
ObjC.import('Foundation');

var kme = Application('Keyboard Maestro Engine');

(() => {
	// NSStringTransform constants
	const StripDiacritics     = $.NSStringTransformStripDiacritics;
	// Alternative transform; not used below
	const StripCombiningMarks = $.NSStringTransformStripCombiningMarks;

	// Read the Keyboard Maestro variable "Input" and wrap it as an NSString
	const Input   = kme.getvariable('Input');
	const nsInput = $.NSString.stringWithString(Input);

	// Apply the transform forwards (reverse: false)
	const nsOutput = nsInput.stringByApplyingTransformReverse(
		StripDiacritics,
		false
	);

	// Convert back to a JavaScript string; this becomes the script's result
	return ObjC.unwrap(nsOutput);
})();

Output:

AAAAAAAAAAAAAAAAAAAAAAAAAAAAAȺ
EEEEEEEEEEEEEEEEEEEEEEEEEƎɆ
OOOOOOOOOOOOOOOOƟØOOOOOOOOOOOOOOOOOØ
UUUUUUUUUUUUUUUUUUUUUUUUUUUUUUɄ
aaaaaaaaaaaaaaaaaʾaaaaaaaaaaaaaⱥ
eeeeeeeeeeeeeeeeeeeeeeeeeɇ
ooooooooooooooooɵøooooooooooooooooooø
uuuuuuuuuuuuuuuuuuuuuuuuuuuuuuʉ
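The leftovers in that output (Ⱥ, Ɇ, Ø, Ɵ, Ʉ and their lower-case forms) appear to be letters that Unicode gives no decomposition into a base letter plus a combining mark, so a mark-stripping step leaves them alone; Ǝ is a separate letter (reversed E) rather than an accented one. One option, sketched here in plain JavaScript rather than JXA and assuming you are happy to maintain a small hand-written table, is to strip marks first and then map the remaining letters explicitly. `NO_DECOMPOSITION` and `stripDiacritics` are hypothetical names, and the table is intentionally incomplete:

```javascript
// Letters with a stroke or bar have no Unicode decomposition, so
// NFD plus mark removal leaves them unchanged. Map them explicitly.
// Hypothetical, incomplete table: extend as needed.
const NO_DECOMPOSITION = new Map([
  ['Ø', 'O'], ['ø', 'o'],
  ['Ⱥ', 'A'], ['ⱥ', 'a'],
  ['Ɇ', 'E'], ['ɇ', 'e'],
  ['Ʉ', 'U'], ['ʉ', 'u'],
  ['Ɵ', 'O'], ['ɵ', 'o'],
]);

function stripDiacritics(text) {
  // Decompose, then delete every combining mark (Unicode category M).
  const marksRemoved = text.normalize('NFD').replace(/\p{M}/gu, '');
  // Replace letters that had nothing to decompose.
  let out = '';
  for (const ch of marksRemoved) {
    out += NO_DECOMPOSITION.get(ch) ?? ch;
  }
  // Recompose anything NFD split apart that was not a mark (e.g. Hangul).
  return out.normalize('NFC');
}

console.log('Ø'.normalize('NFD') === 'Ø'); // prints true: no decomposition exists
console.log(stripDiacritics('Ørsted, Ⱥ, é, ɵ')); // prints "Orsted, A, e, o"
```

Note that `\p{M}` (an ES2018 Unicode property escape) removes all combining marks, including ones that are essential in some non-Latin scripts, so it is broader than a Latin-only cleanup may need.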

Hey Cary,

JavaScript has a nice function for this, String.prototype.normalize() (I believe ES6 is required, though).

-Chris


Remove Accents from Unicode Text v1.00.kmmacros (5.1 KB)
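The macro is only attached, but a later post in this thread identifies it as using normalize('NFD'). A minimal sketch of that idea, assuming the common pattern of decomposing each accented letter into base letter plus combining mark and then deleting the Combining Diacritical Marks block (U+0300 to U+036F); `removeAccents` is a hypothetical name, not necessarily what the macro uses:

```javascript
// Remove accents by decomposing characters, then deleting combining marks.
function removeAccents(text) {
  return text
    .normalize('NFD')                 // é (U+00E9) -> e + U+0301 COMBINING ACUTE ACCENT
    .replace(/[\u0300-\u036f]/g, '')  // drop the Combining Diacritical Marks block
    .normalize('NFC');                // recompose anything else NFD split apart
}

console.log(removeAccents('Crème brûlée, Ångström, Pīnyīn'));
// prints "Creme brulee, Angstrom, Pinyin"
```

The final NFC step matters only for text outside Latin script: NFD also splits Hangul syllables into jamo, which the regex does not remove, and recomposing restores them.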


Thanks to everybody!

The three solutions seem to work perfectly. But, sorry to @ComplexPoint and @CJK, I'll probably use the one from @ccstone. This JavaScript function is perfect for me: short and understandable! I'll bookmark the other two, though.


Me too 🙂


Hey Folks,

I dug my solution up on StackOverflow several months ago.

(I had to dig a bit to find the page again.)

There's a long discussion there if anyone's interested, but I think the normalize('NFD') method in post #4 above is the neatest and cleanest.

-Chris

It is a good solution. Remember to mark it as the solution, so it is easy for others to find when searching the forum.


Done.


I would have used the 'Translate Text to HTML' functionality in BBEdit via the Text Factory.

TextFactory

I wrote a quick blog post about using Text Factories in Keyboard Maestro:

http://www.cryan.com/daily/20180626.jsp
