How to Surround Quoted Words With Tags?

ALYB · November 29, 2020, 9:41am

I'd like to request help for a task that goes beyond my knowledge of KM: I want to transfer all tags of the type {x1} ... {/x1} in source sentences to the corresponding target sentences (translations).

The tags should be inserted at the correct position, related to the surrounding quote characters (either left or right of the quote character).

Note that the tags don't need to be paired in all sentences.

Example:

Source 1:
"Keyboard Maestro"{/x1} is the best {x2}"invention"{/x2} after {x3}"sliced bread"

Source 2:
"{x1}Keyboard Maestro{/x1}" is the best "{x2}invention{/x2}" after "{x3}sliced bread{/x3}"

Target 1:
"Keyboard Maestro"{/x1} is de beste {x2}"uitvinding"{/x2} na {x3}"geroosterd brood"

Target 2:
"{x1}Keyboard Maestro{/x1}" is de beste "{x2}uitvinding {/x2}" na "{x3}geroosterd brood{/x3}"

(I've used { and } to represent the less-than and greater-than characters.)

In source 1 the first opening and the last closing tags are missing, because they would be at the first/last position of the sentence (my editor hides them at these positions).

ComplexPoint · November 29, 2020, 11:27am

What are the inputs to your macro ?

a tagged EN text and an untagged NL text ?

How do you obtain the segmentation of the NL string ?

Or are you just aiming for restoration of the missing opening or closing tags ?
Or to move the quotation characters outside the tags ?

PS, FWIW you can enter <tags> ... </directly> on Discourse forums like this

by flanking the string with backtick characters

`<tags> ... </directly>`

ComplexPoint · November 29, 2020, 12:22pm

If that were the goal, then others might be keen to move straight to applying the problem to the need for Regular Expression practice, but I think my first experiment, in a KM context, might be some variant of a pattern like this:

Quotes moved out of tags.kmmacros (22.4 KB)

ALYB · November 29, 2020, 1:20pm

The inputs are indeed a tagged EN text and the NL translation from which I've removed all tags.

The aim is to transfer all tags from their quote-related position from EN to NL:

So the quotes with the orange arrows have to be tagged (at the correct side of the quote, either at the left or at the right, like in the source), whereas the quotes with the red arrows (that aren't tagged in the source), should be omitted. And, to complicate things: the tag numbering has to match the numbering in the source.
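For reference, the untagged NL input can be approximated by stripping the tags from a tagged string. This is a minimal sketch, assuming every tag looks like `<x1>` or `</x1>` (it also tolerates a misplaced slash such as `<x/3>`); `stripTags` is a hypothetical helper name, not part of any macro in this thread:

```javascript
// stripTags :: String -> String
// Removes <xN>, </xN> and the malformed <x/N> form (assumed tag shapes).
const stripTags = s => s.replace(/<\/?x\/?\d+>/g, '');

const tagged = '"Keyboard Maestro"</x1> is the best <x2>"invention"</x2>';
console.log(stripTags(tagged));
```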

ComplexPoint · November 29, 2020, 5:53pm

So the NL input string already has quotation marks ?

(matching the positions of the quotes in the EN ?)

i.e. if we sketched out the inputs in terms of JS, it might look like this:

const
    strEN = '"Keyboard Maestro"</x1> is the best <x2>"invention"</x2> after <x3>"sliced bread"',
    strNL = '"Keyboard Maestro" is de beste "uitvinding" na "geroosterd brood"';

?

ALYB · November 29, 2020, 6:42pm

Yes, that's indeed correct. However, I realised that it's very likely that the authors of these texts won't be very consistent in their tagging (at least that's what I see every day). So variations of tag-quote spanning and even errors will be present too. The easiest approach will be to just mimic the variations and errors (a manual correction can take place later).

strEN = '"Keyboard Maestro"</x1> is the best <x2>"invention"</x2> after <x3>"sliced bread<x/3>" with no "<x4>butter"</x4> or <x5>"marmelade"',
strNL = '"Keyboard Maestro" is de beste "uitvinding" na "geroosterd brood" zonder "boter" of "jam"';

ComplexPoint · November 29, 2020, 6:44pm

OK, so does this match the simple case ?

(It looks like what I would call a zipWith pattern. There's probably a way to do that with a KM for Each, but in terms, for the moment, of JS):

Tags added to Target.kmmacros (20.3 KB)

JS Source

(() => {
    'use strict';

    // main :: IO ()
    const main = () => {
        const
            kme = Application('Keyboard Maestro Engine'),
            kmVar = k => kme.getvariable(k),
            strEN = kmVar('strEN'),
            strNL = kmVar('strNL');

        return zipWith(
            en => nl => en.startsWith('<') ? (
                `${prefixTag(en)}"${nl}"${suffixTag(en)}`
            ) : nl
        )(
            strEN.split('"')
        )(
            strNL.split('"')
        ).join('');
    };

    // prefixTag :: String -> String
    const prefixTag = s => {
        const
            iTagEnd = [...s].findIndex(
                c => '>' === c
            );
        return -1 !== iTagEnd ? (
            s.slice(0, 1 + iTagEnd)
        ) : '';
    };

    // suffixTag :: String -> String
    const suffixTag = s => {
        const
            iTagEnd = [...s].reverse().findIndex(
                c => '<' === c
            );
        return -1 !== iTagEnd ? (
            s.slice(s.length - (1 + iTagEnd))
        ) : '';
    };

    // --------------------- GENERIC ---------------------

    // zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
    const zipWith = f => {
        // A list with the length of the shorter of 
        // xs and ys, defined by zipping with a
        // custom function, rather than with the
        // default tuple constructor.
        const go = xs =>
            ys => 0 < xs.length ? (
                0 < ys.length ? (
                    [f(xs[0])(ys[0])].concat(
                        go(xs.slice(1))(ys.slice(1))
                    )
                ) : []
            ) : [];
        return go;
    };

    // MAIN --
    return main()
})();
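The recursive `zipWith` above builds a new array at every step. For long inputs an iterative equivalent may be gentler on the call stack. A minimal sketch with the same curried shape; the sample strings are shortened from the thread and there is no Keyboard Maestro call, so it can be tried in any JS console (`segmentPairs` is just an illustrative name):

```javascript
// zipWith :: (a -> b -> c) -> [a] -> [b] -> [c]
// Result has the length of the shorter of xs and ys.
const zipWith = f => xs => ys =>
    xs.slice(0, Math.min(xs.length, ys.length))
    .map((x, i) => f(x)(ys[i]));

// Pair up the segments obtained by splitting both strings on the quote.
const segmentPairs = zipWith(en => nl => [en, nl])(
    '"Keyboard Maestro"</x1> is the best'.split('"')
)(
    '"Keyboard Maestro" is de beste'.split('"')
);

console.log(segmentPairs);
```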

ALYB · November 29, 2020, 7:28pm

It looks like an inversion of the order tag - quote is taking place:

And:

BTW: We'll leave all the irregularities caused by sloppy tagging by the authors as they are ...

ComplexPoint · November 29, 2020, 7:43pm

Yes, I thought that was what you were aiming for :- )

It seemed to be implied by this example:

If you would like to show:

A sample input string
and the exactly matching output string

then I should be able to adjust the source accordingly.

ALYB · November 29, 2020, 7:59pm

Very generous! This should cover all possible cases:


strEN = '‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’',
strNL = '‘Keyboard Maestro’</x1> is de beste <x2>‘uitvinding’</x2> na <x3>‘geroosterd brood<x/3>’ zonder ‘<x4>boter’</x4> of ‘<x5>jam</x5>’, zeker ‘<x6>weten’';

(I've replaced the straight double quotes with the single curly ones, because I want to be able to adapt the JS to paired surrounding punctuation marks like (), {}, <> etc.)
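One way to prepare for other paired marks is to build the split pattern from the pair itself. A minimal sketch; `escapeRegExp` and `splitOnPair` are hypothetical helpers, and a pair like `<>` would of course clash with the tags themselves:

```javascript
// escapeRegExp :: String -> String
// Escapes characters that are special in a regular expression.
const escapeRegExp = s => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');

// splitOnPair :: (String, String) -> String -> [String]
// Splits on either the opening or the closing mark of the pair.
const splitOnPair = (open, close) => s =>
    s.split(new RegExp(`${escapeRegExp(open)}|${escapeRegExp(close)}`));

console.log(splitOnPair('‘', '’')('‘Keyboard Maestro’ is de beste ‘uitvinding’'));
console.log(splitOnPair('(', ')')('a (b) c'));
```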

ComplexPoint · November 29, 2020, 8:06pm

I should have asked for one more thing - I had forgotten that there are, of course 2 input strings for each output string.

Could I trouble you to add the corresponding NL input string, to go with the EN input string ?

ALYB · November 29, 2020, 8:17pm

Like this?

strEN = '‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’',
strNL = '‘Keyboard Maestro’</x1> is de beste <x2>‘uitvinding’</x2> na <x3>‘geroosterd brood<x/3>’ zonder ‘<x4>boter’</x4> of ‘<x5>jam</x5>’, zeker ‘<x6>weten’';

strEN = '‘<x1>Keyboard Maestro</x1>’ is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for <x6>‘real’</x6>',
strNL = '‘<x1>Keyboard Maestro</x1>’ is de beste <x2>‘uitvinding’</x2> na <x3>‘geroosterd brood<x/3>’ zonder ‘<x4>boter’</x4> of ‘<x5>jam</x5>’, zeker <x6>‘weten’</x6>';

ComplexPoint · November 29, 2020, 8:48pm

Mmm ... now I'm feeling a bit puzzled, it seems that the input NL already has tags in your example ?

So ... what are we producing, from the two input lists, that is different in the output list ?

I think I was expecting to see:

input EN with tags
input NL with no tags, but some quotation marks ...

ALYB · November 29, 2020, 8:55pm

Your assumption is correct. I just added the tags to the NL to indicate where they have to be placed.

strEN = '‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’',
strNL = '‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’';

strEN = '‘<x1>Keyboard Maestro</x1>’ is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for <x6>‘real’</x6>',
strNL = '‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’';

ComplexPoint · November 29, 2020, 10:50pm

OK, so there are two input strings:

EN
NL

but how many output strings are you expecting ?

I thought one, but now it looks like two, and it seems that the EN is changing – not just the NL ?

Could I ask you to show me one example of:

Exactly what an unmodified pair of expected input strings will look like, and label them explicitly as "Example of expected inputs"
and then an example of the expected output, explicitly labelled expected output.

With those labelled strings above I am no longer quite sure whether you expect 1 or 2 strings to be produced, or exactly what the pattern is

I had thought it was two input strings (EN, NL) and one output string (NL) ...

but that doesn't seem to be what you are showing ...

ComplexPoint · November 29, 2020, 11:12pm

PS the best way to get help is always to skip explanation and description, and just go straight to:

This is the starting point (example),
and this is what I want to produce from it (example)

How do I get from A to B ?

Showing is always clear, telling, for some reason, very rarely : -)

ComplexPoint · November 29, 2020, 11:50pm

In the meanwhile, you may find that you can fine-tune it by experimenting with adjustments to line 14 of the JS source in the Execute JavaScript action.

Here is a more vanilla version of the macro and that code, which leaves the double quotes in place.

Tags added to Target.kmmacros (21.0 KB)

JMichaelTX · November 30, 2020, 2:10am

ALYB:

strEN = '‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’',
strNL = '‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’';

strEN = '‘<x1>Keyboard Maestro</x1>’ is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for <x6>‘real’</x6>',
strNL = '‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’';

This is a great use case for using RegEx.
However, that requires clear, consistent examples of source text, and the same for the output results text.
Unfortunately, your example is inconsistent, and it is not clear how you are converting (translating) all of the text. You are changing some text that is NOT between tags.

So, in order to have a meaningful solution I have had to edit your source and results text:

Source Text

VERSION 1 -- Missing Opening Tag

‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread’</x3> with no <x4>‘butter’</x4> or <x5>‘marmelade’</x5>, for <x6>‘real’</x6>

VERSION 2 -- Both Open and Close Tags

<x1>‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread’</x3> with no <x4>‘butter’</x4> or <x5>‘marmelade’</x5>, for <x6>‘real’</x6>

So here is a RegEx solution that replaces the Tag block, and any source text, with the same text based solely on the Tag Name:

x1	Keyboard Maestro
x2	uitvinding
x3	geroosterd brood
x4	boter
x5	jam
x6	weten

I suspect that the source and replacement text for a given Tag (like "x2") could vary with your input source text. If so, you will need to provide specific examples.

For now, the replacement text is based solely on the Tag Name.
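The lookup idea can be sketched in plain JS. This is not the macro's actual regex (which isn't shown in the thread), only an illustration under two assumptions: the table is tab-separated as listed above, and only fully paired blocks such as `<x2>...</x2>` are replaced, tags and quotes included. `replaceTagBlocks` is a hypothetical name:

```javascript
// Tag name -> replacement text, tab-separated as in the list above.
const replacementText = [
    'x1\tKeyboard Maestro',
    'x2\tuitvinding',
    'x3\tgeroosterd brood'
].join('\n');

const table = new Map(
    replacementText.split('\n').map(line => line.split('\t'))
);

// replaceTagBlocks :: Map String String -> String -> String
// Replaces each paired <xN>...</xN> block with the text for xN;
// blocks with no table entry, and unpaired tags, are left untouched.
const replaceTagBlocks = tbl => s =>
    s.replace(
        /<(x\d+)>.*?<\/\1>/g,
        (block, name) => tbl.has(name) ? tbl.get(name) : block
    );

console.log(replaceTagBlocks(table)('<x2>‘invention’</x2> after <x3>‘sliced bread’</x3>'));
```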

Example Output

For RegEx details, see: regex101: build, test, and debug regex

Below is just an example written in response to your request. You will need to use it as an example and/or change it to meet your workflow automation needs.

Please let us know if it meets your needs.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


MACRO: Replace Text Between Tags [Example]

-~~~ VER: 1.0 2020-11-29 ~~~
Requires: KM 8.2.4+ macOS 10.11 (El Capitan)+
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

DOWNLOAD Macro File:

Replace Text Between Tags [Example].kmmacros
Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.

ReleaseNotes

Author.@JMichaelTX

PURPOSE:

Replace Text Between Tags
- Tags (Open & Close) and Original Text Are Replaced by Text in "ReplacementText" KM Variable
- Note that this replaces ANY text between the Tags with the same replacement text. This assumes that the source text is always the same for a given Tag.

HOW TO USE

First, make sure you have followed instructions in the Macro Setup below.
REPLACE the text in the first two Set Variable Actions with your actual data
Trigger this macro.

MACRO SETUP

Carefully review the Release Notes and the Macro Actions
- Make sure you understand what the Macro will do.
- You are responsible for running the Macro, not me.

Make These Changes to this Macro

Assign a Trigger to this macro.
Move this macro to a Macro Group that is only Active when you need this Macro.
ENABLE this Macro, and the Macro Group it is in.

REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:
(all shown in the magenta color)
- SET Source String
  - The source text that you want to process
- SET Tag Replacement Text
  - List of Tags and the corresponding replacement text

REQUIRES:

KM 9.0+ (may work in KM 8.2+ in some cases)
macOS 10.11.6 (El Capitan)+

TAGS: @RegEx @Strings @Example

USER SETTINGS:

Any Action in magenta color is designed to be changed by end-user

ALYB · November 30, 2020, 9:26am

The examples were exactly as intended: I tried to include all the irregularities that occur. Note that these aren't database strings or the like that have to adhere to strict structural rules. They are e.g. MS Word strings where every tag stands for a colour change or e.g. a hyperlink. (Authors are often sloppy and aren't consistent in placing quotes inside or outside formatting.)

Besides that, the macro has to be generic. What it needs to do is transfer tags in exactly the same order from source to target, at exactly the same side of the quote as in the source. Using a list with words would make the macro inflexible.

Thank you for your help!

ALYB · November 30, 2020, 9:45am

I'm sorry to say that the second version doesn't solve the task either.

I understand that I haven't been clear enough in defining what the macro needs to do. So please let me define the task better.

Task:
Transfer tags near quotes in exactly the same order from source to target, at exactly the same side of the quote as in the source. Only quotes should be used to determine the position of tags, no words should be used.

There are 2 different variants of the source sentence, one with only paired tags and one with paired tags and unpaired tags. Every variant has all possible combinations of the order of tags and quotes. The editor creates an intermediate target (which is a translation of the source) without any tags. The macro should insert all tags into the intermediate target to create the desired target (final translation).

Variant 1: Paired tags only

Source: ‘<x1>Keyboard Maestro</x1>’ is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for <x6>‘real’</x6>
Intermediate target: ‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’
Desired target: ‘<x1>Keyboard Maestro</x1>’ is de beste <x2>‘uitvinding’</x2> na <x3>‘geroosterd brood<x/3>’ zonder ‘<x4>boter’</x4> of ‘<x5>jam</x5>’, zeker <x6>‘weten’</x6>

Variant 2: Paired tags and unpaired tags

Source: ‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’
Intermediate target: ‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’
Desired target: ‘Keyboard Maestro’</x1> is de beste <x2>‘uitvinding’</x2> na <x3>‘geroosterd brood<x/3>’ zonder ‘<x4>boter’</x4> of ‘<x5>jam</x5>’, zeker ‘<x6>weten’
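The task as defined here can be sketched by treating each quote character as an anchor: collect the tags touching each quote in the source, then re-attach them to the quote with the same ordinal position in the target. This is a minimal sketch, not a tested macro. It assumes the translation keeps the same number and order of quote characters as the source, and that anything of the form `<...>` is a tag. `transferTags` is a hypothetical name, and the `quotes` parameter is placed inside a regex character class, so characters such as `]`, `^` or `\` would need escaping first:

```javascript
// transferTags :: (String, String, String) -> String
// source: tagged text; target: untagged translation;
// quotes: the anchor characters (assumed default: ‘ and ’).
const transferTags = (source, target, quotes = '‘’') => {
    const tagRun = '((?:<[^<>]*>)*)';
    const rgxSource = new RegExp(`${tagRun}[${quotes}]${tagRun}`, 'g');

    // For each quote in the source, in order, the tags touching it
    // on the left and on the right (either may be empty).
    const slots = [];
    let m;
    while (null !== (m = rgxSource.exec(source))) {
        slots.push({ before: m[1], after: m[2] });
    }

    // Re-attach those tags to the quote with the same ordinal
    // position in the target. Surplus target quotes stay untagged.
    let i = 0;
    return target.replace(new RegExp(`[${quotes}]`, 'g'), q => {
        const slot = slots[i];
        i += 1;
        return undefined !== slot ? slot.before + q + slot.after : q;
    });
};

const source = '‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’';
const intermediate = '‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’';
console.log(transferTags(source, intermediate));
```

Because only the order of the quotes is used, sloppy or malformed tags such as `<x/3>` are copied across unchanged, and no word list is needed. In a Keyboard Maestro Execute JavaScript action, the two strings would come from KM variables rather than being hard-coded.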