How to Surround Quoted Words With Tags?

tiffle · November 30, 2020, 11:33am

That's the first example you give but the second example has the tags outside the single quotes, like this:

Is it really that inconsistent or is it just a typo?

ALYB · November 30, 2020, 12:14pm

That's indeed intended inconsistency. (The technical background: The editor that I use to translate, allows to hide sentence-spanning tags. This is very useful if the whole sentence is spanned: these spanning tags won't be displayed. It's a little less useful if the opening tag at the start of the sentence (or closing tag at the end) is hidden, since the corresponding closing (or opening) tag will still be displayed, like in the second variant.)

ComplexPoint · November 30, 2020, 12:34pm

To get the match between inputs and ouputs that you after, we need to know:

what outputs you are seeing
which aspects of them diverge from the output required.

If you can, try not to think in terms of 'tasks to solve' and what scripts 'do',
and instead just narrow focus exclusively down to examples of

the input texts, both EN and NL
and the output text (updated NL)

Here is an update which broadens out the range of delimiting quotes that are recognised, to include '"{}().

Tags added to Target.kmmacros (20.5 KB)

Could you tell us:

what input pair you are giving (the exact EN text and NL text, in their raw pre-process state)
what output text you are seeing from that input pair
exactly what the difference is between what you see and what you need.

(I absolutely appreciate the relevance of concepts like works and doesn't' work, solves and doesn't solve at your end, but here they leave us rather in the dark.

All we can work with is:

concrete examples of the two input texts and the single output text.
indication of exactly which substrings in the latter are not right yet.)

ALYB · November 30, 2020, 1:05pm

Input pairs:

Variant 1: Paired tags only

Input EN ‘<x1>Keyboard Maestro</x1>’ is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for <x6>‘real’</x6>
Input NL ‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’

Variant 2: Paired tags and unpaired tags

Input EN ‘Keyboard Maestro’</x1> is the best <x2>‘invention’</x2> after <x3>‘sliced bread<x/3>’ with no ‘<x4>butter’</x4> or ‘<x5>marmelade</x5>’, for ‘<x6>real’
Input NL ‘Keyboard Maestro’ is de beste ‘uitvinding’ na ‘geroosterd brood’ zonder ‘boter’ of ‘jam’, zeker ‘weten’

What I see (I marked the undesired results in red):

What it should be:

Second variant, what I see:

What it should be:

ComplexPoint · November 30, 2020, 2:34pm

Thanks for taking the trouble to do that – it's really helpful.

It may be more solid to break the problem down into two parts:

Placing the <numbered tags>
Allowing for, and preserving, various paired delimiters.

Let's start by trying to solve problem 1, and if we can get that right, then we can think about problem 2.

On placing the <numbered tags>, are we getting closer with this ?

Tags added to Target.kmmacros (20.9 KB)

ComplexPoint · November 30, 2020, 3:08pm

An initial guess at stage 2 (detecting and preserving various quote pair types), when we get there, might look something like:

Tags added around various Quote pairs.kmmacros (23.6 KB)

JS source - grouping by quotation set membership

(() => {
    'use strict';

    // main :: IO ()
    const main = () => {
        const
            kme = Application('Keyboard Maestro Engine'),
            kmVar = k => kme.getvariable(k),
            strEN = kmVar('strEN'),
            strNL = kmVar('strNL');

        const delims = '\"\'\“\”\‘\’{}()';

        // delimSplit :: String -> [String]
        const delimSplit = s =>
            groupBy(on(eq)(c => delims.includes(c)))(s);

        return zipWith(
            en => nl => {
                return en.includes('<') ? (
                    `${firstTag(en)}${nl}${lastTag(en)}`
                ) : nl;
            }
        )(
            delimSplit(strEN)
        )(
            delimSplit(strNL)
        ).join('');
    };


    // firstTag :: String -> String
    const firstTag = s =>
        s.startsWith('<') ? (() => {
            const i = [...s].findIndex(c => '>' === c);
            return -1 !== i ? (
                s.slice(0, 1 + i)
            ) : '';
        })() : '';


    // lastTag :: String -> String
    const lastTag = s =>
        s.endsWith('>') && s !== firstTag(s) ? (() => {
            const
                i = [...s].reverse().findIndex(
                    c => '<' === c
                );
            return -1 !== i ? (
                s.slice(s.length - (1 + i))
            ) : '';
        })() : ''


    // --------------------- GENERIC ---------------------

    // Tuple (,) :: a -> b -> (a, b)
    const Tuple = a =>
        b => ({
            type: 'Tuple',
            '0': a,
            '1': b,
            length: 2
        });

    // showLog :: a -> IO ()
    const showLog = (...args) =>
        console.log(
            args
            .map(JSON.stringify)
            .join(' -> ')
        );

    // groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    const groupBy = fEq =>
        // Typical usage: groupBy(on(eq)(f), xs)
        xs => (ys => 0 < ys.length ? (() => {
            const
                tpl = ys.slice(1).reduce(
                    (gw, x) => {
                        const
                            gps = gw[0],
                            wkg = gw[1];
                        return fEq(wkg[0])(x) ? (
                            Tuple(gps)(wkg.concat([x]))
                        ) : Tuple(gps.concat([wkg]))([x]);
                    },
                    Tuple([])([ys[0]])
                ),
                v = tpl[0].concat([tpl[1]]);
            return 'string' !== typeof xs ? (
                v
            ) : v.map(x => x.join(''));
        })() : [])(list(xs));

    // on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
    const on = f =>
        // e.g. groupBy(on(eq)(length))
        g => a => b => f(g(a))(g(b));

    // eq (==) :: Eq a => a -> a -> Bool
    const eq = a =>
        // True when a and b are equivalent in the terms
        // defined below for their shared data type.
        b => {
            const t = typeof a;
            return t !== typeof b ? (
                false
            ) : 'object' !== t ? (
                'function' !== t ? (
                    a === b
                ) : a.toString() === b.toString()
            ) : (() => {
                const kvs = Object.entries(a);
                return kvs.length !== Object.keys(b).length ? (
                    false
                ) : kvs.every(([k, v]) => eq(v)(b[k]));
            })();
        };

    // list :: StringOrArrayLike b => b -> [a]
    const list = xs =>
        // xs itself, if it is an Array,
        // or an Array derived from xs.
        Array.isArray(xs) ? (
            xs
        ) : Array.from(xs || []);

    // sj :: a -> String
    function sj() {
        // Abbreviation of showJSON for quick testing.
        // Default indent size is two, which can be
        // overriden by any integer supplied as the
        // first argument of more than one.
        const args = Array.from(arguments);
        return JSON.stringify.apply(
            null,
            1 < args.length && !isNaN(args[0]) ? [
                args[1], null, args[0]
            ] : [args[0], null, 2]
        );
    }

    // zipWith :: (a -> a -> b) -> [a] -> [b]
    const zipWith = f => {
        // A list with the length of the shorter of 
        // xs and ys, defined by zipping with a
        // custom function, rather than with the
        // default tuple constructor.
        const go = xs =>
            ys => 0 < xs.length ? (
                0 < ys.length ? (
                    [f(xs[0])(ys[0])].concat(
                        go(xs.slice(1))(ys.slice(1))
                    )
                ) : []
            ) : [];
        return go;
    };

    // MAIN --
    return main();
})();

ALYB · November 30, 2020, 4:29pm

This is almost perfect. There's only the closing quote marks that don't match (‘ instead of ’). Works for both variants:

And:

ComplexPoint · November 30, 2020, 4:38pm

I take it that This refers, in your context, to version 1 above (Tags added to Target) ?

(that draft doesn't aim to get the quotes right, so its sounds like we're making progress)

How about version 2 (the draft for stage 2 of the problem, above):

Tags added around various Quote pairs ?

What are you seeing in the output there ?

ALYB · November 30, 2020, 4:41pm

You're getting there.

There's only one glitch in the variant with the Paired tags only: the last, closing tag is missing:

The other variant is correct.

Then there is this issue with the punctuation marks not being transferred to (synced with) the source. Instead of () the target shows ‘’:

And:

ComplexPoint · November 30, 2020, 5:05pm

Next step, with this version:

Tags added around various Quote pairs 002.kmmacros (24.4 KB)

I am seeing

(<x1>Keyboard Maestro</x1>) is de beste <x2>(uitvinding)</x2> na <x3>(geroosterd brood<x/3>) zonder (<x4>boter)</x4> of (<x5>jam</x5>), zeker <x6>(weten)

Could you again mark any points of divergence (in the output) for us ?

i.e. show us the text triples of this kind that you are seeing:

(input EN, input NL) -> output NL

ALYB · November 30, 2020, 5:25pm

Indeed. And the last closing tag is missing in this variant:

It also looks like the targets always default to ‘’ instead of matching the () {} “” in the source:

And:

ComplexPoint · November 30, 2020, 6:13pm

In that example the number of segments in the EN is one unit longer than the number of segments in the NL, so I'll need a variant of

zipWith

to allow for paired lists with differing lengths.

I think I may have a zipWithLong in my library somewhere. If not I'll write one – it may prove useful elsewhere.

Ah, you want the NL to adopt the bracketing delimiters from the EN source as well as the tags ?

OK. I'll take a look at that.

ComplexPoint · November 30, 2020, 6:19pm

Next step:

zipWithLong to pick up an extra trailing segment
EN quotes replacing the NL quotes.

Could you again show us any divergences in the

(EN input, NL input) -> NL output

triples ?

Tags added around various Quote pairs 003.kmmacros (23.7 KB)

JS Source 003

(() => {
    'use strict';

    // Draft 003

    // main :: IO ()
    const main = () => {
        const
            kme = Application('Keyboard Maestro Engine'),
            kmVar = k => kme.getvariable(k),
            strEN = kmVar('strEN'),
            strNL = kmVar('strNL');

        const delims = '\"\'\“\”\‘\’{}()';

        // delimSplit :: String -> [String]
        const delimSplit = s =>
            groupBy(on(eq)(c => delims.includes(c)))(s);

        return zipWithLong(
            en => nl => en.includes('<') ? (
                `${firstTag(en)}${nl}${lastTag(en)}`
            ) : delims.includes(nl) ? (
                en
            ) : nl
        )(
            delimSplit(strEN)
        )(
            delimSplit(strNL)
        ).join('');
    };

    // firstTag :: String -> String
    const firstTag = s =>
        s.startsWith('<') ? (() => {
            const i = [...s].findIndex(c => '>' === c);
            return -1 !== i ? (
                s.slice(0, 1 + i)
            ) : '';
        })() : '';


    // lastTag :: String -> String
    const lastTag = s =>
        s.endsWith('>') && s !== firstTag(s) ? (() => {
            const
                i = [...s].reverse().findIndex(
                    c => '<' === c
                );
            return -1 !== i ? (
                s.slice(s.length - (1 + i))
            ) : '';
        })() : ''


    // --------------------- GENERIC ---------------------

    // Tuple (,) :: a -> b -> (a, b)
    const Tuple = a =>
        b => ({
            type: 'Tuple',
            '0': a,
            '1': b,
            length: 2
        });

    // showLog :: a -> IO ()
    const showLog = (...args) =>
        console.log(
            args
            .map(JSON.stringify)
            .join(' -> ')
        );

    // groupBy :: (a -> a -> Bool) -> [a] -> [[a]]
    const groupBy = fEq =>
        // Typical usage: groupBy(on(eq)(f), xs)
        xs => (ys => 0 < ys.length ? (() => {
            const
                tpl = ys.slice(1).reduce(
                    (gw, x) => {
                        const
                            gps = gw[0],
                            wkg = gw[1];
                        return fEq(wkg[0])(x) ? (
                            Tuple(gps)(wkg.concat([x]))
                        ) : Tuple(gps.concat([wkg]))([x]);
                    },
                    Tuple([])([ys[0]])
                ),
                v = tpl[0].concat([tpl[1]]);
            return 'string' !== typeof xs ? (
                v
            ) : v.map(x => x.join(''));
        })() : [])(list(xs));


    // on :: (b -> b -> c) -> (a -> b) -> a -> a -> c
    const on = f =>
        // e.g. groupBy(on(eq)(length))
        g => a => b => f(g(a))(g(b));


    // eq (==) :: Eq a => a -> a -> Bool
    const eq = a =>
        // True when a and b are equivalent in the terms
        // defined below for their shared data type.
        b => {
            const t = typeof a;
            return t !== typeof b ? (
                false
            ) : 'object' !== t ? (
                'function' !== t ? (
                    a === b
                ) : a.toString() === b.toString()
            ) : (() => {
                const kvs = Object.entries(a);
                return kvs.length !== Object.keys(b).length ? (
                    false
                ) : kvs.every(([k, v]) => eq(v)(b[k]));
            })();
        };


    // list :: StringOrArrayLike b => b -> [a]
    const list = xs =>
        // xs itself, if it is an Array,
        // or an Array derived from xs.
        Array.isArray(xs) ? (
            xs
        ) : Array.from(xs || []);

    // sj :: a -> String
    function sj() {
        // Abbreviation of showJSON for quick testing.
        // Default indent size is two, which can be
        // overriden by any integer supplied as the
        // first argument of more than one.
        const args = Array.from(arguments);
        return JSON.stringify.apply(
            null,
            1 < args.length && !isNaN(args[0]) ? [
                args[1], null, args[0]
            ] : [args[0], null, 2]
        );
    }

    // zipWithLong :: (a -> a -> a) -> [a] -> [a] -> [a]
    const zipWithLong = f => {
        // A list with the length of the *longer* of 
        // xs and ys, defined by zipping with a
        // custom function, rather than with the
        // default tuple constructor.
        // Any unpaired values, where list lengths differ,
        // are simply appended.
        const go = xs =>
            ys => 0 < xs.length ? (
                0 < ys.length ? (
                    [f(xs[0])(ys[0])].concat(
                        go(xs.slice(1))(ys.slice(1))
                    )
                ) : xs
            ) : ys
        return go;
    };

    // MAIN --
    return main();
})();

ALYB · November 30, 2020, 9:48pm

Thank you very much for all your time, patience and help!

ComplexPoint · November 30, 2020, 10:31pm

Thank you for all the patient examples of inputs outputs and divergences : -)

JMichaelTX · December 1, 2020, 5:56am

ALYB:

I'd like to request help for a task that goes beyond my knowledge of KM: I want to transfer all tags of the type {x1} ... {/x1} in source sentences to the corresponding target sentences (translations).

The tags should be inserted at the correct position, related to the surrounding quote characters (either left or right of the quote character).

Note that the tags don't need to be paired in all sentences.

Example:

Source 1:
"Keyboard Maestro"{/x1} is the best {x2}"invention"{/x2} after {x3}"sliced bread"

Source 2:
"{x1}Keyboard Maestro{/x1}" is the best "{x2}invention{/x2}" after "{x3}sliced bread{/x3}"

Target 1:
"Keyboard Maestro"{/x1} is de beste {x2}"uitvinding"{/x2} na {x3}"geroosterd brood"

Target 2:
"{x1}Keyboard Maestro{/x1}" is de beste "{x2}uitvinding {/x2}" na "{x3}geroosterd brood{/x3}"

(I've used { and } to representative the less than and greater than characters.)

In source 1 the first opening and the last closing tags are missing, because they would be at the first/last position of the sentence (my editor hides them at these positions).

Now that I better understand both your data and workflow, I can provide a KM Macro that uses Regex Actions. IMO, this Macro is far simpler, and easier to understand than the complex JavaScript solution. Of course, you do have to have some understanding of RegEx. Let me know if you have any questions.

Example Output

Below is just an example written in response to your request. You will need to use as an example and/or change to meet your workflow automation needs.

Please let us know if it meets your needs.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

MACRO: Add Tags in EN Text to NL Text [Example]

-~~~ VER: 2.0 2020-11-30 ~~~
Requires: KM 8.2.4+ macOS 10.11 (El Capitan)+
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

DOWNLOAD Macro File:

Add Tags in EN Text to NL Text [Example].kmmacros
Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.

ReleaseNotes

Author.@JMichaelTX

PURPOSE:

Add Tags in EN Text to NL Text
- Get Tags in EN Text, and Insert into corresponding quoted text in NL Text
- Needs only the Open or Close Tag for each Quoted String
- The Tags can be either inside OR outside of the EN Quoted Text.
- The resultant NL Text with Tags will have the proper open/close tags, and the tags will be OUTSIDE of the Quoted Text.

HOW TO USE

First, make sure you have followed instructions in the Macro Setup below.
REPLACE the text in the first two Set Variable Actions with your actual data
Trigger this macro.

MACRO SETUP

Carefully review the Release Notes and the Macro Actions
- Make sure you understand what the Macro will do.
- You are responsible for running the Macro, not me. ??
  .
  Make These Changes to this Macro

Assign a Trigger to this macro.
Move this macro to a Macro Group that is only Active when you need this Macro.
ENABLE this Macro, and the Macro Group it is in.
.

REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:
(all shown in the magenta color)
- CHANGE to Your EN Source Text
  - The EN source text that you want to process
- CHANGE to Your NL Source Text
  - The NL Source Text you want to process

REQUIRES:

KM 9.0+ (may work in KM 8.2+ in some cases)
macOS 10.11.6 (El Capitan)+

TAGS: @RegEx @Strings @Example

USER SETTINGS:

Any Action in magenta color is designed to be changed by end-user

ComplexPoint · December 1, 2020, 9:42am

Good to see variant approaches – I would be particularly interested to see an approach (which I think should be possible, perhaps with JSON-patterned KM variable values) which bypasses both script and regular expressions.

Note though, that your Regex draft hasn't yet caught up with the full set of requirements and data samples that emerged in the thread.

In particular:

Allowing for a variety of quote-pair types (not just twinned smart single quotes)
Allowing for inputs in which the EN and NL differ from each other in the quote pair types which they use.
Defining an NL output in which, regardless of the quote pair type used in the NL source, the EN quote pair type is used.

for example (see the most recent draft of the version using an Execute Javascript action):

EN input:

(<x1>Keyboard Maestro</x1>) is the best <x2>(invention)</x2> after <x3>(sliced bread<x/3>) with no (<x4>butter)</x4> or (<x5>marmelade</x5>), for <x6>(real)</x6>

NL input:

{Keyboard Maestro} is de beste {uitvinding} na {geroosterd brood} zonder {boter} of {jam}, zeker {weten}

Macro output (preserving all tag and quote boundaries, even when overlapping):

(<x1>Keyboard Maestro</x1>) is de beste <x2>(uitvinding)</x2> na <x3>(geroosterd brood<x/3>) zonder (<x4>boter)</x4> of (<x5>jam</x5>), zeker <x6>(weten)</x6>

(I'm sure the regular expression version can be quickly updated to fix those points, at little cost to 'simplicity', for those who are keeping up their training in debugging regular expressions : -)

JMichaelTX · December 1, 2020, 9:57pm

If the OP is interested in my RegEx solution, AND is willing to post a complete set of requirement (real-world examples of source text, unedited, and real-world examples of desired output results, then I would be happy to revise my solution accordingly.

@ALYB, please let me know if you are interested.

ALYB · December 2, 2020, 7:26am

Thank you for your kind offer @JMichaelTX, but the solution provided by @ComplexPoint is already perfect for my purposes.

JMichaelTX · December 3, 2020, 1:28am

No problem, I just wanted to provide a working example for those interested in using RegEx.
The RegEx is not that hard in this case, but it does take some creative thinking in how to use multiple RegExes in order to design a solution.

Nothing wrong with using JavaScript, if that is your preference. I use JavaScript all the time.

How to Surround Quoted Words With Tags?

Example Output

MACRO: Add Tags in EN Text to NL Text [Example]

DOWNLOAD Macro File:

ReleaseNotes

Options