Easy question for someone who knows Regex!

Hi!

I am relatively new to Regex and was too stubborn to ask for help until I had spent the last 3 hours trying to get this working :frowning:

I'm trying to use the "For Each" action to capture substrings (paragraphs) of a variable with the following regular expression:

(\t\t)(.|\s)*?(?=\n\t\t)

It seems to be working as expected using the Regexr tool, and splits the content into the 4 matches I want:

http://regexr.com/3d13u
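For anyone who wants to test the pattern outside KM, here is a minimal sketch of what it does. It uses Python's re module as a stand-in for the ICU engine KM uses, so edge cases may differ. The sample text and the split_paragraphs helper are made up for illustration.

```python
import re

# Hypothetical sample: each paragraph starts with two tabs (an assumption about the data).
SAMPLE = "Header\n\t\tPara one\nline two\n\t\tPara two\n\t\tPara three\n"

# The pattern from the question: two tabs, then the shortest run of any
# characters up to (but not including) a newline followed by two tabs.
PATTERN = r"(\t\t)(.|\s)*?(?=\n\t\t)"


def split_paragraphs(text):
    # finditer + group(0) returns the whole matches; re.findall would return
    # the capture groups instead, because the pattern contains groups.
    return [m.group(0) for m in re.finditer(PATTERN, text)]


print(split_paragraphs(SAMPLE))
```

Note that the last paragraph is only returned when another newline plus two tabs follows it, because the lookahead requires one.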

But when I use it in KM, the KM engine crashes. I guess I am causing some kind of infinite loop that doesn't trigger an error in the Regexr tool?

Thanks in advance for any tips to get this working. I can provide more info; I did my best to keep this post brief!

Would it help if I uploaded the KM macro file?

Cheers,
Joel

No, it doesn’t crash here. Try pasting a sample text directly into the text field of the variable declaration to see whether it still crashes.

I get substrings like these:

I guess this is correct.

To avoid matching the tab sequence in the first line (of your sample text), you could use:
(?m)^(\t\t)(.|\s)*?(?=\n\t\t)
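A quick comparison, again using Python's re as a stand-in for ICU and a made-up sample whose first line contains a tab pair, shows what the (?m)^ anchor changes. The matches helper is hypothetical.

```python
import re

# Hypothetical sample: the first line has two tabs in the middle of it.
SAMPLE = "Lesson\t\tTitle\n\t\tPara one\n\t\tPara two\n\t\tEnd\n"

WITHOUT_ANCHOR = r"(\t\t)(.|\s)*?(?=\n\t\t)"
# (?m) makes ^ match at the start of every line, so a match must begin
# with two tabs at a line start rather than anywhere in a line.
WITH_ANCHOR = r"(?m)^(\t\t)(.|\s)*?(?=\n\t\t)"


def matches(pattern, text):
    # Return whole matches rather than capture groups.
    return [m.group(0) for m in re.finditer(pattern, text)]


print(matches(WITHOUT_ANCHOR, SAMPLE))  # includes "\t\tTitle" from the first line
print(matches(WITH_ANCHOR, SAMPLE))     # first-line tabs are skipped
```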

And you can make the dot match newlines with (?s).
So: (?sm)^(\t\t)(.)*?(?=\n\t\t)
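In Python's re (used here only as a stand-in), the (?s) version returns the same matches as the (.|\s) alternation on a hypothetical sample containing a multi-line paragraph:

```python
import re

# Hypothetical sample: "Para one" spans two lines.
SAMPLE = "Lesson\t\tTitle\n\t\tPara one\nstill para one\n\t\tPara two\n\t\tEnd\n"

# (.|\s) matches any character because \s covers the newline that . skips.
ALTERNATION = r"(?m)^(\t\t)(.|\s)*?(?=\n\t\t)"
# With (?s) (DOTALL), the dot itself matches newlines, so one dot is enough.
DOTALL = r"(?sm)^(\t\t)(.)*?(?=\n\t\t)"


def matches(pattern, text):
    # Return whole matches rather than capture groups.
    return [m.group(0) for m in re.finditer(pattern, text)]


print(matches(DOTALL, SAMPLE))
print(matches(ALTERNATION, SAMPLE) == matches(DOTALL, SAMPLE))
```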

Thanks for your help, Tom. Strange, for me it is crash city. The KM Engine icon in the menu bar freezes immediately, and I can see KM going nuts in Activity Monitor. I tried rebooting, with no difference. Any idea if there is some kind of cache I can reset, or something else I can try to stop the crashing?

Thanks again!!

Did you try it with a literal sample text in the field (instead of %pp_entire_lesson%)?

This way you can see if the problem comes from the %pp_entire_lesson% variable content.

I had tried that and still had the same problem. But right now I tried copying the exact same text I had pasted in that Regexr tool back into KM and no crashing!

So I think I have narrowed it down to a problem with the text that Excel is capturing to that variable. If I copy and paste it from Excel to another app first (including TextEdit), all is good. But if I use a macro to copy the text directly from Excel to KM, it's a first-class trip to crash city.

Very strange since this is plain text. Maybe something to do with text encoding? I will experiment with a way to re-encode that text coming out of Excel before putting it into KM to see if that helps. Any suggestions for how to do this are very welcome.
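One way to see what actually differs between the pasted text and the text the macro copies from Excel is to count the line-break characters and list any non-ASCII characters. A minimal Python sketch; both helper functions and the sample strings are hypothetical, and the CR-only line breaks for the Excel copy are an assumption, not something verified here.

```python
def describe_line_endings(text):
    """Count CRLF pairs, lone CR and lone LF characters in text."""
    crlf = text.count("\r\n")
    lone_cr = text.count("\r") - crlf  # CR not followed by LF
    lone_lf = text.count("\n") - crlf  # LF not preceded by CR
    return {"CRLF": crlf, "CR": lone_cr, "LF": lone_lf}


def non_ascii_code_points(text):
    """List non-ASCII characters as U+XXXX code points (a quick encoding check)."""
    return sorted({f"U+{ord(ch):04X}" for ch in text if ord(ch) > 127})


# Hypothetical samples: CR-only breaks (assumed for the Excel copy) vs LF breaks.
from_excel = "Header\r\t\tPara one\r\t\tPara two\r"
pasted = "Header\n\t\tPara one\n\t\tPara two\n"

print(describe_line_endings(from_excel))  # {'CRLF': 0, 'CR': 3, 'LF': 0}
print(describe_line_endings(pasted))      # {'CRLF': 0, 'CR': 0, 'LF': 3}
print(non_ascii_code_points(from_excel))  # []
```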

If you're interested in checking out the crash, here is the test macro (attached).

Thanks again for your help Tom, at least I know it wasn't a regex+KM problem.

regextest2.kmmacros (5.5 KB)

But right now I tried copying the exact same text I had pasted in that Regexr tool back into KM and no crashing!

That’s what I did, too, and because of that I asked you to test it.

But if I use a macro to copy the text directly from Excel to KM, it's a first-class trip to crash city.

Very strange since this is plain text. Maybe something to do with text encoding?

I’m sure that Excel is doing very strange things (like all MS applications). A more consistent approach would probably be to work with text files (CSV) exported from Excel.

And to further simplify the regex (no capturing needed here):
(?sm)^\t\t.*?(?=\n\t\t)
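Dropping the groups also matters if you test the pattern in Python: re.findall returns the captured groups instead of the whole match when the pattern contains groups. A sketch with a hypothetical sample:

```python
import re

# Hypothetical sample: "Para one" spans two lines.
SAMPLE = "Lesson\t\tTitle\n\t\tPara one\nstill para one\n\t\tPara two\n\t\tEnd\n"

# No capture groups, so re.findall returns each whole match as a string.
PATTERN = r"(?sm)^\t\t.*?(?=\n\t\t)"
# The earlier grouped version: findall returns (group1, group2) tuples, and a
# repeated group like (.)*? only keeps the last character it matched.
GROUPED = r"(?sm)^(\t\t)(.)*?(?=\n\t\t)"

print(re.findall(PATTERN, SAMPLE))
print(re.findall(GROUPED, SAMPLE))
```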

And if you don’t need/want the tabs at the beginning of the substring, this is cleaner:
(?sm)(?<=^\t\t).*?(?=\n\t\t)
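The lookbehind variant, sketched in Python's re with the same kind of made-up sample (Python accepts this lookbehind because ^\t\t has a fixed width):

```python
import re

# Hypothetical sample: the first line contains a tab pair mid-line.
SAMPLE = "Lesson\t\tTitle\n\t\tPara one\nstill para one\n\t\tPara two\n\t\tEnd\n"

# (?<=^\t\t) requires a line start plus two tabs *before* the match,
# so the tabs are checked but not included in the result.
PATTERN = r"(?sm)(?<=^\t\t).*?(?=\n\t\t)"

print(re.findall(PATTERN, SAMPLE))
```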

(You didn’t see the 2 tabs in the Alert box because apparently the Alert Box action automatically stripped them away. In a “Display text in a window” action you would see them.)

Thanks Tom. How do I learn more about that (?sm) modifier? I couldn’t find it in any Regex references, is it specific to KM?

It’s the line endings. If I filter the variable to Unix line endings first, everything runs fine.
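A Python sketch of the same idea, assuming the Excel text uses CR (classic Mac) line breaks. The to_unix_line_endings helper is hypothetical and only mimics the effect of a Unix line endings filter. With CR breaks, the \n\t\t lookahead can never succeed, so nothing matches until the text is converted:

```python
import re

# The simplified pattern from the thread.
PATTERN = r"(?sm)^\t\t.*?(?=\n\t\t)"


def to_unix_line_endings(text):
    """Hypothetical helper: turn CRLF and lone CR breaks into LF breaks."""
    # Replace CRLF first so it becomes a single LF, then any remaining CR.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical Excel-style text with CR line breaks (an assumption).
excel_text = "Lesson\r\t\tPara one\r\t\tPara two\r\t\tEnd\r"

print(re.findall(PATTERN, excel_text))  # []
print(re.findall(PATTERN, to_unix_line_endings(excel_text)))
```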

I couldn't find it in any Regex references, is it specific to KM?

See the reference in KM’s Help menu (“ICU Regular Expression Reference”).

No, it’s rather common, though there are differences between regex flavors.
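For example, Python's re accepts the same inline syntax at the start of a pattern, and it behaves like passing the DOTALL and MULTILINE flags separately. A small check on a made-up sample:

```python
import re

# Hypothetical sample: "Para one" spans two lines.
SAMPLE = "Lesson\t\tTitle\n\t\tPara one\nstill para one\n\t\tPara two\n\t\tEnd\n"

# Inline modifiers at the start of the pattern...
inline = re.findall(r"(?sm)^\t\t.*?(?=\n\t\t)", SAMPLE)
# ...versus the same options passed as flags:
# re.S (DOTALL): '.' also matches newlines
# re.M (MULTILINE): '^' matches at the start of every line
flags = re.findall(r"^\t\t.*?(?=\n\t\t)", SAMPLE, flags=re.S | re.M)

print(inline == flags)  # True
```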

Tom, you just saved my life!

I would have never figured that out in a million years…

You have no idea how relieved I am… I spent the last couple of hours exporting the content directly from Google Drive as CSV and using regex to convert it to tab-delimited text (to make it work with my huge daisy-chained macro).

After all that work it was STILL crashing.

But adding that Unix line ending filter completely fixed it.

What is the best way to thank you for your generous time?

I’ve learned something, too :blush:
