RegEx Matching Large Blocks of Text

MartinPacker · January 22, 2021, 3:40pm

I'm actually having to think critically about whether to use RegEx in certain parts of md2pptx. Right now, on a 2015 15" MBP, it can only generate 40 PowerPoint slides a second.

(Those parts are parsing Markdown image references and clickable image references - so they're quite complex.)

ComplexPoint · January 22, 2021, 4:49pm

I forget whether you are using Python or JS there, but in either case, for full parsing (string pattern matching + declarative nested/recursive patterns), it can be interesting to experiment with parser combinators, which are fairly readily written – or drawn from a library, in some cases – in any language with naturally first-class functions.

You get the power of regex without the constraints, all within the syntax of a single language.

( Not necessarily a solution if the core problem is really performance, but very good for readability and refactoring )
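To make the parser-combinator idea concrete, here is a minimal Python sketch for plain and clickable Markdown image references. Every name in it (`literal`, `until`, `seq`, `either`, `mapped`, `image_ref`) is hypothetical and written for this illustration; none of it comes from md2pptx or from any particular library, and it handles only the simplest form of the syntax.

```python
# A parser is a function (text, pos) -> (value, new_pos), or None on failure.

def literal(s):
    """Match the exact string `s` at the current position."""
    def parse(text, pos):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse

def until(stop):
    """Consume characters up to, but not including, `stop`."""
    def parse(text, pos):
        end = text.find(stop, pos)
        if end == -1:
            return None
        return text[pos:end], end
    return parse

def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse

def either(*parsers):
    """Return the result of the first parser that succeeds."""
    def parse(text, pos):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse

def mapped(parser, fn):
    """Transform a successful parse value with `fn`."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, pos = result
        return fn(value), pos
    return parse

# ![alt](src)
image = mapped(
    seq(literal("!["), until("]"), literal("]("), until(")"), literal(")")),
    lambda v: {"alt": v[1], "src": v[3], "href": None},
)

# [![alt](src)](href) -- the nested case reuses `image` directly
clickable = mapped(
    seq(literal("["), image, literal("]("), until(")"), literal(")")),
    lambda v: {**v[1], "href": v[3]},
)

image_ref = either(clickable, image)
```

Calling `image_ref(line, 0)` returns a dict and the end position, or `None` when the line does not start with an image reference. The nesting is expressed by passing `image` into `clickable`, which is the part that tends to get awkward in a single regex.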

JMichaelTX · January 22, 2021, 6:29pm

If you want help with this, please create a new Topic and provide us with the specific details, including real-world source text and real-world expected results.

MartinPacker · January 23, 2021, 8:54am

Thanks @ccstone and @JMichaelTX. I’m good for the moment. The joke was that 40 slides a second ought to be good enough for anybody on old kit like that. (My productivity and consistency gain from writing and using md2pptx has been enormous anyway, so I’m good.)

@ccstone my use case is Python.

As an experiment, I tried to short-circuit the RegExes in my most complex (Markdown image reference) case by using Python’s short-circuiting “and” with a hunt for “![”: the cheap substring test runs first, and the RegEx only runs if it succeeds. The idea is that an image reference always contains “![”, whether clickable or not. It made no difference to the speed on an image-heavy 55-slide presentation: still 1.3 seconds or so. So I conclude RegEx is not too heavy in this (anecdotal) case.
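For anyone wanting to repeat the experiment, the pre-check looks roughly like the sketch below. The pattern `IMAGE_RE` and the function `find_image` are assumptions made up for illustration; they are not the actual md2pptx code, and the pattern covers only the simplest image syntax.

```python
import re

# Hypothetical pattern: a Markdown image reference, optionally wrapped
# in a link to make it clickable. Not the pattern md2pptx really uses.
IMAGE_RE = re.compile(
    r"\[?!\[(?P<alt>[^\]]*)\]\((?P<src>[^)\s]+)\)"
    r"(?:\]\((?P<href>[^)\s]+)\))?"
)

def find_image(line):
    """Return a match object for an image reference in `line`, else None."""
    # `and` short-circuits: the regex search only runs when the cheap
    # substring test has already found "![" somewhere in the line.
    return ("![" in line and IMAGE_RE.search(line)) or None
```

Timing `find_image` against a bare `IMAGE_RE.search` with the standard `timeit` module on real slide source is probably the most direct way to see whether the pre-check pays for itself on a given deck.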
