RegEx Matching Large Blocks of Text

I'm actually having to think critically about whether to use RegEx in certain parts of md2pptx. Right now, on a 2015 15" MBP, it can only generate 40 PowerPoint slides a second. :slight_smile:

(Those parts are parsing Markdown image references and clickable image references - so they're quite complex.)


I forget whether you are using Python or JS there, but in either case, for full parsing (string pattern matching plus declarative nested/recursive patterns), it can be interesting to experiment with parser combinators, which are fairly readily written (or, in some cases, drawn from a library) in any language with naturally first-class functions.

You get the power of regex without the constraints, all within the syntax of a single language.

(Not necessarily a solution if the core problem is really performance, but very good for readability and refactoring.)
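For anyone curious what that looks like in practice, here is a minimal sketch in Python, assuming a simplified Markdown image syntax (`![alt](src)`, optionally wrapped as `[![alt](src)](href)`). All the names (`lit`, `until`, `seq`, `alt`, `fmap`, `image_ref`) are made up for illustration; none of this is taken from md2pptx or from any particular library.

```python
from typing import Any, Callable, Optional, Tuple

# A parser takes (text, pos) and returns (value, new_pos), or None on failure.
Parser = Callable[[str, int], Optional[Tuple[Any, int]]]


def lit(s: str) -> Parser:
    """Match the exact string s at the current position."""
    def parse(text: str, pos: int):
        if text.startswith(s, pos):
            return s, pos + len(s)
        return None
    return parse


def until(stop: str) -> Parser:
    """Consume text up to, but not including, the next occurrence of stop."""
    def parse(text: str, pos: int):
        end = text.find(stop, pos)
        if end == -1:
            return None
        return text[pos:end], end
    return parse


def seq(*parsers: Parser) -> Parser:
    """Run parsers one after another; fail if any of them fails."""
    def parse(text: str, pos: int):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def alt(*parsers: Parser) -> Parser:
    """Try each parser in order and return the first success."""
    def parse(text: str, pos: int):
        for p in parsers:
            result = p(text, pos)
            if result is not None:
                return result
        return None
    return parse


def fmap(f: Callable[[Any], Any], parser: Parser) -> Parser:
    """Transform the value produced by a successful parse."""
    def parse(text: str, pos: int):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return f(value), new_pos
    return parse


# ![alt](src)
image = fmap(
    lambda v: {"alt": v[1], "src": v[3]},
    seq(lit("!["), until("]"), lit("]("), until(")"), lit(")")),
)

# [![alt](src)](href) -- the image parser is reused inside the link parser.
clickable_image = fmap(
    lambda v: {**v[1], "href": v[3]},
    seq(lit("["), image, lit("]("), until(")"), lit(")")),
)

image_ref = alt(clickable_image, image)
```

Calling `image_ref("![logo](logo.png)", 0)` returns a dict with `alt` and `src` plus the position just past the match; the clickable form adds `href`. The nesting is expressed by reusing `image` inside `clickable_image`, which is the part that tends to get awkward in a single regex.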


If you want help with this, please create a new Topic and provide us with the specific details, including real-world source text and real-world expected results.


Thanks @ccstone and @JMichaelTX. I'm good for the moment. The joke was that 40 slides a second ought to be good enough for anybody on old kit like that. :slight_smile: (My productivity and consistency gain from writing and using md2pptx has been enormous anyway, so I'm good.)

@ccstone my use case is Python.

As an experiment, I tried to short-circuit the RegExes in my most complex (Markdown image reference) case by using Python's short-circuit and-ing with a hunt for "![". The idea is that an image reference always contains "![", whether clickable or not. It made no difference to the speed on an image-heavy 55-slide presentation: still 1.3 seconds or so. So I conclude RegEx is not too heavy in this (anecdotal) case.
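Roughly, the experiment has the shape below. This is only a sketch: `IMAGE_RE` is a stand-in pattern made up for illustration (the real md2pptx expressions are more involved), and the function names and sample lines are hypothetical too.

```python
import re
import timeit

# Stand-in pattern for a Markdown image reference: ![alt](src)
# (an assumption for this sketch, not the regex md2pptx actually uses).
IMAGE_RE = re.compile(r"!\[([^\]]*)\]\(([^)]*)\)")


def find_image(line):
    """Always run the regex."""
    return IMAGE_RE.search(line)


def find_image_guarded(line):
    """Only run the regex when the cheap substring test passes.

    `and` short-circuits, so IMAGE_RE.search is skipped when "![" is
    absent; the trailing `or None` turns the resulting False into None
    so both functions return the same kind of value.
    """
    return ("![" in line and IMAGE_RE.search(line)) or None


if __name__ == "__main__":
    # Hypothetical sample: mostly plain lines with the occasional image.
    lines = ["Some ordinary bullet text"] * 9 + ["![chart](images/chart.png)"]
    for fn in (find_image, find_image_guarded):
        elapsed = timeit.timeit(lambda: [fn(l) for l in lines], number=10_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Timing the two variants in isolation like this only shows whether the guard matters for the matching step; it says nothing about where the rest of the 1.3 seconds goes.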
