Need Help With a RegEx to Cleanup PDF Annotations; and Learning RegEx

There's a lot to be said, I think, for keeping to one general language (JavaScript or AppleScript for example), and using some simple primitives like:

  • splitOn
  • startsWith or isPrefixOf
  • endsWith or isSuffixOf

That's often enough for real problems, and makes much more productive use of human time : -)
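
As a rough illustration of the point above, here is a minimal JavaScript sketch of those primitives at work without any regular expression. The sample annotation line and the `splitOn` helper are assumptions made up for the example, not anything from the original question:

```javascript
// Hypothetical sample line; not taken from the thread.
const line = "Highlight, page 12: regex is optional";

// splitOn (hypothetical helper name): break text at a literal delimiter.
const splitOn = (delimiter, text) => text.split(delimiter);

// startsWith and endsWith are built-in String methods.
const isHighlight = line.startsWith("Highlight");
const endsWithOptional = line.endsWith("optional");

// Separate the label from the body at the first ": ".
const parts = splitOn(": ", line);
const label = parts[0]; // "Highlight, page 12"
const body = parts[1];  // "regex is optional"
```

For many clean-up jobs, a split on a literal delimiter plus a prefix or suffix test is all that is needed.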

JS has the advantage that if you ever genuinely do need a visit to the medicine cabinet for a short regular expression, a regular expression engine is built in.

From AppleScript you need to juggle an unholy mixture of three different syntaxes:

  1. The ObjC foreign function interface syntax,
  2. AppleScript syntax,
  3. and regular expression syntax.

Would you know of a library of JavaScript regex? Thank you.

I think this is true to a certain extent with learning all languages. The key is to take notes with links to the reference. In the note create your own example of text to process.

It is also useful to have a cheat sheet, just like we often do with apps that have a large number of keyboard shortcuts. Or, even like KM which has so many Actions very few could remember them all. So we have a KM Wiki.

In the case of RegEx, I have found this cheat sheet to be very useful:
Regex Accelerated Course and Cheat Sheet

Well, this is much like learning any language -- it is somewhat of a "chicken and egg" problem.
I do agree we tend to learn and retain knowledge the most when solving a real problem.
OTOH, you do need to become familiar with the tools in your tool box, to at least know of them even if you don't fully remember how to use them.

I have found when learning a new language, that it is very beneficial to reread the documentation several times as I learn more. After I have used the language for a while and developed a reasonable understanding of its terms, I learn a lot more when I read the manual a second/third time.

I have found that the more I use Regex, the more uses I find for it, and of course I am able to construct more complicated patterns. That is one reason that I like to help other users here in the KM forum that have problems dealing with text manipulation. It helps keep me sharp, and even expands my knowledge sometimes.

Finally I'll say this: Regex is a great example of the adage "use it or lose it".


Tell me more ? ( Not quite sure that I have caught up with what a library of Regex might look like ).

A regex engine is part of the standard JS interpreter, and this is a good starting point for documentation:

Regular expressions - JavaScript | MDN

JS strings also have some built-in methods like:

  • haystack.includes(needle) -> true | false
  • haystack.startsWith(needle) -> true | false
  • haystack.endsWith(needle) -> true | false

as well as:

  • haystack.match(regex) -> Array of matches | null
  • haystack.matchAll(regex) -> Iterator of match arrays (the regex needs the g flag)

and

  • haystack.search(regex) -> Zero-based index or -1 (findIndex is an Array method, so a test function needs [...haystack].findIndex first)
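
A minimal sketch of these built-in methods in use, on a made-up sample string (an assumption for illustration only). Note that `match` returns an array or `null` rather than a plain string, `matchAll` returns an iterator, and `findIndex` belongs to arrays, so the string is spread into characters first:

```javascript
// Hypothetical sample text; not from the original question.
const haystack = "page 3: note; page 14: highlight";

// Plain substring tests, no regex needed.
const hasNote = haystack.includes("note");              // true
const startsWithPage = haystack.startsWith("page");     // true
const endsWithHighlight = haystack.endsWith("highlight"); // true

// match without the g flag: first match plus its capture groups, or null.
const first = haystack.match(/page (\d+)/);  // first[0] is "page 3", first[1] is "3"
const none = haystack.match(/xyz/);          // null

// matchAll needs the g flag and returns an iterator of match arrays.
const pages = [...haystack.matchAll(/page (\d+)/g)].map((m) => m[1]); // ["3", "14"]

// search takes a regex and returns a zero-based index or -1.
const firstDigitAt = haystack.search(/\d/);  // 5

// findIndex is an Array method, so spread the string into characters first.
const sameIndex = [...haystack].findIndex((ch) => ch >= "0" && ch <= "9"); // 5
```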

The Regex engine used by JavaScript is significantly different from PCRE and that used by KM.
And, the JavaScript syntax to use Regex is quite different.
I don't see any advantage in using JavaScript just to execute a Regex search or replace function, when you have both of these as easy-to-use KM Actions.
Going down this path will just make Regex more confusing for you at this stage.
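
To make the difference concrete, here is a small sketch of two spots where a pattern written for a PCRE-style engine behaves differently in JavaScript. The `compiles` helper is a hypothetical name introduced for the example, and the two cases are illustrative, not a complete list:

```javascript
// compiles (hypothetical helper): does this pattern source compile in JavaScript?
const compiles = (source) => {
  try {
    new RegExp(source);
    return true;
  } catch (e) {
    return false;
  }
};

// Possessive quantifiers such as \d++ are valid in PCRE,
// but JavaScript rejects them with a SyntaxError.
const possessiveOk = compiles("\\d++"); // false

// PCRE's \A anchors at the start of the subject. JavaScript has no \A anchor;
// without the u flag it is read as a literal "A".
const anchored = /\Aabc/.test("abc");   // false
const literalA = /\Aabc/.test("Aabc");  // true

// In JavaScript, ^ without the m flag anchors at the start of input.
const caret = /^abc/.test("abc");       // true
```

A pattern copied from a KM action may therefore need small adjustments before it works in a JavaScript script, and vice versa.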


The point of JS is to avoid Regex (like the plague), (or like an addictive and time-wasting substance) most of the time :slight_smile:

Two days is no joke ...


thank you

In the case of RegEx, I have found this cheat sheet to be very useful:
Regex Accelerated Course and Cheat Sheet

EXCELLENT site !! thank you !

Hey @ronald,

Yes, but...

:sunglasses:

Then you'll bang your head against the wall instead of getting your problem solved -- unless you really decide you're going to learn something from the exercise.

I like the cheat-sheet @JMichaelTX mentions, but I also like this one because of the filter.

https://www.debuggex.com/cheatsheet/regex/pcre

However my most used reference is my own:

I have a local copy of this bound to an AppleScript with a keyboard shortcut in BBEdit's Script-Menu.

--------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2008/01/05 05:34
# dMod: 2018/05/30 19:09
# Appl: TextWrangler
# Task: Open BBEdit/TextWrangler RegEx Cheat Sheet
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @System_Events, @TextWrangler, @RegEx, @Cheat_Sheet
--------------------------------------------------------

set preferredWindowBounds to {351, 45, 1393, 1196}
set bbeditCheatSheetPath to "~/Documents/BBEdit Documents/Documentation/RegEx Cheat Sheet.txt"

# Expand the $HOME-based (tilde) path above.
tell application "System Events" to ¬
   set bbeditCheatSheetPath to POSIX path of disk item bbeditCheatSheetPath

tell application "BBEdit"
   set bbApp to a reference to it
   
   tell document "RegEx Cheat Sheet.txt"
      
      if it exists then
         if index of its window ≠ 1 then
            set index of its window to 1
         end if
      else
         tell bbApp to open bbeditCheatSheetPath opening in new_window
      end if
      
      if bounds of its window ≠ preferredWindowBounds then
         set bounds of its window to preferredWindowBounds
      end if
      
   end tell
end tell

--------------------------------------------------------

Since I always compose regular expressions in BBEdit, my reference is only a keystroke away. Of course nothing is stopping you from making a macro to open one (or more) of the other references in your web browser.

I've also used this site quite extensively over the years:

And don't forget our own wiki page on regular expressions.

Learning things haphazardly on the Internet has its place, but for serious study you need a reference book or two (or ten).

I started my regex odyssey on the Internet back when there wasn't much content, and it was hard to find – and as I stated earlier I pulled my hair out a lot, and my wall got pretty bloody...

When finally I got serious I bought some books.

I have all of these plus several tomes on Perl:

I'm interested in these two, but I haven't had my hands on them yet.

Note -- “Mastering Regular Expressions” by Jeffrey Friedl has its merits but is very technical and not really for beginners.

Reading about regular expressions helps develop vocabulary, but it is only through repeatedly working with them that one develops proficiency.

Learning regular expressions is difficult for most people, but the rewards last a lifetime.

I use them every day either directly or in macros I've built that employ them.

-Chris


IMO, learning RegEx is no more difficult than learning most programming languages, like JavaScript. In fact, I'd argue that you can develop solutions to common problems with a modest amount of effort. And a real productivity benefit is that the KM Actions for "Search" and "Replace" allow you to easily use RegEx with KM Variables.

Whether or not you want a physical book most likely depends on each person's learning style and preferences. And I don't find learning on the Internet to be necessarily "haphazard". There are a number of excellent RegEx tutorials that are very methodical -- as good as most books. There are even some online training courses that are much better than just a book. I have used a number of StackSkills.com courses, and find them excellent. I have not used this course, but here are some examples:
The Complete Regular Expressions Course with Exercises
Go from Zero to Expert in Building Regular Expressions

BTW, I have NEVER paid the full advertised price for StackSkills courses. If you do some searching you should find some major discounts, like 80-90% discount.

Actually, I find most books to be good as references, but not necessarily for learning, unless the book is specifically designed as a text book.

If you don't like the approach of the RegEx tutorial I suggested, Regular Expressions Quick Start, then do a search on "regular expression tutorial" and you will find a number of choices.

Regex Resources

A Google search on "regular expression training" found these web sites, and many more.
If none of the below appeal to you, then do the search to find one that does.


I will register for the course today.

I like their style
I see Euro 30 (=$30). Where did you see $10?
Isn't it Windows only?
thanks

Hi @ronald, you’ve clicked on the wrong link. @JMichaelTX was directing you to a Udemy course but somehow you’ve pursued the path to purchase the RegexBuddy software. (It is indeed Windows-only.)

I suggest you try the link again https://www.udemy.com/course/the-complete-regular-expressions-course-with-exercises-for-beginners/

BTW, when I used Windows (a while ago now) I bought copies of both RegexBuddy and RegexMagic and found them really useful. On the Mac I use RegExRX all the time in preference to web-based sites. It’s not so great for learning regex but it works for me... YMMV.


@JMichaelTX @tiffle

@tiffle thank you for clarifying the conclusion.

I followed your suggestions and bought the course. So thanks a lot to both of you !!


--- and I just bought RegExRX

RegExRX isn't perfect, but I've been a fairly happy user for the better part of a decade.

Aside from BBEdit and Regex101.com it's my go-to tool for figuring out complex regular expressions.

-Chris


thank you. I bought it and will have a look when I finish the excellent Udemy regex course. The teacher's strong Indian accent makes it even more fun.



So? How is that relevant?

Take Care,
Chris

(Keyboard Maestro Moderator)