How Do I Extract Pieces of Text from a Larger Block Using Regex with a Loop

Morning @Nige_S ,
I only want the main 'headings' that start with a number.
In the following block I would like to end up with zz_cb01 = Introduction, and zz_cb02 = Supplies Needed.
The numbering is not relevant at all.
There may be 10 'numbered' main points that I want into zz_cbxx variables, starting with 01. Or there may be 30 'numbered' main points.

And yes, @griffman's solution works perfectly.

I wouldn't mind getting my head into a little more regex within the 'for each' action but I just can't get it.
I know a few on the forum are not in favor of any regex that is longer than the first digit on your thumb.....

I'm continuing to try the 'for each' approach, with either "the lines in" or "the substring" with regex, all to no avail.

Cheers

1. Introduction
	•	Importance of a Clean Coffee Maker: Explain how regular cleaning affects coffee flavor and machine longevity. Mention the buildup of oils, coffee grounds, and mineral deposits that can alter taste.
	•	How Often to Clean: Recommend a cleaning frequency, such as a quick clean after each use and a deep clean every 1-3 months, depending on usage and water hardness.
2. Supplies Needed
	•	Essentials: List vinegar or descaling solution, mild dish soap, water, a soft brush, a clean cloth, and a small brush or toothpick.
	•	Eco-Friendly Alternatives: Suggest lemon juice or baking soda for those who prefer natural cleaning methods.

Morning. Well, upon getting another block of text, the solution does not work.
It works on the following, giving me:
zz_cb01 = Introduction
zz_cb02 = Supplies Needed
zz_cb03 = Step-by-Step Cleaning Process

1. Introduction
	•	Importance of a Clean Coffee Maker: Explain how regular cleaning affects coffee flavor and machine longevity. Mention the buildup of oils, coffee grounds, and mineral deposits that can alter taste.
	•	How Often to Clean: Recommend a cleaning frequency, such as a quick clean after each use and a deep clean every 1-3 months, depending on usage and water hardness.
2. Supplies Needed
	•	Essentials: List vinegar or descaling solution, mild dish soap, water, a soft brush, a clean cloth, and a small brush or toothpick.
	•	Eco-Friendly Alternatives: Suggest lemon juice or baking soda for those who prefer natural cleaning methods.
3. Step-by-Step Cleaning Process
	•	Step 1: Empty and Prepare: Remove coffee grounds and rinse removable parts like the carafe and filter basket.
	•	Step 2: Clean Removable Parts: Wash the carafe, filter basket, and any removable parts with dish soap. Rinse thoroughly to avoid any soap residue.
	•	Step 3: Clean the Reservoir:

It does not work on the following:

1. Introduction to the Role of a Director of Residences
   • Definition: Explain what a Director of Residences is and how they fit within a high-net-worth household.
   • Value to High-Net-Worth Families: Highlight the role's importance in ensuring a seamless lifestyle, managing multiple properties, and handling complex logistics.
   • EEAT Signals: Emphasize the expertise, reliability, and high-level management skills that distinguish this position.
2. Key Responsibilities of a Director of Residences
   • Property Management: Oversight of multiple residences, including maintenance, repairs, and vendor management.
   • Staff Management: Supervision of household staff, ensuring standards, training, and scheduling.

I don't see why it wouldn't work. There must be some kind of formatting in the way.

Ah, so "without preceding number/period" was a typo. Gotcha.

Much of KM -- indeed, programming in general -- is taking what you already know/have and tweaking it to fit a particular situation.

There's already a bunch of ways above that will put each line of some text into individual variables. The main difference between the original problem and this is that you only want to process lines that start with one or more numbers then a period.

Two ways of doing that are

  1. For each line in the text -- if it starts with numbers and a period then create a new variable and store the line in it. If it doesn't, ignore it

  2. For each line in the text that starts with some numbers then a period -- create a new variable and store the line in it

The first builds a Collection containing every line and then works through that Collection looking for matches to process. The second builds a Collection of lines that match and then works through that Collection processing them.
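For anyone who thinks better in code than in macro actions, the two approaches can be sketched outside Keyboard Maestro. This is only a rough Python analogy, not what the macros themselves contain: the function names and the shortened sample text are made up for illustration, and Python's `re` module is standing in for KM's regex engine.

```python
import re

# Shortened, hypothetical stand-in for the text block in the thread.
SAMPLE = "\n".join([
    "1. Introduction",
    "\t•\tImportance of a Clean Coffee Maker: ...",
    "2. Supplies Needed",
    "\t•\tEssentials: ...",
])

# One or more digits, a period, then a whitespace character at the line start.
HEADING = re.compile(r"^\d+\.\s")


def filter_inside_loop(text):
    """Approach 1: visit every line and keep only the ones that match."""
    kept = []
    for line in text.splitlines():
        if HEADING.match(line):
            kept.append(line)
    return kept


def collect_matches_first(text):
    """Approach 2: build the list of matching lines up front, then use it."""
    return re.findall(r"(?m)^\d+\.\s.*$", text)


print(filter_inside_loop(SAMPLE))
print(collect_matches_first(SAMPLE))
```

Both functions return the same two heading lines for this sample. The difference is only where the filtering happens: inside the loop, or before the loop starts.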

Two approaches to the same problem, both "correct", either can be more "correct" than the other depending on the situation.

Here's the first version in action:

Extract Numeric Lines v3.kmmacros (9.7 KB)


And here's the second:

Extract Numeric Lines v4.kmmacros (8.1 KB)


You can test by changing the number in the final "Display Text" action. And, as before, I've used local variables for testing -- enable the disabled action and disable/delete the action before it to use globals.

There's nothing special about the regex. What is special is the Collection you are creating for the "For Each" action to then work through. Using the above as an example, the action says (ignoring the "item" variable for now):

"Go through the variable Local_theText and, with ^ and $ matching start- and end-of-lines instead of the start and end of the entire string (the (?m), each time you match a substring with some numbers, a period, and a space at the start of the line (^\d+\.\s) followed by as many any characters (excluding new lines) as possible (.*) to the end of the line ($), add that match as a new item to the Collection.

So your Collection starts empty:

{}

...and your regex scans the text from the start, finds the first match, adds a copy of that match to the Collection:

{"1. Introduction"}

(I've used quotes to show that this is an element in the Collection.)

The regex then continues to the next match and adds that:

{"1. Introduction","2. Supplies Needed"}

...and you now have a Collection containing two elements, each a substring of your original text that matches your pattern.

And so on, through the text, until your Collection is

{"1. Introduction","2. Supplies Needed","3. Step-by-Step Cleaning Process","4. Deep Cleaning Tips","5. Additional Maintenance Tips","6. Troubleshooting Common Issues","7. Benefits of Regular Cleaning","8. Conclusion"}
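The same pattern can be tried outside KM to see what the (?m) is doing. A minimal sketch in Python, on the assumption that Python's `re` treats this particular pattern the same way KM's engine does; the sample text is shortened and the variable names are only illustrative:

```python
import re

# Shortened stand-in for Local_theText (bullet text trimmed for the example).
the_text = "\n".join([
    "1. Introduction",
    "\t•\tHow Often to Clean: ...",
    "2. Supplies Needed",
    "\t•\tEssentials: ...",
])

pattern = r"^\d+\.\s.*$"

# Without (?m), ^ and $ anchor to the whole string, so nothing matches
# in this multi-line text.
whole_string = re.findall(pattern, the_text)

# With (?m), ^ and $ anchor to each line, so every numbered line is
# collected in the order it appears.
per_line = re.findall("(?m)" + pattern, the_text)

print(whole_string)  # []
print(per_line)      # ['1. Introduction', '2. Supplies Needed']
```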

Note that the Collection is "ordered" -- the first item you add is the first item in the Collection, the second the second, etc. It is not sorted, even if the numbers in your example text make it look that way!
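A quick way to convince yourself of the "ordered, not sorted" point is to feed the pattern some deliberately shuffled headings. This is a Python sketch with a made-up input, assuming Python's `re.findall` is a fair stand-in for how the Collection gets filled:

```python
import re

# Hypothetical input with the numbered headings deliberately out of order.
shuffled = "\n".join([
    "3. Step-by-Step Cleaning Process",
    "1. Introduction",
    "2. Supplies Needed",
])

# Matches come back in the order they occur in the text, not in numeric order.
collection = re.findall(r"(?m)^\d+\.\s.*$", shuffled)

print(collection)
# ['3. Step-by-Step Cleaning Process', '1. Introduction', '2. Supplies Needed']
```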

All that is mere preparation for the main event -- the magic of "For Each". The action now works through the Collection -- one element at a time, from first to last -- doing whatever is in its "execute" block. So in our example (again, blanking out irrelevant sections):

...the variable Local_line is set to the first item of the Collection, "1. Introduction" (without the quotes!). The "Set Variable" action evaluates the token in the field for the variable's name -- 01 for the first time through the loop -- and appends that to our literal text "Local_zz_cb" to create the complete name: Local_zz_cb01, then sets the value of that newly-created variable to the evaluation of the text token %Variable%Local_line%. We then add 1 to Local_i.

End result -- we have a variable Local_zz_cb01 with the value "1. Introduction", and Local_i is 2.

And we go back to the top of the loop. Local_line is set to the next item of the Collection, "2. Supplies Needed", and it all happens again but with our new values for Local_line and Local_i.

And so on, until all items in the Collection have been processed.
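If it helps to see the loop written out, here is a rough Python equivalent. It is an analogy only: a dictionary stands in for KM's dynamically named variables, and the zero-padded counter is built with an f-string rather than a KM token.

```python
# Stand-in for the Collection built by the regex (first three items only).
collection = [
    "1. Introduction",
    "2. Supplies Needed",
    "3. Step-by-Step Cleaning Process",
]

variables = {}  # plays the role of KM's variable store
local_i = 1     # counter, like Local_i in the macro

for local_line in collection:
    # Build the name from the literal prefix plus a two-digit counter: 01, 02, ...
    name = f"Local_zz_cb{local_i:02d}"
    variables[name] = local_line
    local_i += 1

print(variables["Local_zz_cb01"])  # 1. Introduction
print(local_i)                     # 4
```

After the loop, the counter is one more than the number of items processed, which matches the "Local_i is 2" state described after the first pass.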

OK -- that's really long-winded (luckily, KM is a lot faster at doing than I am at explaining) but hopefully it makes some sense. If I've confused more than helped then please, please, say -- it'll be a problem with my explaining and not your understanding!

When I paste your new text into my demo macro, it works perfectly. How are you getting the text into the macro in the "real" version?

-rob.

Wow @Nige_S, you are always so generous with your expertise and time, man. Woof, thank you.

I used version 1 and it is working as desired. I'll make some small edits later for my uses.
AND, I really benefited from your detailed explanation about version 2 and the collection approach.

EDIT: already =) What if I did want the initial number/s, period, and space taken off the start of the 'result' in version 1? I know I could cobble something together, but it would be me hacking...

Can't thank you enough.....
I'm sure I'll be checking back on some of this.... =)
All good,
Cheers

Hey @griffman , I'm sure it's me...
Here's your original pretty much, except I have added two text blocks at the top to test with and the display of the variables at the end.
When I toggle the first block on, it works and populates the variables that I am looking for.
When I toggle the first block off and the second on, it does not display as selected.

Keyboard Maestro Actions.kmactions (10 KB)

I think I got it, removing the number/s, period and space at the top.
I added the red action. Seems to work for me.
thanx

Yep, that's great. I'd just add that you've already got a version of the match you need in the "If..." so you could have used that, suitably expanded: ^\d+\.\s(.+)
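As a rough illustration of that expanded pattern, here is a Python sketch (Python rather than a KM action, with hypothetical variable names) where the parentheses capture everything after the number, period, and space:

```python
import re

# Sample lines taken from the second text block in the thread.
lines = [
    "1. Introduction to the Role of a Director of Residences",
    "   • Definition: Explain what a Director of Residences is ...",
    "2. Key Responsibilities of a Director of Residences",
]

# Same test as the "If", plus a capture group for the text after the prefix.
pattern = re.compile(r"^\d+\.\s(.+)")

titles = []
for line in lines:
    match = pattern.match(line)
    if match:
        # group(1) is the heading without its leading "1. " style prefix.
        titles.append(match.group(1))

print(titles)
```

The one pattern does both jobs here: it decides whether the line is a heading and hands back the trimmed text, so no second clean-up step is needed.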

You could do similarly with the other version.


hey bud, as an aside, and I can start a new post, but....

I looked for "how do I post a macro" and didn't find what I was looking for.

How do you get the sideways caret to show / hide the images of the macros that you post up to the forum?

cheers

Click on the "cogwheel", top-right corner of the "post editing" box, and select "Hide Details":


If you prefer to work directly in Markdown, wrap whatever you want to hide in details tags:

[details="Summary"]
This text will be hidden
[/details]

...gives you:

Summary

This text will be hidden

Many more Forum formatting tricks in @_jims's excellent Entering and Enhancing Forum Posts post.


My versions are courtesy of the excellent Automating Sharing Macros or Actions to the Forum by @ccstone. Assign it to a hot key, select the macro to share, hit the hot key, and it winds up on your clipboard, ready to paste into a new/existing topic.

-rob.
