How Do I Extract Pieces of Text from a Larger Block Using Regex with a Loop

Try this for a starting point:

  • Your Alt Texts are every line that begins with "- Alt Text: "
  • Your Title Attribs are every line that begins with "- Title Attribute: "
  • Your Descriptions are every line that begins with "- Description: "

You can use a "For Each: substrings" for each of those, then use the same method as the previous macro to get them into individual variables.

Have a go and see how you get on.
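
If it helps to see the "every line that begins with..." idea outside Keyboard Maestro, here is a rough JavaScript sketch of it. The sample text and the helper name `fieldValues` are made up for illustration, and the real source text may be laid out differently.

```javascript
// Hypothetical sample; the real source text may be laid out differently.
const sampleText = [
    "- Alt Text: Bride putting on earrings",
    "- Title Attribute: Bridal attendant services",
    "- Description: A bridal attendant helping the bride.",
    "---",
    "- Alt Text: Bride with wedding dress",
    "- Title Attribute: Bridal attendant services",
    "- Description: A bridal attendant assisting the bride."
].join("\n");

// Every value found on a line that begins with "- <label>: ".
const fieldValues = (label, text) =>
    text.split(/\r\n|\n|\r/u)
        .filter(line => line.startsWith(`- ${label}: `))
        .map(line => line.slice(`- ${label}: `.length));

const altTexts = fieldValues("Alt Text", sampleText);
const titleAttribs = fieldValues("Title Attribute", sampleText);
const descriptions = fieldValues("Description", sampleText);

console.log(altTexts);
console.log(titleAttribs);
console.log(descriptions);
```

Each of the three lists plays the part of one "For Each" collection: one pass per label, with the values coming out in the order they appear.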

the for each thing is still hanging me up... I just have a wall about it, I guess

Then take it back a step...

Doing as described above will only work if every "record" has a line for every "field". If they might not you might be better off going line-by-line, testing and assigning as you go.

That's also easier to get your head round than a "Collection of substrings" using some weird regex. So give it a go using "For Each: Lines in Collection".
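
A minimal sketch of that line-by-line idea in JavaScript, for anyone who finds code easier to follow than actions. The sample data, the "---" record separator and the names `wantedLabels` and `records` are assumptions for illustration only.

```javascript
// Hypothetical sample in which the second record has no Title Attribute line.
const sampleRecords = [
    "- Alt Text: First alt text",
    "- Title Attribute: First title",
    "- Description: First description",
    "---",
    "- Alt Text: Second alt text",
    "- Description: Second description"
].join("\n");

const wantedLabels = ["Alt Text", "Title Attribute", "Description"];

const records = [{}];

for (const line of sampleRecords.split(/\r\n|\n|\r/u)) {
    if (line.trim() === "---") {
        // A separator line starts a new record.
        records.push({});
        continue;
    }
    // Test the line against each label, assigning on the first match.
    const label = wantedLabels.find(l => line.startsWith(`- ${l}: `));
    if (label !== undefined) {
        records[records.length - 1][label] = line.slice(`- ${label}: `.length);
    }
}

console.log(records);
```

Because each line is tested and assigned as it is read, a missing field simply stays unset in its own record and does not shift the values of later records.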

Crude I know.... work in progress..... more tomorrow

Image 01-05 Information Regex K0334 TESTING Macro (v11.0.3)

Image 01-05 Information Regex K0334 TESTING.kmmacros (59 KB)

FWIW a single-variable JSON version, so that you can write things like:

File name of image 4:
	%JSONValue%local_JSON[4].Filename%

Image Title Attribute of image 4:
	%JSONValue%local_JSON[4].Image_Title_Attribute%

JSON Array of Image Details.kmmacros (7.9 KB)
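
For comparison, a small sketch of the same kind of lookup in plain JavaScript. The two records are abbreviated from the JSON in this post, and the names `localJSON` and `images` are illustrative. Note that the token examples above address image 4 as [4], whereas JavaScript arrays count from 0.

```javascript
// Two records abbreviated from the JSON listed in this post.
const localJSON = JSON.stringify([
    {
        Filename: "1 bridal-attendant-01.jpg",
        Image_Title_Attribute: "Bridal Attendant Helping Bride with Earrings"
    },
    {
        Filename: "2 bridal-attendant-02.jpg",
        Image_Title_Attribute: "Bridal Attendant Helping Bride with Wedding Dress"
    }
]);

const images = JSON.parse(localJSON);

// JavaScript arrays count from 0, so image 2 is images[1].
const image2 = images[1];

console.log(image2.Filename);
console.log(image2.Image_Title_Attribute);
```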


Full JSON:
[
  {
    "Filename": "1 bridal-attendant-01.jpg",
    "Alt_Text": "Muffetta Household Staffing Agency bridal attendant assisting during wedding preparation. A bridal attendant helping the bride put on earrings.",
    "Title_Attribute": "Bridal attendant services by Muffetta Household Staffing Agency",
    "Description": "A bridal attendant helping the bride put on earrings. 🏡👰 Muffetta Household Staffing Agency provides professional bridal attendant services for a perfect wedding day. #muffettastaffing",
    "Image_Title_Attribute": "Bridal Attendant Helping Bride with Earrings"
  },
  {
    "Filename": "2 bridal-attendant-02.jpg",
    "Alt_Text": "Muffetta Household Staffing Agency bridal attendant assisting during wedding preparation. A bridal attendant assisting the bride with her wedding dress.",
    "Title_Attribute": "Bridal attendant services by Muffetta Household Staffing Agency",
    "Description": "A bridal attendant assisting the bride with her wedding dress. 🏡👰 Muffetta Household Staffing Agency provides professional bridal attendant services for a perfect wedding day. #muffettastaffing",
    "Image_Title_Attribute": "Bridal Attendant Helping Bride with Wedding Dress"
  },
  {
    "Filename": "3 bridal-attendant-03.jpg",
    "Alt_Text": "Muffetta Household Staffing Agency bridal attendant assisting during wedding preparation. A luxury wedding venue with elegant pink and white decor.",
    "Title_Attribute": "Bridal attendant services by Muffetta Household Staffing Agency",
    "Description": "A luxury wedding venue with elegant pink and white decor. 🏡👰 Muffetta Household Staffing Agency provides professional bridal attendant services for a perfect wedding day. #muffettastaffing",
    "Image_Title_Attribute": "Elegant Wedding Venue with Pink and White Decor"
  },
  {
    "Filename": "4 bridal-attendant-04.jpg",
    "Alt_Text": "Muffetta Household Staffing Agency bridal attendant assisting during wedding preparation. A bridal attendant helping a bride adjust her earrings.",
    "Title_Attribute": "Bridal attendant services by Muffetta Household Staffing Agency",
    "Description": "A bridal attendant helping a bride adjust her earrings. 🏡👰 Muffetta Household Staffing Agency provides professional bridal attendant services for a perfect wedding day. #muffettastaffing",
    "Image_Title_Attribute": "Bridal Attendant Adjusting Bride's Earrings"
  },
  {
    "Filename": "5 bridal-attendant-05.jpg",
    "Alt_Text": "Muffetta Household Staffing Agency bridal attendant assisting during wedding preparation. A wedding coordinator preparing decorations for a luxury wedding event.",
    "Title_Attribute": "Bridal attendant services by Muffetta Household Staffing Agency",
    "Description": "A wedding coordinator preparing decorations for a luxury wedding event. 🏡👰 Muffetta Household Staffing Agency provides professional bridal attendant services for a perfect wedding day. #muffettastaffing",
    "Image_Title_Attribute": "Wedding Coordinator Preparing Luxury Event Decor"
  }
]
JS source:
return (() => {
    "use strict";

    const main = () =>
        parts(/\s*---\s*/u)(
            kmvar.local_Source
        )
            .map(
                x => dictionary(
                    lines(x).map(keyValue)
                )
            );

    // --------------------- GENERIC ---------------------

    const dictionary = kvs =>
        kvs.reduce(
            (a, [k, v], i) =>
                0 < i
                    ? ({ ...a, [k]: v })
                    : a,
            {}
        );

    const keyValue = s => {
        const [k, ...v] = s.split(":");

        return [
            noBullet(k).replace(/ /g, "_"),
            // Rejoin with ":" so that colons inside a value are kept.
            v.join(":").trim()
        ];
    };

    const lines = s =>
        // A list of strings derived from a single string
        // which is delimited by \n or by \r\n or \r.
        0 < s.length
            ? s.split(/\r\n|\n|\r/u)
            : [];

    const noBullet = s =>
        s.startsWith("- ")
            ? s.slice(2)
            : s;

    const parts = delimiter =>
        s => s.split(delimiter).filter(
            x => 0 < x.trim().length
        );


    return JSON.stringify(main(), null, 2);
})();

Again (and you'll hate me for this), back up a step.

Consider:

That could mean

  • I want to create variables Alt_Text_1 through Alt_Text_5, Title_Attributes_1, through Title_Attributes_5, and Descriptions_1 through Descriptions_5, or
  • I want to create variables myVar_1 through myVar_15

Which will make your life easier for present, and especially future, you -- naming each variable for what it is for and having easy access to "related" values, or having one big, undifferentiated set of variables and remembering that each label is in every third variable but you start counting at different positions?

Even if you go with the "one big group" approach (again, you may be constrained by the macro/process the results of this will feed in to), you don't need to literally reference cb_zz01, cb_zz02, etc. See the macro in this post for how you can use the counter to do the numbers for you.

Thank you for your help. Greatly appreciated.
There is a reason for the variable to be named in the current fashion so in this case it is not beneficial to use different names.

Can you check the link to 'this post'? It seems to go to this current post.
I'd like to take a look at it. Although I've been here before with a counter number being added to a variable to create a 'numbered variable'.

Again, I am grateful @Nige_S and @ComplexPoint

@ComplexPoint - woof! wow, very quick, and powerful.
I'll have to ask an LLM what the JSON is doing so I can at least start to understand it.
Thank you.

It links about 10 posts up (works fine for me) -- the "QandAs into Many Vars" macro.

ah, thank you

It isn't really "doing" – just defining the pattern of an output.

return (() => {
    "use strict";

    const 
        sectionDividers = /\s*---\s*/u,
        lineDividers = /\s*\n\s*/u;

    const main = () =>
        // An array of sections,
        kmvar.local_Source.split(sectionDividers)
        .flatMap( 
            // with non-empty sections represented as key:value dictionaries,
            section => 0 < section.length
                ? [
                    // A key:value Object (dictionary) of key-value pairs,
                    Object.fromEntries(
                        // obtained from an array of lines in the section,
                        section.split(lineDividers)
                        
                        // with each line sub-divided into a key:value entry pair.
                        .map(keyValue)
                    )
                ]
                // and empty sections discarded.
                : []
        );

    // --------------------- GENERIC ---------------------

    const keyValue = s => {
        const [k, ...v] = s.split(":")

        return [
            noBullet(k).replaceAll(" ", "_"), 
            v.join(":").trim()
        ];
    };

    const noBullet = s =>
        s.startsWith("- ")
            ? s.slice(2)
            : s;

    return JSON.stringify(main(), null, 2);
})();

JSON Array of Image Details (JS simplified- commented).kmmacros (8.0 KB)
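
To try the same pattern outside Keyboard Maestro (in Node, say), something like the following sketch should work. The `source` text is a hypothetical stand-in for `kmvar.local_Source`, with its layout inferred from the regexes above, so treat it as an assumption and not as the real data.

```javascript
// Hypothetical input; the layout is inferred from the regexes in the script above.
const source = [
    "- Filename: 1 bridal-attendant-01.jpg",
    "- Alt Text: A bridal attendant helping the bride",
    "---",
    "- Filename: 2 bridal-attendant-02.jpg",
    "- Description: Note: values may contain colons"
].join("\n");

const noBullet = s => s.startsWith("- ") ? s.slice(2) : s;

const keyValue = s => {
    const [k, ...v] = s.split(":");

    return [noBullet(k).replaceAll(" ", "_"), v.join(":").trim()];
};

const imageDetails = source
    // Sections are divided by "---",
    .split(/\s*---\s*/u)
    // empty sections are discarded,
    .filter(section => 0 < section.length)
    // and each remaining section becomes a key:value object.
    .map(section => Object.fromEntries(
        section.split(/\s*\n\s*/u).map(keyValue)
    ));

console.log(JSON.stringify(imageDetails, null, 2));
```

The last sample line is there to show why the values are rejoined with ":": a colon inside a value is kept.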

I'm going to argue this one -- it is always beneficial for variables to have meaningful names, and more obvious information is nearly always better. When you get an error in a couple of months' time and need to work out what's missing, which helps more?

  • zz_cb07
  • Alt_Text_3
  • Alt_Text[3]
  • ImageDetails[3].Alt_Text

For the first you'll need to work your way through your macro, maybe do some finger counting. For the rest (single variables, array, and JSON) you can see at a glance that image 3's alt text is the problem.

But that's something to consider for future macros. For this one, treat it just like a printed sheet that you are extracting info from to a bunch of individual Post-Its. Most people would go down the sheet one line at a time, see if the start of the line matched anything they were looking for, and if so they'd write that info on a newly-labelled Post-It.

In pseudo-code:

set my post-it counter to 1
for each line in the text
   if the start of the line matches something I'm looking for
      copy the line to a new post-it labelled with the post-it counter number
      increase the post-it counter by 1
   end if
end for

Now consider the "if..." bit of that. In this case you don't care what it matches, just that it does match -- your "fields" are in the same order throughout, so working line-by-line means they'll be written to your Post-Its in the right order.

So, re-writing the above in a more KM way:

set counter to 1
for each line in text
   -- use enough for a single match AND enough that you understand the meaning at a glance
   if start of line is "- Alt" or "- Title" or "- Description"
      then
         set variable zz_cb<counter> to line
         set counter to counter + 1
      otherwise
         -- there is no otherwise! Saying that shows you didn't forget...
   end if
end for
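
The same loop as a rough JavaScript sketch, in case seeing it run helps. The sample text and the `postIts` object are stand-ins for the real text and for Keyboard Maestro variables, and the two-digit padding is an assumption based on names like zz_cb01.

```javascript
// Hypothetical sample text.
const text = [
    "Image 1",
    "- Alt Text: First alt",
    "- Title Attribute: First title",
    "- Description: First description",
    "Image 2",
    "- Alt Text: Second alt"
].join("\n");

// Stand-in for the set of numbered variables.
const postIts = {};
let counter = 1;

for (const line of text.split(/\r\n|\n|\r/u)) {
    if (
        line.startsWith("- Alt") ||
        line.startsWith("- Title") ||
        line.startsWith("- Description")
    ) {
        // Build the name from the counter: zz_cb01, zz_cb02, ...
        const name = "zz_cb" + String(counter).padStart(2, "0");
        postIts[name] = line;
        counter += 1;
    }
    // No "otherwise": lines that do not match are simply skipped.
}

console.log(postIts);
```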

At which point you can build the macro almost line-for-line from the pseudo-code:

Image Attributes (Multi Var).kmmacros (10.0 KB)

(As before, I've used local variables for testing -- delete "Set Variable 'Local_zz_...' " and enable "Set Variable 'zz_...' " to use globals.)

And you can make it more concise by replacing the three "If..." conditions with one:
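
One way such a single condition might look, sketched as a JavaScript regex. This is an assumption for illustration; the pattern used in the actual macro may differ.

```javascript
// One pattern standing in for the three separate "starts with" tests.
const wantedLine = /^- (Alt|Title|Description)/u;

const isWanted = line => wantedLine.test(line);

console.log(isWanted("- Alt Text: something"));
console.log(isWanted("- Filename: something"));
```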

Morning, that's a great solution and I appreciate all the explanations!
I'm closer to understanding the for each method.
Question for learning, would the 'substring' 'mode' usually be a regex?
Thank you, Troy

"It depends."

Pick the simplest solution that meets your needs. If you wanted to extract each number from 1,2,3,4,5 it would make sense to use a "separated by a simple string" match:

But if it was from Every1good2boy3deserves4fish5 you'd ignore the separators and use a regex to "get each number in":

It is all just text matching. The difference is that a "simple match" looks for the same literal character(s), a regex looks for anything that matches a pattern you define, and the action lets you choose whether to extract "the things that match" or "the text separated by the things that match".
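
The two modes, sketched in JavaScript with the example strings from above (the variable names are illustrative):

```javascript
// Simple separator: keep the text between the commas.
const bySeparator = "1,2,3,4,5".split(",");

// Regex: keep the things that match the pattern "one or more digits".
const byPattern = "Every1good2boy3deserves4fish5".match(/\d+/gu);

console.log(bySeparator);
console.log(byPattern);
```

Both give the same five items; only the way of describing where they sit in the text differs.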

I am trying to get the numbered lines only into variables.
The main lines that start with a number.
I have gotten the first one but I cannot get the others....
What am I missing.... ?

I just took another look at it, I know it's wack....

Main Points Text of Number lines Only Regex K0338 Macro (v11.0.3)

Main Points Text of Number lines Only Regex K0338 .kmmacros (51 KB)

One way to do this is to remove everything that's not a line that starts with a number followed by a dot, then remove blank lines:

Download Macro(s): Extract numeric lines.kmmacros (7.0 KB)

Macro notes
  • Macros are always disabled when imported into the Keyboard Maestro Editor.
    • The user must ensure the macro is enabled.
    • The user must also ensure the macro's parent macro-group is enabled.
System information
  • macOS 14.7
  • Keyboard Maestro v11.0.3

The regex finds lines that don't start with a number-dot combo, and replaces them with nothing. Then the blank lines are deleted. Run that, and I think it's what you want:
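
As a rough illustration of the two steps in JavaScript: the sample text and both patterns here are my assumptions, not necessarily the regexes used in the macro.

```javascript
// Hypothetical sample text.
const notes = [
    "1. Daily Cleaning",
    "    • Wipe surfaces",
    "",
    "2. Weekly Cleaning",
    "    • Descale the kettle",
    "3. Deep Cleaning Tips"
].join("\n");

const numberedOnly = notes
    // Replace every line that does not start with digits and a dot with nothing,
    .replace(/^(?!\d+\.).*$/gmu, "")
    // then delete the blank lines left behind.
    .replace(/^\n/gmu, "");

console.log(numberedOnly);
```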

-rob.

Hey @griffman, nice... I'd like to get the lines of text, without the preceding number/period and space, into variables such as:
zz_cb01
zz_cb02
zz_cb03
zz_cb04
zz_cb05
zz_cb06
zz_cb07
zz_cb08
zz_cb09
zz_cb10
etc etc

I was trying to do it programmatically, I've done it before but can't find that macro and don't completely understand the use of %%'s and processing tokens....
cheers

There are probably—no, definitely—better ways of doing this, I imagine. I just sort of brute forced it together to see if it'd work, and it seems to. Run it, then check your global variables.

Download Macro(s): Extract numeric lines v2.kmmacros (8.6 KB)

You can control the number of leading zeros by changing this line: %Dec2%local_loopCount% — replace the 2 with 3 or 4 or whatever.

New in this version is another regex to remove the leading number/dot, and then the loop to go through each row in the variable, assigning each to a new variable.
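
A JavaScript sketch of those two additions, for illustration only. The sample lines, the `variables` object (standing in for global variables) and the pattern are assumptions; `padStart` plays the role that %Dec2% plays in the macro.

```javascript
// Hypothetical input: the numbered lines left after the first clean-up.
const numberedLines = [
    "1. Daily Cleaning",
    "2. Weekly Cleaning",
    "3. Deep Cleaning Tips"
].join("\n");

// Width of the zero-padded number; change to 3 or 4 for more leading zeros.
const padWidth = 2;

// Stand-in for the set of global variables.
const variables = {};
let loopCount = 1;

for (const line of numberedLines.split("\n")) {
    // Remove the leading number, the dot and any following spaces.
    const textOnly = line.replace(/^\d+\.\s*/u, "");
    const name = "zz_cb" + String(loopCount).padStart(padWidth, "0");

    variables[name] = textOnly;
    loopCount += 1;
}

console.log(variables);
```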

-rob.

woof bro.... nice, works like a charm... I'm off to bed but I will definitely get into it tomorrow to understand what it's doing and how.
the %Dec2% is nice, never knew that.
Cheers.....

I haven't actually tried @griffman's macro yet -- but does it actually do what you want? More importantly -- have you completely defined what you want?

(Emphasis mine.)

Your sample data looks to have "headings" followed by multiple lines. Taking your words above literally, that would mean that, for example (and with variable numbers made up):

4. Deep Cleaning Tips
    • Hard-to-Reach Areas:...
    • Dealing with Stubborn Stains:...
    • Descaling:...

...would result in

  • Variable zz_cb10 being set to "• Hard-to-Reach Areas:..."
  • Variable zz_cb11 being set to "• Dealing with Stubborn Stains:..."
  • Variable zz_cb12 being set to "• Descaling:..."

...with your variables' numbers bearing little relationship to your headings' numbers. Is that what you want, or are you aiming for a variable for each "heading" that contains all the lines "belonging" to that heading?
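
To make the two readings concrete, here is a small JavaScript sketch of each. The sample outline is modelled on the excerpt above, and the names `perLine` and `perHeading` are illustrative only.

```javascript
// Hypothetical sample modelled on the excerpt above.
const outline = [
    "3. Weekly Cleaning",
    "    • Filters:...",
    "4. Deep Cleaning Tips",
    "    • Hard-to-Reach Areas:...",
    "    • Dealing with Stubborn Stains:...",
    "    • Descaling:..."
].join("\n");

// Reading 1: one entry per non-heading line, in order of appearance.
const perLine = outline
    .split("\n")
    .filter(line => !/^\d+\./u.test(line))
    .map(line => line.trim());

// Reading 2: one entry per heading, holding every line that belongs to it.
const perHeading = {};
let current = null;

for (const line of outline.split("\n")) {
    const heading = line.match(/^(\d+)\.\s*(.*)$/u);

    if (heading !== null) {
        // A numbered line starts a new group, keyed by its own number.
        current = heading[1];
        perHeading[current] = [];
    } else if (current !== null) {
        perHeading[current].push(line.trim());
    }
}

console.log(perLine);
console.log(perHeading);
```

In the first reading the position in the list has nothing to do with the heading numbers; in the second, the key is the heading's own number.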