Search Regular Expression Multiple Line Regex Issues

First time posting here, so go easy on me. :wink:

I've been working on this problem for a few days now, and just can't seem to figure it out. I've read a ton of other discussions on using regular expressions to search multiple lines and parse out individual bits of info.

In a nutshell, I have a tab delimited lookup table that I have set up in a text file. There are only a few rows in it now, but there will ultimately probably be several hundred rows in it when it's all said and done. Nevertheless, the four rows I have in it now should be enough to develop a proof of concept.

The first column contains a unique six character identifier in each row. These six character identifiers are taken from plugin names assigned by the Mackie MCU protocol used in my DAW. The next three columns are intended to be used as a place where I can then input up to a max of three words (one in each column) which will then ultimately be combined into a single variable for use with the KM "Stream Deck Set Title of Button" action.

I've successfully used a shell script in KM to locate and produce the appropriate info from the desired row in the table, based on the unique six character identifier I'm asking it to look for. This six character identifier dynamically changes, but I've got that part figured out, and don't have an issue getting this dynamic info to the shell script. It always spits out the corresponding info from the adjacent columns, for that given row where the six character identifier is located. So I'm good there.

However, when I go to use the "Search Variable" action, using regular expression, this is where I run into problems. I've gotten it to spit out the word in each adjacent column, but it's having trouble if I leave any of these entries blank in the lookup table. I may not always have need for use of a word in each of the three columns in the lookup table, so I want to be able to leave some of them blank. It may be that I only need to enter a word into the first and/or second columns of the lookup table, leaving the third column blank, for example. This causes problems.

It seems to maybe be related to something to do with the end of each line not being defined? I'm not sure. I have noticed that when I just go and add "X" in a column further to the right in each row of the lookup table, the problem goes away. If I remove the X out to the right in each row of the lookup table, the regular expression action in KM just returns nothing for each capture group. If I put the Xs back in, it works again.

However, I don't want to have to go in and place an X in each row. I'd like to understand what the problem is here, and fix it so that you just enter a word or words into any or all of those three columns without worry that the Regex isn't going to function properly. So the problem is two fold. One, it doesn't like me leaving an entry field blank for any of the rows/columns. Two, for this to work, it seems to need an entry (using an "X" here) in a column to the right of the columns I actually care about.

The problem is likely with how I've got my Regex set up. I'm pretty much a novice with Regex, so I imagine it comes down to a mistake I've made there. Can somebody help me understand how I need to set up my Regex to do what I want it to do? I've included a screenshot from KM for the relevant part of my macro. I've also included a screenshot of my lookup table. Thanks

Welcome to the forums! Hopefully we'll be able to help you figure out what's going on. If I read the essence of the problem correctly, you have some text that looks like this:

175BCm	175-B	Comp	UA
GlxyTp	Galaxy	Tape
176Cmp	176	Compressor

And you want to extract up to four separate capture groups, depending on how many fields are in each row, right?

Assuming that's right, there is a regex solution, but it's really messy. Instead, you might want to try using Keyboard Maestro's array variables, which make it easy to extract elements from groups of data. And while there may be a way to use Tab as the array delimiter, it's easier to just use a comma, as that's the default character in KM.

regex work.kmmacros (6.9 KB)

Macro screenshot

The attached macro uses a variable to represent your text file, but it would work just as well using the results of your grep on the line that's returned. (I just wanted to show it processing more than one result).

It starts by replacing the tabs with commas, which are the default array delimiters in KM. Then it just reads each item in the array into the variable that you want to use. If the array item is empty, that variable is empty.
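For anyone who wants to see the array idea outside of Keyboard Maestro, here is a minimal Python sketch of the same logic: split each row on tabs and treat a missing column as an empty string. The function name split_fields is made up for illustration, and the sample rows are just the three lines quoted above. This is not what the attached macro does internally, only the same idea.

```python
def split_fields(line, max_fields=4):
    """Split a tab-delimited row and pad it to max_fields entries.

    Hypothetical helper for illustration; not part of the KM macro.
    """
    fields = line.rstrip("\n").split("\t")
    # Pad with empty strings so a missing column reads as empty.
    fields += [""] * (max_fields - len(fields))
    return fields[:max_fields]


rows = [
    "175BCm\t175-B\tComp\tUA",
    "GlxyTp\tGalaxy\tTape",
    "176Cmp\t176\tCompressor",
]

for row in rows:
    ident, line1, line2, line3 = split_fields(row)
    print(ident, [line1, line2, line3])
```

Because the list is always padded to four entries, the unpacking never fails on short rows, which mirrors the "if the array item is empty, that variable is empty" behavior described above.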

I've left the regex solution at the end, but it's quite complicated as it relies on optional non-capturing groups. (It does work, though.) If it were my macro, I'd be using the array solution, just because readability and debugging seems much simpler to me.

Here's what the macro split out:

Is this what you were trying to accomplish?

-rob.

  1. Yes, I think you got the gist of what I was trying to accomplish.
  2. So that Regex worked, from what I can tell. I pasted it in, and now everything seems to work as I wanted it to. No "X"s required at the end of each line.
  3. I don't fully understand what you did with that new Regex, but it seems to work. I can see though why you said it was messy, and why I was having trouble with this. I don't understand what it is doing. I'll have to spend some time with it to see if I can understand what you've done there with that Regex. What was wrong with my Regex, in a nutshell?
  4. As for the macro-based alternative you're proposing, I looked thru it, and I think I understand what it is doing, but I'll have to play with it some. Now that I have a Regex that is working, I won't really ever need to change it. Other than having to figure out how to format the Regex (which you've already done :smile: ), and other than the ease of debugging (which you already mentioned), what additional benefit is there, if any, to using the macro approach over the Regex approach? If they both will work, is one going to work faster than the other one or something?
  5. I won't bother with the replace-tabs-with-commas bit. My text file is small enough at this time that I would just start over with a CSV file if I decided to go with the macro approach. No biggie to start over there. It's just a few lines at this point.
  6. Thanks

Instead of fiddling about with replacing tabs by commas, you can indeed use tabs as array delimiters like this for example:

%Variable%local_theLine[1]\t%

Note that this is a feature added in KM v11.

"Regex" and "nutshell" don't really go together very well :).

Honestly, I didn't look at your regex in detail, I just started from scratch. But you are right, the problem is that you don't have all the fields in all the lines, so your third and fourth capture groups won't work. Looking now, and testing on a regex tester, I think this would have worked, but not for all cases:

(?m)([^\s*$]*)\t([^\s*$]*)\t([^\s*$]*)?\t?([^\s*$]*)?

The trailing ? creates an optional element (either an element or a group), and all of your lines (in the demo, at least) had three fields, so only the tab and the last group would need them. If you have some that have two fields, you'd need to do the same with the prior tab-capture group.

And that's basically all my version is, built with optional capture groups (?:\t(\S+))? for the last two fields. I just bundled the tabs with the optional capture groups ... which, come to think of it, might be bad :). You should probably Trim Whitespace on the results.
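The macro's actual pattern isn't reproduced in this thread, so the following Python sketch uses an assumed reconstruction built from the (?:\t(\S+))? piece described above. Python's re module stands in for KM's regex engine here; the two should agree on syntax this simple, but treat that as an assumption too. It shows that missing trailing fields come back empty (None in Python) and that \S+ cannot match a field containing a space.

```python
import re

# Assumed reconstruction: two required fields, then two optional ones.
# Each optional field bundles its leading tab inside a non-capturing group.
PATTERN = re.compile(
    r"^(\S+)\t(\S+)(?:\t(\S+))?(?:\t(\S+))?$",
    re.MULTILINE,
)

text = (
    "175BCm\t175-B\tComp\tUA\n"
    "GlxyTp\tGalaxy\tTape\n"
    "176Cmp\t176\tCompressor"
)

# One match per row; unmatched optional groups are None.
results = [m.groups() for m in PATTERN.finditer(text)]
for groups in results:
    print(groups)

# A field with a space in it defeats \S+, so the whole row fails to match.
two_word_row = "API251\tAPI 2500\tBuss"  # hypothetical identifier
print(PATTERN.search(two_word_row))
```

The last line prints None, since \S+ stops at the space in the second field and the rest of the row can no longer line up with the pattern.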

The regex version will probably be faster, just much harder to create and debug. Array variables are quite powerful, and they make it really easy to see what's going on. It would basically just replace the regex statement in your existing macro. But if speed's important, regex is the way to go.

-rob.

Yeah, speed is important to me on this. So I might still be inclined to go Regex.

There's a few little wrinkles to the Regex thing that I'm discovering as I continue to test the Regex you provided in your first post.

  1. I should have been more clear about what I was hoping to be able to put in each of the four fields in my lookup table. The first field will always have a six character MCU identifier. That will not change. It's also a safe bet that, at minimum, the second field will always have something in it. However, the next two fields may or may not have anything in them, depending on the name needed there. What would need to be done to adjust the Regex you originally provided to account for this? Would it be just as simple as swapping in (?:\t(\S+))? for \t(\S+) in the second field section of the Regex?

  2. Given the limited space on Stream Deck buttons, I was generally thinking of one word per field, given that this would normally be all that would fit. However, it's possible that two short words like "API 2500" would fit all on one line on a SD button, in which case one single field on the lookup table could, on occasion, contain two words, which means it would also contain a space between those two words. I noticed this when I typed in "API 2500" into one field in the lookup table and then realized that it broke the Regex. What would it take for the Regex to add in the ability to recognize the space (and leave it in place) within a single field?

  3. I'm not sure I understood your comment about trimming white space.

Can you possibly post a longer list of the text in your file, containing samples of anything you'd like to handle? The problem is that regex is quite particular, so I built it to match the three lines I had to work with.

-rob.

Sure. See below for a more expanded idea of what I'm talking about.

Basically, here are the various scenarios below that I could encounter. The first field will always be the same, as I mentioned before (six characters, one word, always).

  1. Second field: One word
  2. Second field: Two words
  3. Second field: One word. Third field: One word
  4. Second field: Two words. Third field: One word
  5. Second field: One word. Third field: Two words
  6. Second field: One word. Third field: One word. Fourth field: One word
  7. Second field: Two words. Third field: One word. Fourth field: One word
  8. Second field: One word. Third field: Two words. Fourth field: One word
  9. Second field: One word. Third field: One word. Fourth field: Two words
  10. Second field: Two words. Third field: Two words. Fourth field: One word
  11. Second field: Two words. Third field: One word. Fourth field: Two words
  12. Second field: One word. Third field: Two words. Fourth field: Two words
  13. Second field: Two words. Third field: Two words. Fourth field: Two words

That's all of the possible combos I can come up with. The space on the Stream Deck wouldn't realistically allow any more than two words per line anyway, and they'd have to be relatively short words, as it is. But that CAN happen with names that have multiple words that also happen to be short words (or abbreviated words).

Bottom line, I was just hoping to allow for at least two words per field, so it really just comes down to the same caveats for fields two thru four. The ability to allow at least two words per field, which then means there could be as many as six words total (two per line, three lines) on a Stream Deck button.

Perhaps it would be more precise to just say that I would like any spaces used between two or more words in a given field (fields 2 thru 4) to be preserved, otherwise trim any spaces at the beginning and end of any word or combination of words within a given field.

Please note that, in the case of the API 2500 Buss Compressor, that's indicating that "API 2500" would be in field two, "Buss" would be in field three, and "Compressor" would be in field four. So there would be a space between "API" and "2500" and then a tab between "2500" and "Buss" and a tab between "Buss" and "Compressor".

Hope that all makes sense. Thanks again.

And here, apparently, is your regex:

^([^\t]+)(?:\t([^\t]+))?(?:\t([^\t]+))?(?:\t([^\t]+))?$

See, this is why I'd use the array variable method, even at a slight speed hit :smiley:.

And most of the above is not my doing, but ChatGPT, as this is beyond what I usually try to do with regex. I had a version that I thought was close, but wasn't handling things correctly after field two. When I gave it to ChatGPT, it tossed most of it and returned the above.

Here's ChatGPT's explanation of how this works:

Explanation:

  • ^: Asserts the start of the line.
  • ([^\t]+): Captures one or more characters that are not tabs.
  • (?:\t([^\t]+))?: Non-capturing group for an optional tab followed by characters that are not tabs. This group is repeated three times to capture up to four groups.
  • $: Asserts the end of the line.

This regular expression will capture up to four groups in each line, separated by tabs. If there are fewer groups, the additional capturing groups will be empty. You can adjust the number of (?:\t([^\t]+))? segments based on the maximum number of groups you expect to capture.
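As a quick check of the explanation above, here is a small Python sketch that applies the same pattern to one row at a time and trims each captured field. Python's re module is standing in for KM's engine, the parse_row name is made up, and the "API251" identifier is a hypothetical placeholder; only the pattern itself comes from the post.

```python
import re

# The pattern from the post, applied to a single row (no multi-line flag).
PATTERN = re.compile(
    r"^([^\t]+)(?:\t([^\t]+))?(?:\t([^\t]+))?(?:\t([^\t]+))?$"
)


def parse_row(line):
    """Return four trimmed fields for one row, or None if it doesn't match."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Unmatched optional groups are None in Python; normalize them to "".
    # strip() removes leading/trailing spaces but keeps inner ones.
    return [(g or "").strip() for g in m.groups()]


print(parse_row("API251\tAPI 2500\tBuss\tCompressor"))
print(parse_row("GlxyTp\tGalaxy\tTape"))
print(parse_row("X12345\t API 2500 \tBuss"))
```

One caveat worth knowing: [^\t] also matches a newline, at least in Python. If the pattern is run with (?m) over a whole multi-line table rather than a single grep result, a field can run on into the next row. Using [^\t\n] in place of [^\t] avoids that.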

I had it create a sample data set of 100 such groupings, and the above regex seemed to handle it all correctly. But if I went back and looked at this six months later, I'd have no clue what it was doing :).

-rob.

Sorry to ask, but is the goal here to fit as many words as possible onto a Stream Deck button? If so, I have a way that doesn't involve regex at all.

No. I'm just planning for the occasional worst case scenario. I'd say that I generally won't have more than four words, with two and three being more common. But there will be some times that four words is needed. And if I'm going to make it so that one field can have two or more words, I might as well make it so they all can (other than the first field which will never need it). Because, depending on word size and where a particular word might come in the naming sequence, I might need the two words to be in any of the three fields.

In any case, I'd be curious what you have in mind. Lay it on me.

My goal is to solve this problem without regex or fields. You mentioned Stream Deck and I'm aware of the limitation of the KM editor's insistence on a single visible line of data, so that's why I was thinking you were doing this simply for the ability to create multi-line buttons. Maybe I was wrong. I'm still unsure what the ultimate goal is here.

Yes, you can break up lines in a variable using regex, but you can also break up a single line with this action: (and it's easy to enhance it to work with multiple lines of input)

That action will create a variable containing a multi-line string that's guaranteed to fit on a Stream Deck button, assuming that the font you are using allows for a maximum of ten characters per line. You can change the number 10 to whatever number suits you best.
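The action itself isn't reproduced in this thread, so as a rough stand-in, here is how the same wrapping idea might look in Python using the standard textwrap module. The fit_to_button name is hypothetical, and this is a sketch of the technique rather than the shell command from the post.

```python
import textwrap


def fit_to_button(text, width=10):
    """Wrap text into lines of at most `width` characters.

    Hypothetical helper; width=10 follows the ten-characters-per-line
    assumption mentioned in the post.
    """
    # Collapse tabs and runs of spaces into single spaces first.
    cleaned = " ".join(text.split())
    # textwrap breaks at word boundaries; by default it will also split
    # a single word that is longer than `width`.
    return "\n".join(textwrap.wrap(cleaned, width=width))


print(fit_to_button("API 2500\tBuss   Compressor"))
```

With the sample string, the three printed lines are "API 2500", "Buss", and "Compressor", since each added word would otherwise push a line past ten characters.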

I would add one more thing to remove ugly spacing/tabbing issues:

That's a string of characters in that Regex, for sure. Haha. The thing is, I'm going to be using this for mixing and recording music, so speed is a huge plus for this kind of stuff.

I didn't realize you were using ChatGPT for this. I was wondering how you were spitting this back to me so quick.

I've been playing with this and it seems to work exactly as I want it to. Matter of fact, based on the idea that it's excluding tabs and nothing else with the use of ([^\t]+), this should be able to accept more than two words in any given field. And ChatGPT even indicated that you could add on as many (?:\t([^\t]+))? as you needed, based on how many fields/groups you needed to capture. So I could always add another one of these to accommodate a fourth line on the SD button if I ever needed to, at which point there simply wouldn't be any more room for a fifth line anyway, so this kind of future proofs me to the point of no further possibility.

Sweet. I think this is exactly what I was looking for. Thanks so much for your help.

The goal here is to offer the maximum flexibility on how info can be entered into available fields in my lookup table, via KM, and then have that translate to the best use of the limited screen real estate on a SD button.

I've been using \n in my macro to force a line break to the next line for each subsequent variable corresponding to an individual field in my look up table.

Let me see if I understand you correctly. Are you saying that your method would identify the word in the first field as a match, and then subsequently treat everything after that, in that same line, as one single string, and then intelligently force a wrap around of sorts on that string to make it fit on a SD button? So I wouldn't need to mess with tabs then? Just type the words I want, with spaces in between, and then the shell script would handle the rest?

That's almost what I mean. You mentioned the phrase "first field." But in my solution there are no "fields" at all. I'm simply squishing text. The fact that your first word of text fits on a single line of a stream deck button is merely a nice coincidence.

You can still use regex if you want. But shell commands are so powerful that sometimes they're much easier to use.

To be clear though, this table is being used as a translator of sorts. Match to the 6 character identifier in the first field, and then translate that to the user defined words placed in the rest of the fields in that line. And this identifier has to be looked up, based on dynamic input. IOW, this identifier term is constantly changing, so it needs to be able to reference a lookup table to translate to the correct words I want displayed on the SD button.

So I don't want to actually see that first word (the 6 character identifier) anywhere on the Stream Deck button. That identifier is just being used to translate. Does your shell script remove that 6 character identifier, and then treat the rest of the characters in that line to a wrap around, or does the identifier remain visible on the SD button? Sorry, I'm not an expert in scripting, so I'm not able to make a whole lot of sense of the script you wrote.

In any case, if I understand you correctly now, it sounds like you could just have a bunch of lines in a text document, with each line starting with the unique 6 char identifier and then, placing in a string directly after the identifier, the words you want to see on the SD button. Would that be a correct assessment of what you're saying?
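To make that layout concrete, here is a hedged Python sketch of the lookup as described: find the line whose first word is the identifier, drop the identifier, and wrap whatever is left. The button_title name, the sample table, and the "API251" identifier are all made up for illustration, and this is not the shell script discussed in the thread.

```python
import textwrap


def button_title(table_text, identifier, width=10):
    """Look up `identifier` and return the wrapped title, or None.

    Hypothetical helper: each table line is assumed to start with the
    identifier, followed by whitespace and the words to display.
    """
    for line in table_text.splitlines():
        # Split off the first whitespace-delimited word only.
        parts = line.split(None, 1)
        if parts and parts[0] == identifier:
            rest = parts[1] if len(parts) > 1 else ""
            # Normalize spacing, then wrap; the identifier is not included.
            cleaned = " ".join(rest.split())
            return "\n".join(textwrap.wrap(cleaned, width=width))
    return None


table = (
    "GlxyTp Galaxy Tape\n"
    "API251 API 2500 Buss Compressor\n"
)

print(button_title(table, "API251"))
```

The identifier is used only for matching and never appears in the returned title, and an unknown identifier returns None so the caller can decide what to show instead.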

Let me see if I can adapt that idea to my solution.

I'm still pondering it.

To be clear, I only used it for the last one, as it had me stumped :).

-rob.

I wasn't sure what the source of the data was, but if you're in full control of it, then yea, if you can preset it to what you want, it becomes very simple to just match and strip the last bit off, if I understand what you're saying correctly.

-rob.