Newbie Help Please - How to Process Text?

EttVenter · January 27, 2023, 12:05pm

If I gave KM a string like this:

Things, Stuff, More Things

And I want that string to be formatted and edited to look like this:

[[Things]]
[[Stuff]]
[[More Things]]

How would I go about doing that? I did use the search, but I couldn't find anything. Thanks in advance!

Nige_S · January 27, 2023, 12:38pm

The simplest way to transform

Things, Stuff, More Things

to

[[Things]]
[[Stuff]]
[[More Things]]

...is to do just as you would in Word or TextEdit -- "Search and Replace" the ", " (a comma followed by a space) with

]]
[[

to give

Things]]
[[Stuff]]
[[More Things

...then add [[ to the beginning and ]] to the end.
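For anyone who prefers to see those two steps as script rather than as KM actions, here is a minimal JavaScript sketch. It is not the contents of the macro; the function name `wrapItems` is made up for illustration, and it assumes the separator is always exactly a comma followed by one space.

```javascript
// Hypothetical helper illustrating the two-step search-and-replace idea.
function wrapItems(text) {
  // Step 1: replace every "comma space" with "]]", a newline, then "[[".
  const body = text.replace(/, /g, "]]\n[[");
  // Step 2: add the opening "[[" and closing "]]" that step 1 leaves off.
  return "[[" + body + "]]";
}

console.log(wrapItems("Things, Stuff, More Things"));
// [[Things]]
// [[Stuff]]
// [[More Things]]
```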

As a KM macro:

Munge Text.kmmacros (3.1 KB)


unlocked2412 · January 27, 2023, 2:00pm

Another route could be (using Swift), to:

produce a list, splitting on the character ,

.split(separator: ",")

remove whitespace from both ends of each element of the resulting list

.map({$0.trimmingCharacters(in: .whitespaces)})

enclose each element of the resulting list

.map({"[[\($0)]]"})

and, finally joining the list using the newline character (\n).

.joined(separator: "\n")

Enclose list in double brackets.kmmacros (2.1 KB)

Expand disclosure triangle to see "Swift" source

import Foundation

func main() -> () {
    let str = ProcessInfo.processInfo.environment["KMVAR_localParameter"]!

    return print(
        str
        .split(separator: ",")
        .map({$0.trimmingCharacters(in: .whitespaces)})
        .map({"[[\($0)]]"})
        .joined(separator: "\n")
    )
}

main()

DanThomas · January 27, 2023, 4:58pm

How about like this - the green actions are the ones that actually do the work:

The first green action searches for ", " and replaces it with "]]" followed by a newline character followed by "[[".

I used the %Space% token so you could see the search string is a comma followed by a space. You could just type a normal space character there.

Here's the replace string:

]]\n[[

And here's the text in the second green action, which just adds the leading "[[" and the trailing "]]":

[[%Variable%Local_Input%]]

Hope that helps.

unlocked2412 · January 27, 2023, 5:38pm

Yours is an interesting solution. However, if there is a trailing comma, like:

Things, Stuff, More Things,

then the output would be, I think:

[[Things]]
[[Stuff]]
[[More Things,]]

and I see a similar issue with this (also interesting) solution:
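To make the difference concrete, here is a rough JavaScript comparison of the two ideas on the trailing-comma input. Both function names are hypothetical, and neither is the code from the macros posted above. Note that JavaScript's `split` keeps the empty string after a trailing comma, so the sketch filters empty items out explicitly.

```javascript
// Hypothetical: search-and-replace on "comma space", then wrap the whole string.
function replaceApproach(text) {
  return "[[" + text.replace(/, /g, "]]\n[[") + "]]";
}

// Hypothetical: split on the bare comma, trim each item, drop empties, wrap each.
function splitApproach(text) {
  return text
    .split(",")
    .map(s => s.trim())
    .filter(s => s !== "")
    .map(s => `[[${s}]]`)
    .join("\n");
}

const input = "Things, Stuff, More Things,";
console.log(replaceApproach(input)); // last line is [[More Things,]]
console.log(splitApproach(input));   // last line is [[More Things]]
```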

Nige_S · January 27, 2023, 6:03pm

As it should be! OP's specification is that items are separated by "comma space" -- a lone comma should be considered part of the string, a trailing "comma space" would indicate an empty item at the end.

Contrariwise, consider the string

Things, Stuff, Stuff,2, More Things

...where I'd argue that your solution fails OP's intended goal.
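A quick JavaScript sketch of what happens to that string under each reading of the separator (the function names are invented for the example):

```javascript
const sample = "Things, Stuff, Stuff,2, More Things";

// Treat every comma as a separator and trim the pieces.
function splitOnComma(text) {
  return text.split(",").map(s => s.trim());
}

// Treat only "comma space" as a separator; a lone comma stays in the item.
function splitOnCommaSpace(text) {
  return text.split(", ");
}

console.log(splitOnComma(sample));      // five items: "Stuff,2" is broken into "Stuff" and "2"
console.log(splitOnCommaSpace(sample)); // four items: "Stuff,2" survives intact
```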

But, at the moment, there's no right answer. There's the friendly answer: "Go back to OP and tighten up the specifications". And there's the Consultant's answer: "That input's out of spec. Submit a change request and we'll fix it -- at twice our normal hourly rate <ker-ching!>".

unlocked2412 · January 27, 2023, 7:41pm

Well, the OP didn't give a specification, only sample input and output. If that input is going to be machine-produced, then yes, I agree with you.

However, if a human is going to write that line, I think it would be difficult to follow such a strict "comma, then space" pattern and never miss a space. That's why I frequently use a comma as the splitting character, to account for both cases: space after the comma, and no space after the comma. My 2 cents.

DanThomas · January 27, 2023, 9:29pm

I answered the question that was asked. If the OP has additional requirements, then let him state them. Trying to guess those additional requirements is not only an exercise in futility, but one step down the road to madness.

Unless you consider that fun - in which case, go for it!

Personally, if it were me, I'd have done it in JXA, but I doubt that would help. Would have been more fun to me, though.

ccstone · January 28, 2023, 5:08am

Hey Guys,

Just FYI:

I've parsed this kind of comma-delimited-text (CSV) so many times over the years, and over time there's inevitably something wrong with the input – especially when I need to get the job done and don't really have time to fix it.

Leading text – horizontal or vertical.
Trailing text – horizontal or vertical.
No space in the comma delimit.
Too many spaces in the comma delimit.
A space before the comma and not after.
Non-breaking-spaces in the comma delimit.
A tab stuck in the comma delimit.

So – when I build solutions of this sort I do a bunch of processing to make sure the output is what I want.
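As a rough illustration of that kind of defensive processing (not Chris's actual code), a JavaScript split could tolerate most of the delimiter problems listed above. The name `splitLoosely` is made up, and the sketch assumes that stray whitespace, including blank lines, around the whole string can simply be discarded. In JavaScript, `\s` and `trim()` already cover tabs, newlines, and non-breaking spaces.

```javascript
// Hypothetical tolerant splitter for messy comma-delimited text.
function splitLoosely(text) {
  return text
    .trim()                     // drop leading/trailing whitespace, including blank lines
    .split(/\s*,\s*/)           // a comma with any amount of whitespace on either side
    .filter(s => s !== "");     // drop empty items, e.g. from a trailing comma
}

console.log(splitLoosely("\n  Things ,Stuff,\u00A0More Things,\tLast  \n"));
// four items: Things, Stuff, More Things, Last
```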

-Chris

DanThomas · January 28, 2023, 12:38pm

Here's a JavaScript example. This is just the logic:

const inputString = "Things, Stuff, More Things";
return inputString
	.split(/, */)                // split on a comma, followed by zero or more spaces
	.filter(s => s.trim() != "") // filter out empty strings
	.map(s => `[[${s.trim()}]]`) // format the result
	.join("\n");                 // return a string with each entry on a new line

The entire JXA script requires getting the input variable from KM, so it would look something like this:


(function() {
	'use strict';

	const _kme = Application("Keyboard Maestro Engine");
	const _currentApp = Application.currentApplication();
	_currentApp.includeStandardAdditions = true;

	function getKMVariable(name, required) {
		var result;
		if (name.match(/^instance|^local/i)) {
			let inst = _currentApp.systemAttribute("KMINSTANCE");
			result = _kme.getvariable(name, {instance: inst});
		} else {
			result = _kme.getvariable(name);
		}
		if (required && !result)
			throw new Error(`Variable "${name}" is empty`);
		return result;
	}

	function execute() {
		const inputString = getKMVariable("Local_Input", true);
		return inputString
			.split(/, */)                // split on a comma, followed by zero or more spaces
			.filter(s => s.trim() != "") // filter out empty strings
			.map(s => `[[${s.trim()}]]`) // format the result
			.join("\n");                 // return a string with each entry on a new line
	}

	return execute();
})();

And you'd call it something like this:


unlocked2412 · January 28, 2023, 7:31pm

That is a nice solution, @DanThomas. Good to see a JS one. And it seems to work with either combination: space or no space after the comma. And I hope, @Nige_S, you don't take my comment personally. @ccstone phrased my intent much better than I did; the kind of input we see is rarely so perfect, so perhaps it is a good idea to aim for a wide range of possible inputs. No bad intention on my part, really. I appreciate how much effort and thought you put in to help users out.

And, by the way, I always appreciate the effort and time you guys put into this forum to help so many users out there.

DanThomas · January 28, 2023, 7:43pm

No worries.

Nige_S · January 28, 2023, 9:08pm

Not at all! But I do have to, respectfully, disagree.

We do have a spec, of sorts, for the input. It's in the OP. We've no reason to assume that's incorrect, so that's what we should base our processing on.

Trying to catch every potential input error is a road to insanity. Chris has listed some possibles, I'd also include m, ., and < as common "I tried to type a comma and missed" errors, and I'm sure you could come up with more. But that's why you should sanitise input before you process it -- but, I'd argue, not during since sanitising and processing are discrete functions and so better separated.

EttVenter · January 29, 2023, 9:01am

Hi everyone!

Man, thank you guys so much for the responses! Your replies have helped me build exactly what I needed. Thank you!

unlocked2412 · January 29, 2023, 2:35pm

I am glad to hear that, so I appreciate your response. I value that you took the time to respond, so I could learn from another viewpoint.

No doubt that your solution works perfectly for the example given. But I really don't think just one example is even close to a spec.

Of course, there could be other mistakes. But, in general, I specifically do not want the separator to end up as part of any item, in any possible scenario. That's why I split on the separator.

Though, I see that you used the comma and the space together as a separator. CSV, for example, takes only the comma as the default separator, which is why I am not familiar with the separator you used.

I'm interested in this. What does sanitising the input look like for you? Perhaps we are talking about the same thing, but I am not sure.

Nige_S · January 29, 2023, 8:59pm

But we can only go by what we're given. And what we've been given in this case is that "the only separator being used is , ".

Of course, if OP was talking to us about this our first questions would be "That's not CSV output from any of the systems you're likely to be using. Where's the input coming from? Are you sure about that separator? Will it always and only be that, will there also be , in any of the strings that we'd need to account for, any other potential gotchas?". And so on.

That was my bad -- I should have used the %Space% token in the macro to make that more obvious. That'll bite me when I revisit it next week/month/year!

Yes. And while there's no CSV standard as such, comma-space is "non-standard" enough, and consistently used by OP (albeit, I agree, over a small sample), that we should either take it literally or ask about it -- not assume we can treat it as a "normal" CSV and simply trim spaces.

But -- and here's an assumption, or rather inference, of my own -- I don't think OP is processing CSV or some variant. I think this is text selected in a document and eg being converted into Markdown "internal links" for Obsidian or similar.

Ideally you'd sanitise at data entry -- think form validation, or even autocorrect (which, in OP's case, could ensure that every , is followed by a space). Our Admin team were frequently mis-entering in our HR database, hitting Return or space at the end of a first name for example, so I've put in an auto-calc triggered by field exit that strips any trailing non-alphas.
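The database auto-calc itself isn't shown here, but the idea of stripping trailing non-letters on field exit can be sketched in JavaScript. The function name is hypothetical, and `\p{L}` (any Unicode letter) is my own choice so that accented names are not damaged.

```javascript
// Hypothetical field-exit cleanup: remove anything after the last letter.
function stripTrailingNonLetters(value) {
  return value.replace(/[^\p{L}]+$/u, "");
}

console.log(JSON.stringify(stripTrailingNonLetters("Alice \n"))); // "Alice"
console.log(JSON.stringify(stripTrailingNonLetters("José ")));    // "José"
```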

Otherwise it depends on the "rules" and the "mistakes". So for OP's text you could decide that the "rule" was, indeed, strict comma-separated and that strings couldn't contain Returns or tabs -- they'd be treated as "wrong" separators. So you could S'n'R for one or more Returns and/or tabs, replacing them with a single comma, then replace any space-comma or comma-space with a single comma. But it's more maintainable to clean up your data first then use a simple split than to try and create "one split to rule them all".
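A minimal JavaScript sketch of that "clean first, then split simply" arrangement, under the rules described above (Returns and tabs are wrong separators, and a space on either side of a comma is noise). The names `sanitise` and `formatItems` are invented for the example, and only a single space on each side of a comma is handled.

```javascript
// Step 1 (hypothetical): sanitise the input into strict comma-separated text.
function sanitise(text) {
  return text
    .replace(/[\r\n\t]+/g, ",") // one or more Returns and/or tabs become a single comma
    .replace(/ ?, ?/g, ",");    // space-comma or comma-space becomes a single comma
}

// Step 2 (hypothetical): process the clean text with a simple split.
function formatItems(clean) {
  return clean
    .split(",")
    .map(s => `[[${s}]]`)
    .join("\n");
}

console.log(formatItems(sanitise("Things ,Stuff\nMore Things\tLast, One")));
// [[Things]]
// [[Stuff]]
// [[More Things]]
// [[Last]]
// [[One]]
```

Keeping the two functions separate means the rules for "wrong" separators can change without touching the formatting step.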

All this is, of course, only my opinion -- YMMV

unlocked2412 · January 29, 2023, 11:45pm

Yes, that's true.

So interesting to hear your thoughts! However, I now realise I've never done that. When I have to process text, I use scripting, in JS for example. If what I have to do is a bit more involved, I use a little parser combinator library written by Hutton. That works well for my parsing needs. I mean, I rarely search and replace with regexes. But it's super interesting to see your viewpoint!

Thank you so much for taking the time to write and share what you do!
