Extracting Text from a JSON String

Jordan_Bodwell · September 26, 2021, 4:45pm

Hey Everyone,

I am stumped.

I would like to use regex to isolate client names and appointments from an XPath text extraction. My regex works on Atom but I can't understand why it's not working on KM...

Maybe some quirk of the KM regex processing? Maybe tiny user error??

Here is the sample text:

"xpath": "//*[@id=\"eHanaFrame_LCtl_dayCell_20210922\"]/div",
"firstOnly": false,
"matches": [
{
  "name": "DIV",
  "text": "**9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021**View AppointmentView Documentation , 9:00 AM - 9:50 AMJJones, Indiana (DOB: 01/01/1901), 9:00 AM - 9:50 AMJones, Indiana (DOB: 01/01/1901) (Outpatient Progress Note) ",
  "data-classification": "Scheduling",
  "class": "CalendarGridEvent_Scheduled_Divided",
  "id": "eHanaFrame_LCtl_ctl243"

And here is the RegEx:

(?<="text": ").*(?=View AppointmentEnter|View AppointmentView Documentation)

Edit: Target selected text is in markdown bold notation.

ComplexPoint · September 26, 2021, 7:45pm

Show works where tell fails, and the minimum is always:

some sample input (you've given us that)
an example of the output that you want. (we still need to see that)

A thread title that names a small component of a possible (but failing) solution also tends to harvest much less attention and help than a thread which names the problem that you are trying to solve.

What's the use case ?

ComplexPoint · September 26, 2021, 7:58pm

Incidentally, as the list of matches is well-formed json, you could read it in a JavaScript action and extract the text property directly as matches[0].text

Once you have a string like:

"9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021View AppointmentView Documentation , 9:00 AM - 9:50 AMJJones, Indiana (DOB: 01/01/1901), 9:00 AM - 9:50 AMJones, Indiana (DOB: 01/01/1901) (Outpatient Progress Note) "

you can split it on a sub-string like "View Appoint" and directly obtain the first resulting segment:

s.split("View Appoint")[0]

e.g. a pattern like:

(() => {
    "use strict";

    const dictSample = {
        "xpath": "//*[@id=\"eHanaFrame_LCtl_dayCell_20210922\"]/div",
        "firstOnly": false,
        "matches": [{
            "name": "DIV",
            "text": "9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021View AppointmentView Documentation , 9:00 AM - 9:50 AMJJones, Indiana (DOB: 01/01/1901), 9:00 AM - 9:50 AMJones, Indiana (DOB: 01/01/1901) (Outpatient Progress Note) ",
            "data-classification": "Scheduling",
            "class": "CalendarGridEvent_Scheduled_Divided",
            "id": "eHanaFrame_LCtl_ctl243"
        }]
    };

    return dictSample.matches[0].text
        .split("View Appoint")[0];
})();
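In a real macro the JSON would usually arrive as a string (for example from a Keyboard Maestro variable) rather than as an object literal. A minimal sketch of that case follows, assuming the string is complete, well-formed JSON. The helper name firstAppointment and the shortened sample are made up for illustration and are not from the thread.

```javascript
"use strict";

// Hypothetical helper (not from the thread): parse the JSON text and return
// the part of the first match's "text" field that precedes "View Appoint".
function firstAppointment(jsonText) {
    const record = JSON.parse(jsonText); // throws SyntaxError on malformed JSON
    const matches = record.matches || [];
    if (matches.length === 0) {
        return ""; // the XPath matched nothing
    }
    return matches[0].text.split("View Appoint")[0];
}

// Shortened sample in the same shape as the data above.
const sampleJson = JSON.stringify({
    firstOnly: false,
    matches: [{
        name: "DIV",
        text: "9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021" +
            "View AppointmentView Documentation , 9:00 AM - 9:50 AM"
    }]
});

console.log(firstAppointment(sampleJson));
// prints: 9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021
```

Returning an empty string when there are no matches is just one possible choice; a macro might prefer to fail loudly instead.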

A thread title more likely to help others find a solution later might be closer to:

Extracting sub-strings from records matched by XPath

ccstone · September 26, 2021, 11:04pm

Hey Jordan,

That happens now and then...

First let me put my moderator hat on for a minute.

Whenever possible please post an actual testable example macro demonstrating your issue in as few steps as possible. You will always get better help when you make it easy for others to test your work. If they're not testing they're guessing, and guessing very often wastes everyone's valuable time.
You did well in posting both your data and your regular expression.
- However – please use the Preformatted-Text Button in the forum editor when posting code or search text samples - otherwise the Discourse forum software may render it in unintended fashion and cause confusion.

Okay, let's look at your problem.

I agree with @ComplexPoint that it's generally a good idea to use the best tool for the job and that extracting the correct JSON field before pruning it down with text processing methods is a solid idea.

But – I don't have much experience doing that so let's stick to your regex search for the time being:

I didn't find anything wrong with it. I pasted your text and your regex directly into a search action as you show in your screenshot above, and I added one capture group.

It works like a charm:

Extract Client Data from JSON String (RegEx-Text-Input) v1.00.kmmacros (6.7 KB)

Here's virtually the same macro, except that it uses a variable instead of text as the search input:

Variable-Based-Macro

Extract Client Data from JSON String (RegEx) v1.00.kmmacros (6.9 KB)

I have to conjecture that there are one or more anomalies in your local text or regex that aren't demonstrable from the details you provided – one more reason for needing to test your failing macro.
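The capture-group approach mentioned above can be sketched roughly as follows. This is a JavaScript illustration, not the contents of the attached macros: the exact pattern used there isn't shown in the thread, and Keyboard Maestro's search actions use ICU regex rather than JavaScript's engine, so treat the pattern and the shortened sample as assumptions.

```javascript
"use strict";

// A fragment of the raw (unparsed) text, shortened from the original post.
const raw = '  "name": "DIV",\n' +
    '  "text": "9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021' +
    'View AppointmentView Documentation , 9:00 AM - 9:50 AM",\n' +
    '  "data-classification": "Scheduling",';

// Group 1 captures what lies between the opening of the "text" value and
// one of the two markers. The greedy .* runs to the last marker on the line;
// this sample contains only one.
const pattern =
    /"text": "(.*)(?:View AppointmentEnter|View AppointmentView Documentation)/;

const found = raw.match(pattern);
const captured = found ? found[1] : "";

console.log(captured);
// prints: 9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021
```

A capture group sidesteps the look-behind entirely, which can be handy in engines where look-behind support is limited.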

Just for giggles – here's a quick and dirty example of a JSONValueToken.

Note – this macro only extracts the “text” field and does no other text processing.

A Basic JSONValueToken Example

Extract Client Data from JSON String (JSONValue Token) v1.00.kmmacros (6.4 KB)

A couple of final tips:

Look in the Keyboard Maestro Editor's Help menu for ICU Regular Expression Reference – you'll find that ICU regex has some subtle differences from PCRE and other common flavors of regex.

Note also that multiline (?m) is NOT on by default in KM's regex search. This wasn't a problem for you this time, since you were using look-behind and look-ahead assertions and not anchoring to the beginning or end of a line.
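To show what the multiline setting changes, here is a small sketch using JavaScript's m flag as a stand-in for ICU's (?m). The two engines are not the same, so this only illustrates the general behaviour of ^ and $ and says nothing specific about Keyboard Maestro's actions.

```javascript
"use strict";

const twoLines = "first line\nsecond line";

// Without the m flag, ^ and $ anchor only to the start and end of the
// whole string, so a pattern for the second line alone does not match.
const withoutM = /^second line$/.test(twoLines);

// With the m flag, ^ and $ also match just after and just before a line break.
const withM = /^second line$/m.test(twoLines);

console.log(withoutM, withM);
// prints: false true
```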

-Chris

drdrang · September 28, 2021, 1:06pm

I agree with @ComplexPoint about using a parser instead of a regex—or maybe in combination with a regex—when you have structured text like JSON or XML. The parser will simplify your work and will handle edge cases better. This assumes that your structured text is complete and well formed (what’s in the original post isn’t, although I assume that was because it was a quick copy and paste from something that was complete and well formed) and that you have a parser handy.

JavaScript is a natural choice for a JSON parser, but there are lots of them around. If you use Homebrew, it’s easy to install jq, which has some really nice features for extracting and manipulating JSON data. It also integrates well with Keyboard Maestro because it communicates via standard input and output.

Here’s a quick example:

You can download the macro itself from here: Appointment.kmmacros (2.7 KB)

The example text is

{
"xpath": "//*[@id=\"eHanaFrame_LCtl_dayCell_20210922\"]/div",
"firstOnly": false,
"matches": [
{
  "name": "DIV",
  "text": "9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021View AppointmentView Documentation , 9:00 AM - 9:50 AMJJones, Indiana (DOB: 01/01/1901), 9:00 AM - 9:50 AMJones, Indiana (DOB: 01/01/1901) (Outpatient Progress Note) ",
  "data-classification": "Scheduling",
  "class": "CalendarGridEvent_Scheduled_Divided",
  "id": "eHanaFrame_LCtl_ctl243"
},
{
  "name": "MOD",
  "text": "10 AM-10:50 AM Jones, Chipper (DOB: 01/01/1970)9/22/2021View AppointmentEnter , 9:00 AM - 9:50 AMJJones, Chipper (DOB: 01/01/1970), 10:00 AM - 10:50 AMJones, Chipper (DOB: 01/01/1970) (Outpatient Progress Note) ",
  "data-classification": "Scheduling",
  "class": "CalendarGridEvent_Scheduled_Divided",
  "id": "eHanaFrame_LCtl_ctl244"
}
]
}

where I’ve added another entry and the necessary brackets and braces to fill out a complete JSON structure. This is fed as standard input to the shell command

/opt/homebrew/bin/jq -r '.matches[].text | sub("View Appointment.*$"; "")'

which extracts the text from each of the matches, gets rid of everything from “View Appointment” until the end, and returns the results, one per line,

9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021
10 AM-10:50 AM Jones, Chipper (DOB: 01/01/1970)9/22/2021

which you can then process however you like in Keyboard Maestro.

(Be careful about the path to jq; Homebrew may install it in a different place on your computer.)

As in @ComplexPoint’s solution, the use of a parser to pull out the text makes the subsequent manipulation much simpler. His split("View Appoint") doesn’t use regular expressions at all; my sub("View Appointment.*$"; "") (a substitution command in jq) is barely a regex.
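For comparison, the same per-match clean-up can be sketched in JavaScript. This is an illustration only, assuming complete, well-formed JSON; the helper name appointmentLines and the shortened sample texts are invented for the example.

```javascript
"use strict";

// Hypothetical helper (not from the thread): return one trimmed string per
// match, dropping everything from "View Appointment" to the end of the text.
function appointmentLines(jsonText) {
    return JSON.parse(jsonText).matches
        .map(match => match.text.replace(/View Appointment.*$/, ""));
}

// Two shortened entries in the same shape as the example text above.
const twoMatchesJson = JSON.stringify({
    matches: [
        { text: "9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021" +
                "View AppointmentView Documentation , 9:00 AM - 9:50 AM" },
        { text: "10 AM-10:50 AM Jones, Chipper (DOB: 01/01/1970)9/22/2021" +
                "View AppointmentEnter , 10:00 AM - 10:50 AM" }
    ]
});

console.log(appointmentLines(twoMatchesJson).join("\n"));
// prints:
// 9 AM-9:50 AM Jones, Indiana (DOB: 01/01/1901)9/22/2021
// 10 AM-10:50 AM Jones, Chipper (DOB: 01/01/1970)9/22/2021
```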

The cost of this simpler manipulation is that you have to break out of Keyboard Maestro itself and use some outside tool. I think it's worth the cost, but everyone values things differently.

ComplexPoint · September 28, 2021, 1:48pm

I agree – jq is a very useful and absorbing instrument.

For learning and experimentation, I particularly like the Visual Studio Code jq playground – an extension which you can find and install in Visual Studio Code by searching for jq under Extensions.

Within that playground, a good starting place is View > Command Palette ... > JQPG Examples, which lets you experiment with every example in the jq manual.

(For more reading practice and experimentation, there's quite a rich set of examples in Category:Jq - Rosetta Code.)

ComplexPoint · September 28, 2021, 2:19pm

FWIW the simplest useful jq filter is .

It preserves the structure of the input – just pretty-printing the output.

So we could pretty-print the JSON above, if Homebrew has placed our jq installation in, for example, /usr/local/bin/jq, with:

/usr/local/bin/jq .

ccstone · September 29, 2021, 1:21am

jq is also available from MacPorts: Install jq on macOS with MacPorts

Thanks to @drdrang for bringing it up.

Note – it's generally a better idea to set an ENV_PATH variable in Keyboard Maestro than to mess around with full paths to shell commands.

A change to your system can cost you significant time fixing hard-coded paths, whereas it only requires one change to the ENV_PATH variable.

-Chris

drdrang · September 29, 2021, 4:09am

Yeah, that’s how I do it, but I was afraid of adding another explanation to what was already a long answer. I suppose I should just bookmark the wiki entry for ENV_PATH and link to it whenever I mention a shell script step.

ccstone · September 29, 2021, 4:39am

I've got it in Typinator myself: Path in Shell Scripts

kmw.sh.path ==

[Path in Shell Scripts](https://wiki.keyboardmaestro.com/action/Execute_a_Shell_Script?redirect=1#Path_in_Shell_Scripts)
