Extracting named fields from a series of Key:Value lines?

DMA · June 4, 2024, 1:19am

I've been trying to extract various bits and pieces from a multi-line text in the system clipboard and assemble them into a smaller bit of info, as I have hundreds and hundreds to do. Here's the example:

Restrictions:	
Contact your local office for all commercial or promotional uses.
Credit:	Shirlaine Forrest / Contributor
Editorial #:	1159057436
Collection:	WireImage
Date created:	29 June, 2019
Upload date:	29 June, 2019
Licence type:	Rights-managed
Release info:	Not released. More information
Source:	WireImage
Object name:	d4s_3193_2019062944636318.jpg
Max file size:	2417 x 3776 px (20.46 x 31.97 cm) - 300 dpi - 6 MB

I need to grab the bit that follows "Credit: " and ends at the next slash "/" on the same line, plus the "Collection: " entry, and then assemble them like this:

Shirlaine Forrest / WireImage

I've exhausted all I know about positive lookbehind, (?<=Credit: ), and capturing with something like ([^/]+) (not a forward slash), but as I don't even know if what I want to do is possible, I thought I'd call it after about 90 minutes of floundering and ask for help! Honestly, I'm not trying to get someone to write my grep for me because I can't be bothered; I'm genuinely asking for some assistance following loads of reading and trial and, mainly, error!

DMA · June 4, 2024, 1:31am

[Screenshot: CleanShot 2024-06-04 at 02.30.25]
Think I might have done it. However, I'd be very interested to see if there's a better way to do it.

griffman · June 4, 2024, 1:35am

I'm positive that there's a very slick way to do it with JavaScript's string handling, but hopefully @ComplexPoint will weigh in with that solution; it's beyond my skill level.

I did it pretty much the way you did it, using two passes. I often do this when a single regex for finding data across multiple lines in a variable would get complicated; Keyboard Maestro is so fast at searches that there's not a big performance hit for doing it twice.
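For anyone reading along without the screenshot, here is a rough sketch of the two-pass idea in plain JavaScript rather than as Keyboard Maestro search actions. The names record, creditMatch, collectionMatch and summary are made up for illustration, and the sample is a shortened copy of the text above, assuming a tab follows each colon.

```javascript
// Shortened sample record; a tab is assumed after each "Key:".
const record = [
    "Restrictions:\t",
    "Contact your local office for all commercial or promotional uses.",
    "Credit:\tShirlaine Forrest / Contributor",
    "Editorial #:\t1159057436",
    "Collection:\tWireImage",
    "Date created:\t29 June, 2019"
].join("\n");

// Pass 1: the text after "Credit:" up to (not including) the first slash.
const creditMatch = record.match(/^Credit:\s*([^\/\n]+?)\s*\//m);

// Pass 2: the rest of the "Collection:" line.
const collectionMatch = record.match(/^Collection:\s*(.+)$/m);

// Assemble only if both passes found something.
const summary = creditMatch && collectionMatch
    ? `${creditMatch[1]} / ${collectionMatch[1].trim()}`
    : "";

console.log(summary);
```

Each pattern only has to deal with one line, which is what keeps the two-pass version easy to read.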

-rob.

DMA · June 4, 2024, 2:05am

Thanks Rob, I really struggled with this, and then just after I posted, had the minor brain wave that enabled me to hack something together. God I love Keyboard Maestro. There are probably loads of ways of doing this, something for every skill level.

Airy · June 4, 2024, 2:06am

Your method is very clear and simple. Don't worry that it used two statements.

ComplexPoint · June 4, 2024, 7:50am

My sympathies. After 30 minutes, let alone 90, perhaps one does have to ask how much grep and regular expressions are really helping to solve this problem?

Another route is to use Keyboard Maestro's %JSONValue% token and write:

%JSONValue%local_Dict.Credit[1]% / %JSONValue%local_Dict.Collection[1]%

which yields:

Shirlaine Forrest / WireImage

Taking particular fields from key-value lines.kmmacros (5.4 KB)

after we have automated reduction of the record to the following JSON Dictionary:

{
  "Restrictions": [
    "Contact your local office for all commercial or promotional uses."
  ],
  "Credit": [
    "Shirlaine Forrest",
    "Contributor"
  ],
  "Editorial #": [
    "1159057436"
  ],
  "Collection": [
    "WireImage"
  ],
  "Date created": [
    "29 June, 2019"
  ],
  "Upload date": [
    "29 June, 2019"
  ],
  "Licence type": [
    "Rights-managed"
  ],
  "Release info": [
    "Not released. More information"
  ],
  "Source": [
    "WireImage"
  ],
  "Object name": [
    "d4s_3193_2019062944636318.jpg"
  ],
  "Max file size": [
    "2417 x 3776 px (20.46 x 31.97 cm) - 300 dpi - 6 MB"
  ]
}

by writing this in a Keyboard Maestro Execute JavaScript for Automation action:

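// local_Source holds the clipboard text: one "Key:<tab>Value" pair per line.
return JSON.stringify(
    kmvar.local_Source
    // "Restrictions:" has its value on the following line; join the two.
    .replace("\t\n", "\t")
    .split("\n")
    .reduce(
        (dict, line) => {
            const [key, value] = line.split("\t");

            // Drop the colon that ends the key, and split the value
            // on " / " into a list of parts.
            return (
                dict[key.slice(0, -1)] = value.trim().split(" / "),
                dict
            );
        },
        {}
    ),
    null, 2
);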
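
To try the same reduction outside Keyboard Maestro, something like the following sketch should run under Node. The source constant stands in for kmvar.local_Source, and fieldsFromLines is a made-up name, not anything supplied by Keyboard Maestro. It also shows the plain JavaScript counterpart of the two %JSONValue% lookups: JavaScript arrays count from 0, whereas the token above uses [1] for the first item.

```javascript
// Stand-in for kmvar.local_Source (shortened sample, tab after each colon).
const source = [
    "Restrictions:\t",
    "Contact your local office for all commercial or promotional uses.",
    "Credit:\tShirlaine Forrest / Contributor",
    "Collection:\tWireImage",
    "Source:\tWireImage"
].join("\n");

// Hypothetical helper: key-value lines -> dictionary of string lists.
const fieldsFromLines = text =>
    text
    .replace("\t\n", "\t")
    .split("\n")
    .reduce(
        (dict, line) => {
            const [key, value] = line.split("\t");

            return Object.assign(dict, {
                [key.slice(0, -1)]: value.trim().split(" / ")
            });
        },
        {}
    );

const dict = fieldsFromLines(source);

// First item of each list: index 0 here, [1] in the %JSONValue% token.
const label = `${dict.Credit[0]} / ${dict.Collection[0]}`;

console.log(label);
```

Like the action above, this assumes every line contains a tab; a blank line in the input would need filtering out first.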

Generally, losing time grappling with Regular Expressions is a useful sign that we really have wandered too deep into the territory described by Jamie Zawinski in 1997:

Some people, when confronted with a problem, think
"I know, I'll use regular expressions."
Now they have two problems.

Each of those 90-minute (or even half-hour) sessions would, I think, be much more profitably spent learning the basics of any common scripting language that provides splits, dictionaries, maps, folds and filters. Python, JavaScript, whatever.

DMA · June 4, 2024, 10:50am

Thanks so much for taking the time to expand on this. I'll spend some time taking it all in as I'm sure I could use it on many future things.

With reference to whether it's worth doing: some day I'll be more efficient, but it doesn't magically happen, right? Once I'd figured out a kludgey way to do it, the whole process of working on these became a lot less stressful, and while I wasn't finding an answer quickly, I was learning and refreshing my memory of things, and enjoying it to a degree!

griffman · June 4, 2024, 12:40pm

Every time I've gone looking for a learning resource for such things, I get frustrated because they all seem targeted at users who already know how to code. Or if I specifically search for something for a pure beginner to learn, it starts off at the "this is a keyboard, this is a mouse, here's how you type" level, and takes forever to move to even a basic "hello, world" program.

Do you have any pointers for resources that fall somewhere between those two extremes? (Please don't go actively search on my behalf now, just curious if you're already aware of any.)

-rob.

DMA · June 4, 2024, 12:45pm

Yeah, I did a load of PHP back in the day, but have little need for it now. It seems that it would have been more useful in the long term to have learned JavaScript. However, PHP at least taught me a lot about the mindset and problem solving of this sort of thing. Now I think I'm somewhere close to where you are, Rob!

ComplexPoint · June 4, 2024, 1:19pm

The key thing is not to memorise details, but to find your way towards real clarity about a few fundamental concepts.

@unlocked2412 and I have thought about writing a book about a small and solid subset of functional JavaScript that works well for light scripting and automation – but we're both busy at the moment, and I'm already old enough to have a shortening shelf-life

Haven't yet looked at it, but over the summer I may dip into:

Structure and Interpretation of Computer Programs: JavaScript Edition (MIT Electrical Engineering and Computer Science)

which should, I think, be interesting, and get to the fundamentals – it's a reworking of a classic – but I haven't checked its pacing and assumptions, or even its JS style.

(JS has accumulated messily and become a bit of a lumber-room. Ever since Crockford's JavaScript: The Good Parts, it's been well understood that you do well to ignore a lot of it, and work with a subset which suits your context.)

Briefly I would:

- First make sure that you understand everything in JSON well
- Explore the built-in methods of the Array type (lists), especially:
  - .filter
  - .map
  - .reduce
  - .flatMap
- Find ways of solving particular problems that arise in your own work.

If I had to major on one of the above, I would suggest becoming a very well-practiced user of .reduce.

Armed with .reduce alone you can do anything (even define your own filter, map, and flatMap), and you will never need to mess with loops.
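
As a small illustration of that claim, here is one possible way to write map, filter and flatMap using nothing but .reduce. These curried definitions are only a sketch for study (the built-in Array methods are what you would normally use), and the sample values are made up.

```javascript
// map: apply f to every item, collecting the results.
const map = f => xs =>
    xs.reduce((acc, x) => [...acc, f(x)], []);

// filter: keep only the items for which p returns true.
const filter = p => xs =>
    xs.reduce((acc, x) => p(x) ? [...acc, x] : acc, []);

// flatMap: f returns a list for each item; the lists are concatenated.
const flatMap = f => xs =>
    xs.reduce((acc, x) => [...acc, ...f(x)], []);

const doubled = map(x => 2 * x)([1, 2, 3]);
const evens = filter(x => x % 2 === 0)([1, 2, 3, 4]);
const parts = flatMap(s => s.split(" / "))([
    "Shirlaine Forrest / Contributor",
    "WireImage"
]);

console.log(doubled, evens, parts);
```

In each case the second argument to .reduce is the empty list that the result is built up from, and the function argument says how one more item changes it.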

ComplexPoint · June 4, 2024, 1:32pm

There is also an interactive online edition of SICP JS. Looking quickly at the TOC, I think it does take the right approach.

The key thing is to stick with expressions, functions, constants.

(Forget the ghastly plodding mess of statements, loops and mutable variables – that way lies lost time, and the slough of despond.)

kevinb · June 4, 2024, 1:59pm

The key to solving this one with regular expressions is to use [\s\S]* instead of the habitual .*

[\s\S]* will match any number of characters, whitespace or non-whitespace, and \s includes the newline character. Unlike .*, which stops at the end of a line by default, it can therefore match across multiple lines.

Search for:

[\s\S]*Credit:\s(.*\s\/\s)[\s\S]*Collection:\s(.*)[\s\S]*

Replace with:

$1$2

The result using your sample text:

Shirlaine Forrest / WireImage
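
The same search and replace can be tried in JavaScript, as in the sketch below. It assumes the pattern behaves the same way in JavaScript's regex engine as in Keyboard Maestro's, which appears to hold for this sample; the names clip, pattern and result are made up, and the sample is a shortened copy of the original text with a tab after each colon.

```javascript
// Shortened sample; a tab is assumed after each "Key:".
const clip = [
    "Restrictions:\t",
    "Contact your local office for all commercial or promotional uses.",
    "Credit:\tShirlaine Forrest / Contributor",
    "Editorial #:\t1159057436",
    "Collection:\tWireImage",
    "Date created:\t29 June, 2019",
    "Upload date:\t29 June, 2019"
].join("\n");

// [\s\S]* swallows everything, line breaks included, before, between
// and after the two captured pieces; .* stays within a single line.
const pattern = /[\s\S]*Credit:\s(.*\s\/\s)[\s\S]*Collection:\s(.*)[\s\S]*/;

// $1 is "name / " from the Credit line, $2 is the Collection value.
const result = clip.replace(pattern, "$1$2");

console.log(result);
```

Because the pattern matches the whole text from start to finish, the replacement leaves only the two captured groups behind.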

Edited to remove some unnecessary text in search term.

DMA · June 4, 2024, 2:14pm

Kevin, thank you so much for taking the time. This was exactly what I was trying to do. Thank you for [\s\S]*; I will log it away in my notes and noggin!

DMA · June 4, 2024, 2:18pm

Seems I can only mark one post as the solution. I've marked Kevin's as such as it was more directly what I was attempting to do, but thank you Complex for pointing out the JSON method which looks amazing.
