Need help with a regex to extract first and last terms in a text string for further processing

Introduction: I hope someone is interested in solving this little puzzle. As far as I can tell, there is no app that will do what I need, but if there is, I'm extremely interested and will gladly pay for it. I'm not especially creative, so the best I can think of is the method below, which if nothing else at least automates as much as possible. There likely are much better ways to achieve what I need, and I welcome any ideas.

Problem summary: I need to find an automated way to verify that a document's in-text citations (author's last name and year of publication) do in fact match up to the list of references (usually in the same Microsoft Word or PDF document, but sometimes in a separate file). I waste untold hours doing this mind-numbingly repetitive chore. Isn't this exactly the kind of thing that computers are for?

The long-winded version: I need to automate the workflow below. A more concrete example follows the imagined steps.

  1. copy a string of text selected in a document, such as "Smith 1967" or "Howell, Taylor, and Riley 1992b" or "Smith has shown repeatedly that he has endless trouble with regular expressions (2020c)"
  2. extract the first word of the string (will always be a proper name) and the last word of the string (will always be a four-digit year, may or may not include a lowercase letter at the end, may or may not be surrounded with parentheses -- I don't want the parentheses or any other punctuation, just the four-digit year, and the lowercase letter if present)
  3. create variables from the extracted proper name and year (or year+letter)
  4. create a new regex populated with the variables
  5. copy the regex to the clipboard
  6. switch to another app and run a search using the populated regex

I'm happy to say that I can handle (1). (2) is where I get into trouble. Here is a more specific example of exactly what I want to happen:

  1. copy the selected text "Smith has shown repeatedly that he has endless trouble with regular expressions (2020c)" to clipboard
  2. extract "Smith" and "2020c" (presumably with regex, unless there's a more elegant method)
  3. save "Smith" and "2020c" as variables -- let's say Name and Year
  4. populate a regex that when used in a regex search will find the two variables regardless of the text between them, perhaps something like (?<=Name).*?(?=Year)
  5. place the populated regex on the clipboard
  6. switch to the second app containing the list of references (probably FoxTrot Pro, if I can get this to work, since it allows for regex searches in PDFs), paste our example of "(?<=Smith).*?(?=2020c)" or whatever regex into the search box, and run the search, hopefully finding the reference entry for Smith 2020c.
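
For illustration, steps 2 through 4 can be sketched in Python. This is only a sketch of the logic, not a Keyboard Maestro macro. The function names are made up for the example, and the generated pattern takes the plain form Name.*?Year (matching the whole span from name to year) instead of the lookaround version suggested above. Whether FoxTrot Pro's regex search accepts that pattern is an assumption.

```python
import re

# Hypothetical pattern for this sketch: first word at the start of the string,
# then a four-digit year with an optional lowercase letter at the very end,
# ignoring any surrounding punctuation such as parentheses.
CITATION = re.compile(r"^\W*(\w[\w'-]*).*?(\d{4}[a-z]?)\W*$", re.DOTALL)


def extract_name_year(text):
    """Return (name, year) from a citation-like string (step 2 and 3)."""
    match = CITATION.match(text.strip())
    if match is None:
        raise ValueError("no leading name and trailing year found")
    return match.group(1), match.group(2)


def build_search_regex(name, year):
    """Return a pattern that finds the name followed later by the year (step 4)."""
    # re.escape guards against regex metacharacters in the name, e.g. a hyphen.
    return re.escape(name) + r".*?" + re.escape(year)


name, year = extract_name_year(
    "Smith has shown repeatedly that he has endless trouble "
    "with regular expressions (2020c)"
)
print(build_search_regex(name, year))  # prints Smith.*?2020c
```

Steps 5 and 6 (clipboard and app switching) are left out, since they depend on the automation tool.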

Thanks so much for any thoughts on this, including better methods,
Kerry

Hello! I've created an account just to answer this (I'm a longtime lurker!)

Here's a macro with the gist of what you need. I believe the regex it uses and the one it generates are exactly what you're looking for (it can handle cases where the year is in parentheses or has a lowercase letter at the end).

If you're serious about the payment part I would gladly accept it if this macro helps you!

Search for references.kmmacros (5.1 KB)

Hey @kerjsmit,

Well.

I was going to do this with AppleScript since FoxTrot Professional has AppleScript support.

Then I found out how bad their AppleScript support is...

FoxTrot has a lot of good features, but there are a bunch of things about it that are just plain clunky.

Too bad.

I did ask on their support board about how to craft a regular-expression-based AppleScript search query, but I'll be very shocked if I get a useful answer. (Based on the app's AppleScript dictionary.)

-Chris

I'm not at all happy with this, because I couldn't do a proper regular expression query with AppleScript in the FoxTrot Professional Search app.

But I'll post the effort anyway.

Rather than using Keyboard Maestro to parse the clipboard I'm using AppleScript's own word-parser, since the requirement is very simple.
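
To show the idea of word-based parsing without a regex, here is a rough Python analogue. It is not the AppleScript from the macro, and the function name is made up for the example. It assumes the first word is the name and the last word is the year, and simply trims punctuation from each word.

```python
import string


def first_and_last_word(text):
    """Split on whitespace, trim punctuation, return the first and last words."""
    # Strip leading/trailing punctuation such as commas and parentheses.
    words = [word.strip(string.punctuation) for word in text.split()]
    # Drop tokens that were punctuation only.
    words = [word for word in words if word]
    if not words:
        raise ValueError("no words found in the text")
    return words[0], words[-1]


print(first_and_last_word("Howell, Taylor, and Riley 1992b"))
# prints ('Howell', '1992b')
```

Unlike a regex, this does not check that the last word actually looks like a year, which is one of the places a more bombproof version would add validation.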

If this was my macro to use day in and day out I'd make it more bombproof, but it fulfills the stated requirements.

-Chris


Search FoxTrot Pro for a Citation v1.00.kmmacros (5.8 KB)

Thanks so much! I'll check it out ASAP and gladly pay you for it if it works for me!

Yeah, FoxTrot Pro functions very well, and it has that great "neighboring words" feature, but it does have some issues that drive me nuts.

Thank you! I'll give it a try ASAP!