Search & Replace Clipboard is not removing some text

I am trying to remove some text from the clipboard.
I use the Search and Replace command.

test_macro.kmmacros (5.0 KB)

Here is an example:

Amiot, D. (2004). Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré (pp. 91–104). Rennes: Presses Universitaires de Rennes.

I want to remove the Author and Year from the clipboard using these two Actions:

I know the RegEx matches correctly, but for some absurd reason the clipboard still contains the full text.

I'm not sure why this does not work for you; it works fine for me.

In any case, it is easy enough to extract all 3 fields in one KM Action, and you don't need to remove styles.


The RegEx is:
^(.+) \((\d+)\)\. (.+)

For details, see https://regex101.com/r/q9jgJ2/1
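To see what the three capture groups hold, here is a minimal Python sketch that applies the same pattern with the standard `re` module. This is an illustration outside Keyboard Maestro, assuming the pattern behaves the same in Python's regex engine (it uses no engine-specific features); the variable names are just for illustration.

```python
import re

# Same pattern as above: group 1 = Author, group 2 = Year, group 3 = everything after "(Year). "
PATTERN = re.compile(r"^(.+) \((\d+)\)\. (.+)")

reference = (
    "Amiot, D. (2004). Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), "
    "Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré "
    "(pp. 91–104). Rennes: Presses Universitaires de Rennes."
)

match = PATTERN.match(reference)
if match is None:
    raise ValueError("reference does not follow the Author (Year). Rest layout")

author, year, rest = match.groups()
print(author)  # Amiot, D.
print(year)    # 2004
print(rest)    # Haut degré et préfixation. In F. Lefeuvre ...
```

Because group 1 is greedy, it backtracks to the last " (digits). " in the line, which here is the only one.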

If this does not work for you, please post what versions of KM and macOS you are running.

Questions?


The reason I didn't combine them is that, in other styles (Chicago, APA, or Macmillan), the Year can appear anywhere in the reference.

I am trying to extract reference data (mostly incollections) from PDF files.

Other styles often put the year later in the reference, like this:

Amiot, D. Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré (pp. 91–104), (2004). Rennes: Presses Universitaires de Rennes.

A third style is to put the editors after the Booktitle:

Amiot, D. Haut degré et préfixation. In Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré (pp. 91–104), (2004), F. Lefeuvre & M. Noailly (Eds.),. Rennes: Presses Universitaires de Rennes.

Since there are many patterns, I was trying to design a macro that could handle all of them.

  • Finding the Year and Pages is very easy
  • Getting the author is also easy because it always comes first
  • The Title always comes after the Author (or after the Author and Year), and it always precedes ". In". I have struggled to extract the Title.
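Given the rules in this list, one heuristic for the Title is: take everything before the first ". In ", drop a parenthesised year if present, and keep the last ". "-separated segment. This is a minimal Python sketch of that idea; the helper `extract_title` is hypothetical, and it assumes the author part ends with a period (as in "Amiot, D." or "Amiot, David X.") and that the Title itself contains no ". ", which won't hold for every reference.

```python
import re
from typing import Optional


def extract_title(reference: str) -> Optional[str]:
    """Hypothetical helper: return the text that precedes the first ". In "."""
    head, sep, _ = reference.partition(". In ")
    if not sep:
        return None  # no ". In " marker, so this heuristic does not apply
    # Remove a parenthesised year such as " (2004)." when the style puts it after the author.
    head = re.sub(r"\s*\([12]\d{3}\)\.?", "", head)
    # Assumption: the author part ends with ". " and the Title has no ". " inside it,
    # so the Title is the last ". "-separated segment of what remains.
    return head.split(". ")[-1].strip()


# Shortened versions of the three sample styles from this thread.
samples = [
    "Amiot, D. (2004). Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), Travaux",
    "Amiot, D. Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), Travaux (pp. 91–104), (2004).",
    "Amiot, D. Haut degré et préfixation. In Travaux (pp. 91–104), (2004), F. Lefeuvre & M. Noailly (Eds.),.",
]

for s in samples:
    print(extract_title(s))  # Haut degré et préfixation
```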

So, to simplify the process, my strategy is: find each item in the clipboard --> put it in a variable --> delete it from the clipboard.
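The find --> store --> delete idea can be sketched outside Keyboard Maestro. This is a minimal Python sketch, assuming a plain string stands in for the clipboard and Python variables stand in for KM variables; the helper `take` and both patterns are hypothetical, not taken from either macro in this thread.

```python
import re
from typing import Optional, Tuple


def take(pattern: str, text: str) -> Tuple[Optional[str], str]:
    """Hypothetical helper: find `pattern`, return (captured value, text with the match deleted)."""
    m = re.search(pattern, text)
    if m is None:
        return None, text  # nothing found: leave the "clipboard" unchanged
    value = m.group(1) if m.re.groups else m.group(0)
    return value, text[:m.start()] + text[m.end():]


clipboard = (
    "Amiot, D. Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), "
    "Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré "
    "(pp. 91–104), (2004). Rennes: Presses Universitaires de Rennes."
)

# Each call deletes its match, so later searches run on the shortened text.
pages, clipboard = take(r" \(pp\. (\d+[–-]\d+)\)", clipboard)
year, clipboard = take(r",? \(([12]\d{3})\)", clipboard)

print(pages)      # 91–104
print(year)       # 2004
print(clipboard)  # ends with: ...Comparaison, Degré. Rennes: Presses Universitaires de Rennes.
```

Note that each delete pattern removes only the item plus its own leading separator, so the remaining punctuation (periods, parentheses) stays intact for later searches.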

Nope, not so here.

Using your exact sample text from above, this is what my clipboard contains after running the macro:

Haut degré et préfixation In F Lefeuvre & M Noailly Eds, Travaux linguistiques du Cerlico: Vol 17 Intensité, Comparaison, Degré pp 91–104 Rennes: Presses Universitaires de Rennes

I don’t know if it’s the desired result, but in any case it’s not the same as before.


Some observations on your regexes:

^(\w+.*?)[\(|"]
  • In case the | is meant as an alternation operator: You don’t need that inside a character class. (A character class matches any one of the characters it contains, so inside it | is just a literal character.)
  • You don’t need to escape ( inside a character class.

So, if you’re just looking for ( or ", then this should do: [("]

Is it intentional that you are looking for straight (aka typewriter) quotation marks? (")

Judging by your sample, the sources are scientific texts, meant for print. In those texts it is more likely that you’ll find typographic (aka curly) quotation marks. So, I would look for those too: [("“]
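A quick way to check the character-class advice is to run both classes against a couple of made-up strings ("Smith" and "Jones" are just test data). This minimal Python sketch uses the `re` module, whose character-class rules work as described above.

```python
import re

# Original class: inside [...], "|" is a literal character, so it also ends the lazy match.
original = re.compile(r'^(\w+.*?)[\(|"]')
# Suggested class: "(" needs no escape inside [...], and a curly opening quote is included.
suggested = re.compile(r'^(\w+.*?)[("“]')

piped = "Smith | Jones (2004)"
curly = "Smith, J. “A title”"

print(repr(original.match(piped).group(1)))   # 'Smith '  (stops at the literal "|")
print(repr(suggested.match(piped).group(1)))  # 'Smith | Jones '  (stops at "(")
print(original.match(curly))                  # None  (no "(", "|" or straight quote)
print(repr(suggested.match(curly).group(1)))  # 'Smith, J. '  (stops at the curly quote)
```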


\d{4}

This might also match any 4-digit page number.

Add an assertion, for example for a leading opening parenthesis: (?<=\()\d{4}
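To illustrate the difference, here is a short Python check on a made-up reference with 4-digit page numbers (the sample text is hypothetical, not from this thread).

```python
import re

# Hypothetical reference whose page numbers happen to have 4 digits.
text = "Amiot, D. Haut degré. In Vol. 17 (pp. 1040–1062), (2004). Rennes."

plain = re.findall(r"\d{4}", text)             # any 4 digits, page numbers included
anchored = re.findall(r"(?<=\()\d{4}", text)   # only 4 digits right after "("

print(plain)     # ['1040', '1062', '2004']
print(anchored)  # ['2004']
```

The lookbehind only checks the single character before the digits, so a page range written without "pp.", such as "(1040–1062)", would still match.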


OK, now that we have a more complete set of example references (source data), and the details of your requirements, we can offer a better solution.

Given these 3 examples:
(which I have modified slightly to test for better matching)

Amiot, D. (2004). Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré (pp. 91–104). Rennes: Presses Universitaires de Rennes.

Amiot, D. X. Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré (pp. 91–104), (2005). Rennes: Presses Universitaires de Rennes.

Amiot, David X. Haut degré et préfixation. In Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré (pp. 91–104), (2005), F. Lefeuvre & M. Noailly (Eds.),. Rennes: Presses Universitaires de Rennes.

I have written a macro that produces this result (as an example):

RegEx to Extract Author, Year, & Title

Note the following, and confirm that it meets your requirements and handles the variations in reference formats that you need to deal with:

  1. Author
    • This handles a variety of author-name formats, whether the name is followed by the Year or by other text.
  2. Year
    • RegEx:
      \(([12]\d{3})\)
    • For details see https://regex101.com/r/rqU2tD/1/
    • Matches the Year in the following pattern:
      • 4 digits, starting with 1 or 2, in parentheses.
    • Note that a value such as "(3004)" does NOT match.
  3. Title
    • Matches the pattern:
      <Prefix><Title><Suffix>
      where:
      • <Prefix> is ". In "
      • <Title> is the Title to be extracted, which must end with a page reference like "(pp. 91-104)"
      • <Suffix> is a space, comma, or period.
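To try the Year regex and the <Prefix><Title><Suffix> description on the three sample references, here is a small Python sketch. The Year pattern is the one given above; the Title pattern is my own hypothetical reconstruction from the description, not the regex inside the macro, and `[–-]` is added so the page range can use either an en dash (as in the samples) or a hyphen.

```python
import re
from typing import Optional, Tuple

# Year: 4 digits starting with 1 or 2, in parentheses (the pattern given above).
YEAR = re.compile(r"\(([12]\d{3})\)")
# Title: hypothetical reconstruction of <Prefix><Title><Suffix>:
#   Prefix ". In ", Title ending in a page reference "(pp. N–N)", Suffix space/comma/period.
#   [–-] accepts an en dash or a hyphen in the page range.
TITLE = re.compile(r"\. In (.+?\(pp\. \d+[–-]\d+\))[ ,.]")


def extract(ref: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (year, title) for one reference; None where a pattern does not match."""
    year = YEAR.search(ref)
    title = TITLE.search(ref)
    return (year.group(1) if year else None, title.group(1) if title else None)


refs = [
    "Amiot, D. (2004). Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), "
    "Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré "
    "(pp. 91–104). Rennes: Presses Universitaires de Rennes.",
    "Amiot, D. X. Haut degré et préfixation. In F. Lefeuvre & M. Noailly (Eds.), "
    "Travaux linguistiques du Cerlico: Vol. 17. Intensité, Comparaison, Degré "
    "(pp. 91–104), (2005). Rennes: Presses Universitaires de Rennes.",
    "Amiot, David X. Haut degré et préfixation. In Travaux linguistiques du Cerlico: "
    "Vol. 17. Intensité, Comparaison, Degré (pp. 91–104), (2005), "
    "F. Lefeuvre & M. Noailly (Eds.),. Rennes: Presses Universitaires de Rennes.",
]

for ref in refs:
    print(extract(ref))
```

With this reading of the pattern, the Title captured for the first two samples also includes the editors ("F. Lefeuvre & M. Noailly (Eds.), "), because they sit between ". In " and the page reference.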

OK, so here's the macro. Please review & test and let us know if it works for you.

MACRO:   Extract Author, Year, & Title from Reference [Example] @RegEx

~~~ VER: 1.0    2018-06-19 ~~~

DOWNLOAD:

Extract Author- Year- & Title from Reference [Example] @RegEx.kmmacros (110 KB)
Note: This Macro was uploaded in a DISABLED state. You must enable it before it can be triggered.


Release Notes

Author: @JMichaelTX

PURPOSE:

  • Extract the Author, Year, and Title from standard references
  • It is designed to handle different formats

HOW TO USE:

  1. Copy reference string (one line) to Clipboard
  2. Trigger this macro

NOTICE: This macro/script is just an Example

  • It is provided only for educational purposes, and may not be suitable for any specific purpose.
  • It has had very limited testing.
  • You need to test further before using in a production environment.
  • It does not have extensive error checking/handling.
  • It may not be complete. It is provided as an example to show you one approach to solving a problem.

REQUIRES:

  1. KM 8.2+
    • But it can be written in KM 7.3.1+
    • It is KM8-specific only because some of the Actions have changed to make things simpler, but equivalent Actions are available in KM 7.3.1.
  2. macOS 10.11.6 (El Capitan)
    • KM 8 requires Yosemite or later, so this macro will probably run on Yosemite, but I make no guarantees. :wink:

MACRO SETUP

  • Carefully review the Release Notes and the Macro Actions
    • Make sure you understand what the Macro will do.
    • You are responsible for running the Macro, not me. :wink:
  • Assign a Trigger to this macro.
  • Move this macro to a Macro Group that is only Active when you need this Macro.
  • ENABLE this Macro.
  • REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:
    • ALL Actions that are shown in the magenta color

USE AT YOUR OWN RISK

  • While I have given this limited testing, and to the best of my knowledge it will do no harm, I cannot guarantee it.
  • If you have any doubts or questions:
    • Ask first
    • Turn on the KM Debugger from the KM Status Menu, and step through the macro, making sure you understand what it is doing with each Action.


Thank you so much @JMichaelTX. This is amazing.

It can solve one of the biggest problems I have struggled with for decades: getting complete reference data for published articles and books. Google Scholar rarely offers complete reference data, and there is no good source like PubMed in my field (or, really, in many non-medical fields).

The way you solved some of these problems is brilliant.

I used to copy every line and then paste each one into JabRef.

I tried Pastebot for a while to use its Sequential Paste feature.

This will be one of the most useful macros I have, once all the details are worked out. I will be back with more details, if you don't mind.


Thank you @Tom for showing me my problems.

No problem. Feel free to ask any follow-up questions.
If it is a new problem/issue, please post as a new topic.