Replace text Macro (v11.0.2)

I have this text:

Parus montanus Conrad, 1827
Stratigraphic range: Late Pleistocene (MNQ 26)-present
LP: AU 21 (cf); CR 16; FR 97

I need to replace the codes with words, e.g. MNQ needs to be replaced with Mammal Neogene-Quaternary.

I have a list of replacements in this format:

MNQ__Mammal Neogene-Quaternary

I'm trying to get the attached macro to do the replacements. But at the moment, the output only reflects the last line (LP__Late Palaeolithic) in the replacements list.

The output text should be:

Parus montanus Conrad, 1827
Stratigraphic range: Late Pleistocene (Mammal Neogene-Quaternary 26)-present
Late Palaeolithic: Austria 21 (cf); Croatia 16; France 97

Tyrberg replacements.kmmacros (5.6 KB)

You're repeatedly searching and replacing in the variable OriginalText, writing the updated version to UpdatedText. So, whatever happens earlier in the "For Each", the last loop will do only the last S'n'R on OriginalText and put the result into UpdatedText. You need to repeatedly replace within the same text.
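To make the difference concrete, here is a rough Python sketch of the two loop shapes. Python is only a stand-in for the macro's "For Each" action, and the variable names and the three-line replacement list are illustrative assumptions rather than the contents of the attached macro.

```python
# Hypothetical stand-ins for the macro's variables and replacement list.
original_text = "LP: AU 21 (cf); CR 16; FR 97"
replacements = ["AU__Austria", "CR__Croatia", "LP__Late Palaeolithic"]

# Buggy shape: every pass starts again from the untouched original,
# so only the final replacement survives in updated_text.
for line in replacements:
    code, words = line.split("__", 1)
    updated_text = original_text.replace(code, words)
buggy_result = updated_text

# Fixed shape: read from and write back to the same text on every pass.
working_text = original_text
for line in replacements:
    code, words = line.split("__", 1)
    working_text = working_text.replace(code, words)
fixed_result = working_text
```

In the buggy shape only "LP" ends up replaced; in the fixed shape each pass builds on the result of the one before.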

For your own sanity, use different names for the ReplacementList variable and the ReplacementList Collection. Having them the same works because KM can differentiate the two, but it may confuse you later!

You aren't seeing it because of the "only does the last replacement" problem, but your S'n'R will replace any matching text, eg the "ar" in "Parus" will be replaced with "Armenia" to give "PArmeniaus" and then the "Ar" in that gets replaced again, then the "rm"s get replaced with "Romania" and the "Ro" in that is replaced with "Russia". You can probably solve that by making the search case-sensitive, but watch carefully for problems -- you may have to rethink your approach.
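A small Python sketch of why case matters here, using Python's `re` module as a stand-in for KM's search and replace. The whole-word variant at the end is an alternative idea rather than something the macro currently does, and the "AR 3" sample line is made up for illustration.

```python
import re

text = "Parus montanus Conrad, 1827"
code, words = "AR", "Armenia"

# Case-insensitive search: the "ar" inside "Parus" matches the code "AR".
case_insensitive = re.sub(re.escape(code), words, text, flags=re.IGNORECASE)

# Case-sensitive search: lowercase "ar" no longer matches, so the name is untouched.
case_sensitive = text.replace(code, words)

# Whole-word search: only replace the code when it stands alone.
# The sample line below is hypothetical.
whole_word = re.sub(rf"\b{re.escape(code)}\b", words, "LP: AR 3; FR 97")
```

Case sensitivity protects mixed-case words such as "Parus", but it does nothing for an all-caps code that sits inside a longer all-caps code, which is why it pays to keep watching the output.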

And you don't need that complicated regex and the extra variables -- just treat each line as a pseudo array, split on __:

image

(Yes, I've changed them to local variables -- you may, or may not, want to do the same depending on your use-case.)
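For anyone who finds it easier to read as code, this is roughly what the pseudo-array split amounts to. It is a Python analogue only; the macro itself uses KM's own tokens, and the variable names here are made up.

```python
# One line from the replacements list, in the thread's CODE__words format.
line = "MNQ__Mammal Neogene-Quaternary"

# Split on the double underscore; maxsplit=1 keeps any later "__" inside the words.
parts = line.split("__", 1)

# KM pseudo-array indexes start at 1, Python list indexes start at 0.
code = parts[0]   # the item KM would address as [1]
words = parts[1]  # the item KM would address as [2]
```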

I got there 12 minutes later. :laughing:

@layo Those are great tips by @Nige_S there. If you need to know more about local variables (this is a sensible default kind of variable for most macros), the Wiki does of course have a page about those and the other kinds of Variables.

Tried to incorporate suggestions in attached, but in this version, nothing at all seems to be happening in the original text.

Key is that I'm struggling to understand what's going on within the For Each action. I understand the concept – do something to each item in list – but I do not understand how to implement the action.

Tyrberg replacements.kmmacros (5.1 KB)

I'm sure I can leave @Nige_S to get stuck into this properly in due course, but in the meantime, may I suggest that you try making one small change at a time. Start afresh only once you have tinkered a little more with your original version. If nothing else, that will keep you confident! :slight_smile: Treat your V. 1 as the version "to throw away", if you like. Don't rush into V. 2 until you have gained some experience with V. 1.

For instance, you could:

  1. Start by changing the two appearances of "UpdatedText" to "OriginalText". That will get you some different output.
  2. Then consider renaming "OriginalText" to something that will help you more easily hold in your mind what it refers to—perhaps "SampleText".
  3. Then change all the global variables to local variables.
  4. Then pick some other aspect to consider, such as the over-eager substitutions, e.g. "Stratigraphic" becoming "StratiGreeceaphIceland".
  5. and so on.

Regardless of the order of tweaking that you chose, get each tweak right before moving on to attempt the next one.

@Nige_S or you might disagree for one reason or another, but that is the approach that would suit me. :wink:

"Substrings" is wrong -- you're saying "make a Collection by splitting the text on every occurrence of __". So for the text:

AL__Algeria
AR__Armenia
AU__Austria

...your substrings would be:

AL

...then:

Algeria
AR

...then:

Armenia
AU

...and so on. Not what you want!

"Lines in" was the correct choice -- you want to work with each line of text in turn.
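The same contrast in a short Python sketch, where `str.split` and `str.splitlines` stand in for the "Substrings" and "Lines in" collections. This is an analogy under that assumption, not KM's actual implementation.

```python
replacement_list = "AL__Algeria\nAR__Armenia\nAU__Austria"

# "Substrings"-style: split the whole text on every "__".
# Each middle item straddles a line break, mixing a name with the next code.
substrings = replacement_list.split("__")

# "Lines in"-style: take one line at a time, then split that line on "__".
lines = replacement_list.splitlines()
pairs = [tuple(line.split("__", 1)) for line in lines]
```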

Your pseudoarray item delimiter is only one _ -- the "dividers" within your lines of text are double _s, so you need to make your item delimiter a double _ too, eg:

%Variable%Replacements[1]__%

Fix those things and you'll at least get some replacements happening. They'll be wrong because of the "over-eager substitutions" previously mentioned -- but the fix for that is earlier in the thread too.
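To see why the delimiter width matters, compare the two splits in Python. This is an analogy only: KM's pseudo-array handling may differ in detail from Python's `str.split`.

```python
line = "MNQ__Mammal Neogene-Quaternary"

# Single-underscore delimiter: the double "__" yields an empty item in the middle,
# so the second item is not the replacement text.
single = line.split("_")

# Double-underscore delimiter: exactly two items, the code and the replacement text.
double = line.split("__")
```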

I'll echo @kevinb -- take it one change at a time, testing as you go. If you aren't comfortable using KM's built-in Macro Debugger, the "Display Text" action is your friend -- pop one into your "For Each" action and you'll get a window for each item in the Collection, so you can see if you're extracting the right text. (You'll get a lot of windows with a large collection -- you can Option-click the "Close" button of any one of them to close them all.)

Attached seems to do what I need, but as you say the "over-eager substitutions" have replaced 'MNQ' with 'MontenegroQ' when the replacement should be 'Mammal Neogene-Quaternary'. I think the fix suggested further up was to use a case-sensitive replacement, but that does not seem to be making a difference. Any suggestions?

Tyrberg replacements.kmmacros (5.0 KB)

I'm a late arrival, but I think if you place MNQ before MN in your replacement list, it might fix it. The problem looks like some of your acronyms are subsets of other acronyms (eg, MN<MNQ).

Thought that would fix it, but 'MNQ' is still being replaced with 'MontenegroQ'. Tried putting a word-boundary regular expression around '%Variable%Local_Replacements[1]__%' but that did not work either. Any other fixes?

I think you overlooked something. There are THREE entries starting with MN, as follows:

MN__Montenegro
MN__Mammal Neogene
MNQ__Mammal Neogene-Quaternary

You probably moved the third one before the second one, but not before the first one. I think you didn't notice your duplicates. It might help you if you put them in alphabetical order so you don't make this mistake. When you have duplicates, the second one will never match, because the first one has already replaced every occurrence of that code.
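If the list is long, checking for repeated codes by eye gets tedious. Below is a minimal Python sketch of one way to flag them, assuming the list has been pasted into a script as plain lines. It is a side check, not part of the macro.

```python
from collections import Counter

# The three entries quoted above, in the thread's CODE__words format.
replacement_lines = [
    "MN__Montenegro",
    "MN__Mammal Neogene",
    "MNQ__Mammal Neogene-Quaternary",
]

# Take the part before "__" on each line and count how often each code appears.
codes = [line.split("__", 1)[0] for line in replacement_lines]
duplicates = sorted(code for code, count in Counter(codes).items() if count > 1)
```

Here `duplicates` comes out as a list containing just "MN", the code that needs disambiguating before the replacements can be trusted.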

I notice that your list includes at least one key (MA BP) which includes a space.

Are there many? It will be easier to find keys which are continuous sequences of upper case letters, containing no spaces (MABP, for example).

The attached is horribly complicated but appears to work. Whether or not it results in complete carnage when used in the real world remains to be seen...

Tyrberg replacements.kmmacros (17.8 KB)

@Airy's given you the reason -- a solution is to use Filter actions to alphabetically sort then reverse the contents of Local_ReplacementList before using it in your "For Each" action. That way your lines will be ordered so that your "fullest matches" are processed first, eg:

...
IS__Israel (including West Bank)
IRQ__Iraq
IRN__Iran
IR__Ireland
...
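A rough Python sketch of the "sort, then reverse" idea, as an analogy to the Filter actions rather than a description of what they do internally. One assumption to flag: Python's plain string sort places "_" after uppercase letters, so this sketch sorts on the code alone to get the ordering shown above. The sample text is hypothetical.

```python
def apply_replacements(text, lines):
    """Apply each CODE__words line to text, in the order given."""
    for line in lines:
        code, words = line.split("__", 1)
        text = text.replace(code, words)  # str.replace is case-sensitive
    return text

replacement_lines = [
    "IR__Ireland",
    "IRN__Iran",
    "IRQ__Iraq",
    "IS__Israel (including West Bank)",
]

# Sort on the code, then reverse, so "IRQ" and "IRN" are processed before "IR".
ordered = sorted(
    replacement_lines,
    key=lambda line: line.split("__", 1)[0],
    reverse=True,
)

sample = "IRN 4; IR 2"  # hypothetical input line
unsorted_result = apply_replacements(sample, replacement_lines)
sorted_result = apply_replacements(sample, ordered)
```

With the original order, "IR" fires first and mangles "IRN" into "IrelandN"; with the reversed order the longer code is consumed before the shorter one gets a chance.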