What's wrong with my regex?

Insert spaces after abbrev. of intl. standards.kmmacros (2.8 KB)

The regex - Find string:

(\b(DIN)|(DIN EN)|(DIN EN ISO)|(DIN ISO)|(DIN-EN)|(DIN-EN-ISO)|(DIN-ISO)|(EN)|(IEC)|(ISO)|(NEN)|(NEN‑EN)|(NEN‑EN‑IEC)\b)(\s)(\d)

The Substitution (Replacement) expression:

$1‡$3

The test text:

You can find the specifications in DIN 12345, DIN EN 12345, DIN EN ISO 12345, DIN ISO 12345, DIN-EN 12345, DIN-EN-ISO 12345, DIN-ISO 12345, EN 12345, IEC 12345, ISO 12345, NEN 12345, NEN‑EN 12345, NEN‑EN‑IEC 12345.

The incorrect replacement result:

You can find the specifications in DIN‡2345, DIN EN‡DIN EN2345, DIN EN ISO‡2345, DIN ISO‡2345, DIN-EN‡2345, DIN-EN-ISO‡2345, DIN-ISO‡2345, EN‡2345, IEC‡2345, ISO‡2345, NEN‡2345, NEN‑EN‡2345, NEN‑EN‑IEC‡2345.

There are at least these errors:

  1. The first digit after the “‡” is missing.
  2. No “‡” inserted in “DIN EN”.
  3. The digits after “DIN EN” are missing.

What am I doing wrong here?

Background: I’d like to insert non-breaking spaces between common uppercase abbreviations in titles of international standards and following digits and between these abbreviations themselves. For demonstration purposes I have represented the non-breaking space with “‡”.

Link to regex101: regex101: build, test, and debug regex

Confusing yourself with too many parentheses! Easily done :wink:

You don't need them for each of your alternative terms.

(abc|def|g h)

...will match "abc" OR "def" OR "g h". Get rid of those extra ( )s and it becomes much easier to see the correct capture group to use in your replacement. That is the reason you were losing the digit: with a group around every alternative, $3 pointed at the (DIN EN) group, not at the digit. Try:

(\b(DIN|DIN EN|DIN EN ISO|DIN ISO|DIN-EN|DIN-EN-ISO|DIN-ISO|EN|IEC|ISO|NEN|NEN‑EN|NEN‑EN‑IEC)\b)(\s)(\d)

...replacing with:

$1‡$4
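
To see how the group numbers shift, here is a minimal sketch in Python. It is not the Keyboard Maestro action itself. It assumes Python's `re` module numbers groups the same way for these patterns, it cuts both patterns down to three alternatives, and it writes backreferences as `\1` where the thread uses `$1`. The names `ORIGINAL` and `FIXED` are just labels for this example.

```python
import re

# Shortened versions of the two patterns from the thread (three
# alternatives each), so the group numbers are easy to count.
ORIGINAL = r"(\b(DIN)|(DIN EN)|(DIN EN ISO)\b)(\s)(\d)"
FIXED = r"(\b(DIN|DIN EN|DIN EN ISO)\b)(\s)(\d)"

text = "DIN 12345, DIN EN 12345"

# Groups are numbered by their opening parenthesis, left to right.
# In ORIGINAL, group 3 is "(DIN EN)" and the digit is the last group.
# In FIXED, group 3 is the whitespace and group 4 is the digit.
print(re.compile(ORIGINAL).groups)
print(re.compile(FIXED).groups)

# Same replacement as in the question: reuses group 3 and drops the digit.
broken = re.sub(ORIGINAL, r"\1‡\3", text)

# Replacement from the answer: group 1 (abbreviation) + marker + group 4 (digit).
repaired = re.sub(FIXED, r"\1‡\4", text)

print(broken)
print(repaired)
```

With the original grouping the digit is matched but never put back, and "DIN EN" is written twice because group 3 holds that text. With a single alternation group, the digit sits in group 4 and survives the replacement.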