Only using groups that exist in find and replace

Hi,

I have citations of legal articles that look like this

Art. 25
Art. 25(1)
Art. 25(1)(a)

and want to transform them into a special format used by a flashcard app that looks like this

Art. {{c1::25}}({{c2::1}})({{c3::a}})

As you can see, the numbers and letters are wrapped in a special cloze markup, which tells the app to create flashcards. On each flashcard one of the three numbers/letters is omitted (cloze text), and you have to supply the omitted information.

Here is my approach so far:

I capture the numbers and letters in regex groups and want to transform them into the cloze format.
However, the ICU regex engine does not support conditionals, so I can't tell which group is empty.
I need that information so that empty groups are not wrapped in the cloze markup.
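To illustrate what the missing conditional would do: in a regex API that accepts a replacement *function* instead of a replacement string, the test for an empty group becomes an ordinary `if`. The sketch below uses Python's `re` module rather than the ICU engine, so it only shows the logic; the pattern and the helper names `cloze` and `to_cloze` are hypothetical, and it assumes the sub-parts are letters or digits.

```python
import re

def cloze(index: int, text: str) -> str:
    """Format one cloze deletion, e.g. cloze(1, "25") -> "{{c1::25}}"."""
    return "{{c%d::%s}}" % (index, text)

# The article number is required; both parenthesised parts are optional,
# so a group that did not take part in the match comes back as None.
CITATION = re.compile(r"Art\. (\d+)(?:\((\w+)\))?(?:\((\w+)\))?")

def to_cloze(match: re.Match) -> str:
    number, sub, letter = match.groups()
    parts = ["Art. " + cloze(1, number)]
    # These "if" tests are the conditionals an ICU replacement string lacks.
    if sub is not None:
        parts.append("(" + cloze(2, sub) + ")")
    if letter is not None:
        parts.append("(" + cloze(3, letter) + ")")
    return "".join(parts)

for text in ["Art. 25", "Art. 25(1)", "Art. 25(1)(a)"]:
    print(CITATION.sub(to_cloze, text))
# Art. {{c1::25}}
# Art. {{c1::25}}({{c2::1}})
# Art. {{c1::25}}({{c2::1}})({{c3::a}})
```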

Any ideas?

[quote=“Ben_Feldman, post:1, topic:904”]Art. 25
Art. 25(1)
Art. 25(1)(a)

and want to transform them into:

Art. {{c1::25}}({{c2::1}})({{c3::a}})
[/quote]

Hey Ben,

The above example appears to be a logical progression of article-subarticle.

What you show in your macro is not so straightforward, so it appears to me that you need to supply a more complete set of examples.

For problems like this with complex regular expressions, it’s a good idea to post either the exported macro file OR the exact text used in all the fields, because few people are going to want to transcribe it to test it. (For that matter, some text in the macro graphic is obscured and cannot be transcribed.)

My solution is based on your first example and therefore appears unlikely to work in all cases.

Execute AppleScript Action

{ Requires installation of the Satimage.osax AppleScript Extension }

```applescript
# Auth: Christopher Stone <scriptmeister@thestoneforge.com>
# dMod: 2015/01/31 00:51

# Output Format: Art. {{c1::25}}({{c2::1}})({{c3::a}})

# Simulate capture to variable from clipboard:
set _citation to "
Art. 25
Art. 25(1)
Art. 25(1)(a)
"

# Remove vertical whitespace from top:
set _citation to change "\\A\\s+" into "" in _citation with regexp without case sensitive
# Remove vertical whitespace from bottom:
set _citation to change "\\s+\\Z" into "" in _citation with regexp without case sensitive
# Remove 'Art. ' leader text:
set _citation to change "^Art\\. *" into "" in _citation with regexp without case sensitive
# Remove all but what is contained in the last parentheses:
set _citation to change "^.+(\\([^)]*\\))$" into "\\1" in _citation with regexp without case sensitive
# Insert '::' after the optional opening parenthesis (the cN prefix is added below):
set _citation to change "^(\\(?)" into "\\1::" in _citation with regexp without case sensitive
# Insert appropriate braces into line 1:
set _citation to change "(\\A.+)" into "{{\\1}}" in _citation with regexp without case sensitive
# Replace opening parenthesis with itself and appropriate braces:
set _citation to change "(\\()" into "\\1{{" in _citation with regexp without case sensitive
# Replace closing parenthesis with appropriate braces and itself:
set _citation to change "(\\))" into "}}\\1" in _citation with regexp without case sensitive
# Number the cloze deletions (c1, c2, c3) one line at a time:
set _citation to paragraphs of _citation
set _cntr to 0
repeat with i in _citation
  set _cntr to _cntr + 1
  set (contents of i) to change "(::)" into ("c" & _cntr & "\\1") in (contents of i) with regexp without case sensitive
end repeat
set _citation to "Art. " & (join _citation using "")

--> Art. {{c1::25}}({{c2::1}})({{c3::a}})
```

* Everything I’ve done with AppleScript can be done with KM alone.
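For readers without Satimage.osax, here is a rough translation of the same steps into Python's `re` module, for illustration only. It assumes Satimage's `^` and `$` match at line boundaries (the script relies on that), which `re.M` reproduces; `chris_pipeline` is a hypothetical name, and like the original it merges the three sample lines into a single citation rather than converting each line on its own.

```python
import re

def chris_pipeline(text: str) -> str:
    # Trim whitespace at both ends (the \A\s+ and \s+\Z steps).
    text = text.strip()
    # Drop the "Art. " leader on every line (^ is per-line with re.M).
    text = re.sub(r"^Art\. *", "", text, flags=re.M | re.I)
    # Keep only the last parenthesised part on each line.
    text = re.sub(r"^.+(\([^)]*\))$", r"\1", text, flags=re.M)
    # Insert "::" after an optional opening parenthesis at each line start.
    text = re.sub(r"^(\(?)", r"\1::", text, flags=re.M)
    # Brace the first line, then each parenthesised part.
    text = re.sub(r"(\A.+)", r"{{\1}}", text)
    text = text.replace("(", "({{").replace(")", "}})")
    # Number the cloze deletions line by line, then join the lines.
    lines = [line.replace("::", "c%d::" % n, 1)
             for n, line in enumerate(text.splitlines(), start=1)]
    return "Art. " + "".join(lines)

print(chris_pipeline("\nArt. 25\nArt. 25(1)\nArt. 25(1)(a)\n"))
# Art. {{c1::25}}({{c2::1}})({{c3::a}})
```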


Best Regards,
Chris


Thanks, Chris, for your speedy reply. I'm still trying to understand the regular expressions you used. In the end, I'm trying to achieve the following: in a text file I have a series of questions and answers that look like this:

Q: Which article provides for the right to walk silly in public?

A: The right to walk silly in public is provided in Art. 1(1)(b) Sillywalk Act 1919.

Q: …
A: …

I have another macro (with Peter's help) that can extract the question and the answer. If the answer contains an article citation, it should be transformed into the cloze form automatically.

As far as I can see, this can be achieved with the regular expressions you provided in your answer.
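Applied to the question/answer file, the conversion only has to touch citations inside answers. A minimal sketch in Python, assuming answers are the lines that start with `A:` (as in the example above) and that a citation is `Art.` followed by a number and any number of parenthesised letters or digits; the names `ANY_DEPTH`, `cloze_citation` and `cloze_answers` are hypothetical.

```python
import re

# "Art. N" followed by zero or more "(x)" parts, captured as one group.
ANY_DEPTH = re.compile(r"Art\. (\d+)((?:\(\w+\))*)")

def cloze_citation(match: re.Match) -> str:
    # The article number followed by each parenthesised part, in order.
    parts = [match.group(1)] + re.findall(r"\((\w+)\)", match.group(2))
    out = "Art. {{c1::%s}}" % parts[0]
    for n, part in enumerate(parts[1:], start=2):
        out += "({{c%d::%s}})" % (n, part)
    return out

def cloze_answers(text: str) -> str:
    """Convert citations on "A:" lines only; question lines stay untouched."""
    lines = []
    for line in text.splitlines():
        if line.startswith("A:"):
            line = ANY_DEPTH.sub(cloze_citation, line)
        lines.append(line)
    return "\n".join(lines)

print(cloze_answers("A: It is provided in Art. 1(1)(b) Sillywalk Act 1919."))
# A: It is provided in Art. {{c1::1}}({{c2::1}})({{c3::b}}) Sillywalk Act 1919.
```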

Why not simply do this with three Search and Replace actions, one for each form, starting with the longest form?

Transform Art. 25(1)(a) ➤ Art. {{c1::25}}({{c2::1}})({{c3::a}})
Transform Art. 25(1) ➤ Art. {{c1::25}}({{c2::1}})
Transform Art. 25 ➤ Art. {{c1::25}}

Each one is a Search and Replace action, similar to what you have shown.

If the first one matches, then the transformed value will no longer match the later ones.
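The three actions can be sketched as three substitutions run in order, longest form first. Each replacement uses only plain group references (`\1`, `\2`, `\3`), so no conditional is needed. Python's `re` stands in for the Keyboard Maestro action here, and the `RULES` list and `transform` function are illustrative assumptions.

```python
import re

# Longest form first: once a citation is converted, "Art. " is followed by
# "{{" rather than a digit, so the shorter patterns can no longer match it.
RULES = [
    (r"Art\. (\d+)\((\w+)\)\((\w+)\)", r"Art. {{c1::\1}}({{c2::\2}})({{c3::\3}})"),
    (r"Art\. (\d+)\((\w+)\)", r"Art. {{c1::\1}}({{c2::\2}})"),
    (r"Art\. (\d+)", r"Art. {{c1::\1}}"),
]

def transform(text: str) -> str:
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text

print(transform("Art. 25\nArt. 25(1)\nArt. 25(1)(a)"))
# Art. {{c1::25}}
# Art. {{c1::25}}({{c2::1}})
# Art. {{c1::25}}({{c2::1}})({{c3::a}})
```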

Alternatively, you can use a negative lookahead assertion so that the pattern for “Art. 25” does not also match the start of the longer forms.
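A sketch of that lookahead in Python's `re` (ICU also supports `(?!...)`). The bare-article pattern refuses to match when the number is followed by `(`, and also when it is followed by another digit, since otherwise `\d+` could backtrack and match `Art. 2` inside `Art. 25(1)`. With this guard the short rule no longer has to run last; `BARE` and `bare_to_cloze` are hypothetical names.

```python
import re

# Bare "Art. N" only: reject the match if the number is followed by a
# digit (stops \d+ from backtracking) or by "(" (a longer form).
BARE = re.compile(r"Art\. (\d+)(?![\d(])")

def bare_to_cloze(text: str) -> str:
    return BARE.sub(r"Art. {{c1::\1}}", text)

print(bare_to_cloze("Art. 25"))        # Art. {{c1::25}}
print(bare_to_cloze("Art. 25(1)(a)"))  # Art. 25(1)(a)
```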


That's a good thought, but the question then is how many possible subsections can there be in that notation?

--
Best Regards,
Chris


True enough, although in practice these things tend to be relatively finite, and even if you had to do ten Search and Replace actions, they are all trivial. The biggest danger is making a mistake in the repetition when creating them, but it's not hard to test all the possibilities as well.
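One way to avoid mistakes in the repetition is to generate the patterns instead of typing them. A sketch in Python, assuming `max_parts` is the deepest nesting you expect; `build_rules` and `apply_rules` are hypothetical helpers. The replacements use `\g<n>` group references so that group 10 and above stay unambiguous.

```python
import re

def build_rules(max_parts: int):
    """Return (pattern, replacement) pairs, longest form first.

    max_parts is the number of parenthesised parts after the article number.
    """
    rules = []
    for k in range(max_parts, -1, -1):
        pattern = r"Art\. (\d+)" + r"\((\w+)\)" * k
        # Group 1 is the article; groups 2..k+1 are the parenthesised parts.
        replacement = "Art. {{c1::\\g<1>}}" + "".join(
            "({{c%d::\\g<%d>}})" % (i + 1, i + 1) for i in range(1, k + 1)
        )
        rules.append((pattern, replacement))
    return rules

def apply_rules(text: str, rules) -> str:
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

print(apply_rules("Art. 25(1)(a)(ii)", build_rules(3)))
# Art. {{c1::25}}({{c2::1}})({{c3::a}})({{c4::ii}})
```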

Agreed.

Sequential find/replace actions should be faster and easier to manage than a looped solution.

-ccs
