Capturing and moving different types of text

Help? Ideas? Suggestions? I have 60 files that all have the same format but different contents.

I’d like to move the sections around. If I could capture each section into a variable, then I could move them easily. For the life of me I can’t seem to get the regex right. I can for the first two lines, but then things fall apart when I try to capture the multi-line body.

Is KM the right tool for this? Are there alternate ways of approaching this? Here is a sample file. (This is the shortest one.)

I’m looking to grab

  1. title

  2. number (without the '()’)

  3. body (variable length amongst 60 notes)

  4. section before ‘Alternate Titles’ (variable length amongst 60 notes)

  5. Alternate titles (variable length amongst 60 notes)

  6. Markdown media link

  7. Time stamp

  8. HTML page break code

  9. NO NEED FOR LINES OF DASHES

Ultimately the final file would be

  1. number

  2. title (on same line as number)

  3. Markdown media link

  4. body

  5. Alternate titles

  6. section before ‘Alternate Titles’

  7. Time stamp

  8. HTML page break code

# Don’t make everything so painful

(33)

This is about how you treat yourself and others. Things can be bad enough without festering them with attitude, comments, feelings, and gossip. Work with what is actually happening without faultfinding or complaining or elaboration. Don't get angry. This only makes things worse. 'Don’t put a head on top of your head. One head is enough.' Do this for others. Do this for yourself.  

Confidence is undermined by focusing on vulnerabilities and pain points.

The antidote for this to provide encouragement and support to others. Build confidence though small actions, remind others of their adequacies.

----------------------------------------------------------------

Don’t malign others  ...........................................[[201903300509]]

Cultivate an ever-present joyful mind ..........................[[201903200647]]

Let confusion awaken and practice emptiness ....................[[201903130526]]

## Alternate Titles

- Don’t bring things to a painful point

- Do not strike at the heart

- Don't bring things to a painful point

- Don't make things painful

- Do not strike at weaknesses

![](media/palouse-sunset-2206.jpg)

----------------------------------------------------------------

04-01-2019 - 4:59 AM

<div style="page-break-after: always;"></div>

I'm not sure I agree with your approach. Regex may work fine, but that's a pretty big Regex which will be a major hassle (at least for me) to work with. There is another feature of KM called Dictionaries which I think would work pretty effectively here. I'm not sure if you know how to use Dictionaries yet. If not, I can provide some help. I'm not an expert with them but I think I'm getting the drift of them.

Consider this approach which uses Dictionaries (it's pseudocode for the moment):

Foreach fileX in folderZ
   open fileX
   read the title
   set Dictionary1[fileX.title] to the title data
   read the number
   set Dictionary1[fileX.number] to the number data
   <<<do this for all 9 fields>>>
end loop
Delete all files in folderZ (or for safety pick a different folder!)
Foreach **number key** in Dictionary1:
   break up the Dictionary1 key into filename and field name
   append Dictionary1[filename.fieldname]'s value to filename
end loop
Foreach **title key** in Dictionary1:
   break up the Dictionary1 key into filename and field name
   append Dictionary1[filename.fieldname]'s value to filename
end loop
<<<Do this for all 9 fields, using the NEW ORDER in your requirement>>>

I think that's a pretty elegant solution. The first half has to read all 9 sections for each file into an appropriate dictionary. The second half has to have 8 loops which go through all the keys of a certain type and write the data for that key into a new file.
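For readers who would rather see the idea outside of KM, here is a minimal Python sketch of the same dictionary bookkeeping. It is not KM code, and the field names and the `NEW_ORDER` list are hypothetical labels chosen for illustration. It assumes the fields have already been extracted somehow and only shows how `filename.field` keys make reordering easy.

```python
# Hypothetical field names, listed in the output order requested in the first post.
NEW_ORDER = ["number", "title", "media", "body", "alternates", "related", "timestamp", "pagebreak"]


def store(dictionary, filename, fields):
    """Save each field under a 'filename.field' key, like Dictionary1[fileX.title]."""
    for name, value in fields.items():
        dictionary[f"{filename}.{name}"] = value


def rebuild(dictionary, filename, order=NEW_ORDER):
    """Join one file's stored fields in the new order, skipping fields that are absent."""
    parts = []
    for name in order:
        key = f"{filename}.{name}"
        if key in dictionary:
            parts.append(dictionary[key])
    return "\n\n".join(parts)
```

Changing the layout then only means editing `NEW_ORDER`; the stored data is untouched.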

It's an interesting approach. You may not like it.

Part of the problem with any solution is that it's not clear precisely what the nature of the files could be. You gave an example, which is extremely helpful, but it's not apparent from the example what all possible variations could be. So any approach will probably require asking you for some clarifications.

I'm not entirely sure I want to write 100% of the code for you on this. If you want to try this approach, I'm very willing to assist with explaining it, but I think you should be the main coder for this, due to the fact that only you know all the variations of the text files.

I think there are completely different approaches to solving this. Perhaps we should see if other people want to suggest a different approach. I'm toying with a couple of other approaches but in the end they all have to deal with the exact same issues here.

A completely different approach (but in the end all approaches are solving the same issues) would be to extract the various sections of text into separate files in the same folder. For example if your filenames happened to be:

File1
File2
File3

then a result of the extraction would be a set of 24 files (3 files × 8 sections):

File1.title File1.number File1.body File1.section File1.alternates File1.markdown File1.timestamp File1.html
File2.title File2.number File2.body File2.section File2.alternates File2.markdown File2.timestamp File2.html
etc.

I think this approach could be much simpler to code. For example, to create the files ending with .title it would probably be as simple as this:

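[Image: screenshot of the shell command, not preserved. Judging from the follow-up posts, it was along the lines of `ls f* | xargs -t -I % sh -c 'head -n 1 % > %.title'`.]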

Maybe that's not "simple". What it does is extract the title (the first line) of each file whose name starts with the letter f and create a new file called SAMEFILENAME.title which contains the title.
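The same title extraction can also be sketched as a short script. The following Python version is an illustration, not the shell command from the post: the function name `write_titles` and the default `f*` pattern are assumptions, and it presumes UTF-8 text files in a single folder.

```python
from pathlib import Path


def write_titles(folder, pattern="f*"):
    """For each file matching pattern, write its first line to NAME.title."""
    created = []
    for path in sorted(Path(folder).glob(pattern)):
        # Skip directories and any .title files left over from an earlier run.
        if not path.is_file() or path.suffix == ".title":
            continue
        with path.open(encoding="utf-8") as handle:
            first_line = handle.readline().rstrip("\n")
        target = path.with_name(path.name + ".title")
        target.write_text(first_line + "\n", encoding="utf-8")
        created.append(target)
    return created
```

Calling `write_titles("/path/to/notes")` would leave the original files alone and add one `.title` file next to each match.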

Now if you did that for each of the 8 sections, you'd have everything split up correctly. Then with a very small amount of code we could re-merge all the separated files into the order you prefer. If we get that far I'm sure I could write that code for you, it would probably be a single line.
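As a rough idea of what the re-merge step could look like, here is a hedged Python sketch. The suffixes are the ones listed above (`.title`, `.number`, and so on), while `MERGE_ORDER` and `merge_pieces` are hypothetical names. It assumes the pieces should be separated by one blank line.

```python
from pathlib import Path

# Suffixes from the split step, in the output order requested in the first post.
MERGE_ORDER = ["number", "title", "markdown", "body", "alternates", "section", "timestamp", "html"]


def merge_pieces(folder, stem, order=MERGE_ORDER):
    """Concatenate STEM.<suffix> files in the given order, skipping missing pieces."""
    pieces = []
    for suffix in order:
        piece = Path(folder) / f"{stem}.{suffix}"
        if piece.is_file():
            pieces.append(piece.read_text(encoding="utf-8").strip("\n"))
    # One blank line between pieces, one trailing newline at the end.
    return "\n\n".join(pieces) + "\n"
```

Trying a different layout is then a matter of passing a different `order` list.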

In some ways this code is clearer. The example I showed of extracting the titles was rather simple. The other 7 actions, which will extract the other sections of text, will be SLIGHTLY more complex and will use different shell commands.

But in the end this is a good approach because it's actually a very minimal amount of code, and you can work on each piece of code independently. So it has certain advantages. The only downside is that it takes a skilled person to write the 9 lines of code that deal with shell commands, but I'd be happy to carry the load on that issue.

Thanks so much for the ideas. The second suggestion is intriguing. If I understand correctly, the result would be 480 files: one file for each of the 8 pieces in each of the 60 files. Reassembly would be relatively easy, and I can see where I could experiment with different layouts.

Extracting the chunks seems to be where the trouble is. Line 1 and Line 2 can be easily extracted, I think. The next chunk is variable in length between files (sometimes up to 50 lines) but always ends with the multiple '-----' line. How to chunk that portion boggles my mind.
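One way to think about the variable-length chunk is line by line: start after the `(number)` line and keep collecting until a line made only of dashes appears. The Python sketch below illustrates that idea. The helper names and the five-dash minimum are assumptions, not something taken from the files themselves.

```python
def is_dash_rule(line, minimum=5):
    """True for a line made only of dashes, such as the 64-dash separator in the sample."""
    stripped = line.strip()
    return len(stripped) >= minimum and set(stripped) == {"-"}


def body_chunk(text):
    """Return the text between the '(number)' line and the first dash rule."""
    lines = text.splitlines()
    start = 0  # If no '(number)' line is found, collection starts at the top.
    for index, line in enumerate(lines):
        stripped = line.strip()
        if stripped.startswith("(") and stripped.endswith(")"):
            start = index + 1
            break
    chunk = []
    for line in lines[start:]:
        if is_dash_rule(line):
            break
        chunk.append(line)
    return "\n".join(chunk).strip()
```

Because the loop stops at the first dash rule, the body can be 2 lines or 50 without any change to the code.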

ps. will$ ls 2* |xargs -t -l % sh -c 'head -n 1 %>%.title'
reports xargs: illegal option -- l

Yes, KM is definitely the right tool for this.

Here is an example macro that just processes the text from one file. You will need to adjust the macro as follows:

  1. Fine-tune the layout of the final text in KM Variable Local__FinalFile
  2. Put in a For Each action using either a Finder's Selection collection OR a Folder Contents collection
  3. Use the filePath from #2 to Read a File action into the KM "Local__SourceStr" Variable
  4. Use a Write to a File action to output the results for each file.

Uses this RegEx:

(?s)^#\h+(.+?)\R.+?\((\d+)\).*?\R(.+?)\R---.+?\R(.+?)##\h+Alternate Titles\h*\R\R(.+?)\R!\[\]\((media.+?)\).+?(\d+-\d+-\d+.+?)\R.+?(<div.+?<\/div>)

For details see regex101: build, test, and debug regex
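To make the capture groups easier to follow, here is the same pattern broken into commented pieces in Python. This is a translation, not the KM original: Python's `re` module does not support `\R` or `\h`, so they are replaced with `\n` and `[ \t]`, which assumes LF line endings. The `FIELDS` names and the abbreviated `SAMPLE` text are illustrative assumptions.

```python
import re

PATTERN = re.compile(
    r"^#[ \t]+(.+?)\n"                          # 1: title, after the leading '# '
    r".+?\((\d+)\).*?\n"                        # 2: number, without the parentheses
    r"(.+?)\n---.+?\n"                          # 3: body, up to the first line of dashes
    r"(.+?)##[ \t]+Alternate Titles[ \t]*\n\n"  # 4: section before 'Alternate Titles'
    r"(.+?)\n!\[\]\((media.+?)\)"               # 5: alternate titles, 6: media path
    r".+?(\d+-\d+-\d+.+?)\n"                    # 7: time stamp
    r".+?(<div.+?</div>)",                      # 8: HTML page break
    re.DOTALL,                                  # same role as the leading (?s)
)

# Hypothetical names for the eight capture groups, in pattern order.
FIELDS = ["title", "number", "body", "links", "alternates", "media", "timestamp", "pagebreak"]

# Shortened version of the sample note, with a blank line between every line.
SAMPLE = "\n\n".join([
    "# Don't make everything so painful",
    "(33)",
    "This is about how you treat yourself and others.",
    "Confidence is undermined by focusing on pain points.",
    "-" * 64,
    "Don't malign others ....[[201903300509]]",
    "## Alternate Titles",
    "- Do not strike at the heart",
    "- Don't make things painful",
    "![](media/palouse-sunset-2206.jpg)",
    "-" * 64,
    "04-01-2019 - 4:59 AM",
    '<div style="page-break-after: always;"></div>',
]) + "\n"


def extract_fields(text):
    """Return the eight fields as a dict, or None when the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    return {name: value.strip() for name, value in zip(FIELDS, match.groups())}
```

Note that group 6 captures only the path inside the parentheses, so the `![](...)` wrapper has to be added back when the final text is assembled.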



MACRO:   Extract & Recombine Article Info [Example]

**Requires: KM 8.2.4+   macOS 10.11 (El Capitan)+**
(Macro was written & tested using KM 9.0+ on macOS 10.14.5 (Mojave))

#### DOWNLOAD Macro File:
<a class="attachment" href="/uploads/default/original/3X/3/7/37a325eb887502f175a0aef0fe60154588fa61d2.kmmacros">Extract & Recombine Article Info [Example].kmmacros</a>
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**


---

### ReleaseNotes

Author: @JMichaelTX

**PURPOSE:**

* **Extract Fields from Article Text & Recombine in New String**

**HOW TO USE**

1. First, make sure you have followed instructions in the _Macro Setup_ below.
2. Trigger this macro.

**MACRO SETUP**

* **Carefully review the Release Notes and the Macro Actions**
  * Make sure you understand what the Macro will do.  
  * You are responsible for running the Macro, not me.
.
**Make These Changes to this Macro**
1. Assign a Trigger to this macro.
2. Move this macro to a Macro Group that is only Active when you need this Macro.
3. ENABLE this Macro, and the Macro Group it is in.
.
* **REVIEW/CHANGE THE FOLLOWING MACRO ACTIONS:**
(all shown in the magenta color)
   * Set Source String 
   * Change Layout as Desired for Final File Output 

**REQUIRES:**

1. **KM 8.2.4+** (may work in KM 8.2+ in some cases)
2. **macOS 10.11.6 (El Capitan)+**

TAGS:  @RegEx @Strings

USER SETTINGS:

* Any Action in _magenta color_ is designed to be changed by end-user

ACTION COLOR CODES

* To facilitate the reading, customizing, and maintenance of this macro,
      key Actions are colored as follows:
* GREEN   -- Key Comments designed to highlight main sections of macro
* MAGENTA -- Actions designed to be customized by user
* YELLOW  -- Primary Actions (usually the main purpose of the macro)
* ORANGE  -- Actions that permanently destroy Variables or Clipboards,
OR IF/THEN and PAUSE Actions


**==USE AT YOUR OWN RISK==**

* While I have given this a modest amount of testing, and to the best of my knowledge will do no harm, I cannot guarantee it.
* If you have any doubts or questions:
  * **Ask first**
  * Turn on the KM Debugger from the KM Status Menu, and step through the macro, making sure you understand what it is doing with each Action.


<img src="/uploads/default/original/3X/8/0/8026f030a544b706e6ea15126f47c3ab67507dff.png" width="670" height="1702">

That was a capital I (eye), not a lowercase l (ell). Sorry for not making that clear. My bad.

You are right, I picked the easiest extraction. The other 7 or 8 extractions will be slightly more difficult. I didn't want to do them unless you approved of this approach.

JM has just provided a Regex for you. Yes, as I said, Regex will work. I just find it a little intimidating. Look at his three line solution! That's not how I like to program. If his solution works for you, just use it. But if you want to learn the tricks of the Shell trade I can help you with the second approach I outlined for you. It's your call.

Wow! You are truly a Regex Master, a KM Master. Thank you for the detailed outline. Now the work begins. Absorbing knowledge. I don't want to waste your efforts.

I can read most of the regex, but I'll have to work with it a bit to get the nuances. I knew Regex could handle this with ease, but at a level beyond me. I've been learning Regex intently over the last few months and this builds on that. Thank you. This not only helps me with my Lojong Commentary Project but also advances my understanding of KM and regex. The workflow of developing such a long regex statement seems to be my stumbling block. Should one start at the beginning, at the end, or randomly, chunk by chunk? regex101.com is a great tool.

Just tried this out on my system and it worked flawlessly in its original form. Tweaking begins.

Thanks again.


Well, that somewhat depends on the task, the text, and your preferences.

I generally start at the beginning, so I can see how each piece is matching.
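Starting at the beginning can be made mechanical: add one piece of the pattern at a time and check that the growing pattern still matches. The Python sketch below illustrates that workflow. `first_failing_step` is a hypothetical helper, and the pieces use `\n` and `[ \t]` because Python's `re` lacks `\R` and `\h`.

```python
import re


def first_failing_step(pieces, text, flags=re.DOTALL):
    """Return the index of the first piece that breaks the match, or None if all match."""
    pattern = ""
    for index, piece in enumerate(pieces):
        pattern += piece  # Grow the pattern from the beginning, one piece at a time.
        if re.search(pattern, text, flags) is None:
            return index
    return None


# The first three pieces of a pattern like the one above, built up in order.
PIECES = [
    r"^#[ \t]+(.+?)\n",   # title line
    r".+?\((\d+)\)",      # number in parentheses
    r".*?\n(.+?)\n---",   # body, up to the first line of dashes
]
```

When a piece is reported as failing, only that piece needs attention, since everything before it is already known to match.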

I highly recommend regex101.com. I do all of my development and testing there.
You may also want to review the KM Wiki: Regular Expressions, which includes some good references at the bottom of the page.