New user, request for help with text manipulation in Bruji's BookPedia

15 posts were split to a new topic: Backups, iCloud, Language, & Other Stuff

I think your current requirement is a great opportunity to learn how to use KM. I'm hesitant to give you a "one-line solution" because I want you to jump out of your nest and learn to fly.

Weird that the forum script doesn’t allow more than 3 posts a day. But we got your PSs.

And yes, me too: I always have to dust off my Awk when the last script I wrote was 8 months ago. It is similar with Perl, though not as hard.

But, since you said you wanted to go one by one, the UI scripting approach seems more reasonable either way. Could you try out the script?

@MarkSealey, what about the UI AppleScript? Is it fine or do we have to prepare for further efforts? :wink:

Mark, I've removed the posting restriction for you, so you should be able now to make more posts and new topics. You may need to log out / log in for this change to take effect.

Thanks so much!

All extremely helpful :slight_smile:

Have been experimenting with Gabe's script for the past couple of hours.

I set the BookPedia window to '125, 125' - per a version of Sleepy's Action here:
[Image: Move Window]

But where I'm stuck is the four parameters in the OCR Area action:

I understand the basic principles of X,Y co-ordinates, of course - but not how they're implemented in this case… the Wiki docs are still in progress; and I couldn't find anything anywhere else. Nor what the various 'inner' and 'outer' (???) co-ordinates and negative values must mean in the Mouse Display window.

[Image: Mouse display]

I think if I can get the params to capture the exact text I need, I'll be close.

Thanks!

Mark, I understand you are just experimenting, and that is fine.
But I have to say that using OCR for your use case is going to be the slowest and least reliable method.

Of all of the methods I've seen presented (and I haven't studied any of them in detail), I suggest trying @tom's UI scripting. I'm a bit concerned about being in a loop of 8,000 records, but if you test with 10 records or so, and put a good pause (say 0.2 sec) between each record, then I think you should be OK.

But as the others mentioned, be sure to make a backup of your BookPedia just before you start testing and before you start a production run.

Some other thoughts:

  1. As part of the UI scripting, do a save every few records.
  2. Make sure the script does a test to ensure that the data needs to be converted; else just skip to the next record (but issue a log statement for the skipped record).
  3. The more I think about this, the more I'd suggest running the script from Script Debugger 7, rather than KM. This will give you more control, and make it easier to make changes to the script and test.
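
To make points 1 and 2 concrete, here is a rough sketch of the loop shape, written in Python purely for illustration (the real thing would be AppleScript UI scripting). The names `process_records`, `convert`, and `save` are hypothetical placeholders, not anything provided by BookPedia or KM, and "needs converting" is assumed to mean "contains a colon":

```python
import time

def process_records(titles, convert, save, save_every=5, pause=0.2, log=print):
    """Convert titles that need it, log the ones skipped, save periodically.

    `convert` and `save` are hypothetical callables standing in for the
    UI-scripting steps; they are assumptions, not real BookPedia calls.
    """
    converted = 0
    for index, title in enumerate(titles, start=1):
        if ":" not in title:
            # Nothing to split: skip, but leave a trace in the log.
            log(f"skipped record {index}: {title!r}")
        else:
            convert(title)
            converted += 1
        if index % save_every == 0:
            save()          # save every few records
        time.sleep(pause)   # give the UI time to catch up
    save()                  # final save after the last record
    return converted
```

Start with a short list of 10 or so titles before trying anything larger.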

Let us know if you have any questions.

Me too (afraid of an 8,000-record loop). That’s why I limited the script to three rows by default.

But later the OP said that…

I'm also quite happy to do this record by record a) to keep an eye on things; b) because some colons are to be retained and c) because some Titles have more than one colon.

So, the script should be at least an ideal testing ground.

Thanks, JMichaelTX!

Your points are well taken. Yes, this has been a great way for me to experiment and find out how KM really works.

I always back everything up. Fanatically so!

Thanks for the endorsement of Tom's work. In fact, I really need to be able to edit BookPedia records one by one - chiefly because, although splitting the titles on a colon is the main objective, there are times when:

  1. a title has more than one colon - and so cannot be reliably split
  2. I need to be able to override such a redistribution of the title string - and so must not be split
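
A small sketch of how those two exceptions could be told apart before anything is changed, in Python for illustration only. The function name `classify_title` and the `keep_whole` set of manually protected titles are hypothetical, introduced just for this example:

```python
def classify_title(title, keep_whole=frozenset()):
    """Return 'split', 'keep', or 'review' for a single title.

    `keep_whole` is a hypothetical set of titles that must never be split,
    standing in for a manual override.
    """
    if title in keep_whole:
        return "keep"       # manual override: leave untouched
    colons = title.count(":")
    if colons == 0:
        return "keep"       # nothing to split
    if colons == 1:
        return "split"      # the straightforward Title: Subtitle case
    return "review"         # more than one colon: decide by hand
```

Only titles classified as `split` would be changed automatically; the `review` ones are left for a human decision.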

I've actually learnt a lot today - courtesy of all the posters who kindly contributed.

This is what I've come up with: BookPedia Title Split.kmmacros (8.9 KB)

BookPedia doesn't lend itself easily to export (as CSV etc), external manipulation, and then re-import.

The only thing which appears not to be working in the Macro I've built is that it cuts off characters at the end of the 'Original Title' (second portion, after the colon) string.

I really didn't expect such an overwhelmingly friendly and helpful reception here. Very grateful!

Tom,

Indeed it is. I'll take another closer look at it tomorrow and see if I can adapt the text-massage portions of it to work record by record. Thousand thanks…

The script already works record by record (by nature of the script). Just set the maxEntries variable to 1, and the loop will stop after the first record.

Not sure what you mean by that… Any chance you could explain? Sometimes I’m blind…

If you can provide several real-world examples of both kinds, then it is likely that a RegEx pattern can be designed to handle both properly. Just post your examples in a Forum Code Block, with "text" as the language.

Tom,

Of course. I'll try that.

I did get an error on the first record when I tried running it this morning. I'll look more closely tomorrow :slight_smile: .

My fault, sorry. I meant the regex parts to do the pasting, given that I need to be able to spot two colons in a field and those which don't need splitting.

If I may, I'll rerun and report back on the error. If I get it :slight_smile:

Thanks again…

OK guys. To make some things clearer, three questions:

  • You definitely do want to proceed one record by one record, right?
  • You do not want to proceed all at once (e.g. by manipulating the dumped CSV file)?
  • It can happen that there is more than one colon in the first field?

Our posts crossed. Just reading your post.

There is no regex in my script. Are you speaking of some other solution?

Titles which should be split are in the format:

Title: Subtitle

An example would be:

Lost: The real life journey of Scott of the Antarctic

Titles which are more complex to regex are in the format:

Title: First Subtitle: Second Subtitle

An example would be:

Lost Again: The real life journey or Scott and the Antarctic: how he lost the race to the pole

Presumably the expression would be un-'greedy' and split only on the first occurrence of the colon.
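
Something along these lines, as a minimal sketch in Python rather than in a KM action. The pattern uses a negated character class (`[^:]*`) so the first group cannot run past the first colon; a non-greedy `^(.*?):` should behave the same way. The name `split_title` is just a placeholder for this example:

```python
import re

# Everything up to the first colon, then the rest (which may contain colons).
FIRST_COLON = re.compile(r"^([^:]*):\s*(.*)$")

def split_title(title):
    """Split at the first colon only; return (title, subtitle)."""
    match = FIRST_COLON.match(title)
    if match is None:
        return title, ""    # no colon: nothing to split
    return match.group(1).strip(), match.group(2).strip()
```

With the second example above, the first part would be `Lost Again` and everything after the first colon, second colon included, stays together as the subtitle.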

OK. I got that. Except the double colons which you didn’t mention in the OP. (But no issue.)

Have you tried the script on an example database (of course without any titles with double colons), and does it work there?

First thing is to see if the script basically works on your setup.

Edit: As I said, I have only downloaded the app and tried it with its default setup.
