New user, request for help with text manipulation in Bruji's BookPedia

Mark, I understand you are just experimenting, and that is fine.
But I have to say that using OCR for your use case is going to be the slowest and least reliable method.

Of all of the methods I've seen presented (and I haven't studied any of them in detail), I suggest trying @tom's UI scripting. I'm a bit concerned about looping over 8,000 records, but if you test with 10 records or so, and put a good pause (say 0.2 sec) between each record, then I think you should be OK.

But as the others mentioned, be sure to make a backup of your BookPedia database just before you start testing and again before you start a production run.

Some other thoughts:

  1. As part of the UI scripting, do a save every few records.
  2. Make sure the script does a test to ensure that the data needs to be converted; else just skip to the next record (but issue a log statement for the skipped record).
  3. The more I think about this, the more I'd suggest running the script from Script Debugger 7, rather than KM. This will give you more control, and make it easier to make changes to the script and test.
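
The first two points above can be sketched as a plain loop. This is only an illustration in Python, not the UI script itself: `needs_split`, `process_records` and the `save` callback are hypothetical names, and a real run would drive BookPedia's UI rather than a list of strings.

```python
import logging

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("title_split")


def needs_split(title):
    """Hypothetical test: convert only titles with exactly one colon."""
    return title.count(":") == 1


def process_records(titles, save, save_every=5):
    """Split qualifying titles, log skipped ones, and save every few records.

    `save` is a placeholder callback standing in for whatever triggers a
    save in the real application.
    """
    results = []
    for index, title in enumerate(titles, start=1):
        if needs_split(title):
            main, _, sub = title.partition(":")
            results.append((main.strip(), sub.strip()))
        else:
            # Leave the record unchanged, but keep a trace of it.
            log.info("Skipped record %d: %r", index, title)
            results.append((title, ""))
        if index % save_every == 0:
            save()
    return results
```

Running it first on 10 or so titles with a small `save_every` mirrors the "test small, save often" advice.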

Let us know if you have any questions.

Me, too (afraid of an 8,000-iteration loop). That’s why I limited the script by default to three rows.

But later the OP said that…

I'm also quite happy to do this record by record a) to keep an eye on things; b) because some colons are to be retained and c) because some Titles have more than one colon.

So, the script should be at least an ideal testing ground.

Thanks, JMichaelTX!

Your points are well taken. Yes, this has been a great way for me to experiment and find out how KM really works.

I always back everything up. Fanatically so!

Thanks for the endorsement of Tom's work. In fact, I really need to be able to edit BookPedia records one by one - chiefly because, although splitting the titles on a colon is the main objective, there are times when:

  1. a title has more than one colon - and so cannot be reliably split
  2. I need to be able to override such a redistribution of the title string - and so must not be split

I've actually learnt a lot today - courtesy of all the posters who kindly contributed.

This is what I've come up with: BookPedia Title Split.kmmacros (8.9 KB)

BookPedia doesn't lend itself easily to export (as CSV etc), external manipulation, and then re-import.

The only thing which appears not to be working in the Macro I've built is that it cuts off characters at the end of the 'Original Title' (second portion, after the colon) string.

I really didn't expect such an overwhelmingly friendly and helpful reception here. Very grateful!

Tom,

Indeed it is. I'll take another closer look at it tomorrow and see if I can adapt the text-massage portions of it to work record by record. A thousand thanks…

The script already works record by record (by nature of the script). Just set the maxEntries variable to 1, and the loop will stop after the first record.

Not sure what you mean by that… Any chance you could explain? Sometimes I’m blind…

If you can provide several real-world examples of both kinds, then it is likely that a RegEx pattern can be designed to handle both properly. Just post your examples in a Forum Code Block, with "text" as the language.

Tom,

Of course. I'll try that.

I did get an error on the first record when I tried running it this morning. I'll look more closely tomorrow 🙂.

My fault, sorry. I meant the regex parts to do the pasting, given that I need to be able to spot two colons in a field and those which don't need splitting.

If I may, I'll rerun and report back on the error. If I get it 🙂

Thanks again…

OK guys. To make some things clearer, three questions:

  • You definitely do want to proceed one record by one record, right?
  • You do not want to proceed all at once (e.g. by manipulating the dumped CSV file)?
  • It can happen that there is more than one colon in the first field?

Our posts crossed. Just reading your post.

There is no regex in my script. Are you speaking of some other solution?

Titles which should be split are in the format:

Title: Subtitle

An example would be:

Lost: The real life journey of Scott of the Antarctic

Titles which are more complex to regex are in the format:

Title: First Subtitle: Second Subtitle

An example would be:

Lost Again: The real life journey or Scott and the Antarctic: how he lost the race to the pole

Presumably the expression would be un-'greedy' and split only on the first occurrence of the colon.
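
A minimal sketch of that idea in Python (not part of Tom's script; `split_title` is a hypothetical name). `str.partition` splits on the first colon only, and the returned colon count flags titles that may still need a manual look.

```python
def split_title(title):
    """Split on the first colon only; return (title, subtitle, colon_count)."""
    main, sep, rest = title.partition(":")
    if not sep:
        # No colon at all: nothing to split.
        return title.strip(), "", 0
    return main.strip(), rest.strip(), title.count(":")


print(split_title("Lost: The real life journey of Scott of the Antarctic"))
```

For the two-colon example, everything after the first colon stays together in the second part and the count comes back as 2, so such records can be singled out for review.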

OK. I got that. Except the double colons which you didn’t mention in the OP. (But no issue.)

Have you tried the script on an example database (of course without any titles with double colons), and does it work there?

First thing is to see if the script basically works on your setup.

Edit: As said I have only downloaded the app and tried the default setup of the app.

Tom,

I changed maxEntries to 1, which solves both problems vis-à-vis colons present but split not required, and two colons present, thanks.

But working on one record at a time always starts the process from the first record in the database.

I can - and will - certainly experiment on a backup subset.

I know that you don't use BookPedia, so am doubly appreciative. One thing I have done is create a Smart Group consisting entirely of titles with one or more colon(s). Could your script be made to work on just that subset? There is also a View mode much more like a spreadsheet in columns where - again - splitting, cutting and pasting would be between fields by Tabbing.

In the case of single records, if it were possible to advance the split and paste one record at a time using the appropriate BookPedia Keystroke (not sure that that can be scripted), I suspect that'd also do it!

Yep. That’s why I proposed to order the rows in a way that is independent of recent changes (maybe creation date or something).

But here another unknown comes into play: how quickly does the main window re-order the entries?

Try adding a pause in a few places. The AppleScript command for a pause is: delay 1.0 (which means 1 s). I would say try to add the delay just before the “repeat”, pretty much at the beginning.

No, this app is not scriptable. That’s why I proposed UI scripting. I would be glad if it was scriptable 😉

Are you sure you don’t want an all-in-one modification of your CSV dump?

You can always correct it afterwards? (And you have your backup anyway…)

PS:

I’m saying this because any GUI actions (via AppleScript or via KM) are unreliable. And, since the app already provides the possibility of a CSV dump, why not use it? Cleaning up afterwards is probably easier than doing the whole task one record at a time.
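
To give an idea of what the CSV route might look like, here is a rough Python sketch. It assumes the export has a column named `Title` and writes the second part to a `Subtitle` column; both column names, and the function name, are assumptions rather than anything BookPedia guarantees. Titles with more than one colon are left untouched and reported for manual review. Getting the result back into BookPedia is not covered here.

```python
import csv
import io


def split_titles_csv(csv_text, title_col="Title", subtitle_col="Subtitle"):
    """Return (new_csv_text, skipped_titles) for a CSV export given as text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    fieldnames = list(reader.fieldnames or [])
    if subtitle_col not in fieldnames:
        fieldnames.append(subtitle_col)

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    skipped = []
    for row in reader:
        title = row.get(title_col) or ""
        if title.count(":") == 1:
            # Exactly one colon: safe to split automatically.
            main, _, sub = title.partition(":")
            row[title_col] = main.strip()
            row[subtitle_col] = sub.strip()
        else:
            row.setdefault(subtitle_col, "")
            if ":" in title:
                skipped.append(title)  # several colons: review by hand
        writer.writerow(row)
    return out.getvalue(), skipped
```

The `skipped` list doubles as a to-do list for the records that should be handled one by one.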

Mark, could you possibly create a separate Book with, say only 100 records, from your master, that we could all have for testing? If so, maybe you could zip it and post somewhere like Dropbox that we could all access.

I've got a RegEx that should work:
(?mi)^(.+?):\h*(.+)
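
For anyone who wants to try the pattern outside AppleScript, here is a rough Python equivalent. Python's `re` module does not support `\h`, so `[ \t]*` stands in for it (an approximation, since `\h` also matches other horizontal whitespace); the function name is hypothetical.

```python
import re

# Lazy first group stops at the first colon; [ \t]* replaces \h*.
TITLE_RE = re.compile(r"(?mi)^(.+?):[ \t]*(.+)")


def split_with_regex(title):
    """Return (title, subtitle), or None when the pattern does not match."""
    match = TITLE_RE.search(title)
    if match is None:
        return None
    return match.group(1), match.group(2)
```

Because the first group is lazy, a title with two colons is split at the first one and the rest stays in the second group.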

I've got an ASObjC RegEx handler that you can use with Tom's script.
Is everyone still working with Tom's original script posted here?

BTW, Mark, the general approach here is to put the post quote first, then add your reply comments.

I think the main question here is: should we work on the CSV dump, or with a GUI-based script?

The OP’s desire (“one record by one”) seems to point in the direction of a GUI script, but I’m not sure that what the OP wants is the optimal approach.

Using CSV is preferable from several points of view, but I thought @MarkSealey said he didn't know how to UPDATE the Book with the results. Is this correct?