Find and highlight long sentences

I am trying to find a way to create a macro that searches a document, finds sentences longer than 20 words, and highlights each of those sentences.

Any help would be great!

Highlight the sentences in a text editor? A PDF viewer? Some other editor?

Yes, in a text editor like Pages or Dreamweaver. I am sure I can figure out the highlight part, depending on the program I run this in. But I can’t seem to figure out how to tell it to look for sentences with more than 20 words in them.

This regex:

(\w+\s+){19,}?(\w+[\.|?]) {NB: edited to correct formatting gremlin attack.}

should either work or get you close :blush: . It assumes that every “sentence” terminates with either a period or a question mark.

(\w+\s+) : one or more word characters followed by one or more whitespace characters
{19,}? : repeat the previous group (it’s in parentheses) 19 times or more; the trailing “?” makes the repeat lazy, so “more” is “only as many as the rest of the pattern needs in order to match”
(\w+[\.|?]) : one or more word characters followed by a period or a question mark. Inside square brackets “|” is a literal pipe character (shift-\ on most keyboards), not “or”, so this class also matches a pipe; [.?] is enough.

Test before using.
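
For anyone who wants to try the pattern outside KM first, here is a minimal JavaScript sketch of stepping through the matches. The names findLongSentences and minWords are made up for this example, and the character class is simplified to [.?]. One caveat: \w does not match commas, semicolons or quotes, so a sentence with internal punctuation will not match as a whole.

```javascript
// Sketch only: list every run of at least `minWords` words that ends in
// a period or question mark. Function and parameter names are hypothetical.
function findLongSentences(text, minWords) {
  // (minWords - 1) or more "word + whitespace" groups, then a final word
  // and its terminator. The lazy quantifier stops at the first terminator.
  const re = new RegExp(
    "(?:\\w+\\s+){" + (minWords - 1) + ",}?\\w+[.?]", "g");
  const hits = [];
  let m;
  while ((m = re.exec(text)) !== null) {
    hits.push({
      index: m.index,                    // where the match starts in `text`
      words: m[0].split(/\s+/).length,   // word count of the match
      text: m[0]
    });
  }
  return hits;
}

// Example input: one short sentence, then one 21-word sentence.
const shortSentence = "This one is short.";
const longSentence =
  Array.from({ length: 21 }, (_, i) => "word" + (i + 1)).join(" ") + ".";
console.log(findLongSentences(shortSentence + " " + longSentence, 20));
```

Each hit carries a start index and the matched text, which is what a later "select" or "style" step would need.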

and here's another way - with a JavaScript function which returns a list of long sentences and their indices (and word counts):

Long sentences.kmmacros (21.5 KB)

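(function (strText, lngThreshold) {

  // Naive split: every period is treated as the end of a sentence.
  var lstSentences = strText.split(/\./);

  // Keep only the sentences whose word count exceeds the threshold.
  return JSON.stringify(
    lstSentences.reduce(
      function (a, s, i) {
        // Note: a leading space yields an empty first token, so the
        // count can be one higher than the true number of words.
        var lngWords = s.split(/\s+/).length;

        return lngWords > lngThreshold ? a.concat({
          sentence: i + 1,
          words: lngWords,
          text: s
        }) : a;
      }, []
    ), null, 2);
})(
  // The original arguments were lost from this post. "strText" is a
  // placeholder name for the KM variable that holds the text.
  Application("Keyboard Maestro Engine").getvariable("strText"),
  20
);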

JSON object output, which could be used at the selection stage:

[
  {
    "sentence": 1,
    "words": 47,
    "text": "During dinner, Mr Bennet scarcely spoke at all; but when the servants\nwere withdrawn, he thought it time to have some conversation with his\nguest, and therefore started a subject in which he expected him to\nshine, by observing that he seemed very fortunate in his patroness"
  },
  {
    "sentence": 5,
    "words": 46,
    "text": " The subject elevated him\nto more than usual solemnity of manner, and with a most important aspect\nhe protested that \"he had never in his life witnessed such behaviour in\na person of rank--such affability and condescension, as he had himself\nexperienced from Lady Catherine"
  },
  {
    "sentence": 6,
    "words": 24,
    "text": " She had been graciously pleased to\napprove of both of the discourses which he had already had the honour of\npreaching before her"
  },
  {
    "sentence": 7,
    "words": 30,
    "text": " She had also asked him twice to dine at Rosings,\nand had sent for him only the Saturday before, to make up her pool of\nquadrille in the evening"
  },
  {
    "sentence": 8,
    "words": 21,
    "text": " Lady Catherine was reckoned proud by many\npeople he knew, but _he_ had never seen anything but affability in her"
  },
  {
    "sentence": 9,
    "words": 45,
    "text": "\nShe had always spoken to him as she would to any other gentleman; she\nmade not the smallest objection to his joining in the society of the\nneighbourhood nor to his leaving the parish occasionally for a week or\ntwo, to visit his relations"
  },
  {
    "sentence": 10,
    "words": 57,
    "text": " She had even condescended to advise him to\nmarry as soon as he could, provided he chose with discretion; and had\nonce paid him a visit in his humble parsonage, where she had perfectly\napproved all the alterations he had been making, and had even vouchsafed\nto suggest some herself--some shelves in the closet up stairs"
  }
]

Thank you both!! I will try to get this up and running and let you know how I made out after a bit.

I think JavaScript might be the way to go here… At the end of the day this is what I am after.

I want it to scan over the selected text then highlight any sentences over 20 words long.

The window was nice for seeing which sentences had 20+ words, but some of the projects might be 20 pages long. So I would just want it to highlight the whole sentence that has the 20+ words in it.

Any way to do that?

Here is something for MS Word, but I am on a Mac and want it to work across multiple applications.

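Sub ScratchMacro()
    Dim oSent As Range
    ' Highlight sentences over 20 words in red; clear the highlight on the rest.
    For Each oSent In ActiveDocument.Sentences
        If oSent.Words.Count > 20 Then
            oSent.HighlightColorIndex = wdRed
        Else
            oSent.HighlightColorIndex = wdNoHighlight
        End If
    Next oSent
End Sub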

Also, this regex, (\w+\s+){19,}?(\w+[.|?]), does work great, but I am not sure how to complete it so that it highlights every sentence over a set number of words.

What is

  • your end goal (what will you do with "highlighted" sentences that have ≥20 words)?
  • your interim goal:
      • highlight all at once?
      • highlight each in turn?
      • mark each in some other way so you can see/select them?

No need to provide a solution that doesn't meet both your stated need and your actual workflow. What exactly

  • do you mean by "highlight"
  • what will you do with the "highlighted" sentences?

The regex can be used in KM to produce a list of matches that you step through, processing each in turn.

I am just learning KM and regex. Regex is, afaict, a subset of KM's functions, whereas AppleScript is an addition. Your end goal will likely determine whether it is better to use KM with its built-in functions, or use AppleScript (imho).

Note that your OP fudges this:

You can't "find sentences" and then "highlight that sentence".

Yes I would like all to be highlighted at once.

I don’t need to select them, just see where they are so I can edit them.
My end goal is to make sure all sentences are 20 words or less. Once they are highlighted I would go back through and change the wording to make sure I hit the goal of 20 words or less.

… but highlighting (I assume you mean “select”) is lost once you move the insertion point.

Perhaps it is better to style the sentences with ≥20 words in some way? Red? Bold? (I really don’t know. That’s why I asked for your end goal.)

What I mean by highlighted is the background color of the sentence would be red if over 20 words.

I don’t want it selected for the reason you said.

Well, here's an interim solution :blush: : (er ... image corrected ... sorry)

I like your output here, and from what others have said I am convinced this is the way I will go.

That said, I like how your output looks, but when I use your code mine looks like this. It is all grouped together.

How can I fix this?

[{"sentence":6,"words":25,"text":" If a caller has been parked for a longer time than the specified time limit then Asterisk will again ring the originally dialed extension"},{"sentence":9,"words":60,"text":"conf:\n include => parkedcalls\n\nIf you have a more complex dialplan and want to be able to Goto() a more elaborate 'parkedcalls' handler then you'll need to be sure to include a handler for the 'i' priority to catch calls to parkinglot without call in them as well as the 's' priority to give timeouts somewhere to go, thus:\n\n "},{"sentence":18,"words":26,"text":"\n include => parkedcalls\n exten => i,1,Playback(pbx-invalidpark)\n exten => i,2,Hangup\n\nNotes\nIt won't show the parking extension when you do a show dialplan in the CLI"},{"sentence":21,"words":30,"text":"\nThe user must be allowed to do transfers for being able to park a call, so check the t and T options of the Dial() command\nFor Asterisk 1"},{"sentence":23,"words":39,"text":"x, the user must be allowed to use call parking by adding the k and/or K options of the Dial() command\nAsterisk-based transfers only work if Asterisk in the media path (which can be enforced through "canreinvite=no" in sip"}]

This is extremely simplistic, but it shows how easy the task is in Tex-Edit Plus.

# Highlight Lines with More than 20 Words.
tell application "System Events" to keystroke "c" using {command down}
delay 0.05
set theText to the clipboard

tell application "Tex-Edit Plus"
  make new document with properties {contents:theText}
  tell front document
    set text width to wide open
    set lineCount to count of lines
    set theLineContent to get lines
    repeat with i from 1 to lineCount
      if length of (words of (item i of theLineContent)) > 20 then
        set hilite color of line i to red
      end if
    end repeat
  end tell
end tell

I would not use AppleScript’s words. I’d either use text items or a regular expression depending upon what James really means by a sentence.

I would probably insert a marker at the beginning of each problematic sentence, so I could rapidly get to them by using G.

There are a lot of ways to go about this job but only two apps I know of that can easily handle highlighting text – Tex-Edit Plus & Microsoft Word.


You can choose between the compact and indented views by using or omitting an argument to the JSON.stringify() method:

, null, 2);

creates an indented version (each indent 2 spaces), whereas:

, null);

yields the default compact layout.
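
A quick way to see the difference side by side, as a minimal sketch (the sample object is made up for illustration):

```javascript
// Sketch only: the same data serialized with and without the third
// ("space") argument of JSON.stringify. The sample data is hypothetical.
const sample = [{ sentence: 6, words: 25 }];

const compact = JSON.stringify(sample, null);
const indented = JSON.stringify(sample, null, 2);

console.log(compact);
// [{"sentence":6,"words":25}]

console.log(indented);
// [
//   {
//     "sentence": 6,
//     "words": 25
//   }
// ]
```

Both strings parse back to the same data; only the whitespace differs.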

Hi All!

This has been working very well the past year. Thank you!

I am trying to update it a little more and can’t seem to understand how to do so.

What I would like to add is the following:

  1. If a sentence ends with a colon (:), count that as the end of the sentence. Currently it includes everything after the colon until it sees a (.) or (?).

  2. I would also like a blank line between paragraphs to end the counting of that sentence.

(Example (the following was copied and pasted from the internet):

Put simply, all of these different types of paragraphs simply involve layering on a different purpose or intent. When students have the right foundation, it’s just that simple

What are you trying to achieve in this paragraph and in your whole composition? What is your purpose right here? Do you wish to describe? Do you want to evaluate? Is your goal to narrate? Is your intent to persuade?)

Because the first paragraph is missing a period, it is treated as part of the next paragraph. How can I tell it not to do that?

At the very end of the day I really need the 1st one in my list (colon) to work. The second is a want.

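Wanted to let you know that I got it working by adding a newline to the split pattern (the period has to stay escaped, otherwise “.” matches any character):

var lstSentences = strText.split(/\.|\n/);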
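
For the colon and blank-line requests above, one possible variation is sketched below. It is not the macro from this thread: the names longSentences and threshold are made up, the splitter treats ., ? and : as sentence ends (so it will also split on a colon in "1:30" or a URL), and it trims each piece before counting, so counts can be one lower than the earlier function reports.

```javascript
// Sketch only: split on ., ? or :, or on a blank line between
// paragraphs, then report pieces longer than `threshold` words.
// Function and parameter names are hypothetical.
function longSentences(text, threshold) {
  return text
    .split(/[.?:]|\n\s*\n/)
    .map(function (s, i) {
      var trimmed = s.trim();
      // An empty piece has zero words; otherwise count whitespace-separated tokens.
      var words = trimmed === "" ? 0 : trimmed.split(/\s+/).length;
      return { sentence: i + 1, words: words, text: trimmed };
    })
    .filter(function (item) {
      return item.words > threshold;
    });
}

// Example: a paragraph with no closing period, a blank line, then a
// 21-word clause that ends in a colon.
var clause = Array.from({ length: 21 }, function (_, i) {
  return "w" + (i + 1);
}).join(" ");
console.log(JSON.stringify(
  longSentences("Short line with no period\n\n" + clause + ": tail here.", 20),
  null, 2));
```

With this split, the paragraph that lacks a period is no longer merged into the one that follows it.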