Analyze Contents of Text Files

Is it possible, using KBM, to analyze text files and report the number of times each found common noun is used?

Of course, although I'm not sure whether your question is meant to exclude actions such as "Execute Shell Script." Here's how I would do it:

[Image: screenshot of the macro]

You place your file in the first box. The 'tr' command then replaces every space with a newline, so each word ends up on its own line, and the grep command counts the lines that contain the word Maestro.
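Since the screenshot does not survive here, this is a rough sketch of what such a shell script could look like, not necessarily the exact contents of the action. The function name count_word and the file path in the usage line are made up for illustration.

```shell
# Hypothetical helper: count the space-separated words in a file that
# contain a given string, ignoring case.
count_word() {
  # $1 = word to look for, $2 = path to the text file
  # tr puts each space-separated word on its own line;
  # grep -c counts matching lines, -i ignores case.
  tr ' ' '\n' < "$2" | grep -ci -- "$1"
}
```

In an Execute Shell Script action you would call it as something like count_word Maestro "$HOME/Documents/notes.txt" (a placeholder path).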

There are some ways to fine-tune this. For example, it may not see "Maestro+Maestro" as two counts of the same word, because there is no space between these words. So my solution is simply a point for discussion and possible improvement. My solution does, however, account for case differences, like "maestro" and "Maestro."
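One possible refinement of the tr/grep approach, offered as a sketch: split on every non-letter character instead of only on spaces, then count exact matches. The function name count_exact_word is hypothetical. Note that this treats "Maestro's" as "Maestro" plus "s", and it no longer counts longer words such as "Maestros".

```shell
# Hypothetical helper: count whole-word occurrences, ignoring case.
count_exact_word() {
  # $1 = word to look for, $2 = path to the text file
  # tr -c complements the set (everything that is NOT a letter) and
  # -s squeezes each run of such characters into a single newline.
  # grep -x only counts lines that match the word exactly.
  tr -cs '[:alpha:]' '\n' < "$2" | grep -cix -- "$1"
}
```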

If you put the "word" you want into a variable, then I think you can simply replace Maestro above with $KMVAR_YourVariableName.
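As a sketch of that substitution: Keyboard Maestro passes its variables to Execute Shell Script actions as environment variables with a KMVAR_ prefix. YourVariableName is just the placeholder name from above, and count_km_word is a hypothetical function name.

```shell
# Hypothetical helper: count words containing the string stored in the
# Keyboard Maestro variable "YourVariableName" (placeholder name).
count_km_word() {
  # $1 = path to the text file
  # The quotes keep a multi-word value together; the :? form aborts with
  # a message if the variable is unset or empty, since an empty pattern
  # would match every line.
  tr ' ' '\n' < "$1" |
    grep -ci -- "${KMVAR_YourVariableName:?Keyboard Maestro variable is empty}"
}
```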

Thanx, Airy. However, I am not looking for a particular word, just the most-used common nouns, whatever they may be, and I do not have a list of those words.

So I thought perhaps I could use KBM to control the flow and AppleScript to count and report the findings back to KBM, i.e. the three most-used common nouns in each file that gets analyzed.

I am trying to get the AppleScript to work, but it returns errors and won't compile. Here's what I have so far:

use framework "Foundation"

property directoryPath : "/Path/To/TextFiles"

tell application "Finder"
    set fileList to selection as list
end tell

repeat with eachFile in fileList
    set fileContents to (read text from file eachFile)
    set commonNouns to {}

    -- Extract common nouns using a regular expression
    set commonNounPattern to "(?<!\S)[A-Z][a-z]*(?=\s|$)"
    set commonNouns to (every match of commonNounPattern in fileContents)

    -- Count common noun occurrences
    set commonNounCounts to {}
    repeat with eachNoun in commonNouns
        set nounCount to (value of key eachNoun in commonNounCounts)
        if nounCount is equal to missing value then
            set nounCount to 1
        else
            set nounCount to nounCount + 1
        end if
        set value of key eachNoun in commonNounCounts to nounCount
    end repeat

    -- Sort by count
    set sortedCounts to (items of commonNounCounts) as list
    sort sortedCounts using {key: "value", ascending: false}

    -- Extract top three common nouns
    set topThreeNouns to (items 1 through 3 of sortedCounts)
    set topThreeNounsText to (text items of topThreeNouns as list, using ", ")

    -- Display results
    display topThreeNounsText as "Top 3 common nouns in " & name of eachFile
end repeat

I can't give advice about AppleScript, but I think I have a much simpler way to solve your problem. If you run this single KM action, you will see a list of the five most common words in your last message to me, with their frequencies. Shell tools are very mature and powerful. Maybe you'll be satisfied with this, or maybe you won't.

Count Words Macro (v11.0.1)

Count Words.kmmacros (2.6 KB)
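For readers who cannot open the attachment, a word-frequency count of this kind is commonly written as the pipeline below. This is a sketch of the general technique and an assumption about the approach, not the actual contents of the macro; the function name top_words is made up.

```shell
# Hypothetical helper: list the most frequent whitespace-separated words.
top_words() {
  # $1 = path to the text file, $2 = how many words to list (default 5)
  tr -s '[:space:]' '\n' < "$1" |  # one word per line
    grep . |                       # drop empty lines
    sort |                         # group identical words together
    uniq -c |                      # prefix each distinct word with its count
    sort -rn |                     # highest count first
    head -n "${2:-5}"
}
```

Each output line is a count followed by the word. Punctuation and capitalization are left alone here, so "Word", "word" and "word," are counted separately.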

If you like this solution, you may want to modify it to account for things like punctuation, capitalization, etc. This is just a very simple solution that may not take everything you want into account. It can be improved.
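One way such improvements might look, as a sketch: fold everything to lowercase, split on any non-letter, and skip a short list of filler words. The function name top_words_clean and the STOP_WORDS list are assumptions for illustration. A stop-word list only removes obvious non-nouns; it does not actually identify common nouns, which would need a part-of-speech tagger.

```shell
# Hypothetical, deliberately short list of words to ignore.
STOP_WORDS='the a an and or of to in is it i you that this for with'

# Hypothetical helper: most frequent words after basic clean-up.
top_words_clean() {
  # $1 = path to the text file, $2 = how many words to list (default 5)
  tr '[:upper:]' '[:lower:]' < "$1" |  # ignore capitalization
    tr -cs '[:alpha:]' '\n' |          # split on anything that is not a letter
    awk -v stop="$STOP_WORDS" '
      BEGIN { n = split(stop, s, " "); for (i = 1; i <= n; i++) skip[s[i]] = 1 }
      length($0) > 0 && !($0 in skip)  # drop empty lines and stop words
    ' |
    sort | uniq -c | sort -rn | head -n "${2:-5}"
}
```

Calling it with 3 as the second argument gives the top three remaining words for one file.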

Thanx, Airy... I'll give that a try!

Perhaps also worth looking at DEVONthink for that kind of thing?

DEVONthink : Analyze Text Documents

DEVONthink Inspectors : Concordance

Do a search here for Text Toolbox. It does that and more.