Numerically sort clipboard contents

The amount of pages in a manual is rarely an indicator of the power or the usefulness of a tool :wink:
Awk shines when it comes to structured data, or better: data that can be structured. Recognizing fields in an record and then attemptimg to work with the fields (which includes the separators) is always better than treating the separators as mere strings and brute-forcely replacing them via regex. I think.

But, I think, you can achieve the same with Perl without regexes (i.e. only by manipulating separators). Beyond my current Perl knowledge :wink:

Concerning the gawk on macOS, I’m not sure. Most likely macOS has just awk, who knows which version. Have to look it up. I got a related problem here.


Edit:

No, gawk, as expected, comes only when installed via Homebrew or similar. But I don’t know the differences between gawk and awk, and I don’t know if awk/gawk scripts tend to use only the specific implementation (gawk or awk) or if they generously default to the one that is installed. (I have no practice with awk scripting.)

1 Like

To pick up a part of your quote:

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest.

That’s what I meant with “data that can be structured”. I tend to use Awk for that. Although “in general” I prefer Perl for text (not for data) manipulation. But applying regexes while clearly a data structure is present (delimeters, fields, etc.) feels a bit clunky/forced/out-of-place. But I’m sure Perl also has Awk-like capabilities (if not better!) to handle fields and such. I just don’t know them.

1 Like

Yes, I started out (ie, several decades ago) using sed & awk for text processing. But since perl does everything they do and more, I tend to use perl directly instead of sed or awk for most tasks.

For example, a close approximation to @Tom’s:

awk  '$1=$1'  RS="" FS="\n" OFS=", "

Might be something like this:

perl -e 'use English; $OFS = ", "; @F = <>; chomp(@F); print @F'

But in perl you would probably write it more like this:

perl -e '@F = <>; chomp(@F); print join( ", ", @F )'

Basically, sed and awk are very useful tools, but they tend to be limited in what they can do, and when you hit the limit then you have to start again from scratch, which is why I tend to just use perl, which can do the sort of text processing that sed and awk can do, but then extends out to anything else as needed.

But just like @ComplexPoint reaches for JavaScript and @ccstone reaches for AppleScript, we all tend to use whatever tools we are most familiar with to solve any given task.

3 Likes

Yep, something like that is what I had in mind as I said “But I’m sure Perl also has Awk-like capabilities (if not better!) to handle fields and such. I just don’t know them.” :slight_smile:

Thanks for showing. Perl is just awesome!

2 Likes

Actually my first effort was:

read -r -d '' numList <<'EOF'
77
74
32
EOF

echo "$numList" | sort -n | awk 'BEGIN {RS=""}{gsub(/\n/,", ",$0);print $0};'

Result  -->  32, 74, 77

:sunglasses:

But here's why I would normally reach for AppleScript and the Satimage.osax:

----------------------------------------------------------------
# REQUIRES: Satimage.osax --> http://tinyurl.com/satimage-osaxen
# AppleScript and the Satimage.osax are fully Unicode-aware.
----------------------------------------------------------------
# Make sure the clipboard has what we want on it:
set the clipboard to text 2 thru -2 of "
77
74
32
"
----------------------------------------------------------------
--» Main
----------------------------------------------------------------
set numList to join (sortlist (find text "\\d+" in (get the clipboard) with regexp, all occurrences and string result) comparison 2) using ", "
----------------------------------------------------------------

--> "32, 74, 77"

----------------------------------------------------------------

I really like sed and awk, but they are fairly obsolete (GNU sed and GNU awk notwithstanding), because they don't handle Unicode text very adroitly.

So – my recommendation is:

Learn as much sed and awk as you want – particularly snippets that are useful.

But if you want to seriously study something then study Perl. It's Unicode-aware and many times more powerful than sed and awk combined.

-Chris

2 Likes

Updated 2018/08/11 18:50 CDT
Fixed the problem Peter noted in post #27.


Hey Folks,

Might as well post a Perl-only solution.

This one will remove any blank lines in the imput.

-Chris


Sort Numeric Column -- Transform to '- ' Separated Values.kmmacros (5.4 KB)

I believe Perl’s sort sorts alphabetically, so that would sort 100 before 22. You need to add { $a <=> $b } or something like that to the sort.

Hey Peter,

Bleep! You're quite right.

Thanks – I'll fix that later today.

Right now I'm knackered and require a nap. 😴

-Chris

Of all of the bash solutions, this script is the most readable to me.
Of course, readability depends in large part on the reader's knowledge of the language, and I'm clearly a novice at bash.

But I do enjoy RegEx, so it is fairly easy for me to read this script, and make changes to it if need be.

However, we have a number of good choices posted here, so you can pick the one whose language you understand/like the best.

You find

sort -n | perl -pe 's/\n/, /' | perl -pe 's/, $//'

more readable than

printf '%s, ' $(cat | sort -n) | rev | cut -c 3- | rev

?

I don't feel a Satimage scripting addition was especially needed in this situation:

set the clipboard to "
77
74
32
"

set my text item delimiters to {", ", linefeed}
text items of (the clipboard)'s words as text

Hey @CJK,

I didn't say it was, but your script doesn't sort the numbers – mine does.

Mine will also scrape numbers out of other text and is therefore a bit more versatile.

-Chris

Good point. I jumped in too early thinking that it was just the latter part of the puzzle we were solving again (i.e. concatenation). Egg on my face.

1 Like

Yes. I thought I explained why. I already know some RegEx so it is easy for me to read the perl commands. Whereas I don't know any of the other commands you used (other than sort).

For those that don't know RegEx, the perl probably looks like gibberish. :wink:

BTW, I didn't say one was better than the other -- just which one was more readable to me. YMMV.

Hey Folks,

I've update my macro in post #26 to fix the issue Peter noted in post #27.

Lexical sort changed to ascending numeric sort.

-Chris

That makes sense.

Oh, I know. I'm not butthurt. Just quizzical. I try to keep ego out of these things, so don't be concerned.

Perl will let you change the field-separators in the search command.

I don't like the “leaning-toothpick” appearance of the forward-slash, so I usually use the exclamation point like so:

sort -n | perl -pe 's!\n!, !' | perl -pe 's!, $!!'

I find this more readable.

Quite a few characters are available for this – amongst peoples favorites are:

s#search-pattern#replace-pattern#
s~search-pattern~replace-pattern~
s|search-pattern|replace-pattern|

Just another one of the niceties in Perl.

(Sed also permits this.)

** PersonalIy don't like using the pipe character, since it so often is part of a regular expression – but some people swear by it.

-Chris

Hey Folks,

Here's a version using the shell sort command and a single invocation of Perl which does multiple search/replace operations.

I've mixed and matched the search-command-separators, because I use a negative-lookahead in the last pattern (which contains a “!” character.

The -0777 command line option changes the Perl input line separator to undefined – which lets us slurp STDIN and feed all the lines to Perl in one go.

(Perl normally operates on one line at a time.)


Perl in the Shell – Search-Replace multiple times in a Single Invocation of Perl.kmmacros (4.9 KB)

You could write the shell script this way to make it more readable:

sort -n | 
perl -0777 -e '
$_ = <>;
s!\A\s+!!s;
s#\s(?!\Z)#, #g;
print;
'

-Chris

and for any who might feel more at home with an Applescript translation of the original solution:

Paste copied col of numbers as comma-delimited sorted row (AS).kmmacros (22.2 KB)
asVersion

Applescript source

use AppleScript version "2.4"
use framework "Foundation"
use scripting additions


on run
    
    script pageNum
        property digit : my isDigit
        on |λ|(para)
            set s to strip(para)
            if 0 < length of s and all(digit, (characters of s)) then
                {s as integer}
            else
                {}
            end if
        end |λ|
    end script
    
    intercalateS(", ", ¬
        sort(concatMap(pageNum, ¬
            paragraphs of (the clipboard))))
end run


-- GENERIC FUNCTIONS ------------------------------------------------------

-- https://github.com/RobTrew/prelude-applescript

-- Applied to a predicate and a list, `all` determines if all elements 
-- of the list satisfy the predicate.
-- all :: (a -> Bool) -> [a] -> Bool
on all(f, xs)
    tell mReturn(f)
        set lng to length of xs
        repeat with i from 1 to lng
            if not |λ|(item i of xs, i, xs) then return false
        end repeat
        true
    end tell
end all

-- concatMap :: (a -> [b]) -> [a] -> [b]
on concatMap(f, xs)
    set lng to length of xs
    if 0 < lng and class of xs is string then
        set acc to ""
    else
        set acc to {}
    end if
    tell mReturn(f)
        repeat with i from 1 to lng
            set acc to acc & |λ|(item i of xs, i, xs)
        end repeat
    end tell
    return acc
end concatMap

-- drop :: Int -> [a] -> [a]
on drop(n, xs)
    if n < length of xs then
        if text is class of xs then
            text (n + 1) thru -1 of xs
        else
            items (n + 1) thru -1 of xs
        end if
    else
        {}
    end if
end drop

-- dropWhile :: (a -> Bool) -> [a] -> [a]
-- dropWhile :: (Char -> Bool) -> String -> String
on dropWhile(p, xs)
    set lng to length of xs
    set i to 1
    tell mReturn(p)
        repeat while i ≤ lng and |λ|(item i of xs)
            set i to i + 1
        end repeat
    end tell
    drop(i - 1, xs)
end dropWhile

-- dropWhileEnd :: (a -> Bool) -> [a] -> [a]
-- dropWhileEnd :: (Char -> Bool) -> String -> String
on dropWhileEnd(p, xs)
    set i to length of xs
    tell mReturn(p)
        repeat while i > 0 and |λ|(item i of xs)
            set i to i - 1
        end repeat
    end tell
    take(i, xs)
end dropWhileEnd

-- intercalateS :: String -> [String] -> String
on intercalateS(sep, xs)
    set {dlm, my text item delimiters} to {my text item delimiters, sep}
    set s to xs as text
    set my text item delimiters to dlm
    return s
end intercalateS

-- isDigit :: Char -> Bool
on isDigit(c)
    set n to (id of c)
    48 ≤ n and 57 ≥ n
end isDigit

-- isSpace :: Char -> Bool
on isSpace(c)
    set i to id of c
    i = 32 or (i ≥ 9 and i ≤ 13)
end isSpace

-- min :: Ord a => a -> a -> a
on min(x, y)
    if y < x then
        y
    else
        x
    end if
end min

-- Lift 2nd class handler function into 1st class script wrapper 
-- mReturn :: First-class m => (a -> b) -> m (a -> b)
on mReturn(f)
    if class of f is script then
        f
    else
        script
            property |λ| : f
        end script
    end if
end mReturn

-- sort :: Ord a => [a] -> [a]
on sort(xs)
    ((current application's NSArray's arrayWithArray:xs)'s ¬
        sortedArrayUsingSelector:"compare:") as list
end sort

-- strip :: String -> String
on strip(s)
    script isSpace
        on |λ|(c)
            set i to id of c
            i = 32 or (i ≥ 9 and i ≤ 13)
        end |λ|
    end script
    dropWhile(isSpace, dropWhileEnd(isSpace, s))
end strip

-- take :: Int -> [a] -> [a]
on take(n, xs)
    if class of xs is string then
        if n > 0 then
            text 1 thru min(n, length of xs) of xs
        else
            ""
        end if
    else
        if n > 0 then
            items 1 thru min(n, length of xs) of xs
        else
            {}
        end if
    end if
end take