Numerically sort clipboard contents

Peter, thanks for the cool script.
It sorts fine, but the format is a bit off.
How can I adjust to make it consistent, with a ", " (comma space) after every number?

If the space is not essential, you can just do this (no need for egrep):

sort -n | paste -sd, -
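
For anyone who wants to try that pipeline outside the macro, here is a minimal sketch. The sample numbers and the `joined` variable are made up for illustration; in the macro the numbers would arrive on stdin.

```shell
# Illustrative input: three numbers, one per line.
joined=$(printf '77\n74\n32\n' | sort -n | paste -sd, -)
echo "$joined"   # 32,74,77
```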

If the space following the comma is essential, that's a bit trickier. Maybe something like this:

csv=$(cat | sort -n | paste -sd, -)
osascript <<< "set my text item delimiters to {\", \", \",\"}
text items of \"$csv\" as text"

or:

csv=$(cat | sort -n | paste -sd, -)
osascript -l JavaScript <<< "var csv=\"$csv\"; csv.replace(/,/g,\", \");"

but maybe that's cheating.

Surely there is a simple Bash command to replace "\n" with ",␣" ?
(what's the best symbol for SPACE?)
EDIT: "␣" is the symbol suggested in "What character can I use to represent the space bar?" on User Experience Stack Exchange.

Sorry, my mistake, paste -d option does not work like I thought, it rotates through the characters.

So you'd have to follow the paste -d, with yet another pipe and replace the commas with comma-space.
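
To see the rotation in action, a small sketch with made-up input (the `cycled` variable is just for illustration):

```shell
# paste treats the -d argument as a list of delimiters and cycles through it,
# so ', ' alternates between a comma and a space instead of emitting both.
cycled=$(printf '1\n2\n3\n4\n' | paste -sd ', ' -)
echo "$cycled"   # 1,2 3,4
```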

You would think. Part of the issue is not wanting to include the trailing , which is why paste works well.

So a good solution would be:

paste -sd, - | sed 's/,/, /g'
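
Put together with the sort, that might look like the sketch below. The sample input and the `fixed` variable are assumptions for the sake of a runnable example.

```shell
# Join with plain commas first, then widen each comma to comma-space.
fixed=$(printf '77\n74\n32\n' | sort -n | paste -sd, - | sed 's/,/, /g')
echo "$fixed"   # 32, 74, 77
```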

Alternatively, perl has no problem changing the trailing \n into a comma-space.

perl -pe 's/\n/, /'

And some of this comes back to the discussion about a line ending character at the end of the text. I contend that normal multiline text ends with a linefeed. Either way, you need to consider the end of the text and how it behaves when you replace text. \n will only match a linefeed, whereas $, \z, or \Z will match at the end of the last line even if it has no trailing linefeed; but they are all zero-width assertions, so they can't replace the linefeed.

Traditionally I have used a triangle (eg △), and word processors with “show invisibles” typically use a grey middle dot (eg ·) (BBEdit uses this for example). But realistically if you don't write it out no one is going to know what you mean.

This is harder than it looks.
This
image

produces this:
image

Perfect. . . except for the last comma.

Being ignorant in Bash, I resorted to JavaScript:

image

image

I am generally not a fan of compacting code just for the sake of doing so, but if we limit the input to the clipboard, the JavaScript could be:

var app = Application.currentApplication()
app.includeStandardAdditions = true

// Split on line endings, parse each line as an integer, and drop blanks (NaN).
var sourceList = app.theClipboard().split(/[\r\n]/).map(e => parseInt(e, 10)).filter(n => !isNaN(n));
sourceList.sort(sortNumber).join(', ');

function sortNumber(a,b) { return a - b; }

In case anyone is following the RegEx from the other thread, JavaScript does NOT support \R. So we have to use workarounds like [\r\n], which really should be:
\r?\n|\r

Where is @Tom? He is always really good at these games. :smile:

Yes, that was why paste was good, because it avoids that.

You can also remove the trailing “, ” with:

perl -pe 's/, $//'

so

perl -pe 's/\n/, /' | perl -pe 's/, $//'

But yes, a surprisingly complicated problem to get right. And a good reminder that you really have to know what the limitations on the input are and what the desired output is.

Not simple, but not hideous either, I finally came up with this:

printf '%s, ' $(cat | sort -n) | rev | cut -c 3- | rev
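
The trick here is that printf reuses its format string for every argument, and rev | cut -c 3- | rev drops the last two characters. A possible variant, not from the posts above, trims the trailing ", " with shell parameter expansion instead; the sample input and variable names are assumptions:

```shell
# Word splitting of $sorted is intentional: each number becomes one printf argument.
sorted=$(printf '77\n74\n32\n' | sort -n)
joined=$(printf '%s, ' $sorted)   # "32, 74, 77, "
joined=${joined%, }               # strip the trailing ", "
echo "$joined"                    # 32, 74, 77
```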

Bingo! that did the trick. The complete script is:

sort -n | perl -pe 's/\n/, /' | perl -pe 's/, $//'

Actually, that's simple enough. :+1:

I've got to learn Perl RegEx.


Bingo! Nice job.

Using Peter's Bash script from above, this should do the trick:

Example Output:

image


MACRO:   Sort Numbers on Clipboard and Format Output [Example] @Bash


#### DOWNLOAD:
Sort Numbers on Clipboard and Format Output [Example] @Bash.kmmacros (3.0 KB)
**Note: This Macro was uploaded in a DISABLED state. You must enable before it can be triggered.**

---



I would also use "␣". Or, maybe easier to understand: "<space>"

Treating the stuff in-between the numbers as field separators (what they actually are), you can use Awk instead of regex replacements:

sort -n | awk  '$1=$1'  RS="" FS="\n" OFS=", "

This also eliminates the problem with the trailing comma: with RS="" the newline that terminates the record is not part of it, so awk does not see a superfluous field separator (or an empty last field) at the end.


Explanation:

With the variables at the end you set the different separators:

  • RS ‘Record Separator’: empty, which makes awk split records at blank lines; the sorted input has none, so the whole text is one record
  • FS ‘[the original] Field Separator’: line feed
  • OFS ‘Output Field Separator’: the desired comma plus space (", ")

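$1=$1 is needed to force awk to rebuild the record, which is the step that applies OFS between the fields.

An explicit print is not needed: the assignment is used as a pattern without an action, and when such a pattern evaluates as true awk prints the record by default. So, what awk actually does, is simply this: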

  1. Learning the existing separators (RS, FS)
  2. Returning the whole record using the new field separator (OFS)
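
The steps above can be tried with a minimal sketch like the following. The sample input and variable names are made up. The second command is a variant that is not from the post: a bare $1=$1 pattern is false when the first field is 0, so it moves the assignment into an action and prints unconditionally.

```shell
# Original form: the bare pattern $1=$1 rebuilds the record with OFS and,
# because the result is truthy here, triggers awk's default print.
tom=$(printf '77\n74\n32\n' | sort -n | awk '$1=$1' RS="" FS="\n" OFS=", ")
echo "$tom"      # 32, 74, 77

# Variant (an assumption, not from the thread): assign inside an action and
# print with the always-true pattern 1, so a leading 0 is not swallowed.
robust=$(printf '5\n0\n3\n' | sort -n | awk '{ $1 = $1 } 1' RS="" FS="\n" OFS=", ")
echo "$robust"   # 0, 3, 5
```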

awk is my favourite instrument in the shell cabinet – always repays a little experimentation.

See - Effective Awk Programming – 4th edition

Wow! I have rarely, if ever, used awk, but it is evidently a major tool, with over 500 pages in its PDF manual:

GNU implementation: gawk

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest. To write a program to do this in a language such as C or Pascal is a time-consuming inconvenience that may take many lines of code. The job is easy with awk, especially the GNU implementation: gawk.

Does the macOS use "GNU awk"?

The number of pages in a manual is rarely an indicator of the power or the usefulness of a tool :wink:
Awk shines when it comes to structured data, or better: data that can be structured. Recognizing fields in a record and then working with the fields (which includes the separators) is always better than treating the separators as mere strings and replacing them by brute force via regex. I think.

But, I think, you can achieve the same with Perl without regexes (i.e. only by manipulating separators). Beyond my current Perl knowledge :wink:

Concerning the gawk on macOS, I’m not sure. Most likely macOS has just awk, who knows which version. Have to look it up. I got a related problem here.


Edit:

No, gawk, as expected, comes only when installed via Homebrew or similar. But I don’t know the differences between gawk and awk, and I don’t know if awk/gawk scripts tend to use only the specific implementation (gawk or awk) or if they generously default to the one that is installed. (I have no practice with awk scripting.)

To pick up a part of your quote:

If you are like many computer users, you would frequently like to make changes in various text files wherever certain patterns appear, or extract data from parts of certain lines while discarding the rest.

That’s what I meant by “data that can be structured”. I tend to use Awk for that. Although “in general” I prefer Perl for text (not for data) manipulation. But applying regexes while clearly a data structure is present (delimiters, fields, etc.) feels a bit clunky/forced/out-of-place. But I’m sure Perl also has Awk-like capabilities (if not better!) to handle fields and such. I just don’t know them.

Yes, I started out (ie, several decades ago) using sed & awk for text processing. But since perl does everything they do and more, I tend to use perl directly instead of sed or awk for most tasks.

For example, a close approximation to @Tom’s:

awk  '$1=$1'  RS="" FS="\n" OFS=", "

Might be something like this:

perl -e 'use English; $OFS = ", "; @F = <>; chomp(@F); print @F'

But in perl you would probably write it more like this:

perl -e '@F = <>; chomp(@F); print join( ", ", @F )'

Basically, sed and awk are very useful tools, but they tend to be limited in what they can do, and when you hit the limit then you have to start again from scratch, which is why I tend to just use perl, which can do the sort of text processing that sed and awk can do, but then extends out to anything else as needed.

But just like @ComplexPoint reaches for JavaScript and @ccstone reaches for AppleScript, we all tend to use whatever tools we are most familiar with to solve any given task.

Yep, something like that is what I had in mind as I said “But I’m sure Perl also has Awk-like capabilities (if not better!) to handle fields and such. I just don’t know them.” :slight_smile:

Thanks for showing. Perl is just awesome!

Actually my first effort was:

read -r -d '' numList <<'EOF'
77
74
32
EOF

echo "$numList" | sort -n | awk 'BEGIN {RS=""}{gsub(/\n/,", ",$0);print $0};'

Result  -->  32, 74, 77

:sunglasses:

But here's why I would normally reach for AppleScript and the Satimage.osax:

----------------------------------------------------------------
# REQUIRES: Satimage.osax --> http://tinyurl.com/satimage-osaxen
# AppleScript and the Satimage.osax are fully Unicode-aware.
----------------------------------------------------------------
# Make sure the clipboard has what we want on it:
set the clipboard to text 2 thru -2 of "
77
74
32
"
----------------------------------------------------------------
--» Main
----------------------------------------------------------------
set numList to join (sortlist (find text "\\d+" in (get the clipboard) with regexp, all occurrences and string result) comparison 2) using ", "
----------------------------------------------------------------

--> "32, 74, 77"

----------------------------------------------------------------

I really like sed and awk, but they are fairly obsolete (GNU sed and GNU awk notwithstanding), because they don't handle Unicode text very adroitly.

So – my recommendation is:

Learn as much sed and awk as you want – particularly snippets that are useful.

But if you want to seriously study something then study Perl. It's Unicode-aware and many times more powerful than sed and awk combined.

-Chris

Updated 2018/08/11 18:50 CDT
Fixed the problem Peter noted in post #27.


Hey Folks,

Might as well post a Perl-only solution.

This one will remove any blank lines in the input.

-Chris


Sort Numeric Column -- Transform to '- ' Separated Values.kmmacros (5.4 KB)

I believe Perl’s sort sorts alphabetically, so that would sort 100 before 22. You need to add { $a <=> $b } or something like that to the sort.