Identifying the highest number from a list

hayleyh · March 2, 2022, 8:06pm

The problem is pretty simple, I'm just looking to extract only the highest number from a list like this one:

80k
255k
73k
65k
96k
456k
145k
221k
669k
321k
769k
405k
1185k
1316k
635k
680k
4455k
1301k
1496k

Any good ways to do this? Thanks!

Sleepy · March 2, 2022, 9:12pm

I love problems like this. There are probably many ways to do it. The first thing I need to know is whether all numbers have "k" and whether any number has commas or periods.

I'll write up a solution in about 1 minute...

That solution removes the letter k, then sorts the list by numeric value, then takes the last item on the list which should be the largest number. Mind you, it doesn't add the k back. If you really want the k back, I can probably patch it up for you.
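
Sleepy's macro is not reproduced in the text of this thread. As a rough sketch only, the three steps described (drop the k, sort by numeric value, take the last item) might look like this in JavaScript. The function name largestBySorting is made up for illustration, and the sample values are taken from the list in the question.

```javascript
// Hypothetical helper, not the macro from the thread: strip the trailing "k",
// sort by numeric value, and take the last (largest) item.
function largestBySorting(text) {
    const numbers = text
        .split("\n")
        .map(line => line.trim())
        .filter(line => line.length > 0)          // ignore blank lines
        .map(line => Number(line.replace(/k$/, "")));

    // A comparator is needed: the default sort compares values as strings,
    // which would place 96 after 4455.
    numbers.sort((a, b) => a - b);

    return numbers[numbers.length - 1];
}

console.log(largestBySorting("80k\n255k\n96k\n4455k\n1301k")); // 4455
```

As in the description above, the k is not added back; appending it to the result would be a one-line change.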

ALYB · March 2, 2022, 9:16pm

Using bubble sort in JS? Bubble Sort Algorithm in JavaScript

ComplexPoint · March 2, 2022, 10:05pm

No need to write any algorithm yourself, though: if you are going to use JS, it has a .sort() method built in.

(You could probably get away with just using the JS Math.max function, rather than a full sort)

ComplexPoint · March 2, 2022, 10:39pm

In a Keyboard Maestro Execute JavaScript for Automation action, you could write something like this, using the Math.max() function.

Sorted numeric strings (affixed by 'k').kmmacros (2.2 KB)

JS source:

(() => {
    "use strict";

    const ns = Application("Keyboard Maestro Engine")
        .getvariable("affixedNumbers")
        .split("\n")
        .map(
            s => s.slice(0, -1)
        );

    return `${Math.max(...ns)}k`;
})();

Incidentally, I distracted myself with talk of 'sorting', which we don't need here.

The macro should have been called something like "maximum from a list of numeric strings, affixed by k"

hayleyh · March 2, 2022, 11:47pm

That worked perfectly for me, thank you! I shouldn't have included the k, it isn't important. There shouldn't ever be periods or commas in this list. Though if you're up for a more challenging data set, this one is potentially more useful to me:

63.15KiB
53.64KiB
66.78KiB
164.80KiB
119.47KiB
100.77KiB
320.71KiB
93.96KiB
82.56KiB
121.13KiB
571.62KiB
182.50KiB
277.41KiB
838.39KiB
403.15KiB
974.04KiB
508.67KiB
1.45MiB
1.61MiB
796.54KiB
852.84KiB
5.45MiB
1.59MiB
1.83MiB

As you can probably guess these are file sizes. The goal is the same, return the largest file size. The KiB to MiB seemed more complicated so I stayed away from it.

Thanks again!

Sleepy · March 2, 2022, 11:48pm

Actually, handling those extra characters isn't a big deal. I think I know what they mean. But this time it will take more than 2 minutes because I'm a tad busy with supper.

Okay here's the solution. Sorry it took a whole 3 minutes.

hayleyh · March 3, 2022, 12:21am

Pretty close! I get 5450000.00 as a result, definitely the right number. The ideal for me here would be to return "5.45MiB" in this case, sorry I didn't make that clear.

Sleepy · March 3, 2022, 12:21am

I see. Okay, fair point. Let me think about that. There's always a solution, but this one might take two lines of code.

I think I have a solution, but my dinner is ready now. So it will likely take 30+ minutes to create here.

Okay, here's my first attempt, which seems to work. It requires that you have the data in a file called numbers.txt, and it uses a work file called numberindex.txt.

Here's the text:

cat ~/data/numbers.txt | tr -d "iB" | sed "s/M/*1000000/;s/K/*1000/" | bc | nl | sort -n -k 2 | tail -n 1 | awk '{print $1}' > ~/data/numberindex.txt
cat ~/data/numbers.txt | sed -n `cat ~/data/numberindex.txt`p

I'm still working on a cleaner solution than this. I like this sort of challenge. Test it out, see if it works. If it's still not right, I can fix it. I've got an idea for a cleaner solution that doesn't involve files. I need a few minutes.

Ok try this simple solution. I used a few tricks. I'm not sure if it's perfect. I'm still testing it. So far it's working.

awk '{print $1}' | sed 's/iB//' | sort -h | tail -n 1 | sed 's/$/iB/'

You can remove the first command. I had that there as a hangover from an earlier version. All you need is this. Pretty short and beautiful, isn't it?

sed 's/iB//' | sort -h | tail -n 1 | sed 's/$/iB/'

hayleyh · March 3, 2022, 1:20am

NICE! It seems pretty perfect, I'll use it a bit more to make sure. Thank you SO MUCH for your help!

Sleepy · March 3, 2022, 1:22am

I'm happy if you are happy. Now I can go get some dessert. I've earned it.

ComplexPoint · March 3, 2022, 7:28am

As a footnote to this, if your lists are very long it might begin to matter that solutions depending on a sort slow down much faster as the number of items increases.

Just finding the maximum only requires time directly proportional to the number of items (O(n)), without the additional log n factor that a comparison sort brings (O(n log n)).

Intuitively, this is, of course, because a sort has to find the ordered position of every item in a list, whereas a maximum involves finding the position of only one item, and the number of comparisons required is smaller.
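
To make the single-pass idea concrete for the KiB/MiB list, here is a minimal sketch in JavaScript. It is an illustration under stated assumptions rather than anything posted in the thread: the names UNIT_BYTES, toBytes and largestSize are invented, the units are taken to be binary (1 KiB = 1024 bytes), and every line is assumed to be a number followed directly by a unit.

```javascript
// Assumed binary multipliers; extend if other units appear in the data.
const UNIT_BYTES = { B: 1, KiB: 1024, MiB: 1024 ** 2, GiB: 1024 ** 3 };

// Convert a string such as "5.45MiB" to a number of bytes (NaN if malformed).
function toBytes(size) {
    const match = /^([\d.]+)\s*(B|KiB|MiB|GiB)$/.exec(size.trim());
    return match ? parseFloat(match[1]) * UNIT_BYTES[match[2]] : NaN;
}

// One pass over the list: remember the largest line seen so far and
// return it unchanged, so the original "5.45MiB" formatting is kept.
function largestSize(text) {
    let best = null;
    let bestBytes = -Infinity;
    for (const line of text.split("\n")) {
        const bytes = toBytes(line);
        if (bytes > bestBytes) {   // NaN never compares greater, so bad lines are skipped
            best = line.trim();
            bestBytes = bytes;
        }
    }
    return best;
}

console.log(largestSize("974.04KiB\n1.45MiB\n5.45MiB\n796.54KiB")); // 5.45MiB
```

Each line is examined exactly once and nothing is reordered. Because the comparison is done in bytes, a value such as 2000KiB would correctly rank above 1.5MiB.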

If for any reason you do prefer command lines to Python and JavaScript (which both have max functions built in), you can also use a max function in one of the standard Perl libraries.

There is some discussion here:
text processing - Finding the maximum of the values in a file - Unix & Linux Stack Exchange

PS I'm not sure that the shell is really a very good place in which to solve problems – fiddly syntaxes, often with weakish readability, and a horrible "everything boiled down to string" data type.

Perfect recipe for happy hours of puzzlement and debugging :-)

Sleepy · March 3, 2022, 8:13am

Your words are true and wise. I was only worried about compactness and speed of coding.

Now you've got me wondering what the simplest solution would be using the idea of not sorting.

Here's a short action that does the work, but doesn't handle the KiB or MiB strings yet....

I think that I can see roughly how to modify this to make it work with "KiB" and "MiB," but I'm a little tired. And I'm not very skilled with awk.

Ok, this seems to solve the problem without using "sort":

Here's the command text:

awk '
  # Compare in KiB: "5.45MiB" + 0 evaluates to 5.45, and 1 MiB is 1024 KiB.
  { v = $0 + 0; if (/MiB$/) v *= 1024 }
  NR == 1 || v > m { m = v; line = $0 }
  END { print line }
'

unlocked2412 · March 4, 2022, 9:28pm

FWIW, a Haskell solution.

We could define a custom datatype (the order of the constructors matters for the ordering):

data FileSize = B | KiB | MiB | GiB | TiB deriving (Eq, Ord, Read, Show)

We could chain a series of functions (composition):

taking every line in the string,

lines

separating each into pairs of number strings and file sizes,

map (break isAlpha)

reading the first component as a Float and the second one as FileSize,

map (bimap (read :: String -> Float) (read :: String -> FileSize))

obtaining the maximum according to a custom comparator function, based first on the file size unit (second component) and then on the number (first component),

maximumBy (comparing snd `mappend` comparing fst)

transforming each component to its string representation,

bimap show show

appending (<>) the resulting pair to obtain a String.

uncurry (<>)

Finally, assembling the pieces (composing):

uncurry (<>)
  . bimap show show
  . maximumBy (comparing snd `mappend` comparing fst)
  . map (bimap (read :: String -> Float) (read :: String -> FileSize))
  . map (break isAlpha)
  . lines

Maximum File Size.kmmacros (4.7 KB)

Haskell source:

import Data.Bifunctor
import Data.Char
import Data.List
import Data.Ord

data FileSize = B | KiB | MiB | GiB | TiB deriving (Eq, Ord, Read, Show)

interact' :: (String -> String) -> IO ()
interact' f = do
                path <- getContents
                s <- readFile path
                putStr (f s)

main :: IO ()
main =
  interact' $
    uncurry (<>)
      . bimap show show
      . maximumBy (comparing snd `mappend` comparing fst)
      . map (bimap (read :: String -> Float) (read :: String -> FileSize))
      . map (break isAlpha)
      . lines
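
For readers who do not use Haskell, the same ordering (unit first, then number) can be sketched in JavaScript. The names UNIT_ORDER, parseSize and maxByUnitThenNumber are invented for this illustration, and well-formed input is assumed. Like the Haskell version, it relies on each value being written in its natural unit: a line such as 2000KiB would rank below 1.5MiB.

```javascript
// Rank of each unit, mirroring the constructor order B | KiB | MiB | GiB | TiB.
const UNIT_ORDER = ["B", "KiB", "MiB", "GiB", "TiB"];

// Split "5.45MiB" into its number and unit, much like `break isAlpha`.
function parseSize(line) {
    const i = line.search(/[A-Za-z]/);
    return { value: parseFloat(line.slice(0, i)), unit: line.slice(i), text: line };
}

// Compare by unit rank first, then by number, and keep the greater entry.
// Throws on empty input, as maximumBy does on an empty list.
function maxByUnitThenNumber(text) {
    const lines = text.split("\n").map(s => s.trim()).filter(s => s.length > 0);
    return lines
        .map(parseSize)
        .reduce((a, b) => {
            const byUnit = UNIT_ORDER.indexOf(a.unit) - UNIT_ORDER.indexOf(b.unit);
            const order = byUnit !== 0 ? byUnit : a.value - b.value;
            return order > 0 ? a : b;
        })
        .text;
}

console.log(maxByUnitThenNumber("974.04KiB\n1.83MiB\n5.45MiB\n63.15KiB")); // 5.45MiB
```

Returning the original text rather than a re-rendered number keeps trailing zeros, so 164.80KiB comes back as written.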
