Padding a two-column section of text for monospace display

For another macro I'm working on, I wanted to clean up the presentation of a report it creates. It's a text file, displayed in a monospace font, but it contains rows of two-column data that don't align because both sides are of variable length. The output comes directly from a Unix command, so I can't easily control the incoming format.

I wanted a way to insert a divider between the two sections, at the same column number such that all the left-side terms would fit. I eventually settled on two passes through the list with some regular expression games—the first pass figures out the longest left-side term, which sets the location of the divider. The second pass pads each entry, if needed, to that column.
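For anyone who wants to see the two-pass idea outside Keyboard Maestro, here is a minimal Perl sketch of it. It is not the macro itself: the sub name `align_two_columns` and its `$sep` and `$divider` parameters are hypothetical, and it assumes each row is a left term, a separator, and a right term.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pad the left-hand term of each "left<sep>right" row so the divider
# lands in the same column on every row.
# $sep and $divider are hypothetical parameters chosen for this sketch.
sub align_two_columns {
    my ($text, $sep, $divider) = @_;

    # Split the text into rows, and each row into at most two fields.
    my @rows = map { [ split /\Q$sep\E/, $_, 2 ] } split /\n/, $text;

    # Pass 1: find the longest left-hand term.
    my $max = 0;
    for my $row (@rows) {
        my $len = length($row->[0] // '');
        $max = $len if $len > $max;
    }

    # Pass 2: pad each left-hand term to that width, then add the divider.
    return join "\n", map {
        sprintf('%-*s%s%s', $max, $_->[0] // '', $divider, $_->[1] // '')
    } @rows;
}

print align_two_columns("a:apple\nbanana:b\nkiwi:k", ':', ' | '), "\n";
```

Both passes are linear in the number of rows, so looping twice only matters for very long lists.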

If there's a quicker/easier way to do this, I'd love to hear it—thankfully, my lists are always short (shorter than in this demo), so there's no real time issue from looping all entries twice. But I'm always open to a better solution.

Here's the before-and-after with my test script:

And here's the test script:

Character Padding experiment.kmmacros (12 KB)

This solution would be horrid for super long lists, but it's fine for short ones. Any better way to attack this?

-rob.

Here's another approach:

Thanks, I was pretty sure I could export the text to some Terminal app to process, but didn't want to get lost in that tangent. Bookmarked for a future update to the other macro where I use this, as I'm sure any of those will be faster than what I'm doing.

-rob.

I dug through the other forum, but it was all for stuff more complex than what I needed. So I asked ChatGPT for help, and it came up with a wicked fast Perl script.

The new one is just a bit faster than the old :).


-rob.

Is that faster than your initial method or the bash script I linked to?

That's faster than my original method. The bash script was just a touch slower than Perl:

-rob.

OK, here's a Perl version of my approach:

Align Text as Spaced Columns.kmmacros (23 KB)

Thanks for that; using bits of your code, I was able to get rid of the save/read/delete file steps in mine. I couldn't get the variable reading to work right originally, as I was using double quotes. But I also don't know Perl very well at all, so the split lines function was also critical. Feels much better running without using the temp file I was writing!

-rob.

During development, I discovered something that seems to throw all of these out of whack: Emoji. If there are emoji in the words, the alignment is off, I think because emoji are multi-character characters that appear as one? So while your eye sees one character, the system counts three, which throws off the alignment.

I couldn't find a way around this, as some emoji seem to be two characters in length, others are three—I was thinking I could just pad the character count based on counting emoji (assuming I could figure out how to do that), but it wouldn't work.

In the end, I gave up, and my columns won't align for strings with emojis.

-rob.

Here's what GPT-4 has to say about it. I'm not near my Mac at the moment so I can't try it yet, but have a look...

In Perl, handling emojis can be tricky due to their multi-byte nature in UTF-8 encoding. The script you have does not explicitly handle the variable-width characters like emojis, which may lead to alignment issues.

Here’s why: Perl's length function counts characters, but a string that has not been decoded from UTF-8 is treated as a sequence of bytes, so each byte counts as one character. Even on a decoded string, length counts code points, not grapheme clusters or display columns (which is what you'd be interested in for proper display width when dealing with emojis or other multi-byte characters).
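To make the difference concrete, here is a small sketch using only core Perl. The sample string is an assumption picked for illustration: "hi" followed by a heart (U+2764) and a variation selector (U+FE0F), written as raw UTF-8 bytes so the script does not depend on how the file is saved.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);

# Raw UTF-8 bytes: "hi" (2 bytes) + U+2764 (3 bytes) + U+FE0F (3 bytes).
my $bytes = "hi\xE2\x9D\xA4\xEF\xB8\x8F";

# Undecoded, every byte counts as one "character".
my $byte_len = length $bytes;

# Decoded, length counts Unicode code points.
my $chars    = decode('UTF-8', $bytes);
my $char_len = length $chars;

# \X matches one extended grapheme cluster, i.e. one user-perceived character.
my $cluster_len = () = $chars =~ /\X/g;

print "bytes=$byte_len code_points=$char_len clusters=$cluster_len\n";
# prints: bytes=8 code_points=4 clusters=3
```

The eye sees three characters, but the count is 8 or 4 depending on whether the string was decoded, which is the kind of mismatch that pushes a padded column out of line.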

Since Perl v5.6, Unicode support has been available, but you need to explicitly tell Perl to treat the input as Unicode. To properly handle emojis, you would need to ensure that your script:

  1. Reads the input as UTF-8.
  2. Counts characters in terms of Unicode grapheme clusters rather than bytes.

Here's how you can modify the script to better handle Unicode characters such as emojis:

#!/usr/bin/perl
use strict;
use warnings;
use utf8; # Indicate that this script is written in UTF-8
use Encode qw(decode encode);
use Unicode::GCString;

# Encode output as UTF-8 so decoded strings print without "Wide character" warnings
binmode(STDOUT, ':encoding(UTF-8)');

# Get the value of the Keyboard Maestro variable "Local__Input"
my $input = decode('UTF-8', $ENV{'KMVAR_Local__Input'} // '');

# Get the value of the Keyboard Maestro variable "Local__Spaces"
my $spaces = $ENV{'KMVAR_Local__Spaces'} // 2;

# Get the value of the Keyboard Maestro variable "Local__Delimiter"
my $delimiter = decode('UTF-8', $ENV{'KMVAR_Local__Delimiter'} // ',');  # default to a comma if no delimiter is provided

# Split input into lines
my @lines = split(/\n/, $input);

# Calculate the display width of the widest entry in each column
my @max_lengths;
foreach my $line (@lines) {
    # \Q...\E makes the delimiter literal, so "|" or "." is not read as a regex
    my @columns = split(/\Q$delimiter\E/, $line);
    for my $i (0 .. $#columns) {
        my $width = Unicode::GCString->new($columns[$i])->columns();
        $max_lengths[$i] = $width if (!defined $max_lengths[$i] || $width > $max_lengths[$i]);
    }
}

# Print the aligned columns. Pad by display width rather than with printf,
# because %-Ns pads by character count and would misalign wide characters.
foreach my $line (@lines) {
    my @columns = split(/\Q$delimiter\E/, $line);
    for my $i (0 .. $#columns) {
        my $width = Unicode::GCString->new($columns[$i])->columns();
        print $columns[$i], ' ' x ($max_lengths[$i] + $spaces - $width);
    }
    print "\n";
}

Here's what's added/changed:

  • use utf8; tells Perl that your script is written in UTF-8.
  • use Encode qw(decode encode); allows you to explicitly decode and encode input/output as UTF-8.
  • use Unicode::GCString; to use the Unicode::GCString module which correctly calculates the display width of a string, taking into account Unicode grapheme clusters.

To run this script, you would need the Unicode::GCString module installed, which can be done via CPAN:

cpan Unicode::GCString

With these changes, the script will be better suited to handle emojis and other complex Unicode characters when aligning text in columns.

I had found something similar in my talks with ChatGPT, but the need for an external module is a bit of a deal breaker—trying to install it for the user in a script may take them into the CPAN configuration module, and/or the install may fail.

The other option is to stop and ask them to install and verify the module themselves first, which also seems less than ideal.
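One possible middle ground is to approximate display width with core Perl only. The sketch below is a heuristic, not a replacement for Unicode::GCString: the subs `display_width` and `pad_to` are hypothetical names, and it assumes a grapheme cluster takes two cells when it contains an East Asian Wide or Fullwidth code point or the emoji variation selector U+FE0F, and one cell otherwise. Which emoji count as Wide depends on the Unicode version bundled with the installed Perl, and terminals and editors don't all render emoji at the same width, so results will vary.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approximate the display width of an already-decoded string, core Perl only.
# Assumption: a cluster is two cells wide if it contains a Wide or Fullwidth
# code point or U+FE0F (emoji presentation selector), otherwise one cell.
sub display_width {
    my ($text) = @_;
    my $width = 0;
    while ($text =~ /(\X)/g) {    # \X = one extended grapheme cluster
        my $cluster = $1;
        my $is_wide = $cluster =~
            /[\p{East_Asian_Width=Wide}\p{East_Asian_Width=Fullwidth}\x{FE0F}]/;
        $width += $is_wide ? 2 : 1;
    }
    return $width;
}

# Append spaces until the string occupies $target cells (never truncates).
sub pad_to {
    my ($text, $target) = @_;
    my $gap = $target - display_width($text);
    return $text . (' ' x ($gap > 0 ? $gap : 0));
}

binmode(STDOUT, ':encoding(UTF-8)');
for my $term ("abc", "\x{4E2D}\x{6587}", "\x{1F44D}") {
    print pad_to($term, 6), "| next column\n";
}
```

Zero-width and control characters are not handled here, so this only suits short, well-behaved lists like the ones in this thread.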

-rob.

My thoughts exactly.