How to Get String Byte Size in AppleScript?

My apologies to Peter since this is not strictly a KM question.
But I know my good friends @ccstone and @ComplexPoint are likely to know the answer off the top of their heads. :smile:

So, using AppleScript, how do I calculate the storage size of a string in bytes without actually writing it to a file? Is it as simple as

set numBytes to (the count of strVariable) * 4

or is it more complicated?

I know you guys love details, so here’s my use case:

I need to get the total storage size of a selection of Evernote Notes.
Evernote shows the size in the EN Mac UI, but does NOT provide a “size” property for a Note object in AppleScript. So I have to calculate it.

My logic is this:

  1. Get each Note
  2. Get the ENML (Evernote’s internal markup lang.) string, which is the contents of the Note, but does NOT include any attachments (only a link to the attachment).
  3. Get the count of the ENML string and multiply by 4
  4. Get the size of each attachment
  5. Sum the sizes from #3 and #4

My initial test of this script seems to validate #3.

Comments?

There is no one answer to this because it depends entirely on the encoding of the output string.

So it might be:

  • 1 * length of string - for ASCII, ISO Latin, or other 8-bit encodings (generally a bad choice, since they cannot represent most Unicode characters).
  • 2 * length of string - for UCS-2, i.e. UTF-16 without surrogate pairs (sometimes used, but expensive in space and unable to represent characters outside the Basic Multilingual Plane)
  • 2 * length of string plus a few - for UTF-16 with surrogate pairs (sometimes used, but expensive in space)
  • 4 * length of string - for UTF-32 (rarely used because it is very expensive in space).
  • (1 to 4) * length of string - for UTF-8 (commonly used, encodes all of Unicode, but varies in length; the original design allowed up to 6 bytes per character, but the current standard caps it at 4)

On top of that is the question of what "length of string" is measured in. On the Mac, it is often UTF16 (with surrogate pairs counting as 1 or 2 characters).
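
To make the list above concrete, here is a rough sketch in Python (not AppleScript, just because it makes the encodings easy to poke at). The function name `encoded_sizes` and the sample string are made up for illustration; the only real calls are Python's built-in `str.encode` and `len`.

```python
def encoded_sizes(text):
    """Return how many bytes `text` needs in a few common encodings."""
    utf16 = text.encode("utf-16-le")  # "-le" variant: no byte-order mark added
    return {
        "code points": len(text),            # what Python calls the string length
        "utf-16 code units": len(utf16) // 2,  # a surrogate pair counts as 2 here
        "utf-8 bytes": len(text.encode("utf-8")),
        "utf-16 bytes": len(utf16),
        "utf-32 bytes": len(text.encode("utf-32-le")),
    }


# Hypothetical sample: "naïve", a space, two Chinese characters, a space, an emoji.
sample = "na\u00efve \u4e2d\u6587 \U0001F600"

for name, size in encoded_sizes(sample).items():
    print(f"{name}: {size}")
```

Note that the same string has two different "lengths" (code points vs. UTF-16 code units) before you even pick an output encoding, which is why a single multiplier cannot be right in general.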

Bingo! Question Answered!

I don't know why I didn't try this before, but now after your post I tested a number of individual Notes ranging from 100KB to 3MB that were text only -- no images or attachments. And it turns out that in every case 1 char uses 1 byte, or at least that is how Evernote is reporting the size in the UI.

Many thanks!

Probably it is using UTF-8 encoding, which is 1 byte per char for ASCII characters (the normal simple Latin character set: things like letters, numbers, and basic symbols).

If you fill up a note with Chinese characters, it will probably be more like 3 bytes per character.
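
If you want to check this yourself, here is a small Python sketch, assuming the size being reported really is the UTF-8 byte count (that is a guess about Evernote, not something confirmed). The helper name `utf8_bytes_per_char` and the test strings are just for illustration.

```python
def utf8_bytes_per_char(text):
    """Average number of UTF-8 bytes per character (code point) in `text`."""
    if not text:
        return 0.0  # avoid dividing by zero for an empty string
    return len(text.encode("utf-8")) / len(text)


ascii_note = "Hello, world"               # plain ASCII text
chinese_note = "\u4e2d\u6587\u7b14\u8bb0"  # four common Chinese characters

print(utf8_bytes_per_char(ascii_note))    # 1.0
print(utf8_bytes_per_char(chinese_note))  # 3.0
```

So a "1 byte per char" result on English-only notes is exactly what UTF-8 would give, and it would stop holding as soon as non-ASCII text shows up.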

Thanks. I struggle enough with English, much less Chinese. No chance of anything other than Latin characters appearing in my notes. But it does raise an issue for trying to make my AppleScript usable by a wide range of users.

The best solution is for Evernote to expose the Note Size property in AppleScript. I'm going to request this.

Thanks for all your help.