Get Some Text Info From the File Name


I have some file names like this:

science 2022-10-01.pdf
science 20231015.pdf
11012022 science.pdf

I need to get the year (2022 or 2023) from each file name. If a name contains both 2022 and 2023, I want the first one. Future file names may contain 2024 or 2025.

How can I achieve this?


What does "some" consist of? (i.e. what form does the input take?)

A list of lines in a text file?
A selection of files in the Finder?
Something else?

files in selected folder

You could search for a regular expression pattern like (20\d\d),

and save that (first and only) parenthesized capture group to a Keyboard Maestro variable.
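Outside Keyboard Maestro, the same idea can be sketched in a shell script. This is only a minimal sketch under a few assumptions: the function names first_year and years_in_folder are made up for illustration, 20[0-9]{2} is the extended-regex spelling of (20\d\d), and grep is assumed to support the -o and -E options (both the BSD and GNU versions do).

```shell
# Print the first "20" + two digits found in a string (prints nothing if none).
first_year() {
  printf '%s\n' "$1" | grep -oE '20[0-9]{2}' | head -n 1
}

# Hypothetical helper: print "year<TAB>name" for every regular file in a folder.
years_in_folder() {
  for path in "$1"/*; do
    [ -f "$path" ] || continue        # skip sub-folders and an unmatched glob
    name=${path##*/}                  # keep the file name, drop the folder path
    printf '%s\t%s\n' "$(first_year "$name")" "$name"
  done
}

first_year 'science 2022-10-01.pdf'
first_year 'science 20231015.pdf'
first_year '11012022 science.pdf'
```

Only the file name is searched, so a year that happens to appear in the folder path is ignored. Note that a name like "12012022 science.pdf" would give 2012, because "2012" is the first run that fits; if only years in the 2020s are expected, tightening the pattern to 202[0-9] avoids that.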

For example, if you have directly selected all the files that interest you, some variant of:

First 21c year in filename.kmmacros (3.9 KB)

And the For Each Item in a Collection action also allows you to specify all files in a given folder:

Hey @som,

Will dates like this one:

11012022 science.pdf

Always have the year on the back end of the string?

Or could it be written thus?

20221101 science.pdf


Using the find command in the Terminal may do what you want.

For example, to print all files whose names contain a year starting with 20:

find . -iname '*20??*.*' -print

If you want to move them to another folder, use the exec option.

find . -iname '*20??*.*' -exec mv '{}' ~/Documents/someFolder \;

You can try using any shell file commands after the exec option so you can rename, delete, copy, sort etc.

For files that don't have an extension, leave out the dot and star from the search string.

Unfortunately that'll also match "MyFile-20-01-1999.txt", "20 Ways to get to the Top.pdf", etc. You really do need to pattern match, and even then will need reasonably consistent file name structures to match against...

That's true and well spotted, but if I change the search string a little, to the one I should have used originally, it finds all the file name examples given in the original post but not the samples you gave.

find . -iname '*202*.*' -print

There will always be exceptions of course, but this should get 'som' a fair way and maybe he could use the command a few times using different search strings in a Run Shell Script action.
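To see how the two search strings behave on the names mentioned so far, here is a rough sketch that approximates find's -name matching with the shell's own case patterns. It assumes a POSIX sh or bash (zsh treats the unquoted pattern differently), and glob_match is a made-up helper name, not part of find.

```shell
# Succeed if file name $2 matches glob pattern $1.
glob_match() {
  case "$2" in
    $1) return 0 ;;   # unquoted on purpose so $1 is treated as a pattern
    *)  return 1 ;;
  esac
}

for name in 'science 2022-10-01.pdf' 'science 20231015.pdf' \
            '11012022 science.pdf' 'MyFile-20-01-1999.txt' \
            '20 Ways to get to the Top.pdf'; do
  for pattern in '*20??*.*' '*202*.*'; do
    if glob_match "$pattern" "$name"; then
      verdict='match'
    else
      verdict='no match'
    fi
    printf '%-10s %-9s %s\n' "$pattern" "$verdict" "$name"
  done
done
```

The first pattern matches all five names, including the two unwanted ones; the second matches only the three from the original post.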

Ah, but if you change the search command a little more...

find . -regex '.*20[0-9][0-9].*' -print

can narrow things down even more, ensuring you match "20 then two digits" rather than "202 followed by anything". You can even go

find -E . -regex '.*20[0-9][0-9][.-]?[0-9][0-9][.-]?[0-9][0-9].*' -print

for "contains: 20 then two digits, an optional . or -, two more digits, another optional . or -, two more digits, then anything" to match all of

science 2022-10-01.pdf
science 20221001.pdf
science 2022.10.01.pdf

in one sweep.

You can expand that further and catch both "delimited with year first" and "delimited with year last" forms.

Where it will fall down is with e.g. "science 20121999.pdf": it's obvious to us that the year is "1999", but you'd have to build extra logic into your pattern to infer day and month digit-pairs, assess their validity, work out the date format, etc. At which point you're probably better off sanitising the data beforehand!
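As a rough illustration of the "year first" and "year last" idea, the sketch below tries each form in turn and prints the year. It is only one possible arrangement: year_from_name is a made-up name, grep -oE stands in for find's -regex so that the matched text can be captured, and the year-first form is arbitrarily given priority.

```shell
# Hypothetical helper: print the year from a delimited or undelimited date.
year_from_name() {
  # Year-first forms: 2022-10-01, 2022.10.01, 20221001
  chunk=$(printf '%s\n' "$1" |
    grep -oE '20[0-9]{2}[.-]?[0-9]{2}[.-]?[0-9]{2}' | head -n 1)
  if [ -n "$chunk" ]; then
    printf '%.4s\n' "$chunk"                   # first four characters
    return 0
  fi
  # Year-last forms: 01-11-2022, 01.11.2022, 11012022
  chunk=$(printf '%s\n' "$1" |
    grep -oE '[0-9]{2}[.-]?[0-9]{2}[.-]?20[0-9]{2}' | head -n 1)
  if [ -n "$chunk" ]; then
    printf '%s\n' "${chunk#"${chunk%????}"}"   # last four characters
    return 0
  fi
  return 1                                     # no date-like run found
}

year_from_name 'science 2022.10.01.pdf'
year_from_name '11012022 science.pdf'
year_from_name 'science 20121999.pdf'
```

The last call prints 2012 rather than 1999, which is exactly the failure described above: the pattern has no way of knowing that "19" and "99" are not a month and a day.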