Filter Out Names From a List

cfriend · March 28, 2018, 4:18pm

I can’t believe I can’t figure this out.

I’ve got a long list of IG followers like this:

niknogiko
Nikki G
Follow

yocarrot
James
Follow

jasonferro
Jason F!
Follow

and I want to filter that list to get just the usernames, end result:

niknogiko
yocarrot
jasonferro

I had something like this from @JMichaelTX a long time ago but I cannot find it and cannot piece it together from what I can find.

The Regex I was trying to use was:

(Follow|Following)\n\n(.*)

But all the attempts and all the action-sh*t I’m throwing at the wall isn’t sticking

Can y’all help me out with this? Thank you!

JMichaelTX · March 28, 2018, 7:34pm

cfriend:

I can't believe I can't figure this out.

I've got a long list of IG followers like this:
niknogiko
Nikki G
Follow

yocarrot
James
Follow

jasonferro
Jason F!
Follow
and I want to filter that list to get just the usernames, end result:
niknogiko
yocarrot
jasonferro

Try this:

Search for:
(?mi)^(.+)\n.+?\nFollow.*(?:\n|$)

Replace with:
\1

See regex101: build, test, and debug regex

cfriend · March 28, 2018, 9:00pm

I made a mistake in my original post, I need to check for Follow and Following, also. How would that change the pattern? I came to edit once I got home but you were too quick! Thank you!

JMichaelTX · March 28, 2018, 10:12pm

Did you try the pattern I gave you?
It works for me in the Regex101 snippet.

cfriend · March 28, 2018, 11:11pm

I haven’t yet (still away from my Mac) but the full text contains both “Follow” and “Following” and I thought that would change the required RegEx. I meant to include both in the sample but made an error.

mrpasini · March 28, 2018, 11:31pm

JMichaelTX’s regex accounts for both “Follow” and “Following” – the “.*” after “Follow” performs the magic.

JMichaelTX · March 28, 2018, 11:33pm

It is the .* after "Follow" that allows for "Following":
Follow.*

You could also use:
Follow(?:ing)?

which would technically be more precise and limiting.

ComplexPoint · March 29, 2018, 12:35pm

There are many ways of doing this, and tastes vary about the balance between concision and maintainability.

If your source text consists of groups of three lines, delimited by a blank line, then another approach would be to reach for an Execute JavaScript action, and write a function which gathers the first line of each group:

// firstLinesOfGroups :: String -> [String]
const firstLinesOfGroups = strLines =>
		
		// Groups of N lines,
		strLines.split('\n\n')

		// Mapped to a list 
		// which holds just the first line of each group.
		.map(x => x.split('\n')[0]);

The action could then return a value with a final JS line like:

return firstLinesOfGroups(str).join('\n');

(Where str is your input text)

(Away from macOS at the moment, with only an iOS device, so this code was tested in the JSContext of iOS 1Writer)

cfriend · March 29, 2018, 1:34pm

All right! I've sat down to test it and this works great! Thank you! I have one minor request though:

It turns out I was not thorough enough with checking my source. There are instances where there are only two lines, not three (where someone did not enter a "real name.") With this modified source data, how would I accomplish the same end result?

nikno_giko
🄺🄸🄲🄷
Follow

yocarrot🐶
Follow

jasonferro
🦊Jason F아리 
Following

As I just discovered some of the text is only two lines. Would this JS approach still work?

Thank you all!

P.S.: MAN that regex101 link is COOL. What an awesome teaching tool.

ComplexPoint · March 29, 2018, 1:37pm

Would this JS approach still work

It will work (without modification) with line groups of any length, but if the missing names take the form of blank lines, then it will overproduce – treating any group with a blank line in the middle as two groups.

cfriend · March 29, 2018, 1:43pm

I don't understand this part. There are no groups with a blank line in the middle, if that's what you mean.

ComplexPoint · March 29, 2018, 1:46pm

Good – that’s fine then. No need to modify it – it doesn’t depend on how many lines there are in each group.

ComplexPoint · March 29, 2018, 3:48pm

and paring it down, you should be able to paste something like this into an Execute JavaScript for Automation action.

(assuming, for example, that the data is held in a kmVar called ‘theList’)

(() => {

	return Application('Keyboard Maestro Engine')
		.getvariable('theList')
		.split('\n\n')
		.map(x => x.split('\n')[0])
		.join('\n');
	
})();

JMichaelTX · March 29, 2018, 10:11pm

I know you know better than to post incomplete examples, but for everyone's benefit:
Please make sure you provide a real-world, complete example of all use cases in your original post asking for help.

This is my last pass.

Search:
(?im)^(.+?\n)(?:.+\n)*?Follow(?:ing)?(?:\n*|\Z|\z)

Replace:
\1

For details, see regex101: build, test, and debug regex

Example Results

cfriend · March 29, 2018, 11:21pm

I know, I'm ashamed I rushed to the forums too quickly. I didn't know you didn't have to put a name and could wind up with only two lines! This one works perfectly, thank you very much.

ccstone · March 30, 2018, 9:13am

Hey Christian,

Let's see if I can add anything to this little conundrum...

Trudging on the RegEx Treadmill

With this sort of text parsing problem if I don't have confidence in the input format I usually make at least two passes.

First pass:

Strip leading and trailing vertical whitespace. Removing cruft often makes data extraction much easier.

\A\s+|\s+\Z

Second pass:

(?is)\v*(\S.+?\n).*?follow(?:ing)?

If I wanted to make just one pass in this task I'd probably do something like this:

(?is)\v*(\S.+?\n).*?follow(?:ing)?\s*

Or maybe this:

(?si)\v*(\S.+?\n).+?(\n{2,}|\z)

I would prefer to use \w (word-character) instead of \S (non-space-character), but the blinking username might start with an emoji or something else.

Perl's motto TIMTOWTDI is more than a little true for regular expressions as well.

(Pronounced Tim Toady – “There's More Than One Way To Do It.”)

First attempts to construct patterns tend to be clunky and ugly – then after beating your head against the wall for hours and/or taking a break (or sleeping on it) you may get that “I could have had a V8!” moment and find a simpler and more elegant approach.

The AppleScript Avenger

If this task's data structure is completely predictable then plain vanilla AppleScript can do the job easily.

Method 01 – The “hard” way:

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/03/30 01:34
# dMod: 2018/03/30 03:13
# Appl: Vanilla AppleScript
# Task: Extract the first line of each paragraph of the given text.
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @Extract, @First, @Line, @Paragraph, @Text, @Parse
----------------------------------------------------------------

set dataStr to "nikno_giko
🄺🄸🄲🄷
Follow

yocarrot🐶
Follow

jasonferro
🦊Jason F아리 
Following
"

set printIt to 1

set dataStr to paragraphs of dataStr

repeat with theLine in dataStr
   if contents of theLine ≠ "" then
      if printIt = 1 then
         set printIt to 0
      else
         set contents of theLine to 0
      end if
   else
      set printIt to 1
      set contents of theLine to 0
   end if
end repeat

set AppleScript's text item delimiters to linefeed
set dataStr to (text of dataStr) as text

----------------------------------------------------------------

Method one will tolerate uneven spacing between text-blocks – as long as there are at least two linefeeds between them. It will also tolerate leading and trailing vertical whitespace.

Method 02 – Easier and faster using TIDS:

(This method is less robust, because it is more dependent upon consistency of the input data structure, but AppleScript's text item delimiters are powerful and very fast when used appropriately.)

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/03/30 01:34
# dMod: 2018/03/30 01:43 
# Appl: Vanilla AppleScript
# Task: Extract the first line of each paragraph of the given text.
# Libs: None
# Osax: None
# Tags: @ccstone, @Applescript, @Script, @Extract, @First, @Line, @Paragraph, @Text, @Parse
----------------------------------------------------------------

set dataStr to "nikno_giko
🄺🄸🄲🄷
Follow

yocarrot🐶
Follow

jasonferro
🦊Jason F아리 
Following
"
set AppleScript's text item delimiters to linefeed & linefeed
set dataList to text items of dataStr

repeat with listItem in dataList
   set contents of listItem to paragraph 1 of listItem
end repeat

set AppleScript's text item delimiters to linefeed
set newDataStr to dataList as text

----------------------------------------------------------------

The Perl in the Ointment

Now – just for fun let's have a go with Perl...

Extract Line 1 of Each Separate Text Block.kmmacros (4.9 KB)

What I've done here is very similar to what I did in the first AppleScript, but the code is more flexible and compact.

It's pretty cool what you can do with a very few lines of code, but then this sort of job is exactly what Perl was designed for.

PERL

Practical Extraction and Reporting Language.

OR

Pathologically Eclectic Rubbish Lister!

Peter – The <STDIN> source selector in the Execute a Shell Script action is waay cool!

-Chris

ComplexPoint · March 30, 2018, 11:42am

As a footnote to this set of jazz variations, for my own purposes I find that a fast and maintainable way of light scripting is to use a set of c. 300 generic functions, in both JavaScript and AppleScript.

If I needed this snippet myself, I might, (assuming a header for the function inclusions, or a macro which brings in only the generic functions used) have written something like:

Javascript

return unlines(
	map(
		x => head(lines(x)),
		splitOn('\n\n', strInput)
	)
);

Applescript

script firstLine
	on |λ|(x)
		head(lines(x))
	end 
end script

return unlines(map(firstLine, splitOn("\n\n", strInput)))

cfriend · March 30, 2018, 6:47pm

It really is incredible to see the three of y’all come up with so many ways to accomplish a task. It’s one of my favorite things about this forum. Thank you for all the examples.

JMichaelTX · March 30, 2018, 9:19pm

Chris, I have no doubt that you can. I was hoping that you'd join in here. I always learn something from your posts.

Yep, mine too.

I really like having a lot of tools in my tool kit. It allows me to pick the best tool I know how to use, given the time constraints. I like and use both scripting and RegEx, while continuing to learn both, and of course, KM.

There is no doubt that this is the best forum I've ever belonged to, and there have been many over the decades.

But I have to say, that given the use case, my favorite solution here is:

Chris posted the regex101.com snippet for his other solution, but here is the one for this solution:
regex101: build, test, and debug regex

ccstone · March 31, 2018, 2:14am

Hey Guys,

Okay – let’s add some more AppleScript solutions.

Most Keyboard Maestro users don’t know that it can search and replace from AppleScript, so here’s an example using the current task.

NOTE – Remember to try AppleScripts in the Script Editor.app before fooling with the Execute an AppleScript action in Keyboard Maestro.

Anyone who uses AppleScript should be at least generally familiar with the SE (or Script Debugger), because it provides more feedback when running scripts than Keyboard Maestro can.

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/03/30 20:29
# dMod: 2018/03/30 20:34 
# Appl: Keyboard Maestro, Keyboard Maestro Engine
# Task: Use Keyboard Maestro's RegEx Search & Replace directly with AppleScript
# Libs: None
# Osax: None
# Tags: @Applescript, @Script, @Keyboard_Maestro_Engine, @RegEx, @Search, @Replace
----------------------------------------------------------------

set dataStr to "
nikno_giko
🄺🄸🄲🄷
Follow

yocarrot🐶
Follow

jasonferro
🦊Jason F아리 
Following
"

tell application "Keyboard Maestro Engine"
   set foundItemList to search dataStr for "(?is)\\v*(\\S.+?\\n).*?follow(?:ing)?\\s*" replace "$1" with regex
end tell

----------------------------------------------------------------

Here’s a single-pass method of doing the job with AppleScript and the Satimage.osax (AppleScript Extension).

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/03/30 20:20
# dMod: 2018/03/30 20:26 
# Appl: Satimage.osax, AppleScript
# Task: Using the Satimage.osax and RegEx to extract user-names from structured data.
# Libs: None
# Osax: Satimage.osax --> http://tinyurl.com/satimage-osaxen
# Tags: @Applescript, @Script, @Satimage.osax, @RegEx, @Extract, @User-Names, @Names, @Structured, @Data
----------------------------------------------------------------
# REQUIRES the Satimage.osax to be installed for RegEx support.
----------------------------------------------------------------

set dataStr to "
nikno_giko
🄺🄸🄲🄷
Follow

yocarrot🐶
Follow

jasonferro
🦊Jason F아리 
Following
"

try
   set foundItemsList to find text "(?im)[\\n\\r]*(\\S.+?)\\n.*?follow(?:ing)?" in dataStr using "\\1" with regexp, all occurrences and string result
on error
   set foundItemsList to "false"
end try

if foundItemsList ≠ false then
   set foundItemsList to join foundItemsList using linefeed
end if

----------------------------------------------------------------

Here’s the sort of method I’d use when I was a beginner with regular expressions. (Again using the Satimage.osax.)

----------------------------------------------------------------
# Auth: Christopher Stone
# dCre: 2018/03/30 20:49
# dMod: 2018/03/30 21:05
# Appl: Satimage.osax, AppleScript
# Task: Using the Satimage.osax and RegEx to extract user-names from structured data.
# Libs: None
# Osax: Satimage.osax --> http://tinyurl.com/satimage-osaxen
# Tags: @Applescript, @Script, @Satimage.osax, @RegEx, @Extract, @User-Names, @Names, @Structured, @Data
----------------------------------------------------------------
# REQUIRES the Satimage.osax to be installed for RegEx support.
----------------------------------------------------------------

set dataStr to "
nikno_giko
🄺🄸🄲🄷
Follow

yocarrot🐶
Follow

jasonferro
🦊Jason F아리 
Following
"

# Strip Vertical Whitespace from top and bottom of data.
set dataStr to change "\\A\\s+|\\s+\\Z" into "" in dataStr with regexp without case sensitive

# Normalize space between data records.
set dataStr to change "^(follow(?:ing)?)$\\s*" into "\\1\\n\\n" in dataStr with regexp without case sensitive

# Add two linefeed before the first data record for consistency.
set dataStr to linefeed & linefeed & dataStr

# Find the first line of each data record (preceded by two linefeeds).
set foundItemsList to find text "\\n{2}(\\S.+)" in dataStr using "\\1" with regexp, all occurrences and string result

set foundItemsList to join foundItemsList using linefeed

I’ve got three steps in there to prepare the data for easier parsing, and this makes the final pattern really simple.

Despite having more steps the script still runs in approximately 0.001 seconds on my system.

So this method can be much more efficient than spending 30 minutes (or hours) finding just the right one-pass pattern.