Parse Email and 5 Digit Zip Code

OK, I'm really pushing my luck, but why not try?
I'm including a fair test sample of the various text formats I get. (The names, emails and phone numbers have been altered to protect identities.)
About 60% of the responses are like the first 4 examples.
The other 40% are formatted in various ways.
Is there a way to 'fairly' reliably get the email address and the 5-digit ZIP code parsed out into variables?
I'd really appreciate any input.

Shaw Guyery
Newark, NJ 07103
shaxxxx@gmail.com
+1 732 621 0987
Authorized to work in the US for any employer Work Experience
Waitress/Food Service Associate

Ger Hoh
Brooklyn, NY 11203
gmaxxx@aol.com
+1 347 356 1987 Dear Hiring Manger,
I am seeking a position in yo

Jenn Hern
New York, NY 10033
jenniferhernandezxx@gmail.com
+1 911 757 7492
Competent housekeeper with over 4 years of experience in providing excellent housekeeping services in hotel and private residence settings. Capable of handling work and staff pressure in fast- paced environmen

Kat Kla
Wallington, NJ 07057
katrisxxxx@gmail.com
+1 201 702 3899
Work Experience
Housekeeper


errolalexxxx@gmail.com
+1 347 422 1789
To obtain a position that would provide me an opportunity to grow and utilize my skills. Authorized to work in the US for any employer
Work Experience
Scheduler

Bri Bet
4100 Central Ave Rochelle Park, NJ 07662 551-206-1345 Betancurxxx@gmail.com
Objective: Seeking an opportunity within an organization which will utilize my strengths and skills while providing opportunity for professional growth
Summary of Skills

Not sure how this needs to fit into the whole flow, but as always, you can use simpler regexes if they are used from a scripting language which has a split function.

In JS, for example, you might start by trying this kind of thing:

const
    // No g flag: a global regex keeps lastIndex state
    // between successive .test() calls.
    rgxTwoLineGap = /[\n\r]{2,}/u,
    rgxEmail = /^\w+@/u,
    rgxZip = /[A-Z]{2} \d{5}/u;

Email and 5 digit.kmmacros (5.4 KB)

JS Source:
(() => {
    "use strict";

    const main = () => {
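        const
            kme = Application("Keyboard Maestro Engine"),
            candidateList = kme.getvariable("candidateList");

        // No g flag: a global regex keeps lastIndex between
        // .test() calls, so a later line could be skipped.
        const
            rgxTwoLineGap = /[\n\r]{2,}/u,
            rgxEmail = /^\w+@/u,
            rgxZip = /[A-Z]{2} \d{5}/u;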

        const chunks = candidateList.split(rgxTwoLineGap);

        return JSON.stringify(
            chunks.flatMap(chunk => {
                const xs = lines(chunk);

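                // findIndex returns -1 when no line matches;
                // report that field as null instead of throwing.
                const
                    iEmail = xs.findIndex(x => rgxEmail.test(x)),
                    iZip = xs.findIndex(x => rgxZip.test(x));

                return 1 < xs.length ? ([{
                    fullName: xs[0],
                    emailMatch: -1 !== iEmail ? xs[iEmail] : null,
                    zipMatch: -1 !== iZip ? (
                        xs[iZip].split(", ")[1]
                    ) : null
                }]) : [];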
            }),
            null, 2
        );
    };

    // --------------------- GENERIC ---------------------

    // lines :: String -> [String]
    const lines = s =>
        // A list of strings derived from a single
        // string delimited by newline and/or CR.
        0 < s.length ? (
            s.split(/[\r\n]+/u)
        ) : [];

    // sj :: a -> String
    const sj = (...args) =>
        // Abbreviation of showJSON for quick testing.
        // Default indent size is two, which can be
        // overridden by any integer supplied as the
        // first argument of more than one.
        JSON.stringify.apply(
            null,
            1 < args.length && !isNaN(args[0]) ? [
                args[1], null, args[0]
            ] : [args[0], null, 2]
        );

    return main();
})();

JSON results:
[
  {
    "fullName": "Shaw Guyery",
    "emailMatch": "shaxxxx@gmail.com",
    "zipMatch": "NJ 07103"
  },
  {
    "fullName": "Ger Hoh",
    "emailMatch": "gmaxxx@aol.com",
    "zipMatch": "NY 11203"
  },
  {
    "fullName": "Jenn Hern",
    "emailMatch": "jenniferhernandezxx@gmail.com",
    "zipMatch": "NY 10033"
  },
  {
    "fullName": "Kat Kla",
    "emailMatch": "katrisxxxx@gmail.com",
    "zipMatch": "NJ 07057"
  }
]

Hey man, I am not able to test until tomorrow. I wanted to let you know that each one of those ‘paragraphs’ is a different example of the text I will ‘scrape’.
I don’t need to process that whole block, only one at a time.
FYI, I’m automating getting that web data into my FileMaker Pro database: setting variables in KM, then setting AppleScript variables, then picking them ‘up’ in FileMaker.

Good – zooming out to that overview from the start is the best approach :slight_smile:

But it's still not very clear – you really need to fill us in on the mechanics of:

  1. the input (and its source) in each case
  2. the desired output (and its destination)

( I'll look again tomorrow )

To put it another way, everything is a relationship between an incoming arrow and an outgoing arrow.

Unless you specify the former and the latter, we can't really help :slight_smile:

Yes.

Use Apple's Data-Detectors. For this task they should be very reliable – even with wildly formatted input data.

-Chris


Extract Email Address and Zipcode from Data Record String v1.00.kmmacros (9.2 KB)

Hi @ComplexPoint, I apologize for not being clear on the need. I should have done better.
I am very appreciative of your time and expertise, really.

I used the macro and it works even for a single paragraph, so I'm good. The one thing I don't know is how to parse out the 'fullName', 'email' and 'zip' from a single result in the %Variable%candidatesJSON%.
I tried looking at a previous macro you provided, "Name and number from penultimate two lines", and mimicking that, but to no avail.

Not a problem at all – showing the inputs and outputs, with their context, just gets there quicker : -)

I don't know how to parse out the 'fullName', 'email' and 'zip' from a single result in the %Variable%candidatesJSON%
I tried looking at a previous macro you provided "Name and number from penultimate two lines" and mimicking that but to no avail.

What does a single input look like ?

You are selecting some lines and copying them ?

Something else ?

A single input is text that I have to highlight and copy on a web page that is embedded. It is not obtainable by a select all / copy all command, so yes, it is manually copied by me.
Any of the following blocks could be a single input: there are 6 different examples in the following text.

Shaw Guyery
Newark, NJ 07103
shaxxxx@gmail.com
+1 732 621 0987
Authorized to work in the US for any employer Work Experience
Waitress/Food Service Associate

Ger Hoh
Brooklyn, NY 11203
gmaxxx@aol.com
+1 347 356 1987 Dear Hiring Manger,
I am seeking a position in yo

Jenn Hern
New York, NY 10033
jenniferhernandezxx@gmail.com
+1 911 757 7492
Competent housekeeper with over 4 years of experience in providing excellent housekeeping services in hotel and private residence settings. Capable of handling work and staff pressure in fast- paced environmen

Kat Kla
Wallington, NJ 07057
katrisxxxx@gmail.com
+1 201 702 3899
Work Experience
Housekeeper


errolalexxxx@gmail.com
+1 347 422 1789
To obtain a position that would provide me an opportunity to grow and utilize my skills. Authorized to work in the US for any employer
Work Experience
Scheduler

Bri Bet
4100 Central Ave Rochelle Park, NJ 07662 551-206-1345 Betancurxxx@gmail.com
Objective: Seeking an opportunity within an organization which will utilize my strengths and skills while providing opportunity for professional growth
Summary of Skills

I am able to use your previous macro,
Email and 5 digit.kmmacros (5.4 KB)

but on a single block of text or candidate I get the following response, and I don't know how to get it parsed out 'cleaner', without the brackets etc.

[
  {
    "fullName": "Shaw Guyery",
    "emailMatch": "shaxxxx@gmail.com",
    "zipMatch": "NJ 07103"
  }
]

Again and again, thank you!

For example (with a single candidate)

Email and 5 digit (single candidate).kmmacros (4.5 KB)

JS Source:
(() => {
    "use strict";

    const main = () => {
        const
            kme = Application("Keyboard Maestro Engine"),
            chunk = kme.getvariable("candidateChunk");

        // No g flag: a global regex keeps lastIndex state
        // between successive .test() calls.
        const
            rgxEmail = /^\w+@/u,
            rgxZip = /[A-Z]{2} \d{5}/u;

        const xs = lines(chunk);

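        // findIndex returns -1 when no line matches;
        // report that field as null instead of throwing.
        const
            iEmail = xs.findIndex(x => rgxEmail.test(x)),
            iZip = xs.findIndex(x => rgxZip.test(x));

        return 1 < xs.length ? JSON.stringify({
            fullName: xs[0],
            emailMatch: -1 !== iEmail ? xs[iEmail] : null,
            zipMatch: -1 !== iZip ? (
                xs[iZip].split(", ")[1]
            ) : null
        }, null, 2) : "Nothing";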
    };

    // --------------------- GENERIC ---------------------

    // lines :: String -> [String]
    const lines = s =>
        // A list of strings derived from a single
        // string delimited by newline and/or CR.
        0 < s.length ? (
            s.split(/[\r\n]+/u)
        ) : [];

    return main();
})();
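
If what you already have is the array-wrapped output of the earlier macro, the fields can also be unwrapped in a script step. This is only a minimal sketch: `candidatesJSON` here is a hypothetical string constant standing in for the text of the Keyboard Maestro variable, and `summary` is just an illustrative name.

```javascript
// Hypothetical stand-in for the text held in the candidatesJSON variable.
const candidatesJSON = `[
  {
    "fullName": "Shaw Guyery",
    "emailMatch": "shaxxxx@gmail.com",
    "zipMatch": "NJ 07103"
  }
]`;

// The outer brackets are a one-item array: take its first element.
const [candidate] = JSON.parse(candidatesJSON);

// zipMatch holds "NJ 07103"; the last space-separated token is the ZIP.
const zip = candidate.zipMatch.split(" ").pop();

const summary = [
    `Full name: ${candidate.fullName}`,
    `Email: ${candidate.emailMatch}`,
    `ZIP: ${zip}`
].join("\n");

console.log(summary);
```

Inside a macro, the string would presumably come from `kme.getvariable(...)` as in the script above, and the three values could be written back to separate variables rather than joined into one string.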

Erg!!! I am telling you =), I tried that exact code in the last step (obviously not exactly, since it didn't work). I looked at one of your previous macros and mimicked it... and no workee...

Full name: %JSONValue%candidate.fullName%

cheers, you rock, and again, thank you....

Hi @ComplexPoint, I'm getting some failures when scraping the data for certain 'candidates'.
I've included examples of both failing and successful data:

Fails:
Troy Krueger
601 West 57th Street, Apt. 14H
New York, New York 10019
Phone: 1-646-220-1111
E-mail: troykrueger.guerra30@yahoo.com

Fails:
Troy Krueger
Ria Batson Resume
Brooklyn, NY 11212
troykrueger.batson@gmail.com
+1 929 225 1111

Fails:
Troy Krueger
Mount Vernon, NY
troykrueger1975@icloud.com
+1 914 573 1111


WORKS:
Troy Krueger
New York, NY 11369
troykruegerl59@gmail.com
+1 347 237 1111

WORKS:
Troy Di Krueger
Bookseller
New Rochelle, NY 10805
troykrueger@gmail.com
+1 908 601 1111

I truly appreciate your help, no worries if I have 'over extended' my stay!
cheers

FWIW, are you sure that ZIP Code is correct ?

This regex (used in the script) will match a pattern of two uppercase letters, I think.

[A-Z]{2}
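
One way to check this is to run the script's two patterns against individual lines from the samples posted above. The sketch below is only illustrative: `sampleLines` and `report` are names made up here, and the patterns are copied from the script with the `g` flag left off.

```javascript
// Patterns as in the script above, minus the g flag (a global
// regex carries lastIndex state between .test() calls).
const
    rgxEmail = /^\w+@/u,
    rgxZip = /[A-Z]{2} \d{5}/u;

// Lines copied from the failing and working samples.
const sampleLines = [
    "New York, New York 10019",               // state spelled out
    "E-mail: troykrueger.guerra30@yahoo.com", // label before the address
    "troykrueger.batson@gmail.com",           // dot before the @
    "Mount Vernon, NY",                       // no ZIP at all
    "New York, NY 11369",
    "troykruegerl59@gmail.com"
];

// For each line, record whether each pattern finds a match.
const report = sampleLines.map(line => ({
    line,
    email: rgxEmail.test(line),
    zip: rgxZip.test(line)
}));

console.log(JSON.stringify(report, null, 2));
```

Only the last two lines produce a `true` (one for the ZIP pattern, one for the email pattern). A spelled-out state name, a missing ZIP, a dot in the local part of the address, and an "E-mail:" prefix each defeat one of the patterns.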

Yes, it is a correct ZIP code.
I appreciate your response.
I think I'll have to do a workaround. The data is just not formatted in a consistent way, which makes it very difficult to automate 'scraping' it.
Through the help of these awesome folks on the forum I have gotten it much more automated than it was at the beginning, when I was using a bunch of copies and pastes.
All good, cheers

As the pattern matching only needs to find a line number, and the ZIP patterns are not that regular – they don't, for example, necessarily involve 5 digits at all – you might be able to relax the patterns to something like:

const
    rgxEmail = /^[\w\.]+@/u,
    rgxZip = /^[^0-9]+, /u;

As in:

Email and 5 digit (single candidate).kmmacros (4.4 KB)
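
If the layout keeps varying, another option is to stop looking for whole lines and search the entire copied block for the two values instead. This is a different technique from the line-index approach in the macro, offered only as a rough sketch: `extractContact` is a hypothetical helper, both patterns are deliberately loose guesses based on the samples in this thread, and the ZIP pattern assumes the five digits follow a comma and a state name or abbreviation.

```javascript
// extractContact :: String -> { email: String | null, zip: String | null }
// Hypothetical helper: searches the whole block rather than line by line.
const extractContact = text => {
    const
        // Loose email shape: local part, @, dotted domain.
        emailMatch = text.match(/[\w.+-]+@[\w-]+(?:\.[\w-]+)+/),
        // Comma, state name or abbreviation, then exactly five digits.
        zipMatch = text.match(/, *[A-Za-z][A-Za-z. ]* (\d{5})(?!\d)/);

    return {
        email: emailMatch ? emailMatch[0] : null,
        zip: zipMatch ? zipMatch[1] : null
    };
};

// One of the failing samples from the thread.
const sample = [
    "Troy Krueger",
    "601 West 57th Street, Apt. 14H",
    "New York, New York 10019",
    "Phone: 1-646-220-1111",
    "E-mail: troykrueger.guerra30@yahoo.com"
].join("\n");

console.log(JSON.stringify(extractContact(sample), null, 2));
```

For this sample the sketch reports `troykrueger.guerra30@yahoo.com` and `10019`. A block with no ZIP, such as the "Mount Vernon, NY" one, gives `zip: null` rather than an error, which leaves the macro free to decide what to do with incomplete records.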
