Need Assistance With Regex and Parsing Data From a Webpage

I am trying to save to a file only the 5-character words that follow the text "/dictionary/" in the text collection.

I thought I had it working with the first webpage I used. I have not included all of the text I am processing – only a portion that seems to work/fail.


Dictionary Parse.kmmacros (13 KB)

Macro Image

Hi dglancy,

Your test text does not appear to contain any 5 letter words, so I think your test text will always return nothing.

I also think all of those words contain the letter A, so they will likely match your final IF statement, and again you won't get anything.

Does that make sense? Am I barking up the wrong tree?

What you are trying to achieve is very much doable, probably with as few as two or three actions if you wanted to get really fancy - but I'm keen to make sure that we're on the same page before devoting more time here. Happy to help though.


Regardless of your regex, there's some confusion in your "If" action's conditions.

"Contains" will treat the variable as text and will be true if its value contains "5" -- so, for example, it would be true if the value was 15. To compare numbers use the "is" conditions -- so for an equality test use "is =".

"Does not match" is for regular expression matching. Because you've used a single character, "Does not match" will only be true if Var does not contain an "A" somewhere, anywhere, in it -- if Var is set to "Ab" it does match.

So at the moment you will only append text to your file if that text does contain a "5" and doesn't contain an "A" -- which is probably not what you want!

The test text is actually larger in the macro than what is shown in the image. When I run the macro, this is the output.

autot
autotransformer
autotransfusion

5 characters long, 15 characters long, 15 characters long, none of them containing the character "A" -- all three match your "If" statement as written. See above about condition confusion.
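To make the condition confusion concrete, here is a minimal Python sketch (not Keyboard Maestro itself) that mimics the two conditions as written and then as intended. The function names are made up for illustration, and the sketch assumes the length is stored as text, the way a text "contains" test would see it.

```python
import re

# The three words from the macro's output above.
words = ["autot", "autotransformer", "autotransfusion"]

def passes_as_written(word: str) -> bool:
    """Mimic the original conditions: text 'contains' plus a regex 'does not match'."""
    length_text = str(len(word))                 # "5" or "15", treated as text
    contains_5 = "5" in length_text              # true for "5" AND for "15"
    no_capital_a = re.search("A", word) is None  # true when no capital A anywhere
    return contains_5 and no_capital_a

def passes_as_intended(word: str) -> bool:
    """Numeric equality on the length, plus the same capital-A exclusion."""
    return len(word) == 5 and "A" not in word

as_written = [w for w in words if passes_as_written(w)]
as_intended = [w for w in words if passes_as_intended(w)]

print(as_written)   # all three words pass
print(as_intended)  # only the 5-character word passes
```

The only change between the two functions is swapping the text "contains" test for a numeric comparison, which is what the "is =" condition does.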

Thanks. I did want words that have 5 characters and do not contain a capital "A". I will revisit the manual on variables and conditions.

In a related question: in the "IF" action, I was originally trying to calculate the length of each variable and test for 5 characters. I searched the site, could not figure that out, and used the separate action "Set variable to calculation" instead. Any help is appreciated.

Thanks, I did get the condition confusion. I was just responding to Vincent. I guess I was not clear about who I was responding to and had not read all of the responses first. Sorry.

I think we were both replying at the same time, hence the confusion -- no apology needed!

There's no problem with two-stepping by setting a variable and then testing it. But you can make your macro more succinct by using a "Calculation" condition instead. For example:

length test.kmmacros (3.5 KB)

Image

\"/dictionary\/([a-z]{5}).>

as your regex pattern should give you only 5-letter words that don't have any capital letters. It's not exactly what you've asked for, but it's easier than excluding just capital A letters, and may well do what you need.

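Now that I think about it, that character set [square brackets bit] could be [B-z] and it would work, but that's just because capital A is conveniently at the beginning of the letters alphabetically so is super easy to exclude by just starting at B. One caveat: in ASCII the range B-z also covers the six punctuation characters that sit between Z and a ([ \ ] ^ _ and the backtick), so [B-Za-z] is the stricter way to write it.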
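If you want to try the two character sets outside of Keyboard Maestro, here is a rough Python sketch. The sample HTML is invented for illustration (the real page text isn't shown in this thread), and it assumes each link looks like href="/dictionary/word"> as the pattern above implies.

```python
import re

# Hypothetical sample text; the real webpage source is not shown in this thread.
sample = """
<a href="/dictionary/autot">autot</a>
<a href="/dictionary/autotransformer">autotransformer</a>
<a href="/dictionary/Abbey">Abbey</a>
<a href="/dictionary/Zebra">Zebra</a>
<a href="/dictionary/a_b^c">a_b^c</a>
"""

# The pattern from the post above: exactly five lowercase letters,
# then any one character (the closing quote), then ">".
lowercase_only = re.compile(r'\"/dictionary\/([a-z]{5}).>')

# The [B-z] variant: excludes capital A, but also lets through B-Z
# and the punctuation that sits between Z and a in ASCII.
b_to_z = re.compile(r'\"/dictionary\/([B-z]{5}).>')

lowercase_hits = lowercase_only.findall(sample)
b_to_z_hits = b_to_z.findall(sample)

print(lowercase_hits)
print(b_to_z_hits)
```

With this sample, the [a-z] version keeps only the all-lowercase 5-letter word, while [B-z] also lets "Zebra" and "a_b^c" through. Which one you want depends on whether the page can contain anything other than plain letters after /dictionary/.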