How Do I find a URL in the Text of a File?

ppayne · November 28, 2019, 12:46pm

Hello all, I hope you're settling down for a great Thanksgiving if you're in the U.S.

I've got a question. I needed to search a text file with grep [RegEx] and determine whether a piece of text (a full-length URL that was being shortened with the bit.ly API and stored in the file) was already stored. I have been using this search to do this:

The search code is (?mi)^%Variable%Local_TextToFindEscaped%.*$

I recently realized it was giving false matches: a search for https://website.com/bestsellers/ would also match the line for https://website.com/bestsellers/someotherinfo, because that longer URL starts with the same text.
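A minimal sketch of why this happens, using Python's `re` module as a stand-in for Keyboard Maestro's regex engine (the `(?mi)`, `^`, `.*` and `$` pieces behave the same way here). It assumes the file holds one "full URL, space, short URL" pair per line, and that `Local_TextToFindEscaped` holds the URL with its RegEx metacharacters already escaped; `re.escape` plays that role below. The file contents and bit.ly links are made-up placeholders.

```python
import re

# Hypothetical file contents: one "full URL <space> short URL" pair per line.
STORED = (
    "https://website.com/bestsellers/someotherinfo https://bit.ly/example1\n"
    "https://website.com/newreleases/ https://bit.ly/example2\n"
)


def already_stored_original(url: str, text: str) -> bool:
    """Mimic the original search: (?mi)^<escaped URL>.*$"""
    # re.escape makes characters such as "." match literally,
    # like the (assumed) pre-escaped KM variable.
    pattern = r"(?mi)^" + re.escape(url) + r".*$"
    return re.search(pattern, text) is not None


# This prefix was never stored on its own, yet it matches the longer line,
# because ".*$" accepts anything at all after it.
print(already_stored_original("https://website.com/bestsellers/", STORED))  # True
```

The `^` anchor pins the URL to the start of a line, but nothing pins where the URL ends, so any stored URL that begins with the search text counts as a hit.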

If I can change the grep search so that it requires a hard space right after the URL (marking the break between the full URL and the shortened URL), it will work. But after playing around with the grep code, I can't figure out how to force it to search for "[whatever] " with a space after it.

Can anyone help me out?

JMichaelTX · November 28, 2019, 11:24pm

I hope you don't mind that I have revised your topic title to better reflect the question you have asked.

FROM:
A way to add a space to a grep search?

TO:
How Do I find a URL in the Text of a File?

This will greatly help you attract more experienced users to help solve your problem, and will help future readers find your question, and the solution.

Also, it is best to use "RegEx" instead of "grep" to refer to Regular Expressions in KM. "grep" is a term mostly used in association with command line Shell Scripts.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now, trying to address your problem...

Since URLs can vary so much, it can be quite tricky to find/identify the full URL in a string of arbitrary text. So the problem is more complex than knowing how to match a SPACE in RegEx.

But, for future reference, you can search for a SPACE in a number of ways, including these:

Just enter the SPACE character directly
If you wanted to allow for either a SPACE or a TAB, then:
[ \t]
And if you wanted to allow for one or more, then add a suffix of "+":
[ \t]+
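Applied to the original search, a minimal sketch (again in Python's `re` as a stand-in for the KM engine, with the same hypothetical one-pair-per-line file) replaces the trailing `.*$` with `[ \t]`, so the URL must be followed immediately by a SPACE or TAB:

```python
import re

# Hypothetical file contents: one "full URL <space> short URL" pair per line.
STORED = (
    "https://website.com/bestsellers/someotherinfo https://bit.ly/example1\n"
    "https://website.com/newreleases/ https://bit.ly/example2\n"
)


def already_stored(url: str, text: str) -> bool:
    # (?mi): "^" matches at the start of every line; matching is case-insensitive.
    # [ \t] demands a SPACE or TAB right after the URL, so a stored URL that
    # merely starts with `url` is no longer treated as a match.
    pattern = r"(?mi)^" + re.escape(url) + r"[ \t]"
    return re.search(pattern, text) is not None


print(already_stored("https://website.com/bestsellers/", STORED))               # False
print(already_stored("https://website.com/bestsellers/someotherinfo", STORED))  # True
```

If a line could ever end right after the URL (no short URL following it), `(?:[ \t]|$)` in place of `[ \t]` would also accept the end of the line, since the `m` flag makes `$` match at each line end.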

Now, to the more general problem of matching a full URL, here is one solution:
(http|https|ftp):\/\/(\S+)

The pattern will return the full URL as the overall match, with the scheme (http, https, or ftp) in Capture Group 1 and everything after the :// in Capture Group 2.
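To see what each Capture Group holds, here is a small sketch using Python's `re` module as a stand-in for the KM action; the sample sentence and bit.ly link are made up for illustration. The pattern itself is copied unchanged (Python accepts the `\/` escapes).

```python
import re

# The pattern from the post, unchanged.
URL_RE = re.compile(r"(http|https|ftp):\/\/(\S+)")

text = "Saved https://website.com/bestsellers/someotherinfo as https://bit.ly/example1"

m = URL_RE.search(text)  # search() stops at the FIRST match
if m:
    print(m.group(0))  # https://website.com/bestsellers/someotherinfo
    print(m.group(1))  # https
    print(m.group(2))  # website.com/bestsellers/someotherinfo
```

Because `\S+` simply runs to the next whitespace, it will also swallow trailing punctuation such as a closing parenthesis or a sentence-ending period, which is one reason URLs in arbitrary text are tricky.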

For details, see regex101: build, test, and debug regex

Note that this KM Action will find the FIRST URL in the text.
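If every URL is wanted rather than just the first, a sketch in Python (as a stand-in for looping over matches in KM, with made-up sample text) iterates over all non-overlapping matches:

```python
import re

URL_RE = re.compile(r"(http|https|ftp):\/\/(\S+)")

text = (
    "https://website.com/bestsellers/someotherinfo https://bit.ly/example1\n"
    "ftp://files.example.com/list.txt\n"
)

# finditer() yields every non-overlapping match; group(0) is the whole URL.
all_urls = [m.group(0) for m in URL_RE.finditer(text)]
print(all_urls)
# ['https://website.com/bestsellers/someotherinfo', 'https://bit.ly/example1', 'ftp://files.example.com/list.txt']
```

`re.findall` would not work as a drop-in here: because the pattern contains capture groups, it returns `(scheme, rest)` tuples instead of whole URLs.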

ppayne · November 29, 2019, 10:03am

Thank you very much! Yes, that's a much clearer topic. Adding the [ \t] to the end of my string was exactly what I needed, thanks!
