Warning: I'm still learning RegEx, so this might be basic for some of you. In this particular case I just copied this from a website.
After some research I found a RegEx that matches a specific word while excluding longer words that contain it. For example, I want to find anything with the word rocket, but exclude rocketed.
Now when I test it at https://regex101.com, I get this result where it includes the 2 spaces before and after:
Also, if my string is "rocket." (with a trailing period), it also validates. I want the match to be exactly what I type and nothing else, no matter what comes before or after.
I also noticed that using roCket for example, will not validate using that expression.
I was able to change that by using: (?i)(?:^|\W)rocket(?:$|\W)(?-i)
Is this a good approach?
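For anyone following along outside regex101, here is a minimal Python sketch of why the surrounding spaces end up in the match, and one possible alternative. It is not necessarily what the original site recommended. The names consuming and boundary are made up for illustration, and because Python's re module does not accept a bare (?-i), the sketch passes re.IGNORECASE instead of using inline flags.

```python
import re

# (?:^|\W) and (?:$|\W) consume the neighbouring character when there is one,
# so that character becomes part of the match.
consuming = re.compile(r"(?:^|\W)rocket(?:$|\W)", re.IGNORECASE)

# \b is a zero-width word boundary: it checks the position but consumes nothing.
boundary = re.compile(r"\brocket\b", re.IGNORECASE)

text = "a roCket launch"
print(repr(consuming.search(text).group()))  # ' roCket '
print(repr(boundary.search(text).group()))   # 'roCket'

# Longer words that contain "rocket" are still excluded.
print(boundary.search("she rocketed away"))  # None

# search() accepts "rocket." because the period is a non-word character;
# fullmatch() only succeeds when the whole string is the word itself.
print(bool(boundary.search("rocket.")))      # True
print(bool(boundary.fullmatch("rocket.")))   # False
print(bool(boundary.fullmatch("Rocket")))    # True
```

If the goal is "the whole input must be exactly this word", something like fullmatch (or anchoring the pattern with ^ and $) is probably closer to what is wanted than a boundary check.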
I'll start with a disclaimer that I am by no means a regex expert, so if one of them happens along, listen to them, not me.
I don't think you need the final "(?-i)" as there is nothing after it, the opening "(?i)" is sufficient to make the subsequent pattern match case insensitive.
Adding the parentheses around "rocket" creates a capturing group. You're not doing anything with the captured data but that doesn't matter, and if you find it makes it more readable in this case then I don't see it's doing any harm. However, it's probably best not to get into the habit of adding parentheses to every regex for readability as it might cause confusion if you are trying to use capturing groups in the future.
That's impossible to answer, as the scenarios the regex pattern could be used against are almost infinite. In my experience (which largely consists of blundering my way to a solution through trial and error) you can't assume that a regex that works perfectly to match some text in document A will work at all in document B, since, for example, an extra space or line break in document B can mean it doesn't match what you expect it to. It's best if you know the text that you're using the regex against so that you can test it thoroughly and then you can be confident it will work as you expect. Having said all of that, in the case of putting parentheses around "rocket" here, I can't see it causing any problems, but I can't promise it won't.
Well, you are way above my level of expertise anyway, so I'll take your advice, especially when it works as expected.
I tried it without and it works, as you said.
I'm not sure I understand what you mean by "there is nothing after it"?
Can you share a real example where (?-i) would be used?
But in that case, wouldn't the parentheses added for readability just be an empty (or useless) capture group? I would be able to see, from the context (such as a macro), that in that particular case they are there for readability. Or is there a real downside to that? Like, would something stop working?
Yeah, my question was more about something obvious, like a problem you could think of that happens 90% of the time.
Sure, each scenario is different, and when the time comes that something doesn't seem to work, I will most likely have to check why it doesn't and will learn an exception to the rule.
Appreciate your contribution to this! I'm taking notes here so I can learn more as I go.
I simply meant that it's at the end of the regex pattern, nothing else follows it. What your regex was saying is:
switch to case insensitive mode for the pattern that follows
match a word boundary
match the literal string "rocket"
match a word boundary
switch to case sensitive mode for the pattern that follows
Since you have the (?-i) at the very end of the pattern, with nothing after it, it doesn't do anything, so it might as well not be there.
As for a "real" example, you would use it if you needed part of your pattern to match with case insensitivity and part without, it allows you to switch between the two modes. To see it working copy and paste this pattern into regex101 and then experiment with test strings to see what matches it:
Start with a test string of "case insENsitiVE", which matches the pattern, and notice that you can change the case of any of the letters in the word "case" and it continues to match, but if you change the case of any letter in "insensitive" it no longer matches.
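The same idea can be sketched in Python. The pattern below is an assumption built to behave the way the test string is described, not the pattern from the original post. Python's re module has no bare (?-i) switch, so the sketch uses a scoped group, (?i:...), which limits case insensitivity to the part inside it.

```python
import re

# Hypothetical pattern: "case" ignores letter case, while " insENsitiVE"
# must match exactly as written.
pattern = re.compile(r"(?i:case) insENsitiVE")

print(bool(pattern.fullmatch("case insENsitiVE")))  # True
print(bool(pattern.fullmatch("CaSe insENsitiVE")))  # True, only "case" changed
print(bool(pattern.fullmatch("case insensitive")))  # False, second word changed
```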
I don't think it will stop working as such, I'm just pointing out that parentheses have a specific meaning to the regex engine (forming a group), and if you are using them for readability just be aware that you might confuse yourself at some point in the future if you're trying to create a complex pattern with capture groups. As long as you're aware of that you can watch out for it.
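As one concrete illustration of where readability parentheses can surprise you, here is a small Python sketch. The sample text and variable names are invented for the example, and other tools may behave differently. In Python, re.findall returns the captured group instead of the whole match as soon as the pattern contains a capturing group.

```python
import re

text = "rocket 1, rocket 2"

# No group: findall returns each whole match.
whole = re.findall(r"\brocket \d+", text)

# Parentheses added "for readability" create capture group 1,
# so findall now returns only what that group captured.
grouped = re.findall(r"\b(rocket) \d+", text)

# A non-capturing group, (?:...), groups without capturing.
non_capturing = re.findall(r"\b(?:rocket) \d+", text)

print(whole)          # ['rocket 1', 'rocket 2']
print(grouped)        # ['rocket', 'rocket']
print(non_capturing)  # ['rocket 1', 'rocket 2']
```

So if the parentheses are only there for the eye, (?:...) keeps the visual grouping without changing what comes back.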
You see, this is one of the "issues" when you know a bit of "everything": things can start becoming confusing when you mix things up.
Even though I'm not an expert when it comes to HTML, I'm pretty comfortable with it and so I looked at (?i) as an opening "tag", the way I would use for example in HTML, and then (?-i) as a closing tag, like . So to me, the way I was reading the RegEx was like "everything inside the opening and closing tag, make it case insensitive".
Now I understand it. (?i) makes the first match case insensitive. If I wanted another match after that to be case sensitive, I would use (?-i)
In this case the -i means it's "negativing" the insensitivity.
It is clear now. Again, thanks for taking the time to clarify this for me. A new thing to add to my notes.
Yes, that's what I mean. As long as I'm aware of what that is, since I don't use RegEx to share it with other people who could misinterpret it, it's ok.