java, regex

Java anchored regex

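I ran into this today while doing some regex in Java. When I first started doing regex in Java, I was surprised to learn that String.matches() treats every regular expression as anchored. That is, if you have the string foobar and call matches("foo"), it will not match, because the pattern has to match the entire string. This is different from grep, Perl, and other tools, which look for a match anywhere in the input. The anchoring comes from matches() itself rather than from the regex engine: Matcher.find() does search for a substring. In other words, for matches(), the following regexes are equivalent: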

"foo"
"^foo$"

If you want to find foo within foobar using matches(), you need to use

".*foo.*"
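As a minimal sketch of the alternative, assuming you are free to use java.util.regex directly instead of String.matches(): Matcher.find() looks for the pattern anywhere in the input, much like grep. The class name FindVsMatches and the helper containsMatch are made up for illustration.

```java
import java.util.regex.Pattern;

class FindVsMatches {
    // Hypothetical helper: true if the regex matches anywhere in the input.
    static boolean containsMatch(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        // matches() requires the whole string to match the pattern
        System.out.println("foobar".matches("foo"));         // false
        // find() is an unanchored search
        System.out.println(containsMatch("foo", "foobar"));  // true
        // explicit anchors still work with find()
        System.out.println(containsMatch("^bar", "foobar")); // false
    }
}
```

If the same pattern is used many times, it is probably worth compiling it once and reusing the Pattern object rather than recompiling on every call as this helper does.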

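I discovered one more interesting tidbit. If you put explicit anchors in, anything before ^ or after $ can only match the empty string. Optional leading and trailing parts of the regex are therefore effectively ignored, while parts that must consume at least one character make the match fail.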

Here are some examples:

// some tests which illustrate implicit anchoring
"foobar".matches("foo"); //false - rewrite = "^foo$"
"foobar".matches("bar"); //false - rewrite = "^bar$"
"foobar".matches("foo.*"); //true - rewrite = "^foo.*$"
"foobar".matches("bar.*"); //false - rewrite = "^bar.*$"
"foobar".matches(".*foo.*"); //true - rewrite = "^.*foo.*$"
"foobar".matches(".*bar.*"); //true - rewrite = "^.*bar.*$"
"foobar".matches(".*oo.*"); //true - rewrite = "^.*oo.*$"
// now some tests with optional characters before or after explicit anchors
// optional characters before or after initial/final anchors have no effect
"foobar".matches(".*^foo"); //false - rewrite = "^foo$"
"foobar".matches(".*^foo.*"); //true - rewrite = "^foo.*$"
"foobar".matches(".*^foo$.*"); //false - rewrite = "^foo$"
"foobar".matches(".*^foobar$.*"); //true - rewrite = "^foobar$"
"foobar".matches("[a-z]*^foobar$.*"); //true - rewrite = "^foobar$"
"foobar".matches(".+^foobar$.*"); //false - .+ needs at least one character before the start of the string
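
One way to see what the parts in front of an explicit anchor actually do is to wrap them in a capturing group and inspect what they captured. The sketch below assumes default flags (no MULTILINE); the class name AnchorGroups and the helper leadingCapture are hypothetical names used only for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorGroups {
    // Hypothetical helper: returns what group 1 captured,
    // or null if the pattern does not match the whole input.
    static String leadingCapture(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // (.*) in front of ^ can only capture the empty string
        System.out.println("[" + leadingCapture("(.*)^foo.*", "foobar") + "]"); // prints "[]"
        // (.+) needs at least one character before ^, so the match fails
        System.out.println(leadingCapture("(.+)^foobar$.*", "foobar"));         // prints "null"
    }
}
```

The optional group is not skipped by the engine; it is matched, but it is forced to match zero characters because ^ can only succeed at position 0.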