“Some people, when confronted with a problem, think ‘I know, I’ll use regular expressions.’ Now they have two problems.”
—Jamie Zawinski
a - match a
ab - match a followed by b
a* - match a 0 or more times
a+ - match a 1 or more times
a? - match a 0 or 1 times
a{n} - match a n times
a{n,m} - match a between n and m times, inclusive
. - any character (by default, anything except a line terminator)
[abc] - one of the characters 'a', 'b', or 'c'
\s - any whitespace character
\S - any non-whitespace character
\w - any word character (letter, digit, or underscore)
\d - any digit character
When we write regexps containing \ in Java strings we need to escape the \. So "\\s" rather than "\s".
Yeah, that’s annoying.
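To make the syntax above concrete, here is a small sketch that tries a few patterns against sample inputs. The class name SyntaxDemo and the helper whole are made up for illustration; the only library call is String.matches, which tests the whole string. Note the doubled backslashes in every pattern that uses \d, \w, \s, or \S.

```java
class SyntaxDemo {
    // Hypothetical helper: true when the WHOLE input matches the regex.
    static boolean whole(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(whole("a*", ""));                  // true: zero a's is fine
        System.out.println(whole("a+", ""));                  // false: needs at least one a
        System.out.println(whole("a{2,3}", "aaa"));           // true: 3 is between 2 and 3
        System.out.println(whole("[abc]\\d", "b7"));          // true: one of a/b/c, then a digit
        System.out.println(whole("\\w+\\s\\S+", "hi there")); // true: word, space, non-space run
    }
}
```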
String.split
"abc def xyz".split("\\s+") ⟹ { "abc", "def", "xyz" }
Split the string at each place the regexp matches.
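A slightly fuller sketch of split, showing why the + in "\\s+" matters. The class SplitDemo and the helper splitOn are illustrative names, not part of any library; the results are wrapped in a List only so they print nicely.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // Hypothetical helper: split text on a regexp and return the pieces as a list.
    static List<String> splitOn(String regex, String text) {
        return Arrays.asList(text.split(regex));
    }

    public static void main(String[] args) {
        // Runs of whitespace (two spaces, a tab) count as one separator.
        System.out.println(splitOn("\\s+", "abc  def\txyz")); // [abc, def, xyz]
        // Without the +, each space is its own separator, leaving an empty piece.
        System.out.println(splitOn("\\s", "abc  def"));       // [abc, , def]
        // A comma followed by optional whitespace.
        System.out.println(splitOn(",\\s*", "a, b,c"));       // [a, b, c]
    }
}
```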
String.matches
"510-867-5309".matches("\\d{3}-\\d{3}-\\d{4}") ⟹ true
Test whether the whole string matches the regexp.
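The "whole string" part is easy to trip over, so here is a minimal sketch of what it means in practice. The names WholeMatchDemo, isPhone, and containsPhone are invented for this example. Padding the pattern with .* is one simple way to ask "does the string contain a match?" using only String.matches.

```java
class WholeMatchDemo {
    static final String PHONE = "\\d{3}-\\d{3}-\\d{4}";

    // True only when the entire string is a phone number.
    static boolean isPhone(String s) {
        return s.matches(PHONE);
    }

    // True when a phone number appears anywhere in a single-line string.
    static boolean containsPhone(String s) {
        return s.matches(".*" + PHONE + ".*");
    }

    public static void main(String[] args) {
        System.out.println(isPhone("510-867-5309"));                // true
        System.out.println(isPhone("call 510-867-5309"));           // false: extra text in front
        System.out.println(containsPhone("call 510-867-5309 now")); // true
    }
}
```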
java.util.regex.*
Pattern p = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");
Matcher m = p.matcher("510-867-5309");
m.matches() ⟹ true
Pattern is a compiled regexp that we want to use in matching.
Matcher is an object that combines a Pattern with an actual String to match against.
// Look for successive matches of the pattern
while (m.find()) {
    System.out.println(m.group()); // prints what matched
}
The Matcher is a very stateful object. Each call to m.find() searches for the next occurrence of the pattern, and m.group() returns the text of the most recent match.
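Here is a self-contained sketch of the find loop, plus a tiny demonstration of the statefulness. The class FindDemo, its method names, and the sample text are made up for illustration; Pattern.compile, Matcher.find, Matcher.group, and Matcher.reset are the real java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindDemo {
    static final Pattern PHONE = Pattern.compile("\\d{3}-\\d{3}-\\d{4}");

    // Collect every non-overlapping match, left to right.
    static List<String> findAll(String text) {
        Matcher m = PHONE.matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Shows that find() remembers where it left off until reset() is called.
    static boolean[] findTwiceThenReset(String text) {
        Matcher m = PHONE.matcher(text);
        boolean first = m.find();      // finds the first number
        boolean second = m.find();     // continues after the first match
        m.reset();                     // rewind to the start of the input
        boolean afterReset = m.find(); // finds the first number again
        return new boolean[] { first, second, afterReset };
    }

    public static void main(String[] args) {
        System.out.println(findAll("Call 510-867-5309 or 415-555-0100."));
        // [510-867-5309, 415-555-0100]
    }
}
```

With an input that contains exactly one number, findTwiceThenReset returns true, false, true: the second find() has nothing left to find until the matcher is reset.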
Pattern p = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");
Matcher m = p.matcher("510-867-5309");
m.matches() ⟹ true
m.group() ⟹ "510-867-5309"
m.group(1) ⟹ "510"
m.group(2) ⟹ "867"
m.group(3) ⟹ "5309"
Parenthesized sections of the pattern create “capture groups” that we can use to extract parts of what matched.
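As a sketch of how capture groups get used in practice, the helper below pulls just the area code out of a phone number. GroupDemo and areaCode are hypothetical names; returning an Optional is one design choice among several for handling the "didn't match" case, since calling group before a successful match throws IllegalStateException.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    static final Pattern PHONE = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");

    // Group 0 is the whole match; groups 1..3 are the parenthesized parts.
    static Optional<String> areaCode(String s) {
        Matcher m = PHONE.matcher(s);
        // Only ask for groups after a successful match.
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(areaCode("510-867-5309")); // Optional[510]
        System.out.println(areaCode("not a number")); // Optional.empty
    }
}
```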
Matcher with streams
m = p.matcher(someBigBlobOfText);
Stream<MatchResult> rs = m.results();
List<String> numbers = rs.map(MatchResult::group).toList();
The MatchResult object has the same methods as Matcher for getting the results of a match, such as group.
The code above computes a list of all the telephone numbers in someBigBlobOfText.
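Because each MatchResult carries its own capture groups, the stream version combines nicely with the grouped pattern. The sketch below collects only the area codes. StreamDemo, areaCodes, and the sample text are invented for illustration; it assumes Java 16 or later for Stream.toList (Matcher.results itself needs Java 9).

```java
import java.util.List;
import java.util.regex.Pattern;

class StreamDemo {
    static final Pattern PHONE = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");

    // One MatchResult per match; group(1) is the first parenthesized part.
    static List<String> areaCodes(String text) {
        return PHONE.matcher(text)
                .results()
                .map(r -> r.group(1))
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(areaCodes("Home 510-867-5309, work 415-555-0100."));
        // [510, 415]
    }
}
```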
toUnary
public String toUnary(int n) {
    return "1".repeat(n);
}
Translates a Java int to a unary number represented as a String.
toUnary(5) ⟹ "11111"
public boolean isPrime(String num) {
    return !num.matches(".?|(..+)\\1+");
}
If we wanted this to be more efficient, we’d use Pattern.compile to prepare the pattern once, outside the method.
But if we wanted this to be efficient we wouldn’t be doing it this way at all because it’s ridiculous.
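For the curious, here is a sketch of the "compile once" version. The pattern describes the non-primes: .? matches a string of length 0 or 1, and (..+)\1+ matches a chunk of two or more characters repeated at least twice, which is a composite length. A number is prime when neither alternative matches the whole string. The class name UnaryPrimes, the constant NOT_PRIME, and the helper primesBelow are made up for this example, and the methods are static only so the sketch stands alone.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.IntStream;

class UnaryPrimes {
    // Compiled once, outside the method, instead of on every call.
    private static final Pattern NOT_PRIME = Pattern.compile(".?|(..+)\\1+");

    static String toUnary(int n) {
        return "1".repeat(n);
    }

    static boolean isPrime(String num) {
        return !NOT_PRIME.matcher(num).matches();
    }

    // Hypothetical helper: all primes in [0, limit), still the ridiculous way.
    static List<Integer> primesBelow(int limit) {
        return IntStream.range(0, limit)
                .filter(n -> isPrime(toUnary(n)))
                .boxed()
                .toList();
    }

    public static void main(String[] args) {
        System.out.println(primesBelow(20)); // [2, 3, 5, 7, 11, 13, 17, 19]
    }
}
```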
jshell> IntStream.range(0, 20)
...> .filter(n -> isPrime(toUnary(n)))
...> .forEach(System.out::println)
2
3
5
7
11
13
17
19