
Regular Expressions

A regular expression is a notation for specifying a set of strings, e.g., the set of all valid email addresses or the set of all binary strings with an even number of 1s.

There are five basic operations for creating regular expressions, summarized in the table below.

Operation                      Regular Expression   Yes              No
Concatenation                  aabaab               aabaab           every other string
Logical OR (Alternation)       aa | baab            aa, baab         every other string
Replication (Kleene closure)   ab*a                 aa, aba, abba    ε, ab, ababa
Grouping                       a(a|b)aab            aaaab, abaab     every other string
Wildcard                       a..a                 abba, abaa       aa, aaaaa

  • Concatenation: the simplest type of regular expression is formed by concatenating a sequence of symbols, one after the other, like aabaab. This regular expression matches only the single string aabaab. We can perform simple spell checking by using the concatenation operation. For example, we could form the regular expression niether and then, for each word in a dictionary, check whether that word matches the regular expression. Presumably no word in the dictionary would match, and we would conclude that niether is misspelled.
  • Logical OR: the logical OR operator enables us to choose from one of several possibilities. For example, the regular expression aa | baab matches exactly two strings: aa and baab. Many spam filters (e.g., SpamAssassin) work by searching for a long list of common spamming terms. They might form a regular expression such as AMAZING | GUARANTEE | Viagra. The logical OR operator enables us to specify many strings with a single regular expression. For example, if our phone number is 734-8527, we might like to know whether it spells out any word on the phone pad (2 = abc, 3 = def, 4 = ghi, 5 = jkl, 6 = mno, 7 = prs, 8 = tuv, 9 = wxy). The following regular expression specifies all of the 3^7 possible combinations: (p|r|s)(d|e|f)(g|h|i)(t|u|v)(j|k|l)(a|b|c)(p|r|s). It turns out that the only English word that matches is the word regular.
  • Replication: the replication operator enables us to specify infinitely many possibilities. For example, the regular expression ab*a matches aa, aba, abba, abbba, and so forth. Note that 0 replications of b are permitted.
  • Grouping: the grouping operator enables us to override the default precedence of the other operators. The replication operator has the highest precedence, then concatenation, then logical OR. If we want to specify the set of strings a, aba, ababa, abababa, and so forth, we must write (ab)*a to indicate that the ab pattern must be replicated together.
  • Wildcard: the wildcard symbol matches exactly one occurrence of any single character.
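
The five basic operations can be tried out directly in Java. The following is a minimal sketch, assuming java.util.regex.Pattern is used for matching; the class name BasicRegexDemo and the helpers matches and show are hypothetical names chosen for this illustration. Note that Java treats spaces inside a pattern as literal characters by default, so the alternation is written aa|baab rather than aa | baab.

```java
import java.util.regex.Pattern;

class BasicRegexDemo {

    // True only if the regular expression matches the entire input string.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // Prints the pattern, the input, and whether the input matches.
    static void show(String regex, String input) {
        System.out.printf("%-50s %-10s %b%n", regex, "\"" + input + "\"", matches(regex, input));
    }

    public static void main(String[] args) {
        show("aabaab", "aabaab");     // concatenation: true
        show("aa|baab", "baab");      // logical OR: true
        show("ab*a", "aa");           // replication with zero b's: true
        show("ab*a", "ababa");        // * applies to b alone: false
        show("(ab)*a", "ababa");      // grouping replicates ab together: true
        show("a(a|b)aab", "abaab");   // grouping combined with OR: true
        show("a..a", "abba");         // wildcard: true
        show("a..a", "aa");           // each . needs exactly one character: false

        // Phone-pad pattern for 734-8527 from the Logical OR bullet.
        String phone = "(p|r|s)(d|e|f)(g|h|i)(t|u|v)(j|k|l)(a|b|c)(p|r|s)";
        show(phone, "regular");       // true
        show(phone, "niether");       // false
    }
}
```

Pattern.matches requires the whole string to match, which corresponds to the "Yes" and "No" columns in the table; a search for a match anywhere inside a longer text would use Matcher.find instead.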


The first four basic operations above (concatenation, logical OR, replication, grouping) are the theoretical minimum needed to describe regular expressions. Most programming environments support additional operations for convenience (including the wildcard operation), and Java is no exception. The table below includes some of the highlights.

Operation                       Java Regular Expression   Yes                   No
One or more                     a(bc)+de                  abcde, abcbcde        ade, abc
Once or not at all              a(bc)?de                  ade, abcde            abc, abcbcde
Character classes               [a-m]*                    blackmail, imbecile   above, below
Negation of character classes   [^aeiou]                  b, c                  a, e
Exactly N times                 [^aeiou]{6}               rhythm, syzygy        rhythms, allowed
Between M and N times           [a-z]{4,6}                spider, tiger         jellyfish, cow
Whitespace characters           [a-z\s]*hello             hello, say hello      Othello, 2hello
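
The Java operations in the table can be checked with a short program. This is a sketch under the assumption that whole-string matching via Pattern.matcher(...).matches() is the intended test; the class name JavaRegexDemo and the helpers matches and check are hypothetical. In Java source code the backslash in \s must be doubled, so the last pattern is written "[a-z\\s]*hello".

```java
import java.util.regex.Pattern;

class JavaRegexDemo {

    // True only if the regular expression matches the entire input string.
    static boolean matches(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // Prints one line per input: the pattern, the input, and the match result.
    static void check(String regex, String... inputs) {
        for (String input : inputs) {
            System.out.printf("%-16s %-12s %b%n", regex, input, matches(regex, input));
        }
    }

    public static void main(String[] args) {
        // For each row: the first two inputs match, the last two do not.
        check("a(bc)+de", "abcde", "abcbcde", "ade", "abc");            // one or more
        check("a(bc)?de", "ade", "abcde", "abc", "abcbcde");            // once or not at all
        check("[a-m]*", "blackmail", "imbecile", "above", "below");     // character class
        check("[^aeiou]", "b", "c", "a", "e");                          // negated class
        check("[^aeiou]{6}", "rhythm", "syzygy", "rhythms", "allowed"); // exactly 6 times
        check("[a-z]{4,6}", "spider", "tiger", "jellyfish", "cow");     // between 4 and 6 times
        check("[a-z\\s]*hello", "hello", "say hello", "Othello", "2hello"); // whitespace
    }
}
```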

