This tutorial is a self-testing literate Haskell programme introducing the vanilla API of the regex package. There are other tutorials explaining the more specialist aspects of regex, and you can load them into your Haskell REPL of choice: see the regex Tutorials page for details.
The examples below need the QuasiQuotes language extension, as regex uses quasi quoters to check that REs are well-formed at compile time.
{-# LANGUAGE QuasiQuotes #-}
If you are trying out examples interactively at the ghci prompt then you will need
:seti -XQuasiQuotes
Before importing the regex
API into your Haskell script you will need to answer two questions:
Which flavour of REs do I need? If you need Posix REs then the TDFA back end is for you; otherwise it is the PCRE back end, which is housed in a separate regex-with-pcre package.
Which Haskell type is being used for the text I need to match? This can influence the choice of back end as, at the time of writing, the PCRE regex back end does not support the Text types.
The import statement will in general look like this
import Text.RE.<back-end>.<text-type>
As we have no interest in Posix/PCRE distinctions or performance here, we have chosen to work with the TDFA
back end with String
types.
import Text.RE.TDFA.String
You could also import Text.RE.TDFA
or Text.RE.PCRE
to get an API in which the operators are overloaded over all text types accepted by each of these back ends: see the Tools Tutorial for details.
Match
with ?=~
The regex API provides two matching operators: one for looking for the first match in its search string and the other for finding all of the matches. The first-match operator, ?=~
, yields the result of attempting to find the first match.
(?=~) :: String -> RE -> Match String
The boolean matched
function,
matched :: Match a -> Bool
can be used to test whether a match was found:
ghci> matched $ "2016-01-09 2015-12-5 2015-10-05" ?=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
True
To get the matched text use matchedText,
matchedText :: Match a -> Maybe a
which returns Nothing
if no match was found in the search string:
ghci> matchedText $ "2016-01-09 2015-12-5 2015-10-05" ?=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
Just "2016-01-09"
ghci> matchedText $ "2015-12-5" ?=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
Nothing
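The first-match operator can be wrapped into small reusable helpers. The following is a minimal sketch, assuming the regex package is installed; the names firstDate and hasDate are hypothetical and not part of the regex API.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.RE.TDFA.String

-- | Hypothetical helper: the first YYYY-MM-DD shaped substring, if any.
firstDate :: String -> Maybe String
firstDate s = matchedText $ s ?=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]

-- | Hypothetical helper: does the text contain a YYYY-MM-DD shaped substring?
hasDate :: String -> Bool
hasDate s = matched $ s ?=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
```

Loaded into ghci, firstDate "2016-01-09 2015-12-5 2015-10-05" should give Just "2016-01-09" and firstDate "2015-12-5" should give Nothing, mirroring the queries above.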
Matches
with *=~
Use *=~
to locate all of the non-overlapping substrings that match a RE,
(*=~) :: String -> RE -> Matches String
anyMatches :: Matches a -> Bool
anyMatches
can be used to determine if any matches were found
ghci> anyMatches $ "2016-01-09 2015-12-5 2015-10-05" *=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
True
countMatches
will tell us how many sub-strings matched:
ghci> countMatches $ "2016-01-09 2015-12-5 2015-10-05" *=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
2
matches
will return all of the matches.
matches :: Matches a -> [a]
ghci> matches $ "2016-01-09 2015-12-5 2015-10-05" *=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
["2016-01-09","2015-10-05"]
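Because *=~ returns a single Matches value, the three accessors can all be applied to one search result. A minimal sketch, assuming the regex package is installed; allDates and dateSummary are hypothetical names introduced here for illustration.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.RE.TDFA.String

-- | Hypothetical helper: every non-overlapping YYYY-MM-DD shaped substring.
allDates :: String -> [String]
allDates s = matches $ s *=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]

-- | Hypothetical helper: run the search once and report whether anything
-- matched, how many matches there were, and the matched substrings.
dateSummary :: String -> (Bool, Int, [String])
dateSummary s = (anyMatches ms, countMatches ms, matches ms)
  where
    ms = s *=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
```

For the sample string used above, dateSummary should report (True,2,["2016-01-09","2015-10-05"]).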
regex Macros and Parsers
regex supports macros in regular expressions. There are a bunch of standard macros that you can just use, and you can define your own.
RE macros are enclosed in @{ … }. By convention the macros in the standard environment start with a ‘%’. @{%date} will match an ISO 8601 date, so this
ghci> countMatches $ "2016-01-09 2015-12-5 2015-10-05" *=~ [re|@{%date}|]
2
will pick out the two dates.
There are also parsing functions for analysing the matched text. The @{%string}
macro will match quoted strings (in which double quotes can be escaped with backslashes in the usual way) and its companion parseString
function will extract the string that was being quoted, interpreting any escaped double quotes:
ghci> map parseString $ matches $ "\"foo\", \"bar\" and a quote \"\\\"\"" *=~ [re|@{%string}|]
[Just "foo",Just "bar",Just "\""]
See the macro tables page for details of the standard macros and their parsers.
See the testbench tutorial for more on how you can develop, document and test RE macros with the regex test bench.
If you need to edit a string then SearchReplace
[ed|
… |]
templates can be used with ?=~/
to replace a single instance or *=~/
to replace all matching instances.
ghci> "0000 40AA fab0" ?=~/ [ed|${adr}([0-9A-Fa-f]{4}):?///0x${adr}:|]
"0x0000: 40AA fab0"
ghci> "0000: 40AA fab0" *=~/ [ed|[0-9A-Fa-f]{4}///0x$0|]
"0x0000: 0x40AA 0xfab0"
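The two replacement operators differ only in how many matches they rewrite, which is easiest to see by applying the same template with each. A minimal sketch, assuming the regex package is installed; tagFirstHex and tagAllHex are hypothetical names.

```haskell
{-# LANGUAGE QuasiQuotes #-}

import Text.RE.TDFA.String

-- | Hypothetical helper: prefix only the first four-digit hex word with 0x
-- ($0 in the template stands for the whole matched text).
tagFirstHex :: String -> String
tagFirstHex s = s ?=~/ [ed|[0-9A-Fa-f]{4}///0x$0|]

-- | Hypothetical helper: prefix every four-digit hex word with 0x.
tagAllHex :: String -> String
tagAllHex s = s *=~/ [ed|[0-9A-Fa-f]{4}///0x$0|]
```

With the input "0000: 40AA fab0", tagAllHex should reproduce the "0x0000: 0x40AA 0xfab0" result above, while tagFirstHex should change only the leading word.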
By default REs are multiline and case sensitive, so this query
ghci> countMatches $ "0a\nbb\nFe\nA5" *=~ [re|[0-9a-f]{2}$|]
2
finds 2 matches: the ‘$’ anchor matches at each of the newlines, but only the first two (lowercase) hex numbers match the RE. The case sensitivity and multiline-ness can be controlled by selecting alternative parsers.
long name | short forms | multiline | case sensitive |
---|---|---|---|
reMultilineSensitive | reMS, re | yes | yes |
reMultilineInsensitive | reMI | yes | no |
reBlockSensitive | reBS | no | yes |
reBlockInsensitive | reBI | no | no |
ghci> countMatches $ "0a\nbb\nFe\nA5" *=~ [reMultilineSensitive|[0-9a-f]{2}$|]
2
ghci> countMatches $ "0a\nbb\nFe\nA5" *=~ [reMultilineInsensitive|[0-9a-f]{2}$|]
4
ghci> countMatches $ "0a\nbb\nFe\nA5" *=~ [reBlockSensitive|[0-9a-f]{2}$|]
0
ghci> countMatches $ "0a\nbb\nFe\nA5" *=~ [reBlockInsensitive|[0-9a-f]{2}$|]
1
In block mode the ‘$’ anchor matches only at the end of the input text, so the last query finds just the final match.
For the hard of typing the short forms are available.
ghci> matched $ "SuperCaliFragilisticExpialidocious" ?=~ [reMI|supercalifragilisticexpialidocious|]
True
It is possible to compile a dynamically acquired RE string at run-time using compileRegex
:
compileRegex :: (Functor m, Monad m) => String -> m RE
ghci> matches $ "2016-01-09 2015-12-5 2015-10-05" *=~ (maybe (error "evalme_CPL_01") id $ compileRegex "[0-9]{4}-[0-9]{2}-[0-9]{2}")
["2016-01-09","2015-10-05"]
This compiles the RE using the default multiline, case-sensitive options, but you can specify the options dynamically using compileRegexWith
:
compileRegexWith :: (Functor m, Monad m) => SimpleREOptions -> String -> m RE
where SimpleREOptions
is a simple enumerated type.
-- | the default API uses these simple, universal RE options,
-- which get auto-converted into the appropriate back-end 'REOptions_'
data SimpleREOptions
  = MultilineSensitive    -- ^ case-sensitive with ^ and $ matching the start and end of a line
  | MultilineInsensitive  -- ^ case-insensitive with ^ and $ matching the start and end of a line
  | BlockSensitive        -- ^ case-sensitive with ^ and $ matching the start and end of the input text
  | BlockInsensitive      -- ^ case-insensitive with ^ and $ matching the start and end of the input text
  deriving (Bounded,Enum,Eq,Ord,Show)
ghci> matches $ "2016-01-09 2015-12-5 2015-10-05" *=~ (maybe (error "evalme_CPL_01") id $ compileRegexWith MultilineSensitive "[0-9]{4}-[0-9]{2}-[0-9]{2}")
["2016-01-09","2015-10-05"]
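When the RE text comes from outside the program, calling error on a failed compilation is rarely what you want. The sketch below runs compileRegexWith in the Maybe monad instead, so the caller receives a Maybe result. It assumes that Text.RE.TDFA.String exports SimpleREOptions and its constructors alongside compileRegexWith; matchesWith is a hypothetical name.

```haskell
import Text.RE.TDFA.String

-- | Hypothetical helper: compile a run-time RE string with the given
-- options and return all of its matches in the input, staying inside
-- Maybe rather than raising an error if compilation fails.
matchesWith :: SimpleREOptions -> String -> String -> Maybe [String]
matchesWith opts reText input = do
  rex <- compileRegexWith opts reText   -- compilation happens in Maybe
  return $ matches $ input *=~ rex
```

For example, matchesWith MultilineSensitive "[0-9a-f]{2}$" "0a\nbb\nFe\nA5" should give Just ["0a","bb"], in line with the option table above, and a malformed RE string is expected to give Nothing.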
If you need to compile SearchReplace
templates for use with ?=~/
and *=~/
then the compileSearchReplace
and compileSearchReplaceWith
,
compileSearchReplace :: (Monad m, Functor m, IsRegex RE s) => String -> String -> m (SearchReplace RE s)
compileSearchReplaceWith :: (Monad m, Functor m, IsRegex RE s) => SimpleREOptions -> String -> String -> m (SearchReplace RE s)
work analogously to compileRegex
and compileRegexWith
, with the RE and replacement template (either side of the ‘///’ in the [ed|...///...|]
quasi quoters) being passed into these functions in two separate strings, to compile to the SearchReplace
type expected by the ?=~/
and *=~/
operators.
-- | contains a compiled RE and replacement template
data SearchReplace re s =
  SearchReplace
    { getSearch   :: !re -- ^ the RE to match a string to replace
    , getTemplate :: !s  -- ^ the replacement template with ${cap}
                         -- used to identify a capture (by number or
                         -- name if one was given) and '$$' being
                         -- used to escape a single '$'
    }
  deriving (Show)
The escape
and escapeWith
functions are special compilers that compile a string into a RE that should match itself, which is assumed to be embedded in a complex RE to be compiled.
escape :: (Functor m, Monad m) => (String->String) -> String -> m RE
The function passed in the first argument to escape
takes the RE string that will match the string passed in the second argument and yields the RE to be compiled, which is returned from the parsing action.
ghci> "fooe{0}bar" *=~/ SearchReplace (maybe (error "evalme_CPL_03") id $ escape id "e{0}") ""
"foobar"
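The function argument to escape is what lets the literal be embedded in a larger RE. The following sketch shows both uses: id to match the literal anywhere, and a function that prepends an anchor. It assumes Text.RE.TDFA.String exports escape and the SearchReplace constructor, as the example above suggests; deleteLiteral and lineStartsWith are hypothetical names.

```haskell
import Text.RE.TDFA.String

-- | Hypothetical helper: delete every literal occurrence of the needle,
-- however many RE metacharacters it contains.
deleteLiteral :: String -> String -> Maybe String
deleteLiteral needle haystack = do
  rex <- escape id needle
  return $ haystack *=~/ SearchReplace rex ""

-- | Hypothetical helper: does some line of the text begin with the given
-- literal prefix? The escaped literal is embedded after a '^' anchor,
-- which matches at the start of each line under the default options.
lineStartsWith :: String -> String -> Maybe Bool
lineStartsWith prefix s = do
  rex <- escape ("^" ++) prefix
  return $ matched $ s ?=~ rex
```

Here deleteLiteral "e{0}" "fooe{0}bar" should give Just "foobar", matching the query above. Staying in Maybe leaves the decision about a failed compilation to the caller.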
The classic =~ and =~~ match operators are still available for those who have mastered them.
ghci> "bar" =~ [re|(foo|bar)|] :: Bool
True
ghci> "quux" =~ [re|(foo|bar)|] :: Bool
False
ghci> "foobar" =~ [re|(foo|bar)|] :: Int
2
ghci> "foo" =~~ [re|bar|] :: Maybe String
Nothing