This tutorial is a literate Haskell program whwre we start by specifying the language pragmas and imports we will need for this module.
{-# LANGUAGE QuasiQuotes #-}
For this module we will work with the PCRE nativeoptions which are based on bit masks, so Data.Bits
will be needed.
Text.RE.REOptions
provides the generic regex types and functions for handling options, regardless of the selected back end.
Note that we import the PCRE String
APi Text.RE.PCRE.String
and the general regex PCRE back end, Text.RE.PCRE
, needed for the the types and functions is supplies for accessing the PCRE native options. We could have just imported Text.RE.PCRE
but it is useful to see which extra types and functions being used from this module (they will be qualified with PCRE.
).
We also import Text.Regex.PCRE
from the regex-pcre
package for the native pcre-regex types and functions themselves.
import Data.Bits
import qualified Text.RE.PCRE as PCRE
import Text.RE.PCRE.String
import Text.RE.REOptions
import Text.Regex.PCRE
regex-tools
back end — TDFA and PCRE — has it own complile-time options and execution-time options, called in each case CompOption
and ExecOption
. The SimpleREOptions selected with the RE parser, e.g.,
> countMatches $ "0a\nbb\nFe\nA5" *=~ [reBlockInsensitive|[0-9a-f]{2}$|]
ghci1
REOptions
type, defined by each of the back ends in terms of the REOptions_
type of Text.RE.REOptions
as follows.
type REOptions = REOptions_ RE CompOption ExecOption
(Bear in mind that RE
, CompOption
and ExecOption
are defined differently in the TDFA and PCRE back ends.)
The REOptions_
type is defined in Text.RE.REOptions
as follows:
-- | the general options for an RE are dependent on which back end is
-- being used and are parameterised over the @RE@ type for the back end,
-- and its @CompOption@ and @ExecOption@ types (the compile-time and
-- execution time options, respectively); each back end will define an
-- @REOptions@ type that fills out these three type parameters with the
-- apropriate types (see, for example, "Text.RE.TDFA")
data REOptions_ r c e =
REOptions
optionsMacs :: !(Macros r) -- ^ the available TestBench RE macros
{ optionsComp :: !c -- ^ the back end compile-time options
, optionsExec :: !e -- ^ the back end execution-time options
,
}deriving (Show)
optionsMacs
contains the macro definitions used to compile the REs (see the test bench tutorial for details on how to define your own macro environments);
optionsComp
contains the back end compile-time options;
optionsExec
contains the back end execution-time options.
(For more information on the options provided by the back ends see the decumentation for the regex-tdfa
and regex-pcre
packages as apropriate.)
compileRegexWithOptions :: (IsOption o, Functor m, Monad m)
=> o
-> String
-> m RE
where o
is one of the following RE-configuring types:
()
(the unit type), representing the default multi-line case-sensitive used with the re
parser.
SimpleREOptions
(explained in the main tutorial), which will be converted into the apropriate CompOption
and ExecOption
for the beck end in question);
CompOption
to directly specify the compile-time options for the back end;
ExecOption
to specify the execution-time options for the back end;
Macros RE
to specify the alternative macros to use instead of the standard environment;
REOptions
to specify all together, the back-end options and the macro table to use.
check :: Maybe a -> a
= maybe (error "booyah") id check
> countMatches $ "0a\nbb\nFe\nA5" *=~ check (compileRegexWith BlockInsensitive "[0-9a-f]{2}$")
ghci1
This will allow you to compile regular expressions when the either the text to be compiled or the options have been dynamically determined.
If you need to build SearchReplace
templates then there is an analagous compilation function for that:
compileSearchReplaceWithOptions :: (Monad m,Functor m,IsRegex RE s)
=> REOptions
-> String
-> String
-> m (SearchReplace RE s)
re_
and ed_
re_
quasi quoter, which yields a function takes an option type and returns the RE compiled with the given options:
> countMatches $ "0a\nbb\nFe\nA5" *=~ [re_|[0-9a-f]{2}$|] BlockInsensitive
ghci1
Any option o
such that IsOption o RE CompOption ExecOption
(i.e., any option type accepted by compileRegex
above) can be used with [re_
… |]
.
The [ed_
… ///
… |]
for compiling SearchReplace
templates works analagously, yielding a function that takes an option type and returns the SearchReplace
template comoiled with those RE options.
The function unpackSimpleREOptions
, used to generate PCRE native options from the generic SimpleREOptions
is defined like this. (We have made some minor organizational changes for this presentaion, but this is equivalent to the library code used for PCRE.unpackSimpleREOptions
.)
unpackSimpleREOptions :: SimpleREOptions -> PCRE.REOptions
=
unpackSimpleREOptions sro REOptions
= PCRE.prelude -- the standard 'prelude' macro environment
{ optionsMacs = comp -- our calculated PCRE compile options
, optionsComp = defaultExecOpt -- the default PCRE run-time options
, optionsExec
}where
=
comp $
wiggle ml compMultiline
wiggle ci compCaseless
defaultCompOpt
= case sro of
(ml,ci) MultilineSensitive -> (,) True False
MultilineInsensitive -> (,) True True
BlockSensitive -> (,) False False
BlockInsensitive -> (,) False True
-- set or clear a PCRE option bit according to the
-- Bool in its first argument using the bit mask
-- passed in the second argument
wiggle :: Bits a => Bool -> a -> a -> a
True m v = v .|. m
wiggle False m v = v .&. complement m wiggle
Now we will set up a apecial set of PCRE options based on BlockInsensitive
, but with the PCRE DOTALL
option bit set.
myOptions :: PCRE.REOptions
=
myOptions
PCRE.defaultREOptions= wiggle True compDotAll $ optionsComp biOptions
{ optionsComp
}
biOptions :: PCRE.REOptions
= unpackSimpleREOptions BlockInsensitive biOptions
Now we can test myOptions
with the [re_| ... |]
quasi quoter as follows.
> matched $ "0a\nbbxFe&A5 " ?=~ [re_|^([0-9a-f]{2}.){4}$|] myOptions
ghciTrue
That test matched, but if we provide just BlockInsensitive
options set up in biOptions
above,
> matched $ "0a\nbbxFe&A5 " ?=~ [re_|^([0-9a-f]{2}.){4}$|] biOptions
ghciFalse
the match fails.