The Regex Tools Tutorial

Language Options and Imports

This tutorial is a literate Haskell program where we start by specifying the language pragmas and imports we will need for this module.

{-# LANGUAGE QuasiQuotes                      #-}
import qualified Data.ByteString.Lazy.Char8               as LBS
import           Data.List
import           Text.RE.Replace
import           Text.RE.TDFA.String
import           Text.RE.Tools

IsRegex, PCRE and TDFA

The IsRegex re tx class provides regex methods for the RE type re (belonging to either the TDFA or PCRE back end) and a text type tx that the re back end accepts. The Text.RE.TDFA and Text.RE.PCRE API modules provide functions that work over all the text types, with the following match operators:

(*=~)  :: IsRegex RE s
       => s
       -> RE
       -> Matches s

(?=~)  :: IsRegex RE s
       => s
       -> RE
       -> Match s

(*=~/) :: IsRegex RE s => s -> SearchReplace RE s -> s

(?=~/) :: IsRegex RE s => s -> SearchReplace RE s -> s
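
As a rough illustration of these four operators, here is a minimal sketch over String using the TDFA back end. The sample text, the capture names y, m and d, and the date-reordering template are illustrative choices rather than anything taken from the library's own tests.

```haskell
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Text.RE.TDFA.String

-- Sample text (an illustrative choice): two well-formed dates and one
-- with a single-digit day, which the regex below does not match.
dates :: String
dates = "2016-01-09 2015-12-5 2015-10-05"

main :: IO ()
main = do
  -- (*=~) finds every match; countMatches counts them
  print $ countMatches $ dates *=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
  -- (?=~) finds the first match only; matched tests whether there was one
  print $ matched $ dates ?=~ [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
  -- (*=~/) rewrites every match using the template after the ///
  putStrLn $ dates *=~/ [ed|${y}([0-9]{4})-${m}([0-9]{2})-${d}([0-9]{2})///${d}/${m}/${y}|]
  -- (?=~/) rewrites the first match only
  putStrLn $ dates ?=~/ [ed|${y}([0-9]{4})-${m}([0-9]{2})-${d}([0-9]{2})///${d}/${m}/${y}|]
```

The first two lines printed should be 2 and True, since only the first and last dates have two-digit days.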

General IsRegex Functions

The IsRegex class is located in Text.RE.Tools.IsRegex. It provides functions such as matchOnce and matchMany.

Using these functions you can write your own regex tools. As a trivial example we will define fully overloaded regex match operators as follows.

(?=~%) :: IsRegex re s => s -> re -> Match s
(?=~%) = flip matchOnce

(*=~%) :: IsRegex re s => s -> re -> Matches s
(*=~%) = flip matchMany

ghci> matched $ (LBS.pack "2016-01-09 2015-12-5 2015-10-05") ?=~% [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
True
ghci> countMatches $ (LBS.pack "2016-01-09 2015-12-5 2015-10-05") *=~% [re|[0-9]{4}-[0-9]{2}-[0-9]{2}|]
2

regex provides some classic tools that have quickly proven themselves in the examples and scripts used to maintain regex itself.

The regex Tools

The classic tools associated with regular expressions have inspired some regex counterparts.

These tools are built on top of the core library and act as good examples of how to use the regex library as well as useful tools.

The following sections will present some of the internal library code used to build the tools as well as some code from the example programs. These fragments work best as starting points for studying these tools.

Sed and Edit

Edit scripts are applied to each line of the text by the sed functions.

sed' applies the script in its first argument to each line in the text in its second argument.
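
A minimal sketch of a sed' script follows. It assumes that Text.RE.Tools exports the Edits constructor Select and the Edit constructor Template, and it runs the script in the Identity monad so that the result is pure. The spelling-fix script itself is a made-up example.

```haskell
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Data.Functor.Identity (Identity, runIdentity)
import Text.RE.TDFA.String
import Text.RE.Tools

-- A one-step edit script (hypothetical): rewrite "colour" as "color"
-- on any line where it occurs. Assumes the Select and Template
-- constructors exported by Text.RE.Tools.
script :: Edits Identity RE String
script = Select [ Template [ed|colour///color|] ]

-- Apply the script to each line of the text, purely.
fixSpelling :: String -> String
fixSpelling = runIdentity . sed' script

main :: IO ()
main = putStr $ fixSpelling "favourite colour\nno change here\n"
```

Using a monad other than Identity (IO, for example) would let the edit functions perform effects, which is presumably why sed' is parameterised over the monad.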

The sed' function is used to build the include processor in the TestKit utility modules used by the example scripts and programs. The grepFilter function is used to filter out the lines to exclude.


The grepFilter function takes an RE and a text and returns the result of matching the RE to every line in the text.
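
As a small sketch of how grepFilter might be used, the following keeps only the lines that match an RE. It relies on getLineMatches, anyMatches and matchesSource, which also appear in the sortImports_ code in this tutorial. The todoLines helper and its TODO regex are hypothetical.

```haskell
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Text.RE.TDFA.String
import Text.RE.Tools

-- Hypothetical helper: return the text of every line containing "TODO".
-- grepFilter yields one result per line, matched or not, so the
-- unmatched lines are filtered out with anyMatches.
todoLines :: String -> [String]
todoLines txt =
  [ matchesSource ms
  | ln <- grepFilter [re|TODO|] txt
  , let ms = getLineMatches ln
  , anyMatches ms
  ]

main :: IO ()
main = mapM_ putStrLn $ todoLines "x = 1\n-- TODO: tidy\ny = 2\n"
```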

The sortImports utility lives in the TestKit utility module used by the scripts and example programs. It uses grep to sort all of the imports by module name into a single block located at the position of the first import statement in the module, where each import statement is in a standard form matched by the regex

[re|^import +(qualified|         ) ${mod}([^ ].*)$|]

We have reproduced sortImports under the name sortImports_ here.

sortImports_ :: LBS.ByteString -> LBS.ByteString
sortImports_ lbs =
    LBS.unlines $ map (matchesSource . getLineMatches) $
      hdr ++ sortBy cMp bdy
  where
    -- order lines by module name; lines that are not imports sort last
    cMp ln1 ln2 = case (extr ln1,extr ln2) of
        (Nothing,Nothing) -> EQ
        (Nothing,Just _ ) -> GT
        (Just _ ,Nothing) -> LT
        (Just x ,Just  y) -> compare x y

    -- extract the text of the mod capture, if the line matched
    extr ln = case allMatches $ getLineMatches ln of
      mtch:_  -> mtch !$$? [cp|mod|]
      _       -> Nothing

    -- hdr is everything before the first import statement
    (hdr,bdy) = span (not . anyMatches . getLineMatches) lns
    lns       = grepFilter rex lbs
    rex       = [re|^import +(qualified )? *${mod}([^ ].*)$|]

ghci> sortImports_ $ LBS.pack "-- preamble\nimport qualified Data.Text as T\nimport Data.List\n-- done\n"
"-- preamble\nimport Data.List\nimport qualified Data.Text as T\n-- done\n"


The Lex toolkit can be used for quickly knocking together scanners that do not need to be efficient.

It has been used in the library to scan REs so that the captures can be picked out, numbered and that number associated with a name where one has been given.
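
To give a flavour of such a scanner, here is a minimal sketch built on the alex function from Text.RE.Tools. It assumes alex takes a list of (RE, token-builder) pairs, an error token and the text to scan, and that a rule is selected only when its RE matches at the start of the remaining input. The Token type and the three rules are made up for illustration.

```haskell
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import Text.RE.TDFA.String
import Text.RE.Tools

-- Hypothetical token type for a tiny arithmetic language.
data Token = TNum Int | TPlus | TSpace | TBad
  deriving (Eq, Show)

-- Each rule pairs an RE with a function building a token from the match.
-- TBad is the token to emit when no rule applies at the current position.
lexer :: String -> [Token]
lexer = alex
  [ ([re|[0-9]+|], fmap (TNum . read) . matchedText)
  , ([re|\+|]    , const (Just TPlus))
  , ([re| +|]    , const (Just TSpace))
  ] TBad

main :: IO ()
main = print $ lexer "12 + 3"
```

None of the rules can match the empty string, so each step consumes some input and the scan moves forward.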


The findMatches_ function lists all of the files in a directory tree that match an RE.
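
The following sketch shows one way findMatches_ might be called. It assumes that the function takes a FindMethods record with the fields doesDirectoryExistDM, listDirectoryDM and combineDM, followed by the RE and the root path, and that the RE is tested against each file's path. The root directory "src" is a placeholder.

```haskell
{-# LANGUAGE QuasiQuotes #-}
module Main (main) where

import System.Directory (doesDirectoryExist, listDirectory)
import System.FilePath  ((</>))
import Text.RE.TDFA.String
import Text.RE.Tools

-- Directory operations for FilePath, taken from the directory and
-- filepath packages (the record field names are an assumption).
methods :: FindMethods FilePath
methods = FindMethods
  { doesDirectoryExistDM = doesDirectoryExist
  , listDirectoryDM      = listDirectory
  , combineDM            = (</>)
  }

-- List the .hs and .lhs files found under the given root directory.
haskellSources :: FilePath -> IO [FilePath]
haskellSources = findMatches_ methods [re|\.l?hs$|]

main :: IO ()
main = haskellSources "src" >>= mapM_ putStrLn  -- "src" is a placeholder root
```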

It is used by the re-sort-imports program to discover all of the Haskell scripts in the regex source tree and sort their import statements into a standard order (ultimately using the above-mentioned sortImports function).