'''Tcl Regular Expression Match Requirements''' describes a set of [regular expressions] and what, ideally, they would match.

** Description **

Tcl's regular expression engine has a particular design that leads to some unexpected results. The regular expressions collected on this page, together with their ideal results, are intended as a guide for the further development of regular expression routines in Tcl.

** To Do **

On 2015-09-21, [https://core.tcl-lang.org/tcl/info/c8dfe06653dbef5d%|%A significant patch] to `regcomp.c` was contributed by the [postgresql] project. See [https://core.tcl-lang.org/tcl/info/1115587%|%Regexp backreference fail with a * closure] and [https://core.tcl-lang.org/tcl/info/0e0e150e49%|%Fix for quantified regexp back-references]. One of the premises behind the patch is that

======none
(a*)+
======

can be understood as

======none
(?:a*)*(a*)
======

Due to this, the following expressions behave differently:

======none
% regexp -indices -inline {(a*)*} aaa
{0 2} {0 2}
% regexp -indices -inline {(a*)+} aaa
{0 2} {3 2}
======

The difference is that in the second expression, `(a*)` ends up capturing nothing because it matches the empty string after `aaa`.

Identify which of the behaviours listed below are due to this patch.

See also [https://www.postgresql.org/message-id/16133-a8934caee4e53035%40postgresql.org%|%Regexp quantifier issues], pgsql-bugs, 2019-11-22.
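One possible way to approach the identification task above is to run the same cases under a Tcl build from before the patch and one from after it, then compare the output. The script below is only a minimal sketch of such a comparison: `checkCase` is a hypothetical helper name, not part of Tcl, and the three cases are copied from this page. It assumes Tcl 8.5 or later (for `dict`) and makes no claim about what any particular version will print.

======tcl
# Hypothetical helper (not part of Tcl): run one pattern and compare the
# indices that regexp reports with the result this page lists as ideal.
proc checkCase {pattern string ideal} {
    set actual [regexp -indices -inline -- $pattern $string]
    set status [expr {$actual eq $ideal ? "ok" : "differs"}]
    return [dict create pattern $pattern actual $actual ideal $ideal status $status]
}

# A few cases copied from this page: pattern, input, ideal result.
set cases {
    {(a*)(b*?)} aaaabbbb {{0 7} {0 3} {4 3}}
    {^(a*)+$}   aaa      {{0 2} {0 2}}
    {((a*)+)}   aaa      {{0 2} {0 2} {0 2}}
}

# Print the interpreter version so that runs from different builds can be told apart.
puts "Tcl [info patchlevel]"
foreach {pattern string ideal} $cases {
    set r [checkCase $pattern $string $ideal]
    puts [format "%-12s %-9s actual: %-20s ideal: %-20s %s" \
        $pattern $string [dict get $r actual] [dict get $r ideal] [dict get $r status]]
}
======

Saving the script to a file and running it with two different `tclsh` binaries gives two reports that can be diffed. A case whose `actual` column changes between the two builds is a candidate for behaviour introduced by the patch, although other changes between those builds could also be responsible.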
** `(a*)(b*?)` **

Ideal:

======none
% regexp -indices -inline {(a*)(b*?)} aaaabbbb
{0 7} {0 3} {4 3}
======

Actual:

======none
% regexp -indices -inline {(a*)(b*?)} aaaabbbb
{0 7} {0 3} {4 7}
======

** `(t*?)?` **

Ideal:

======none
% regexp -inline -indices {(t*?)?} ttt
{0 -1} {0 -1}
======

Actual:

======none
% regexp -inline -indices {(t*?)?} ttt
{0 2} {0 2}
======

** `^(a*)+$` **

Ideal:

======none
% regexp -indices -inline {^(a*)+$} aaa
{0 2} {0 2}
======

Actual:

======none
% regexp -indices -inline {^(a*)+$} aaa
{0 2} {3 2}
======

** `.*(a*){1,3}?` **

Ideal and actual:

======none
% regexp -indices -inline {.*(a*){1,3}?} aaaa
{0 3} {4 3}
======

** `(a.*?f)*` **

If there is a quantifier on a capturing expression, the capture should return a list of matches, one per iteration.

Ideal:

======none
% regexp -indices -inline {(a.*?f)*} aaafaaafjkl
{0 7} {{0 3} {4 7}}
======

Actual:

======none
% regexp -indices -inline {(a.*?f)*} aaafaaafjkl
{0 7} {4 7}
======

The same applies to `(a*[^a])+`.

Ideal:

======none
% regexp -indices -inline {(a*[^a])+} aaabbaacaa
{0 7} {{0 3} {5 7}}
======

Actual:

======none
% regexp -indices -inline {(a*[^a])+} aaabbaacaa
{0 7} {5 7}
======

** `((a*)+)` **

Ideal:

======none
% regexp -indices -inline {((a*)+)} aaa
{0 2} {0 2} {0 2}
======

Actual:

======none
% regexp -indices -inline {((a*)+)} aaa
{0 2} {0 2} {3 2}
======

** `(?:a*b)+c` **

Ideal:

======none
% regexp -indices -inline {(?:a*b)+c} aaaabbbbcc
{7 8}
======

Actual:

======none
% regexp -indices -inline {(?:a*b)+c} aaaabbbbcc
{0 8}
======

** If the First Branch is Greedy all Branches are Greedy **

Ideally, the greediness of one branch would not affect another branch:

======none
% regexp -indices -inline {z*|(a*?)(r+)} aaaarr
{0 4} {0 3} {4 4}
======

Currently, however, if the first branch is greedy, all branches are greedy:

======none
% regexp -indices -inline {z*|(a*?)(r+)} aaaarr
{0 5} {0 3} {4 5}
======

** Page Authors **

   [pyk]:   

<<categories>> regular expressions