Version 17 of cmdSplit

Updated 2013-04-15 04:49:38 by pooryorick

Summary

[cmdSplit], by dgp, parses a script into its constituent commands while properly handling semicolon-delimited commands and the "semicolon in a comment" problem. It was written to support parsing of class bodies in an itcl-like, pure Tcl, OO framework into Tcl commands.

See Also

cmdStream
Config file using slave interp, by AMG: more or less the same thing, implemented using a slave interpreter

Description

[cmdSplit] returns a list of the commands in a script. The original post is "How to split a string into elements exactly as eval would do", comp.lang.tcl, 1998-09-07.

proc cmdSplit {body} {
    set commands {}
    set chunk ""
    foreach line [split $body "\n"] {
        append chunk $line
        if {[info complete "$chunk\n"]} {
            # $chunk ends in a complete Tcl command, and none of the
            # newlines within it end a complete Tcl command.  If there
            # are multiple Tcl commands in $chunk, they must be
            # separated by semi-colons.
            set cmd ""
            foreach part [split $chunk ";"] {
                append cmd $part
                if {[info complete "$cmd\n"]} {
                    set cmd [string trimleft $cmd]
                    # Drop empty commands and comments
                    if {![string match {} $cmd] \
                            && ![string match \#* $cmd]} {
                        lappend commands $cmd
                    }
                    if {[string match \#* $cmd]} {
                        set cmd "\#;"
                    } else {
                        set cmd ""
                    }
                } else {
                    # No complete command yet.
                    # Replace semicolon and continue
                    append cmd ";"
                }
            }
            set chunk ""
        } else {
            # No end of command yet.  Put the newline back and continue
            append chunk "\n"
        }
    }
    if {![string match {} [string trimright $chunk]]} {
        return -code error "Can't parse body into a\
                sequence of commands.\n\tIncomplete\
                command:\n-----\n$chunk\n-----"
    }
    return $commands
}
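
The loop above leans entirely on [info complete], which reports whether the braces, brackets and quotes in a string are balanced. The following is a minimal sketch of why the semicolon pass has to re-join pieces. The variable names are illustrative only and are not part of [cmdSplit].

```tcl
# Splitting on ";" cuts through the braced word, so the first piece
# on its own is not a complete command; gluing the pieces back
# together with the semicolon restores one.
set pieces [split "set b {x; y}" ";"]
set first [lindex $pieces 0]
set rejoined "$first;[lindex $pieces 1]"

set firstComplete [info complete "$first\n"]
set rejoinedComplete [info complete "$rejoined\n"]
puts "$firstComplete $rejoinedComplete"     ;# prints: 0 1
```

With the definition above loaded, passing [cmdSplit] a three-line script consisting of `set a 1; set b {x; y}`, then `# note; ignored`, then `puts $a` should yield a three-element list: `set a 1`, `set b {x; y}` and `puts $a`. The comment line is dropped, including the part after its semicolon.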

wordSplit

Sarnold: [wordSplit] takes a command and returns its arguments as a list.

proc wordSplit {command} {
    if {![info complete $command]} {error "non complete command"}
    set res ""; # the list of words
    set chunk ""
    foreach word [split $command " \t"] {
        # testing each word until the word being tested makes the
        # command up to it complete
        # example:
        # set "a b"
        # set -> complete, 1 word
        # set "a -> not complete
        # set "a b" -> complete, 2 words
        append chunk $word
        if {[info complete "$res $chunk"]} {
            lappend res $chunk
            set chunk ""
        } else {
            append chunk " "
        }
    }
    set res
}

aspect: forgive my foolishness, but what is [wordSplit] for? From the description it sounds like [wordSplit $command] == [lrange $command 1 end] but it seems to do something different. If you want the elements of $command as a list, just use $command!

AMG: [wordSplit] splits an arbitrary string by whitespace, then attempts to join the pieces according to the result of [info complete]. This results in a list in which each element embeds its original quote characters. Since an odd number of trailing backslashes doesn't cause [info complete] to return false, [wordSplit] doesn't correctly recognize backslashes used to quote spaces.

I agree that [wordSplit] doesn't appear to serve a useful purpose. Its input should already be a valid, directly usable list.
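
A short sketch of both points, using only built-in commands; the sample strings are made up for illustration. A well-formed command can already be read as a list whose elements have their quoting removed. Splitting on whitespace first breaks a backslash-quoted space apart, and [info complete] does not object to the piece that ends in a backslash.

```tcl
# A well-formed command can be read as a list directly; the quotes
# are consumed by list parsing rather than kept in the element.
set command {set "a b"}
puts [llength $command]                 ;# prints: 2
puts [lindex $command 1]                ;# prints: a b

# A backslash-quoted space belongs to a single list element...
set quoted {set a\ b}
puts [llength $quoted]                  ;# prints: 2
# ...but splitting on whitespace yields three pieces,
puts [llength [split $quoted " \t"]]    ;# prints: 3
# and the prefix ending in a lone backslash still counts as complete.
puts [info complete "set a\\"]          ;# prints: 1
```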

aspect: it also does strange things if there are consecutive spaces in the input. "each element embeds its original quote characters" seems to be the important characteristic, but I can't think of a use case where this would be desirable. Hoping that Sarnold can elaborate on his original intention so the example can be focussed (and corrected?).
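
The consecutive-space behaviour comes from [split], which returns an empty element between adjacent separators, whereas list parsing treats a run of whitespace as a single separator. A minimal illustration with a made-up input string:

```tcl
# Two spaces separate the words in this sample input.
set command "set  a"
puts [llength [split $command " \t"]]   ;# prints: 3 (the middle piece is empty)
puts [llength $command]                 ;# prints: 2
```

Each empty piece passes the [info complete] check on its own, so [wordSplit] appends it to the result as an empty element.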