How to debug memory faults in Tcl and extensions

The two primary tools for [debugging] memory issues are `[memory]` and
[valgrind].  Tcl and [extension%|%extensions] based on [sampleextension]
provide a `valgrind` target which executes the test suite under Valgrind's
"memcheck" tool.



** See Also **

   [Hacking on The Core]:   Contains information useful to extension writers as well.



** Using Tcl Memory Debugging **

When Tcl is compiled with `--enable-symbols=mem`, it places guards that are
sensitive to misuse around allocated memory.  This means that some errors which
might not normally produce segmentation faults now will, making it easier to
track them down.

A Tcl built with `--enable-symbols=mem` also includes `[memory]`, which can
be used to investigate memory issues.
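
A minimal sketch of how `[memory]` might be exercised from the shell. The file names `memcheck.tcl` and `mem_active.txt` are made up for illustration, and the script assumes that the `tclsh` used to run it was built with `--enable-symbols=mem`:

======none
# Write a small Tcl script that exercises [memory].
# memcheck.tcl and mem_active.txt are illustrative names, not part of Tcl.
cat > memcheck.tcl <<'EOF'
# [memory] exists only in a Tcl built with --enable-symbols=mem.
if {[llength [info commands memory]] == 0} {
    puts stderr "this tclsh was built without memory debugging"
    exit 1
}
memory validate on              ;# check guard zones on every allocation and free
set l [lrepeat 1000 x]          ;# ... code under investigation goes here ...
memory active mem_active.txt    ;# dump all currently allocated blocks to a file
puts [memory info]              ;# summary of allocation statistics
EOF
======

Run the generated script with the memory-debugging build, for example `/path/to/debug/tclsh memcheck.tcl`.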



** Using Valgrind **

   * The block allocator ("zippy") is very efficient, but it also efficiently hides individual Tcl_Obj leaks from Valgrind. It is therefore important to compile with `-DPURIFY`, which switches back to a per-object malloc scheme.
   * Compile with `--enable-symbols` but without `--enable-symbols=mem|all`. Those settings instrument allocation in such a way that many blocks that should appear as "definitely lost" show up as "still reachable", because they are all linked through the user-level heap metadata.

   * As of version [Changes in Tcl/Tk 8.6%|%8.6], for better performance and stability (fewer deadlocks on exit), `[exit]` does not bother to perform all the cleanup chores, since the operating system will shortly clean up the process anyway (see [https://core.tcl-lang.org/tcl/tktview?name=2001201%|%issue 2001201]).  As of 2011-08-09, `-DPURIFY` enables full cleanup on exit so that Valgrind doesn't issue hundreds of "still reachable" reports due to lack of cleanup. Whatever Valgrind reports after that is a true leak.

Given these facts, the typical commands to build Tcl for Valgrinding purposes
are:

======none
./configure CFLAGS=-DPURIFY --enable-symbols && make clean && make valgrind
======

`tests/all.tcl` replaces `[exit]` with an empty procedure.  This makes the
Valgrind reports cleaner.  To do the same thing when debugging your own Tcl
script:

======
proc exit args {}
======

According to [de], on [POSIX] systems, ensuring that the `LANG` environment variable is set to `POSIX` may reduce some debugging "noise":

======
export LANG=POSIX
======

`tools/valgrind_suppress` specifies call chains that allocate memory which is
intended to remain allocated for the lifetime of the program, and also memory
allocations that are not Tcl's concern.  These allocations are ignored by
Valgrind when reporting potential issues.


** Using Address Sanitizer **

https://github.com/google/sanitizers%|%Address Sanitizer%|% can provide a good picture of memory errors, but you need to build with PURIFY and no memory debugging.  The following should do the trick:

======
export CFLAGS="-DPURIFY -fsanitize=address"
export LDFLAGS="-lasan"
../unix/configure --enable-symbols
make tcltest
======

On Fedora, be sure to install the `compiler-rt` package first.  Other sanitizers can also be enabled with similar flags.
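
The runtime behaviour of an ASan-instrumented binary is controlled through the `ASAN_OPTIONS` environment variable. The following is only a sketch: the helper name `asan_run` and the chosen default values are assumptions, not something the Tcl build system provides.

======none
# Run a command with AddressSanitizer runtime options set.
# asan_run is a hypothetical helper; the defaults below are one possible choice.
asan_run() {
    # detect_leaks=1   : run the leak check at process exit (where supported)
    # abort_on_error=1 : abort() on the first error so a debugger or core dump can catch it
    # A caller-supplied ASAN_OPTIONS takes precedence over these defaults.
    ASAN_OPTIONS="${ASAN_OPTIONS:-detect_leaks=1:abort_on_error=1}" "$@"
}
======

For example, `asan_run ./tcltest ../tests/all.tcl` from the build directory.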


** Other Tools **

'''[Rolf Ade] 2002-04-14:'''

For Linux, the much-praised Purify (and, as far as I'm aware, most of the
other commercial memory debuggers) is not available; but Insure++ is, and
it's plenty good enough.
http://www.cs.colorado.edu/~zorn/MallocDebug.html lists
some memory debugging tools (commercial ones and free ones); but then,
so does http://phaseit.net/claird/comp.software.testing/mem_test.html, and
the latter is more current.

I've tried some of the free tools and ended up using mpatrol http://www.cbmamiga.demon.co.uk/mpatrol/, because it not only provides extensive debugging, profiling and tracing capabilities to help fix memory allocation errors, but can also help pinpoint memory leaks with their associated symbolic stack [traceback]s. Another advantage is that you don't need to add special memory-debugging code to your own code, and you don't have to link against special libraries. The biggest drawback is that, depending on which debugging options you use, your application may slow down ''dramatically'', so that memory debugging of the core or of an extension with a long-running script may become very time-consuming (or even virtually impossible).

(The following is as of tcl8.4a4 and on Linux.) If you really want to memory debug the tcl core, you should build tcl with the -DPURIFY define (as mentioned above), with debugging symbols included and statically linked.

To do this, cd to the unix directory of your tcl source distribution. If you have the results of a prior compilation lying around, first do a

======none
make clean
======

Then do

======none
./configure --enable-symbols --disable-shared
======

After configure has finished, edit the produced Makefile. Search for CFLAGS, and add -DPURIFY to it.

After that, do the usual

======none
make
======

This should result in a tclsh binary (unusually big, because of the static build) that runs without depending on libtcl, from wherever you move it. You don't need to do a ''make install'', nor should you, because this would overwrite your normal tcl installation. (Or, if you insist on installing your static tclsh build, use the ''--prefix'' option in the ./configure call above to point the installation to a location that's comfortable for you.)

If your mpatrol installation was successful (for how to do that, see the really extensive mpatrol documentation included in the distribution), you can now start debugging. Do

======
mpatrol --dynamic /path/to/your/static/build/tclsh testscript.tcl
======

This produces a log file named 'mpatrol.<processID>.log', in which you find the debugging information. Either with additional options or via a config file, you can customize which memory-related operations mpatrol checks and/or logs. For example, if you want to check for memory leaks, you could use:

======
mpatrol -g --dynamic --show-unfreed --leak-table tclsh testscript.tcl
======

This gives you a log file with a summary of all unfreed memory and a stack trace for every unfreed allocation. To get the exact lines of code for the calls in the stack trace, use the 'mpsym' tool that's included in the mpatrol distribution (you need to have gdb installed for this to work):

======
mpsym /path/to/your/static/build/tclsh testscript.tcl <mpatrol-log>
======

See the extensive mpatrol documentation for the available memory debugging options and what they do.


More likely, you will be debugging that great C-coded tcl extension you're writing (or that third-party extension that is causing trouble for your scripts).

For that, you only need a static tcl build (as described above) if you suspect that your use of some tcl APIs has triggered an otherwise unnoticed memory problem in the core, which is, with all respect, not that likely. If a tclsh with an extension loaded crashes or seems to leak memory, it is wise to suspect the extension code first. (If your analysis really shows that the problem is indeed in the tcl core, don't hesitate to file a bug report. But first, really analyze, try hard to track it down, and don't forget to double check.)

Even if you don't need a static tcl build, it is helpful to compile tcl with the above-mentioned -DPURIFY. This reduces the "noise" of false alarms and may help you detect your real problem(s) faster.

You get the most useful debugging information if you use a custom tclsh with the extension in question compiled in. Even in these days of loadable extensions (which are normally recommended!), many extensions, if not all, have a build target that builds such a custom tclsh. (And if only for these debugging needs, extension maintainers should still provide such a build target.) It is also helpful, of course, to build the custom tclsh with debugging information included (this is done by adding the -g option to the compiler options).

If the extension in question really doesn't provide a custom tclsh build, it is normally very easy to build it yourself. You can find some information about this at [Building a custom tclsh].

To actually do the memory debugging, just use the mpatrol and mpsym tools as shown above, but with your custom tclsh, with your extension compiled in, instead of the standard tclsh.



** Valgrind for Extensions **

[AMG]: Recent Tcl doesn't attempt to clean up all its memory on exit because that takes time and serves no purpose.  However, this interferes with valgrind's leak detection.  Setting the TCL_FINALIZE_ON_EXIT environment variable is supposed to restore full cleanup, though I couldn't get it to work for me.  Compiling with -DPURIFY works much better, but it seems to have the side effect of making Tcl unload all dynamic libraries on exit, which prevents valgrind from showing the names of functions in loaded libraries.

If you're debugging a library loaded by Tcl and not Tcl itself, you can just use this suppression file:

======none
{
   Tcl allocation 1
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   fun:GetBlocks
   fun:TclpAlloc
}
{
   Tcl allocation 2
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   fun:TclThreadAllocObj
}
{
   Tcl allocation 3
   Memcheck:Leak
   match-leak-kinds: possible
   fun:malloc
   fun:TclpAlloc
}
{
   linker uninitialized branch at startup
   Memcheck:Cond
   fun:index
   fun:expand_dynamic_string_token
   fun:_dl_map_object
   fun:map_doit
   fun:_dl_catch_error
   fun:do_preload
   fun:dl_main
   fun:_dl_sysdep_start
   fun:_dl_start
   obj:/lib64/ld-2.19.so
}
======

That last entry isn't for Tcl, but it's useful to me on my [Slackware]64 system, so I'm leaving it in for now.
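
A sketch of how such a file might be passed to Valgrind. The file name `tcl.supp`, the helper name `vg_ext` and the script name `myscript.tcl` are placeholders; `--suppressions` and `--leak-check` are standard memcheck options.

======none
# Run a command under memcheck with a suppression file.
# SUPP and vg_ext are illustrative names; save the entries above as $SUPP.
SUPP=${SUPP:-tcl.supp}

vg_ext() {
    # VALGRIND can be overridden; VALGRIND=echo prints the arguments
    # that would be passed instead of running Valgrind.
    ${VALGRIND:-valgrind} --leak-check=full --suppressions="$SUPP" "$@"
}
======

For example, `vg_ext tclsh myscript.tcl` runs a script that loads the library under test.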

----

(author unknown, dated '''2002-08-02''') ([MS] pleads guilty)

The recent report on the '''valgrind''' memory debugger [http://valgrind.org/]
prompted me to try it on the tcl sources - help out the poor Jeff, the only '''purify''' licensee among us. 

This is what I did after downloading and installing '''valgrind''' on my linux/i386 laptop:

   * compile tcl/tcltest with -DPURIFY
   * modify tests/all.tcl to incorporate the ''exit trick'' described in [How to debug memory faults in Tcl and extensions], ie, inserted before the return the line
        proc exit args {}
   * ran the testsuite under valgrind:
        /opt/valgrind/bin/valgrind -v --leak-check=yes --num-callers=10 \
          --logfile-fd=9 --leak-resolution=high --show-reachable=no \
          ./tcltest ../tests/all.pfy 9>valgrind.out


The output is

 ==19434== valgrind-1.0.0, a memory error detector for x86 GNU/Linux.
 ==19434== Copyright (C) 2000-2002, and GNU GPL'd, by Julian Seward.
 ==19434== Startup, with flags:
 ==19434==    --suppressions=/opt/valgrind/lib/valgrind/default.supp
 ==19434==    -v
 ==19434==    --leak-check=yes
 ==19434==    --num-callers=10
 ==19434==    --logfile-fd=9
 ==19434==    --leak-resolution=high
 ==19434==    --show-reachable=no
 ==19434== Reading suppressions file: /opt/valgrind/lib/valgrind/default.supp
 ==19434== Reading syms from /CVS/tcl_SF_clean/unix/tcltest
 ==19434== Reading syms from /lib/ld-2.2.4.so
 ==19434== Reading syms from /opt/valgrind/lib/valgrind/valgrind.so
 ==19434== Reading syms from /lib/libdl-2.2.4.so
 ==19434== Reading syms from /lib/i686/libm-2.2.4.so
 ==19434== Reading syms from /lib/i686/libc-2.2.4.so
 ==19434== Estimated CPU clock rate is 598 MHz
 ==19434== 
 ==19434== 
 ==19434== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
 ==19434== malloc/free: in use at exit: 3930 bytes in 22 blocks.
 ==19434== malloc/free: 136637 allocs, 136615 frees, 56516658 bytes allocated.
 ==19434== 
 ==19434== searching for pointers to 22 not-freed blocks.
 ==19434== checked 4470668 bytes.
 ==19434== 
 ==19434== definitely lost: 544 bytes in 17 blocks.
 ==19434== possibly lost:   0 bytes in 0 blocks.
 ==19434== still reachable: 3386 bytes in 5 blocks.
 ==19434== 
 ==19434== 544 bytes in 17 blocks are definitely lost in loss record 4 of 6
 ==19434==    at 0x400467C4: malloc (vg_clientfuncs.c:100)
 ==19434==    by 0x80CF7AE: TclpAlloc (./../generic/tclAlloc.c:680)
 ==19434==    by 0x8066041: Tcl_Alloc (./../generic/tclCkalloc.c:1002)
 ==19434==    by 0x80BC2E9: NewVar (./../generic/tclVar.c:4266)
 ==19434==    by 0x80B9419: TclLookupArrayElement (./../generic/tclVar.c:929)
 ==19434==    by 0x80840B4: TclExecuteByteCode (./../generic/tclExecute.c:1775)
 ==19434==    by 0x8082CB9: TclCompEvalObj (./../generic/tclExecute.c:1007)
 ==19434==    by 0x80AF574: TclObjInterpProc (./../generic/tclProc.c:1074)
 ==19434==    by 0x8061BAD: TclEvalObjvInternal (./../generic/tclBasic.c:3033)
 ==19434==    by 0x808381C: TclExecuteByteCode (./../generic/tclExecute.c:1430)
 ==19434== 
 ==19434== LEAK SUMMARY:
 ==19434==    definitely lost: 544 bytes in 17 blocks.
 ==19434==    possibly lost:   0 bytes in 0 blocks.
 ==19434==    still reachable: 3386 bytes in 5 blocks.
 ==19434== Reachable blocks (those to which a pointer was found) are not shown.
 ==19434== To see them, rerun with: --show-reachable=yes
 ==19434== 
 --19434--       lru: 798 epochs, 0 clearings.
 --19434-- translate: new 14353 (218894 -> 3060822), discard 0 (0 -> 0).
 --19434--  dispatch: 39900000 basic blocks, 14078/524777 sched events, 248895 tt_fast misses.
 --19434-- reg-alloc: 5203 t-req-spill, 573378+33302 orig+spill uis, 77003 total-reg-r.
 --19434--    sanity: 1014 cheap, 41 expensive checks.
 
'''Notes'''
   * '''valgrind''' is amazingly fast: 'time' reports 1m18 user time for the testsuite, vs 0m45 without using it.
   * the option '--show-reachable=yes' produces also a stack trace of the creation of the ''reachable unfreed memory at exit''. In our case, the 5 blocks totalling 3386 bytes are due to tcl not freeing encoding-related memory properly ( See Bug #543549 [http://sourceforge.net/tracker/index.php?func=detail&aid=543549&group_id=10894&atid=110894] )
   * this shows a real leak: 17 times in the testsuite, a new element is added to a local array and the corresponding variable is never freed. ''The hunt is on!''

'''UPDATE:''' ''"if it is too good to be true ... it is probably not true"''

When run as above, '''valgrind''' will not monitor child processes. But tests are normally run in separate processes, so I was ''not'' monitoring the memory usage of the actual tests.

With the option --trace-children=yes, or running the tests in the same process, the slowdown is enormous - append.test takes 75x longer under valgrind.
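
With current Valgrind releases, following child processes might look like the sketch below. `--trace-children=yes` is the option mentioned above; `--log-file` with `%p` (expanded to the process ID) gives each process its own log. The helper name `vg_children` and the log file pattern are assumptions.

======none
# Follow child processes and write one log per process.
# vg_children is a hypothetical helper name.
vg_children() {
    # VALGRIND can be overridden; VALGRIND=echo prints the arguments
    # that would be passed instead of running Valgrind.
    ${VALGRIND:-valgrind} --leak-check=full --trace-children=yes \
        --log-file=valgrind.%p.out "$@"
}
======

For example, `vg_children ./tcltest ../tests/all.tcl`; expect the slowdown described above.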




** Misc **

Does anyone know if there are commonly available [tclkit]s with the memory command, etc. turned on?

[NEM] I doubt it. [tclkit] is meant as a deployment solution, so it is unlikely you would want debugging turned on in this situation.

----

[[Talk about array statistics.]]

<<categories>> Porting | Debugging