Bob Balaban's Blog

     
    alt

    Bob Balaban

     

    Recursion or Iteration? YOU decide...

    Bob Balaban  April 4 2008 07:23:05 AM
    Image:Recursion or Iteration? YOU decide...
    Greetings, Geeks!

    Today our topic is....recursion. No, fellow geeks, this does NOT mean that we are going to practice cursing, and cursing again....

    The thing is, recursion (loosely defined as the technique whereby a subroutine processes a piece of a data structure, and then invokes itself ("recursively") to process another piece. The classic example that we all learn in Computer Science 101 class is... "walking a tree". Note that this has very little, if anything to do with the technique we learn in real life, called "walking the dog" (and YES, I KNOW that that is also a yo-yo maneuver....sheesh).

    There's even a pretty good example of a recursive tree-walker routine, written in LotusScript, in the Designer Help file (search or index to "Domparser class", go to that article, and follow the "Example" link). In this case, the code is reading in an XML file, parsing it into a tree using the standard XML DOM parser, and then travelling the tree to visit every node. Because the algorithm is recursive, it does a "depth-first" tree walk, meaning that it visits the first child of the top-level node, then the first child of that node, then the first child of that node, etc., until it reaches a node that has no children. Along the way, the "welktree" subroutine calls itself to "process" the next child node. In this way, it stores the "context" at each layer of the tree on the LS invocation stack. When it hits a node with no children, it "processes" that node, then backs up a layer, and continues to the next child of tht layer's parent node.

    Yes, it sounds confusing, but if you step through the code in the debugger, you'll get it pretty quickly.

    One of our developoers actually copied this code and used it in developing an enhancement to one of our products. It was at that point that he discovered a serious limitation of recursion as an algorithm, and thereby learned an important lesson. To wit: Just because they teach you a "standard" algorithm for data structure traversal in CS101, that doesn't automatically mean that you should actually USE it in code that you sell to people for money, and therefore have to support for real.

    Why do I say this? Because, dear Geeks, it turned out that the whole thing blew up when the XML DOM tree got big; especially when it got "deep" (meaning, a lot of nested layers). Why? Because, unlike in Windows (and many other operating systems),  the size of the calling stack in LotusScript is fixed. Yes, that is correct. The calling stack for a LotusScript program is pre-allocated at program start time, and it never changes. In the Windows C (and .NET) runtime, by contrast, you can specify a calling stack's initial size (there is one stack per thred in most OS processes), but if you run out, the OS (or more properly, the language runtime servics) will grow it automtically. But in LS, when you (for example) recurse too deeply at runtime, you get an "out of memory" exception. I'm not sure how big the LS pre-allocated stack is, but if I had to guess, I would guess 64KB.

    So, copying that particular example didn't work out so well for us. My suggestion for fixing this problem was to process the nodes iteratively instead of recursively -- do them in a loop. If you traverse the tree "breadth first" instead of "depth first", you can just build a queue of nodes as you "walk" the tree, and at the same time, you can pull nodes off the front of the queue for "processing". We did indeed verify that for the same data inputs, this new technique did not run out of memory.

    Here's the original example, copied from Designer Help:

    Original version:

    This agent parses the origXML file, walks the DOM node tree and creates a report in outputFile about the nodes encountered.

    (Declarations)
    Dim domParser As NotesDOMParser
    Dim LF As String

    Sub Initialize
    Dim session As NotesSession
    Dim db As NotesDatabase
    Dim inputStream As NotesStream, outputStream As NotesStream
    Dim docNode As NotesDOMDocumentNode

    Dim origXML As String, outputFile As String
    origXML = "c:\dxl\xmldom.xml"
    outputFile = "c:\dxl\DOMtree.txt"

    Dim header As String
    header = "Walk Tree agent"
    LF = Chr(13)+Chr(10)

    On Error Goto errh

    Set session = New NotesSession    
    Set db = session.CurrentDatabase

    'create the output file
    Set outputStream =session.CreateStream
    outputStream.Open (outputFile)
    outputStream.Truncate

    'write report title
    outputStream.WriteText ("DOM Parser Report - " )
    outputStream.WriteText (header+LF)

    'open the XML file
    Set inputStream = session.CreateStream
    inputStream.Open (origXML)
    If inputStream.Bytes = 0 Then
    outputStream.WriteText (origXML+" is empty"+LF)
    Goto results
    End If

    'create DOM parser and process
    Set domParser=session.CreateDOMParser(inputStream, outputStream)
    domParser.Process

    'get the document node
    Set docNode = domParser.Document

    Call walkTree(docNode)

    results:
    Call outputStream.Close
    Exit Sub
    errh:
    outputStream.WriteText ("errh: "+Cstr(Err)+":  "+Error+LF)
    Resume results
    End Sub

    Sub walkTree ( node As notesdomnode)
    Dim child As notesdomnode
    Dim elt As notesdomnode
    Dim attrs As notesdomnamednodemap
    Dim a As notesdomattributenode
    Dim piNode As Notesdomprocessinginstructionnode
    LF = Chr(13)+Chr(10)

    If Not node.IsNull Then  
    Select Case node.NodeType
    Case DOMNODETYPE_DOCUMENT_NODE:        ' If it is a Document node
    domParser.Output( "Document node: "+node.Nodename )
    Set child = node.FirstChild   ' Get the first node
    Dim numChildNodes As Integer
    numChildNodes = node.NumberOfChildNodes
    domParser.Output(" has "+Cstr(numChildNodes)+" Child Nodes"+LF)

    While numChildNodes > 0
      Set child = child.NextSibling ' Get next node
      numChildNodes = numChildNodes - 1
      Call walkTree(child)
    Wend

    Case DOMNODETYPE_DOCUMENTTYPE_NODE:   ' It is a tag
    domParser.Output("Document Type node: "+ node.NodeName+LF)

    Case DOMNODETYPE_TEXT_NODE:           ' Plain text node
    domParser.Output("Text node: "+node.NodeValue+LF)

    Case DOMNODETYPE_ELEMENT_NODE:        ' Most nodes are Elements
    domParser.Output("Element node: "+node.NodeName )
    Set elt = node

    Dim numAttributes As Integer, numChildren As Integer
    numAttributes = elt.attributes.numberofentries
    domParser.Output(" has "+Cstr(numAttributes)+" Attributes"+LF)

    Set attrs = elt.Attributes     ' Get attributes

    Dim i As Integer
    For i = 1 To numAttributes     ' Loop through them
      Set a = attrs.GetItem(i)
      ' Print attr. name & value
      domParser.Output("Attribute "+a.NodeName+": "+a.NodeValue+LF)
    Next

    numChildren =  elt.NumberOfChildNodes
    Set child = elt.FirstChild     ' Get child
    While numChildren > 0
      Call walkTree(child)
      Set child = child.NextSibling   ' Get next child
      numChildren = numChildren - 1
    Wend
    domParser.Output( elt.nodeName+LF)

    Case DOMNODETYPE_COMMENT_NODE:                ' Comments
    domParser.Output("Comment node: "+node.NodeValue+LF)

    Case DOMNODETYPE_PROCESSINGINSTRUCTION_NODE:  ' Handle PI nodes
    Set piNode = node
    domParser.Output("Processing Instruction node: " )
    domParser.Output(" with Target  "+piNode.Target+_
    " and Data "+piNode.Data+LF)

    Case DOMNODETYPE_CDATASECTION_NODE:           ' CDATA sections
    domParser.Output("CDATA Section node: "+node.NodeName)
    domParser.Output(" has value of "+node.NodeValue+LF)

    Case DOMNODETYPE_ENTITYREFERENCE_NODE:        ' Handle entities
    domParser.Output("Entity Reference node: "+node.NodeName+LF)

    Case Else:
    domParser.Output("Ignoring node: "+Cstr(node.NodeType)+LF)

    End Select  'node.NodeType
    End If        'Not node.IsNull
    End Sub

    ------------------End of Example--------------------------

    It's a perfectly fine example, it's just that there are a lot of real-world cases where it will crash (at least, in LotusScript it will).

    Here's a new version of the example that I re-wrote to use a node queue. The startup code parses the input XML file, and gets the top-level ("document") node. It puts that node on a queue (I wrote a LS "Queue" class to manage the queue, it's in the Declarations section). Then, as long as there's something in the queue, we grab the first Node, and see if it has any children. If so, we do a loop and put each child at the end of the queue. You could modify the code to "process" (in this example "process" just means to write out some info about the current node) the node before you put it on the queue, or to do it when you take it off the front of the queue, it doesn't matter that much.

    So, in sum, we're stuffing nodes onto the tail of the queue as we encounter them, and pulling them off the front for processing. We're done when the queue is empty. Not too hard, eh?

    Enjoy!



    New version:
    'TreeWalker:

    Option Public
    Option Declare



    Dim domParser As NotesDOMParser
    Dim LF As String
    Const INCREMENT = 5

    Public Class Queue
       Private q() As NotesDOMNode
       Private first As Integer
       Private last As Integer
       
       Sub new()
               first = 0
               last = -1
               Redim q(INCREMENT - 1)        ' INCREMENT represents the count, not the ubound
       End Sub
       
       Public Sub Add(node As NotesDOMNode)
               Dim origFirst As Integer
               Dim i As Integer
               
               origFirst = Me.first
               
               ' does the queue have space to just add?
               If Me.last < Ubound(Me.q)  Then
                       Me.last = Me.last + 1
                       Set Me.q(last) = node
                       Goto XEnd
               End If
               
               ' if there's space at the head of the queue, shift everything over
               ' (this is cheaper than redim-ing the array)
               If Me.first > 0 Then
                       i = 0
                       While Me.first <= Me.last
                               Set Me.q(i) = Me.q(Me.first)
                               Set Me.q(Me.first) = Nothing
                               Me.first = Me.first + 1
                               i = i + 1
                       Wend
                       Me.last = Me.last - origfirst
                       Me.first = 0
                       
                       ' now add the new node
                       Me.last = Me.last + 1
                       Set Me.q(last) = node
                       Goto XEnd
               End If
               
               ' if we reach here, the array is full but we still need to add a node,
               ' so no choice but to expand the array
               Redim Preserve Me.q(Ubound(Me.q) + INCREMENT )
               Me.last = Me.last + 1
               Set Me.q(last) = node
               Goto XEnd
               
    XEnd:
       End Sub
       
       Public Function FirstNode() As NotesDOMNode
               If Me.first <= Me.last Then
                       Set FirstNode = Me.q(Me.first)
                       Set Me.q(Me.first) = Nothing
               Else
                       Set FirstNode = Nothing
                       Exit Function
               End If
               Me.first = Me.first + 1
       End Function
    End Class
    Sub Initialize
       Dim session As NotesSession
       Dim db As NotesDatabase
       Dim inputStream As NotesStream, outputStream As NotesStream
       Dim docNode As NotesDOMDocumentNode
       
       Dim origXML As String, outputFile As String
       origXML = "c:\temp\xmldom.xml"
       outputFile = "c:\temp\DOMtree.txt"
       
       Dim header As String
       header = "Walk Tree agent"
       LF = Chr(13)+Chr(10)
       
       On Error Goto errh
       
       Set session = New NotesSession    
       Set db = session.CurrentDatabase
       
    'create the output file
       Set outputStream =session.CreateStream
       outputStream.Open (outputFile)
       outputStream.Truncate
       
    'write report title
       outputStream.WriteText ("DOM Parser Report - " )
       outputStream.WriteText (header+LF)
       
    'open the XML file
       Set inputStream = session.CreateStream
       inputStream.Open (origXML)
       If inputStream.Bytes = 0 Then
               outputStream.WriteText (origXML+" is empty"+LF)
               Goto results
       End If
       
    'create DOM parser and process
       Set domParser=session.CreateDOMParser(inputStream, outputStream)
       domParser.Process
       
    'get the document node
       Set docNode = domParser.Document
       
    '        Call walkTree(docNode)
       Dim q As New Queue()
       q.Add docNode                        ' get the first node in there
       Process q
       
    results:
       Call outputStream.Close
       Exit Sub
    errh:
       outputStream.WriteText ("errh: "+Cstr(Err)+":  "+Error+LF)
       Resume results
    End Sub


    Sub walkTree ( node As notesdomnode)
       Dim child As notesdomnode
       Dim elt As notesdomnode
       Dim attrs As notesdomnamednodemap
       Dim a As notesdomattributenode
       Dim piNode As Notesdomprocessinginstructionnode
       LF = Chr(13)+Chr(10)
       
       If Not node.IsNull Then  
               Select Case node.NodeType
               Case DOMNODETYPE_DOCUMENT_NODE:        ' If it is a Document node
                       domParser.Output( "Document node: "+node.Nodename )
                       Set child = node.FirstChild   ' Get the first node
                       Dim numChildNodes As Integer
                       numChildNodes = node.NumberOfChildNodes
                       domParser.Output(" has "+Cstr(numChildNodes)+" Child Nodes"+LF)
                       
                       While numChildNodes > 0
                               Set child = child.NextSibling ' Get next node
                               numChildNodes = numChildNodes - 1
                               Call walkTree(child)
                       Wend
                       
               Case DOMNODETYPE_DOCUMENTTYPE_NODE:   ' It is a tag
                       domParser.Output("Document Type node: "+ node.NodeName+LF)
                       
               Case DOMNODETYPE_TEXT_NODE:           ' Plain text node
                       domParser.Output("Text node: "+node.NodeValue+LF)
                       
               Case DOMNODETYPE_ELEMENT_NODE:        ' Most nodes are Elements
                       domParser.Output("Element node: "+node.NodeName )
                       Set elt = node
                       
                       Dim numAttributes As Integer, numChildren As Integer
                       numAttributes = elt.attributes.numberofentries
                       domParser.Output(" has "+Cstr(numAttributes)+" Attributes"+LF)
                       
                       Set attrs = elt.Attributes     ' Get attributes
                       
                       Dim i As Integer
                       For i = 1 To numAttributes     ' Loop through them
                               Set a = attrs.GetItem(i)
                               domParser.Output("Attribute "+a.NodeName+": "+a.NodeValue+LF)
                       Next
                       
                       numChildren =  elt.NumberOfChildNodes
                       Set child = elt.FirstChild     ' Get child
                       While numChildren > 0
                               Call walkTree(child)
                               Set child = child.NextSibling   ' Get next child
                               numChildren = numChildren - 1
                       Wend
                       domParser.Output( elt.nodeName+LF)
                       
               Case DOMNODETYPE_COMMENT_NODE:                ' Comments
                       domParser.Output("Comment node: "+node.NodeValue+LF)
                       
               Case DOMNODETYPE_PROCESSINGINSTRUCTION_NODE:  ' Handle PI nodes
                       Set piNode = node
                       domParser.Output("Processing Instruction node: " )
                       domParser.Output(" with Target  "+piNode.Target+_
                       " and Data "+piNode.Data+LF)
                       
               Case DOMNODETYPE_CDATASECTION_NODE:           ' CDATA sections
                       domParser.Output("CDATA Section node: "+node.NodeName)
                       domParser.Output(" has value of "+node.NodeValue+LF)
                       
               Case DOMNODETYPE_ENTITYREFERENCE_NODE:        ' Handle entities
                       domParser.Output("Entity Reference node: "+node.NodeName+LF)
                       
               Case Else:
                       domParser.Output("Ignoring node: "+Cstr(node.NodeType)+LF)
                       
               End Select  'node.NodeType
       End If        'Not node.IsNull
    End Sub
    Sub Process(q As Queue)
       Dim node As NotesDOMNode
       Dim count As Integer
       Dim child As NotesDOMNode
       Dim i As Integer
       
       Set node = q.FirstNode
       While Not (node Is Nothing)
               count = node.NumberOfChildNodes
               If count > 0 Then
                       For i = 0 To count-1
                               If i = 0 Then
                                       Set child = node.FirstChild
                               Else
                                       Set child = child.NextSibling
                               End If


    (Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)

    Follow me on Twitter @LooseleafLLC
    This article ┬ęCopyright 2009 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.

    Comments

    1Wild Bill  4/5/2008 6:25:32 AM  Recursion or Iteration? YOU decide...

    I used recursion when I wrote a LotusScript parser in .. LotusScript. Ouch. Painful.

    When parsing computer languages, recursion is really the only way to go.. So I found when I hit a recursion depth of around 180 calls, LotusScript failed. This didnt seem to change when I passed more or less parameters, so I suspect its a 'number of slots' issue, rather than 'size' issue.. (But I may be wrong).

    So usually when I *have* to recurse in LotusScript, I usually pass in a 'depth' parameter so at least I can stop it if it heads towards the abyss...

    Wot I mean is:

    function fred(depth as integer)

    if (depth > 150) then

    Print "Ahh the sky is falling!"

    exit function

    else

    call fred(depth+1)

    end if

    end function

    2Devin Olson  4/5/2008 8:15:01 AM  Recursion or Iteration? YOU decide...

    I've been bitten by the recursion beast myself, too may levels to count.

    This doesn't discount the value of being able to perform recursion; there are situations where recursive code is a perfect solution to a thorny problem. One just has to practice "safe code" - design you code in such a way that it protects itself from catastrophic heap and stack issues.

    Your rewrite is a good one - good enough in fact that I've already copied it into my designer help.

    @1 - Bill, I think you are partially correct, but not because there is any fixed "levels of iteration" maximum. I think that every recursive call to a given method requires a certain amount of memory (based on the method's signature and parameters types -different memory requirements for different types), which in turn leads to the out of memory issue after a given number of recursive calls. The gist of this is some recursive methods will hit this limit and fail after a relatively few number of levels (10 to 15); while others will handle well over 100 levels.

    I wonder of there is some way to test (other than your fixed level count) the amount of available memory left in the stack, compared to how much memory would be required for "another" recursive call, and force the recursion "coil" to unwind itself if the limit is about to be reached? Just a thought.

    -Devin.

    3Dan Sickles  4/5/2008 5:15:42 PM  Recursion or Iteration? YOU decide...

    Oh for tail call optimization in Lotusscript.

    4Bryan Schmiedeler  4/5/2008 5:16:46 PM  Recursion or Iteration? YOU decide...

    Interesting article.

    Any chance that the Lotus help will be updated to reflect this?

    I have run into situations like this before, where the code in the help or the function itself doesn't work under certain scope conditions. Of course, some things are not going to work as expected when you try to, say, process 1 million records. But I would really like to see additions to help that accurately reflect how robust a function is or its limitations. It sure would help out developers.

    Bryan

    5Bob Balaban  4/6/2008 5:56:17 AM  Recursion or Iteration? YOU decide...

    @1 Curious, I took your code and commented out the bit where it stops at 150, and added a print of the current depth. It caused an "Out of stack space" at level 250. Seems low to me, even if the stack size allocated is only 64K. Must investigate that someday.

    @4 Bryan, feel free to write up a PMR and send it in. Give them a link to my article and suggest that they contact me for permission to reprint it. Or, maybe Bob Perron still reads my blog... :-)

    6Andre Guirard  4/8/2008 8:56:26 AM  Recursion or Iteration? YOU decide...

    Bryan, there's a feedback link at the bottom of each help document. Click it, and say what you think should be added at that point. These are actively dealt with, and this is what I usually do myself if I have a change to suggest.

    7Erik Brooks  4/8/2008 10:01:31 AM  Recursion or Iteration? YOU decide...

    It's a CS fact that any recursive-based algorithm can be converted to standard looping and vice-versa. A looping-based algorithm will always be more gentle on stack space, and hence much faster to run. The only real benefit of recursion is elegance -- at times it's much easier to read and comprehend than a maze of loops.

    The LS stack limit is definitely a bit low. I've got one piece of code that calls itself recursively and I want to say that it bombs after something that (I estimate) is 16K of stack space.

    8John Smart  4/17/2008 7:27:18 PM  uhhh....

    "Goto" ???

    Seriously, Goto? I'm not asking to be a an annoying 'my kung fu is better' guy, I'm asking because I know you know a LOT more than I will about coding, I've been taught (and _have_ taught) that Goto is bad-very-bad-mkay, and your use of goto could be replaced by turning your if blocks into 1 single if-elseif block. Were you just being quick n dirty? Is there a performance benefit?

    9Bob Balaban  4/18/2008 7:40:01 AM  Recursion or Iteration? YOU decide...

    @8 - Hey John, that's a fair comment. No, it wasn't because I was being quick/dirty (in fact, I generally say, "Long after quick is gone, dirty remains...").

    I, like many long-time developers, have evolved a personal structural style that I just automatically use. A certain way of formatting the location of curly braces ({ }) in C and Java, and so on.

    In C and LotusScript I often use "goto" as a replacement for try/catch/finally. I use it sparingly, and (for the most part) only in case of error conditions, where I want to branch to a small amount of cleanup code at the end of the routine.

    Of course you can often use nested if/then/else blocks instead. However, I find that I don't really like deeply nested blocks (especially in LotusScript). For me (and reasonable people can certainly disagree) it's less readable. I use "goto XEnd" (as one example) essentially as a replacement for a "finally" block. I never use "goto" in Java, because it has the real thing. C++? eh. try/catch is still somewhat new there, I'll catch up eventually.

    So, there you are. I, too, sat through lots of lectures about the evils of "goto" (I remember a Pascal class back in the 70s...). And I agree that, like with almost any bit of programming syntax, if used improperly it leads to buggy, hard to maintain code. But I don't buy the religious prohibitions, never have. I practiced safe FORTRAN programming (professionally as well as for fun) for many years, and you simply cannot live without "goto" in that language (at least, not in the older versions, before FORTRAN77).

    10John Smart  4/18/2008 4:49:20 PM  Recursion or Iteration? YOU decide...

    Thanks for your response, Bob. Now that I get it, I like it. I usually label my cleanup code "ExitSub" purely for error resuming purposes but avoid Goto so automatically it never occurred to me that "GoTo ExitSub" might be more readable.

    I think after reading this I'm going to start calling it "Finally" so that I can have statements like "Goto Finally" in my code.

    11Bob Balaban  4/18/2008 7:59:26 PM  Recursion or Iteration? YOU decide...

    @10 - Yes! You can have LOTS of fun with goto labels. Think about some of these possibilities:

    Hell

    Paradise

    EndOfTheLine

    School

    TheCircus

    And so on!

    12Klaus Terman  6/3/2008 6:27:51 AM  Recursion or Iteration? YOU decide...

    I would love to try your approach but the code seems to be cut at the end! And why is walktree still in there.. do you use it to fill the queue ?

    13Anton Edwards  2/20/2010 10:23:47 AM  Thats me!

    I am reading this article second time today, you have to be more careful with content leakers. If I will fount it again I will send you a link