Bob Balaban's Blog

    Recursion or Iteration? YOU decide...

    Bob Balaban  April 4 2008 06:23:05 AM
    Image:Recursion or Iteration? YOU decide...
    Greetings, Geeks!

    Today our topic is....recursion. No, fellow geeks, this does NOT mean that we are going to practice cursing, and cursing again....

    The thing is, recursion (loosely defined as the technique whereby a subroutine processes a piece of a data structure, and then invokes itself ("recursively") to process another piece. The classic example that we all learn in Computer Science 101 class is... "walking a tree". Note that this has very little, if anything to do with the technique we learn in real life, called "walking the dog" (and YES, I KNOW that that is also a yo-yo maneuver....sheesh).

    There's even a pretty good example of a recursive tree-walker routine, written in LotusScript, in the Designer Help file (search or index to "Domparser class", go to that article, and follow the "Example" link). In this case, the code is reading in an XML file, parsing it into a tree using the standard XML DOM parser, and then travelling the tree to visit every node. Because the algorithm is recursive, it does a "depth-first" tree walk, meaning that it visits the first child of the top-level node, then the first child of that node, then the first child of that node, etc., until it reaches a node that has no children. Along the way, the "welktree" subroutine calls itself to "process" the next child node. In this way, it stores the "context" at each layer of the tree on the LS invocation stack. When it hits a node with no children, it "processes" that node, then backs up a layer, and continues to the next child of tht layer's parent node.

    Yes, it sounds confusing, but if you step through the code in the debugger, you'll get it pretty quickly.

    One of our developoers actually copied this code and used it in developing an enhancement to one of our products. It was at that point that he discovered a serious limitation of recursion as an algorithm, and thereby learned an important lesson. To wit: Just because they teach you a "standard" algorithm for data structure traversal in CS101, that doesn't automatically mean that you should actually USE it in code that you sell to people for money, and therefore have to support for real.

    Why do I say this? Because, dear Geeks, it turned out that the whole thing blew up when the XML DOM tree got big; especially when it got "deep" (meaning, a lot of nested layers). Why? Because, unlike in Windows (and many other operating systems),  the size of the calling stack in LotusScript is fixed. Yes, that is correct. The calling stack for a LotusScript program is pre-allocated at program start time, and it never changes. In the Windows C (and .NET) runtime, by contrast, you can specify a calling stack's initial size (there is one stack per thred in most OS processes), but if you run out, the OS (or more properly, the language runtime servics) will grow it automtically. But in LS, when you (for example) recurse too deeply at runtime, you get an "out of memory" exception. I'm not sure how big the LS pre-allocated stack is, but if I had to guess, I would guess 64KB.

    So, copying that particular example didn't work out so well for us. My suggestion for fixing this problem was to process the nodes iteratively instead of recursively -- do them in a loop. If you traverse the tree "breadth first" instead of "depth first", you can just build a queue of nodes as you "walk" the tree, and at the same time, you can pull nodes off the front of the queue for "processing". We did indeed verify that for the same data inputs, this new technique did not run out of memory.

    Here's the original example, copied from Designer Help:

    Original version:

    This agent parses the origXML file, walks the DOM node tree and creates a report in outputFile about the nodes encountered.

    (Declarations)
    Dim domParser As NotesDOMParser
    Dim LF As String

    Sub Initialize
    Dim session As NotesSession
    Dim db As NotesDatabase
    Dim inputStream As NotesStream, outputStream As NotesStream
    Dim docNode As NotesDOMDocumentNode

    Dim origXML As String, outputFile As String
    origXML = "c:\dxl\xmldom.xml"
    outputFile = "c:\dxl\DOMtree.txt"

    Dim header As String
    header = "Walk Tree agent"
    LF = Chr(13)+Chr(10)

    On Error Goto errh

    Set session = New NotesSession    
    Set db = session.CurrentDatabase

    'create the output file
    Set outputStream =session.CreateStream
    outputStream.Open (outputFile)
    outputStream.Truncate

    'write report title
    outputStream.WriteText ("DOM Parser Report - " )
    outputStream.WriteText (header+LF)

    'open the XML file
    Set inputStream = session.CreateStream
    inputStream.Open (origXML)
    If inputStream.Bytes = 0 Then
     outputStream.WriteText (origXML+" is empty"+LF)
     Goto results
    End If

    'create DOM parser and process
    Set domParser=session.CreateDOMParser(inputStream, outputStream)
    domParser.Process

    'get the document node
    Set docNode = domParser.Document

    Call walkTree(docNode)

    results:
    Call outputStream.Close
    Exit Sub
    errh:
    outputStream.WriteText ("errh: "+Cstr(Err)+":  "+Error+LF)
    Resume results
    End Sub

    Sub walkTree ( node As notesdomnode)
    Dim child As notesdomnode
    Dim elt As notesdomnode
    Dim attrs As notesdomnamednodemap
    Dim a As notesdomattributenode
    Dim piNode As Notesdomprocessinginstructionnode
    LF = Chr(13)+Chr(10)

    If Not node.IsNull Then  
     Select Case node.NodeType
     Case DOMNODETYPE_DOCUMENT_NODE:        ' If it is a Document node
       domParser.Output( "Document node: "+node.Nodename )
       Set child = node.FirstChild   ' Get the first node
       Dim numChildNodes As Integer
       numChildNodes = node.NumberOfChildNodes
       domParser.Output(" has "+Cstr(numChildNodes)+" Child Nodes"+LF)
       
       While numChildNodes > 0
         Set child = child.NextSibling ' Get next node
         numChildNodes = numChildNodes - 1
         Call walkTree(child)
       Wend
       
     Case DOMNODETYPE_DOCUMENTTYPE_NODE:   ' It is a tag
       domParser.Output("Document Type node: "+ node.NodeName+LF)
       
     Case DOMNODETYPE_TEXT_NODE:           ' Plain text node
       domParser.Output("Text node: "+node.NodeValue+LF)
       
     Case DOMNODETYPE_ELEMENT_NODE:        ' Most nodes are Elements
       domParser.Output("Element node: "+node.NodeName )
       Set elt = node
       
       Dim numAttributes As Integer, numChildren As Integer
       numAttributes = elt.attributes.numberofentries
       domParser.Output(" has "+Cstr(numAttributes)+" Attributes"+LF)
       
       Set attrs = elt.Attributes     ' Get attributes
       
       Dim i As Integer
       For i = 1 To numAttributes     ' Loop through them
         Set a = attrs.GetItem(i)
         ' Print attr. name & value
         domParser.Output("Attribute "+a.NodeName+": "+a.NodeValue+LF)
       Next
       
       numChildren =  elt.NumberOfChildNodes
       Set child = elt.FirstChild     ' Get child
       While numChildren > 0
         Call walkTree(child)
         Set child = child.NextSibling   ' Get next child
         numChildren = numChildren - 1
       Wend
       domParser.Output( elt.nodeName+LF)
       
     Case DOMNODETYPE_COMMENT_NODE:                ' Comments
       domParser.Output("Comment node: "+node.NodeValue+LF)
       
     Case DOMNODETYPE_PROCESSINGINSTRUCTION_NODE:  ' Handle PI nodes
       Set piNode = node
       domParser.Output("Processing Instruction node: " )
       domParser.Output(" with Target  "+piNode.Target+_
       " and Data "+piNode.Data+LF)
       
     Case DOMNODETYPE_CDATASECTION_NODE:           ' CDATA sections
       domParser.Output("CDATA Section node: "+node.NodeName)
       domParser.Output(" has value of "+node.NodeValue+LF)
       
     Case DOMNODETYPE_ENTITYREFERENCE_NODE:        ' Handle entities
       domParser.Output("Entity Reference node: "+node.NodeName+LF)
       
     Case Else:
       domParser.Output("Ignoring node: "+Cstr(node.NodeType)+LF)
       
     End Select  'node.NodeType
    End If        'Not node.IsNull
    End Sub

    ------------------End of Example--------------------------

    It's a perfectly fine example, it's just that there are a lot of real-world cases where it will crash (at least, in LotusScript it will).

    Here's a new version of the example that I re-wrote to use a node queue. The startup code parses the input XML file, and gets the top-level ("document") node. It puts that node on a queue (I wrote a LS "Queue" class to manage the queue, it's in the Declarations section). Then, as long as there's something in the queue, we grab the first Node, and see if it has any children. If so, we do a loop and put each child at the end of the queue. You could modify the code to "process" (in this example "process" just means to write out some info about the current node) the node before you put it on the queue, or to do it when you take it off the front of the queue, it doesn't matter that much.

    So, in sum, we're stuffing nodes onto the tail of the queue as we encounter them, and pulling them off the front for processing. We're done when the queue is empty. Not too hard, eh?

    Enjoy!



    New version:
    'TreeWalker:

    Option Public
    Option Declare



    Dim domParser As NotesDOMParser
    Dim LF As String
    Const INCREMENT = 5

    Public Class Queue
          Private q() As NotesDOMNode
          Private first As Integer
          Private last As Integer
         
          Sub new()
                  first = 0
                  last = -1
                  Redim q(INCREMENT - 1)        ' INCREMENT represents the count, not the ubound
          End Sub
         
          Public Sub Add(node As NotesDOMNode)
                  Dim origFirst As Integer
                  Dim i As Integer
                 
                  origFirst = Me.first
                 
                  ' does the queue have space to just add?
                  If Me.last < Ubound(Me.q)  Then
                          Me.last = Me.last + 1
                          Set Me.q(last) = node
                          Goto XEnd
                  End If
                 
                  ' if there's space at the head of the queue, shift everything over
                  ' (this is cheaper than redim-ing the array)
                  If Me.first > 0 Then
                          i = 0
                          While Me.first <= Me.last
                                  Set Me.q(i) = Me.q(Me.first)
                                  Set Me.q(Me.first) = Nothing
                                  Me.first = Me.first + 1
                                  i = i + 1
                          Wend
                          Me.last = Me.last - origfirst
                          Me.first = 0
                         
                          ' now add the new node
                          Me.last = Me.last + 1
                          Set Me.q(last) = node
                          Goto XEnd
                  End If
                 
                  ' if we reach here, the array is full but we still need to add a node,
                  ' so no choice but to expand the array
                  Redim Preserve Me.q(Ubound(Me.q) + INCREMENT )
                  Me.last = Me.last + 1
                  Set Me.q(last) = node
                  Goto XEnd
                 
    XEnd:
          End Sub
         
          Public Function FirstNode() As NotesDOMNode
                  If Me.first <= Me.last Then
                          Set FirstNode = Me.q(Me.first)
                          Set Me.q(Me.first) = Nothing
                  Else
                          Set FirstNode = Nothing
                          Exit Function
                  End If
    & nbsp;             Me.first = Me.first + 1
          End Function
    End Class
    Sub Initialize
          Dim session As NotesSession
          Dim db As NotesDatabase
          Dim inputStream As NotesStream, outputStream As NotesStream
          Dim docNode As NotesDOMDocumentNode
         
          Dim origXML As String, outputFile As String
          origXML = "c:\temp\xmldom.xml"
          outputFile = "c:\temp\DOMtree.txt"
         
          Dim header As String
          header = "Walk Tree agent"
          LF = Chr(13)+Chr(10)
         
          On Error Goto errh
         
          Set session = New NotesSession    
          Set db = session.CurrentDatabase
         
    'create the output file
          Set outputStream =session.CreateStream
          outputStream.Open (outputFile)
          outputStream.Truncate
         
    'write report title
          outputStream.WriteText ("DOM Parser Report - " )
          outputStream.WriteText (header+LF)
         
    'open the XML file
          Set inputStream = session.CreateStream
          inputStream.Open (origXML)
          If inputStream.Bytes = 0 Then
                  outputStream.WriteText (origXML+" is empty"+LF)
                  Goto results
          End If
         
    'create DOM parser and process
          Set domParser=session.CreateDOMParser(inputStream, outputStream)
          domParser.Process
         
    'get the document node
          Set docNode = domParser.Document
         
    '        Call walkTree(docNode)
          Dim q As New Queue()
          q.Add docNode                        ' get the first node in there
          Process q
         
    results:
          Call outputStream.Close
          Exit Sub
    errh:
          outputStream.WriteText ("errh: "+Cstr(Err)+":  "+Error+LF)
          Resume results
    End Sub


    Sub walkTree ( node As notesdomnode)
          Dim child As notesdomnode
          Dim elt As notesdomnode
          Dim attrs As notesdomnamednodemap
          Dim a As notesdomattributenode
          Dim piNode As Notesdomprocessinginstructionnode
          LF = Chr(13)+Chr(10)
         
          If Not node.IsNull Then  
                  Select Case node.NodeType
                  Case DOMNODETYPE_DOCUMENT_NODE:        ' If it is a Document node
                          domParser.Output( "Document node: "+node.Nodename )
                          Set child = node.FirstChild   ' Get the first node
                          Dim numChildNodes As Integer
                          numChildNodes = node.NumberOfChildNodes
                          domParser.Output(" has "+Cstr(numChildNodes)+" Child Nodes"+LF)
                         
                          While numChildNodes > 0
                                  Set child = child.NextSibling ' Get next node
                                  numChildNodes = numChildNodes - 1
                                  Call walkTree(child)
                          Wend
                         
                  Case DOMNODETYPE_DOCUMENTTYPE_NODE:   ' It is a tag
                          domParser.Output("Document Type node: "+ node.NodeName+LF)
                         
                  Case DOMNODETYPE_TEXT_NODE:           ' Plain text node
                          domParser.Output("Text node: "+node.NodeValue+LF)
                         
                  Case DOMNODETYPE_ELEMENT_NODE:        ' Most nodes are Elements
                          domParser.Output("Element node: "+node.NodeName )
                          Set elt = node
                         
                          Dim numAttributes As Integer, numChildren As Integer
                          numAttributes = elt.attributes.numberofentries
                          domParser.Output(" has "+Cstr(numAttributes)+" Attributes"+LF)
                         
                          Set attrs = elt.Attributes     ' Get attributes
                         
                          Dim i As Integer
                          For i = 1 To numAttributes     ' Loop through them
                                  Set a = attrs.GetItem(i)
                                  domParser.Output("Attribute "+a.NodeName+": "+a.NodeValue+LF)
                          Next
                         
                          numChildren =  elt.NumberOfChildNodes
                          Set child = elt.FirstChild     ' Get child
                          While numChildren > 0
                                  Call walkTree(child)
                                  Set child = child.NextSibling   ' Get next child
                                  numChildren = numChildren - 1
                          Wend
                          domParser.Output( elt.nodeName+LF)
                         
                  Case DOMNODETYPE_COMMENT_NODE:                ' Comments
                          domParser.Output("Comment node: "+node.NodeValue+LF)
                         
                  Case DOMNODETYPE_PROCESSINGINSTRUCTION_NODE:  ' Handle PI nodes
                          Set piNode = node
                          domParser.Output("Processing Instruction node: " )
                          domParser.Output(" with Target  "+piNode.Target+_
                          " and Data "+piNode.Data+LF)
                         
                  Case DOMNODETYPE_CDATASECTION_NODE:           ' CDATA sections
                          domParser.Output("CDATA Section node: "+node.NodeName)
                          domParser.Output(" has value of "+node.NodeValue+LF)
                         
                  Case DOMNODETYPE_ENTITYREFERENCE_NODE:        ' Handle entities
                          domParser.Output("Entity Reference node: "+node.NodeName+LF)
                         
                  Case Else:
                          domParser.Output("Ignoring node: "+Cstr(node.NodeType)+LF)
                         
                  End Select  'node.NodeType
          End If        'Not node.IsNull
    End Sub
    Sub Process(q As Queue)
          Dim node As NotesDOMNode
          Dim count As Integer
          Dim child As NotesDOMNode
          Dim i As Integer
         
          Set node = q.FirstNode
          While Not (node Is Nothing)
                  count = node.NumberOfChildNodes
                  If count > 0 Then
                          For i = 0 To count-1
                                  If i = 0 Then
                                          Set child = node.FirstChild
                                  Else
                                          Set child = child.NextSibling
                                  End If