Recursion or Iteration? YOU decide...
Bob Balaban April 4 2008 07:23:05 AM
Greetings, Geeks!
Today our topic is....recursion. No, fellow geeks, this does NOT mean that we are going to practice cursing, and cursing again....
The thing is, recursion (loosely defined as the technique whereby a subroutine processes a piece of a data structure, and then invokes itself ("recursively") to process another piece. The classic example that we all learn in Computer Science 101 class is... "walking a tree". Note that this has very little, if anything to do with the technique we learn in real life, called "walking the dog" (and YES, I KNOW that that is also a yo-yo maneuver....sheesh).
There's even a pretty good example of a recursive tree-walker routine, written in LotusScript, in the Designer Help file (search or index to "Domparser class", go to that article, and follow the "Example" link). In this case, the code is reading in an XML file, parsing it into a tree using the standard XML DOM parser, and then travelling the tree to visit every node. Because the algorithm is recursive, it does a "depth-first" tree walk, meaning that it visits the first child of the top-level node, then the first child of that node, then the first child of that node, etc., until it reaches a node that has no children. Along the way, the "welktree" subroutine calls itself to "process" the next child node. In this way, it stores the "context" at each layer of the tree on the LS invocation stack. When it hits a node with no children, it "processes" that node, then backs up a layer, and continues to the next child of tht layer's parent node.
Yes, it sounds confusing, but if you step through the code in the debugger, you'll get it pretty quickly.
One of our developoers actually copied this code and used it in developing an enhancement to one of our products. It was at that point that he discovered a serious limitation of recursion as an algorithm, and thereby learned an important lesson. To wit: Just because they teach you a "standard" algorithm for data structure traversal in CS101, that doesn't automatically mean that you should actually USE it in code that you sell to people for money, and therefore have to support for real.
Why do I say this? Because, dear Geeks, it turned out that the whole thing blew up when the XML DOM tree got big; especially when it got "deep" (meaning, a lot of nested layers). Why? Because, unlike in Windows (and many other operating systems), the size of the calling stack in LotusScript is fixed. Yes, that is correct. The calling stack for a LotusScript program is pre-allocated at program start time, and it never changes. In the Windows C (and .NET) runtime, by contrast, you can specify a calling stack's initial size (there is one stack per thred in most OS processes), but if you run out, the OS (or more properly, the language runtime servics) will grow it automtically. But in LS, when you (for example) recurse too deeply at runtime, you get an "out of memory" exception. I'm not sure how big the LS pre-allocated stack is, but if I had to guess, I would guess 64KB.
So, copying that particular example didn't work out so well for us. My suggestion for fixing this problem was to process the nodes iteratively instead of recursively -- do them in a loop. If you traverse the tree "breadth first" instead of "depth first", you can just build a queue of nodes as you "walk" the tree, and at the same time, you can pull nodes off the front of the queue for "processing". We did indeed verify that for the same data inputs, this new technique did not run out of memory.
Here's the original example, copied from Designer Help:
Original version:
This agent parses the origXML file, walks the DOM node tree and creates a report in outputFile about the nodes encountered.
(Declarations)
Dim domParser As NotesDOMParser
Dim LF As String
Sub Initialize
Dim session As NotesSession
Dim db As NotesDatabase
Dim inputStream As NotesStream, outputStream As NotesStream
Dim docNode As NotesDOMDocumentNode
Dim origXML As String, outputFile As String
origXML = "c:\dxl\xmldom.xml"
outputFile = "c:\dxl\DOMtree.txt"
Dim header As String
header = "Walk Tree agent"
LF = Chr(13)+Chr(10)
On Error Goto errh
Set session = New NotesSession
Set db = session.CurrentDatabase
'create the output file
Set outputStream =session.CreateStream
outputStream.Open (outputFile)
outputStream.Truncate
'write report title
outputStream.WriteText ("DOM Parser Report - " )
outputStream.WriteText (header+LF)
'open the XML file
Set inputStream = session.CreateStream
inputStream.Open (origXML)
If inputStream.Bytes = 0 Then
outputStream.WriteText (origXML+" is empty"+LF)
Goto results
End If
'create DOM parser and process
Set domParser=session.CreateDOMParser(inputStream, outputStream)
domParser.Process
'get the document node
Set docNode = domParser.Document
Call walkTree(docNode)
results:
Call outputStream.Close
Exit Sub
errh:
outputStream.WriteText ("errh: "+Cstr(Err)+": "+Error+LF)
Resume results
End Sub
Sub walkTree ( node As notesdomnode)
Dim child As notesdomnode
Dim elt As notesdomnode
Dim attrs As notesdomnamednodemap
Dim a As notesdomattributenode
Dim piNode As Notesdomprocessinginstructionnode
LF = Chr(13)+Chr(10)
If Not node.IsNull Then
Select Case node.NodeType
Case DOMNODETYPE_DOCUMENT_NODE: ' If it is a Document node
domParser.Output( "Document node: "+node.Nodename )
Set child = node.FirstChild ' Get the first node
Dim numChildNodes As Integer
numChildNodes = node.NumberOfChildNodes
domParser.Output(" has "+Cstr(numChildNodes)+" Child Nodes"+LF)
While numChildNodes > 0
Set child = child.NextSibling ' Get next node
numChildNodes = numChildNodes - 1
Call walkTree(child)
Wend
Case DOMNODETYPE_DOCUMENTTYPE_NODE: ' It is a tag
domParser.Output("Document Type node: "+ node.NodeName+LF)
Case DOMNODETYPE_TEXT_NODE: ' Plain text node
domParser.Output("Text node: "+node.NodeValue+LF)
Case DOMNODETYPE_ELEMENT_NODE: ' Most nodes are Elements
domParser.Output("Element node: "+node.NodeName )
Set elt = node
Dim numAttributes As Integer, numChildren As Integer
numAttributes = elt.attributes.numberofentries
domParser.Output(" has "+Cstr(numAttributes)+" Attributes"+LF)
Set attrs = elt.Attributes ' Get attributes
Dim i As Integer
For i = 1 To numAttributes ' Loop through them
Set a = attrs.GetItem(i)
' Print attr. name & value
domParser.Output("Attribute "+a.NodeName+": "+a.NodeValue+LF)
Next
numChildren = elt.NumberOfChildNodes
Set child = elt.FirstChild ' Get child
While numChildren > 0
Call walkTree(child)
Set child = child.NextSibling ' Get next child
numChildren = numChildren - 1
Wend
domParser.Output( elt.nodeName+LF)
Case DOMNODETYPE_COMMENT_NODE: ' Comments
domParser.Output("Comment node: "+node.NodeValue+LF)
Case DOMNODETYPE_PROCESSINGINSTRUCTION_NODE: ' Handle PI nodes
Set piNode = node
domParser.Output("Processing Instruction node: " )
domParser.Output(" with Target "+piNode.Target+_
" and Data "+piNode.Data+LF)
Case DOMNODETYPE_CDATASECTION_NODE: ' CDATA sections
domParser.Output("CDATA Section node: "+node.NodeName)
domParser.Output(" has value of "+node.NodeValue+LF)
Case DOMNODETYPE_ENTITYREFERENCE_NODE: ' Handle entities
domParser.Output("Entity Reference node: "+node.NodeName+LF)
Case Else:
domParser.Output("Ignoring node: "+Cstr(node.NodeType)+LF)
End Select 'node.NodeType
End If 'Not node.IsNull
End Sub
------------------End of Example--------------------------
It's a perfectly fine example, it's just that there are a lot of real-world cases where it will crash (at least, in LotusScript it will).
Here's a new version of the example that I re-wrote to use a node queue. The startup code parses the input XML file, and gets the top-level ("document") node. It puts that node on a queue (I wrote a LS "Queue" class to manage the queue, it's in the Declarations section). Then, as long as there's something in the queue, we grab the first Node, and see if it has any children. If so, we do a loop and put each child at the end of the queue. You could modify the code to "process" (in this example "process" just means to write out some info about the current node) the node before you put it on the queue, or to do it when you take it off the front of the queue, it doesn't matter that much.
So, in sum, we're stuffing nodes onto the tail of the queue as we encounter them, and pulling them off the front for processing. We're done when the queue is empty. Not too hard, eh?
Enjoy!
New version:
'TreeWalker:
Option Public
Option Declare
Dim domParser As NotesDOMParser
Dim LF As String
Const INCREMENT = 5
Public Class Queue
Private q() As NotesDOMNode
Private first As Integer
Private last As Integer
Sub new()
first = 0
last = -1
Redim q(INCREMENT - 1) ' INCREMENT represents the count, not the ubound
End Sub
Public Sub Add(node As NotesDOMNode)
Dim origFirst As Integer
Dim i As Integer
origFirst = Me.first
' does the queue have space to just add?
If Me.last < Ubound(Me.q) Then
Me.last = Me.last + 1
Set Me.q(last) = node
Goto XEnd
End If
' if there's space at the head of the queue, shift everything over
' (this is cheaper than redim-ing the array)
If Me.first > 0 Then
i = 0
While Me.first <= Me.last
Set Me.q(i) = Me.q(Me.first)
Set Me.q(Me.first) = Nothing
Me.first = Me.first + 1
i = i + 1
Wend
Me.last = Me.last - origfirst
Me.first = 0
' now add the new node
Me.last = Me.last + 1
Set Me.q(last) = node
Goto XEnd
End If
' if we reach here, the array is full but we still need to add a node,
' so no choice but to expand the array
Redim Preserve Me.q(Ubound(Me.q) + INCREMENT )
Me.last = Me.last + 1
Set Me.q(last) = node
Goto XEnd
XEnd:
End Sub
Public Function FirstNode() As NotesDOMNode
If Me.first <= Me.last Then
Set FirstNode = Me.q(Me.first)
Set Me.q(Me.first) = Nothing
Else
Set FirstNode = Nothing
Exit Function
End If
Me.first = Me.first + 1
End Function
End Class
Sub Initialize
Dim session As NotesSession
Dim db As NotesDatabase
Dim inputStream As NotesStream, outputStream As NotesStream
Dim docNode As NotesDOMDocumentNode
Dim origXML As String, outputFile As String
origXML = "c:\temp\xmldom.xml"
outputFile = "c:\temp\DOMtree.txt"
Dim header As String
header = "Walk Tree agent"
LF = Chr(13)+Chr(10)
On Error Goto errh
Set session = New NotesSession
Set db = session.CurrentDatabase
'create the output file
Set outputStream =session.CreateStream
outputStream.Open (outputFile)
outputStream.Truncate
'write report title
outputStream.WriteText ("DOM Parser Report - " )
outputStream.WriteText (header+LF)
'open the XML file
Set inputStream = session.CreateStream
inputStream.Open (origXML)
If inputStream.Bytes = 0 Then
outputStream.WriteText (origXML+" is empty"+LF)
Goto results
End If
'create DOM parser and process
Set domParser=session.CreateDOMParser(inputStream, outputStream)
domParser.Process
'get the document node
Set docNode = domParser.Document
' Call walkTree(docNode)
Dim q As New Queue()
q.Add docNode ' get the first node in there
Process q
results:
Call outputStream.Close
Exit Sub
errh:
outputStream.WriteText ("errh: "+Cstr(Err)+": "+Error+LF)
Resume results
End Sub
Sub walkTree ( node As notesdomnode)
Dim child As notesdomnode
Dim elt As notesdomnode
Dim attrs As notesdomnamednodemap
Dim a As notesdomattributenode
Dim piNode As Notesdomprocessinginstructionnode
LF = Chr(13)+Chr(10)
If Not node.IsNull Then
Select Case node.NodeType
Case DOMNODETYPE_DOCUMENT_NODE: ' If it is a Document node
domParser.Output( "Document node: "+node.Nodename )
Set child = node.FirstChild ' Get the first node
Dim numChildNodes As Integer
numChildNodes = node.NumberOfChildNodes
domParser.Output(" has "+Cstr(numChildNodes)+" Child Nodes"+LF)
While numChildNodes > 0
Set child = child.NextSibling ' Get next node
numChildNodes = numChildNodes - 1
Call walkTree(child)
Wend
Case DOMNODETYPE_DOCUMENTTYPE_NODE: ' It is a tag
domParser.Output("Document Type node: "+ node.NodeName+LF)
Case DOMNODETYPE_TEXT_NODE: ' Plain text node
domParser.Output("Text node: "+node.NodeValue+LF)
Case DOMNODETYPE_ELEMENT_NODE: ' Most nodes are Elements
domParser.Output("Element node: "+node.NodeName )
Set elt = node
Dim numAttributes As Integer, numChildren As Integer
numAttributes = elt.attributes.numberofentries
domParser.Output(" has "+Cstr(numAttributes)+" Attributes"+LF)
Set attrs = elt.Attributes ' Get attributes
Dim i As Integer
For i = 1 To numAttributes ' Loop through them
Set a = attrs.GetItem(i)
domParser.Output("Attribute "+a.NodeName+": "+a.NodeValue+LF)
Next
numChildren = elt.NumberOfChildNodes
Set child = elt.FirstChild ' Get child
While numChildren > 0
Call walkTree(child)
Set child = child.NextSibling ' Get next child
numChildren = numChildren - 1
Wend
domParser.Output( elt.nodeName+LF)
Case DOMNODETYPE_COMMENT_NODE: ' Comments
domParser.Output("Comment node: "+node.NodeValue+LF)
Case DOMNODETYPE_PROCESSINGINSTRUCTION_NODE: ' Handle PI nodes
Set piNode = node
domParser.Output("Processing Instruction node: " )
domParser.Output(" with Target "+piNode.Target+_
" and Data "+piNode.Data+LF)
Case DOMNODETYPE_CDATASECTION_NODE: ' CDATA sections
domParser.Output("CDATA Section node: "+node.NodeName)
domParser.Output(" has value of "+node.NodeValue+LF)
Case DOMNODETYPE_ENTITYREFERENCE_NODE: ' Handle entities
domParser.Output("Entity Reference node: "+node.NodeName+LF)
Case Else:
domParser.Output("Ignoring node: "+Cstr(node.NodeType)+LF)
End Select 'node.NodeType
End If 'Not node.IsNull
End Sub
Sub Process(q As Queue)
Dim node As NotesDOMNode
Dim count As Integer
Dim child As NotesDOMNode
Dim i As Integer
Set node = q.FirstNode
While Not (node Is Nothing)
count = node.NumberOfChildNodes
If count > 0 Then
For i = 0 To count-1
If i = 0 Then
Set child = node.FirstChild
Else
Set child = child.NextSibling
End If
(Need expert application development architecture/coding help? Contact me at: bbalaban, gmail.com)
Follow me on Twitter @LooseleafLLC
This article ©Copyright 2009 by Looseleaf Software LLC, all rights reserved. You may link to this page, but may not copy without prior approval.
- Comments [0]