Back to July Calendar 

Previous Entry Next Entry→

September 19, 2005

How about exercise #17.8?  In my cogitational peregrinations in attending to the solution of this problem,  I cam across this nifty site at Rutgers: http://www.pegasus.rutgers.edu/~elflord/cpp/list_howto/ wherein you find the commentary:

Iterating over lists

  • Some textbooks decline providing a flexible implementation, offering print as the only means to get at data in the list. This is obviously unacceptable for serious applications.
  • There is an obvious generalisation of this approach that some books use -- instead of offering a print() operation, offer a way to walk a list. The walk function takes a pointer to a function as an argument, and calls that function on succesive nodes. This is an improvement, but it still suffers from inflexibility -- what if we only want to operate on a subset of the list ? What if we want to perform an operation such as add the numbers in a list ?

It seems that this describes Deitel fairly aptly, huh?  I think the Deitel version would suffice for the purpose of writing a parser that still hangs as the egger in the wings, but I am intrigued by this alternate approach, so I'll take a moment to explore.

In particular, I was searching for the "swap" operation that I would need to do a bubble sort on a list, to meet the requisites of exercise #17.8, and the Rutgers model claims to have one, so let's see.  The Rutgers site suggest various ways to address the problem:

Accessing list elements

Clearly, the best way to flexibly address the former concern would be to provide a way to address the latter. There are a few ways to access elements in a list:

  • Bad: A list class could have a notion of a current node.
  • Naive (and bad) We could use integer offsets, and treat the list like an array.
  • Pointer Based: We could use node pointers to represent positions in the list.
  • Iterator Based: We could abstract node pointers by using objects known as iterators.

We are advised to shun the instinct that our initial indoctrinization to arrays instills and to not treat any sequence container as an array.  A linked list is not an array, and is not a random access data structure and such misconceptions inevitably lead to disaster. Consider the following code:

	for (int i = 0; i < mylist.size(); ++i )
		cout << mylist[i] << endl;

Accessing the nth node takes n operations where n is the size of the list, so the loop takes 1 + 2 + 3 + ... + n = n(n+1)/2 operations. This is disastrous for large values of n, considering that we can do the same in linear time if we use the pointer based or iterator based approach.

Another fundamental problem with integer offsets is that they are invalidated by insertions and deletions. For example, consider this code:

  int pos = mylist.find ( "Thomas" );
  mylist.insert ( "William" );
  cout << mylist[pos]; // pos no longer refers to "Thomas"
This is a weakness that we are stuck with when we work with arrays, but there is no reason why we should suffer the same for lists, because inserting in a list does not change the memory address of the nodes--that is, unlike an array, the information is not distributed on contiguous memory.

So implementing a linked list that uses array style integer offsets should be shunned.

The Pointer Approach to Element Access

The second approach is the one that is usually used in C implementations. In a C++ implementation, there are two different sub-categories of this approach.

  • A linked list is the same as a list node
  • A linked list is not a list node. It's a class that contains a pointer to a head node as the unique data member.
These approaches may seem identical on the face of it, but there are some key differences. For example, inserting an element before the head node is problematic in the former case. These are examples of ways one could write a function to insert at the front, using the linked list-is-a-node approach:
void insert_front ( Node*&  head, const element_type & x)
{
   Node* tmp = new Node(head, x);
   head = tmp;
}
// example client code
insert_front(mylist, x);
Node* insert_front ( Node*  head, const element_type & x)
{
   Node* tmp = new Node(head, x);
   return tmp;
}

// example client code
head = insert_front(mylist, x);
The linked-list-is-a-node approach is problematic in that calling them on a Node that is not at the head of the list will result in disaster (it will "unlink" the list, making access to elements beyond the operand inaccessible, and it will leave the pointer to the operand pointing at garbage.)

The linked-list-is-not-a-node approach has the advantage that encapsulation makes it possible to ensure that client code does not corrupt the list. An example of code using this approach:

void insert_front ( const element_type & x)
{
   Node* tmp = new Node(head, x);
   head = tmp;
}

// example client code
mylist.insert_front(x);
By the way, here's Deitel's version of this from a few days ago:
template< class NODETYPE >
void List< NODETYPE >::insertAtFront( const NODETYPE &value )
{
   ListNode< NODETYPE > *newPtr = getNewNode( value );
   if ( isEmpty() )  // List is empty
      firstPtr = lastPtr = newPtr;
   else // List is not empty
      newPtr->nextPtr = firstPtr;
      firstPtr = newPtr;
   } // end else
} // end function insertAtFront

where getNewNode simply allocates memory for a new node:

// return pointer to newly allocated node
template< class NODETYPE >
ListNode< NODETYPE > *List< NODETYPE >::getNewNode(
   const NODETYPE &value )
{
   return new ListNode< NODETYPE >( value );
} // end function getNewNode

Notice that client code does not have the opportunity to corrupt the list, because the client code does not have to know which Node is at the front of the list.

The Iterator Based Approach To Element Access

This approach uses a special iterator class that encapsulates a node pointer. One of the key advantages of this is that we can use this to provide consistent semantics to different container types. We defer further discussion of iterators.

Basic Operations

Rutger's Node implementation uses a struct:

struct Node
{
    Node(const T& x, Node* y):data(x),next(y) {}
    T data;
    Node* next;
};

This is very similar to Deitel, except that, instead of a class, we're using a struct (a simpler data aggregate) and there is node pointer argument in the constructor. This makes it simpler to write code that creates list nodes. Even if we were implementing the list in C, it would be desirable to write a function that creates a node.

Now we need to design the interface for the list, which we may want to include the follow, for a start: 

  • insert at front
  • check for empty
  • remove from front
  • get element at front
  • insert anywhere in list
  • remove item anywhere in list
  • assign one list to another list

In addition, we also need to be able to create lists. This gives rise to the following creational operations:

  • Create an empty list
  • Create a copy of an existing list

The Deitel stuff developed so far has included removal and insertion at the back of the list, and Saturday's entry showed how we can concatenate two lists. To solve exercise 8, I want a function to swap two elements of a list.  So press on.

Rutgers (actually, Donovan Rebbechi) talks about using a sentinel node at the beginning of the list, despite extra overhead, because it makes it possible to avoid "special-casing".

Neither the "sentinel" nor the "tail pointer" design elements are employed by Rutgers, though they suggest that "these are actually quite useful in certain situations, and could be introduced to our class via a wrapper class."

So here's their first attempt at writing an interface:

template <class T>
class List {
  Node* head;
  public:
    struct Node
    {
      Node(const T& x, Node* y):data(x),next(y) {}
      T data;
      Node* next;
    };

    void push_front(const T& x)
    {
      Node* tmp = new Node(x, head);
      head = tmp;
    }
    List();
    List(const List&);
    List& operator=(const List&);
    bool empty();
    void pop_front();
    T& front();
};
Note that that "Node* head;"  precedes the public statement. This is then a private data member by default.  So the pointer (link) is private, but the Node structure itself is a public structure and its constructor is self referential.  We can push/pop on/off the front, we can check if the List is empty, it's empty and we can copy one list to another with the overloaded = operator.

Drawbacks

One problem is that, to "insert anywhere in the list," we need to represent a place in the list. Another problem with singly linked lists is that given a node in the list, we can not erase that node in constant time (because the time to find it will depend on how deeply buried it is,) and we can't insert a node before it in constant time.  This is a problem with singly linked lists that detracts from their usability, and the simplest solution is to use a doubly linked list instead if you want a more flexible, general purpose class (one could write iterators that contain ordered pairs of succesive node pointers, but given this overhead, one might as well use a doubly linked list)

The Rutgers solutions to this problem are the erase_after and insert_after methods, that operate on the successor to the function argument. The ugliness of this is noted.

A Simple Implementation

We first look into implementing a list, using the pointer based approach, with the linked-list-is-not-a-node model. We previously raised the importance of considering how the class is actually going to be used prior to implementing it. In this example, we could iterate over the list by using node pointers. For example, we could write a loop, and an insertion like this:

for ( Node* i = L.begin(); i!=L.end(); i=i-> next )
cout << i->data << endl;
Node* x = L.find (3);
L.insert_after(x,4);

The notation is not elegant, but it works. Rutgers has some "implementation issues" to ponder at this stage:

Implementation Issues

There are a few issues that surface during implementation. For example, it proves useful to implement a swap() function that swaps the data of two lists. This function makes the assignment operator easier to implement (it can reuse the copy constructor) and it also offers a cheap destructive copy operation.

Another implementation issue is that copying a singly linked list requires a function to reverse the list, because the obvious strategy of iterating over the nodes and inserting them gives us a list in the wrong order. So a method that reverses the order of the list by pointer manipulation proves desirable.

Another subtle issue is const correctness. To declare a method that gives out pointers to data in the list const is dangerous, because it would allow functions that accepted a const reference to a list to use these pointers to modify the list. So we provide const versions of the methods begin() and front() that return a const pointer/reference.

Here's the first real fleshing out of the template for the Rutgers List class:

template <class T> class List {
private:
Node* head;
public
:
struct Node  {
   Node(const T& data, Node* next=0):data(data),next(next) {}
   Node* next;
   T data;
};
// List constructor for empty list containing only NULL pointer
//
This is important, because of the class invariant that the element one
// past the end of the list (or, in the case of an empty list, the pointer
// to the "first Node") is signified by a NULL pointer.
List() : head(0) {}

//
The copy constructor traverses its argument, but 
// the data is inserted at the front so the elements
// end up at the end of the list. Fix this by reversing the list
// with a reverse() member function.
List(const List& L) : head(0) {
   // copy in reverse order
   for ( const Node* i = L.begin(); i!= L.end(); i=i->next )
      push_front(i->data);
   reverse();
}

/
/ Implementing this function is delicate.
// The plan is to traverse the list, modifying the next pointer of each
// node so it points to the previous node. This requires maintaining the // following information in our loop:
//     *
 The current node, i that we are changing.
//     *  A pointer to the previous node. A variable is needed to
//        represent this, because we need to re-assign the next pointer
//        in the current node to this address, and it is not possible to
//        backtrack in a singly linked list.
//     *  Once we change the value of the next pointer in the current node,
//        we have no way of accessing the next node. So we need to save
//        the address of the next node before modifying the current node.
// Note the code:
void reverse() {
   // The initial previous node should be the NULL pointer
   // since it will end up being the last.  Why not directly?
   Node* p = 0;
   // The initial current node is the head, but you have to access that
   // through member function begin().

   Node* i = begin();
   // n is for "next," which is the Node structure's self referential
   // pointer to the next Node in the List. 
   // The first order of business will be to save this pointer as n.
   // The "p = i" is a way of incrementing, other than that, this is
   // just a classic swap routine.

   Node* n;
   while (i)  {
      n = i->next;
      i->next = p; 
// make the next pointer point to previous node.
      p = i; 
// update previous node to point to "current"
      i = n; 
// recall the address of the node after i
   }
   // cap it off by making the last first.
   head = p;
}

// The swap function is a very useful idiom in C++.
// It offers a means to transfer ownership:
// List tmp;
// tmpList.swap(x); // tmpList "steals" the data from x
// And in circumstances where it's desirable to swap data,
// it makes it possible to do this in constant time
// (that is, the time does not depend on the length of the list,
// since we only swap head pointers) We will later see that this
// also provides us with a means to implement the assignment operator
// in terms of the copy constructor.
void swap(List& x)  {
   // save the head pointer of the calling List to Node pointer tmp
   Node* tmp = head;
   head = x.head; // assign the head pointer of the called List
                          // to the head pointer of the calling List

   x.head = tmp; // finish the swap
}

//
use copy constructor and swap to overload assignment operator
List& operator=(const List& x)  {
   List tmp(x); 
   swap(tmp);
   return *this;
}

~List() { clear(); }
void clear() { while (!empty()) pop_front(); }

bool empty() { return ! head; }

//
To insert a node, create the new node and update the list
// so the head node points to the address of the new node
void push_front(const T& x) {
   Node* tmp = new Node(x,head); head = tmp;
}

// pop_front
removes the first node. Notice that the value of the deleted
// node is not returned. This promotes exception safety.
// A detailed discussion of this topic is outside the scope of this venue.
// For further reading, try Exceptional C++ by Herb Sutter.
void pop_front() {
   if (head) { Node* tmp = head; head=head->next; delete tmp; }
}

//
inserting or erasing after the current node because we can't
// backtrack in a singly linked list. Note that we check the special
// case where erase_after is called on the last node of the list.
// In this case, erase_after has no effect.
void insert_after(Node* x, const T& data)  {
   Node* tmp = new Node(data, x->next);
   x->next = tmp;
}

void erase_after(Node* x)   {
   Node* tmp = x->next;
   if (tmp)   {
   // copy point after next to next pointer
      x->next = x->next->next;
      delete tmp; // clean up
   }
}

//
Two versions of front() are provided here.
// The compiler automatically decides which version of the function
// gets called. One of these is automatically called when we have
// a const list (for example, if a list is passed to a function by
// const reference), the other is used on non-const lists.
// front() can be used in conjunction with pop_front.

	// Foo x = mylist.front();
	// mylist.pop_front();

	// The exception safety benefit of this approach is that we know
	// that we have successfully copied the element at the front of the
	// list before it is removed
	T& front() { return head->data; }
	const T& front() const { return head->data; }

	// The end() function is not strictly necessary, it is included for
	// the benefit of programmers who are used to STL conventions
	// (checking i != mylist.end() as opposed to i!= NULL)
	// Like the front() member function, we need a const and non-const version.
	// We discuss the reasoning behind this further on. 
	Node* begin() { return head; }
	Node* end() { return 0; }

	const Node* begin() const { return head; }
	const Node* end() const { return 0; }


};

Wow, this is wearing me out.   Tomorrow, an implementation of the above template class.