|
Pankaj Kumar's WeblogRandom thoughts, musings, experiences, ideas, and opinions |
June 05, 2004ArrayList versus LinkedList: Aren't they same?Minor editing and update at 3:30pm, June 05: Not surprisingly, it is natural for me to think of arrays as fixed size containers, where elements can be accessed at random location through O(1) operation (ie; in constant time) and insertion/deletion in the middle are O(N) operations (ie; could take time proportional to the size of the array) and hence, are better avoided. In contrast, a linked list can grow in size, access of its head or tail and insertion/deletion in the middle are all O(1) operations (assuming that you have pointer to an adjacent element). It is possible get around the fixed size limitation of arrays by writing a wrapper which will allocate a new array, copy the elements of the old array into the new one and then discard the old one (BTW, this is what ArrayList does). Still, the basic arrays remain a datastructure for collections of fixed size. In contrast, a linked list consists of nodes with 'pointers' to the next and previous node in the list. So, adding or removing a node is simply a matter of reassigning the pointers. Of course, this implies linear time for traversing upto an indexed node, starting from beginning or end. This simple model is very handy in deciding when to use an array and when to use a linked list. In Java, the ArrayList and LinkedList classes provide a uniform interface to both these datastructures and hence, destroy this simple conceptual model, so necessary to make judicious implementation decisions, in impressionable young minds of many Java programmers. Let me further elaborate this with my recent own experience. Today, while going over a graph traversal code, I was somewhat alarmed by the generous use of ArrayLists. This code was written by someone who perhaps had learnt programming with Java. As hinted earlier, both ArrayList and LinkedList implement List interface and support similar operations. An ArrayList can grow dynamically and allows insertion/deletion of elements. A LinkedList also allows access of elements through an index, exactly the same way as an ArrayList. This is all fine. However, the problem is that the apparent similarity in the API hides the widely different memory and time costs of these datastructures for different kinds of operations, luring the unwary to use them in dangerous ways:
While researching on this topic, I did find a couple of good articles on the Web:
I am not advocating either ArrayList or LinkedList, though it can be justifiably argued that the use of ArrayList is better suited in many more programming scenarios, and I have no contention with that. The point I am making is that the sameness of the API makes it easy for programmers to assume that these can be used interchangeably. Nothing can be farther from truth. They are distinct datastructures, each optimized for certain kinds of operations and domain of applicability. And a good programmer should be aware of the distinction. The API exposed by the above mentioned Java classes blur this distinction. In my opinion, this is one of those areas where implementation hiding behind a common, easy-to-use interface (think of List interface that both ArrayList and LinkedList implement) may not be in the best interest of the primary user of these classes. Posted by pankaj at June 5, 2004 12:06 AMComments
Wouldn't it have been nice if the developers of the graph traversal code had referenced the List interface everywhere so that the change to begin using LinkedList (as it is more efficient in that scenario) wouldn't have been too difficult. Posted by: R.J. on June 5, 2004 10:21 AMDon't forget new/gc time when comparing the lists. The LinkedList has significantly more. Although the JVM is constantly getting better at this, it's still something to avoid, particularly on multi-CPU servers. Personally I've found that I can almost always change the usage of the List to favor the ArrayList model, for example: 1. Always use the initial-size constructor of ArrayList if you have any sort of guesstimate. 2. Don't always add to the front of a List like in the techtip example. Add to the end, read from the end. 3. Don't repeatedly add or remove from the middle of a List. Build a new List instead. Clearly you don't want to obfuscate what the list is modeling for performance reasons unless absolutely necessary, but most often list usage is more a result of habit than strict modeling anyway. In those cases I usually lean towards the consistent and usually leaner/faster ArrayList rather than building up gc debt with the LinkedList. Posted by: Scott Vachalek on June 7, 2004 08:05 AMIMO, a common interface to ArrayList and LinkedList is the better design. And here's why: 1. As RJ's post suggests, hiding implementation details makes it possible to change the implementation at anytime with minimal fuss. 2. You're right, the "for loop" is bad: for (int i = 0; i < list.size(); i++) ... so don't do it that way. Instead do it "the right way" (ListIterator) and leave it, again, to the implementation to decide how to iterate. 3. Yes, inefficient code will be written by those who don't know better, but how would that change by having different interfaces? If the interfaces *were* different, those same programmers who e.g. presently pick ArrayList (or LinkedList) because it's less scary, or easier to spell, or has fewer/more letters, etc... would, when faced with two interfaces, simply pick the interface they are most comfortable with. Hiding the implementation details is a good practice, but it doesn't relieve us of the responsibility to choose a good implementation now does it? Posted by: F.B. on June 8, 2004 10:04 PMWow, I have to say that I disagree with nearly everything you said. I think you're looking at the issue the entirely wrong way. Just some pointers to get you going in the right direction: * Iterators should be used for ALL iteration, not just with LinkedList. * All application code should only work through the List interface. * The two implementations don't just happen to have similar interfaces, they have the identical one! Intentionally! * There aren't just two list implementations - many people have written others. * Even a defensive comment like, "I'm not advocating LinkedList over ArrayList" is going down the wrong path. * Your neologisms are tedious, and lacking in validity. Sorry! Posted by: Robb Shecter on June 8, 2004 10:24 PMApache Jakarta Commons Collections v3.1 will contain a new List implementation TreeList. This offers another choice, one that is very good for inserting/removing items from the middle of the list while still keeping good indexed access. http://jakarta.apache.org/commons/collections/apidocs/index.html This javadoc also includes a summary performance comparision of the three list types. Posted by: Stephen Colebourne on June 9, 2004 05:04 AMIn response to previous comments: It's absolutely pointless to use iterators when you KNOW that you're dealing with Array-backed lists. You're creating an extra object for no good reason, probably performing extra method calls (esp. if you're dumb enough to call hasNext() on the list whose size doesn't change to check for the exit condition, and most importantly, making your code harder to understand than it needs to be. The only reason to use iterators is if your interface specifies List and you don't know what type of list the client might pass in; in that case, the original post is absolutely correct. As an aside, you're wasting a lot of cycles if you call size() to check for the exit condition on a list of constant (within the scope of a loop) size, instead of storing it in a variable. Posted by: Eugene Kaganovich on June 9, 2004 01:20 PMthe key is the RandomAccess interface. ArrayList implements it, LinkedList doesn't. RandomAccess is nothing more than a marker interface, but this is a way of telling you that it'll be quicker to access ArrayList by index. The javadoc for this explains it well also: http://java.sun.com/j2se/1.4.2/docs/api/java/util/RandomAccess.html This interface can be helpful both for designing and using data structures. For design, if your structure deals with random access data, you should use something that implements RandomAccess. When using data structures, if you're dealing with very large collections of different types and speed is important, you may want to check to see if the list implements RandomAccess before going through it. Posted by: Liam Morley on June 9, 2004 10:44 PMPost a comment
|
|
|
Disclaimer: Views expressed here are my own and do not represent those of my employer.
© 2001-2005 Pankaj kumar. All Rights Reserved. |
|