Java

Set implementations

Set
A Set is a Collection that cannot contain duplicate elements. It models the mathematical set abstraction. The Set interface contains only methods inherited from Collection and adds the restriction that duplicate elements are prohibited. Set also adds a stronger contract on the behavior of the equals and hashCode operations, allowing Set instances to be compared meaningfully even if their implementation types differ. Two Set instances are equal if they contain the same elements.
The Set implementations are grouped into general-purpose and special-purpose implementations.
1. General-Purpose Set Implementations: - There are three general-purpose Set implementations — HashSet, TreeSet, and LinkedHashSet. Which of these three to use is generally straightforward. HashSet is much faster than TreeSet (constant-time versus log-time for most operations) but offers no ordering guarantees. If you need to use the operations in the SortedSet interface, or if value-ordered iteration is required, use TreeSet; otherwise, use HashSet.
LinkedHashSet is in some sense intermediate between HashSet and TreeSet. Implemented as a hash table with a linked list running through it, it provides insertion-ordered iteration (least recently inserted to most recently) and runs nearly as fast as HashSet. The LinkedHashSet implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet without incurring the increased cost associated with TreeSet.
a). HashSet: - This class implements the Set interface, backed by a hash table (actually a HashMap instance). It makes no guarantees as to the iteration order of the set; in particular, it does not guarantee that the order will remain constant over time. This class permits the null element.
This class offers constant time performance for the basic operations (add, remove, contains and size), assuming the hash function disperses the elements properly among the buckets. Iterating over this set requires time proportional to the sum of the HashSet instance's size (the number of elements) plus the "capacity" of the backing HashMap instance (the number of buckets). Thus, it's very important not to set the initial capacity too high (or the load factor too low) if iteration performance is important.
Note that this implementation is not synchronized. If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
Set s = Collections.synchronizedSet(new HashSet(...));
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Example: -
import java.util.HashSet;
class MyHashSetToArray {
public static void main(String a[]){
HashSet hs = new HashSet();
//add elements to HashSet
hs.add("first");
hs.add("second");
hs.add("third");
System.out.println("HashSet content: ");
System.out.println(hs);
String[] strArr = new String[hs.size()];
hs.toArray(strArr);
System.out.println("Copied array content:");
for(String str:strArr){
System.out.println(str);
}
}
}
Output: -
C:\java\word>javac MyListTest1.java
C:\java\word>java MyHashSetToArray
HashSet content:
[second, third, first]
Copied array content:
second
third
first
b). TreeSet: - A NavigableSet implementation based on a TreeMap. The elements are ordered using their natural ordering, or by a Comparator provided at set creation time, depending on which constructor is used.
This implementation provides guaranteed log(n) time cost for the basic operations (add, remove and contains).
Note that the ordering maintained by a set (whether or not an explicit comparator is provided) must be consistent with equals if it is to correctly implement the Set interface. (See Comparable or Comparator for a precise definition of consistent with equals.) This is so because the Set interface is defined in terms of the equals operation, but a TreeSet instance performs all element comparisons using its compareTo (or compare) method, so two elements that are deemed equal by this method are, from the standpoint of the set, equal. The behavior of a set is well-defined even if its ordering is inconsistent with equals; it just fails to obey the general contract of the Set interface.
Note that this implementation is not synchronized. If multiple threads access a tree set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSortedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
SortedSet s = Collections.synchronizedSortedSet(new TreeSet(...));
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw
ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Example: -
import java.util.Iterator;
import java.util.TreeSet;
class TreeSetExample {
public static void main(String[] args) {
System.out.println("Tree Set Example!\n");
TreeSet tree = new TreeSet();
tree.add(12);
tree.add(63);
tree.add(34);
tree.add(45);
// here it test it's sorted, 63 is the last element. see output below
Iterator iterator = tree.iterator();
System.out.print("Tree set data: ");
// Displaying the Tree set data
while (iterator.hasNext()) {
System.out.print(iterator.next() + " ");
}
System.out.println();
// Check empty or not
if (tree.isEmpty()) {
System.out.print("Tree Set is empty.");
} else {
System.out.println("Tree Set size: " + tree.size());
}
// Retrieve first data from tree set
System.out.println("First data: " + tree.first());
// Retrieve last data from tree set
System.out.println("Last data: " + tree.last());
if (tree.remove(45)) { // remove element by value
System.out.println("Data is removed from tree set");
} else {
System.out.println("Data doesn't exist!");
}
System.out.print("Now the tree set contain: ");
iterator = tree.iterator();
// Displaying the Tree set data
while (iterator.hasNext()) {
System.out.print(iterator.next() + " ");
}
System.out.println();
System.out.println("Now the size of tree set: " + tree.size());
// Remove all
tree.clear();
if (tree.isEmpty()) {
System.out.print("Tree Set is empty.");
} else {
System.out.println("Tree Set size: " + tree.size());
}
}
}
Output: -
C:\java\word>javac TreeSetExample.java
C:\java\word>java TreeSetExample
Tree Set Example!
Tree set data: 12 34 45 63
Tree Set size: 4
First data: 12
Last data: 63
Data is removed from tree set
Now the tree set contain: 12 34 63
Now the size of tree set: 3
Tree Set is empty.
c). LinkedHashSet: - Hash table and linked list implementation of the Set interface, with predictable iteration order. This implementation differs from HashSet in that it maintains a doubly-linked list running through all of its entries. This linked list defines the iteration ordering, which is the order in which elements were inserted into the set (insertion-order). Note that insertion order is not affected if an element is re-inserted into the set. (An element e is reinserted into a set s if s.add(e) is invoked when s.contains(e) would return true immediately prior to the invocation.)
This implementation spares its clients from the unspecified, generally chaotic ordering provided by HashSet, without incurring the increased cost associated with TreeSet. It can be used to produce a copy of a set that has the same order as the original, regardless of the original set's implementation:
void foo(Set s) {
Set copy = new LinkedHashSet(s);
... }
This technique is particularly useful if a module takes a set on input, copies it, and later returns results whose order is determined by that of the copy. (Clients generally appreciate having things returned in the same order they were presented.)
This class provides all of the optional Set operations, and permits null elements. Like HashSet, it provides constant-time performance for the basic operations (add, contains and remove), assuming the hash function disperses elements properly among the buckets. Performance is likely to be just slightly below that of HashSet, due to the added expense of maintaining the linked list, with one exception: Iteration over a LinkedHashSet requires time proportional to the size of the set, regardless of its capacity. Iteration over a HashSet is likely to be more expensive, requiring time proportional to its capacity.
A linked hash set has two parameters that affect its performance: initial capacity and load factor. They are defined precisely as for HashSet. Note, however, that the penalty for choosing an excessively high value for initial capacity is less severe for this class than for HashSet, as iteration times for this class are unaffected by capacity.
Note that this implementation is not synchronized. If multiple threads access a linked hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:
Set s = Collections.synchronizedSet(new LinkedHashSet(...));
The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator will throw a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.
Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw
ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.
Example: -
import java.util.HashSet;
import java.util.LinkedHashSet;
class MyLhsAddAllEx {
public static void main(String a[]){
LinkedHashSet lhs = new LinkedHashSet();
//add elements to HashSet
lhs.add("first");
lhs.add("second");
lhs.add("third");
System.out.println(lhs);
HashSet subSet = new HashSet();
subSet.add("s1");
subSet.add("s2");
lhs.addAll(subSet);
System.out.println("LinkedHashSet content after adding another collection:");
System.out.println(lhs);
}
}
Output: -
C:\java\word>javac MyLhsAddAllEx.java
C:\java\word>java MyLhsAddAllEx
[first, second, third]
LinkedHashSet content after adding another collection:
[first, second, third, s2, s1]