A hash table supporting full concurrency of retrievals and high expected concurrency for updates. A few important points are below:
- concurrencyLevel = the number of segments, i.e. the length of the segment array (see the sketch after this list).
- The top-level bucket structure is just an array of the Segment class (Segment<K,V>[] segments).
- Like HashMap, ConcurrentHashMap replaces the old value when an existing key is re-inserted.
- Rehashing is done at the segment level: each segment rehashes its own table independently, whereas HashMap rehashes the whole table at once.
- Each segment holds its own transient volatile HashEntry<K,V>[] table.
- Retrieval operations (including get) generally do not block, so they may overlap with update operations (including put and remove).
- For size and containsValue, the map tries a few times to get an accurate count without locking; if that fails because of continuous asynchronous changes to the table, it resorts to locking.
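For instance, the pre-Java-8 constructor rounds the requested concurrencyLevel up to the next power of two and creates that many segments. A minimal sketch of that rounding (the helper name segmentCountFor is mine, not the JDK's):

    // Smallest power of two >= concurrencyLevel becomes the segment count.
    // E.g. a requested concurrencyLevel of 10 results in 16 segments.
    static int segmentCountFor(int concurrencyLevel) {
        int ssize = 1;
        while (ssize < concurrencyLevel)
            ssize <<= 1;
        return ssize;
    }

So new ConcurrentHashMap<String, String>(16, 0.75f, 10) ends up with 16 segments, and each segment guards its own slice of the map.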
The JDK source documents this retry bound for size() and containsValue():

/*
 * Number of unsynchronized retries in size and containsValue
 * methods before resorting to locking. This is used to avoid
 * unbounded retries if tables undergo continuous modification
 * which would make it impossible to obtain an accurate result.
 */
static final int RETRIES_BEFORE_LOCK = 2;
After the retries, if the two sums still do not match, size() resorts to locking all segments:
if (check != sum) { // Resort to locking all segments
    sum = 0;
    for (int i = 0; i < segments.length; ++i)
        segments[i].lock();
    for (int i = 0; i < segments.length; ++i)
        sum += segments[i].count;
    for (int i = 0; i < segments.length; ++i)
        segments[i].unlock();
}
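For context, here is a condensed sketch of the unsynchronized passes that run before this fallback, paraphrased from the JDK 6/7 source (the exact bookkeeping varies slightly between releases):

    public int size() {
        final Segment<K,V>[] segments = this.segments;
        long sum = 0;
        long check = 0;
        int[] mc = new int[segments.length];
        for (int k = 0; k < RETRIES_BEFORE_LOCK; ++k) {
            check = 0;
            sum = 0;
            int mcsum = 0;
            for (int i = 0; i < segments.length; ++i) {
                sum += segments[i].count;              // volatile read of each segment's size
                mcsum += mc[i] = segments[i].modCount; // remember each segment's modCount
            }
            if (mcsum != 0) {                          // the map has been modified at some point
                for (int i = 0; i < segments.length; ++i) {
                    check += segments[i].count;        // second pass over the counts
                    if (mc[i] != segments[i].modCount) {
                        check = -1;                    // a segment changed mid-scan: force a retry
                        break;
                    }
                }
            }
            if (check == sum)                          // two consistent passes: the sum is trustworthy
                break;
        }
        // if (check != sum) ... resort to locking all segments, as quoted above ...
        return (sum > Integer.MAX_VALUE) ? Integer.MAX_VALUE : (int) sum;
    }

Only when two back-to-back passes disagree after RETRIES_BEFORE_LOCK attempts does size() pay the cost of locking every segment.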
1) Not allowing null keys
The reason for this design is the ambiguity a ConcurrentMap (ConcurrentHashMap, ConcurrentSkipListMap) would otherwise face in a multithreaded environment: if map.get(key) returned null, we could not tell whether the key is explicitly mapped to null or simply not present. Checking first with containsKey(k) does not help, because another thread may delete the key 'k' between the containsKey(k) and get(k) calls.
For example, consider the following code using a ConcurrentHashMap:

if (concurrentHashMap.containsKey(k)) {
    return concurrentHashMap.get(k);
} else {
    throw new KeyNotPresentException();
}
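By contrast, because ConcurrentHashMap never stores null values, a single get() is both sufficient and atomic, so there is no check-then-act race to worry about (KeyNotPresentException is the same hypothetical exception as above):

    V v = concurrentHashMap.get(k);
    if (v == null) {
        // null can only mean "no mapping for k", never "k mapped to null",
        // so no separate containsKey(k) call (and no race window) is needed
        throw new KeyNotPresentException();
    }
    return v;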
2) Not allowing null values
Null values are likewise not allowed in the ConcurrentMap implementations (ConcurrentHashMap and ConcurrentSkipListMap). Look at the following code in ConcurrentHashMap:
Line 1. public V put(K key, V value) {
Line 2. if (value == null)
Line 3. throw new NullPointerException();
Line 4. int hash = hash(key.hashCode());
Line 5. return segmentFor(hash).put(key, hash, value, false);
Line 6. }
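As an aside, segmentFor in line 5 simply picks a segment using the high bits of the spread hash; in the JDK 6 source it is essentially:

    final Segment<K,V> segmentFor(int hash) {
        // segmentShift and segmentMask are derived from concurrencyLevel in the constructor
        return segments[(hash >>> segmentShift) & segmentMask];
    }

Lines 2 and 3, however, are the part we are interested in here.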
In lines 2 and 3 a NullPointerException is thrown just by checking the value, which means the rest of ConcurrentHashMap is written on the assumption that a value is never null. I checked the usage of values throughout ConcurrentHashMap, and there is no other piece of code that throws a NullPointerException for a null value, so I started investigating the internal implementation.
I wrote a simple program against a modified ConcurrentHashMap with lines 2 and 3 removed, and created two threads: one inserting null values and the other reading them. Surprisingly, I did not get any error. So I decided to compare the performance of get and put by timing both the modified and unmodified versions of ConcurrentHashMap. The get method of the modified ConcurrentHashMap takes 5-8 times longer than the unmodified one, and the put method takes 3-5 times longer.
The reason is that the segment of ConcurrentHashMap takes a lock whenever the value of a HashEntry appears to be null.
GET METHOD
1.  V get(Object key, int hash) {
2.      if (count != 0) {
3.          HashEntry e = getFirst(hash);
4.          while (e != null) {
5.              if (e.hash == hash && key.equals(e.key)) {
6.                  V v = e.value;
7.                  if (v != null)
8.                      return v;
9.                  return readValueUnderLock(e); // takes lock if the value is null
10.             }
11.             e = e.next;
12.         }
13.     }
14.     return null;
15. }
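For reference, readValueUnderLock (called in line 9) just acquires the segment's lock and re-reads the value; in the JDK 6 source it is essentially:

    V readValueUnderLock(HashEntry<K,V> e) {
        lock();               // wait for any in-progress writer to release the segment lock
        try {
            return e.value;   // by now the fully constructed value is visible
        } finally {
            unlock();
        }
    }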
PUT METHOD
1.  V put(K key, int hash, V value, boolean onlyIfAbsent) {
2.      lock();
3.      try {
4.          int c = count;
5.          if (c++ > threshold)
6.              rehash();
7.          HashEntry[] tab = table;
8.          int index = hash & (tab.length - 1);
9.          HashEntry first = tab[index];
10.         HashEntry e = first;
11.         while (e != null && (e.hash != hash || !key.equals(e.key)))
12.             e = e.next;
13.         V oldValue;
14.         if (e != null) {
15.             oldValue = e.value;
16.             if (!onlyIfAbsent)
17.                 e.value = value;
18.         } else {
19.             oldValue = null;
20.             ++modCount;
21.             tab[index] = new HashEntry(key, hash, first, value); // New value for HashEntry
22.             count = c;
23.         }
24.         return oldValue;
25.     } finally {
26.         unlock();
27.     }
28. }
Now the question is: if ConcurrentHashMap does not allow null values, why is the null check on the HashEntry value implemented in line 7 of the get method? The reason is a race with the writer: a writer thread may be creating a new HashEntry in line 21 of the put method, and because the entry can be linked into the table before its value field is guaranteed to be visible, a concurrently reading thread can briefly observe the value as null. The recheck in line 7 of the get method handles exactly that case.
The extra step in line 9 of the get method is very costly in performance terms, and it is avoided entirely for non-null values. That is why the modified ConcurrentHashMap takes so much longer to read and write null values than the unmodified ConcurrentHashMap does with non-null values: every read of a null value pays the segment-locking penalty.
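To make the contract concrete, here is a small stand-alone demo (the class name NullValueDemo is just for illustration):

    import java.util.concurrent.ConcurrentHashMap;

    public class NullValueDemo {
        public static void main(String[] args) {
            ConcurrentHashMap<String, String> map = new ConcurrentHashMap<>();
            map.put("present", "value");

            // A null result can only mean "no such key", never "key mapped to null".
            System.out.println(map.get("missing") == null);   // prints true

            try {
                map.put("key", null);                         // rejected up front
            } catch (NullPointerException expected) {
                System.out.println("null values are rejected");
            }
        }
    }

If you really need to represent "no value", store an explicit sentinel object instead of null.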
Kindly provide your valuable feedback in the comment box.