Saturday, April 18, 2026

HashMap – How It Really Works Internally

 


Senior Java Engineer Interview Deep-Dive Guide

Hashing · Buckets · Collisions · Resize · TreeBin · ConcurrentHashMap · Java 8–21

Prepared for Ashok Koritala – Senior Java Full Stack Developer (15+ yrs)
Java Version: Java 7 – Java 21 (all critical changes covered)
Questions: 35 expert-level Q&A across 7 topic clusters
Scope: HashMap internals · hashCode/equals contract · collision handling · resize/rehash · TreeBin (Java 8) · LinkedHashMap · WeakHashMap · ConcurrentHashMap · real-world gotchas

Table of Contents

§1 HashMap Data Structure & Internal Fields Q1–Q5

§2 Hashing – hashCode(), hash(), & Bucket Index Q6–Q10

§3 Collision Handling – LinkedList & TreeBin (Java 8) Q11–Q15

§4 Resize, Rehash & Load Factor Q16–Q20

§5 put(), get(), remove() – Step-by-Step Internals Q21–Q25

§6 ConcurrentHashMap – Thread-Safe Internals Q26–Q30

§7 Variants, Gotchas & Real-World Patterns Q31–Q35

§1 HashMap Data Structure & Internal Fields

Q1. What is the core data structure of HashMap and what are its key internal fields?

HashMap is backed by an array of Node (called table). Each array slot is a bucket. When multiple keys hash to the same bucket, they form a linked list (or a red-black tree in Java 8+) within that bucket.

// Simplified view of HashMap source (Java 8+):
public class HashMap<K,V> extends AbstractMap<K,V> {

    // The bucket array – length is ALWAYS a power of 2
    transient Node<K,V>[] table;

    // Number of key-value mappings
    transient int size;

    // Structural modification count (used by iterators for fail-fast)
    transient int modCount;

    // Next resize threshold = capacity * loadFactor
    int threshold;

    // Default 0.75 – balances space vs time
    final float loadFactor;

    // TREEIFY_THRESHOLD = 8    (linked list -> red-black tree)
    // UNTREEIFY_THRESHOLD = 6  (tree -> linked list on shrink)
    // MIN_TREEIFY_CAPACITY = 64 (table must be >= 64 to treeify)
}

Node – the building block:

static class Node<K,V> implements Map.Entry<K,V> {
    final int hash;     // cached hash of key (avoids recomputing on resize)
    final K key;
    V value;
    Node<K,V> next;     // linked list pointer (null if no collision)
}

Key Insight: The table array is lazily initialised – it is null until the first put(). Default initial capacity = 16; default load factor = 0.75.

Q2. Why must the table capacity always be a power of 2?

The bucket index formula is:

index = (n - 1) & hash // where n = table.length (power of 2)

When n is a power of 2, (n-1) is a bitmask of all 1s in the low bits (e.g., 16-1 = 15 = 0b00001111). ANDing the hash with this mask is equivalent to hash % n but 5–10x faster (bit operation vs division). It also guarantees the index falls in [0, n-1].

n = 16      n-1 = 15 = 0000 1111
hash = ???? ???? (any 32-bit int)
index = hash & 15 = last 4 bits only → always 0..15

e.g. hash  = 0b10110101_11001010_00101101_10110111
     mask  =                           0000 1111
     index =                           0000 0111 = 7
Interview Tip: If an interviewer asks 'why not use hash % n?', answer: (n-1) & hash is identical when n is a power of 2, but it is a single CPU instruction versus an expensive integer division.
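The masking trick can be verified directly: for any hash (including negative ones), (n-1) & hash matches the mathematical modulo when n is a power of 2. A minimal sketch (the class name BucketIndexDemo is ours, not from the JDK):

```java
public class BucketIndexDemo {
    public static void main(String[] args) {
        int n = 16; // table length, power of 2
        int[] hashes = {2112, -7, 0, Integer.MIN_VALUE, 123456789};
        for (int h : hashes) {
            int masked = (n - 1) & h;        // HashMap's index computation
            int mod = Math.floorMod(h, n);   // equivalent for power-of-2 n, even for negative h
            System.out.println(h + " -> bucket " + masked + " (floorMod: " + mod + ")");
        }
    }
}
```

Note that plain `h % n` in Java returns negative values for negative hashes, which is another reason the bitmask (which always yields 0..n-1) is used instead.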
Q3. What are the default capacity, load factor, and resize threshold? Why 0.75?

Default initial capacity: 16 (always power of 2).

Default load factor: 0.75.

Initial threshold: 16 × 0.75 = 12. Map resizes when size exceeds 12.

Why 0.75? It is a statistical sweet spot from Poisson-distribution analysis. At load factor 0.75, the number of entries in a single bucket approximately follows a Poisson distribution with parameter ~0.5 – the probability of a bucket having 8+ entries (the treeify threshold) is approximately 0.00000006. It balances memory usage against collision probability.

// Choosing initial capacity to avoid resize:
// If you know you'll store N entries:
int n = 1000;
Map<String, String> map = new HashMap<>(n * 4 / 3 + 1); // capacity = ceil(n / 0.75) + 1
// Or simply: new HashMap<>((int) (n / 0.75) + 1)

// Guava helper:
Maps.newHashMapWithExpectedSize(1000); // does the math for you

■■ Watch Out: Creating a HashMap without specifying capacity when you know the size causes unnecessary resize operations. Each resize is O(n) and copies all entries.

Q4. What is TREEIFY_THRESHOLD and why was it introduced in Java 8?

TREEIFY_THRESHOLD = 8: when a single bucket's linked list grows beyond 8 nodes, it is converted to a red-black tree (TreeNode). This limits worst-case lookup from O(n) to O(log n).

Why it was needed: in Java 7 and earlier, an adversary who knew the JVM's hashCode algorithm could craft keys that all hash to the same bucket, causing HashMap operations to degrade to O(n) – a Hash DoS (Denial of Service) attack. Java 8's TreeBin limits the damage to O(log n) even under worst-case hashing.

UNTREEIFY_THRESHOLD = 6: when a bucket shrinks (due to removes or resize), a tree with 6 or fewer nodes converts back to a linked list. The hysteresis gap (8 to treeify, 6 to untreeify) prevents thrashing.

MIN_TREEIFY_CAPACITY = 64: treeification only happens if the table length is >= 64. Below 64, HashMap resizes instead of treeifying (resize redistributes entries, usually fixing long chains).

Key Insight: TreeNode extends LinkedHashMap.Entry, which extends HashMap.Node, and every TreeNode keeps its linked-list next pointer. So a TreeBin can still be walked as a chain and converted back to a plain Node linked list cheaply.

Q5. Draw the full internal structure of HashMap showing buckets, linked lists, and TreeBins.

Full internal layout of HashMap (Java 8+):

HashMap.table (Node[] – capacity = 16 default)

[ 0] -> null
[ 1] -> Node{hash, key='A', val=1, next} -> Node{hash, key='Q', val=9, next=null}
        (linked list – 2 colliding keys)
[ 2] -> null
[ 3] -> TreeNode (root of red-black tree – 8+ collisions)
              /        \
        TreeNode      TreeNode
         /    \
   TreeNode  TreeNode
[ 4] -> Node{hash, key='B', val=2, next=null}  (single entry)
 ...
[15] -> null

Fields: size=5, threshold=12, loadFactor=0.75, modCount=5
Interview Tip: Being able to draw this diagram in an interview immediately shows deep understanding. Label the Node fields (hash, key, value, next) and explain when TreeNode kicks in.

§2 Hashing – hashCode(), hash() & Bucket Index

Q6. Walk through the exact steps from key.hashCode() to bucket index.

Three steps happen inside every put() and get():

// Step 1: get raw hashCode from the key object
int rawHash = key.hashCode(); // e.g., String.hashCode()

// Step 2: HashMap applies a secondary 'spread' hash
// (a multi-shift hash() in Java 7, simplified to a single XOR in Java 8):
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// XOR upper 16 bits into lower 16 bits.
// Purpose: spread high-bit differences into low bits,
// reducing collisions when the table is small.

// Step 3: compute bucket index
int index = (table.length - 1) & hash(key);
// e.g., table.length = 16 → (16-1) & hash = last 4 bits of hash

rawHash  = 1111 1111 0000 1111 1010 1010 0011 0011
h >>> 16 = 0000 0000 0000 0000 1111 1111 0000 1111
XOR      = 1111 1111 0000 1111 0101 0101 0011 1100   (spread hash)
n-1      =                               0000 1111   (n = 16)
index    =                               0000 1100   = 12
Key Insight: The h ^ (h >>> 16) spread is critical. Without it, keys whose hashCodes differ only in the high bits would all land in the same bucket when the table is small.

Q7. How does String.hashCode() work? What is its weakness?

// String.hashCode() – Horner's method:
public int hashCode() {
    int h = hash;                        // cached (String is immutable)
    if (h == 0 && value.length > 0) {
        for (char c : value) {
            h = 31 * h + c;
        }
        hash = h;
    }
    return h;
}

// hash("ABC") = 31^2*'A' + 31*'B' + 'C'
//             = 31^2*65 + 31*66 + 67
//             = 62465 + 2046 + 67 = 64578

• Multiplier 31 is chosen because it is an odd prime and 31*x == (x << 5) - x (the JIT can optimise it to a single shift plus subtract).
• Cached: String stores its hash in a field – computed only once (lazy init). Safe because String is immutable.
• Weakness: collisions are easy to craft. Simple anagrams do not collide (the 31^k weighting distinguishes positions), but deliberately chosen pairs do – 'Aa' and 'BB' have the same hashCode!

Key Insight: 'Aa'.hashCode() == 'BB'.hashCode() == 2112. This is a famous Java gotcha. Always pair equals() with hashCode() to handle collisions correctly.

Q8. What is the hashCode / equals contract and what breaks HashMap if violated?

The Java API contract (Object.hashCode() javadoc) requires:

Rule 1 – Consistency: hashCode() returns the same value on multiple calls within a single execution (provided no state used in equals comparisons changes).

Rule 2 – Equality implies same hash: if a.equals(b) is true, then a.hashCode() == b.hashCode() MUST be true.

Rule 3 (recommended, not required): if a.equals(b) is false, a.hashCode() != b.hashCode() is desirable (reduces collisions).

// BROKEN: hashCode uses a mutable field
class BadKey {
    int id;
    public int hashCode() { return id; }                         // mutable!
    public boolean equals(Object o) { return ((BadKey) o).id == id; }
}

Map<BadKey, String> map = new HashMap<>();
BadKey k = new BadKey();
k.id = 1;
map.put(k, "hello");

k.id = 2;        // mutate key AFTER insertion
map.get(k);      // returns NULL! (looks in wrong bucket)

■■ Watch Out: Never use mutable fields in hashCode()/equals() for keys stored in HashMap. The key's hash determines its bucket. If the hash changes, the key is 'lost'.
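The 'lost entry' failure mode above can be reproduced end to end. A runnable sketch (MutableKeyDemo and its nested BadKey are illustrative names, not JDK classes):

```java
import java.util.HashMap;
import java.util.Map;

public class MutableKeyDemo {
    static final class BadKey {
        int id;
        BadKey(int id) { this.id = id; }
        @Override public int hashCode() { return id; }   // mutable field in hashCode!
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = new HashMap<>();
        BadKey k = new BadKey(1);
        map.put(k, "hello");

        k.id = 2;                         // mutate AFTER insertion: hash now differs
        System.out.println(map.get(k));   // null – lookup goes to the wrong bucket
        System.out.println(map.size());   // 1 – the entry still occupies memory (a leak!)
    }
}
```

Note the second print: the entry is unreachable but never removed, so mutable keys are a memory leak as well as a correctness bug.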

Q9. How does HashMap handle null keys?

HashMap permits exactly ONE null key. Null keys are handled specially:

// From HashMap.hash():
static final int hash(Object key) {
    int h;
    return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
// null always hashes to 0 → always goes into bucket[0]

• Null key is always stored in bucket index 0.

• Multiple puts with null key simply overwrite the value (only one null key allowed).

• Hashtable and ConcurrentHashMap do NOT allow null keys – they throw NullPointerException.

Map Type            Null Key                    Null Value        Thread-Safe
HashMap             1 allowed (bucket 0)        Multiple allowed  No
LinkedHashMap       1 allowed                   Multiple allowed  No
TreeMap             No (NullPointerException)   Multiple allowed  No
Hashtable           No                          No                Yes (legacy)
ConcurrentHashMap   No                          No                Yes
WeakHashMap         1 allowed                   Multiple allowed  No
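The null-key rules in the table can be confirmed in a few lines. A sketch (NullKeyDemo is our own class name):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, Integer> hm = new HashMap<>();
        hm.put(null, 1);
        hm.put(null, 2);                  // second put with the null key overwrites
        System.out.println(hm.get(null)); // 2
        System.out.println(hm.size());    // 1 – only one null key ever exists

        Map<String, Integer> chm = new ConcurrentHashMap<>();
        try {
            chm.put(null, 1);             // ConcurrentHashMap rejects null keys
        } catch (NullPointerException e) {
            System.out.println("CHM: NullPointerException");
        }
    }
}
```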
Q10. What happens when two different keys produce the same bucket index (collision)?

A collision occurs when hash(key1) & (n-1) == hash(key2) & (n-1). Note: the keys may have different hashCodes but still map to the same bucket (because we only use the low bits). Or they may have the same hashCode entirely. Java 8 collision resolution has two phases:

Phase 1 – Linked list (0–7 nodes): new entries are appended to the tail. Lookup requires scanning the chain, comparing hash and equals().

Phase 2 – Red-black tree (8+ nodes and table length >= 64): the bucket is treeified. O(log n) lookup using the key's natural order or identity hash.

Key Insight: The two-level approach (list, then tree) means the common case (few collisions) has minimal overhead, while adversarial hash attacks are bounded.

§3 Collision Handling – LinkedList & TreeBin (Java 8)

Q11. Trace the exact steps of put(key, value) when a collision occurs in the linked list.

map.put("Aa", 100); // hashCode("Aa") == 2112
map.put("BB", 200); // hashCode("BB") == 2112 → same hashCode!

// Inside putVal():
// 1. Compute hash: hash("BB") = 2112 ^ (2112 >>> 16) = 2112 ^ 0 = 2112
// 2. index = (16-1) & 2112 = 15 & 2112 = 0   (2112 % 16 = 0)
// 3. table[0] is not null → check existing node:
//    node.hash == 2112 AND node.key.equals("BB")? → "Aa".equals("BB") = FALSE
// 4. Traverse next: next == null → append new Node at tail

// table[0] → Node("Aa",100) → Node("BB",200) → null

// get("BB"):
// 1. hash = 2112, index = 0
// 2. table[0].hash == 2112 && "Aa".equals("BB")? NO → traverse next
// 3. next.hash == 2112 && "BB".equals("BB")? YES → return 200

Key lookup always checks: node.hash == hash && (node.key == key || key.equals(node.key)). The hash check is a fast pre-filter before the potentially expensive equals() call.

Interview Tip: In an interview, trace this step by step. Say 'first we check hash equality as a fast filter, then equals() for correctness'. This shows you understand why hashCode matters for performance.

Q12. Explain the Java 7 to Java 8 change in collision handling (linked list direction).

Java 7: new entries were inserted at the head of the linked list (O(1) insertion), on the assumption that recently inserted entries are more likely to be accessed.

Java 8: new entries are inserted at the tail of the linked list. Why? Head insertion in Java 7 could cause an infinite loop under concurrent resize (the list could form a cycle). Java 8 fixes this with tail insertion.

• Java 8 also introduced TreeBin, modCount tracking improvements, and optimised resize.

■■ Watch Out: Java 7 HashMap has a known bug: concurrent put() calls during resize can create a circular linked list, causing get() to loop infinitely. Never use HashMap in multithreaded code. This was the primary driver for Java 8's tail-insertion fix.

Q13. How does treeification work? What is a red-black tree and why use it?

When a bucket's chain reaches TREEIFY_THRESHOLD (8) and the table is >= MIN_TREEIFY_CAPACITY (64), the linked list is converted to a red-black tree:

// Inside treeifyBin():
// 1. Replace each Node with a TreeNode (wrapping the same key/value/hash/next)
// 2. Call TreeNode.treeify() to build the red-black tree structure
//    - TreeNode extends LinkedHashMap.Entry extends HashMap.Node
//    - Each TreeNode has: parent, left, right, red (boolean)

// Red-black tree properties:
// a) Every node is RED or BLACK
// b) Root is BLACK
// c) Red nodes have BLACK children (no two consecutive reds)
// d) Every path from root to null has the same number of BLACK nodes
// → Guarantees height <= 2*log2(n) → O(log n) operations

• TreeNode comparison order: first by hash, then by Comparable (if keys implement it), then by System.identityHashCode() as a tiebreaker.
• get/put/remove on a TreeBin is O(log n) instead of O(n) for a plain linked list.
• A TreeBin is 'untreeified' back to a linked list when its size drops to UNTREEIFY_THRESHOLD (6).

Key Insight: The TreeBin in HashMap is NOT a full standalone tree. It still maintains the doubly-linked list pointers (prev/next) for iteration order and for easy conversion back to a list.

Q14. When does treeification NOT happen even if a bucket has 8+ entries?

• If table.length < MIN_TREEIFY_CAPACITY (64): HashMap resizes the table instead of treeifying. Resizing redistributes entries and usually breaks up long chains (no treeification needed).
• If the key type does not implement Comparable, tree comparisons fall back to System.identityHashCode() – this works but is less efficient (it affects tree ordering, not whether treeification happens).
• Treeification is triggered per-bucket on each put() call – there is no background thread.

// treeifyBin decision logic:
final void treeifyBin(Node<K,V>[] tab, int hash) {
    int n;
    if (tab == null || (n = tab.length) < MIN_TREEIFY_CAPACITY) {
        resize();   // grow table instead of treeifying
    } else if (...) {
        // Actually treeify this bucket
    }
}

Interview Tip: This is a common interview trap: 'When does a HashMap bucket become a tree?' Correct answer: 8+ entries AND table.length >= 64. Both conditions must hold.

Q15. How does remove() work with TreeBins?

remove(key) follows the same lookup path (hash → bucket → chain/tree). In a TreeBin, it performs a red-black tree deletion (O(log n)) and then rebalances.

• If after deletion the tree has <= UNTREEIFY_THRESHOLD (6) nodes, untreeify() converts it back to a plain linked list.
• The TreeNode still has its next/prev pointers intact, so untreeification just walks the chain and re-links the nodes as a plain Node list.
• Each remove() increments modCount → triggers ConcurrentModificationException in iterators (fail-fast behaviour).

Key Insight: HashMap's fail-fast iterators check modCount on every next() call. If modCount changed since the iterator was created, it throws ConcurrentModificationException. This is a best-effort check – not a guaranteed concurrency guard.
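The fail-fast behaviour is easy to trigger deliberately: modify the map through the Map API (not the iterator) while iterating. A sketch (FailFastDemo is our own class name):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;

public class FailFastDemo {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        try {
            for (String key : map.keySet()) {
                map.remove(key);   // structural modification bumps modCount
            }
        } catch (ConcurrentModificationException e) {
            // The NEXT iterator.next() call notices modCount != expectedModCount
            System.out.println("fail-fast: " + e.getClass().getSimpleName());
        }
    }
}
```

Using `iterator.remove()` (or `entrySet().removeIf(...)`) instead keeps modCount and the iterator's expected count in sync, which is why those are the safe removal idioms.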

§4 Resize, Rehash & Load Factor

Q16. Describe the resize (rehash) process in detail. What happens step by step?

Resize is triggered when size > threshold (capacity * loadFactor). Steps:

1. Allocate new table: new capacity = old capacity × 2 (always power-of-2 doubling). New threshold = new capacity × loadFactor.

2. Transfer all entries: iterate every bucket in the old table and redistribute entries into the new table. This is the expensive O(n) step.

3. Recompute bucket indices: with new capacity n2 = 2*n1, the new index formula is (n2-1) & hash. Only ONE extra bit of the hash is examined; each entry goes to either the same index OR the same index + n1 (the old capacity).

// Java 8 resize insight – NO full rehash needed:
// Old capacity n = 16 → mask = 0b00001111
// New capacity n = 32 → mask = 0b00011111 (one more bit)

// For each entry with hash h:
// If bit 4 of h is 0: new_index = old_index        (stays in place – "lo" list)
// If bit 4 of h is 1: new_index = old_index + 16   (moves up by n – "hi" list)

// Example: old table size = 16, entry at index 5
// hash = 0b...0_0101 → bit4 = 0 → new index = 5          (lo list)
// hash = 0b...1_0101 → bit4 = 1 → new index = 5+16 = 21  (hi list)

Before resize (n=16):        After resize (n=32):
bucket[5] → A → B → C        bucket[5]  → A → C   (bit4 = 0)
                             bucket[21] → B       (bit4 = 1)
Key Insight: Java 8 avoids recomputing hashCode during resize by using the cached hash in each Node and a single bit check. This makes resize significantly faster than in Java 7.
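The lo/hi split rule can be checked with plain bit arithmetic: for any hash, the new index either equals the old index or the old index plus the old capacity, decided by a single bit. A sketch (ResizeSplitDemo is our own class name):

```java
public class ResizeSplitDemo {
    public static void main(String[] args) {
        int oldCap = 16, newCap = 32;
        int[] hashes = {5, 21, 37, 53};          // all land in bucket 5 when n = 16
        for (int h : hashes) {
            int oldIdx = (oldCap - 1) & h;
            int newIdx = (newCap - 1) & h;
            boolean hiBit = (h & oldCap) != 0;   // the single extra bit examined on resize
            System.out.printf("hash=%d old=%d new=%d (%s)%n",
                    h, oldIdx, newIdx, hiBit ? "hi: old+16" : "lo: stays");
        }
    }
}
```

Running it shows hashes 5 and 37 stay at index 5 (lo list) while 21 and 53 move to index 21 (hi list) – exactly the split drawn above.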

Q17. What is the exact formula to pre-size a HashMap to avoid resize?

// Formula: initialCapacity = ceil(expectedSize / loadFactor) + 1
// With default loadFactor = 0.75:
int expectedEntries = 1000;
int initialCapacity = (int) (expectedEntries / 0.75) + 1; // = 1334

// HashMap constructor rounds UP to the next power of 2: tableSizeFor(1334) = 2048
Map<Integer, Integer> map = new HashMap<>(initialCapacity);
// No resize will occur when inserting 1000 entries

// tableSizeFor() – finds next power of 2 >= cap:
static final int tableSizeFor(int cap) {
    int n = cap - 1;
    n |= n >>> 1; n |= n >>> 2; n |= n >>> 4; n |= n >>> 8; n |= n >>> 16;
    return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}

■■ Watch Out: new HashMap<>(1000) does NOT prevent resize at 1000 entries. The table is sized 1024 (next power of 2 >= 1000), threshold = 1024 * 0.75 = 768. You'll resize before 1000 entries! Use the formula above.
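The pre-sizing formula from the text can be wrapped in a tiny helper and exercised directly. A sketch (PreSizeDemo and capacityFor are our own names; the formula itself comes from the section above):

```java
import java.util.HashMap;
import java.util.Map;

public class PreSizeDemo {
    // Mirrors the formula from the text: (int)(expected / loadFactor) + 1
    static int capacityFor(int expected) {
        return (int) (expected / 0.75) + 1;
    }

    public static void main(String[] args) {
        int expected = 1000;
        int cap = capacityFor(expected);   // 1334 → HashMap rounds up to 2048
        System.out.println(cap);           // 1334

        Map<Integer, Integer> map = new HashMap<>(cap);
        for (int i = 0; i < expected; i++) map.put(i, i);
        System.out.println(map.size());    // 1000 – inserted with no intermediate resize
    }
}
```

Compare with `new HashMap<>(1000)`: table 1024, threshold 768, so a resize fires mid-way through the 1000 inserts.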

Q18. What is the maximum capacity of HashMap?

MAXIMUM_CAPACITY = 1 << 30 = 1,073,741,824 (~1 billion buckets).

• If a resize would exceed this, the threshold is set to Integer.MAX_VALUE (effectively disabling further resizes).

• Each bucket slot is an object reference (4 bytes on a 64-bit JVM with compressed OOPs, 8 bytes without). 1 billion slots × 4–8 bytes = ~4–8 GB just for the bucket array – not practical.

• In practice, a HashMap with > 10 million entries should be reconsidered (consider off-heap stores, Caffeine, or sharded maps).

Q19. How does the load factor affect HashMap performance?

Load factor is the ratio at which the map resizes: resize when size/capacity > loadFactor.

Load Factor  Memory Use                      Collision Rate  Lookup Speed       Resize Frequency  Best For
0.25         Very high (many empty buckets)  Very low        O(1) near-perfect  Very frequent     Ultra low-latency lookups
0.5          High                            Low             Very good          Frequent          Low-latency apps
0.75         Balanced (default)              Moderate        Good               Normal            General purpose
1.0          Low (fewer resizes)             High            Degrades           Infrequent        Memory-constrained
>1.0         Minimal                         Very high       Poor (O(n) risk)   Rare              Avoid in production
Interview Tip: For read-heavy, performance-critical maps, consider load factor 0.5 or even 0.25 with a pre-computed initial capacity. The extra memory cost is often worth it.
Q20. What happens to a TreeBin during resize?

During resize, each bucket is split into two (lo and hi). For a TreeBin:

• The tree is split along the same bit check (the bit at position oldCapacity).
• Each half gets its node count evaluated.
• If a half has <= UNTREEIFY_THRESHOLD (6) nodes → it is converted back to a linked list.
• If a half has more than 6 nodes → it remains a TreeBin (re-treeified).
• Typical result: the long chains that caused treeification usually split roughly evenly and become linked lists again.

Key Insight: This is why MIN_TREEIFY_CAPACITY = 64 exists: below 64, resize is more effective than treeification. A resize from capacity 16 → 32 typically splits an 8-node chain into two ~4-node chains.

§5 put(), get(), remove() – Complete Step-by-Step

Q21. Trace every step of HashMap.put(key, value) from start to finish.

// HashMap.put() delegates to putVal():
final V putVal(int hash, K key, V value, boolean onlyIfAbsent, boolean evict) {
    Node<K,V>[] tab; Node<K,V> p; int n, i;

    // STEP 1: Lazy init – allocate table on first put
    if ((tab = table) == null || (n = tab.length) == 0)
        n = (tab = resize()).length;

    // STEP 2: Compute bucket index; check if bucket is empty
    if ((p = tab[i = (n - 1) & hash]) == null)
        tab[i] = newNode(hash, key, value, null);   // no collision, done
    else {
        Node<K,V> e; K k;

        // STEP 3a: Does the first node match exactly?
        if (p.hash == hash && ((k = p.key) == key || (key != null && key.equals(k))))
            e = p;   // found existing entry to update

        // STEP 3b: TreeBin? Delegate to tree put
        else if (p instanceof TreeNode)
            e = ((TreeNode<K,V>) p).putTreeVal(this, tab, hash, key, value);

        // STEP 3c: Walk the linked list
        else {
            for (int binCount = 0; ; ++binCount) {
                if ((e = p.next) == null) {
                    p.next = newNode(hash, key, value, null);   // tail insert
                    if (binCount >= TREEIFY_THRESHOLD - 1)      // 7
                        treeifyBin(tab, hash);                  // treeify if >= 8 nodes
                    break;
                }
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    break;   // found existing key
                p = e;
            }
        }

        // STEP 4: Update existing value
        if (e != null) {
            V oldValue = e.value;
            if (!onlyIfAbsent || oldValue == null)
                e.value = value;
            return oldValue;
        }
    }

    // STEP 5: Increment modCount; check resize threshold
    ++modCount;
    if (++size > threshold)
        resize();
    return null;
}

Interview Tip: Recite these 5 steps in order: (1) lazy init, (2) bucket index, (3) collision handling (empty / TreeBin / list), (4) value update, (5) resize check. Interviewers love this walk-through.

Q22. Trace every step of HashMap.get(key).

public V get(Object key) {
    Node<K,V> e;
    return (e = getNode(hash(key), key)) == null ? null : e.value;
}

final Node<K,V> getNode(int hash, Object key) {
    Node<K,V>[] tab; Node<K,V> first, e; int n; K k;

    // STEP 1: table must exist and bucket must be non-null
    if ((tab = table) != null && (n = tab.length) > 0 &&
        (first = tab[(n - 1) & hash]) != null) {

        // STEP 2: Check first node (most common case – O(1))
        if (first.hash == hash && ((k = first.key) == key || (key != null && key.equals(k))))
            return first;

        // STEP 3: Collision – check the rest of the chain
        if ((e = first.next) != null) {
            if (first instanceof TreeNode)   // TreeBin: O(log n)
                return ((TreeNode<K,V>) first).getTreeNode(hash, key);
            do {                             // Linked list: O(k)
                if (e.hash == hash && ((k = e.key) == key || (key != null && key.equals(k))))
                    return e;
            } while ((e = e.next) != null);
        }
    }
    return null;
}

• Best case: key found at bucket head → O(1).
• Collision (linked list of k entries) → O(k).
• Collision (TreeBin) → O(log k).
• Amortised average: O(1) with a good hash distribution.

Q23. What is the time complexity of HashMap operations and when do they degrade?

Operation           Best Case  Average  Worst (Java 7)               Worst (Java 8+)
put()               O(1)       O(1)     O(n) – all keys same bucket  O(log n) – TreeBin
get()               O(1)       O(1)     O(n)                         O(log n)
remove()            O(1)       O(1)     O(n)                         O(log n)
resize()            O(n)       O(n)     O(n)                         O(n)
containsKey()       O(1)       O(1)     O(n)                         O(log n)
keySet() iteration  O(n + buckets) in all cases
Key Insight: The O(n + buckets) iteration cost is why iterating a HashMap with capacity = 1,000,000 but size = 10 visits 1,000,010 slots. Use new HashMap<>(16) when you know the map will stay small.
Q24. What does computeIfAbsent() do internally and why is it preferred over get/put?

// Old pattern – two lookups (get + put):
List<Integer> list = map.get(key);
if (list == null) {
    list = new ArrayList<>();
    map.put(key, list);   // second hash computation + bucket lookup!
}
list.add(value);

// Preferred – single lookup:
map.computeIfAbsent(key, k -> new ArrayList<>()).add(value);

// computeIfAbsent() internals:
// 1. Compute hash once
// 2. Find bucket
// 3. If absent: call mappingFunction, insert result, return it
// 4. If present: return existing value
// → Only ONE bucket lookup, ONE hash computation

• Also useful: merge(), compute(), getOrDefault(), putIfAbsent(), replaceAll().

• These atomic operations reduce the chance of race conditions in single-threaded code and simplify logic.

Interview Tip: In an interview, mention the compute* methods as proof you write idiomatic Java 8+ code. They also matter for ConcurrentHashMap, where they provide atomic compound operations.
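The classic use case for computeIfAbsent() is a multimap (grouping values into lists). A runnable sketch (MultimapDemo is our own class name):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MultimapDemo {
    public static void main(String[] args) {
        Map<String, List<Integer>> byParity = new HashMap<>();
        for (int i = 1; i <= 5; i++) {
            String key = (i % 2 == 0) ? "even" : "odd";
            // One hash computation + one bucket lookup per insert:
            byParity.computeIfAbsent(key, k -> new ArrayList<>()).add(i);
        }
        System.out.println(byParity.get("odd"));  // [1, 3, 5]
        System.out.println(byParity.get("even")); // [2, 4]
    }
}
```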

Q25. What is the difference between putIfAbsent() and computeIfAbsent()?

putIfAbsent(key, value): the value is already computed before the call. Even if the key exists, you've already paid the cost of creating the value object. Returns the old value if the key exists.

computeIfAbsent(key, fn): the mapping function is only called if the key is absent. Lazy – no wasted object creation. Returns the mapped value (new or existing).

// BAD: creates a new ArrayList even if the key exists
map.putIfAbsent("key", new ArrayList<>());

// GOOD: ArrayList created only if "key" is absent
map.computeIfAbsent("key", k -> new ArrayList<>());

Key Insight: This distinction matters greatly when value creation is expensive (e.g., loading from DB, network call). computeIfAbsent() is almost always preferred.
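The laziness difference can be made visible by counting how often the value is actually constructed. A sketch (LazyValueDemo and the counter are our own names):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;

public class LazyValueDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        map.put("key", "existing");

        AtomicInteger constructions = new AtomicInteger();

        // putIfAbsent: the argument is evaluated BEFORE the call – cost paid regardless
        String eagerValue = "value-" + constructions.incrementAndGet();
        map.putIfAbsent("key", eagerValue);

        // computeIfAbsent: the function never runs because "key" is present
        map.computeIfAbsent("key", k -> "value-" + constructions.incrementAndGet());

        System.out.println(constructions.get()); // 1 – only the eager path paid
        System.out.println(map.get("key"));      // existing
    }
}
```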

§6 ConcurrentHashMap – Thread-Safe Internals

Q26. How does ConcurrentHashMap achieve thread safety without locking the whole map?

ConcurrentHashMap (CHM) in Java 8 uses a completely different approach from Java 7's segment locking:

Java 7: 16 Segments (each a ReentrantLock). Only 16 threads could write concurrently (by default).

Java 8+: node-level CAS + synchronized on individual bucket heads. Concurrency scales with the number of non-empty buckets (potentially millions of concurrent writers).

// Java 8 ConcurrentHashMap put() – simplified:
final V putVal(K key, V value, boolean onlyIfAbsent) {
    // 1. Compute hash (same spread idea as HashMap)
    int hash = spread(key.hashCode());
    for (Node<K,V>[] tab = table;;) {
        // 2. If bucket is EMPTY: use CAS (lock-free insert)
        if (tabAt(tab, i) == null) {
            if (casTabAt(tab, i, null, newNode))
                break;   // atomic compare-and-set succeeded
            // CAS failed? another thread inserted → retry loop
        }
        // 3. If RESIZE is in progress: help with the transfer
        else if ((fh = f.hash) == MOVED)
            tab = helpTransfer(tab, f);
        // 4. If bucket NON-EMPTY: lock only this bucket head
        else {
            synchronized (f) {   // f = first node in bucket
                // ... insert into linked list or tree ...
            }
        }
    }
}

Key Insight: CAS for empty buckets means zero contention when different threads write to different buckets. synchronized is taken only when two threads hit the same non-empty bucket.
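A quick way to see this in practice: have several threads write disjoint key ranges concurrently and confirm no updates are lost. A sketch (ChmDemo, the thread count, and key ranges are illustrative choices):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class ChmDemo {
    public static void main(String[] args) throws InterruptedException {
        ConcurrentHashMap<Integer, Integer> map = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        for (int t = 0; t < 8; t++) {
            final int offset = t * 10_000;
            pool.submit(() -> {
                // Distinct keys per thread → mostly CAS inserts, little bucket-lock contention
                for (int i = 0; i < 10_000; i++) map.put(offset + i, i);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        System.out.println(map.size()); // 80000 – no lost updates, no external locking
    }
}
```

The same experiment with a plain HashMap is undefined behaviour: lost entries, corrupted chains, or (in Java 7) an infinite loop.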

Q27. How does ConcurrentHashMap count its size accurately?

CHM uses a LongAdder-style striped counter instead of a single AtomicLong. This eliminates CAS contention on the counter under high write concurrency:

baseCount: main count field, CAS'd when there is no contention.

counterCells[]: array of counter cells (like LongAdder). Each thread CASes a different cell.

size(): sums baseCount + all counterCells. Not guaranteed to be a perfectly accurate snapshot, because other threads may be inserting concurrently.

mappingCount(): returns a long (size() returns an int, capped at Integer.MAX_VALUE).

■■ Watch Out: ConcurrentHashMap.size() is NOT an atomic snapshot. By the time it returns, the map may have changed. Use it for monitoring, not control flow.

Q28. What are the happens-before guarantees in ConcurrentHashMap?

CHM provides strong visibility guarantees by using volatile reads/writes on bucket entries:

• Writes to bucket heads use Unsafe.putObjectVolatile() (effectively a volatile write).
• Reads from bucket heads use Unsafe.getObjectVolatile() (a volatile read).
• Therefore a put() by thread A happens-before a subsequent get() by thread B for the same key – the java.util.concurrent memory-consistency contract guarantees this.
• compute*, merge(), and forEach() are atomic at the entry level but NOT across multiple keys – for multi-key atomicity, use external synchronisation.

Key Insight: This is why CHM is safe without additional volatile declarations or synchronized blocks for standard get/put. The internal volatile semantics handle it.

Q29. Compare HashMap vs ConcurrentHashMap vs Hashtable vs synchronizedMap.

Feature          HashMap    ConcurrentHashMap  Hashtable     Collections.synchronizedMap
Thread-safe      No         Yes                Yes           Yes
Null key         1 allowed  No                 No            Depends on delegate
Null value       Yes        No                 No            Depends on delegate
Locking          None       CAS + bucket lock  Full lock     Full lock on wrapper
Iteration        Fail-fast  Weakly consistent  Fail-fast     Fail-fast (manual sync)
Java version     1.2        1.5                1.0 (legacy)  1.2
Performance      Best (ST)  Best (MT)          Poor          Poor
size() accuracy  Exact      Approximate        Exact         Exact (locked)
■■ Watch Out: Never use Hashtable or Collections.synchronizedMap() in new code. ConcurrentHashMap is the correct choice for thread-safe maps.
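The 'Iteration' row of the table is worth demonstrating: CHM's weakly consistent iterators tolerate concurrent modification, while HashMap's fail-fast iterators throw. A sketch (IterationContrastDemo is our own class name):

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class IterationContrastDemo {
    public static void main(String[] args) {
        Map<Integer, Integer> chm = new ConcurrentHashMap<>();
        for (int i = 0; i < 10; i++) chm.put(i, i);
        for (Integer k : chm.keySet()) chm.remove(k);   // weakly consistent: no exception
        System.out.println("CHM after removal: " + chm.size()); // 0

        Map<Integer, Integer> hm = new HashMap<>();
        for (int i = 0; i < 10; i++) hm.put(i, i);
        try {
            for (Integer k : hm.keySet()) hm.remove(k); // fail-fast: modCount check fires
        } catch (ConcurrentModificationException e) {
            System.out.println("HashMap: ConcurrentModificationException");
        }
    }
}
```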
Q30. How does ConcurrentHashMap resize (transfer) work?

CHM resize is cooperative/concurrent – multiple threads can help with the transfer:

• When resize is triggered, a new table is allocated and
ForwardingNode sentinels are placed in processed

buckets of the old table.

• Any thread that encounters a ForwardingNode during put/get/remove calls helpTransfer() and assists with

moving remaining buckets.

• Bucket migration: each bucket is locked (synchronized on old bucket head), split into lo/hi lists (same logic as

HashMap), and inserted into new table.

• get() during resize: if it hits a ForwardingNode, it searches the new table directly.

• Resize concurrency controlled by sizeCtl field (negative = resizing, positive = threshold).

Key Insight: This cooperative resize is why CHM remains usable during resize – reads are never blocked, and writers help the resize complete faster.

§7 Map Variants, Real-World Gotchas & Patterns

Q31. How does LinkedHashMap maintain insertion/access order?

LinkedHashMap extends HashMap and adds a doubly-linked list through all entries in insertion or access order:

// LinkedHashMap.Entry adds two extra pointers:
static class Entry<K,V> extends HashMap.Node<K,V> {
    Entry<K,V> before, after;   // doubly-linked list for iteration order
}

// Insertion order (default):
Map<String, Integer> lhm = new LinkedHashMap<>();
lhm.put("C", 3); lhm.put("A", 1); lhm.put("B", 2);
lhm.keySet();   // [C, A, B] – insertion order preserved

// Access order (LRU cache pattern):
Map<String, Integer> lru = new LinkedHashMap<>(16, 0.75f, true) {
    protected boolean removeEldestEntry(Map.Entry<String, Integer> eldest) {
        return size() > MAX_ENTRIES;   // evict LRU entry
    }
};

Key Insight: LinkedHashMap with accessOrder=true is the canonical Java LRU cache. For a thread-safe LRU, wrap it with Collections.synchronizedMap() or use Caffeine/Guava Cache.
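The LRU pattern above runs end to end in a few lines. A runnable sketch (LruCacheDemo and lruCache are our own names; the mechanism is exactly the accessOrder constructor plus removeEldestEntry shown above):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LruCacheDemo {
    static <K, V> Map<K, V> lruCache(int maxEntries) {
        // accessOrder=true moves every accessed entry to the tail of the internal list
        return new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > maxEntries;   // evict the head (least recently used)
            }
        };
    }

    public static void main(String[] args) {
        Map<String, Integer> cache = lruCache(2);
        cache.put("a", 1);
        cache.put("b", 2);
        cache.get("a");             // touch "a" – "b" is now least recently used
        cache.put("c", 3);          // exceeds capacity → evicts "b"
        System.out.println(cache.keySet()); // [a, c]
    }
}
```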

Q32. How does WeakHashMap work and what is it used for?

WeakHashMap stores keys as WeakReferences. When a key has no other strong references, the GC can collect it. The entry is then eligible for removal from the map.

Map<MyObject, String> cache = new WeakHashMap<>();
MyObject key = new MyObject();
cache.put(key, "data");

key = null;     // drop the strong reference
System.gc();    // hint to GC
// After GC, the cache entry may be removed automatically

// How it works internally:
// WeakHashMap uses a ReferenceQueue (java.lang.ref).
// When a key is GC'd, its WeakReference is enqueued.
// expungeStaleEntries() (called on every get/put/size) drains the queue
// and removes the stale entries.

• Use case: caches where entries should not prevent GC of their keys.
• ClassLoader → Class metadata caches (internal JVM use).
• WARNING: keys must be collectible objects. Interned or cached objects (String literals, small autoboxed Integer values) always have strong references, so they never get collected.

■■ Watch Out: Do not use WeakHashMap with String literals or autoboxed Integer values as keys. These are always strongly referenced by the JVM, so entries will NEVER be evicted.

Q33. What is EnumMap and why is it faster than HashMap for enum keys?

enum Day { MON, TUE, WED, THU, FRI, SAT, SUN }

// EnumMap: array-backed, not hash-backed
Map<Day, String> schedule = new EnumMap<>(Day.class);
schedule.put(Day.MON, "Standup");

// Internally:
// Object[] vals = new Object[Day.values().length];
// vals[Day.MON.ordinal()] = "Standup";
// get(Day.MON) -> vals[0] (direct array access, O(1), no hashing)

• No hashing, no collisions, no null-bucket scanning – pure array indexing.

• Iteration is always in enum declaration order.

• Cannot have null keys.

• Very memory-efficient – the backing array has exactly 7 slots for the Day enum.

• Typically 10–20% faster than HashMap for small enum key sets.

Interview Tip: Always use EnumMap when keys are enums. It's a quick win in banking/fintech apps where you map status codes, transaction types, or day-of-week to handlers.
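The enum-to-handler pattern from the tip can be sketched like this. `TxnType` and the handler lambdas are hypothetical, chosen only to illustrate the lookup:

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.UnaryOperator;

public class EnumMapDemo {
    // Hypothetical transaction types for the example
    enum TxnType { DEBIT, CREDIT, REFUND }

    // Array-backed dispatch table: lookup is vals[key.ordinal()], no hashing involved
    static Map<TxnType, UnaryOperator<Long>> handlers() {
        Map<TxnType, UnaryOperator<Long>> h = new EnumMap<>(TxnType.class);
        h.put(TxnType.DEBIT,  bal -> bal - 100);
        h.put(TxnType.CREDIT, bal -> bal + 100);
        h.put(TxnType.REFUND, bal -> bal + 100);
        return h;
    }

    public static void main(String[] args) {
        long balance = handlers().get(TxnType.DEBIT).apply(500L);
        System.out.println(balance); // 400

        // Iteration follows enum declaration order, not insertion order
        System.out.println(handlers().keySet()); // [DEBIT, CREDIT, REFUND]
    }
}
```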

Q34. What are common HashMap performance pitfalls in production?

• Poor hashCode() distribution: all keys hash to the same few buckets → O(n) operations. Test with large data sets and monitor bucket distribution.

• Missing initial capacity: causes multiple resize operations during startup. Profile with -XX:+PrintGCDetails or JFR to spot allocation spikes.

• Mutable keys: changing a key's state after insertion makes the entry unreachable. Use immutable key classes (String, UUID, Long, record types).

• equals()/hashCode() inconsistency: violating the contract causes silent lookup failures.

• Large maps not pre-sized: a 10M-entry map starting at capacity 16 will resize ≈ log2(10M/12) ≈ 20 times, each O(n). Always pre-size.

• Iterating while modifying: throws ConcurrentModificationException. Use iterator.remove() or collect the keys to remove in a separate list.

• containsValue() is always O(n) – it scans the entire table. Cache a reverse mapping if needed.

// Safe removal during iteration:
Iterator<Map.Entry<K, V>> it = map.entrySet().iterator();
while (it.hasNext()) {
    Map.Entry<K, V> entry = it.next();
    if (shouldRemove(entry)) it.remove(); // safe!
}

// Java 8+ alternative:
map.entrySet().removeIf(e -> shouldRemove(e));
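The mutable-key pitfall from the list above is easy to reproduce. `MutableKey` is a deliberately bad, hypothetical key class whose hashCode depends on mutable state:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

public class MutableKeyDemo {
    // Anti-pattern: hashCode() depends on a mutable field
    static final class MutableKey {
        int id;
        MutableKey(int id) { this.id = id; }
        @Override public boolean equals(Object o) {
            return (o instanceof MutableKey) && ((MutableKey) o).id == id;
        }
        @Override public int hashCode() { return Objects.hash(id); }
    }

    public static void main(String[] args) {
        Map<MutableKey, String> map = new HashMap<>();
        MutableKey key = new MutableKey(1);
        map.put(key, "value");

        key.id = 2; // hash changes -> the stored entry now sits in the "wrong" bucket

        System.out.println(map.get(key));               // null – lookup probes the new hash's bucket
        System.out.println(map.get(new MutableKey(1))); // null – right bucket, but equals() fails
        System.out.println(map.size());                 // 1 – the entry is stranded, a memory leak
    }
}
```

The entry is unreachable by either the old or the new key state, yet still occupies memory, which is exactly why immutable key classes are recommended.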

Q35. How would you implement a thread-safe cache with expiry using Java built-ins?

For production, use Caffeine or Guava Cache. But the interview may ask you to implement one:

// Simple TTL cache using ConcurrentHashMap:
public class TtlCache<K, V> {
    private record CacheEntry<V>(V value, long expiry) {}

    private final ConcurrentHashMap<K, CacheEntry<V>> store = new ConcurrentHashMap<>();
    private final long ttlMs;

    public TtlCache(long ttlMs) { this.ttlMs = ttlMs; }

    public void put(K key, V value) {
        store.put(key, new CacheEntry<>(value, System.currentTimeMillis() + ttlMs));
    }

    public Optional<V> get(K key) {
        CacheEntry<V> e = store.get(key);
        if (e == null) return Optional.empty();
        if (System.currentTimeMillis() > e.expiry()) {
            store.remove(key, e); // atomic remove (only if still this entry)
            return Optional.empty();
        }
        return Optional.of(e.value());
    }

    // Background cleanup: ScheduledExecutorService to purge expired entries
}

Key Insight: In production, prefer Caffeine:

Cache<K, V> cache = Caffeine.newBuilder()
        .expireAfterWrite(10, TimeUnit.MINUTES)
        .maximumSize(10_000)
        .build();

It uses a Window TinyLFU eviction policy that outperforms LRU.
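A condensed, runnable version of the lazy-expiry idea, using a short 100 ms TTL so it can be exercised directly. The `TtlDemo` class and its static API are demo scaffolding, not the production shape:

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

public class TtlDemo {
    // Record equality makes the conditional remove(k, e) below exact
    record Entry<V>(V value, long expiry) {}

    static final ConcurrentHashMap<String, Entry<String>> store = new ConcurrentHashMap<>();
    static final long TTL_MS = 100;

    static void put(String k, String v) {
        store.put(k, new Entry<>(v, System.currentTimeMillis() + TTL_MS));
    }

    static Optional<String> get(String k) {
        Entry<String> e = store.get(k);
        if (e == null) return Optional.empty();
        if (System.currentTimeMillis() > e.expiry()) {
            store.remove(k, e); // removes only if the mapping is still this exact entry
            return Optional.empty();
        }
        return Optional.of(e.value());
    }

    public static void main(String[] args) throws InterruptedException {
        put("k", "v");
        System.out.println(get("k")); // Optional[v] – within TTL
        Thread.sleep(150);            // wait past expiry
        System.out.println(get("k")); // Optional.empty – lazily expired on read
    }
}
```

The two-argument remove(key, entry) matters under concurrency: if another thread has already re-put the key with a fresh entry, the stale reader won't clobber it.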

HashMap Quick-Reference Cheat Sheet

Default capacity / load factor 16 / 0.75 → threshold = 12
Bucket index formula (n - 1) & hash(key) [where n = power of 2]
hash() function h ^ (h >>> 16) – XOR high bits into low bits
Collision handling (Java 8) Linked list (0–7 nodes) → Red-Black TreeBin (8+)
TREEIFY_THRESHOLD 8 nodes in bucket AND table.length >= 64
UNTREEIFY_THRESHOLD 6 nodes (hysteresis – prevents thrashing)
MIN_TREEIFY_CAPACITY 64 – below this, resize instead of treeify
Resize trigger size > capacity * loadFactor → double capacity (O(n))
Resize bucket split Bit-check: entry stays OR moves to index + oldCapacity
Node fields hash (cached), key, value, next
Null key handling hash(null) = 0 → always stored in bucket[0]
Pre-size formula new HashMap<>((int)(expected / 0.75) + 1)
put() steps lazy init → compute index → empty/tree/list insert → value update → resize check
get() complexity O(1) avg, O(log n) worst (TreeBin), O(n) worst (Java 7)
ConcurrentHashMap locking CAS for empty buckets; synchronized on bucket head (non-empty)
CHM size() Approximate (LongAdder-style striped counter)
CHM null keys/values NOT allowed (NullPointerException)
LinkedHashMap Insertion or access order; uses doubly-linked list through entries
WeakHashMap Weak keys – GC'd when no strong refs; uses ReferenceQueue
EnumMap Array-backed, ordinal index; fastest for enum keys
Java 8 change Tail insertion (fixes Java 7 resize infinite loop bug)
Fail-fast iterators modCount check on every next(); ConcurrentModificationException
60-second answer: Key interview soundbite: 'HashMap uses an array of buckets; each bucket is a linked list or red-black tree. Bucket index = (n-1) & (h ^ h>>>16). Resize doubles capacity when size > 0.75 * capacity, moving each entry by one bit-check. Java 8 treeifies buckets with 8+ entries (table >= 64) to bound the worst case to O(log n). ConcurrentHashMap uses CAS for empty buckets and synchronized on individual bucket heads.'
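The index and spreading formulas in the cheat sheet can be verified in a few lines of plain Java (the `HashMath` wrapper class is just scaffolding for the demo):

```java
public class HashMath {
    // Java 8 HashMap.hash(): XOR the high 16 bits into the low 16
    static int hash(Object key) {
        int h;
        return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
    }

    public static void main(String[] args) {
        int n = 16;                 // default capacity – always a power of 2
        int h = hash("hello");
        int index = (n - 1) & h;    // bucket index; equals h mod n when n is a power of 2
        System.out.println(index == Math.floorMod(h, n)); // true

        // Resize split: after doubling to 2n, an entry either stays at `index`
        // or moves to `index + n`, decided by one new bit of the hash
        int newIndex = (2 * n - 1) & h;
        System.out.println(newIndex == index || newIndex == index + n); // true
    }
}
```

This is why capacities are kept as powers of two: the mask (n-1) replaces an expensive modulo, and doubling exposes exactly one new hash bit per resize.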

Ashok Koritala · HashMap Internals Interview Guide · Java 8–21 · Confidential
