Chapter 5 - Hash tables
Suppose you work at a grocery store. When a customer buys an item, you have to look up the price in a book.
- simple search: if book is unsorted, you have to look at every line to find the price (linear)
- binary search: if book is sorted, you could run binary search (log n)
- hash table: if you have a friend who has all the names and prices memorized (constant)
# of items |
Simple search |
Binary search |
Hash table |
100 |
100 |
7 |
1 |
1,000 |
1,000 |
10 |
1 |
10,000 |
10,000 |
14 |
1 |
Hash function
- A hash function is a function where you put in a string (a sequence of bytes) and you get back a number. It maps strings to numbers.
- Requirements for a hash function:
- It needs to be consistent. Each input should create the same output every time.
- It should map different words to different numbers. A hash function is no good if it always returns “1” for any word you put in. In the best case, every different word should map to a different number.
Hash table
- Hash table uses a hash function to figure out where to store elements. If different items result in the same hash, they will be stored in a linked list. Hash table is essentially a combination of hash function and arrays.
- Also known as hash maps, maps, dictionaries, and associative arrays.
- Hash table is good for:
- Modeling relationships from one thing to another
- Filtering out duplicates
- Caching/memorizing data instead of making your server do the work
Use cases
- Using hash tables for lookups
- A phone book where you map people’s names to phone numbers.
- DNS resolution that maps a web address to an IP address.
- Preventing duplicate entries
- Using hash tables as a cache
- Websites remember the data instead of recalculating it for certain URLs.
- All big websites use caching and that data is cached in a hash.
Collisions
| | Hash tables (average) | Hash tables (worst) | Arrays | Linked lists |
| —— | ——————— | ——————- | —— | ———— |
| Search | O(1) | O(n) | O(1) | O(n) |
| Insert | O(1) | O(n) | O(n) | O(1) |
| Delete | O(1) | O(n) | O(n) | O(1) |
- On average, hash tables are as fast as arrays at searching and as fast as linked lists at insert/delete.
- In the worst case, hash tables are slow at both search and insert/delete.
- A worst case hash table is when all keys map to one slot.
- To avoid worst case performance, we need to avoid collisions. It can be fixed in two ways:
- A low load factor
- A good hash function
Load factor
A good hash function
-
A good hash function distributes values in the array evenly.
![Untitled](img/Untitled%205.png)
-
A bad hash function groups values together and produces a lot of collisions.
![Untitled](img/Untitled%206.png)
-
SHA is a good hash function.
Recap
- A hash table is made by combining a hash function with an array.
- Collisions are bad. You need a hash function that minimizes collisions.
- Hash tables have really fast search, insert, and delete.
- hash tables are good for modeling relationships from one item to another.
- Once your load factor is greater than 0.7, it’s time to resize your hash table.
- Hash tables are used for caching data (e.g. with web server).
- Hash tables are great for catching duplicates.