Table of Contents
In this article, you’ll learn about what is Hashing in DBMS, Why do we need Hashing, Types of Hashing, What is Collision and more.
What is Hashing in DBMS?
In a huge data structure, It is next to impossible to search all the index values and reach to desired data, to overcome this problem, hashing is used.
Hashing in Database Management Systems (DBMS) is an efficient technique for locating desired data directly on the disk without the need for complex index structures. It serves as a method to index and retrieve items within a database by utilizing a shorter hashed key, which accelerates the search process compared to using the original values.
Data is organized and stored in the form of data blocks, and a hash function is applied to generate addresses for these blocks in the memory location. These memory locations are commonly referred to as data blocks or data buckets.
- A hash function can choose any of the column values to generate the address.
- Mostly, the hash function uses the primary key to generate the address of the data block.
- A hash function is a simple mathematical function to any complex mathematical function.
- The primary key can also be used as the address of the data block. That means each row whose address will be the same as a primary key stored in the data block.
Why do we need Hashing?
Hashing is a crucial technique employed in Database Management Systems (DBMS) to efficiently index and retrieve items from a large database. When dealing with extensive data structures, searching for index values across multiple levels to locate specific data blocks becomes a time-consuming task. Hashing offers a solution to this problem by using a shorter hashed key instead of the original value for faster data retrieval.
The primary purpose of hashing is to calculate the direct location of a data record on the disk without the need for complex index structures. It streamlines the search process, making it more efficient and practical for large databases. Additionally, hashing finds its application in implementing dictionaries, providing an effective way to store and retrieve key-value pairs.
Terminologies using in Hashing
To understand the terminologies associated with hashing, it’s essential to know the key components involved
Data bucket | Data buckets, also known as “Unit Of Storage,” are memory locations where records are stored. |
Key | In DBMS, a key is an attribute or a set of attributes that uniquely identify a row (tuple) in a relation (table). It establishes relationships between different tables. |
Hash function | The hash function maps a set of search keys to the address where actual records are stored. It plays a vital role in generating the hashed key. |
Linear Probing | Linear probing involves a fixed interval between probes. Instead of overwriting the older record, the next available data block is used to store the new record. |
Quadratic probing | This method determines the new bucket address by adding intervals between probes using consecutive outputs of a quadratic polynomial based on the original computation. |
Hash index | The hash index refers to the address of the data block. It can be a simple or complex mathematical function, depending on the requirements. |
Double Hashing | Double hashing is a technique used in hash tables to resolve collision issues. It helps in finding an alternative bucket location when a collision occurs. |
Bucket Overflow | Collision occurs when multiple keys hash to the same bucket, resulting in a bucket overflow. This situation can be detrimental to the efficiency of static hash functions. |
Types of Hashing
- Static Hashing
- Dynamic Hashing
Static Hashing
In the static hashing, the resultant data bucket address will always remain the same. That means if we generate an address for Machine_No =103 using the hash function mod (10) then it will always result in the same bucket address 3. Here, there will be no change in the bucket address.
Hence in this static hashing, the number of data buckets in memory remains constant throughout. In this example, we will have five data buckets in the memory used to store the data.
Operations of Static Hashing
- Inserting a record: When a new record requires to be inserted into the table, you can generate an address for the new record using its hash key. When the address is generated, the record is automatically stored in that location.
- Searching: When you need to retrieve the record, the same hash function should be helpful to retrieve the address of the bucket where data should be stored.
- Delete a record: Using the hash function, you can first fetch the record which is you want to delete. Then you can remove the records for that address in memory.
- Update a Record: To update a record, we will first search it using a hash function, and then the data record is updated.
Static hashing are of two types .They are as follows
- Open hashing
- Close hashing
Open Hashing
In Open hashing method, Instead of overwriting older one the next available data block is used to enter the new record, This method is also known as linear probing.
For example, N3 is a new record that you want to insert. The hash function generates the address as 5. But it is already occupied by some other value. That’s why the system looks for the next data bucket 6 and assigns N3 to it.
Close Hashing
When buckets are full, then a new data bucket is allocated for the same hash result and is linked after the previous one. This mechanism is known as Overflow chaining.
For example: Suppose N3 is a new address that needs to be inserted into the table, the hash function generates the address as 5 for it. But this bucket is full to store the new data. In this case, a new bucket is inserted at the end of 5 buckets and is linked to it.
Dynamic Hashing
Dynamic hashing offers a way in which data buckets are added and removed dynamically and when need. In this hashing, the hash function helps you to create a large number of values.
- The dynamic hashing mechanism is used to overcome the problems of static hashing like bucket overflow.
- In this mechanism, data buckets grow or shrink as the records increases or decrease. This method is also known as the Extendable hashing method.
- This method makes hashing dynamic, i.e., it allows insertion or deletion without resulting in poor performance.
What is Collision?
Hash collision is a state when the resultant hashes from two or more data in the data set, wrongly map the same place in the hash table.
How to deal with Hashing Collision?
There are two technique which you can use to avoid a hash collision:
- Rehashing: This method, invokes a secondary hash function, which is applied continuously until an empty slot is found, where a record should be placed.
- Chaining: This method builds a Linked list of items whose key hashes to the same value. This method requires an extra link field to each table position.