October 12, 2022
Improving Query Performance with Indexes using Prisma: Hash Indexes
One strategy for improving performance for your database queries is using indexes. This article will dive into hash indexes: taking a look at the data structure used and improve the performance of an existing query with an index using Prisma.
Overview
- Introduction
- Hash tables: the data structure that powers hash indexes
- When to use a hash index
- Working with hash indexes using Prisma
- Summary and next steps
Introduction
In this part of the series, you will learn what hash indexes are, how they work, and when to use them, and then dive into a concrete example of how you can improve the performance of a query with a hash index using Prisma.
If you want to learn more about the fundamentals of database indexes, check out the first part.
Hash tables: the data structure that powers hash indexes
Hash indexes use the hash table data structure. Hash tables (also known as hash maps) are great data structures that allow fast data retrieval in almost constant time (O(1)
). This means the retrieval time of a record won't be affected by the size of the data being searched.
If you're unfamiliar with the concept of Big O notation, take a look at What is Big O notation.
PostgreSQL's hash index is composed of "buckets" or "slots" into which tuples are placed.
PostgreSQL uses a hash function to compute a hash key or hash code when storing a value to the index:
- Hash key: maps the value to a 32-bit integer.
- Hash code: maps to a bucket number in which the value will be stored.
Hash function: a function that maps data of arbitrary size to fixed-size values.
Hash code/ key: the output of a hash function.
When retrieving a record using a hash index, the database applies the hash function to the value to determine the bucket that might contain the value. After determining the bucket, the database will search through the tuple to find the records that match your query.
If you're interested in reading about PostgreSQL's implementation of the hash index, you can read further here.
When to use a hash index
Hash indexes would be a solid choice if you only intend to use the equality operator (=
) to query your data. For example, in the query in the example below, a hash index would be suitable.
SELECT firstName from 'User' where lastName = 'Wick';
If you intend to use range operators (<
, <=
,>
, >=
) when filtering your data, you can use a B-Tree index. You can refer to part 2 to learn more about B-tree indexes.
Hash indexes only work with the equality (=
) operator. This means that a hash index would be a solid choice if you're using the =
operator when querying your data.
While a hash index might be a good choice for speeding up your queries, it comes with some caveats. Some of the limitations are that they:
- Cannot be used to index multiple columns
- Cannot be used to create sorted indexes
- Cannot be used to enforce unique constraints
Working with hash indexes using Prisma
Assumed knowledge
To follow along, the following knowledge will be assumed:
- Some familiarity with JavaScript/TypeScript
- Some experience working with REST APIs
- A basic understanding of working with Git
Development environment
You will also be expected to have the following tools set up in your development environment:
- Node.js
- Git
- Docker or PostgresQL
- Prisma VS Code extension (optional): intellisense and syntax highlighting for Prisma
- REST Client VS Code extension (optional): sending HTTP requests on VS Code
Note: If you don't have Docker or PostgreSQL installed, you can set up a free database on Railway.
Clone the repository and install dependencies
Navigate to your directory of choice and clone the repository:
git clone -b hash-indexes git@github.com:prisma/prisma-indexes.gitCopy
Change the directory to the cloned repository and install dependencies:
cd prisma-indexesnpm installCopy
Next, rename the .env.example
file to .env
.
mv .env.example .envCopy
Project walkthrough
The sample project is a minimal REST API built with TypeScript and Fastify.
The project contains the following file structure:
prisma-indexes ├── .github/workflows │ │ └── test.yaml │ └── renovate.json ├── node_modules ├── prisma │ ├── migrations/ │ ├── schema.prisma │ └── seed.ts ├── src │ └── index.ts ├── README.md ├── .env ├── .gitignore ├── docker-compose.yml ├── package-lock.json ├── package.json ├── requests.http └── tsconfig.json
The notable files and directories for this project are:
- The
prisma
folder contains:- The
schema.prisma
file that defines the database schema - The
migrations
directory that contains the database migrations history - The
seed.ts
file that contains a script to seed your development database
- The
- The
src
directory:- The
index.ts
file defines a REST API using Fastify. It contains one endpoint called/users
and accepts one optional query parameter —firstName
- The
- The
docker-compose.yml
file defining the PostgreSQL database docker image - The
.env
file containing your database connection string
The application contains a single model in the Prisma schema called User
with the following fields:
// prisma/schema.prismamodel User { id Int @id @default(autoincrement()) firstName String lastName String email String}
The src/index.ts
file contains primitive logging middleware to measure the time taken by a Prisma query:
// src/index.tsprisma.$use(async (params, next) => { const before = Date.now() const result = await next(params) const after = Date.now()
logger.info(`Query took ${after - before}ms`) return result})
You can use the logged data to determine which Prisma queries are slow. You can use the logs to gauge queries that could require some performance improvements.
src/index.ts
also logs Prisma query
events and parameters to the terminal. The query
event and parameters contains the SQL query and parameters that Prisma executes against your database.
const prisma = new PrismaClient({ log: [{ emit: "event", level: "query", },],})
prisma.$on("query", async (e) => { logger.info(`Query: ${e.query}`) logger.info(`Params: ${e.params}`)});
The SQL queries (with filled-in parameters) can be copied and prefixed with EXPLAIN
to view the query plan the database will provide.
Create and seed the database
Start up the PostgreSQL database with docker:
docker-compose up -dCopy
Next, apply the existing database migration in prisma/migrations
:
npx prisma migrate devCopy
The above command will:
- Create a new database called
users-db
(inferred from the connection string defined in the.env
file) - Create a
User
table as defined by the model inprisma/schema.prisma
. - Trigger the seeding script defined in
package.json
. The seeding step is triggered because it's run against a new database.
The seed file in prisma/seed.ts
will populate the database with a million user records.
Start up the application server:
npm run devCopy
Make an API request
The cloned repository contains a requests.http
file that contains sample requests to http://localhost:3000/users
that can be used by the installed REST Client VS Code extension. The requests contain different firstName
query parameters.
Note: Ensure you've installed the REST Client VS Code extension for this step. You can also use other API testing tools such as Postman, Insomnia, or your preferred tool of choice.```
Click the Send Request button right above the request to make the request.
VS Code will open an editor tab on the right side of the window with the responses. You should also see some information logged on the terminal.
In the screenshot above, the query took 55 ms.
Improve query performance with a hash index
You can define a hash index in your Prisma schema file using the @@index()
attribute function and providing the following arguments:
fields
: a list of fields to be indexedtype
: the name of the index created in the database
The @@index
attribute supports more arguments you can learn more about in the Prisma Schema API Reference.
Next, specify the firstName
in the fields
argument and Hash
as the value of the type
argument.
model User { id Int @id @default(autoincrement()) firstName String lastName String email String
@@index(fields: [firstName], type: Hash)}
After making the change, create and run another migration to create the index on the firstName
field in the User
model:
npx prisma migrate dev --name add-firstName-indexCopy
--- prisma/migrations/[timestamp]_add_firstName_index/migration.sql-- CreateIndexCREATE INDEX "User_firstName_idx" ON "User" USING HASH ("firstName");
Next, navigate to the requests.http
file again and resend the requests to the /users
route.
You will notice an improvement in response times. In my case, in the screenshot below, the response times have been down to about 9 to 11ms.
Congratulations! 🎉
You've learned how to reduce your database queries' response times using a hash index.
Summary and next steps
In this part, you learned what hash indexes are, their internal structure and limitations, and how to define and use a hash index using Prisma.
If you would like learn about the fundamentals of database indexes and B-Tree indexes, refer to part 1 and part 2.
In the following article, you will learn about GIN indexes: what it is, their structure, how it works, and how you can utilize a GIN index in your application using Prisma.