What is Caching?
Let's say you had a bag with a lot of items in it, like some books and a laptop. Each time you needed something, you had to walk all the way to your bag, pick the item up, and put it back once you were done with it. If you needed that item again, you would have to go all the way back to your bag and pick it up again. Obviously, this would get annoying at some point. What if you could keep each item you used on a nearby table when you were done with it, so that if you needed it again, you could just pick it up from the table instead of going all the way to your bag? That's caching right there!
Caching is a mechanism for improving software performance by keeping frequently accessed data somewhere it can be retrieved quickly, so that future requests for that same data are served faster.
But we're still storing the same data somewhere, so how exactly does that make things faster? The frequently accessed data is kept in a store called a cache, and retrieving an item from a cache is typically much faster than retrieving it from a traditional database. Why? Because a cache usually keeps its items in memory (RAM) rather than on disk, and memory is much faster to read than disk.
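To make the idea concrete, here is a minimal sketch of an in-memory cache sitting in front of a slow data source. The names slowLookup and getItem are made up for this illustration, and the "database" is just a function that counts how often it is called.

```javascript
// Stand-in for a slow data source such as a database (hypothetical).
let slowLookups = 0;
function slowLookup(id) {
  slowLookups += 1; // count how often we go to the "database"
  return { id, name: `Item ${id}` };
}

// The cache: a plain Map that lives in the process's memory.
const itemCache = new Map();

function getItem(id) {
  if (itemCache.has(id)) {
    return itemCache.get(id); // served from memory, no slow lookup
  }
  const item = slowLookup(id);
  itemCache.set(id, item); // remember it for next time
  return item;
}

getItem(1); // first request: goes to the slow source
getItem(1); // second request: served from the cache
console.log(slowLookups); // 1
```

The second call never touches the slow source, which is exactly the table-next-to-you effect from the bag analogy.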
Why should we cache data?
As explained earlier, we cache data to improve the performance of our application. Reading cached data takes less time than going back to the original source, and it can reduce the load on your primary database.
When should we cache data?
Like every other tool and pattern, caching has situations where it shines. It works best when you have data that is read often but doesn't change frequently. If your data changes very frequently, caching may not be the best way to improve performance, because the data in the cache will constantly be stale (outdated).
Caching Terminology
Cache Hit: When data is requested from a cache and the data is present in the cache, it's called a cache hit.
Cache Miss: When data is requested from a cache and the data is not present in the cache, it's called a cache miss.
Stale Data: When a piece of data in the cache is outdated, the data is referred to as stale.
Invalidation: This is the process of declaring a particular piece of data as stale (and possibly re-fetching it).
Least Recently Used (LRU): This is the data in the cache that has gone the longest without being requested. In an LRU eviction policy, it is the first item removed when the cache is full.
Time-To-Live (TTL): This is the amount of time a piece of data can live in a cache.
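The terms above can be seen working together in a small sketch. This is a toy cache written for this article, not code from any library: TinyCache, maxEntries, ttlMs and the fake clock are all illustrative names, and a real cache would handle more edge cases.

```javascript
// A tiny in-memory cache with a size limit (LRU eviction) and a TTL.
class TinyCache {
  constructor({ maxEntries, ttlMs, now = Date.now }) {
    this.maxEntries = maxEntries;
    this.ttlMs = ttlMs;
    this.now = now;           // injectable clock, handy for demos
    this.entries = new Map(); // a Map keeps insertion order: oldest first
    this.hits = 0;
    this.misses = 0;
  }

  get(key) {
    const entry = this.entries.get(key);
    if (!entry || entry.expiresAt <= this.now()) {
      this.entries.delete(key); // drop expired data, if any
      this.misses += 1;         // cache miss
      return undefined;
    }
    // Re-insert so this key becomes the most recently used.
    this.entries.delete(key);
    this.entries.set(key, entry);
    this.hits += 1;             // cache hit
    return entry.value;
  }

  set(key, value) {
    this.entries.delete(key);
    if (this.entries.size >= this.maxEntries) {
      // Evict the least recently used key (first in insertion order).
      const lruKey = this.entries.keys().next().value;
      this.entries.delete(lruKey);
    }
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }

  // Invalidation: declare a key stale by removing it.
  invalidate(key) {
    this.entries.delete(key);
  }
}

let fakeTime = 0; // fake clock in milliseconds
const demoCache = new TinyCache({ maxEntries: 2, ttlMs: 1000, now: () => fakeTime });

demoCache.set('a', 1);
demoCache.set('b', 2);
demoCache.get('a');    // hit; 'a' is now the most recently used
demoCache.set('c', 3); // cache is full, so 'b' (least recently used) is evicted
demoCache.get('b');    // miss: it was evicted
fakeTime = 1500;       // move the clock past the TTL
demoCache.get('a');    // miss: the entry outlived its TTL
console.log(demoCache.hits, demoCache.misses); // 1 2
```

Using a Map for ordering keeps the sketch short; it is one simple way to get LRU behaviour, not the only one.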
How to implement caching
On the Frontend (Client-side)
We can implement caching on the client side by using the @tanstack/react-query library as a wrapper around our API calls. It comes with a built-in cache.
In this example, the cached data is identified by a key, which is 'users' here. When the component asks for the data, the library first checks whether data with that key is already in the cache. If it's not, it makes the request to the server and automatically caches the result.
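import { useQuery } from '@tanstack/react-query';

// Fetch the list of users; throwing lets the library report the error state.
const fetchUsers = async () => {
  const response = await fetch('https://api.example.com/users');
  if (!response.ok) {
    throw new Error('Network response was not ok');
  }
  return response.json();
};

// Assumes the app is wrapped in a QueryClientProvider.
const UsersComponent = () => {
  // ['users'] is the key this query's result is cached under.
  const { isLoading, isError, data, error } = useQuery({
    queryKey: ['users'],
    queryFn: fetchUsers,
  });
  // Users component implementation
};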
TanStack Query lets you customize your cache settings, such as how long data stays fresh before it is declared stale, and it gives you ways to invalidate cached data. You can find them all in the official documentation: https://tanstack.com/query/latest/docs/framework/react/overview
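As a rough sketch of those two settings, the options object below sets a staleTime, and the comments show where invalidation would happen. The five-minute value is an arbitrary choice for illustration, and the hook calls are left as comments so the snippet runs without React installed.

```javascript
// Same fetcher shape as in the example above.
const fetchUsers = async () => {
  const response = await fetch('https://api.example.com/users');
  if (!response.ok) {
    throw new Error('Network response was not ok');
  }
  return response.json();
};

// Options that would be passed to useQuery(...).
const usersQueryOptions = {
  queryKey: ['users'],
  queryFn: fetchUsers,
  // Treat cached users as fresh for 5 minutes (arbitrary value);
  // after that the data is considered stale and may be refetched.
  staleTime: 5 * 60 * 1000,
};

// Inside a component (requires @tanstack/react-query, not imported here):
//   const { data } = useQuery(usersQueryOptions);
//
// After something changes the users on the server, mark the cached
// data as stale so it gets refetched:
//   const queryClient = useQueryClient();
//   queryClient.invalidateQueries({ queryKey: ['users'] });
```

A longer staleTime means fewer network requests but a higher chance of showing outdated data, so the right value depends on how often your data changes.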
On the Backend (Server-side)
On the server side of things, caching is mostly used in front of calls to the database. We can implement it by using an in-memory data store called Redis.
const express = require('express');
const redis = require('redis');
const { User } = require('./models');

const app = express();
const client = redis.createClient();
client.on('error', (err) => console.error('Redis client error', err));

// Helper function to check the cache and fall back to fetching the data
async function getData(key, fetchData) {
  const cachedData = await client.get(key);
  // If the data is in the cache (cache hit), return it
  if (cachedData) {
    return JSON.parse(cachedData);
  }
  // Otherwise (cache miss), fetch fresh data and store it in the cache
  const freshData = await fetchData();
  await client.set(key, JSON.stringify(freshData));
  return freshData;
}

// Function to fetch data from the database using Sequelize
async function getDatabaseData() {
  return User.findAll();
}

// API endpoint with cache
app.get('/api/data', async (req, res) => {
  try {
    const data = await getData('database-data-key', getDatabaseData);
    return res.json(data);
  } catch (error) {
    console.error(error);
    return res.status(500).send('Internal Server Error');
  }
});

// node-redis v4+ needs an explicit connect() before commands are sent.
// Port 3000 is an arbitrary choice for this example.
client.connect()
  .then(() => app.listen(3000))
  .catch(console.error);
In this example, we're using Redis with Node.js to cache the results of queries to our SQL database (it can be any database of your choice). When a request is made to that endpoint, the handler checks the cache first. If the data is in the cache, it is returned from there; otherwise, the handler queries the database and then stores the result in the cache. This flow is commonly called cache-aside, and it is not the only option, as there are different caching strategies.
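One thing the flow above leaves open is how cached data ever gets refreshed. A common approach, sketched here under some assumptions, is to give each entry a TTL and to delete the key whenever the underlying data changes. The snippet uses a small in-memory stand-in instead of a real Redis connection so it can run anywhere; the stand-in, the key name, the 60-second TTL and the db object are all hypothetical. The calls it mimics (get, set with an EX option, and del) follow the shape of the node-redis v4 client.

```javascript
// In-memory stand-in so the sketch runs without a Redis server.
// It mimics only the three calls used below: get, set with { EX }, and del.
function createFakeRedis(now = Date.now) {
  const store = new Map();
  return {
    async get(key) {
      const entry = store.get(key);
      if (!entry || entry.expiresAt <= now()) {
        store.delete(key);
        return null; // a missing key resolves to null
      }
      return entry.value;
    },
    async set(key, value, { EX } = {}) {
      const expiresAt = EX ? now() + EX * 1000 : Infinity;
      store.set(key, { value, expiresAt });
      return 'OK';
    },
    async del(key) {
      return store.delete(key) ? 1 : 0;
    },
  };
}

const USERS_KEY = 'users:all'; // hypothetical cache key
const USERS_TTL_SECONDS = 60;  // arbitrary TTL for the example

// Read path: check the cache, fall back to the database, cache with a TTL.
async function getUsers(client, db) {
  const cached = await client.get(USERS_KEY);
  if (cached !== null) return JSON.parse(cached);
  const users = await db.findAllUsers();
  await client.set(USERS_KEY, JSON.stringify(users), { EX: USERS_TTL_SECONDS });
  return users;
}

// Write path: update the database, then invalidate the cached list.
async function addUser(client, db, user) {
  await db.insertUser(user);
  await client.del(USERS_KEY);
}

async function demo() {
  const client = createFakeRedis();
  // Hypothetical database wrapper that counts reads.
  const db = {
    users: [{ id: 1, name: 'Ada' }],
    reads: 0,
    async findAllUsers() { this.reads += 1; return [...this.users]; },
    async insertUser(user) { this.users.push(user); },
  };
  await getUsers(client, db); // miss: reads the database
  await getUsers(client, db); // hit: served from the cache
  await addUser(client, db, { id: 2, name: 'Grace' }); // invalidates the key
  const users = await getUsers(client, db); // miss again: fresh data
  return { reads: db.reads, count: users.length };
}

demo().then(console.log); // { reads: 2, count: 2 }
```

With a real client you would pass the connected Redis client in place of the stand-in. The TTL acts as a safety net in case an invalidation is ever missed.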
I like to keep things short, so that's it for today. If you liked this, be sure to leave a like or a comment, connect with me on LinkedIn at https://linkedin.com/in/fortunealebiosu, and stay tuned for the next part of this series.