One option is to use the FlexSearch module for Node.js.
So let’s create a small Proof of Concept (POC) from scratch.
The full source code is here
Keep in mind that this is an in-memory implementation, so it won’t be possible to index a huge amount of data. You can run your own benchmarks based on your requirements.
Also, I strongly recommend you install a browser plugin that pretty-prints JSON. I use JSONView. Another option is to use Postman to make your HTTP requests.
mkdir myflexsearch
cd myflexsearch
express --no-view --git
You can delete boilerplate code such as the /public folder and routes/users.js. If you do, you will have to modify app.js because they are referenced there. Either way, that code doesn’t affect our Proof of Concept.
Let’s install the flexsearch module
npm install flexsearch --save
Optionally, you can install the nodemon module to automatically reload your app after every change. You can install it globally, but I will install it locally
npm install nodemon --save
After that, open package.json and modify the start script
"scripts": {
  "start": "nodemon ./bin/www"
}
Our main code will be in routes/index.js. It exposes a search endpoint like this:
/search?phrase=Cloud
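If you want to call this endpoint from a script instead of the browser, the phrase has to be URL-encoded. Here is a minimal sketch using the URL class built into Node.js. The helper name buildSearchUrl and the base address http://localhost:3000 are assumptions for illustration (3000 is the port used later in this post).

```javascript
// Build the search URL with the WHATWG URL API that ships with Node.js.
// buildSearchUrl is a hypothetical helper; the base address is an assumption.
function buildSearchUrl(phrase, base = 'http://localhost:3000') {
  const url = new URL('/search', base);
  // searchParams.set() takes care of encoding spaces and special characters
  url.searchParams.set('phrase', phrase);
  return url.toString();
}

console.log(buildSearchUrl('Cloud')); // http://localhost:3000/search?phrase=Cloud
console.log(buildSearchUrl('cloud & storage'));
```

On the server side, Express decodes the value again, so req.query.phrase contains the original text.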
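const FlexSearch = require("flexsearch");
// dummy data used to fill the index; path assumes the file lives at /daos/my_data.js
const wsData = require("../daos/my_data");
const preset = "score";
const searchIndex = new FlexSearch(preset);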
Setting preset = “score” selects one of FlexSearch’s predefined configuration profiles for the index. You can see more presets here. I recommend you play with different presets and compare the results.
We’ll need some dummy data to test.
I created a file /daos/my_data.js with some content from here: https://api.publicapis.org/entries
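To make the indexing code easier to follow, here is a sketch of the shape such a data file could have. The structure is inferred from the fields the code reads (data, id, API, Description, Category); the two records are placeholders, not real entries from the API list.

```javascript
// Hypothetical shape of daos/my_data.js, inferred from the fields the indexing code reads.
// The records below are placeholders for illustration only.
const wsData = {
  data: [
    { id: 1, API: 'Example Storage', Description: 'Placeholder cloud storage service', Category: 'Cloud Storage' },
    { id: 2, API: 'Example Weather', Description: 'Placeholder weather forecasts', Category: 'Weather' },
  ],
};

// Export the object so routes/index.js can load it with require()
module.exports = wsData;
```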
function buildIndex() {
  console.time('buildIndexTook');
  console.info('building index...');
  const { data } = wsData; // we could get our data from DB, remote web service, etc.
  for (let i = 0; i < data.length; i++) {
    // we might concatenate the fields we want for our content
    const content = `${data[i].API} ${data[i].Description} ${data[i].Category}`;
    const key = parseInt(data[i].id, 10);
    searchIndex.add(key, content);
  }
  console.info(`index built, length: ${searchIndex.length}`);
  console.info(' Open a browser at http://localhost:3000/');
  // console.timeEnd prints the time elapsed since the matching console.time call
  console.timeEnd('buildIndexTook');
}
Keep in mind that we are working with an in-memory search, so be careful with the amount of data you load into the index.
This method shouldn’t take more than a couple of seconds to run.
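If you want rough numbers for your own data, you can wrap the build step in a small measuring function. This is only a sketch: measureBuild is a hypothetical helper, and the Map filled in the example is a stand-in workload, not a FlexSearch index. Heap figures are approximate because the garbage collector can run at any time.

```javascript
// Measure elapsed time and heap growth around an arbitrary build function.
// measureBuild is a hypothetical helper, not part of FlexSearch or Express.
function measureBuild(buildFn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();
  const result = buildFn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaMb = (process.memoryUsage().heapUsed - heapBefore) / (1024 * 1024);
  return { result, elapsedMs, heapDeltaMb };
}

// Stand-in workload: fill a Map with 50,000 strings.
// In the real app you would pass a function that builds the search index instead.
const stats = measureBuild(() => {
  const fake = new Map();
  for (let i = 0; i < 50000; i++) {
    fake.set(i, `record number ${i}`);
  }
  return fake;
});

console.info(
  `built ${stats.result.size} entries in ${stats.elapsedMs.toFixed(1)} ms, ` +
  `heap delta ~${stats.heapDeltaMb.toFixed(1)} MB`
);
```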
In the buildIndex() method we get our data from a static file, but we could get it from a remote web service or a database. For each record we pass the index a key and then the content to be searched.
After that our index is ready to receive queries.
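To get an intuition for what adding a key and its content means, here is a toy inverted index in plain JavaScript. It is not how FlexSearch is implemented (there is no scoring, no partial matching, no suggestions). The class name ToyIndex and the sample texts are made up for illustration; the sketch only mirrors the add(key, content) and search(query) contract used in this post.

```javascript
// Toy inverted index: maps each lowercased word to the set of keys whose content contains it.
// This is an illustration only, NOT the FlexSearch implementation.
class ToyIndex {
  constructor() {
    this.postings = new Map(); // word -> Set of keys
  }

  // Split text into lowercase words, dropping empty tokens
  static tokenize(text) {
    return text.toLowerCase().split(/\W+/).filter(Boolean);
  }

  add(key, content) {
    for (const word of ToyIndex.tokenize(content)) {
      if (!this.postings.has(word)) {
        this.postings.set(word, new Set());
      }
      this.postings.get(word).add(key);
    }
  }

  // Returns the keys whose content contains every word of the query
  search(query) {
    const words = ToyIndex.tokenize(query);
    if (words.length === 0) return [];
    let keys = null;
    for (const word of words) {
      const matches = this.postings.get(word) || new Set();
      keys = keys === null
        ? new Set(matches)
        : new Set([...keys].filter((k) => matches.has(k)));
    }
    return [...keys];
  }
}

const toy = new ToyIndex();
toy.add(1, 'Example Storage Placeholder cloud storage service');
toy.add(2, 'Example Weather Placeholder weather forecasts');
console.log(toy.search('cloud'));         // [ 1 ]
console.log(toy.search('example'));       // [ 1, 2 ]
console.log(toy.search('cloud weather')); // []
```

As in the real index, the search returns only keys, so the full records still have to be looked up afterwards.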
router.get('/search', async (req, res, next) => {
  try {
    if (searchIndex.length === 0) {
      await buildIndex();
    }
    const { phrase } = req.query;
    if (!phrase) {
      throw Error('phrase query parameter empty');
    }
    console.info(`Searching by: ${phrase}`);
    // search using flexsearch. It will return a list of IDs we used as keys during indexing
    const resultIds = await searchIndex.search({
      query: phrase,
      // When suggestion is enabled all results will be filled up (until limit, default 1000) with similar matches ordered by relevance.
      suggest: true,
    });
    console.info(`results: ${resultIds.length}`);
    const results = getDataByIds(resultIds);
    res.json(results);
  } catch (e) {
    next(e);
  }
});
Here we expose a typical Express endpoint that receives the phrase to search through a query string parameter called phrase.
The index returns the keys that match our phrase, so afterwards we have to look up the corresponding elements in our dataset before we can display them.
function getDataByIds(idsList) {
  const result = [];
  const { data } = wsData;
  for (let i = 0; i < data.length; i++) {
    // keys were indexed with parseInt, so compare numbers with numbers
    if (idsList.includes(parseInt(data[i].id, 10))) {
      result.push(data[i]);
    }
  }
  return result;
}
We are just iterating over our collection, but typically we would query a database.
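One side effect of iterating the collection is that the results come back in dataset order, not in the order the index returned them. If you want to keep the order of the search results, a possible variation is to build an id lookup once and then map the returned ids through it. The names buildLookup and getDataByIdsOrdered and the sample records are assumptions for this sketch.

```javascript
// Build an id -> record lookup once, then resolve ids in the order the search returned them.
// buildLookup and getDataByIdsOrdered are hypothetical helpers for illustration.
function buildLookup(records) {
  const byId = new Map();
  for (const record of records) {
    // normalize ids to numbers so they match the numeric keys used for indexing
    byId.set(Number(record.id), record);
  }
  return byId;
}

function getDataByIdsOrdered(byId, idsList) {
  return idsList
    .map((id) => byId.get(Number(id)))
    .filter((record) => record !== undefined); // skip ids with no matching record
}

// Placeholder records; ids are strings on purpose to show the normalization
const records = [
  { id: '1', API: 'Example Storage' },
  { id: '2', API: 'Example Weather' },
  { id: '3', API: 'Example Maps' },
];
const byId = buildLookup(records);
console.log(getDataByIdsOrdered(byId, [3, 1, 99]).map((r) => r.API));
// [ 'Example Maps', 'Example Storage' ]
```

Each lookup is a constant-time Map access, which avoids scanning the whole dataset for every request.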
Some examples:
Tip: if you are working with MySQL, you can try its built-in full-text search.