Full-text search in Node.js (search related data)

June 29th, 2019

If you are building a website, an e-commerce store, a blog, etc., you will likely need full-text search to find related content, much as Google does for web pages. This is a well-known problem, so you probably don’t want to implement your own solution.

One option is to use the flexsearch module for Node.js.

So let’s create a small Proof of Concept (POC).

The full source code is here

Keep in mind that it’s an in-memory implementation, so it won’t be possible to index a huge amount of data. You can run your own benchmarks based on your requirements.

Setting up

Install Express generator if you haven’t already.

Also, I strongly recommend you install a browser plugin to see JSON in a pretty-print format. I use JSONView. Another option is to use Postman to make your HTTP requests.

mkdir myflexsearch 
cd myflexsearch 
express --no-view --git

You can delete boilerplate code such as the /public folder and routes/users.js. After that you will have to modify app.js because they are referenced there. In any case, that code doesn’t affect our Proof of Concept.

Let’s install flexsearch module

npm install flexsearch --save

Optionally, you can install the nodemon module to automatically reload your app after every change. You can install it globally, but I will install it locally.

npm install nodemon --save

After that, open package.json and modify the start script

"scripts": {
  "start": "nodemon ./bin/www"
}

Let’s code!

Our main code will live in routes/index.js. It exposes a search endpoint that is called like this: /search?phrase=Cloud

Import the module

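const FlexSearch = require("flexsearch");
// Dummy data file introduced below; this relative path assumes we are in routes/index.js
const wsData = require("../daos/my_data");
const preset = "score";
const searchIndex = new FlexSearch(preset);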

With preset = “score” we define the behavior of our search. You can see more presets here. I recommend you play with different presets and compare the results.

We’ll need some dummy data to test with. I created a file /daos/my_data.js with some content from here: https://api.publicapis.org/entries

Summary steps

Build our index
- Define a key. Typically an ID field of the elements to index (user.id, book.id, etc.).
- Define the content we want to search. Example: the body of a blog post plus its description and category.
Expose a service to search through a URL parameter
- Build our index if it is empty
- Get the phrase to search for from a URL parameter
- Search our index and get a list of IDs as results
- Use those IDs to get the elements from our indexed collection
Make requests to test our data
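
To make these steps concrete without depending on any library, here is a minimal sketch of the same flow using a hand-rolled inverted index. This is not how flexsearch works internally, and it only matches whole words; the names buildToyIndex and searchToyIndex and the sample entries are hypothetical and exist only for illustration.

```javascript
// Hypothetical sample data; the real POC loads its entries from /daos/my_data.js.
const entries = [
  { id: 1, API: "Cat Facts", Description: "Daily cat facts", Category: "Animals" },
  { id: 2, API: "Cloudflare Trace", Description: "Get IP address and user agent", Category: "Cloud Storage" },
  { id: 3, API: "Dog Facts", Description: "Random dog facts", Category: "Animals" },
];

// Split text into lowercase words, dropping empty strings.
function tokenize(text) {
  return text.toLowerCase().split(/\W+/).filter(Boolean);
}

// Step 1: build the index. Each word maps to the set of keys whose content contains it.
function buildToyIndex(items) {
  const index = new Map();
  for (const item of items) {
    const content = `${item.API} ${item.Description} ${item.Category}`;
    for (const word of tokenize(content)) {
      if (!index.has(word)) index.set(word, new Set());
      index.get(word).add(item.id);
    }
  }
  return index;
}

// Step 2: search. Return the keys of the items that contain every word of the phrase.
function searchToyIndex(index, phrase) {
  const words = tokenize(phrase);
  if (words.length === 0) return [];
  let ids = [...(index.get(words[0]) || [])];
  for (const word of words.slice(1)) {
    const matches = index.get(word) || new Set();
    ids = ids.filter((id) => matches.has(id));
  }
  return ids;
}

// Step 3: resolve the returned keys back to elements of the collection.
const toyIndex = buildToyIndex(entries);
const ids = searchToyIndex(toyIndex, "facts");
const found = entries.filter((e) => ids.includes(e.id));
console.log(found.map((e) => e.API));
```

A real search library adds what this sketch leaves out, such as partial matches, scoring and suggestions, which is the reason to use one instead of rolling your own.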

Building the index

function buildIndex() {
  console.time("buildIndexTook");
  console.info("building index...");

  const data = wsData.data; //we could get our data from DB, remote web service, etc.
  for (let i = 0; i < data.length; i++) {
    //we might concatenate the fields we want for our content
    const content =
      data[i].API + " " + data[i].Description + " " + data[i].Category;
    const key = parseInt(data[i].id, 10); // always pass the radix explicitly
    searchIndex.add(key, content);
  }
  console.info("index built, length: " + searchIndex.length);
  console.info("Open a browser at http://localhost:3000/");
  console.timeEnd("buildIndexTook");
}

Keep in mind that we are working with an in-memory search, so be careful with the amount of data you load into the index. This method shouldn’t take more than a couple of seconds to run.
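
If you want a rough idea of how indexing scales with your own data, a small timing and memory helper can help. The sketch below is only an assumption about how you might measure it: measure is a hypothetical helper, and the workload is a plain Map filled with synthetic documents standing in for the real index.

```javascript
// Hypothetical helper: runs `fn` once and reports elapsed time and heap growth.
function measure(label, fn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaMb =
    (process.memoryUsage().heapUsed - heapBefore) / (1024 * 1024);
  console.info(
    `${label}: ${elapsedMs.toFixed(1)} ms, heap delta ~${heapDeltaMb.toFixed(1)} MB`
  );
  return { result, elapsedMs, heapDeltaMb };
}

// Stand-in workload (an assumption): a Map holding 100,000 synthetic documents.
// In the POC you would pass buildIndex here instead.
const stats = measure("syntheticIndex", () => {
  const index = new Map();
  for (let i = 0; i < 100000; i++) {
    index.set(i, `title ${i} description ${i} category ${i % 10}`);
  }
  return index;
});
```

The heap figure is only approximate, since the garbage collector may run in the middle of the measurement, so treat it as an order of magnitude rather than an exact number.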

Basically, in the buildIndex() method we get our data from a static file, but we could get it from a remote web service or a database. For each element we then pass a key and its content to the index. After that, our index is ready to receive queries.
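
One detail to watch when concatenating fields: if a field is missing, plain string concatenation puts the literal word "undefined" into the indexed content. A possible way around it is sketched below; toContent is a hypothetical helper and the sample entry is invented for illustration.

```javascript
// Hypothetical helper: joins the chosen fields of an entry into one searchable
// string, skipping fields that are missing, empty, or not strings.
function toContent(entry, fields) {
  return fields
    .map((field) => entry[field])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim())
    .join(" ");
}

// Sample entry invented for illustration; Description is missing on purpose.
const entry = { id: "7", API: "Cat Facts", Category: "Animals" };
const content = toContent(entry, ["API", "Description", "Category"]);
console.log(content); // prints "Cat Facts Animals"
```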

Exposing the service to search

router.get("/search", async (req, res, next) => {
  try {
    if (searchIndex.length === 0) {
      await buildIndex();
    }

    const phrase = req.query.phrase;
    if (!phrase) {
      throw Error("phrase query parameter empty");
    }
    console.info("Searching by: " + phrase);
    // Search using flexsearch. It returns the list of IDs we used as keys during indexing.
    // With suggest enabled, results are filled up (until limit, default 1000)
    // with similar matches ordered by relevance.
    const resultIds = await searchIndex.search({
      query: phrase,
      suggest: true
    });

    console.info("results: " + resultIds.length);
    const results = getDataByIds(resultIds);
    res.json(results);
  } catch (e) {
    next(e);
  }
});

Here we expose a typical Express endpoint that receives the phrase to search for through a query string parameter called phrase. The index returns the keys that match our phrase; we then have to look up the corresponding elements in our dataset so they can be displayed.

function getDataByIds(idsList) {
  const result = [];
  const data = wsData.data;
  for (let i = 0; i < data.length; i++) {
    // keys were indexed as integers, so compare against the parsed id
    if (idsList.includes(parseInt(data[i].id, 10))) {
      result.push(data[i]);
    }
  }
  return result;
}

We are just iterating over our collection here, but typically we would query a database.
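
Note that getDataByIds scans the whole collection on every request and returns the elements in collection order, not in the order of the IDs it receives. If the order of the IDs matters to you, one alternative is a lookup table keyed by ID. This is a sketch under assumptions: buildLookup and getDataByIdsOrdered are hypothetical names, and the sample collection is invented.

```javascript
// Hypothetical sample collection standing in for wsData.data.
const data = [
  { id: 1, API: "Cat Facts" },
  { id: 2, API: "Dog Facts" },
  { id: 3, API: "Weather" },
];

// Build the lookup once (for example right after indexing), keyed by numeric id.
function buildLookup(items) {
  const byId = new Map();
  for (const item of items) {
    byId.set(Number(item.id), item);
  }
  return byId;
}

// Resolve ids in the order given, skipping ids that are not in the collection.
function getDataByIdsOrdered(byId, idsList) {
  const result = [];
  for (const id of idsList) {
    const item = byId.get(Number(id));
    if (item !== undefined) result.push(item);
  }
  return result;
}

const byId = buildLookup(data);
console.log(getDataByIdsOrdered(byId, [3, 1, 99]).map((e) => e.API));
// prints [ 'Weather', 'Cat Facts' ]
```

Each lookup is a single Map access, so the cost depends on the number of results rather than on the size of the collection.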