Full-text search in Node.js: search related data


Posted on Aug 04, 2020


If you are building a website, an e-commerce store, a blog, or something similar, you will probably need full-text search so users can find related content, much as Google does for web pages. This is a well-known problem, so you probably don’t want to implement your own solution.

One option is to use the FlexSearch module for Node.js.

So let’s create a small Proof of Concept (POC) from scratch.

The full source code is here

Keep in mind that it’s an in-memory implementation, so it won’t be possible to index a huge amount of data. You can run your own benchmarks based on your requirements.

Setting up

Install Express generator if you haven’t already.

Also, I strongly recommend installing a browser plugin to see JSON in a pretty-print format. I use JSONView. Another option is to use Postman to make your HTTP requests.

      
mkdir myflexsearch
cd myflexsearch
express --no-view --git

    
You can delete boilerplate code such as the /public folder and routes/users.js. After that you will have to modify app.js, because both are referenced there. Anyway, that code doesn’t affect our Proof of Concept.

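Let’s install the flexsearch module. The code in this post uses the constructor-style API of the 0.6.x releases that were current when it was written; later major versions reorganized the API, so you may need to pin an older version.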

      
npm install flexsearch --save

    
Optionally, you can install the nodemon module to automatically reload your app after every change. You can install it globally, but I will install it locally.
      
npm install nodemon --save

After that, open package.json and modify the start script:
      
"scripts": {
"start": "nodemon ./bin/www"
}

    

Let’s code

Our main code will live in routes/index.js, which will expose a search endpoint like this:

      
/search?phrase=Cloud

    

Import the module

      
const FlexSearch = require("flexsearch");
const preset = "score";
const searchIndex = new FlexSearch(preset);

    
With preset = "score" we define the behavior of our search. You can see more presets here. I recommend you play with different presets and compare the results.

We’ll need some dummy data to test.

What I’ve done is create a file /daos/my_data.js with some content from here: https://api.publicapis.org/entries

Summary steps

  • Build our index
      • Define a key. Typically an ID field of the elements to index (user.id, book.id, etc.)
      • Define the content we want to search. Example: the body of our blog post plus some description and its category.
  • Expose a service to search through a URL parameter
      • Build our index if it is empty
      • Get the phrase to search for from a URL parameter
      • Search in our index and get a list of IDs with results
      • With the above results, get the elements from our indexed collection.
  • Make requests to test our data
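
To make the steps above concrete before bringing in FlexSearch, here is a minimal sketch of the same flow using a tiny hand-rolled inverted index. It is not how FlexSearch works internally, and the sample records and the helper names (tokenize, search) are made up for illustration; only the shape of the flow (key, content, search returns IDs, IDs resolve to records) mirrors the POC.

```javascript
// Hypothetical sample records; the real POC loads them from /daos/my_data.js.
const data = [
  { id: 1, API: "Dropbox", Description: "File sharing and storage", Category: "Cloud Storage" },
  { id: 2, API: "CoinGecko", Description: "Cryptocurrency prices", Category: "Cryptocurrency" },
  { id: 3, API: "Box", Description: "File sharing and storage", Category: "Cloud Storage" },
];

// Split text into lowercase word tokens.
const tokenize = (text) => text.toLowerCase().split(/\W+/).filter(Boolean);

// Step 1: build the index, mapping each token to the set of record IDs (keys).
const index = new Map();
for (const item of data) {
  const content = `${item.API} ${item.Description} ${item.Category}`;
  for (const token of tokenize(content)) {
    if (!index.has(token)) index.set(token, new Set());
    index.get(token).add(item.id);
  }
}

// Step 2: search returns only IDs; a record must contain every word of the phrase.
function search(phrase) {
  const tokens = tokenize(phrase);
  if (tokens.length === 0) return [];
  const sets = tokens.map((t) => index.get(t) || new Set());
  return [...sets[0]].filter((id) => sets.every((s) => s.has(id)));
}

// Step 3: resolve the IDs back to the full records.
const getDataByIds = (ids) => data.filter((item) => ids.includes(item.id));

console.log(getDataByIds(search("cloud")).map((item) => item.API));
// prints [ 'Dropbox', 'Box' ]
```

Unlike this sketch, FlexSearch also handles partial matches, scoring and suggestions, which is the reason to use a library in the first place.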

Building the index
      
// assumed path to the dummy data file, relative to routes/index.js
const wsData = require('../daos/my_data');

function buildIndex() {
  console.time('buildIndexTook');
  console.info('building index...');
  const { data } = wsData; // we could get our data from DB, remote web service, etc.
  for (let i = 0; i < data.length; i++) {
    // we might concatenate the fields we want for our content
    const content = `${data[i].API} ${data[i].Description} ${data[i].Category}`;
    const key = parseInt(data[i].id, 10);
    searchIndex.add(key, content);
  }
  console.info(`index built, length: ${searchIndex.length}`);
  console.info(' Open a browser at http://localhost:3000/');
  console.timeEnd('buildIndexTook'); // must match the label passed to console.time
}

    
Keep in mind that we are working with an in-memory search, so be careful with the amount of data you load into the index. This method shouldn’t take more than a couple of seconds to run.
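
If you want rough numbers for your own data, a small timing helper can be enough. The sketch below is only an illustration: measure is a hypothetical helper, the workload is a stand-in for buildIndex, and the heap figure is approximate because the garbage collector can run at any time.

```javascript
// Hypothetical helper: runs fn once and reports elapsed time and heap growth.
function measure(label, fn) {
  const heapBefore = process.memoryUsage().heapUsed;
  const start = process.hrtime.bigint();
  const result = fn();
  const elapsedMs = Number(process.hrtime.bigint() - start) / 1e6;
  const heapDeltaMb = (process.memoryUsage().heapUsed - heapBefore) / (1024 * 1024);
  console.info(`${label}: ${elapsedMs.toFixed(1)} ms, heap delta ~${heapDeltaMb.toFixed(1)} MB`);
  return { result, elapsedMs, heapDeltaMb };
}

// Stand-in workload; in the POC you would pass buildIndex instead.
const stats = measure("fake index", () => {
  const fake = new Map();
  for (let i = 0; i < 100000; i++) fake.set(i, `content ${i}`);
  return fake.size;
});
```

Running it with a realistic number of records gives a first idea of whether the in-memory approach fits your requirements.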

In buildIndex() we get our data from a static file, but we could get it from a remote web service or a database. For each element we pass the index a key and then the content.
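
One detail to watch when concatenating fields with a template literal: a missing field ends up as the literal word "undefined" in the indexed content. A possible way around it, sketched here with a hypothetical toContent helper that is not part of the original code, is to join only the fields that are actually present:

```javascript
// Hypothetical helper: joins only the fields that are present, so a missing
// Description does not put the literal word "undefined" into the index.
function toContent(entry, fields = ["API", "Description", "Category"]) {
  return fields
    .map((field) => entry[field])
    .filter((value) => typeof value === "string" && value.trim() !== "")
    .map((value) => value.trim())
    .join(" ");
}

console.log(toContent({ API: "Dropbox", Category: "Cloud Storage" }));
// prints Dropbox Cloud Storage
```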

After that our index is ready to receive queries.

Exposing the service to search

      
router.get('/search', async (req, res, next) => {
  try {
    if (searchIndex.length === 0) {
      await buildIndex();
    }

    const { phrase } = req.query;
    if (!phrase) {
      throw Error('phrase query parameter empty');
    }
    console.info(`Searching by: ${phrase}`);
    // search using flexsearch. It will return a list of IDs we used as keys during indexing
    const resultIds = await searchIndex.search({
      query: phrase,
      suggest: true, // When suggestion is enabled all results will be filled up (until limit, default 1000) with similar matches ordered by relevance.
    });

    console.info(`results: ${resultIds.length}`);
    const results = getDataByIds(resultIds);
    res.json(results);
  } catch (e) {
    next(e);
  }
});

    
Here we expose a typical Express endpoint that receives the phrase to search for through a query string parameter called phrase. The index returns the keys that match our phrase; we then have to look up the corresponding elements in our dataset so they can be displayed.
      
function getDataByIds(idsList) {
  const result = [];
  const { data } = wsData;
  for (let i = 0; i < data.length; i++) {
    if (idsList.includes(data[i].id)) {
      result.push(data[i]);
    }
  }
  return result;
}

    

We are just iterating over our collection, but typically we would query a database.
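
Note that iterating over the collection returns the elements in dataset order, not in the order of the IDs returned by the search. If you want to keep the order of the IDs, one option is a lookup table built once. This is only a sketch: the records and the getDataByIdsOrdered name are hypothetical, and it assumes the IDs can be normalized to numbers, as the parseInt call in buildIndex suggests.

```javascript
// Hypothetical records standing in for wsData.data.
const records = [
  { id: "1", API: "Dropbox" },
  { id: "2", API: "CoinGecko" },
  { id: "3", API: "Box" },
];

// Build the lookup once; keys are normalized to numbers to match the
// parseInt(...) keys used when indexing.
const recordsById = new Map(records.map((r) => [Number(r.id), r]));

// Returns records in the same order as the IDs, skipping unknown IDs.
function getDataByIdsOrdered(ids) {
  return ids.map((id) => recordsById.get(Number(id))).filter(Boolean);
}

console.log(getDataByIdsOrdered([3, 1, 99]).map((r) => r.API));
// prints [ 'Box', 'Dropbox' ]
```

The Map also avoids scanning the whole collection for every request.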

Making requests

Our last step is just to make some test requests with our browser, Postman, curl, or any other tool.

Some examples:

  • http://localhost:3000/search?phrase=Cryptocurrency
  • http://localhost:3000/search?phrase=Cloud
  • http://localhost:3000/search?phrase=File
  • http://localhost:3000/search?phrase=Storage
  • http://localhost:3000/search?phrase=Open%20Threat
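
If you build these URLs from code instead of typing them, remember to encode the phrase, as in the last example where the space becomes %20. A minimal sketch, using a hypothetical searchUrl helper and assuming the server runs on localhost:3000:

```javascript
// Hypothetical helper: builds a search URL for the local POC server.
function searchUrl(phrase, base = "http://localhost:3000") {
  return `${base}/search?phrase=${encodeURIComponent(phrase)}`;
}

console.log(searchUrl("Open Threat"));
// prints http://localhost:3000/search?phrase=Open%20Threat
```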

That’s it. See the full source code.

Tip: if you are working with MySQL, you can try its own full-text implementation.

