Analysis of different search commands


The purpose of this document is to find the optimal way to search for a word in a folder that contains many nested folders and a large number of files.

The experiment below was run on a folder containing 8918 folders and 48170 files. Its purpose is to find out the various ways of searching for a string in these folders and to compare the performance of each.

1. grep -rinF search_term .

This uses grep with -r to search the current directory recursively, -i to ignore case, -n to display line numbers, and -F to treat the search term as a fixed string rather than a regular expression.
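To see what each flag contributes, here is a minimal sketch on a throwaway directory. The directory layout, the file names and the search string Foo.Bar are made up for illustration; they are not taken from the experiment.

```shell
# Build a small hypothetical tree to search (not the folder from the experiment).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf 'first line\nlink to foo.bar here\n' > "$demo_dir/sub/a.txt"
printf 'fooxbar is not a literal match\n' > "$demo_dir/b.txt"

# -r recurses into sub/, -i ignores case, -n shows line numbers,
# -F treats "Foo.Bar" literally, so the dot does not match any character.
fixed_hits=$(cd "$demo_dir" && grep -rinF 'Foo.Bar' .)

# Without -F the dot is a regex wildcard, so "fooxbar" matches as well.
regex_hits=$(cd "$demo_dir" && grep -rin 'Foo.Bar' . | sort)

printf 'fixed string:\n%s\n\nregex:\n%s\n' "$fixed_hits" "$regex_hits"
rm -rf "$demo_dir"
```

The fixed-string run reports only the line in sub/a.txt, while the regex run also reports b.txt, which is the kind of surprise -F avoids when the search term contains characters such as dots.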

Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp:23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp:91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} -rinF  .  116.12s user 8.37s system 38% cpu 5:22.50 total

The above took 5 minutes 22 seconds to complete the search. Note: older versions of grep might not support recursive search (-r or -R), and some strictly POSIX implementations of grep do not provide the option either. If it is not available, we can use find together with grep.
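One way to cope with a grep that may or may not support -r is to probe for it and fall back to find. The sketch below is only an illustration: search_tree is a hypothetical helper name, and the probe assumes that a grep without -r exits with a status greater than 1 when it sees the unknown option.

```shell
# search_tree is a hypothetical helper, not a standard command.
# Usage: search_tree PATTERN DIRECTORY
search_tree() {
    pattern=$1
    dir=$2
    # Probe for -r support: grep exits 0 or 1 when it accepts the option,
    # and with a status greater than 1 when it rejects it (assumption).
    probe_status=0
    grep -r -e "$pattern" /dev/null >/dev/null 2>&1 || probe_status=$?
    if [ "$probe_status" -le 1 ]; then
        grep -rn -e "$pattern" "$dir"
    else
        # Fallback: let find walk the tree. The extra /dev/null operand makes
        # grep print file names even when it receives a single file.
        find "$dir" -type f -exec grep -n -e "$pattern" /dev/null {} +
    fi
}

# Try it on a made-up directory.
fallback_dir=$(mktemp -d)
mkdir -p "$fallback_dir/sub"
printf 'needle here\n' > "$fallback_dir/sub/a.txt"
fallback_hits=$(search_tree needle "$fallback_dir")
printf '%s\n' "$fallback_hits"
rm -rf "$fallback_dir"
```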

2. grep search_term `find . -type f`

With the above command we first find all the files and then run grep on them. This works if the folder has a small number of files. With a large number of files it fails with "argument list too long: grep", because the shell expands the whole file list onto a single command line.
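The error comes from the system limit on the total size of the arguments and environment passed to a new process, which getconf ARG_MAX reports. The sketch below gives a rough estimate of whether a file list would fit; the three-file directory is made up, and the byte count is only an approximation of what the kernel really accounts for.

```shell
# ARG_MAX is the system limit (in bytes) for arguments plus environment
# handed to a new process; exceeding it gives "argument list too long".
arg_max=$(getconf ARG_MAX)

# Hypothetical tree with a handful of files, just to have something to count.
limit_dir=$(mktemp -d)
touch "$limit_dir/one.txt" "$limit_dir/two.txt" "$limit_dir/three.txt"

# Rough number of bytes needed to pass every path as a command-line argument
# (each path is followed by a newline here, by a NUL terminator in a real argv).
path_bytes=$(find "$limit_dir" -type f | wc -c | tr -d ' ')

echo "limit: $arg_max bytes, paths need about: $path_bytes bytes"
if [ "$path_bytes" -lt "$arg_max" ]; then
    echo "the backtick form would probably fit"
else
    echo "the backtick form would fail with 'argument list too long'"
fi
rm -rf "$limit_dir"
```

Running the same count in the real folder shows how close the file list is to the limit before trying the command.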

zsh: argument list too long: grep
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} "logo sc-cxo-logo"   0.42s user 0.04s system 99% cpu 0.459 total

3. find . -type f -exec grep -n search_term {} \; -print

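To avoid ‘argument list too long: grep’, let’s use -exec. Each file found by find is passed to grep, which searches that file ({} is replaced with the path of the file, and \; ends the command). Because grep receives only one file at a time it does not print the file name, so -print adds the name on the line after the matches.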
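In option 3 grep is handed one file at a time, so it leaves the file name out and -print supplies it on the following line. The sketch below, run on a made-up two-file directory, compares that with a common alternative: adding /dev/null as a second operand so that grep itself prefixes each match with the file name.

```shell
name_dir=$(mktemp -d)
printf 'alpha\nneedle\n' > "$name_dir/a.txt"
printf 'nothing to see\n' > "$name_dir/b.txt"

# One grep process per file. With a single file operand grep omits the file
# name, so -print adds it on a separate line after the matches.
with_print=$(cd "$name_dir" && find . -type f -exec grep -n needle {} \; -print)

# Adding /dev/null as a second operand makes grep prefix each match
# with the file name, giving the same shape as grep -r output.
with_devnull=$(cd "$name_dir" && find . -type f -exec grep -n needle /dev/null {} \;)

printf '%s\n---\n%s\n' "$with_print" "$with_devnull"
rm -rf "$name_dir"
```

The second form is easier to feed into other tools because every match carries its file name on the same line.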

Binary file ./build/docroot.tar matches
./build/docroot.tar
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp
23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print  124.81s user 139.21s system 51% cpu 8:34.44 total

The above worked fine, but it took 8 minutes 34 seconds to complete. Can we improve on this?

4. find . -type f -exec grep -n search_term {} +

This is the same as option 3, but uses + instead of \; . With +, find passes as many paths as possible to each grep invocation ({} is replaced with as many paths as fit on one command line), so far fewer grep processes are started.
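A quick way to see the difference between \; and + is to count how many processes find starts. The sketch below uses a hypothetical five-file directory and a tiny sh -c command in place of grep, purely so that each invocation leaves one countable line behind.

```shell
batch_dir=$(mktemp -d)
for i in 1 2 3 4 5; do
    printf 'needle %s\n' "$i" > "$batch_dir/file$i.txt"
done

# Each invocation of the inner sh prints one line, so counting lines
# counts how many processes find started.
runs_semicolon=$(find "$batch_dir" -type f -exec sh -c 'echo started' sh {} \; | wc -l | tr -d ' ')
runs_plus=$(find "$batch_dir" -type f -exec sh -c 'echo started' sh {} + | wc -l | tr -d ' ')

printf 'with \\; : %s processes\n' "$runs_semicolon"
printf 'with +  : %s process(es)\n' "$runs_plus"
rm -rf "$batch_dir"
```

With five files the \; form starts five processes and the + form starts one. On a tree with tens of thousands of files that difference in process start-up is a likely reason for the gap between options 3 and 4.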

Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp:23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp:91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2:23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
find . -type f -exec grep -n "logo sc-cxo-logo" {} +  82.80s user 7.51s system 26% cpu 5:35.59 total

The above command took 5 minutes 35 seconds, which is similar to grep -r. Can we do this another way?

5. find . -type f -print0 | xargs -0 grep -n search_term

The same can be accomplished with xargs. Note: -print0 and -0 are required if folder or file names contain spaces.
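The effect of -print0 and -0 is easiest to see on a path that contains spaces. In the sketch below the directory "my docs" and the file "notes file.txt" are invented for the demonstration, and /dev/null is added only so that grep prints the file name.

```shell
space_dir=$(mktemp -d)
mkdir -p "$space_dir/my docs"
printf 'needle in a spaced path\n' > "$space_dir/my docs/notes file.txt"

# NUL-separated names survive spaces: xargs -0 passes each path as one argument.
safe_hits=$(cd "$space_dir" && find . -type f -print0 | xargs -0 grep -n needle /dev/null)

# Newline output plus plain xargs splits on whitespace, so grep is asked to
# open "./my", "docs/notes" and "file.txt", none of which exist.
broken_hits=$(cd "$space_dir" && find . -type f | xargs grep -n needle /dev/null 2>/dev/null || true)

printf 'with -print0 and -0: %s\n' "$safe_hits"
printf 'without: %s\n' "${broken_hits:-none}"
rm -rf "$space_dir"
```

Without the NUL separators the match is silently lost, which is worse than an error because the search simply looks like it found nothing.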

Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp:23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp:91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2:23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
find . -type f -print0  0.06s user 0.79s system 0% cpu 4:24.74 total
xargs -0 grep -n "logo sc-cxo-logo"  82.92s user 7.36s system 32% cpu 4:39.14 total

The above took 4 minutes 39 seconds, which is somewhat better than options 1, 3 and 4. Can we do better than this? Yes, by using third-party utilities.

6. ack search_term *

ack is a grep-like source code search tool (https://beyondgrep.com/).

You can install it by following the steps at https://beyondgrep.com/install/

modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp
21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp
91:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
ack "logo sc-cxo-logo" *  6.14s user 6.61s system 9% cpu 2:09.14 total

The above took 2 minutes 9 seconds, which is a significant improvement on the previous options. Can we do better than this?

7. rg search_term

ripgrep is a line-oriented search tool that recursively searches your current directory for a regex pattern — https://github.com/BurntSushi/ripgrep

modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
rg 'logo sc-cxo-logo'  1.23s user 4.83s system 8% cpu 1:13.77 total

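The above command took 1 minute 13 seconds, which is better than all the previous options. Note that by default ripgrep honors ignore files such as .gitignore and skips hidden and binary files, which is probably why the bin folders do not appear in its results; it therefore searched fewer files than the commands above. Can we do better than this?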
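To make ripgrep search the same set of files as grep, its automatic filtering can be switched off. The sketch below is an illustration under assumptions: the directory is made up, and the .ignore file with a bin/ rule is only a stand-in for whatever rule hid the bin folders in the run above. It is skipped when rg is not installed.

```shell
if command -v rg >/dev/null 2>&1; then
    rg_dir=$(mktemp -d)
    mkdir -p "$rg_dir/bin" "$rg_dir/src"
    printf 'needle\n' > "$rg_dir/bin/copy.txt"
    printf 'needle\n' > "$rg_dir/src/original.txt"
    # Hypothetical ignore rule standing in for whatever excluded bin above.
    printf 'bin/\n' > "$rg_dir/.ignore"

    # Default run: files matched by ignore rules are skipped.
    # -l lists matching files, -F treats the pattern as a fixed string.
    rg_default=$(cd "$rg_dir" && rg -l -F needle . | sort)

    # --no-ignore turns ignore files off; --hidden also searches dotfiles.
    rg_all=$(cd "$rg_dir" && rg -l -F --no-ignore --hidden needle . | sort)

    printf 'default:\n%s\nno-ignore:\n%s\n' "$rg_default" "$rg_all"
    rm -rf "$rg_dir"
else
    echo "rg is not installed; skipping the ripgrep sketch"
fi
```

For a like-for-like timing against grep -r, running rg with these flags is the fairer comparison, since otherwise it simply has less work to do.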

8. ag search_term *

The Silver Searcher — A code searching tool similar to ack, with a focus on speed. — https://github.com/ggreer/the_silver_searcher

43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp
21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp
91:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
ag "logo sc-cxo-logo" *  0.88s user 8.13s system 16% cpu 53.761 total

This took 54 seconds, which is a significant improvement. A search that took more than 5 minutes with grep finished in under a minute with these tools.

In this article we went through various commands, options and tools for searching in a folder, looking for the optimal way to search.

Hope this helps.
