Analysis of different search commands


The purpose of this document is to find the optimal way to search for a word in a folder that contains many nested folders and a large number of files.

The experiment below was run on a folder containing 8,918 folders and 48,170 files. Its purpose is to find out the various ways of searching for a string in these folders and to measure the performance of each.

1. grep -rinF search_term .

This uses grep with -r to search the current directory recursively, -i to ignore case, -n to display line numbers, and -F to treat the search term as a fixed string rather than a regular expression.
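Below is a minimal, self-contained sketch of option 1. It runs against a small throwaway tree instead of the real project, so the folder names, file names and contents are all hypothetical. It shows -i matching an upper-case occurrence and -n adding the line number after each path.

```shell
#!/bin/sh
# Hypothetical sandbox tree standing in for the real project folder.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/modules/web" "$dir/docs"
printf '<a class="LOGO SC-CXO-LOGO">\n' > "$dir/modules/web/header.jsp"
printf 'intro\n<a class="logo sc-cxo-logo">\n' > "$dir/modules/web/index.html"
printf 'no logo here\n' > "$dir/docs/readme.txt"

cd "$dir" || exit 1
# -r recurse, -i ignore case, -n print line numbers, -F fixed string (no regex)
matches=$(grep -rinF "logo sc-cxo-logo" .)
# Prints two lines (order depends on directory traversal):
#   ./modules/web/header.jsp:1:<a class="LOGO SC-CXO-LOGO">
#   ./modules/web/index.html:2:<a class="logo sc-cxo-logo">
printf '%s\n' "$matches"
```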

$ time grep -rinF "logo sc-cxo-logo" .
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp:23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp:91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} -rinF  .  116.12s user 8.37s system 38% cpu 5:22.50 total

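This took 5 minutes 22 seconds to complete. The timing line also shows that grep is an alias in this shell that adds --exclude-dir={.bzr,CVS,.git,.hg,.svn}, so version-control folders were skipped in this run. Note that older versions of grep may not support recursive search (-r or -R), and a strictly POSIX grep is not guaranteed to provide it. If the option is not available, grep can be combined with find, as in the options that follow.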
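If you are unsure whether the local grep supports -r, a small wrapper can probe for it and fall back to find. This is a sketch built on two assumptions. First, search_tree is a hypothetical helper name and not a standard command. Second, the probe relies on grep's exit statuses: 1 means no match, and 2 means an error such as an unknown option. The fallback is the find -exec ... {} + form from option 4.

```shell
#!/bin/sh
# search_tree is a hypothetical helper, not a standard command.
# Usage: search_tree "fixed string"   (searches the current directory)
search_tree() {
  # grep exits 1 for "no match" and 2 for errors such as an unknown option,
  # so exit status 1 on the empty /dev/null means this grep understands -r.
  grep -r -q x /dev/null 2>/dev/null
  if [ $? -eq 1 ]; then
    grep -rnF -- "$1" .
  else
    # Portable fallback: /dev/null is an extra operand, so grep always sees
    # more than one file and prefixes every match with its path.
    find . -type f -exec grep -nF -- "$1" /dev/null {} +
  fi
}

# Usage on a hypothetical one-file sandbox.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'logo sc-cxo-logo\n' > "$dir/page.html"
cd "$dir" || exit 1
result=$(search_tree "logo sc-cxo-logo")
printf '%s\n' "$result"   # ./page.html:1:logo sc-cxo-logo
```

Both branches print matches in the same path:line:text format, so scripts that parse the output do not need to know which branch ran.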

2. grep search_term `find . -type f`

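Here find first lists every file, and the shell then passes the whole list to grep as arguments. This works when the folder has few files. With a large number of files, the command line exceeds the system limit on argument size (ARG_MAX), and the command fails with "argument list too long: grep". The unquoted backticks also split find's output on whitespace, so paths containing spaces would be broken up even when the list is short.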
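To see how close a folder is to this limit, you can compare the system's ARG_MAX with a rough estimate of the size of the file list. This is a sketch only. The estimate ignores the environment variables and per-argument pointers that also count against the limit, so treat it as a lower bound. It takes an optional folder argument and defaults to the current directory.

```shell
#!/bin/sh
# Largest combined size (bytes) of arguments plus environment accepted by exec().
limit=$(getconf ARG_MAX)
# Rough number of bytes `find . -type f` would put on grep's command line:
# each path plus one separator byte (a NUL in argv, a newline here).
# $(( )) strips the padding some wc implementations add.
needed=$(( $(find "${1:-.}" -type f 2>/dev/null | wc -c) ))
echo "ARG_MAX: $limit bytes, file list: about $needed bytes"
if [ "$needed" -ge "$limit" ]; then
  echo "passing every path to grep at once would likely fail here"
fi
```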

$ time grep "logo sc-cxo-logo" `find . -type f`
zsh: argument list too long: grep
grep --color=auto --exclude-dir={.bzr,CVS,.git,.hg,.svn} "logo sc-cxo-logo"   0.42s user 0.04s system 99% cpu 0.459 total

3. find . -type f -exec grep -n search_term {} \; -print

To avoid "argument list too long: grep", let's use find's -exec action. As find finds each file, it passes that file to grep to search ({} is replaced with the path of each file, and \; marks the end of the command).
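Below is a scaled-down sketch of option 3 on a hypothetical three-file folder where only two.txt contains the term. It shows why the output of the real run prints the matching line first and the path afterwards. It also shows that one process is started per file.

```shell
#!/bin/sh
# Hypothetical three-file sandbox; only two.txt contains the term.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'no match\n' > "$dir/one.txt"
printf 'x\nlogo sc-cxo-logo\n' > "$dir/two.txt"
printf 'nothing\n' > "$dir/three.txt"
cd "$dir" || exit 1

# One grep per file. grep's exit status (0 only on a match) is the result of
# -exec, so the -print after it fires only for files that matched. Since grep
# sees a single file, it prints no file name of its own.
out=$(find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print)
printf '%s\n' "$out"
# 2:logo sc-cxo-logo
# ./two.txt

# Number of processes started: one echo per file found.
runs=$(find . -type f -exec echo {} \; | grep -c .)
echo "processes started: $runs"   # processes started: 3
```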

$ time find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print
Binary file ./build/docroot.tar matches
./build/docroot.tar
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp
23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
find . -type f -exec grep -n "logo sc-cxo-logo" {} \; -print  124.81s user 139.21s system 51% cpu 8:34.44 total

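This worked, but it took 8 minutes 34 seconds to complete. Because a separate grep process is started for every file, the system time (139.21s) is far higher than in the other runs. Can we improve this?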

4. find . -type f -exec grep -n search_term {} +

This is the same as option 3, but it uses + instead of \;. With +, find passes as many paths as possible to a single grep invocation ({} is replaced with as many paths as fit on one command line).
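Running the same hypothetical three-file folder with + instead of \; shows the batching. A single echo receives all three paths. grep, now given several files at once, prefixes each match with its path, so no -print is needed. The extra /dev/null operand is a common portable trick that is not part of the original command. It keeps the path prefix even when a batch happens to contain only one file.

```shell
#!/bin/sh
# Same hypothetical sandbox as before; only two.txt contains the term.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'no match\n' > "$dir/one.txt"
printf 'x\nlogo sc-cxo-logo\n' > "$dir/two.txt"
printf 'nothing\n' > "$dir/three.txt"
cd "$dir" || exit 1

# With +, find appends as many paths as fit to one command line,
# so the three files produce a single echo (one output line).
runs=$(find . -type f -exec echo {} + | grep -c .)
echo "processes started: $runs"   # processes started: 1

# grep receives several files at once and prints path:line:text.
out=$(find . -type f -exec grep -n "logo sc-cxo-logo" /dev/null {} +)
printf '%s\n' "$out"   # ./two.txt:2:logo sc-cxo-logo
```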

$ time find . -type f -exec grep -n "logo sc-cxo-logo" {} +
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp:23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp:91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2:23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
find . -type f -exec grep -n "logo sc-cxo-logo" {} +  82.80s user 7.51s system 26% cpu 5:35.59 total

This command took 5 minutes 35 seconds, which is similar to option 1 (grep -rinF). Can we do this in another way?

5. find . -type f -print0 | xargs -0 grep -n search_term

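The same can be accomplished with xargs too. Note that -print0 and -0 are required if folder or file names contain spaces. With them, find separates paths with a NUL byte and xargs splits only on NUL. Otherwise xargs splits on blanks and newlines and also interprets quotes and backslashes.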
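The following small sketch shows why -print0 and -0 matter. It uses a hypothetical folder containing a file named "my file.txt".

```shell
#!/bin/sh
# Hypothetical sandbox: the matching file has a space in its name.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
printf 'logo sc-cxo-logo\n' > "$dir/my file.txt"
printf 'other\n' > "$dir/plain.txt"
cd "$dir" || exit 1

# Newline-separated: xargs splits "./my file.txt" at the space, so grep is
# asked for "./my" and "file.txt", which do not exist (errors discarded).
broken=$(find . -type f | xargs grep -n "logo sc-cxo-logo" 2>/dev/null)
echo "without -print0/-0: '${broken}'"   # without -print0/-0: ''

# NUL-separated: each path reaches grep intact.
fixed=$(find . -type f -print0 | xargs -0 grep -n "logo sc-cxo-logo")
printf '%s\n' "$fixed"   # ./my file.txt:1:logo sc-cxo-logo
```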

$ time find . -type f -print0 | xargs -0 grep -n "logo sc-cxo-logo"
Binary file ./build/docroot.tar matches
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp:21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp:23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp:91:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html:43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
./modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html:65:                  class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2:23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
./modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp:111:                                        class="logo sc-cxo-logo" title="Back to homepage">
find . -type f -print0  0.06s user 0.79s system 0% cpu 4:24.74 total
xargs -0 grep -n "logo sc-cxo-logo"  82.92s user 7.36s system 32% cpu 4:39.14 total

This took 4 minutes 39 seconds, somewhat better than options 1, 3 and 4. Can we do better than this? Yes, by using third-party utilities.

6. ack search_term *

ack is a grep-like source code search tool (https://beyondgrep.com/).

You can install ack by following the steps at https://beyondgrep.com/install/
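ack has its own equivalents of the grep options used earlier: -Q (--literal) searches for a fixed string, -i ignores case, and -l lists only matching files. Below is a hedged sketch on a hypothetical folder that skips itself if ack is not installed. It passes . explicitly. When run from a terminal with no path, ack (like ag and rg) searches the current directory recursively, so a trailing * is not required.

```shell
#!/bin/sh
# Hypothetical sandbox; only header.html contains the term (in mixed case).
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/web"
printf '<a class="Logo SC-CXO-logo">\n' > "$dir/web/header.html"
printf '<p>unrelated</p>\n' > "$dir/web/footer.html"
cd "$dir" || exit 1

if command -v ack >/dev/null 2>&1; then
  # -Q: literal string (like grep -F), -i: ignore case, -l: file names only
  files=$(ack -Q -i -l "logo sc-cxo-logo" .)
  printf '%s\n' "$files"
else
  files=skipped
  echo "ack is not installed; see https://beyondgrep.com/install/" >&2
fi
```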

$ time ack "logo sc-cxo-logo" *
modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp
21:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp
91:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
ack "logo sc-cxo-logo" *  6.14s user 6.61s system 9% cpu 2:09.14 total

This took 2 minutes 9 seconds, a significant improvement over the previous options. Can we do better than this?

7. rg search_term

ripgrep is a line-oriented search tool that recursively searches your current directory for a regex pattern — https://github.com/BurntSushi/ripgrep

$ time rg 'logo sc-cxo-logo'
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23:              <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111:                                        class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43:                    <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65:                  class="logo sc-cxo-logo" title="Back to homepage">
rg 'logo sc-cxo-logo'  1.23s user 4.83s system 8% cpu 1:13.77 total

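This command took 1 minute 13 seconds, which is better than all the options above. Note, however, that it reported only 4 of the 10 matches found by the other commands. By default, ripgrep skips files matched by .gitignore and similar ignore files, as well as hidden and binary files. Here it did not search the bin folders (or the binary docroot.tar), so its timing covers fewer files than the other commands. Can we do better than this?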
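To check whether ripgrep's default filtering explains the missing bin results, you can compare a default search with one using --no-ignore. This sketch is built on an assumption: a hypothetical .ignore file listing bin/ stands in for whatever excluded bin in the real repository. rg honours .ignore files even outside a git repository. The sketch skips itself if rg is not installed. For a fuller search, rg -uuu goes further and also searches hidden and binary files.

```shell
#!/bin/sh
# Hypothetical sandbox: the same match exists under src/ and bin/.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/src" "$dir/bin"
printf 'logo sc-cxo-logo\n' > "$dir/src/page.html"
printf 'logo sc-cxo-logo\n' > "$dir/bin/page.html"
# Hypothetical ignore rule (a .gitignore inside a git repo behaves similarly).
printf 'bin/\n' > "$dir/.ignore"
cd "$dir" || exit 1

if command -v rg >/dev/null 2>&1; then
  # -F fixed string, -i ignore case, -l list matching files only
  default=$(rg -F -i -l "logo sc-cxo-logo" . | grep -c .)
  # --no-ignore stops rg from honouring .ignore/.gitignore files
  all=$(rg -F -i -l --no-ignore "logo sc-cxo-logo" . | grep -c .)
  echo "files found: default=$default, --no-ignore=$all"
else
  default=skipped
  echo "ripgrep (rg) is not installed" >&2
fi
```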

8. ag search_term *

The Silver Searcher — A code searching tool similar to ack, with a focus on speed. — https://github.com/ggreer/the_silver_searcher
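Like ack and rg, ag treats the search term as a regular expression by default. The term used in this article contains no special characters, so this makes no difference here. For terms that do contain them, -Q (--literal) is ag's counterpart to grep -F. Below is a hedged sketch on a hypothetical one-file folder that skips itself if ag is not installed.

```shell
#!/bin/sh
# Hypothetical sandbox: one near-miss line and one exact occurrence.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
mkdir -p "$dir/web"
printf 'logoXsc-cxo-logo\nlogo.sc-cxo-logo\n' > "$dir/web/page.html"
cd "$dir" || exit 1

if command -v ag >/dev/null 2>&1; then
  # Default: the term is a regular expression, so "." matches any character
  # and both lines match.
  regex=$(ag "logo.sc-cxo-logo" . | grep -c 'sc-cxo-logo')
  # -Q (--literal): the term is matched as a fixed string, like grep -F.
  literal=$(ag -Q "logo.sc-cxo-logo" . | grep -c 'sc-cxo-logo')
  echo "regex: $regex line(s), literal: $literal line(s)"
else
  regex=skipped
  echo "ag (the_silver_searcher) is not installed" >&2
fi
```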

$ time ag "logo sc-cxo-logo" *
modules/store/bin/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43: <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/bin/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65: class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_1.jsp
21: <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23: <a id="homeLink" href="/homepage.jsp?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_cxoint.jsp
91: class="logo sc-cxo-logo" title="Back to homepage">
modules/store/bin/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111: class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/checkout/mocks/layout.html
43: <a class="pull-left logo sc-cxo-logo" href="#"></a>
modules/store/j2ee-apps/Web/Web.war/checkout/businessdelivery/index.html
65: class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_2.jsp
23: <a id="homeLink" href="/?selectedTab=allProducts" class="logo sc-cxo-logo" title="Back to homepage">
modules/store/j2ee-apps/Web/Web.war/common/checkOutSubHeader_reskin.jsp
111: class="logo sc-cxo-logo" title="Back to homepage">
ag "logo sc-cxo-logo" * 0.88s user 8.13s system 16% cpu 53.761 total

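This took 54 seconds, another significant improvement. A search that took more than 5 minutes with grep -r finished in under a minute with these tools. Keep two caveats in mind. First, ack, ripgrep and ag skip some files by default: none of them reported the binary docroot.tar, and ripgrep also skipped bin. Second, every completed run used well under 100% CPU. Disk I/O and file-system caching therefore likely account for part of the differences.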

In this article we went through various commands, options and tools for searching a folder, in order to find the optimal way to search.

Hope this helps.

Software Engineer, Learner, Developer, Prokopton