How to count number of partial occurrences of a string in a file













I have a file in which I need to count all partial matches for an input string.

I'll show you an easy example of what I need:



In a file with this content:



Good-Black-Cat
Bad-Red-Cat
Bad-Gray-Dog
Good-Golden-Dog
Bad-White-Dog
Good-Tabby-Cat
Bad-Siamese-Cat


I need to count how many times the partial string "Good-*-Cat" (where * can be anything, it doesn't matter) appears. The expected output count is 2.



Any help will be appreciated.

















      asked May 26 at 21:47









      Rodrigo Andres Nava Lara





















          3 Answers













          Given



          $ cat file
          Good-Black-Cat
          Bad-Red-Cat
          Bad-Gray-Dog
          Good-Golden-Dog
          Bad-White-Dog
          Good-Tabby-Cat
          Bad-Siamese-Cat


          then



          $ grep -c 'Good-.*-Cat' file
          2


          Note that this is a count of matching lines - so for example it won't work for multiple occurrences per line, or for occurrences that span lines.
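To make the difference between counting lines and counting occurrences concrete, here is a minimal sketch. It assumes a grep that supports the -o option (GNU and BSD grep both do, though it is not required by POSIX), uses a made-up three-line sample, and restricts the middle part to non-space characters so that two matches on one line stay separate.

```shell
# Hypothetical sample: the second line holds two matches.
tmp=$(mktemp)
printf '%s\n' 'Good-Black-Cat' 'Good-Tabby-Cat Good-Gray-Cat' 'Bad-Red-Cat' > "$tmp"

# -c counts lines that contain at least one match.
lines=$(grep -c 'Good-[^ ]*-Cat' "$tmp")

# -o prints every match on its own line, so wc -l counts occurrences.
# tr strips the padding some wc implementations add.
matches=$(grep -o 'Good-[^ ]*-Cat' "$tmp" | wc -l | tr -d ' ')

echo "lines=$lines matches=$matches"
rm -f "$tmp"
```

On this sample the two numbers differ because the second line contributes one matching line but two occurrences.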



          Alternatively, with awk



          awk '/Good-.*-Cat/ {n++} END {print n}' file



          If you need to match multiple possible occurrences per line, then I'd suggest perl:



          perl -lne '$c += () = /Good-.*?-Cat/g }{ print $c' file
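If perl is not an option, awk can also count several occurrences per line, because gsub returns the number of substitutions it made. The following is only a sketch: the function name count_occurrences and the sample data are made up for illustration, and it assumes a match never contains a space (awk has no non-greedy .*?, so [^ ]* stands in for it).

```shell
# count_occurrences FILE: print the total number of matches in FILE.
# Assumption: a match contains no spaces, hence [^ ]* instead of .*
count_occurrences() {
    # Replacing each match with itself ("&") leaves the line unchanged;
    # only the return value of gsub (the number of matches) is used.
    awk '{ n += gsub(/Good-[^ ]*-Cat/, "&") } END { print n + 0 }' "$1"
}

tmp=$(mktemp)
printf '%s\n' 'Good-Black-Cat' \
    'Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat' \
    'Bad-Red-Cat' > "$tmp"
total=$(count_occurrences "$tmp")
echo "$total"
rm -f "$tmp"
```

The n + 0 in the END block makes the function print 0 rather than an empty line when nothing matches.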



          If you also need to match occurrences that may span a line boundary, you can do so in perl by unsetting the record separator with -0777 (note: this means that the whole file is slurped into memory) and adding the s regex modifier, e.g.

          perl -0777 -nE '$c += () = /Good-.*?-Cat/gs }{ say $c' file
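The same slurp-then-scan idea can be sketched in awk for systems without perl. This is an illustration under a loudly stated assumption: awk regexes have no non-greedy quantifier, so the middle part is taken to contain no hyphen ([^-]*, which does match a newline), which is narrower than the perl pattern. The function name count_across_lines and the sample data are hypothetical.

```shell
# count_across_lines FILE: count matches that may contain newlines.
# Assumption: the middle part has no hyphen, hence [^-]* (no .*? in awk).
count_across_lines() {
    awk '
        { buf = buf $0 "\n" }        # slurp the whole file into buf
        END {
            n = 0
            while (match(buf, /Good-[^-]*-Cat/)) {
                n++
                # Continue scanning just past the end of this match.
                buf = substr(buf, RSTART + RLENGTH)
            }
            print n
        }
    ' "$1"
}

tmp=$(mktemp)
# The second occurrence is split across two lines.
printf '%s\n' 'Good-Black-Cat' 'Good-Ta' 'bby-Cat' 'Bad-Red-Cat' > "$tmp"
spanning=$(count_across_lines "$tmp")
echo "$spanning"
rm -f "$tmp"
```

As with the perl version, the whole file is held in memory, so this is best suited to files of modest size.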











          edited May 27 at 11:46

























          answered May 26 at 21:56









          steeldriver












          • Thank you for your answer. In the case there is the need to count multiple occurrences per line, which would be your recommendation? I've tried the awk code and apparently it counts only matching lines as well.
            – Rodrigo Andres Nava Lara
            May 26 at 22:16










          • @RodrigoAndresNavaLara please see updated answer
            – steeldriver
            May 26 at 22:32





























          awk, multiple occurrences, space-separated

          $ awk '{for(i=1;i<=NF;i++) if($i ~ /Good-.*-Cat/) count++} END{print count}' input.txt
          4
          $ cat input.txt
          Good-Black-Cat
          Bad-Red-Cat
          Bad-Gray-Dog
          Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat
          Bad-White-Dog
          Good-Tabby-Cat
          Bad-Siamese-Cat


          sed + wc, non-multiple occurrences

          This uses the negated address /pattern/! with d (delete) to drop every non-matching line, leaving only the lines of interest.



          $ sed '/Good-.*-Cat/!d' input.txt
          Good-Black-Cat
          Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat
          Good-Tabby-Cat
          $ sed '/Good-.*-Cat/!d' input.txt | wc -l
          3


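          Shell solution, non-multiple occurrences

          Here's a shell approach that combines case...esac with a file-reading loop. Because the pattern is matched against the whole line, line 4 of input.txt (which starts with Good- and ends with -Cat) counts once, giving 3; on the question's original file the result is 2.

          $ n=0; while IFS= read -r line || [ -n "$line" ]; do case "$line" in "Good-"*"-Cat") n=$((n+1));; esac; done < input.txt; echo "$n"
          3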


          Or with indentation:

          n=0
          while IFS= read -r line || [ -n "$line" ]; do
              case "$line" in
                  "Good-"*"-Cat") n=$((n+1));;
              esac
          done < input.txt
          echo "$n"


          Explanation:




          • n=0 initializes the counter variable n.

          • while IFS= read -r line || [ -n "$line" ]; do...done < input.txt is the standard file-reading loop in shell scripting; the || [ -n "$line" ] guard handles files that don't end in a newline.

          • case "$line" in "Good-"*"-Cat") n=$((n+1));; esac matches each whole line against the glob pattern and uses $((...)) arithmetic expansion to increment the counter variable.
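The loop explained above can be wrapped in a small reusable function. This is a sketch, not part of the original answer: the name count_lines_matching and the sample file are made up, and the pattern argument is a shell glob (not a regular expression) that must match the entire line.

```shell
# count_lines_matching GLOB FILE: count lines that match GLOB as a whole.
# (Hypothetical helper; GLOB is a shell pattern, not a regex.)
count_lines_matching() {
    pattern=$1
    n=0
    while IFS= read -r line || [ -n "$line" ]; do
        case "$line" in
            # Left unquoted on purpose so it is treated as a pattern.
            $pattern) n=$((n+1)) ;;
        esac
    done < "$2"
    echo "$n"
}

tmp=$(mktemp)
# The last line deliberately has no trailing newline.
printf 'Good-Black-Cat\nBad-Red-Cat\nxGood-Tabby-Cat\nGood-Tabby-Cat' > "$tmp"

anchored=$(count_lines_matching 'Good-*-Cat' "$tmp")
anywhere=$(count_lines_matching '*Good-*-Cat*' "$tmp")
echo "anchored=$anchored anywhere=$anywhere"
rm -f "$tmp"
```

The two calls show the anchoring: 'Good-*-Cat' skips the line that begins with x, while surrounding the pattern with * accepts a match anywhere in the line.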













              edited May 26 at 23:24

























              answered May 26 at 22:58









              Sergiy Kolodyazhnyy














                  Non-fancy sed/grep version



                  sed 's/\(Good-[^ ]*-Cat\)/XXXX\n/g' input.txt | grep -c XXXX


                  Here XXXX can be any string that does not otherwise appear in your file. This approach replaces every match with XXXX followed by a newline, so that the matches can be counted with a basic grep expression.

                  By the way, if you take "Where * could be anything" literally, then at least to my understanding the count for any line would always be 0 or 1, so I am assuming that the middle part should at least not contain a space.
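The point about taking "anything" literally can be seen by comparing a greedy pattern with one that excludes spaces. A small sketch, assuming a grep with the -o option and using a made-up sample line:

```shell
line='Good-Black-Cat Bad-Red-Dog Good-Tabby-Cat'

# Greedy .* stretches from the first "Good-" to the last "-Cat",
# so the whole line is swallowed as a single match.
greedy=$(printf '%s\n' "$line" | grep -o 'Good-.*-Cat' | wc -l | tr -d ' ')

# Excluding spaces keeps each match inside one word.
bounded=$(printf '%s\n' "$line" | grep -o 'Good-[^ ]*-Cat' | wc -l | tr -d ' ')

echo "greedy=$greedy bounded=$bounded"
```

With the greedy pattern the line yields one match; with the bounded pattern it yields two.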
















                      answered May 27 at 7:27









                      Sebastian Stark
