How to count number of partial occurrences of a string in a file

I have a file in which I need to count all partial matches of an input string.

I'll show you an easy example of what I need:

In a file with this content:

Good-Black-Cat
Bad-Red-Cat
Bad-Gray-Dog
Good-Golden-Dog
Bad-White-Dog
Good-Tabby-Cat
Bad-Siamese-Cat

I need to count how many times the partial string "Good-*-Cat" (where * could be anything, it doesn't matter) appears. The expected output count is 2.

Any help will be appreciated.

asked May 26 at 21:47

Rodrigo Andres Nava Lara

414

3 Answers

Given

$ cat file
Good-Black-Cat
Bad-Red-Cat
Bad-Gray-Dog
Good-Golden-Dog
Bad-White-Dog
Good-Tabby-Cat
Bad-Siamese-Cat

then

$ grep -c 'Good-.*-Cat' file
2

Note that this is a count of matching lines - so for example it won't work for multiple occurrences per line, or for occurrences that span lines.
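If several occurrences on one line should each be counted, one possible sketch is to have grep print every match on its own line and count those lines. This assumes a grep that supports the -o option (GNU and BSD grep do), and it uses [^ ]* instead of .* so that a match cannot run across a space, which is an assumption about the data. The sample file and variable names are made up for illustration.

```shell
# Hypothetical sample data: the second line holds two occurrences.
sample=$(mktemp)
printf '%s\n' \
  'Good-Black-Cat' \
  'Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat' \
  'Bad-Siamese-Cat' > "$sample"

# -c counts matching lines only.
line_count=$(grep -c 'Good-[^ ]*-Cat' "$sample")

# -o prints each match on its own line, so wc -l counts occurrences.
match_count=$(grep -o 'Good-[^ ]*-Cat' "$sample" | wc -l)

echo "lines: $line_count, occurrences: $match_count"
rm -f "$sample"
```

On this sample the first count is 2 (matching lines) and the second is 3 (individual occurrences).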

Alternatively, with awk

awk '/Good-.*-Cat/ {n++} END {print n+0}' file

If you need to match multiple possible occurrences per line, then I'd suggest perl:

perl -lne '$c += () = /Good-.*?-Cat/g }{ print $c' file

If you also need to match occurrences that may span a line boundary, then you can do so in perl by unsetting the record separator (note: this means that the whole file is slurped into memory) and adding the s regex modifier, e.g.

perl -0777 -nE '$c += () = /Good-.*?-Cat/gs }{ say $c' file

edited May 27 at 11:46

answered May 26 at 21:56

steeldriver

62.1k1196163


Thank you for your answer. In the case there is the need to count multiple occurrences per line, which would be your recommendation? I've tried the awk code and apparently it counts only matching lines as well.
– Rodrigo Andres Nava Lara
May 26 at 22:16

@RodrigoAndresNavaLara please see updated answer
– steeldriver
May 26 at 22:32


awk, multiple occurrences, space-separated

$ awk '{ for (i = 1; i <= NF; i++) count += ($i ~ /Good-.*-Cat/) } END { print count + 0 }' input.txt
4
$ cat input.txt
Good-Black-Cat
Bad-Red-Cat
Bad-Gray-Dog
Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat
Bad-White-Dog
Good-Tabby-Cat
Bad-Siamese-Cat
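A variation on the same idea, sketched under the assumption that an occurrence never contains a space: awk's gsub() returns the number of substitutions it made, so replacing each match with itself ("&") counts occurrences without looping over fields. The temporary copy of input.txt and the variable names are illustrative only.

```shell
# Recreate the input.txt shown above in a temporary file (illustrative).
input=$(mktemp)
printf '%s\n' \
  'Good-Black-Cat' \
  'Bad-Red-Cat' \
  'Bad-Gray-Dog' \
  'Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat' \
  'Bad-White-Dog' \
  'Good-Tabby-Cat' \
  'Bad-Siamese-Cat' > "$input"

# gsub() returns how many replacements it made on the current line;
# "&" puts the matched text back, so the line itself is unchanged.
# n + 0 forces a numeric 0 when nothing matched at all.
count=$(awk '{ n += gsub(/Good-[^ ]*-Cat/, "&") } END { print n + 0 }' "$input")

echo "$count"
rm -f "$input"
```

Because the pattern uses [^ ]* rather than .*, each match stays within one space-separated word, so the two occurrences on the fourth line are counted separately.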

sed + wc, non-multiple occurrences

This uses a negated address, /pattern/!, with the d (delete) command, so that only the lines of interest remain.

$ sed '/Good-.*-Cat/!d' input.txt
Good-Black-Cat
Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat
Good-Tabby-Cat
$ sed '/Good-.*-Cat/!d' input.txt | wc -l
3

Shell solution, non-multiple occurrences

Here's a shell way that combines case...esac with a file-reading loop:

$ n=0; while IFS= read -r line || [ -n "$line" ]; do case "$line" in "Good-"*"-Cat") n=$((n+1));; esac; done < input.txt; echo "$n"
3

Or with indentation:

n=0
while IFS= read -r line || [ -n "$line" ]; do 
 case "$line" in 
 "Good-"*"-Cat") n=$((n+1));; 
 esac
done < input.txt
echo "$n"

Explanation:

n=0 initializes the counter variable n.

while IFS= read -r line || [ -n "$line" ]; do...done < input.txt is the standard file-reading loop used in shell scripting; the || [ -n "$line" ] guard handles files that don't end in a newline.

case "$line" in "Good-"*"-Cat") n=$((n+1));; esac matches the whole line against the desired pattern, and the $((...)) arithmetic expansion increments the counter variable.
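To see what the || [ -n "$line" ] guard buys you, here is a small sketch that runs the loop with and without it on a file whose last line has no trailing newline. The temporary file and the variable names are hypothetical, chosen only for this illustration.

```shell
# Hypothetical test file whose last line has no trailing newline.
nonl=$(mktemp)
printf 'Good-Black-Cat\nBad-Red-Cat\nGood-Tabby-Cat' > "$nonl"

# With the guard: read fails on the unterminated last line but still
# fills $line, so the extra test lets the loop body run once more.
with_guard=0
while IFS= read -r line || [ -n "$line" ]; do
  case "$line" in
    "Good-"*"-Cat") with_guard=$((with_guard + 1)) ;;
  esac
done < "$nonl"

# Without the guard: the loop ends as soon as read fails, so the
# unterminated last line is never examined.
without_guard=0
while IFS= read -r line; do
  case "$line" in
    "Good-"*"-Cat") without_guard=$((without_guard + 1)) ;;
  esac
done < "$nonl"

echo "with guard: $with_guard, without guard: $without_guard"
rm -f "$nonl"
```

The guarded loop counts both matching lines, while the unguarded one misses the last line and reports one fewer.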

edited May 26 at 23:24

answered May 26 at 22:58

Sergiy Kolodyazhnyy

64k9127274


Non-fancy sed/grep version

sed 's/\(Good-[^ ]*-Cat\)/XXXX\n/g' input.txt | grep -c XXXX

Here XXXX can be any string that does not otherwise appear in your file. This approach replaces every match with XXXX followed by a newline, so that the matches can be counted with a basic grep expression.

By the way, if you take "Where * could be anything" literally, then at least to my understanding the output of any such program would always be 0 or 1, so I am assuming that it should at least not contain a space.
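Under that same no-space assumption, a possible variation that needs no placeholder string is to put every space-separated word on its own line with tr and then count the words that match as a whole. This is only a sketch; it also avoids relying on sed interpreting \n in the replacement, which as far as I know is not guaranteed outside GNU sed. The sample file and variable name are made up for illustration.

```shell
# Hypothetical sample data with several words on one line.
sample=$(mktemp)
printf '%s\n' \
  'Good-Black-Cat' \
  'Good-Golden-Dog Good-Whatever-Cat Good-Something-Cat' \
  'Bad-Siamese-Cat' > "$sample"

# tr turns each space into a newline (-s squeezes repeated newlines),
# so every word sits on its own line; the anchored pattern then has to
# match a whole word, and grep -c counts those words.
word_count=$(tr -s ' ' '\n' < "$sample" | grep -c '^Good-.*-Cat$')

echo "$word_count"
rm -f "$sample"
```

Because each word is matched separately, .* cannot reach across a space here, so the greedy pattern is safe in this form.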

answered May 27 at 7:27

Sebastian Stark

4,603838
