Find an identical file with a different name [duplicate]

Clash Royale CLAN TAG#URR8PPP up vote
7
down vote
favorite
This question already has an answer here:
How to find (and delete) duplicate files
8 answers
Is it possible to find a without knowing its name?
I created a file with LaTex, then I copied it into another local directory and renamed the pdf. I don't know any more where the original file is located, but I have the renamed file on hand. I would like to make some modification to my latex file and recreate the pdf.
Since I know the original file is exactly the same as the renamed one except for the name, is there any way I can find my original file?
command-line files text-processing
marked as duplicate by muru
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Feb 28 at 13:53
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
 |Â
show 2 more comments
up vote
7
down vote
favorite
This question already has an answer here:
How to find (and delete) duplicate files
8 answers
Is it possible to find a without knowing its name?
I created a file with LaTex, then I copied it into another local directory and renamed the pdf. I don't know any more where the original file is located, but I have the renamed file on hand. I would like to make some modification to my latex file and recreate the pdf.
Since I know the original file is exactly the same as the renamed one except for the name, is there any way I can find my original file?
command-line files text-processing
marked as duplicate by muru
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Feb 28 at 13:53
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
if you created your copy withcp -pas I usually do, you may have the date/time/size of the file, and I'd use that tofindit. (or I'dfindall files of the same date (or size..) etc..)
â guiverc
Feb 28 at 10:51
1
Choose an appropriate checksum method - like md5. Then calculate it for your renamed file and all potential files in your file system. The same file will have the same checksum.
â Thorbjørn Ravn Andersen
Feb 28 at 11:45
1
Yes, @ThorbjørnRavnAndersen but the same file will also have the same size which is by far easier to determine than its md5. See pa4080's excellent answer.
â PerlDuck
Feb 28 at 11:57
It's not so difficult to use md5sum comparison (alongside the size comparison). I've updated my answer with this idea.
â pa4080
Feb 28 at 13:01
1
True @Thorbjørn but theres no point in calculating MD5s of files with different sizes. Sure they will be different. First find all files with the same size as your sample file and only then you need to calculate a hash for that handful of files. But not for all files. No need to calc MD5 for all 2GB videos if you're searching a 953 bytes text file.
â PerlDuck
Feb 28 at 14:40
 |Â
show 2 more comments
up vote
7
down vote
favorite
up vote
7
down vote
favorite
This question already has an answer here:
How to find (and delete) duplicate files
8 answers
Is it possible to find a without knowing its name?
I created a file with LaTex, then I copied it into another local directory and renamed the pdf. I don't know any more where the original file is located, but I have the renamed file on hand. I would like to make some modification to my latex file and recreate the pdf.
Since I know the original file is exactly the same as the renamed one except for the name, is there any way I can find my original file?
command-line files text-processing
This question already has an answer here:
How to find (and delete) duplicate files
8 answers
Is it possible to find a without knowing its name?
I created a file with LaTex, then I copied it into another local directory and renamed the pdf. I don't know any more where the original file is located, but I have the renamed file on hand. I would like to make some modification to my latex file and recreate the pdf.
Since I know the original file is exactly the same as the renamed one except for the name, is there any way I can find my original file?
This question already has an answer here:
How to find (and delete) duplicate files
8 answers
command-line files text-processing
command-line files text-processing
edited Feb 28 at 10:40
Zanna
48.2k13120228
48.2k13120228
asked Feb 28 at 10:20
dmx
5331520
5331520
marked as duplicate by muru
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Feb 28 at 13:53
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
marked as duplicate by muru
StackExchange.ready(function()
if (StackExchange.options.isMobile) return;
$('.dupe-hammer-message-hover:not(.hover-bound)').each(function()
var $hover = $(this).addClass('hover-bound'),
$msg = $hover.siblings('.dupe-hammer-message');
$hover.hover(
function()
$hover.showInfoMessage('',
messageElement: $msg.clone().show(),
transient: false,
position: my: 'bottom left', at: 'top center', offsetTop: -7 ,
dismissable: false,
relativeToBody: true
);
,
function()
StackExchange.helpers.removeMessages();
);
);
);
Feb 28 at 13:53
This question has been asked before and already has an answer. If those answers do not fully address your question, please ask a new question.
if you created your copy withcp -pas I usually do, you may have the date/time/size of the file, and I'd use that tofindit. (or I'dfindall files of the same date (or size..) etc..)
â guiverc
Feb 28 at 10:51
1
Choose an appropriate checksum method - like md5. Then calculate it for your renamed file and all potential files in your file system. The same file will have the same checksum.
â Thorbjørn Ravn Andersen
Feb 28 at 11:45
1
Yes, @ThorbjørnRavnAndersen but the same file will also have the same size which is by far easier to determine than its md5. See pa4080's excellent answer.
â PerlDuck
Feb 28 at 11:57
It's not so difficult to use md5sum comparison (alongside the size comparison). I've updated my answer with this idea.
â pa4080
Feb 28 at 13:01
1
True @Thorbjørn but theres no point in calculating MD5s of files with different sizes. Sure they will be different. First find all files with the same size as your sample file and only then you need to calculate a hash for that handful of files. But not for all files. No need to calc MD5 for all 2GB videos if you're searching a 953 bytes text file.
â PerlDuck
Feb 28 at 14:40
 |Â
show 2 more comments
if you created your copy withcp -pas I usually do, you may have the date/time/size of the file, and I'd use that tofindit. (or I'dfindall files of the same date (or size..) etc..)
â guiverc
Feb 28 at 10:51
1
Choose an appropriate checksum method - like md5. Then calculate it for your renamed file and all potential files in your file system. The same file will have the same checksum.
â Thorbjørn Ravn Andersen
Feb 28 at 11:45
1
Yes, @ThorbjørnRavnAndersen but the same file will also have the same size which is by far easier to determine than its md5. See pa4080's excellent answer.
â PerlDuck
Feb 28 at 11:57
It's not so difficult to use md5sum comparison (alongside the size comparison). I've updated my answer with this idea.
â pa4080
Feb 28 at 13:01
1
True @Thorbjørn but theres no point in calculating MD5s of files with different sizes. Sure they will be different. First find all files with the same size as your sample file and only then you need to calculate a hash for that handful of files. But not for all files. No need to calc MD5 for all 2GB videos if you're searching a 953 bytes text file.
â PerlDuck
Feb 28 at 14:40
if you created your copy with
cp -p as I usually do, you may have the date/time/size of the file, and I'd use that to find it. (or I'd find all files of the same date (or size..) etc..)â guiverc
Feb 28 at 10:51
if you created your copy with
cp -p as I usually do, you may have the date/time/size of the file, and I'd use that to find it. (or I'd find all files of the same date (or size..) etc..)â guiverc
Feb 28 at 10:51
1
1
Choose an appropriate checksum method - like md5. Then calculate it for your renamed file and all potential files in your file system. The same file will have the same checksum.
â Thorbjørn Ravn Andersen
Feb 28 at 11:45
Choose an appropriate checksum method - like md5. Then calculate it for your renamed file and all potential files in your file system. The same file will have the same checksum.
â Thorbjørn Ravn Andersen
Feb 28 at 11:45
1
1
Yes, @ThorbjørnRavnAndersen but the same file will also have the same size which is by far easier to determine than its md5. See pa4080's excellent answer.
â PerlDuck
Feb 28 at 11:57
Yes, @ThorbjørnRavnAndersen but the same file will also have the same size which is by far easier to determine than its md5. See pa4080's excellent answer.
â PerlDuck
Feb 28 at 11:57
It's not so difficult to use md5sum comparison (alongside the size comparison). I've updated my answer with this idea.
â pa4080
Feb 28 at 13:01
It's not so difficult to use md5sum comparison (alongside the size comparison). I've updated my answer with this idea.
â pa4080
Feb 28 at 13:01
1
1
True @Thorbjørn but theres no point in calculating MD5s of files with different sizes. Sure they will be different. First find all files with the same size as your sample file and only then you need to calculate a hash for that handful of files. But not for all files. No need to calc MD5 for all 2GB videos if you're searching a 953 bytes text file.
â PerlDuck
Feb 28 at 14:40
True @Thorbjørn but theres no point in calculating MD5s of files with different sizes. Sure they will be different. First find all files with the same size as your sample file and only then you need to calculate a hash for that handful of files. But not for all files. No need to calc MD5 for all 2GB videos if you're searching a 953 bytes text file.
â PerlDuck
Feb 28 at 14:40
 |Â
show 2 more comments
5 Answers
5
active
oldest
votes
up vote
5
down vote
When the only difference is the name booth files should have the same content and size.
1. About the content. We can compare two files by the command diff file-1 file-2. Also we can use this command for a test in this way:
diff -q file-1 file-2 > /dev/null && echo 'equal' || echo 'different'
2. About the size. We can find a file with certain size by the command (where 12672 is the file size in bytes):
find /path/to/search -type f -size 12672c -printf '%pn'
Or we can use a range in this way (where 12600-12700 is the file size range in bytes):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pn'
Note that, by default the command find works recursively.
3. Combine the two methods (where file-1 is our pattern file):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
4. Example. Let's assume we have the following directory structure:
$ tree /tmp/test
/tmp/test
âÂÂâÂÂâ file-1 # this is the pattern file
âÂÂâÂÂâ file-2 # this is almost the same file but wit few additional characters
âÂÂâÂÂâ file-3 # this is exact copy of file-1
The result of the above command will be:
$ find /tmp/test -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
/tmp/test/file-2 different # OK: here we have added few additional characters
/tmp/test/file-3 equal # OK: this is exact copy of file-1
/tmp/test/file-1 equal # OK: this is file-1 compared to its self
Or we can simplify the output by changing our command in this way:
$ find /tmp/test -type f -not -name "file-1" -size -12700c -size +12600c
-exec sh -c 'diff -q file-1 "$1" > /dev/null && printf "%stis equaln" "$1"' sh ;
/tmp/test/file-3 is equal
Update from the comments. The following command finding for file with the same size as the file-1, and then the diff commas is involved with --brief and --report-identical-files options:
find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec diff -qs file-1 ;
Files file-1 and /tmp/test/file-3 are identical
We can compare md5sum of the files in this way:
Get the md5sum of the pattern file:
$ md5sum file-1
d18b61a77779d69e095be5942f6be7a7 file-1Use it with our command:
$ find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec sh -c 'echo "d18b61a77779d69e095be5942f6be7a7 $1" | md5sum -c -' sh ;
/tmp/test/file-3: OK
2
+1 for using the size in first place. By the way, the diff command line can be simplified:diff -qs file-1 file-2will do the job :-)
â sudodus
Feb 28 at 11:42
2
@sudodus The-qsprobably speeds updiffbecause it can stop comparing the files when the first mismatch is found.
â PerlDuck
Feb 28 at 11:54
1
cmp -sis better thandiffwhen the goal is simply to tell if two files are equal.
â John Kugelman
Feb 28 at 13:54
1
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
@JohnKugelman, I followed your advice and started to usecmp -sfor similar cases: askubuntu.com/a/1014921/566421 :-)
â pa4080
Mar 14 at 16:06
add a comment |Â
up vote
3
down vote
- You can search for a particular string with
grep -rl "string"
(-r for recursive, finding the string in files, -l for showing the filename, not the string)
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
add a comment |Â
up vote
3
down vote
If you're looking for (or okay with) a GUI application, you may try the "FSlint Janitor" application. You can install it by running
sudo apt-get install fslint
How to use the application:
Once installed, follow the steps below.
- Launch the application.
- Select the "Duplicates" option (1) to search files with the same content.
- Click on the "+ Add" button (2) and select the directories to look for files (make sure the "recurse" option is checked to include sub-directories).
- Click on the "Find" button (3) and wait.

add a comment |Â
up vote
3
down vote
This might take a while, but it should be effective and reliable. It assumes you are using Bash. Replace file with the name of your renamed file:
shopt -s globstar
for i in **; do [ -f "$i" ] && cmp --silent file "$i" && echo "$i"; done
shopt -s globstarturns on recursive globbing with**. You can turn it off withshopt -u globstar, but it is off by default and will be off when you open a new shell.for i in **loop over all files below this one. Run the command from the highest level directory that might contain the file or the directory with the file or the directory... (apply recursion to this sentence!)[ -f "$i" ] &&if the file is a regular file that exists then...cmp --silent file "$i" &&if there is no difference betweenfileand the file being examined (ie ifcmpexits successfully), then...echo $iprint the relative path of the file (this also prints the path offileitself, but I didn't see much benefit in fixing that).
Thanks to this answer on Stack Overflow for the cmp method of comparing files.
add a comment |Â
up vote
2
down vote
Grep can find it quickly
When used properly, the grep command can find the duplicate quickly. You must be careful not to search the whole file system or it will take days to complete. I recently documented this here: `grep`ing all files for a string takes a long time
For optimum speed use:
grep -rnw --exclude-dir=boot,dev,lib,media,mnt,proc,root,run,sys,/tmp,tmpfs,var '/' -e 'String in file'
If your file might be on a Windows directory remove the mnt directory.
If you know the file is within the /home directory someplace you can shorten the command:
grep -rnw '/home' -e 'String in file'
add a comment |Â
5 Answers
5
active
oldest
votes
5 Answers
5
active
oldest
votes
active
oldest
votes
active
oldest
votes
up vote
5
down vote
When the only difference is the name booth files should have the same content and size.
1. About the content. We can compare two files by the command diff file-1 file-2. Also we can use this command for a test in this way:
diff -q file-1 file-2 > /dev/null && echo 'equal' || echo 'different'
2. About the size. We can find a file with certain size by the command (where 12672 is the file size in bytes):
find /path/to/search -type f -size 12672c -printf '%pn'
Or we can use a range in this way (where 12600-12700 is the file size range in bytes):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pn'
Note that, by default the command find works recursively.
3. Combine the two methods (where file-1 is our pattern file):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
4. Example. Let's assume we have the following directory structure:
$ tree /tmp/test
/tmp/test
âÂÂâÂÂâ file-1 # this is the pattern file
âÂÂâÂÂâ file-2 # this is almost the same file but wit few additional characters
âÂÂâÂÂâ file-3 # this is exact copy of file-1
The result of the above command will be:
$ find /tmp/test -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
/tmp/test/file-2 different # OK: here we have added few additional characters
/tmp/test/file-3 equal # OK: this is exact copy of file-1
/tmp/test/file-1 equal # OK: this is file-1 compared to its self
Or we can simplify the output by changing our command in this way:
$ find /tmp/test -type f -not -name "file-1" -size -12700c -size +12600c
-exec sh -c 'diff -q file-1 "$1" > /dev/null && printf "%stis equaln" "$1"' sh ;
/tmp/test/file-3 is equal
Update from the comments. The following command finding for file with the same size as the file-1, and then the diff commas is involved with --brief and --report-identical-files options:
find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec diff -qs file-1 ;
Files file-1 and /tmp/test/file-3 are identical
We can compare md5sum of the files in this way:
Get the md5sum of the pattern file:
$ md5sum file-1
d18b61a77779d69e095be5942f6be7a7 file-1Use it with our command:
$ find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec sh -c 'echo "d18b61a77779d69e095be5942f6be7a7 $1" | md5sum -c -' sh ;
/tmp/test/file-3: OK
2
+1 for using the size in first place. By the way, the diff command line can be simplified:diff -qs file-1 file-2will do the job :-)
â sudodus
Feb 28 at 11:42
2
@sudodus The-qsprobably speeds updiffbecause it can stop comparing the files when the first mismatch is found.
â PerlDuck
Feb 28 at 11:54
1
cmp -sis better thandiffwhen the goal is simply to tell if two files are equal.
â John Kugelman
Feb 28 at 13:54
1
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
@JohnKugelman, I followed your advice and started to usecmp -sfor similar cases: askubuntu.com/a/1014921/566421 :-)
â pa4080
Mar 14 at 16:06
add a comment |Â
up vote
5
down vote
When the only difference is the name booth files should have the same content and size.
1. About the content. We can compare two files by the command diff file-1 file-2. Also we can use this command for a test in this way:
diff -q file-1 file-2 > /dev/null && echo 'equal' || echo 'different'
2. About the size. We can find a file with certain size by the command (where 12672 is the file size in bytes):
find /path/to/search -type f -size 12672c -printf '%pn'
Or we can use a range in this way (where 12600-12700 is the file size range in bytes):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pn'
Note that, by default the command find works recursively.
3. Combine the two methods (where file-1 is our pattern file):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
4. Example. Let's assume we have the following directory structure:
$ tree /tmp/test
/tmp/test
âÂÂâÂÂâ file-1 # this is the pattern file
âÂÂâÂÂâ file-2 # this is almost the same file but wit few additional characters
âÂÂâÂÂâ file-3 # this is exact copy of file-1
The result of the above command will be:
$ find /tmp/test -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
/tmp/test/file-2 different # OK: here we have added few additional characters
/tmp/test/file-3 equal # OK: this is exact copy of file-1
/tmp/test/file-1 equal # OK: this is file-1 compared to its self
Or we can simplify the output by changing our command in this way:
$ find /tmp/test -type f -not -name "file-1" -size -12700c -size +12600c
-exec sh -c 'diff -q file-1 "$1" > /dev/null && printf "%stis equaln" "$1"' sh ;
/tmp/test/file-3 is equal
Update from the comments. The following command finding for file with the same size as the file-1, and then the diff commas is involved with --brief and --report-identical-files options:
find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec diff -qs file-1 ;
Files file-1 and /tmp/test/file-3 are identical
We can compare md5sum of the files in this way:
Get the md5sum of the pattern file:
$ md5sum file-1
d18b61a77779d69e095be5942f6be7a7 file-1Use it with our command:
$ find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec sh -c 'echo "d18b61a77779d69e095be5942f6be7a7 $1" | md5sum -c -' sh ;
/tmp/test/file-3: OK
2
+1 for using the size in first place. By the way, the diff command line can be simplified:diff -qs file-1 file-2will do the job :-)
â sudodus
Feb 28 at 11:42
2
@sudodus The-qsprobably speeds updiffbecause it can stop comparing the files when the first mismatch is found.
â PerlDuck
Feb 28 at 11:54
1
cmp -sis better thandiffwhen the goal is simply to tell if two files are equal.
â John Kugelman
Feb 28 at 13:54
1
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
@JohnKugelman, I followed your advice and started to usecmp -sfor similar cases: askubuntu.com/a/1014921/566421 :-)
â pa4080
Mar 14 at 16:06
add a comment |Â
up vote
5
down vote
up vote
5
down vote
When the only difference is the name booth files should have the same content and size.
1. About the content. We can compare two files by the command diff file-1 file-2. Also we can use this command for a test in this way:
diff -q file-1 file-2 > /dev/null && echo 'equal' || echo 'different'
2. About the size. We can find a file with certain size by the command (where 12672 is the file size in bytes):
find /path/to/search -type f -size 12672c -printf '%pn'
Or we can use a range in this way (where 12600-12700 is the file size range in bytes):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pn'
Note that, by default the command find works recursively.
3. Combine the two methods (where file-1 is our pattern file):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
4. Example. Let's assume we have the following directory structure:
$ tree /tmp/test
/tmp/test
âÂÂâÂÂâ file-1 # this is the pattern file
âÂÂâÂÂâ file-2 # this is almost the same file but wit few additional characters
âÂÂâÂÂâ file-3 # this is exact copy of file-1
The result of the above command will be:
$ find /tmp/test -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
/tmp/test/file-2 different # OK: here we have added few additional characters
/tmp/test/file-3 equal # OK: this is exact copy of file-1
/tmp/test/file-1 equal # OK: this is file-1 compared to its self
Or we can simplify the output by changing our command in this way:
$ find /tmp/test -type f -not -name "file-1" -size -12700c -size +12600c
-exec sh -c 'diff -q file-1 "$1" > /dev/null && printf "%stis equaln" "$1"' sh ;
/tmp/test/file-3 is equal
Update from the comments. The following command finding for file with the same size as the file-1, and then the diff commas is involved with --brief and --report-identical-files options:
find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec diff -qs file-1 ;
Files file-1 and /tmp/test/file-3 are identical
We can compare md5sum of the files in this way:
Get the md5sum of the pattern file:
$ md5sum file-1
d18b61a77779d69e095be5942f6be7a7 file-1Use it with our command:
$ find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec sh -c 'echo "d18b61a77779d69e095be5942f6be7a7 $1" | md5sum -c -' sh ;
/tmp/test/file-3: OK
When the only difference is the name booth files should have the same content and size.
1. About the content. We can compare two files by the command diff file-1 file-2. Also we can use this command for a test in this way:
diff -q file-1 file-2 > /dev/null && echo 'equal' || echo 'different'
2. About the size. We can find a file with certain size by the command (where 12672 is the file size in bytes):
find /path/to/search -type f -size 12672c -printf '%pn'
Or we can use a range in this way (where 12600-12700 is the file size range in bytes):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pn'
Note that, by default the command find works recursively.
3. Combine the two methods (where file-1 is our pattern file):
find /path/to/search -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
4. Example. Let's assume we have the following directory structure:
$ tree /tmp/test
/tmp/test
âÂÂâÂÂâ file-1 # this is the pattern file
âÂÂâÂÂâ file-2 # this is almost the same file but wit few additional characters
âÂÂâÂÂâ file-3 # this is exact copy of file-1
The result of the above command will be:
$ find /tmp/test -type f -size -12700c -size +12600c -printf '%pt' -exec sh -c 'diff -q file-1 "$1" > /dev/null && echo "equal" || echo "different"' sh ;
/tmp/test/file-2 different # OK: here we have added few additional characters
/tmp/test/file-3 equal # OK: this is exact copy of file-1
/tmp/test/file-1 equal # OK: this is file-1 compared to its self
Or we can simplify the output by changing our command in this way:
$ find /tmp/test -type f -not -name "file-1" -size -12700c -size +12600c
-exec sh -c 'diff -q file-1 "$1" > /dev/null && printf "%stis equaln" "$1"' sh ;
/tmp/test/file-3 is equal
Update from the comments. The following command finding for file with the same size as the file-1, and then the diff commas is involved with --brief and --report-identical-files options:
find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec diff -qs file-1 ;
Files file-1 and /tmp/test/file-3 are identical
We can compare md5sum of the files in this way:
Get the md5sum of the pattern file:
$ md5sum file-1
d18b61a77779d69e095be5942f6be7a7 file-1Use it with our command:
$ find /path -type f -not -name "file-1" -size $(stat -c%s file-1)c -exec sh -c 'echo "d18b61a77779d69e095be5942f6be7a7 $1" | md5sum -c -' sh ;
/tmp/test/file-3: OK
edited Feb 28 at 13:58
ilkkachu
981210
981210
answered Feb 28 at 11:20
pa4080
12.3k52256
12.3k52256
2
+1 for using the size in first place. By the way, the diff command line can be simplified:diff -qs file-1 file-2will do the job :-)
â sudodus
Feb 28 at 11:42
2
@sudodus The-qsprobably speeds updiffbecause it can stop comparing the files when the first mismatch is found.
â PerlDuck
Feb 28 at 11:54
1
cmp -sis better thandiffwhen the goal is simply to tell if two files are equal.
â John Kugelman
Feb 28 at 13:54
1
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
@JohnKugelman, I followed your advice and started to usecmp -sfor similar cases: askubuntu.com/a/1014921/566421 :-)
â pa4080
Mar 14 at 16:06
add a comment |Â
2
+1 for using the size in first place. By the way, the diff command line can be simplified:diff -qs file-1 file-2will do the job :-)
â sudodus
Feb 28 at 11:42
2
@sudodus The-qsprobably speeds updiffbecause it can stop comparing the files when the first mismatch is found.
â PerlDuck
Feb 28 at 11:54
1
cmp -sis better thandiffwhen the goal is simply to tell if two files are equal.
â John Kugelman
Feb 28 at 13:54
1
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
@JohnKugelman, I followed your advice and started to usecmp -sfor similar cases: askubuntu.com/a/1014921/566421 :-)
â pa4080
Mar 14 at 16:06
2
2
+1 for using the size in first place. By the way, the diff command line can be simplified:
diff -qs file-1 file-2 will do the job :-)â sudodus
Feb 28 at 11:42
+1 for using the size in first place. By the way, the diff command line can be simplified:
diff -qs file-1 file-2 will do the job :-)â sudodus
Feb 28 at 11:42
2
2
@sudodus The
-qs probably speeds up diffbecause it can stop comparing the files when the first mismatch is found.â PerlDuck
Feb 28 at 11:54
@sudodus The
-qs probably speeds up diffbecause it can stop comparing the files when the first mismatch is found.â PerlDuck
Feb 28 at 11:54
1
1
cmp -s is better than diff when the goal is simply to tell if two files are equal.â John Kugelman
Feb 28 at 13:54
cmp -s is better than diff when the goal is simply to tell if two files are equal.â John Kugelman
Feb 28 at 13:54
1
1
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
You need to learn about xargs
â Thorbjørn Ravn Andersen
Feb 28 at 14:10
@JohnKugelman, I followed your advice and started to use
cmp -s for similar cases: askubuntu.com/a/1014921/566421 :-)â pa4080
Mar 14 at 16:06
@JohnKugelman, I followed your advice and started to use
cmp -s for similar cases: askubuntu.com/a/1014921/566421 :-)â pa4080
Mar 14 at 16:06
add a comment |Â
up vote
3
down vote
- You can search for a particular string with
grep -rl "string"
(-r for recursive, finding the string in files, -l for showing the filename, not the string)
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
add a comment |Â
up vote
3
down vote
- You can search for a particular string with
grep -rl "string"
(-r for recursive, finding the string in files, -l for showing the filename, not the string)
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
add a comment |Â
up vote
3
down vote
up vote
3
down vote
- You can search for a particular string with
grep -rl "string"
(-r for recursive, finding the string in files, -l for showing the filename, not the string)
- You can search for a particular string with
grep -rl "string"
(-r for recursive, finding the string in files, -l for showing the filename, not the string)
answered Feb 28 at 10:35
Simon Van Machin
23010
23010
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
add a comment |Â
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
you shouldn't use the -i as mentioned in @mrfred489 's link as files are identical and it can slow things down
â Simon Van Machin
Feb 28 at 10:36
add a comment |Â
up vote
3
down vote
If you're looking for (or okay with) a GUI application, you may try the "FSlint Janitor" application. You can install it by running
sudo apt-get install fslint
How to use the application:
Once installed, follow the steps below.
- Launch the application.
- Select the "Duplicates" option (1) to search files with the same content.
- Click on the "+ Add" button (2) and select the directories to look for files (make sure the "recurse" option is checked to include sub-directories).
- Click on the "Find" button (3) and wait.

add a comment |Â
up vote
3
down vote
If you're looking for (or okay with) a GUI application, you may try the "FSlint Janitor" application. You can install it by running
sudo apt-get install fslint
How to use the application:
Once installed, follow the steps below.
- Launch the application.
- Select the "Duplicates" option (1) to search files with the same content.
- Click on the "+ Add" button (2) and select the directories to look for files (make sure the "recurse" option is checked to include sub-directories).
- Click on the "Find" button (3) and wait.

add a comment |Â
up vote
3
down vote
up vote
3
down vote
If you're looking for (or okay with) a GUI application, you may try the "FSlint Janitor" application. You can install it by running
sudo apt-get install fslint
How to use the application:
Once installed, follow the steps below.
- Launch the application.
- Select the "Duplicates" option (1) to search files with the same content.
- Click on the "+ Add" button (2) and select the directories to look for files (make sure the "recurse" option is checked to include sub-directories).
- Click on the "Find" button (3) and wait.

If you're looking for (or okay with) a GUI application, you may try the "FSlint Janitor" application. You can install it by running
sudo apt-get install fslint
How to use the application:
Once installed, follow the steps below.
- Launch the application.
- Select the "Duplicates" option (1) to search files with the same content.
- Click on the "+ Add" button (2) and select the directories to look for files (make sure the "recurse" option is checked to include sub-directories).
- Click on the "Find" button (3) and wait.

answered Feb 28 at 11:18
pomsky
23.3k77299
23.3k77299
add a comment |Â
add a comment |Â
up vote
3
down vote
This might take a while, but it should be effective and reliable. It assumes you are using Bash. Replace file with the name of your renamed file:
shopt -s globstar
for i in **; do [ -f "$i" ] && cmp --silent file "$i" && echo "$i"; done
shopt -s globstarturns on recursive globbing with**. You can turn it off withshopt -u globstar, but it is off by default and will be off when you open a new shell.for i in **loop over all files below this one. Run the command from the highest level directory that might contain the file or the directory with the file or the directory... (apply recursion to this sentence!)[ -f "$i" ] &&if the file is a regular file that exists then...cmp --silent file "$i" &&if there is no difference betweenfileand the file being examined (ie ifcmpexits successfully), then...echo $iprint the relative path of the file (this also prints the path offileitself, but I didn't see much benefit in fixing that).
Thanks to this answer on Stack Overflow for the cmp method of comparing files.
add a comment |Â
up vote
3
down vote
This might take a while, but it should be effective and reliable. It assumes you are using Bash. Replace file with the name of your renamed file:
shopt -s globstar
for i in **; do [ -f "$i" ] && cmp --silent file "$i" && echo "$i"; done
shopt -s globstarturns on recursive globbing with**. You can turn it off withshopt -u globstar, but it is off by default and will be off when you open a new shell.for i in **loop over all files below this one. Run the command from the highest level directory that might contain the file or the directory with the file or the directory... (apply recursion to this sentence!)[ -f "$i" ] &&if the file is a regular file that exists then...cmp --silent file "$i" &&if there is no difference betweenfileand the file being examined (ie ifcmpexits successfully), then...echo $iprint the relative path of the file (this also prints the path offileitself, but I didn't see much benefit in fixing that).
Thanks to this answer on Stack Overflow for the cmp method of comparing files.
add a comment |Â
up vote
3
down vote
up vote
3
down vote
This might take a while, but it should be effective and reliable. It assumes you are using Bash. Replace file with the name of your renamed file:
shopt -s globstar
for i in **; do [ -f "$i" ] && cmp --silent file "$i" && echo "$i"; done
shopt -s globstarturns on recursive globbing with**. You can turn it off withshopt -u globstar, but it is off by default and will be off when you open a new shell.for i in **loop over all files below this one. Run the command from the highest level directory that might contain the file or the directory with the file or the directory... (apply recursion to this sentence!)[ -f "$i" ] &&if the file is a regular file that exists then...cmp --silent file "$i" &&if there is no difference betweenfileand the file being examined (ie ifcmpexits successfully), then...echo $iprint the relative path of the file (this also prints the path offileitself, but I didn't see much benefit in fixing that).
Thanks to this answer on Stack Overflow for the cmp method of comparing files.
This might take a while, but it should be effective and reliable. It assumes you are using Bash. Replace file with the name of your renamed file:
shopt -s globstar
for i in **; do [ -f "$i" ] && cmp --silent file "$i" && echo "$i"; done
shopt -s globstarturns on recursive globbing with**. You can turn it off withshopt -u globstar, but it is off by default and will be off when you open a new shell.for i in **loop over all files below this one. Run the command from the highest level directory that might contain the file or the directory with the file or the directory... (apply recursion to this sentence!)[ -f "$i" ] &&if the file is a regular file that exists then...cmp --silent file "$i" &&if there is no difference betweenfileand the file being examined (ie ifcmpexits successfully), then...echo $iprint the relative path of the file (this also prints the path offileitself, but I didn't see much benefit in fixing that).
Thanks to this answer on Stack Overflow for the cmp method of comparing files.
edited Feb 28 at 11:29
answered Feb 28 at 10:58
Zanna
48.2k13120228
48.2k13120228
add a comment |Â
add a comment |Â
up vote
2
down vote
Grep can find it quickly
When used properly, the grep command can find the duplicate quickly. You must be careful not to search the whole file system or it will take days to complete. I recently documented this here: `grep`ing all files for a string takes a long time
For optimum speed use:
grep -rnw --exclude-dir=boot,dev,lib,media,mnt,proc,root,run,sys,/tmp,tmpfs,var '/' -e 'String in file'
If your file might be on a Windows directory remove the mnt directory.
If you know the file is within the /home directory someplace you can shorten the command:
grep -rnw '/home' -e 'String in file'
add a comment |Â
up vote
2
down vote
Grep can find it quickly
When used properly, the grep command can find the duplicate quickly. You must be careful not to search the whole file system or it will take days to complete. I recently documented this here: `grep`ing all files for a string takes a long time
For optimum speed use:
grep -rnw --exclude-dir=boot,dev,lib,media,mnt,proc,root,run,sys,/tmp,tmpfs,var '/' -e 'String in file'
If your file might be on a Windows directory remove the mnt directory.
If you know the file is within the /home directory someplace you can shorten the command:
grep -rnw '/home' -e 'String in file'
add a comment |Â
up vote
2
down vote
up vote
2
down vote
Grep can find it quickly
When used properly, the grep command can find the duplicate quickly. You must be careful not to search the whole file system or it will take days to complete. I recently documented this here: `grep`ing all files for a string takes a long time
For optimum speed use:
grep -rnw --exclude-dir=boot,dev,lib,media,mnt,proc,root,run,sys,/tmp,tmpfs,var '/' -e 'String in file'
If your file might be on a Windows directory remove the mnt directory.
If you know the file is within the /home directory someplace you can shorten the command:
grep -rnw '/home' -e 'String in file'
Grep can find it quickly
When used properly, the grep command can find the duplicate quickly. You must be careful not to search the whole file system or it will take days to complete. I recently documented this here: `grep`ing all files for a string takes a long time
For optimum speed use:
grep -rnw --exclude-dir=boot,dev,lib,media,mnt,proc,root,run,sys,/tmp,tmpfs,var '/' -e 'String in file'
If your file might be on a Windows directory remove the mnt directory.
If you know the file is within the /home directory someplace you can shorten the command:
grep -rnw '/home' -e 'String in file'
answered Feb 28 at 11:18
WinEunuuchs2Unix
36.2k759134
36.2k759134
add a comment |Â
add a comment |Â
if you created your copy with
cp -pas I usually do, you may have the date/time/size of the file, and I'd use that tofindit. (or I'dfindall files of the same date (or size..) etc..)â guiverc
Feb 28 at 10:51
1
Choose an appropriate checksum method - like md5. Then calculate it for your renamed file and all potential files in your file system. The same file will have the same checksum.
â Thorbjørn Ravn Andersen
Feb 28 at 11:45
1
Yes, @ThorbjørnRavnAndersen but the same file will also have the same size which is by far easier to determine than its md5. See pa4080's excellent answer.
â PerlDuck
Feb 28 at 11:57
It's not so difficult to use md5sum comparison (alongside the size comparison). I've updated my answer with this idea.
â pa4080
Feb 28 at 13:01
1
True @Thorbjørn but theres no point in calculating MD5s of files with different sizes. Sure they will be different. First find all files with the same size as your sample file and only then you need to calculate a hash for that handful of files. But not for all files. No need to calc MD5 for all 2GB videos if you're searching a 953 bytes text file.
â PerlDuck
Feb 28 at 14:40