One liner to find and remove duplicate files in Linux
Posted by ajay on October 16, 2009
I recently found a one-liner to report all duplicate files under the current directory and its subdirectories here. The command is as follows –
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
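Broken into stages, the same pipeline reads as follows (identical behaviour; this assumes GNU find and coreutils):

# 1. print the size in bytes of every non-empty regular file
# 2. sort the sizes and keep only those that occur more than once (uniq -d)
# 3. for each such size, find the files of exactly that size ({}c means bytes),
#    NUL-terminated so names with spaces survive
# 4. md5sum those candidate files
# 5. sort so identical hashes end up adjacent
# 6. compare only the first 32 characters (the md5) and print every member of
#    each duplicate group, with groups separated by a blank line
find -not -empty -type f -printf "%s\n" \
  | sort -rn \
  | uniq -d \
  | xargs -I{} -n1 find -type f -size {}c -print0 \
  | xargs -0 md5sum \
  | sort \
  | uniq -w32 --all-repeated=separate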
It first compares size and then compares the md5 hash in order to find duplicate files. Since this one just reports and doesn't delete the files, I've made slight modifications to find and DELETE duplicate files as well. Don't worry, it'll ask for your permission before running the delete command over all files. Here it goes –
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate | cut -f3-100 -d ' ' | tr '\n.' '\t.' | sed 's/\t\t/\n/g' | cut -f2-100 | tr '\t' '\n' | perl -i -pe 's/([ (){}-])/\\$1/g' | perl -i -pe 's/'\''/\\'\''/g' | xargs -pr rm -v
The modifications are very boring, but that's all I could do. Have a better solution? Let me know. If you want to delete files without being asked for permission, remove the -p after the last xargs in the above command.
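For what it's worth, here is a simpler delete variant that avoids the tr/sed/perl escaping gymnastics (only a sketch, not the command above: it hashes every file instead of pre-filtering by size, and it assumes GNU xargs with -d and filenames without embedded newlines):

# hash everything; keep every path whose md5 has already been seen, i.e. all
# but the first (in sort order) member of each duplicate group; then let
# xargs -p ask before each rm (drop the -p to delete without asking)
find . -not -empty -type f -exec md5sum {} + \
  | sort \
  | awk 'seen[substr($0, 1, 32)]++ { print substr($0, 35) }' \
  | xargs -d '\n' -pr rm -v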
Have Fun :).
PS: The command is primarily meant for deleting duplicate media files (mp3s, videos, images, etc.). Please don't run it on any sensitive system directory.
sandeep said
Really good.
Siraj said
Hey, I'm new to this world of Linux… just wondering how to use the above script to find the duplicates in my folder /opt/songs/
euie said
copy and paste
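(The find in the post is given no path argument, so it works on the current directory; for /opt/songs, cd there first:)

cd /opt/songs
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate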
swygue said
Awesome, thanks!
Steffi said
It’s about time someone wrote about this.
Cypress said
Great tip!
Alan jader said
Is it possible to use an automatic Windows tool like (www.dublicatefilesdeleter.com) on Linux using Wine?
alastairgilfillan said
I wouldn’t trust Windows with access to my Linux FS… I can’t see why not, although I think a better approach would be the following “one-liner”:
sudo apt-get install fdupes && fdupes -d
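A couple of typical invocations (using the standard fdupes flags: -r recurses into subdirectories, -d prompts for which copies to keep):

fdupes -r /opt/songs      # list duplicate sets under /opt/songs
fdupes -rd /opt/songs     # same, but interactively delete chosen copies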
sam said
Interesting, I get find: illegal option — n
alastairgilfillan said
You need to use “[hyphen]n” not “[mdash][space]n”.
Net said
Hi,
I was trying to run the first command, but I got the following error:
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
find: invalid argument `’ to `-size’
find: invalid argument `’ to `-size’
find: invalid argument `’ to `-size’
find: invalid argument `’ to `-size’
Exit 123
Bob said
If it doesn’t actually fit on one line, it’s not a one liner. Even if, technically, you could write the entire script on the command line.
Todd Carney said
In Unix/Linux, a line ends with an LF (decimal 10) character. Until the LF, Linux considers a string of characters to be one line. It doesn't matter how many rows it takes to display on the screen.
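For instance, wc -l counts LF characters, so a command string with no embedded LF is a single line no matter how far it wraps:

printf '%s' 'sort | uniq -d | xargs -0 md5sum' | wc -l
# prints 0: no LF occurs anywhere in the string, so no line has ended yet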
georgesam said
I am using duplicate files deleter.com to find and delete all duplicate files with one click. Thanks.
alastairgilfillan said
Stop spamming your (/client’s) product here… it is a Windows trial for some shovelware.
Xavi said
You can change the first part of the command to avoid doing a “find on a find”. That can be really slow if you run it on / on a large server.
Use this one:
find -not -empty -type f -printf "%s \"%h/%f\"\n" | sort -rn | awk 'dummy[$1]++' | cut --delimiter=" " -f 2- | xargs md5sum | sort | uniq -w32 --all-repeated=separate
The proposed one does a find of all files and calculates the sizes, then gets the “duplicates” in size order so the md5 is calculated only on files of exactly the same size; but to do so, it runs a second find to search for all files of each specific size. If the directory has thousands of files, the N*N loop can be too heavy.
My proposal instead does a single find and outputs on each line a tuple of two fields, size and filename, separated by a space, with the filename protected by double quotes for the xargs of md5sum, like this
34563733 “/my/file with/spaces inside”
Then I get the duplicated lines via “awk”, since “uniq” cannot test uniqueness on a single field and the file size does not have a fixed number of characters. Once “awked”, I have the same output as after the first “uniq” in the example. However, I still have the tuple, so I use “cut” to keep only the filename. Finally I pipe into md5sum. The xargs lacks the -0 option because the files are separated by newlines and protected with double quotes, which take care of the spaces in the middle. If your files had a \n inside the name you would have to tweak a bit. The rest is as in the original.
This process has an N time-cost instead of N*N, and it explores 5000 files in a 2GB directory (excluding md5 time) in less than 1 second, while the original took me 24 seconds.
Hope this helps!
Xavi Montero
madsurgeon said
awk 'dummy[$1]++' does not print the first file of each given size, and therefore does not print all duplicates.
To print the whole set we need to use awk ‘++dummy[$1]’ instead.
madsurgeon said
Sorry, forget my first reply. That doesn't work either; it prints all lines. So I let 'find' write the list into a temp file and then parse it with sort as above, but I exchange
awk 'dummy[$1]++' | cut --delimiter=" " -f 2-
by
while read size file; do [ $(grep -c $size files_found) -gt 1 ] && echo $file; done
Now no files are missing.
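The same filtering can also be done without a grep per line, e.g. with a two-pass awk over the temp file (a sketch, assuming the same size-and-quoted-filename list saved in files_found):

# pass 1 (NR==FNR) counts how many files share each size;
# pass 2 drops the size field and prints the files whose size repeats
awk 'NR==FNR { count[$1]++; next }
     count[$1] > 1 { sub(/^[^ ]+ /, ""); print }' files_found files_found |
  xargs md5sum | sort | uniq -w32 --all-repeated=separate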
Timothy said
Really good but…
I measured your “one line” and it comes out to 18.75 inches on my screen. It is informative but funny. Thanks for the tip.
Suja said
Really very good!
You are a champ!!!
Karl said
Hello, you put a good effort into this script; still, there is a nice Linux util called fdupes that does this job. I found it here:
http://gnuwhatimsaying.com/find-duplicate-files-on-linux-with-fdupes/
It does a good job, it's fast enough, and there are several parameters to use.
It needed less than a minute for a 6 GB directory, which is good enough for me.
Regards,
Carl Parson's personal blog said
Finding duplicate files
find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate
Siraj said
hey Carl,
any idea how to find the files in a remote directory? I.e., find files on 1.1.1.1:/root/abc with the credentials of the remote machine as below
username: root password: toor
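One option (a sketch: run the reporting pipeline on the remote host over ssh, which will prompt for the password):

ssh root@1.1.1.1 "cd /root/abc && find -not -empty -type f -printf '%s\n' | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate"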
Todd Carney said
Hi! It’s not clear to me which of the duplicates (triplicates, etc.) this one-liner *doesn’t* delete. What are the criteria? –Todd