Does tar -tvf decompress the file or just list the names?

I have a 32 GB tar.gz file. I was trying to extract specific folders from it, so I listed its contents with the following command to view the file structure:



tar -tvf file.tar.gz > files.txt


It seems to be taking forever to list all the files. My question is: does the -t flag extract the files as well? I know it doesn't extract them to disk, but the amount of time it is taking makes me wonder whether it actually processes them in some sort of buffer.







asked May 14 at 18:35 by Saif; edited May 15 at 4:20 by muru



Comments:

  • You forgot the -z option: tar -tvfz. Similar: What happens if you use the command tar tvf as opposed to tar tvfz? – smci, May 15 at 2:23

  • @smci: It’s auto-detected, so not really forgotten. – Ry-, May 15 at 2:38













1 Answer


























tar.gz files do not have an index. Unlike zip and some other archive formats, it is neither trivial nor cheap to obtain a list of the contained files or other metadata. To show you which files are in the archive, tar really does have to decompress the archive and step through every file's contents, although with the -t option it does so only in memory.
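
A quick way to see this for yourself is to compare listing against a plain decompression pass over the same archive; the two should take roughly the same time, since gzip decompression dominates both (file.tar.gz is the archive from the question):

$ time tar -tzf file.tar.gz > /dev/null   # listing still decompresses every byte
$ time gzip -dc file.tar.gz | wc -c       # raw decompression pass, for comparison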



If listing the files in an archive is a common pattern in your use case, you might want to consider an archive format that adds a file index to the compressed file, e.g. zip.
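
For example, zip keeps a central directory (its index) at the end of the archive, so listing does not have to touch the compressed file data at all; archive.zip and somedir/ are placeholder names:

$ zip -r archive.zip somedir/   # writes a central directory at the end of the archive
$ unzip -l archive.zip          # reads only that index, however large the archive is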



Perhaps you also want to take a look at the HDF5 format for more complex scenarios.
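
As a minimal sketch, assuming the HDF5 command-line tools are installed (package hdf5-tools on Ubuntu) and a hypothetical container data.h5, its structure can be listed without reading the stored data:

$ h5ls -r data.h5   # recursively list the groups and datasets in the container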



Measurements



I just had to do some measurements to back up my answer, so I created some directories with many files in them and packed each one with both tar czf files#.tgz files# and zip -r files#.zip files#.
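
The setup commands themselves are not shown in the answer; as a rough reconstruction (directory names as above, file counts from the tests below), the test trees could be generated like this:

$ mkdir files1 && (cd files1 && seq 1 100000 | xargs touch)   # 100,000 empty files
$ mkdir files2 && for i in $(seq 1 5000); do head -c 512 /dev/urandom > files2/$i; done
$ tar czf files1.tgz files1   # pack the same tree both ways
$ zip -r files1.zip files1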



For the tests I ran each listing command twice and took the result of the second run, so that the timings reflect the tools rather than disk speed (the second run is served from the page cache).



Test 1



Directory files1 containing 100,000 empty files.



$ time tar tzf files1.tgz >/dev/null
tar tzf files1.tgz > /dev/null 0,56s user 0,09s system 184% cpu 0,351 total
$ time unzip -l files1.zip >/dev/null
unzip -l files1.zip > /dev/null 0,30s user 0,34s system 99% cpu 0,649 total


zip is slower here.



Test 2



Directory files2 containing 5,000 files with 512 bytes of random data each.



$ time tar tzf files2.tgz >/dev/null
tar tzf files2.tgz > /dev/null 0,14s user 0,03s system 129% cpu 0,131 total
$ time unzip -l files2.zip >/dev/null
unzip -l files2.zip > /dev/null 0,03s user 0,06s system 98% cpu 0,092 total


Still not convincing, but zip is faster this time.



Test 3



Directory files3 containing 5,000 files with 5 kB of random data each.



$ time tar tzf files3.tgz >/dev/null
tar tzf files3.tgz > /dev/null 0,42s user 0,03s system 111% cpu 0,402 total
$ time unzip -l files3.zip >/dev/null
unzip -l files3.zip > /dev/null 0,03s user 0,06s system 99% cpu 0,093 total


This test shows that the larger the files get, the harder it is for tar to list them: tar has to decompress every member's data just to reach the next file header, whereas unzip -l can seek directly to the central directory at the end of the archive.



Conclusion



To me it looks like zip introduces a small overhead that you will only notice with many very small (almost empty) files, whereas for large numbers of larger files it clearly wins the contest when listing the files contained in the archive.
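
Back to the original problem: once the interesting paths are known (e.g. from files.txt), specific folders can be extracted in one pass; the path names below are illustrative:

$ tar -xzf file.tar.gz some/dir other/dir         # extract only these members
$ tar -xzf file.tar.gz --wildcards 'some/dir/*'   # GNU tar also accepts patterns

Note that tar still reads through the whole archive to find the requested members; it just writes only those to disk.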






answered May 14 at 18:48 by Sebastian Stark; edited May 24 at 14:51