Clean a huge tar file in place - without extracting the contents

This is one of those options you don’t know exists till you need it (or sometimes you don’t know even when you need it, because you were too lazy to search).

Some time ago, I needed to extract only some data files from a huge tar backup that I received from a third party.

So I needed to reduce the size of a tar file so that I could download it from the server to my local machine for some testing.

But I didn’t want to extract the tar first, remove files, then re-tar it again.[1]

So I wondered whether the tar itself could be cleaned up in place. I checked with ChatGPT and found out that it is indeed possible. Here is how to do it.

tar --delete \
    --wildcards \
    -f big.tar \
    'top_lvl_folder/.git/*'

Let us understand each option here.

--delete is pretty obvious. It tells tar to delete the members matching the pattern from the archive itself.

-f big.tar is also obvious. It points to the file that tar will operate upon.

--wildcards enables pattern matching against the file names stored inside the archive.

Finally, the last part is the pattern itself. I wanted to delete everything under the .git directory inside the tar file because I didn’t really need it, so the pattern is 'top_lvl_folder/.git/*'. The single quotes keep the shell from expanding the * before tar gets to see it.
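
If you want to try this before touching a real backup, here is a minimal sketch that builds a tiny throwaway archive, lists what the pattern would match, and then deletes it. It assumes GNU tar (as far as I know, the bsdtar shipped with macOS has no --delete). The file names, the workdir variable and demo.tar are made up for illustration; only top_lvl_folder and the pattern come from the command above.

```shell
# Build a small throwaway archive to experiment on (hypothetical contents).
workdir=$(mktemp -d)
mkdir -p "$workdir/top_lvl_folder/.git/objects" "$workdir/top_lvl_folder/data"
echo "blob" > "$workdir/top_lvl_folder/.git/objects/abc"
echo "ref: refs/heads/main" > "$workdir/top_lvl_folder/.git/HEAD"
echo "1,2,3" > "$workdir/top_lvl_folder/data/sample.csv"
tar -cf "$workdir/demo.tar" -C "$workdir" top_lvl_folder

# Dry run: list the members the pattern matches, without changing anything.
tar -tf "$workdir/demo.tar" --wildcards 'top_lvl_folder/.git/*'

# Delete the matching members from the archive itself.
tar --delete --wildcards -f "$workdir/demo.tar" 'top_lvl_folder/.git/*'

# List what is left in the archive.
remaining=$(tar -tf "$workdir/demo.tar")
echo "$remaining"
```

Running the listing step first with the same pattern is a cheap way to check that you are about to delete what you think you are. Note that GNU tar cannot update compressed archives, so a .tar.gz has to be decompressed to a plain .tar before --delete will work on it.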


  1. One of the reasons was that I was not sure how much disk space the server had. Extracting the tar contents alongside the original tar file would have meant roughly double the disk usage, which I wanted to avoid. To be honest, I’m not sure whether in-place deletion does the same internally. Even if it does, I think this option is still cleaner.