Archiving directory in python

Archiving as zip file is a common task for any file processing systems which receives files either as inbound or sends files as outbound or both. When the system gets complex, directory structure may change and you may need to archive targeted folder and their subfolders recursively.

ZIP is an archive file format that supports lossless data compression. Python natively supports for handling zip files. We will see how to archive(zip) files recursively and remove the source directory using python.

.

|-- 0313EBF8-2E05-406C-864F-7873227340DD

  `-- 75

      `-- May

          |-- 1

            |-- 0

              |-- 0313EBF8-2E05-406C-864F-7873227340DD_8060.txt


Version

Python - 3.7

Zipping multiple files

You can create zipfile object and write to the zipfile object as below. It is as simple as that. If your requirement is to create zip file (archive) with list of files, instead of whole directory, you can simply add multiple files to the zip file.

from zipfile import ZipFile
with ZipFile('example.zip', 'w') as zip_file:
zip_file.write('example.txt')     zip_file.write('example1.txt')

But Python built-in module Zipfile does not directly provide methods to archive a directory/folder.  If you want to archive directory/folder, there will be a lot of work involved with this method.

You will be able to archive a folder and subfolders using command line like below.

zipfile.py -c zipfile.zip src ... # Create zipfile from sources

  May git:(master python -m zipfile -c 11.zip 11/

This executes main function in the zipfile module with arguments. So we will utilize this for our favour.

# zipfile.py

def main(args = None):...

if __name__ == "__main__":
main()

Archiving directory

Create zipfile command does all the work for us recursively adding files from subfolders which is very much needed for the archival process.

.

|-- 0313EBF8-2E05-406C-864F-7873227340DD

  `-- 75

      `-- May

          |-- 1

            |-- 0

              |-- 0313EBF8-2E05-406C-864F-7873227340DD_8060.txt


We have to pass the arguments to the main function as how the command line arguments look like. By default, we kept remove_source set to False. This feature is like toggle. Based on your need, you can call the function. This way you can create zip archive from directory recursively and remove them if required.

from zipfile import main as ZipFileMain

def zip_dir(file_name, source, remove_source=False):
"""
creates archive all the recursively in the given source directory
:param file_name: zip file name
:param source: directory to be archived
:param remove_source: remove source directory after archiving.
:return: return status of the task
"""

args = ['-c', file_name, source]
status = True
try:
ZipFileMain(args)
except Exception as e:
status = False
print(f'{e}')
else:
remove_source_dir(remove_source, source)
return status
import os
import shutil
def remove_source_dir(remove_source, source):
"""
remove the source directory if remove_source is True.

:param remove_source: boolean
:param source: directory path
"""
if remove_source:
if os.path.isdir(source):
shutil.rmtree(source)
else:
os.remove(source)

Calling Archival API

You can call above function as below. file_name can be absolute path as '/home/temp/zipfile.zip' or relative path like 'temp/zipfile.zip' to save the archive in different directory.

status = zip_dir(file_name='zipfile.zip',
source='May/11',
remove_source=True)

You can use your custom logic to get file name, source and condition for source removal.

Automating with crontab

Once you complete your script, you can automate/schedule this task using simple crontab in Unix based system. In windows, you can use task scheduler.

Editing crontab

You can add a task in crontab by editing crontab.

crontab -e

Adding crontab task

With below config, script will run every day at 1.00 AM.

0 1 * * * python archival.py


Thanks for reading.!

Please do comment if you have any questions or stuck.

Related posts
AWS S3 configuration for backup

AWS S3 configuration for backup

Durai Pandian May 07, 2020

Amazon Simple Storage Service (AWS S3) configuration for backup with expiration, maintaining versions and configuration ...
Continue reading...
Backup SQLite database to AWS S3

Backup SQLite database to AWS S3

Durai Pandian May 07, 2020

Taking SQLite database backup in python to AWS S3 with expiration and maintaining versioning using s3 lifecycle in manag...
Continue reading...

Comments
We'll never share your email with anyone else.