Efficient PDF Automation with pdftk 🖨️🚀

In today’s fast-paced business environment, efficiency is key. For back-office teams handling multiple PDF reports generated from different systems, automating the process of merging and renaming these files can save a significant amount of time and reduce errors. In this blog post, we’ll explore how to use the pdftk command in Linux to merge multiple PDF files and rename the final output based on customer names. We’ll also provide a step-by-step guide to implementing this automation using bash scripts.

The Problem ⚠️

Imagine you have a situation where multiple systems produce PDF reports for different customers. Before distributing these reports, you need to merge all the system-generated PDF reports for each customer. These reports are named after the customer’s ID and are stored in various subfolders (e.g., Part1, Part2, Part3). Your task is to:

  1. Merge the PDF parts for each customer into a single PDF file.
  2. Store the merged file in a Final subfolder.
  3. Rename the merged file from the customer ID to the customer’s name.
  4. Automate this process for multiple customers using a CSV file that maps customer IDs to their names.

The Solution 💡

We’ll use the pdftk command-line tool to merge the PDF files and bash scripts to automate the process. Here’s how you can achieve this:

Step 1: Install pdftk

If you don’t already have pdftk installed, you can install it using your package manager. For example, on Ubuntu, you can run:

sudo apt-get install pdftk
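
Before going further, it can help to confirm that the binary is actually on your PATH. The snippet below is a minimal sketch of such a check. The `require_cmd` helper is a hypothetical name introduced here for illustration; it is not part of pdftk.

```shell
#!/bin/bash
# require_cmd is a hypothetical helper (not part of pdftk).
# It returns 0 if the named command is found on PATH, 1 otherwise.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing required command: $1" >&2
    return 1
}
```

Calling `require_cmd pdftk || exit 1` at the top of a script would stop it early with a readable message instead of failing halfway through a batch.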

Step 2: Create the Bash Scripts

We’ll create two bash scripts: GenReport.sh and GenReportInBulk.sh.

GenReport.sh

This script will merge the PDF parts for a given customer ID and store the merged file in the Final folder.

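#!/bin/bash
# Filename: GenReport.sh
# Usage: ./GenReport.sh <customer_id>

# Make sure the output folder exists
mkdir -p ./Final

# Merge the three parts for this customer, in order, into one PDF
pdftk A="./Part1/$1.pdf" B="./Part2/$1.pdf" C="./Part3/$1.pdf" cat A B C output "./Final/$1.pdf"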
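
If one of the three parts is missing, pdftk will fail, and it may be more useful to know which file is absent before attempting the merge. The following is a sketch only: `missing_parts` is a hypothetical helper, and it assumes the Part1, Part2 and Part3 folder names used in this post.

```shell
#!/bin/bash
# missing_parts is a hypothetical helper: it prints the path of every
# expected part file that does not exist for the given customer ID.
# Assumption: parts live in ./Part1, ./Part2 and ./Part3, as in this post.
missing_parts() {
    local id="$1"
    local dir
    for dir in Part1 Part2 Part3; do
        if [ ! -f "$dir/$id.pdf" ]; then
            echo "$dir/$id.pdf"
        fi
    done
}
```

A script could capture the result with `missing=$(missing_parts "$1")` and exit with an error message when `$missing` is not empty, so that an incomplete report is never produced.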

GenReportInBulk.sh

This script will read a CSV file containing customer information, call GenReport.sh for each customer, and rename the merged PDF file based on the customer’s name.

#!/bin/bash
# Filename: GenReportInBulk.sh

# CSV file with the columns: PDF filename, Customer Name, Customer ID
input="sample_data.csv"

# Loop through each line in the CSV file
# (the "|| [ -n ... ]" part keeps a final line that lacks a trailing newline)
while IFS=, read -r pdf_filename customer_name customer_id || [ -n "$pdf_filename" ]
do
    # Remove a trailing carriage return in case the CSV has Windows line endings
    customer_id="${customer_id%$'\r'}"

    # Skip the header line
    if [ "$pdf_filename" != "PDF filename" ]; then
        # Pass the customer ID to GenReport.sh; rename only if the merge succeeded
        if ./GenReport.sh "$customer_id"; then
            # Rename Final/<customer_id>.pdf to the filename given in column 1
            mv "Final/${customer_id}.pdf" "Final/$pdf_filename"
        else
            echo "Failed to generate report for customer ID: $customer_id" >&2
        fi
    fi
done < "$input"

Step 3: Prepare the CSV File

The CSV file (sample_data.csv) maps each customer ID to the customer’s name and to the filename the merged PDF should receive. Fields must not contain commas, because the script splits each line on them. Here’s an example:

PDF filename,Customer Name,Customer ID
john_doe.pdf,John Doe,12345678
jane_smith.pdf,Jane Smith,87654321
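
Because every row drives a merge and a rename, a malformed row can produce a wrongly named or missing report. The sketch below checks the file before any PDF is touched. `check_csv` is a hypothetical helper, and it assumes the three-column layout shown above with no quoted fields.

```shell
#!/bin/bash
# check_csv is a hypothetical helper: it reports data rows that have an
# empty column or more than three columns, and returns 1 if any row is
# rejected. Assumption: fields are unquoted and contain no commas.
check_csv() {
    local file="$1" line_no=0 bad=0
    local pdf_filename customer_name customer_id cr
    cr=$(printf '\r')
    while IFS=, read -r pdf_filename customer_name customer_id || [ -n "$pdf_filename" ]; do
        line_no=$((line_no + 1))
        # Strip a trailing carriage return left by Windows line endings
        customer_id="${customer_id%"$cr"}"
        # Skip the header row
        if [ "$pdf_filename" = "PDF filename" ]; then
            continue
        fi
        if [ -z "$pdf_filename" ] || [ -z "$customer_name" ] || [ -z "$customer_id" ]; then
            echo "Line $line_no: expected three non-empty columns" >&2
            bad=1
        fi
        # Extra commas end up in the last variable that read fills
        case "$customer_id" in
            *,*) echo "Line $line_no: more than three columns" >&2; bad=1 ;;
        esac
    done < "$file"
    return "$bad"
}
```

Running `check_csv sample_data.csv || exit 1` before the main loop would stop the batch as soon as the mapping file looks wrong.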

Step 4: Give Execute Permissions to the Scripts

Before running the scripts, make sure they have execute permissions:

chmod +x GenReport.sh
chmod +x GenReportInBulk.sh

Step 5: Run the Script

Now, you can run the GenReportInBulk.sh script to process all the customer reports:

./GenReportInBulk.sh

Expected Folder Structure After Execution

Before running the script, your folder structure might look like this:

.
├── Final
│
├── GenReportInBulk.sh
├── GenReport.sh
├── Part1
│   ├── 12345678.pdf
│   └── 87654321.pdf
├── Part2
│   ├── 12345678.pdf
│   └── 87654321.pdf
├── Part3
│   ├── 12345678.pdf
│   └── 87654321.pdf
└── sample_data.csv

After running the script, the Final folder will contain the merged and renamed PDF files:

.
├── Final
│   ├── jane_smith.pdf
│   └── john_doe.pdf
├── GenReportInBulk.sh
├── GenReport.sh
├── Part1
│   ├── 12345678.pdf
│   └── 87654321.pdf
├── Part2
│   ├── 12345678.pdf
│   └── 87654321.pdf
├── Part3
│   ├── 12345678.pdf
│   └── 87654321.pdf
└── sample_data.csv
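
With more than a handful of customers, comparing the Final folder against the CSV by eye becomes tedious. As a rough sketch, the check can be scripted. `missing_outputs` is a hypothetical helper, and it assumes the output filename sits in the first CSV column, as in the sample file.

```shell
#!/bin/bash
# missing_outputs is a hypothetical helper: for every data row in the
# mapping CSV, it prints the expected filename if that file is absent
# from the output folder (second argument, defaulting to ./Final).
missing_outputs() {
    local file="$1" final_dir="${2:-Final}"
    local pdf_filename rest
    while IFS=, read -r pdf_filename rest || [ -n "$pdf_filename" ]; do
        # Skip the header row and blank lines
        if [ "$pdf_filename" = "PDF filename" ] || [ -z "$pdf_filename" ]; then
            continue
        fi
        if [ ! -f "$final_dir/$pdf_filename" ]; then
            echo "$pdf_filename"
        fi
    done < "$file"
}
```

An empty result from `missing_outputs sample_data.csv` means every row in the CSV has a matching file in Final.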

Conclusion 🌟

By using pdftk and a couple of bash scripts, you can automate the process of merging and renaming PDF files, saving your back-office team valuable time and reducing the risk of errors. This approach can be easily adapted to more complex scenarios, such as handling more parts of the report.
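
As one possible adaptation for a variable number of parts, the merge can pick up every Part* folder that contains a file for the customer, rather than naming three folders explicitly. This is a sketch under assumptions: `merge_all_parts` is a hypothetical function, the folders are assumed to match the pattern Part*, and the merge order is whatever order the shell expands that pattern in.

```shell
#!/bin/bash
# merge_all_parts is a hypothetical helper: it merges every existing
# Part*/<id>.pdf into Final/<id>.pdf.
# Assumption: the folder names sort into the desired page order.
merge_all_parts() {
    local id="$1"
    local f
    # Reuse the positional parameters as the list of input files
    set --
    for f in Part*/"$id".pdf; do
        # An unmatched pattern stays literal, so keep only real files
        if [ -f "$f" ]; then
            set -- "$@" "$f"
        fi
    done
    if [ "$#" -eq 0 ]; then
        echo "No parts found for customer ID: $id" >&2
        return 1
    fi
    mkdir -p Final
    # With no handles or page ranges, pdftk concatenates inputs in the order given
    pdftk "$@" cat output "Final/$id.pdf"
}
```

Because the pattern expands in lexicographic order, Part10 would sort before Part2. Zero-padded folder names such as Part01 and Part02 avoid that surprise.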


We’d love to hear from you! Have you tried automating PDF workflows in your back office? Do you have other tips or tools that have helped increase operational efficiency? Share your experiences, challenges, or success stories in the comments below. Let’s start a discussion and learn from each other to make our workflows even better! 💬👇

Feel free to modify the scripts to suit your specific needs, and happy automating! 🤖
