Converting PDF to Image Formats

Introduction

I recently received an assignment from a renowned online publisher of technical articles. I was tasked with writing a practical, step-by-step tutorial showcasing the development of a full-stack web application for electronic library management using Python's Django framework and Vue.js, a popular JavaScript front-end framework.

Project Requirements in Brief

Given the nature of the project, the application had many requirements. One critical requirement that ImageMagick satisfied was generating image files (JPEG, GIF, or PNG) for the covers of the PDF e-books. As the old adage says, a picture is worth a thousand words, so cover images or thumbnails were essential to the project. Instead of filling the landing page with lengthy e-book summaries and descriptions, a cover image conveys a lot of information about a book in very little space.

Issues at Stake

Acquiring a substantial number of PDF e-books by downloading them posed no problems, either financially or legally, many thanks to EdTech Books. Edtechbooks.org is a website that publishes an enormous number of e-books under Creative Commons licenses, which give users extended rights to use the books. For more on the varieties of Creative Commons licenses, please visit: creativecommons

With that said, the real issues were:

To obtain, install, and configure ImageMagick on the working computer (ImageMagick is a useful utility for manipulating files in many multimedia formats and converting between them)
To write a utility script with a function that processes the downloaded PDF e-books and generates a cover image for each of them

To keep this article brief, the details of issue number one have been omitted. You can, however, read about them on the official ImageMagick site or, if you are running Ubuntu 18.04 or Ubuntu 20.04, in this blog article. Assuming everything is fine with regard to issue number one, let's now focus on issue number two.
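Whichever installation route you follow, a quick check that the tools are on your PATH can save some confusion later. The sketch below is a hypothetical helper, require_tools (not part of ImageMagick), that reports any missing command. It assumes ImageMagick 6's convert command and Ghostscript's gs, which ImageMagick relies on to read PDF files.

```shell
#!/bin/bash
# Hypothetical helper (not part of ImageMagick): report missing commands.
# Returns 0 if every command given is found on PATH, 1 otherwise.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "Missing: $tool" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Usage (assumes ImageMagick 6's convert and Ghostscript's gs):
#   require_tools convert gs && echo "Ready to convert PDFs"
```

On Ubuntu, any missing pieces can usually be installed with sudo apt install imagemagick ghostscript.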

Proceedings

To proceed with issue number two, at least two things are needed:

A collection of downloaded PDF e-books
A script that processes those e-books and produces a cover image for each one, saved alongside the PDFs in our download folder

To practice this task, you may obtain the shared zipped file, which contains eight CC-licensed books from edtechbooks.org that you may freely use under the terms of their Creative Commons license; please check them out here: zipped-ebooks. Whether you use these or download the books directly from EdTech Books yourself is entirely up to you. If you decide to download the zipped-ebooks file, extract the PDF e-book files into a folder in your home directory, say /home/user/ebooks (in my case that is /home/benedict/ebooks); replace user with your own username. To create the ebooks directory, first go to your home directory by executing the following command in your system terminal:

cd ~

You may now create the ebooks directory by executing the following command

mkdir ebooks

With the ebooks directory ready, you can extract the zipped file you downloaded from the Google Drive link given earlier. First, change to your Downloads directory by executing the following command in your system terminal (assuming your Downloads folder is the default one set up on Ubuntu/Linux):

cd Downloads

After changing to the Downloads folder, run the following command to extract the downloaded zipped file into the ebooks directory. Note: replace user with your computer username; for example, on my computer, where the username is benedict, the destination path would be /home/benedict/ebooks.

unzip ebooks.zip -d /home/user/ebooks
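If you would rather not substitute your username by hand, the same steps can be written with $HOME, which the shell expands to your home directory. The setup_ebooks function below is a hypothetical convenience wrapper, assuming the archive is named ebooks.zip and sits in ~/Downloads, as in the commands above.

```shell
#!/bin/bash
# Hypothetical helper: extract a zip of e-books into a target directory.
# Defaults assume ~/Downloads/ebooks.zip and ~/ebooks, as in this tutorial.
setup_ebooks() {
    local zip="${1:-$HOME/Downloads/ebooks.zip}"
    local dest="${2:-$HOME/ebooks}"

    if [ ! -f "$zip" ]; then
        echo "Archive not found: $zip" >&2
        return 1
    fi
    # -p: create parent directories and don't fail if dest already exists
    mkdir -p "$dest" || return 1
    # -o: overwrite existing files without prompting (handy on re-runs)
    unzip -o "$zip" -d "$dest"
}

# Usage:
#   setup_ebooks                                   # use the defaults above
#   setup_ebooks ~/Downloads/ebooks.zip ~/ebooks   # or name both paths
```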

With these copies of the PDF e-books in place, it is time to write the pdftopng.sh utility script, which converts the first page of each PDF e-book in our downloaded collection into an image. Obtain pdftopng.sh either by copying the script below (lines one to sixteen) or

#!/bin/bash

pdfs=(./*.pdf)                  # every PDF in the current directory
counter=${#pdfs[@]}             # number of PDFs found
function converter(){
    item=$1
    convert "${item}[0]" -resize 50% -set filename:base "%[basename]" "%[filename:base].png"
    #return $?
}
for((i=0;i<counter;i++))
    do
        pages="${pdfs[$i]}"
        #echo "Converting ${pages} ($((i+1)) of ${counter})"
        #set -x    # trace each command, including convert, as it runs
        converter "$pages"
    done

download this piece of code here (if you would like to get the full script quickly without doing too much typing).

Save the pdftopng.sh script to your home directory, so that its path is /home/user/pdftopng.sh. As usual, remember to replace user with the username under which you use your system; in my case that is benedict.

With that done, move to your home directory with the cd command as explained earlier, and make the pdftopng.sh script executable (so the system runs it as a program rather than treating it as a mere text file) by running the following command in your system terminal:

chmod +x pdftopng.sh

Change directory to /home/user/ebooks by running the command below

cd /home/user/ebooks

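Then run the script from inside the ebooks directory, so that it picks up the PDF files there. Because the script itself lives in your home directory, call it by that path:

~/pdftopng.sh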

To see the generated PNG images alongside the PDF files, run the following command:

ls
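With eight books, a quick ls is enough to eyeball the results. For larger collections, a small check can list any PDF that has no PNG next to it. The missing_covers function below is a hypothetical helper; it assumes each cover is named after its PDF (book.pdf and book.png), which is the naming the pdftopng.sh script aims for.

```shell
#!/bin/bash
# Hypothetical helper: print every PDF in a directory that lacks a matching PNG.
# Assumes covers are named <book>.png next to <book>.pdf.
# The ( ) body runs in a subshell, so shopt changes don't leak to the caller.
missing_covers() (
    shopt -s nullglob          # an unmatched glob expands to nothing
    dir="${1:-.}"
    status=0
    for pdf in "$dir"/*.pdf; do
        if [ ! -f "${pdf%.pdf}.png" ]; then
            echo "$pdf"
            status=1
        fi
    done
    # Exit status 0 means every PDF has a cover
    exit "$status"
)

# Usage, from inside the ebooks directory:
#   missing_covers . && echo "All covers generated"
```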

Here is a quick explanation of what each numbered line of the pdftopng.sh script does to bring the overall process to success:

Line 1: Known as the shebang, it specifies the interpreter (here /bin/bash) that runs the script when it is executed.
Line 2: Just an empty line.
Line 3: Creates an array of the PDF e-books in the present working directory.
Line 4: Gets the total number of items in the array (the array size).
Line 5: Declares the function converter.
Line 6: Declares the variable item, which holds the argument (the PDF file name) passed to the function when it is called.
Line 7: The actual ImageMagick command, executed whenever the converter function is called later in the script. It tells ImageMagick to convert the first page ([0]) of the passed-in PDF file to a PNG image in the same folder, optionally resizing it (here to 50%), and to give the output image the same name as the input PDF file.
Line 8: The function's return statement, commented out; you may uncomment it for debugging purposes by removing the hash sign at the beginning of the line.
Line 9: Closes the function declaration.
Line 10: Declares a for loop that runs the statements between the do and done keywords once for each item in the array, taking each PDF in turn.
Line 11: The do keyword opens the block of statements executed in each iteration.
Line 12: Assigns the current PDF e-book to the variable pages; only its first page, which bears the cover, is converted to the chosen image format.
Line 13: A commented-out debugging statement, which you may uncomment as explained earlier, together with the following one.
Line 14: Another commented-out debugging statement; with it and the other commented statements enabled, the script shows what is going on under the hood as it runs.
Line 15: Calls the converter function silently, passing the current PDF file as its argument.
Line 16: Closes the do…done block.
Line 17: The end of the script.
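For comparison, here is an alternative sketch of the same loop; it is not the script used in this project. It iterates over the glob directly instead of indexing an array, enables nullglob so an empty folder does not produce a bogus ./*.pdf item, and derives the output name with bash parameter expansion (${pdf%.pdf}.png) rather than ImageMagick's filename:base property. The CONVERT variable is a hypothetical hook that allows a dry run by setting it to echo; by default it calls ImageMagick 6's convert.

```shell
#!/bin/bash
# Alternative sketch: render page 1 of every PDF in a directory as a PNG.
# CONVERT is a hypothetical override (e.g. CONVERT=echo for a dry run);
# by default it runs ImageMagick 6's `convert`.
make_covers() (
    shopt -s nullglob              # no PDFs -> the loop body never runs
    dir="${1:-.}"
    status=0
    for pdf in "$dir"/*.pdf; do
        png="${pdf%.pdf}.png"      # book.pdf -> book.png, same folder
        # "[0]" selects the first page; quotes keep spaces in names intact
        if ! "${CONVERT:-convert}" "${pdf}[0]" -resize 50% "$png"; then
            echo "Failed: $pdf" >&2
            status=1
        fi
    done
    exit "$status"
)

# Usage:
#   CONVERT=echo make_covers ~/ebooks   # dry run: print the commands only
#   make_covers ~/ebooks                # actually generate the covers
```

Naming the output in bash keeps the ImageMagick command short and makes the dry run show the exact file names that will be written.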

Got any Problem Along the Way?

If the script finishes silently without producing the expected PNG files, or if you get errors, remove the hash signs from lines 8, 13, and 14 and save the changes before rerunning the script; this will help you debug it. In most cases, missing results are caused by the default configuration settings that come with the ImageMagick installation. If that is the case, please try the fix below, restart your machine to make the changes take effect, and retry the process as previously explained.

Fix:

In the file /etc/ImageMagick-6/policy.xml (or /etc/ImageMagick/policy.xml):

Comment out the line

<policy domain="coder" rights="none" pattern="MVG" />

by wrapping it in the XML comment markers <!-- and -->, as shown below

<!-- <policy domain="coder" rights="none" pattern="MVG" /> -->

Change the line

<policy domain="coder" rights="none" pattern="PDF" />

to

<policy domain="coder" rights="read|write" pattern="PDF" />

Add the line

<policy domain="coder" rights="read|write" pattern="LABEL" />
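If you prefer to make the PDF change from the terminal rather than in a text editor, the sketch below does it with sed. The allow_pdf_policy function is hypothetical; it assumes the PDF rule sits on one line as rights="none" pattern="PDF" (as in the stock Ubuntu file) and keeps a .bak copy of the original. Because files under /etc are owned by root, the comment at the end shows the equivalent one-liner with sudo.

```shell
#!/bin/bash
# Hypothetical helper: switch the PDF coder rule from "none" to "read|write".
# Assumes the rule sits on one line as rights="none" pattern="PDF".
# A backup is written next to the file with a .bak suffix.
allow_pdf_policy() {
    local file="$1"
    [ -f "$file" ] || { echo "No such file: $file" >&2; return 1; }
    sed -i.bak 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' "$file"
    # Print the resulting PDF rule so you can confirm the change
    grep 'pattern="PDF"' "$file"
}

# For the system file you need root, so the equivalent one-liner is:
#   sudo sed -i.bak 's/rights="none" pattern="PDF"/rights="read|write" pattern="PDF"/' \
#       /etc/ImageMagick-6/policy.xml
```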

Conclusion

ImageMagick provides a wealth of multimedia file-processing capabilities, and this tutorial has shown just the tip of the iceberg. Perhaps it will spark your curiosity and inspire you to make the most of it. To further your ImageMagick skills, you may check out the book “The Definitive Guide to ImageMagick” by Michael Still (Apress). You may also check out “Mastering Bash” by Giorgio Zarrelli (Packt Publishing) to brush up on your bash scripting techniques.

How ImageMagick Saved the Day in My Assigned Project
