Portable Document Format (PDF) is a format developed by Adobe for representing printable documents, complete with pixel-perfect formatting, embedded fonts, and 2D vector graphics. You can think of a PDF document as the digital equivalent of a printed document; indeed, PDFs are often used to distribute documents that are meant to be printed.
You can easily use Python and Django to generate PDF documents thanks to the excellent open source ReportLab library (http://www.reportlab.org/rl_toolkit.html). The advantage of generating PDF files dynamically is that you can create customized PDFs for different purposes, say, for different users or different pieces of content.
The following example uses Django and ReportLab to generate personalized, printable NCAA tournament brackets on KUSports.com.
Before generating PDF files, you need to install the ReportLab library. It's usually simple: just download and install the library from http://www.reportlab.org/downloads.html.
If you are using a modern Linux distribution, check your package manager before installing ReportLab by hand. Most package repositories now include it.
For example, if you use the (excellent) Ubuntu distribution, a single apt-get install python-reportlab command completes the installation.
The user guide (available only as a PDF) can be downloaded from http://www.reportlab.org/rsrc/userguide.pdf and contains additional installation instructions.
Import this package in the Python interactive environment to check whether the installation was successful.
>>> import reportlab
If that command raises no errors, the installation worked.
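For scripts or deployment checks, the same test can be run without the interactive prompt. The following is a minimal sketch that uses only the standard library; the helper name is_installed is an illustrative choice and is not part of ReportLab or Django.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the top-level module can be located on the import path."""
    # find_spec returns None when a top-level module cannot be found.
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("reportlab"):
        print("ReportLab is available.")
    else:
        print("ReportLab is missing; install it before generating PDFs.")
```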
As with CSV, generating PDF files dynamically from Django is easy because the ReportLab API acts on file-like objects.
Here is a Hello World example:
from reportlab.pdfgen import canvas
from django.http import HttpResponse

def hello_pdf(request):
    # Create the HttpResponse object with the appropriate PDF headers.
    response = HttpResponse(content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename=hello.pdf'

    # Create the PDF object, using the response object as its "file."
    p = canvas.Canvas(response)

    # Draw things on the PDF. Here's where the PDF generation happens.
    # See the ReportLab documentation for the full list of functionality.
    p.drawString(100, 100, "Hello world.")

    # Close the PDF object cleanly, and we're done.
    p.showPage()
    p.save()
    return response
A few points are worth noting:
The MIME type used here is application/pdf. This tells the browser that the document is a PDF file rather than an HTML file. If you leave off this information, browsers will probably interpret the response as HTML, which results in gobbledygook in the browser window.
Hooking into the ReportLab API is simple: just pass the response object as the first argument to canvas.Canvas. The Canvas class expects a file-like object, and HttpResponse objects fit the bill.
All subsequent PDF generation methods are called on the PDF object (p in this case), not on the response object.
Finally, it is important to call showPage() and save() on the PDF object; otherwise you will end up with a corrupted PDF file.
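When the file name varies per user or per piece of content, the two response headers discussed above can be built by a small helper. This is a sketch only: pdf_attachment_headers is a hypothetical name, and stripping quotes and line breaks is an assumed minimal safeguard rather than anything Django or ReportLab prescribes.

```python
def pdf_attachment_headers(filename):
    """Build the headers that mark a response as a downloadable PDF."""
    # Assumed safeguard: drop characters that could break the quoted header value.
    safe_name = "".join(ch for ch in filename if ch not in '"\r\n')
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": 'attachment; filename="%s"' % safe_name,
    }


headers = pdf_attachment_headers("bracket-alice.pdf")
print(headers["Content-Disposition"])
```

In a Django view, each key and value could be assigned onto the response object with response[key] = value, just as the Hello World example sets Content-Disposition.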
Complex PDF files
If you are creating a complex PDF document (or any large block of data), consider using the cStringIO library as a temporary holding place for the PDF file. cStringIO provides a file-like object interface that is written in C for maximum efficiency.
Here is the previous Hello World example rewritten to use cStringIO:
from cStringIO import StringIO
from reportlab.pdfgen import canvas
from django.http import HttpResponse

def hello_pdf(request):
    # Create the HttpResponse object with the appropriate PDF headers.
    response = HttpResponse(content_type='application/pdf')
    response['Content-Disposition'] = 'attachment; filename=hello.pdf'

    temp = StringIO()

    # Create the PDF object, using the StringIO object as its "file."
    p = canvas.Canvas(temp)

    # Draw things on the PDF. Here's where the PDF generation happens.
    # See the ReportLab documentation for the full list of functionality.
    p.drawString(100, 100, "Hello world.")

    # Close the PDF object cleanly.
    p.showPage()
    p.save()

    # Get the value of the StringIO buffer and write it to the response.
    response.write(temp.getvalue())
    return response
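Note that cStringIO exists only in Python 2. On Python 3, io.BytesIO from the standard library plays the same role as an in-memory holding place for binary data such as a PDF. The sketch below shows the buffer-then-copy pattern in isolation; render_to_bytes and draw_placeholder are hypothetical names, and the placeholder bytes stand in for the ReportLab calls so that the example runs without ReportLab installed.

```python
import io


def render_to_bytes(draw):
    """Call draw(fileobj) on an in-memory buffer and return the bytes written."""
    buffer = io.BytesIO()
    try:
        draw(buffer)
        return buffer.getvalue()
    finally:
        # Release the buffer's memory once its contents have been copied out.
        buffer.close()


def draw_placeholder(fileobj):
    # Hypothetical stand-in. With ReportLab, this body would instead create
    # canvas.Canvas(fileobj), draw on it, and call showPage() and save().
    fileobj.write(b"placeholder bytes, not a real PDF")


data = render_to_bytes(draw_placeholder)
print(len(data), "bytes buffered")
```

The returned bytes can then be written to the response in one call, as response.write(temp.getvalue()) does in the example above.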
There are many other types of content you can generate using Python. Here are a few more ideas and some pointers to libraries you could use to implement them:
ZIP files: The Python standard library includes the zipfile module, which can read and write compressed ZIP files. You can use it to provide on-demand archives of a set of files, or to compress large documents when requested. For TAR files, the standard library's tarfile module does the same job.
Dynamic images: The Python Imaging Library (PIL; http://www.pythonware.com/products/pil/) is an excellent toolkit for producing images (PNG, JPEG, GIF, and many other formats). You can use it to scale images down into thumbnails automatically, composite multiple images into a single frame, or do web-based image processing.
Charts: Python has a number of powerful plotting and charting libraries for producing on-demand charts, maps, plots, and graphs. We couldn't possibly list them all, so here are two highlights.
matplotlib (http://matplotlib.sourceforge.net/) can be used to produce the kind of high-quality plots usually generated with MATLAB or Mathematica.
pygraphviz (https://networkx.lanl.gov/wiki/pygraphviz) is a Python interface to the Graphviz graph layout toolkit (http://graphviz.org/) that can be used to generate structured diagrams of graphs and networks.
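Returning to the ZIP idea from the list above, an on-demand archive can be assembled entirely in memory with the standard library. This is a minimal sketch: build_zip and the sample file names are illustrative assumptions, not part of Django.

```python
import io
import zipfile


def build_zip(files):
    """Pack a mapping of archive name -> bytes into an in-memory ZIP archive."""
    buffer = io.BytesIO()
    # ZIP_DEFLATED compresses entries; the default, ZIP_STORED, only stores them.
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for name, data in files.items():
            archive.writestr(name, data)
    # The archive is finalized when the with-block exits, so the bytes are complete.
    return buffer.getvalue()


archive_bytes = build_zip({"readme.txt": b"Hello", "data/values.csv": b"a,b\n1,2\n"})
print(len(archive_bytes), "bytes")
```

The resulting bytes could be written to a response whose MIME type is application/zip, in the same way the PDF examples write to theirs.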
In short, any library that can write files can be used with Django. The possibilities are immense.
Now that we’ve covered the basics of generating “non-HTML” content, let’s step up a level of abstraction. Django ships with several built-in tools for generating common types of “non-HTML” content.