Welcome to pdfminer.six’s documentation!


pdfminer.six is a Python package for extracting information from PDF documents.

Check out the source on GitHub.

Features

  • Parse all objects from a PDF document into Python objects.
  • Analyze and group text in a human-readable way.
  • Extract text, images (JPG, JBIG2 and bitmaps), tables of contents, tagged contents and more.
  • Support for (almost all) features from the PDF-1.7 specification.
  • Support for Chinese, Japanese and Korean (CJK) languages as well as vertical writing.
  • Support for various font types (Type1, TrueType, Type3, and CID).
  • Support for basic encryption (RC4).

Installation instructions

Install pdfminer.six with pip before using it. It requires Python 2.7 or newer.

$ pip install pdfminer.six

Note that support for Python 2.7 is dropped as of January 2020.

Common use-cases

Contributing

We welcome contributions to pdfminer.six! Before starting, please take a look at the contribution guide.