NAME

    Catmandu::Importer::PDF - Catmandu importer to extract data from one
    pdf

SYNOPSIS

        #From the command line
    
        #Export pdf information, and text
    
        $ catmandu convert PDF --file input.pdf to YAML
    
        #In a script
    
        use Catmandu::Sane;
    
        use Catmandu::Importer::PDF;
    
        my $importer = Catmandu::Importer::PDF->new( file => "/tmp/input.pdf" );
    
        $importer->each(sub{
    
            my $pdf = $_[0];
            #..
    
        });

EXAMPLE OUTPUT IN YAML

        ---
        document:
          author: ~
          creation_date: 1207274644
          creator: PDFplus
          keywords: ~
          metadata: ~
          modification_date: 1421574847
          producer: "Nobody at all"
          subject: ~
          title: "Hello there"
          version: PDF-1.6
        pages:
        - label: Cover Page
          height: 878
          width: 595
          text: "Hello world"

INSTALL

    In order to install this package you need the following system packages
    installed

    Centos

      * perl-devel

      * make

      * gcc

      * gcc-c++

      * libyaml-devel

      * libyaml

      * poppler-glib ( >= 0.16 )

      * poppler-glib-devel ( >= 0.16 )

      Centos 6 only has poppler-glib 0.12. So you need at least Centos 7.
      Or you can compile the package.

    Ubuntu

      * libpoppler-glib8

      * libpoppler-glib-dev

      * gobject-introspection

      * libgirepository1.0-dev

NOTES

    * returns only one record, compared to other Catmandu importers

    * all pages are stored in one record. For large documents this can be
    memory intensive.

    * see also the alternative importers: PDFPages and PDFInfo

AUTHORS

    Nicolas Franck <nicolas.franck at ugent.be>

SEE ALSO

    Catmandu::Importer::PDFInfo, Catmandu::Importer::PDFPages, Catmandu,
    Catmandu::Importer , Poppler