Class PDFInfo


  • public class PDFInfo
    extends Object
    This class parses the information embedded in a PDF and returns it either as a whole (toHashMap() or toString()) or through individual getters.

    The PDF is parsed only on the first call to run(). The values are cached during that first call.

    For page sizes, see PDF page boxes for details. Here, the information is read from the first page only. The dimensions are in points; divide by 72 to get inches.
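
The points-to-inches conversion mentioned above can be sketched as follows. This is a minimal, standalone illustration: PageSizeUnits and pointsToInches are hypothetical names, not part of PDFInfo, and the 612 x 792 values stand in for what the media box getters might return for a US Letter page.

```java
import java.util.Locale;

// Hypothetical helper for illustration; it is NOT part of the PDFInfo class.
final class PageSizeUnits {

    // One inch is 72 PDF points.
    static final float POINTS_PER_INCH = 72f;

    // Converts a dimension expressed in points to inches.
    static float pointsToInches(float points) {
        return points / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Assumed sample values (US Letter); in real code these would come
        // from getMediaBoxWidthInPoints() and getMediaBoxHeightInPoints().
        float widthInPoints = 612f;
        float heightInPoints = 792f;

        System.out.printf(Locale.ROOT, "%.2f x %.2f in%n",
                pointsToInches(widthInPoints),
                pointsToInches(heightInPoints));
        // prints "8.50 x 11.00 in"
    }
}
```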

    Since:
    8.10
    • Constructor Detail

      • PDFInfo

        public PDFInfo​(Blob inBlob)
        Constructor with a Blob.
        Parameters:
        inBlob - Input blob.
      • PDFInfo

        public PDFInfo​(Blob inBlob,
                       String inPassword)
        Constructor for Blob + encrypted PDF.
        Parameters:
        inBlob - Input blob.
        inPassword - Password to use if the PDF is encrypted.
      • PDFInfo

        public PDFInfo​(DocumentModel inDoc)
        Constructor with a DocumentModel. Uses the default file:content xpath to get the blob from the document.
        Parameters:
        inDoc - Input DocumentModel.
      • PDFInfo

        public PDFInfo​(DocumentModel inDoc,
                       String inXPath,
                       String inPassword)
        Constructor for DocumentModel + encrypted PDF.

        If inXPath is null or "", it is set to the default file:content value.

        Parameters:
        inDoc - Input DocumentModel.
        inXPath - Input XPath.
        inPassword - Password to use if the PDF is encrypted.
    • Method Detail

      • setParseWithXMP

        public void setParseWithXMP​(boolean inValue)
        If set to true, parsing will also extract the XMP metadata.

        The value cannot be modified once run() has been called.

        Parameters:
        inValue - true to extract XMP.
      • run

        public void run()
                 throws NuxeoException
        After building the object with the appropriate constructor, and optionally setting a parsing property (setParseWithXMP(), for example), call this method to extract the information from the PDF.

        After extraction, the information is available either all at once (toHashMap() or toString()) or item by item (see the individual getters).

        Throws:
        NuxeoException
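
The configure-then-run lifecycle described above (set options first, parse once, reuse cached values afterwards) can be illustrated with a small stand-in class. This is only a sketch under assumptions: ParseOnceSketch is hypothetical and does not reproduce the real PDFInfo code, and rejecting a late setter call with IllegalStateException is a choice made for this example, since the documentation does not say how PDFInfo reacts.

```java
// Stand-in for illustration only; this is NOT the real PDFInfo implementation.
final class ParseOnceSketch {

    private boolean parseWithXmp = false;
    private boolean alreadyParsed = false;
    private int parseCount = 0;

    // Options may only change before the first run().
    void setParseWithXmp(boolean value) {
        if (alreadyParsed) {
            // Assumption: this sketch signals the misuse with an exception.
            throw new IllegalStateException("run() has already been called");
        }
        parseWithXmp = value;
    }

    // Parses on the first call only; later calls keep the cached values.
    void run() {
        if (alreadyParsed) {
            return;
        }
        parseCount++; // the real class would read the PDF here
        alreadyParsed = true;
    }

    boolean isParseWithXmp() {
        return parseWithXmp;
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        ParseOnceSketch info = new ParseOnceSketch();
        info.setParseWithXmp(true); // configure before run()
        info.run();
        info.run();                 // no second parse
        System.out.println(info.getParseCount()); // prints "1"
    }
}
```

With the real class, the same order applies: construct, call setParseWithXMP() if needed, call run(), then read the getters.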
      • toHashMap

        public HashMap<String,​String> toHashMap()
        Returns all the parsed information in a HashMap of Strings.

        Possible keys (labels) are:

        • File name
        • File size
        • PDF version
        • Page count
        • Page size
        • Page width
        • Page height
        • Page layout
        • Title
        • Author
        • Subject
        • PDF producer
        • Content creator
        • Creation date
      • toFields

        public DocumentModel toFields​(DocumentModel inDoc,
                                      HashMap<String,​String> inMapping,
                                      boolean inSave,
                                      CoreSession inSession)
        Sets fields of inDoc from the parsed information, following inMapping. inMapping is a HashMap where the key is the xpath of the destination field and the value is the exact label of a PDF info item as returned by toHashMap(). For example:

        
         pdfinfo:title=Title
         pdfinfo:producer=PDF Producer
         pdfinfo:mediabox_width=Media box width
         ...
         

        If inSave is false, inSession can be null.

        Parameters:
        inDoc - Input DocumentModel.
        inMapping - Mapping from destination field xpath to PDF info label.
        inSave - Whether to save the document.
        inSession - Session to use for saving. Used only when inSave is true.
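
The shape of inMapping (xpath as key, info label as value) can be sketched without any Nuxeo dependency. In this illustration, MappingSketch and applyMapping are hypothetical, a plain Map stands in for the DocumentModel, and pdfinfo:author is an assumed xpath; only the "Title" and "Author" labels and the pdfinfo:title xpath come from the text above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; a plain Map stands in for the DocumentModel.
final class MappingSketch {

    // For each (xpath, label) pair, copies the info value stored under
    // "label" into the destination map under "xpath".
    static Map<String, String> applyMapping(Map<String, String> info,
                                            Map<String, String> mapping) {
        Map<String, String> fields = new HashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            String xpath = entry.getKey();
            String label = entry.getValue();
            fields.put(xpath, info.get(label));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the result of toHashMap(); the values are made up.
        HashMap<String, String> info = new HashMap<>();
        info.put("Title", "Annual report");
        info.put("Author", "Jane Doe");

        // Key: destination xpath. Value: exact label of the info item.
        HashMap<String, String> mapping = new HashMap<>();
        mapping.put("pdfinfo:title", "Title");
        mapping.put("pdfinfo:author", "Author"); // assumed xpath

        Map<String, String> fields = applyMapping(info, mapping);
        System.out.println(fields.get("pdfinfo:title")); // prints "Annual report"
    }
}
```

A HashMap built the same way as mapping here is what would be passed as inMapping to toFields().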
      • toString

        public String toString()
        Wrapper for toHashMap().toString()
        Overrides:
        toString in class Object
      • getNumberOfPages

        public int getNumberOfPages()
      • getMediaBoxWidthInPoints

        public float getMediaBoxWidthInPoints()
      • getMediaBoxHeightInPoints

        public float getMediaBoxHeightInPoints()
      • getCropBoxWidthInPoints

        public float getCropBoxWidthInPoints()
      • getCropBoxHeightInPoints

        public float getCropBoxHeightInPoints()
      • getFileSize

        public long getFileSize()
      • isEncrypted

        public boolean isEncrypted()
      • getAuthor

        public String getAuthor()
      • getContentCreator

        public String getContentCreator()
      • getFileName

        public String getFileName()
      • getKeywords

        public String getKeywords()
      • getPageLayout

        public String getPageLayout()
      • getPdfVersion

        public String getPdfVersion()
      • getProducer

        public String getProducer()
      • getSubject

        public String getSubject()
      • getTitle

        public String getTitle()
      • getXmp

        public String getXmp()
      • getCreationDate

        public Calendar getCreationDate()
      • getModificationDate

        public Calendar getModificationDate()
      • getPermissions

        public org.apache.pdfbox.pdmodel.encryption.AccessPermission getPermissions()