Class PDFInfo

java.lang.Object
org.nuxeo.ecm.platform.pdf.PDFInfo

public class PDFInfo extends Object
This class parses the information embedded in a PDF and returns it either as a whole (toHashMap() or toString()) or through individual getters.

The PDF is parsed only on the first call to run(). The extracted values are cached during that first call.

For page sizes, see PDF page boxes for details. The values are read from the first page only. Dimensions are in points; divide by 72 to get inches.
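
The unit conversion mentioned above can be sketched as follows. This is a minimal, standalone illustration: the PageUnits class and its method names are hypothetical helpers and are not part of PDFInfo. The millimeter conversion is an addition that relies on 1 inch = 25.4 mm.

```java
/** Hypothetical helper: converts page dimensions expressed in points (72 points = 1 inch). */
final class PageUnits {
    static final double POINTS_PER_INCH = 72.0;
    static final double MM_PER_INCH = 25.4;

    /** Divides by 72, as described for the page box dimensions. */
    static double pointsToInches(double points) {
        return points / POINTS_PER_INCH;
    }

    /** Converts points to millimeters by going through inches. */
    static double pointsToMillimeters(double points) {
        return pointsToInches(points) * MM_PER_INCH;
    }

    public static void main(String[] args) {
        // 612 x 792 points is the US Letter format.
        float widthInPoints = 612f;
        float heightInPoints = 792f;
        // prints "8.5 x 11.0 in"
        System.out.println(pointsToInches(widthInPoints) + " x " + pointsToInches(heightInPoints) + " in");
    }
}
```

In real use, the input values would presumably come from getters such as getMediaBoxWidthInPoints() and getMediaBoxHeightInPoints().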

Since:
8.10
  • Constructor Details

    • PDFInfo

      public PDFInfo(Blob inBlob)
      Constructor with a Blob.
      Parameters:
      inBlob - Input blob.
    • PDFInfo

      public PDFInfo(Blob inBlob, String inPassword)
      Constructor for Blob + encrypted PDF.
      Parameters:
      inBlob - Input blob.
      inPassword - Password to use if the PDF is encrypted.
    • PDFInfo

      public PDFInfo(DocumentModel inDoc)
      Constructor with a DocumentModel. Uses the default file:content xpath to get the blob from the document.
      Parameters:
      inDoc - Input DocumentModel.
    • PDFInfo

      public PDFInfo(DocumentModel inDoc, String inXPath, String inPassword)
      Constructor for DocumentModel + encrypted PDF.

      If inXPath is null or "", it is set to the default file:content value.

      Parameters:
      inDoc - Input DocumentModel.
      inXPath - Input XPath.
      inPassword - Password to use if the PDF is encrypted.
  • Method Details

    • setParseWithXMP

      public void setParseWithXMP(boolean inValue)
      If set to true, parsing will also extract the XMP metadata.

      The value cannot be modified once run() has been called.

      Parameters:
      inValue - true to extract XMP.
    • run

      public void run() throws NuxeoException
      After building the object with the appropriate constructor, and optionally setting a parsing property (setParseWithXMP(), for example), call this method to extract the information from the PDF.

      After extraction, the info is available either all at once (toHashMap() or toString()) or individually through the getters.

      Throws:
      NuxeoException
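
The lifecycle described here (configure, run once, then read cached values) can be modeled with a small standalone class. This is only a sketch of the documented contract, not the Nuxeo implementation: ParseOnceInfo, its fields, and the placeholder values are all hypothetical, and rejecting a late option change with an IllegalStateException is an assumption about how "cannot be modified" might be enforced.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a parse-once extractor; NOT the actual PDFInfo code. */
final class ParseOnceInfo {
    private boolean parseWithXmp = false;
    private boolean alreadyParsed = false;
    private int parseCount = 0; // kept only to make the caching observable
    private String xmp = null;
    private final Map<String, String> values = new HashMap<>();

    void setParseWithXmp(boolean value) {
        // Assumption: changing the option after run() is rejected with an exception.
        if (alreadyParsed && value != parseWithXmp) {
            throw new IllegalStateException("Parsing options cannot change after run()");
        }
        parseWithXmp = value;
    }

    void run() {
        if (alreadyParsed) {
            return; // values were cached by the first call
        }
        parseCount++;
        values.put("Title", "Sample title"); // placeholder for real extraction
        if (parseWithXmp) {
            xmp = "<x:xmpmeta/>"; // placeholder for real XMP content
        }
        alreadyParsed = true;
    }

    String get(String label) {
        return values.get(label);
    }

    String getXmp() {
        return xmp;
    }

    int getParseCount() {
        return parseCount;
    }
}
```

With this model, calling run() twice performs the extraction once, and the getters read from the cached values afterwards.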
    • toHashMap

      public HashMap<String,String> toHashMap()
      Returns all parsed info in a HashMap of strings.

      Possible keys are:

      • File name
      • File size
      • PDF version
      • Page count
      • Page size
      • Page width
      • Page height
      • Page layout
      • Title
      • Author
      • Subject
      • PDF producer
      • Content creator
      • Creation date
    • toFields

      public DocumentModel toFields(DocumentModel inDoc, HashMap<String,String> inMapping, boolean inSave, CoreSession inSession)
      The inMapping map is a HashMap whose keys are the xpaths of the destination fields and whose values are the exact labels of the PDF info as returned by toHashMap(). For example:

      
       pdfinfo:title=Title
       pdfinfo:producer=PDF producer
       pdfinfo:mediabox_width=Media box width
       ...
       

      If inSave is false, inSession can be null.

      Parameters:
      inDoc - Input DocumentModel.
      inMapping - Input Mapping.
      inSave - Whether to save the document.
      inSession - Session to use for saving when inSave is true.
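
As an illustration of the mapping format, the sketch below builds a HashMap from "xpath=label" lines like the ones shown above. The MappingExample class and its parseMapping method are hypothetical helpers, not part of the Nuxeo API, and the pdfinfo:author xpath is an assumed field name used only for the example.

```java
import java.util.HashMap;

/** Hypothetical helper that builds the kind of map expected as inMapping. */
final class MappingExample {

    /** Parses "xpath=label" lines; blank lines and lines without '=' are skipped. */
    static HashMap<String, String> parseMapping(String text) {
        HashMap<String, String> mapping = new HashMap<>();
        for (String line : text.split("\\R")) {
            String trimmed = line.trim();
            int eq = trimmed.indexOf('=');
            if (trimmed.isEmpty() || eq <= 0) {
                continue; // nothing usable on this line
            }
            String xpath = trimmed.substring(0, eq).trim();  // destination field
            String label = trimmed.substring(eq + 1).trim(); // label as returned by toHashMap()
            mapping.put(xpath, label);
        }
        return mapping;
    }

    public static void main(String[] args) {
        HashMap<String, String> mapping = parseMapping(
                "pdfinfo:title=Title\n"
              + "pdfinfo:author=Author\n");
        // prints "Title"
        System.out.println(mapping.get("pdfinfo:title"));
    }
}
```

The resulting map could then be passed as the inMapping argument of toFields(). Because the labels must match exactly, they are best copied from the keys listed for toHashMap().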
    • toString

      public String toString()
      Wrapper for toHashMap().toString()
      Overrides:
      toString in class Object
    • getNumberOfPages

      public int getNumberOfPages()
    • getMediaBoxWidthInPoints

      public float getMediaBoxWidthInPoints()
    • getMediaBoxHeightInPoints

      public float getMediaBoxHeightInPoints()
    • getCropBoxWidthInPoints

      public float getCropBoxWidthInPoints()
    • getCropBoxHeightInPoints

      public float getCropBoxHeightInPoints()
    • getFileSize

      public long getFileSize()
    • isEncrypted

      public boolean isEncrypted()
    • getAuthor

      public String getAuthor()
    • getContentCreator

      public String getContentCreator()
    • getFileName

      public String getFileName()
    • getKeywords

      public String getKeywords()
    • getPageLayout

      public String getPageLayout()
    • getPdfVersion

      public String getPdfVersion()
    • getProducer

      public String getProducer()
    • getSubject

      public String getSubject()
    • getTitle

      public String getTitle()
    • getXmp

      public String getXmp()
    • getCreationDate

      public Calendar getCreationDate()
    • getModificationDate

      public Calendar getModificationDate()
    • getPermissions

      public org.apache.pdfbox.pdmodel.encryption.AccessPermission getPermissions()