Package org.nuxeo.ecm.platform.pdf
Class PDFInfo
java.lang.Object
org.nuxeo.ecm.platform.pdf.PDFInfo
This class parses the information embedded in a PDF and returns it either globally (toHashMap() or toString()) or via individual getters.
The PDF is parsed only at the first call to run(). Values are cached during that first call.
For page sizes, see PDF page boxes for details. The information is read from the first page only. Dimensions are in points; divide by 72 to get inches.
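The conversion from points to inches can be sketched as follows. The class name `PointsToInches` and the sample values are hypothetical; the values stand in for what getMediaBoxWidthInPoints() and getMediaBoxHeightInPoints() might return for a US Letter page.

```java
// Minimal sketch: converting page dimensions from points to inches.
// PointsToInches is a hypothetical helper, not part of PDFInfo.
final class PointsToInches {

    /** There are 72 points in one inch. */
    static final float POINTS_PER_INCH = 72f;

    /** Converts a length in points to a length in inches. */
    static float toInches(float points) {
        return points / POINTS_PER_INCH;
    }

    public static void main(String[] args) {
        // Hypothetical sample values (US Letter is 612 x 792 points).
        float widthInPoints = 612f;
        float heightInPoints = 792f;

        System.out.println(toInches(widthInPoints) + " x "
                + toInches(heightInPoints) + " in");
    }
}
```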
Since: 8.10
Constructor Summary
Constructor / Description
PDFInfo(Blob inBlob)
    Constructor with a Blob.
PDFInfo(Blob inBlob, String inPassword)
    Constructor for Blob + encrypted PDF.
PDFInfo(DocumentModel inDoc)
    Constructor with a DocumentModel.
PDFInfo(DocumentModel inDoc, String inXPath, String inPassword)
    Constructor for DocumentModel + encrypted PDF.
Method Summary
Modifier and Type / Method / Description
float getCropBoxHeightInPoints()
float getCropBoxWidthInPoints()
long getFileSize()
float getMediaBoxHeightInPoints()
float getMediaBoxWidthInPoints()
int getNumberOfPages()
org.apache.pdfbox.pdmodel.encryption.AccessPermission getPermissions()
getTitle()
getXmp()
boolean isEncrypted()
void run()
    After building the object with the correct constructor, and after possibly having set some parsing property (setParseWithXMP(), for example), this method extracts the information from the PDF.
void setParseWithXMP(boolean inValue)
    If set to true, parsing will extract the XMP from the PDF.
DocumentModel toFields(DocumentModel inDoc, HashMap<String, String> inMapping, boolean inSave, CoreSession inSession)
    The inMapping map is a HashMap where the key is the xpath of the destination field, and the value is the exact label of a PDF info as returned by toHashMap().
toHashMap()
    Returns all parsed info in a String HashMap.
toString()
    Wrapper for toHashMap().toString()
-
Constructor Details
-
PDFInfo
Constructor with a Blob.
Parameters:
inBlob - Input blob.
-
PDFInfo
Constructor for Blob + encrypted PDF.
Parameters:
inBlob - Input blob.
inPassword - Password, if the PDF is encrypted.
-
PDFInfo
Constructor with a DocumentModel. Uses the default file:content xpath to get the blob from the document.
Parameters:
inDoc - Input DocumentModel.
-
PDFInfo
Constructor for DocumentModel + encrypted PDF.
If inXPath is null or "", it is set to the default file:content value.
Parameters:
inDoc - Input DocumentModel.
inXPath - Input XPath.
inPassword - Password, if the PDF is encrypted.
-
-
Method Details
-
setParseWithXMP
public void setParseWithXMP(boolean inValue)
If set to true, parsing will extract the XMP from the PDF.
The value cannot be modified once run() has been called.
Parameters:
inValue - true to extract XMP.
-
run
After building the object with the correct constructor, and after possibly having set some parsing property (setParseWithXMP(), for example), this method extracts the information from the PDF.
After extraction, the info is available through getters: either all of it (toHashMap() or toString()) or individual values (see all getters).
Throws:
NuxeoException
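The lifecycle described above (configure, call run() once, then read cached values) can be illustrated with a small stand-in class. This is only a sketch under stated assumptions: `ParseOnceSketch` is hypothetical and is not the real PDFInfo, the "parsed" values are placeholders, and throwing `IllegalStateException` when the XMP flag is changed after run() is a choice made for this sketch, since the text above only says the value cannot be modified at that point.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in, NOT the real PDFInfo. It only mirrors the documented
// lifecycle: set parsing options, run() once, then read cached values.
final class ParseOnceSketch {

    private final Map<String, String> cache = new HashMap<>();
    private boolean parseWithXmp = false;
    private boolean alreadyParsed = false;
    private int parseCount = 0; // kept only to demonstrate the caching

    void setParseWithXmp(boolean value) {
        // Assumption of this sketch: a change after run() is rejected.
        if (alreadyParsed && value != parseWithXmp) {
            throw new IllegalStateException("Cannot change XMP parsing after run()");
        }
        parseWithXmp = value;
    }

    void run() {
        if (alreadyParsed) {
            return; // later calls reuse the cached values
        }
        parseCount++;
        cache.put("Title", "Sample title"); // placeholder for real parsing
        if (parseWithXmp) {
            cache.put("XMP", "<x:xmpmeta/>"); // placeholder XMP packet
        }
        alreadyParsed = true;
    }

    String get(String label) {
        return cache.get(label);
    }

    int getParseCount() {
        return parseCount;
    }

    public static void main(String[] args) {
        ParseOnceSketch info = new ParseOnceSketch();
        info.setParseWithXmp(true); // configure before parsing
        info.run();
        info.run(); // no second parse
        System.out.println("Title: " + info.get("Title"));
        System.out.println("Parse count: " + info.getParseCount());
    }
}
```

Setting options before the first run() keeps the cached values consistent with the configuration that produced them.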
-
toHashMap
Returns all parsed info in a String HashMap.
Possible values are:
- File name
- File size
- PDF version
- Page count
- Page size
- Page width
- Page height
- Page layout
- Title
- Author
- Subject
- PDF producer
- Content creator
- Creation date
-
toFields
public DocumentModel toFields(DocumentModel inDoc, HashMap<String, String> inMapping, boolean inSave, CoreSession inSession)
The inMapping map is a HashMap where the key is the xpath of the destination field, and the value is the exact label of a PDF info as returned by toHashMap(). For example:
pdfinfo:title=Title
pdfinfo:producer=PDF Producer
pdfinfo:mediabox_width=Media box width
...
If inSave is false, inSession can be null.
Parameters:
inDoc - Input DocumentModel.
inMapping - Input mapping.
inSave - Whether the document should be saved.
inSession - When saving, the session in which to save the document.
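The shape of the mapping (destination xpath as key, info label as value) can be sketched with plain maps. Everything here is an assumption made for illustration: `MappingSketch` and `applyMapping` are hypothetical names, a plain `Map` stands in for the DocumentModel, the sample values are invented, and the `pdfinfo:author` xpath is hypothetical. How the real toFields() treats a label that is absent from the parsed info is not stated here; this sketch simply stores null.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the xpath -> label mapping; it does not call Nuxeo.
final class MappingSketch {

    /**
     * Copies each mapped info value into a result map keyed by destination xpath.
     * A label missing from parsedInfo yields a null value (a choice of this sketch).
     */
    static Map<String, String> applyMapping(Map<String, String> parsedInfo,
                                            Map<String, String> mapping) {
        Map<String, String> fields = new HashMap<>();
        for (Map.Entry<String, String> entry : mapping.entrySet()) {
            String xpath = entry.getKey();   // destination field
            String label = entry.getValue(); // exact label of the parsed info
            fields.put(xpath, parsedInfo.get(label));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the parsed info, with invented values.
        Map<String, String> parsedInfo = new HashMap<>();
        parsedInfo.put("Title", "Annual report");
        parsedInfo.put("Author", "Jane Doe");

        // Key: destination xpath. Value: exact info label.
        HashMap<String, String> mapping = new HashMap<>();
        mapping.put("pdfinfo:title", "Title");
        mapping.put("pdfinfo:author", "Author"); // hypothetical xpath

        Map<String, String> fields = applyMapping(parsedInfo, mapping);
        System.out.println(fields.get("pdfinfo:title"));
        System.out.println(fields.get("pdfinfo:author"));
    }
}
```

Because the values must be exact labels, a misspelled or differently capitalised label would not match any parsed entry in this sketch.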
-
toString
Wrapper for toHashMap().toString()
-
getNumberOfPages
public int getNumberOfPages()
-
getMediaBoxWidthInPoints
public float getMediaBoxWidthInPoints()
-
getMediaBoxHeightInPoints
public float getMediaBoxHeightInPoints()
-
getCropBoxWidthInPoints
public float getCropBoxWidthInPoints()
-
getCropBoxHeightInPoints
public float getCropBoxHeightInPoints()
-
getFileSize
public long getFileSize()
-
isEncrypted
public boolean isEncrypted()
-
getAuthor
-
getContentCreator
-
getFileName
-
getKeywords
-
getPageLayout
-
getPdfVersion
-
getProducer
-
getSubject
-
getTitle
-
getXmp
-
getCreationDate
-
getModificationDate
-
getPermissions
public org.apache.pdfbox.pdmodel.encryption.AccessPermission getPermissions()
-