Package org.nuxeo.ecm.core.convert.plugins.text.extractors
Plugins that provide document transformation and text extraction.
Class Summary
- Base class that contains a SAX-based text extractor fallback.
- Docx to text converter: parses the Open XML text document to read its content.
- Converter that tries to find a way to extract full-text content according to the input MIME type.
- Extracts the text content of HTML documents while trying to respect the paragraph structure.
- Markdown to text converter.
- Based on the Apache Jackrabbit OOo converter.
- Pptx to text converter: parses the Open XML presentation document to read its content.
- Wrapper used because some consumers (such as the SAX parser) tend to close the stream.
- XML zip to text converter: parses the XML zip entries to read their content.
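Several of the classes listed above share one pattern: open the document as a zip, run a SAX parser over an XML entry, and protect the zip stream from the parser's `close()` call. The following is a minimal sketch of that pattern for a Docx file, assuming the main text lives in the `word/document.xml` entry, with text runs in `w:t` elements and paragraphs in `w:p` elements. `DocxTextSketch`, `NonClosingInputStream`, and `extractText` are hypothetical names used for illustration only; they are not the Nuxeo classes or their API.

```java
import java.io.FilterInputStream;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of SAX-based Docx text extraction; not the Nuxeo implementation.
public class DocxTextSketch {

    /** Ignores close() so a consumer such as the SAX parser cannot close the shared zip stream. */
    static class NonClosingInputStream extends FilterInputStream {
        NonClosingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public void close() {
            // Intentionally a no-op: the caller owns and closes the wrapped stream.
        }
    }

    /** Collects text inside w:t elements and appends a newline at the end of each w:p. */
    static class DocxHandler extends DefaultHandler {
        private final StringBuilder out = new StringBuilder();
        private boolean inText;

        // The default factory is not namespace-aware, so qName carries the
        // literal prefix; this assumes the conventional "w:" prefix.
        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            if ("w:t".equals(qName)) {
                inText = true;
            }
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("w:t".equals(qName)) {
                inText = false;
            } else if ("w:p".equals(qName)) {
                out.append('\n');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (inText) {
                out.append(ch, start, length);
            }
        }

        String text() {
            return out.toString();
        }
    }

    /** Returns the plain text of the word/document.xml entry of a Docx stream. */
    public static String extractText(InputStream docx) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        DocxHandler handler = new DocxHandler();
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    // Wrap the entry stream so the parser's close() leaves the zip open.
                    parser.parse(new NonClosingInputStream(zip), handler);
                }
            }
        }
        return handler.text();
    }
}
```

Without the wrapper, a parser that closes its input would also close the `ZipInputStream`, and the loop could not advance to the next entry. If Apache Commons IO is available, its `CloseShieldInputStream` serves the same purpose as the hand-written wrapper.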