Package org.nuxeo.ecm.core.storage
Class FulltextExtractorWork
java.lang.Object
org.nuxeo.ecm.core.work.AbstractWork
org.nuxeo.ecm.core.storage.FulltextExtractorWork
- All Implemented Interfaces:
Serializable, Work
Work task that does fulltext extraction from the string properties and the blobs of the given document, saving them
into the fulltext table.
- Since:
- 5.7 for the original implementation; since 10.3 the extraction and update are done in the same Work
-
Nested Class Summary
Nested classes/interfaces inherited from interface org.nuxeo.ecm.core.work.api.Work
Work.Progress, Work.State
-
Field Summary
Modifier and Type, Field, Description
protected static final String ANY2TEXT_CONVERTER
protected static final String CATEGORY
protected List<DocumentRef> docsToUpdate
protected DocumentModel document
static final String FULLTEXT_DEFAULT_INDEX
protected FulltextConfiguration fulltextConfiguration
protected static final int HTML_MAGIC_OFFSET
static final String SYSPROP_FULLTEXT_BINARY
static final String SYSPROP_FULLTEXT_JOBID
static final String SYSPROP_FULLTEXT_SIMPLE
protected static final String TITLE
protected final boolean updateBinaryText
If true, update the binary text from the document.
protected final boolean updateSimpleText
If true, update the simple text from the document.
protected final boolean useJobId
Fields inherited from class org.nuxeo.ecm.core.work.AbstractWork
callerThread, completionTime, docId, docIds, FAILURE_EXCEPTION, FAILURE_MSG, GLOBAL_DLQ_COUNT_REGISTRY_NAME, id, isTree, loginContext, originatingUsername, progress, RANDOM, repositoryName, schedulePath, schedulingTime, session, startTime, state, status, suspended, suspending, traceContext, WORK_FAILED_EVENT, WORK_INSTANCE
-
Constructor Summary
Constructor, Description
FulltextExtractorWork(String repositoryName, String docId, boolean updateSimpleText, boolean updateBinaryText, boolean useJobId)
-
Method Summary
Modifier and Type, Method, Description
protected String blobToText(Blob blob)
Converts the blob to text by calling a converter.
protected void extractAndUpdate()
protected void extractAndUpdateBinaryText()
protected void extractAndUpdateSimpleText()
void extractBinaryFulltext(CoreSession session, DocumentModel doc)
protected void findDocsToUpdate()
String getCategory()
Gets the category for this work.
protected String getFulltextPropertyName(String name, String indexName)
int getRetryCount()
Gets the number of times that this Work instance can be retried in case of concurrent update exceptions.
String getTitle()
Gets a human-readable name for this work instance.
protected void initFulltextConfiguration()
protected <O> String joinText
protected String removeEntities(String string)
protected String removeHtml(String string)
protected String stringToText(String string)
void work()
This method should implement the actual work done by the Work instance.
Methods inherited from class org.nuxeo.ecm.core.work.AbstractWork
appendWorkToDeadLetterQueue, buildWorkFailureEventProps, cleanUp, closeSession, commitOrRollbackTransaction, equals, getCompletionTime, getDocument, getDocuments, getId, getOriginatingUsername, getPartitionKey, getProgress, getSchedulePath, getSchedulingTime, getSpanFromContext, getStartTime, getStatus, getWorkInstanceState, hashCode, isDocumentTree, isSuspending, isWorkInstanceSuspended, newDocumentLocation, openSystemSession, openUserSession, run, runWorkWithTransaction, setCompletionTime, setDocument, setDocument, setDocuments, setOriginatingUsername, setProgress, setSchedulePath, setStartTime, setStatus, setWorkInstanceState, setWorkInstanceSuspending, startTransaction, suspended, toString, workFailed
Methods inherited from class java.lang.Object
clone, finalize, getClass, notify, notifyAll, wait, wait, wait
Methods inherited from interface org.nuxeo.ecm.core.work.api.Work
isCoalescing, isGroupJoin, isIdempotent, onGroupJoinCompletion
-
Field Details
-
SYSPROP_FULLTEXT_SIMPLE
-
SYSPROP_FULLTEXT_BINARY
-
SYSPROP_FULLTEXT_JOBID
-
FULLTEXT_DEFAULT_INDEX
-
CATEGORY
-
TITLE
-
ANY2TEXT_CONVERTER
-
HTML_MAGIC_OFFSET
protected static final int HTML_MAGIC_OFFSET
-
fulltextConfiguration
-
document
-
docsToUpdate
-
updateSimpleText
protected final boolean updateSimpleText
If true, update the simple text from the document.
-
updateBinaryText
protected final boolean updateBinaryText
If true, update the binary text from the document.
-
useJobId
protected final boolean useJobId
-
-
Constructor Details
-
FulltextExtractorWork
-
-
Method Details
-
getCategory
Description copied from interface: Work
Gets the category for this work.
Used to choose an execution queue.
- Specified by:
getCategory in interface Work
- Overrides:
getCategory in class AbstractWork
- Returns:
- the category, or null for the default
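The relationship between a category and an execution queue, including the null-means-default rule, can be illustrated with a small stand-alone sketch. Everything here is hypothetical (the QueueSelector class, the "default" queue name, and the category names are assumptions for illustration); it does not reproduce how the Nuxeo WorkManager actually resolves queues.

```java
import java.util.Map;

// Hypothetical helper, not part of the Nuxeo API: picks a queue name for a work category.
class QueueSelector {
    // Assumed name of the fallback queue, chosen for this example only.
    static final String DEFAULT_QUEUE = "default";

    private final Map<String, String> queueByCategory;

    QueueSelector(Map<String, String> queueByCategory) {
        this.queueByCategory = Map.copyOf(queueByCategory);
    }

    // A null category, or one with no dedicated queue, falls back to the default queue.
    String queueFor(String category) {
        if (category == null) {
            return DEFAULT_QUEUE;
        }
        return queueByCategory.getOrDefault(category, DEFAULT_QUEUE);
    }
}
```

With a mapping such as exampleCategory to exampleQueue, queueFor("exampleCategory") selects the dedicated queue, while a null or unmapped category selects the default one.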
-
getTitle
Description copied from interface: Work
Gets a human-readable name for this work instance.
- Returns:
- a human-readable name
-
getRetryCount
public int getRetryCount()
Description copied from class: AbstractWork
Gets the number of times that this Work instance can be retried in case of concurrent update exceptions.
- Overrides:
getRetryCount in class AbstractWork
- Returns:
- 0 for no retry, or more if some retries are possible
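As a rough illustration of what a retry count means in practice, the sketch below runs a task once and then retries it up to retryCount more times when it fails with a concurrent-update error. It is a stand-alone approximation: RetryRunner is a hypothetical name, and java.util.ConcurrentModificationException is used only as a stand-in for whatever exception the framework treats as a concurrent update.

```java
import java.util.ConcurrentModificationException;
import java.util.function.Supplier;

// Hypothetical helper, not part of the Nuxeo API.
class RetryRunner {

    // Runs the task once, then up to retryCount more times if it keeps failing
    // with the stand-in concurrent-update exception. Other exceptions propagate at once.
    static <T> T runWithRetries(int retryCount, Supplier<T> task) {
        if (retryCount < 0) {
            throw new IllegalArgumentException("retryCount must be >= 0");
        }
        ConcurrentModificationException last = null;
        for (int attempt = 0; attempt <= retryCount; attempt++) {
            try {
                return task.get();
            } catch (ConcurrentModificationException e) {
                last = e; // remember the failure and try again if attempts remain
            }
        }
        throw last; // all attempts failed
    }
}
```

A retry count of 0 therefore means a single attempt, which matches the "0 for no retry" wording of the return value.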
-
work
public void work()
Description copied from interface: Work
This method should implement the actual work done by the Work instance.
It should periodically update its progress through Work.setProgress(org.nuxeo.ecm.core.work.api.Work.Progress).
To allow for suspension by the WorkManager, it should periodically call Work.isSuspending(), and if true call Work.suspended() and return early with saved state data.
Clean up can be implemented by Work.cleanUp(boolean, Exception).
- Specified by:
work in interface Work
- Specified by:
work in class AbstractWork
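The cooperative suspension contract described for work() can be sketched without any framework classes. The example below is an assumption-laden illustration: SuspendableBatch and all of its members are hypothetical, and it only mimics the pattern of checking a suspension flag between units of work, saving a resume position, and returning early.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for a suspendable work item; it does not use the Nuxeo API.
class SuspendableBatch {
    private final List<String> items;
    private final List<String> processed = new ArrayList<>();
    private final AtomicBoolean suspending = new AtomicBoolean(false);
    private int nextIndex;     // saved state: index of the next item to process
    private boolean suspended; // true if the last call to work() returned early

    SuspendableBatch(List<String> items) {
        this.items = new ArrayList<>(items);
    }

    void requestSuspend() { suspending.set(true); }
    boolean isSuspending() { return suspending.get(); }
    boolean isSuspended() { return suspended; }
    List<String> processed() { return processed; }

    // Progress as a percentage of items handled so far.
    float progressPercent() {
        return items.isEmpty() ? 100f : 100f * nextIndex / items.size();
    }

    // Processes items one by one, checking for a suspension request before each item.
    void work() {
        suspended = false;
        while (nextIndex < items.size()) {
            if (isSuspending()) {
                suspended = true;      // nextIndex already records where to resume
                suspending.set(false); // the request has been honored
                return;
            }
            processed.add(items.get(nextIndex).toUpperCase(Locale.ROOT));
            nextIndex++;
        }
    }
}
```

Calling work() again after a suspension resumes from the saved index, so no item is processed twice.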
-
extractBinaryFulltext
-
initFulltextConfiguration
protected void initFulltextConfiguration() -
findDocsToUpdate
protected void findDocsToUpdate() -
extractAndUpdate
protected void extractAndUpdate() -
extractAndUpdateSimpleText
protected void extractAndUpdateSimpleText() -
extractAndUpdateBinaryText
protected void extractAndUpdateBinaryText() -
stringToText
-
removeHtml
-
removeEntities
-
blobToText
Converts the blob to text by calling a converter. -
joinText
-
getFulltextPropertyName
-