Configuring a metadata extractor

You can extract metadata from your XML content automatically and store the values as properties in Alfresco by defining a metadata extractor.

The following steps create or configure Spring beans in a Spring context file, so a basic knowledge of the Spring Framework is required.

  1. If required, create a content model in Alfresco for the extracted metadata.
  2. Open or create a new Spring context file.
  3. Create an XPath extractor bean. Reference an existing XML schema filter to specify the document types to which the extractor applies. For each entry, use the fully qualified property name in Clark notation as the key and the XPath query as the value.
    <bean id="samples.extractor.troubleshooting.xpath" parent="rdf.extractor.xpath.abstract">
      <property name="xmlSchemaFilter" ref="samples.dtd.troubleshooting" />
      <property name="xpathQueries">
        <map>
          <!-- key: property name in Clark notation; value: XPath query -->
          <entry key="{}causes" value="/tsTroubleshooting/tsBody/tsCauses" />
          <entry key="{}environment" value="/tsTroubleshooting/tsBody/tsEnvironment" />
          <entry key="{}shortdesc" value="/tsTroubleshooting/abstract/shortdesc" />
          <entry key="{}symptoms" value="/tsTroubleshooting/tsBody/tsSymptoms" />
          <entry key="{}tasks" value="/tsTroubleshooting/task/title" />
        </map>
      </property>
    </bean>
  4. Restart the application server for your changes to take effect.
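
Taken together, steps 2 and 3 might produce a context file like the following sketch. The `<beans>` wrapper is the standard Spring root element. The bean ids `rdf.extractor.xpath.abstract` and `samples.dtd.troubleshooting` come from the example above. The namespace URI `http://example.com/model/troubleshooting/1.0` is a hypothetical placeholder and should be replaced with the namespace of your own content model.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
                           http://www.springframework.org/schema/beans/spring-beans.xsd">

  <!-- XPath extractor applied to documents accepted by the referenced schema filter -->
  <bean id="samples.extractor.troubleshooting.xpath" parent="rdf.extractor.xpath.abstract">
    <property name="xmlSchemaFilter" ref="samples.dtd.troubleshooting" />
    <property name="xpathQueries">
      <map>
        <!-- Key format: {namespace URI}localName. The URI here is a hypothetical placeholder. -->
        <entry key="{http://example.com/model/troubleshooting/1.0}causes"
               value="/tsTroubleshooting/tsBody/tsCauses" />
        <entry key="{http://example.com/model/troubleshooting/1.0}symptoms"
               value="/tsTroubleshooting/tsBody/tsSymptoms" />
      </map>
    </property>
  </bean>

</beans>
```

In many Alfresco installations, extension files whose names end in `-context.xml` are loaded from the `alfresco/extension` directory on the classpath. The exact location depends on your installation, so treat this as an assumption to verify.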

Whenever the content of a document that matches the specified DTD or XSD is modified in Alfresco, each property value is automatically extracted from the XML content at its defined XPath location.
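
As an illustration, the hypothetical document below is shaped to match the XPath queries in the example bean. The element names are taken from those queries. The text content is invented, and the document has not been validated against any real DTD.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical troubleshooting topic; element names mirror the XPath queries -->
<tsTroubleshooting>
  <abstract>
    <!-- Selected by /tsTroubleshooting/abstract/shortdesc -->
    <shortdesc>The server does not start after an upgrade.</shortdesc>
  </abstract>
  <tsBody>
    <!-- Selected by /tsTroubleshooting/tsBody/tsSymptoms -->
    <tsSymptoms>The startup log stops after the database check.</tsSymptoms>
    <!-- Selected by /tsTroubleshooting/tsBody/tsCauses -->
    <tsCauses>The configuration file was overwritten during the upgrade.</tsCauses>
    <!-- Selected by /tsTroubleshooting/tsBody/tsEnvironment -->
    <tsEnvironment>Linux application server</tsEnvironment>
  </tsBody>
  <task>
    <!-- Selected by /tsTroubleshooting/task/title -->
    <title>Restore the configuration file</title>
  </task>
</tsTroubleshooting>
```

With content like this, the query `/tsTroubleshooting/tsBody/tsCauses` selects the `tsCauses` element, and its value would be stored in the property named by the `causes` entry key.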

Note: The extraction occurs asynchronously. All policies and content rules are deactivated on the node during extraction.
Note: A property can only be set on the node if the qualified name of the predicate matches a property in the content model with exactly the same namespace URI and local name. That property should be defined on an aspect. Componize adds the aspect if it is missing from the node when the extraction takes place.
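
The fragment below sketches how such an aspect might be declared in an Alfresco content model. It is only part of a model file: the enclosing `model` element and the usual import of the Alfresco dictionary namespace (for the `d:` prefix) are assumed to exist. The `ts` prefix, the aspect name, and the namespace URI are hypothetical. What matters is that the URI and the local names match the Clark-notation keys used in the extractor bean.

```xml
<!-- Fragment of a content model; the dictionary import for the d: prefix is assumed -->
<namespaces>
  <!-- Hypothetical namespace; the URI must equal the one inside {} in the entry key -->
  <namespace uri="http://example.com/model/troubleshooting/1.0" prefix="ts" />
</namespaces>

<aspects>
  <!-- Hypothetical aspect holding the extracted properties -->
  <aspect name="ts:troubleshooting">
    <title>Troubleshooting metadata</title>
    <properties>
      <!-- Matches the key {http://example.com/model/troubleshooting/1.0}causes -->
      <property name="ts:causes">
        <type>d:text</type>
      </property>
      <!-- Matches the key {http://example.com/model/troubleshooting/1.0}symptoms -->
      <property name="ts:symptoms">
        <type>d:text</type>
      </property>
    </properties>
  </aspect>
</aspects>
```

Under these assumptions, an entry key of `{http://example.com/model/troubleshooting/1.0}causes` resolves to the `ts:causes` property, because both the namespace URI and the local name agree.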