"Fossies" - the Fresh Open Source Software Archive  

Source code changes of the file "src/main/java/com/openkm/extractor/XMLTextExtractor.java" between
OpenKM-document-management-system-6.3.10.tar.gz and OpenKM-document-management-system-6.3.11.tar.gz

About: OpenKM (Knowledge Management) is a document management system that allows easy management of documents, users, roles and finding your enterprise documents and records. Community version (source code).

XMLTextExtractor.java  (OpenKM-document-management-system-6.3.10):XMLTextExtractor.java  (OpenKM-document-management-system-6.3.11)
skipping to change at line 24 skipping to change at line 24
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details. * GNU General Public License for more details.
* <p> * <p>
* You should have received a copy of the GNU General Public License along * You should have received a copy of the GNU General Public License along
* with this program; if not, write to the Free Software Foundation, Inc., * with this program; if not, write to the Free Software Foundation, Inc.,
* 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. * 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
*/ */
package com.openkm.extractor; package com.openkm.extractor;
import net.xeoh.plugins.base.annotations.PluginImplementation;
import org.slf4j.Logger; import org.slf4j.Logger;
import org.slf4j.LoggerFactory; import org.slf4j.LoggerFactory;
import org.xml.sax.InputSource; import org.xml.sax.InputSource;
import org.xml.sax.SAXException; import org.xml.sax.SAXException;
import org.xml.sax.XMLReader; import org.xml.sax.XMLReader;
import javax.xml.parsers.ParserConfigurationException; import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser; import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory; import javax.xml.parsers.SAXParserFactory;
import java.io.CharArrayWriter; import java.io.CharArrayWriter;
skipping to change at line 47 skipping to change at line 48
import java.nio.charset.Charset; import java.nio.charset.Charset;
/** /**
* Text extractor for XML documents. This class extracts the text content * Text extractor for XML documents. This class extracts the text content
* and attribute values from XML documents. * and attribute values from XML documents.
* <p> * <p>
* This class can handle any XML-based format (<code>application/xml+something</code>), not just the base XML content * This class can handle any XML-based format (<code>application/xml+something</code>), not just the base XML content
* types reported by {@link #getContentTypes()}. However, it often makes sense to use more specialized extractors that * types reported by {@link #getContentTypes()}. However, it often makes sense to use more specialized extractors that
* better understand the specific content type. * better understand the specific content type.
*/ */
@PluginImplementation
public class XMLTextExtractor extends AbstractTextExtractor { public class XMLTextExtractor extends AbstractTextExtractor {
/** /**
* Logger instance. * Logger instance.
*/ */
private static final Logger logger = LoggerFactory.getLogger(XMLTextExtractor.class); private static final Logger logger = LoggerFactory.getLogger(XMLTextExtractor.class);
/** /**
* Creates a new <code>XMLTextExtractor</code> instance. * Creates a new <code>XMLTextExtractor</code> instance.
*/ */
skipping to change at line 84 skipping to change at line 86
*/ */
public String extractText(InputStream stream, String type, String encoding) throws IOException { public String extractText(InputStream stream, String type, String encoding) throws IOException {
try { try {
CharArrayWriter writer = new CharArrayWriter(); CharArrayWriter writer = new CharArrayWriter();
ExtractorHandler handler = new ExtractorHandler(writer); ExtractorHandler handler = new ExtractorHandler(writer);
// TODO: Use a pull parser to avoid the memory overhead // TODO: Use a pull parser to avoid the memory overhead
SAXParserFactory factory = SAXParserFactory.newInstance(); SAXParserFactory factory = SAXParserFactory.newInstance();
SAXParser parser = factory.newSAXParser(); SAXParser parser = factory.newSAXParser();
XMLReader reader = parser.getXMLReader(); XMLReader reader = parser.getXMLReader();
reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
reader.setFeature("http://xml.org/sax/features/validation", false);
reader.setContentHandler(handler); reader.setContentHandler(handler);
reader.setErrorHandler(handler); reader.setErrorHandler(handler);
// It is unspecified whether the XML parser closes the stream when // It is unspecified whether the XML parser closes the stream when
// done parsing. To ensure that the stream gets closed just once, // done parsing. To ensure that the stream gets closed just once,
// we prevent the parser from closing it by catching the close() // we prevent the parser from closing it by catching the close()
// call and explicitly close the stream in a finally block. // call and explicitly close the stream in a finally block.
InputSource source = new InputSource(new FilterInputStream(stream) { InputSource source = new InputSource(new FilterInputStream(stream) {
public void close() { public void close() {
} }
 End of changes. 3 change blocks. 
0 lines changed or deleted 10 lines changed or added
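
The four setFeature() calls added on the 6.3.11 side switch off DTD loading, external entity resolution, and validation before parsing. The following is a minimal standalone sketch of that configuration, not the OpenKM code itself: the class name HardenedSaxSketch, the extract() helper, and the sample input are hypothetical and exist only for illustration. It assumes the parser bundled with the JDK, which recognizes the Apache-specific load-external-dtd feature; other SAX implementations may reject that feature with a SAXNotRecognizedException.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration class; it is not part of OpenKM.
class HardenedSaxSketch {

    // Collects the character data of an XML string using a reader
    // configured with the same four features as the 6.3.11 change.
    static String extract(String xml) {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XMLReader reader = factory.newSAXParser().getXMLReader();
            // Do not fetch an external DTD (Apache/Xerces-specific feature).
            reader.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            // Do not resolve external parameter or general entities.
            reader.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
            reader.setFeature("http://xml.org/sax/features/external-general-entities", false);
            // Text extraction does not need validation.
            reader.setFeature("http://xml.org/sax/features/validation", false);
            reader.setContentHandler(handler);
            reader.setErrorHandler(handler);
            reader.parse(new InputSource(new StringReader(xml)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("XML parsing failed", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        // The entity points at a made-up path; with external general
        // entities disabled the parser should not try to open it.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE a [<!ENTITY xxe SYSTEM \"file:///nonexistent/secret.txt\">]>"
                + "<a>hello&xxe;</a>";
        System.out.println(extract(xml));
    }
}
```

With these features disabled, a document that declares an external entity can still be parsed, but the reference is skipped instead of being fetched, so extracted text cannot pull in local files or remote resources named by the document.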
