Stripping (HTML) tags in XSLT

http://blog.joachim-selke.de/2011/01/stripping-html-tags-in-xslt/

XSLT does not seem to offer a built-in function for stripping tags from strings (e.g., to remove all markup from a piece of HTML-formatted text), so people came up with a recursive template-based solution, which has been posted several times on the web (e.g., here). However, I found this approach hard to use when the string to be cleaned is already stored in a variable or is created by an xsl:value-of statement. Therefore, I transformed the existing template-based solution into a function-based one, which is a bit shorter and easier to use. Since it relies on xsl:function, it requires an XSLT 2.0 processor. Here it is:

<xsl:function name="util:strip-tags">
  <xsl:param name="text"/>
  <xsl:choose>
    <xsl:when test="contains($text, '&lt;')">
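      <!-- Keep the text before the first '<', skip past the '>' that
           closes this tag, and recurse on the remainder. -->
      <xsl:value-of select="concat(substring-before($text, '&lt;'),
        util:strip-tags(substring-after(substring-after($text, '&lt;'), '&gt;')))"/>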
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

 

Note: Don’t forget to declare a namespace for the function’s prefix (util in the above code) on the xsl:stylesheet element. XSLT requires stylesheet functions to have a prefixed name.
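
Since the function above needs XSLT 2.0 anyway, the XPath 2.0 replace() function may offer an even shorter alternative. The following is only a minimal sketch and not part of the original solution: the name util:strip-tags-regex and the sample string are made up for illustration, and the pattern assumes that no tag contains a '>' inside an attribute value.

```xml
<xsl:stylesheet version="2.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:util="http://whatever">

  <xsl:output method="text"/>

  <!-- Hypothetical variant: delete every run that starts with '<'
       and ends at the next '>' in a single regular-expression pass. -->
  <xsl:function name="util:strip-tags-regex">
    <xsl:param name="text"/>
    <xsl:sequence select="replace(string($text), '&lt;[^&gt;]*&gt;', '')"/>
  </xsl:function>

  <!-- Sample call on a literal string; the input document is ignored. -->
  <xsl:template match="/">
    <xsl:value-of
      select="util:strip-tags-regex('a &lt;b&gt;bold&lt;/b&gt; word')"/>
  </xsl:template>

</xsl:stylesheet>
```

Run against any well-formed input document, this should print: a bold word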

UPDATE: From the comments I see that an example might be helpful here. Well, here it is:

example.xsl:

<xsl:stylesheet version="2.0" 
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
  xmlns:util="http://whatever">

<xsl:output method="text"/>

<xsl:function name="util:strip-tags">
  <xsl:param name="text"/>
  <xsl:choose>
    <xsl:when test="contains($text, '&lt;')">
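      <!-- Keep the text before the first '<', skip past the '>' that
           closes this tag, and recurse on the remainder. -->
      <xsl:value-of select="concat(substring-before($text, '&lt;'),
        util:strip-tags(substring-after(substring-after($text, '&lt;'), '&gt;')))"/>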
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$text"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

<xsl:template match="/">
<xsl:value-of select="util:strip-tags(/content)"/>
</xsl:template>

</xsl:stylesheet>

 

input.xml:

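<?xml version="1.0" encoding="UTF-8"?>
<content><![CDATA[
test <some><nice><tags>xyz</tags></nice></some> test
]]></content>

Note that the markup is wrapped in a CDATA section, so that it reaches the function as literal text. If some, nice, and tags were real child elements, the string value of content would already be free of tags before the function is called.

Now I use the Saxon XSLT processor to strip the tags (inside the content element) from the input file. Note that you might need to change the path to the JAR file to make this example work for you: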
java -jar /usr/share/java/saxon.jar input.xml example.xsl

The output:


test xyz test
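
With a processor that only supports XSLT 1.0, xsl:function is not available. In that case, the template-based approach mentioned at the beginning can be used instead. Below is a rough sketch of how it might look. The template name strip-tags is chosen here for illustration and is not taken from any of the posted versions.

```xml
<xsl:stylesheet version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <!-- Recursive named template: same logic as the function,
       expressed with call-template so that XSLT 1.0 can run it. -->
  <xsl:template name="strip-tags">
    <xsl:param name="text"/>
    <xsl:choose>
      <xsl:when test="contains($text, '&lt;')">
        <!-- Emit the text before the first '<' ... -->
        <xsl:value-of select="substring-before($text, '&lt;')"/>
        <!-- ... then continue after the '>' that closes this tag. -->
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="text"
            select="substring-after(substring-after($text, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="$text"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <xsl:call-template name="strip-tags">
      <xsl:with-param name="text" select="string(/content)"/>
    </xsl:call-template>
  </xsl:template>

</xsl:stylesheet>
```

The call site is more verbose than a function call, which is the inconvenience that motivated the function-based version in the first place.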

 

This entry was posted in XML. Bookmark the permalink.

4 Responses to Stripping (HTML) tags in XSLT

  1. Raju says:

    Hi,

    Could you please provide the namespace information for the util which you have used above.

    Thanks,
    Raju

    • Here is an example of how to declare a namespace, define a function within it, and make some function calls: http://www.xml.com/pub/a/2003/09/03/trxml.html.

      Does this answer your question?

      • Raju says:

        Hi,

        I am getting the following error.

        “The following application error(s) occurred:
        Failed to render content because of an error java.lang.NoSuchMethodException: For extension function, could not find method org.apache.xml.utils.NodeVector.stripTags([ExpressionContext,] ). ”

        Thanks,
        Raju

        • I have updated my post. It now gives a complete example. Any other problem must be related to your specific XSLT processor. Please understand that I cannot help you with that.
