<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Deathy&#039;s blog &#187; XSLT</title>
	<atom:link href="http://blog.deathy.info/tag/xslt/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.deathy.info</link>
	<description>Ramblings of a geek</description>
	<lastBuildDate>Wed, 03 Mar 2010 18:02:40 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.1</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>XML, Speed and how NOT to ask questions</title>
		<link>http://blog.deathy.info/2010/02/12/xml-speed-and-how-not-to-ask-questions/#utm_source=rss&amp;utm_medium=rss&amp;utm_campaign=xml-speed-and-how-not-to-ask-questions</link>
		<comments>http://blog.deathy.info/2010/02/12/xml-speed-and-how-not-to-ask-questions/#comments</comments>
		<pubDate>Fri, 12 Feb 2010 19:34:41 +0000</pubDate>
		<dc:creator>Cristian Vat</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[development]]></category>
		<category><![CDATA[DOM]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[SAX]]></category>
		<category><![CDATA[speed]]></category>
		<category><![CDATA[StAX]]></category>
		<category><![CDATA[XML]]></category>
		<category><![CDATA[XPATH]]></category>
		<category><![CDATA[XSLT]]></category>

		<guid isPermaLink="false">http://blog.deathy.info/?p=55</guid>
		<description><![CDATA[This post is partially addressed to some coworkers/people who don&#8217;t know how to ask questions.
Some questions sounded like: &#8220;I&#8217;ve got this xml processing thing I need to do, what would be faster?&#8221; followed by a list of SAX, DOM, etc&#8230; but mostly SAX versus DOM.
Of course they are VERY vague about the processing [...]]]></description>
			<content:encoded><![CDATA[<p>This post is partially addressed to some coworkers/people who don&#8217;t know how to ask questions.</p>
<p>Some questions sounded like: &#8220;I&#8217;ve got this xml processing thing I need to do, what would be faster?&#8221; followed by a list of SAX, DOM, etc&#8230; but mostly SAX versus DOM.</p>
<p>Of course they are VERY vague about the processing they must do, so it&#8217;s similar to an <a href="http://www.perlmonks.org/index.pl?node_id=542341" target="_blank">XY Problem</a> and I just want to hit them with something hard.</p>
<p>Disclaimer: I don&#8217;t want to say anything bad about my coworkers, they are usually great, but it&#8217;s the small things that annoy me.</p>
<p>So how do we treat such a problem?</p>
<p><strong>First</strong>: Stop talking and start communicating. Explain <strong>what</strong> you are trying to do, avoiding any <strong>how</strong>s for now. You might be on the wrong path from the beginning.</p>
<p><strong>Second</strong>: Do you really have a speed problem? I once had a data processing step which took around 14 hours to complete. Since it worked just fine, always gave good results and I only had to run it every 6 months, I considered it <strong>fast enough</strong>. Sure, when I knew I was going to have to run it a couple of times in the same month I took a couple of hours to make it run twice as fast. But otherwise I wouldn&#8217;t even have thought about making it faster.</p>
<p>I usually try to automate/optimize something when I have had to do it manually/slowly at least twice and I know a third time is coming. Or when writing an automated tool and running it is actually quicker than doing the thing manually.</p>
<p><strong>Third</strong> (but likely first when thinking how to do something): There is no silver bullet when parsing XML.</p>
<p>Be a programmer and realize that everything is a compromise. It all depends on what you&#8217;re trying to achieve. XML is one of those cases where you really need to define how you want to look at an XML file based on what you want. So just a few examples:</p>
<ul>
<li>you want to count characters in an xml file: you would be rather stupid not to just open the file as text and start reading it character by character (assuming you have no entities and know the encoding; as I said, compromise).</li>
<li>you want to count elements/attributes in an xml file: all the information you need is provided by a SAX parser, so why use anything else? It&#8217;s also the fastest and safest method (now assuming you do have entities and weird encodings <img src='http://blog.deathy.info/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' />  )
<ul>
<li>If you want more control over the flow/speed of parsing, use StAX (Streaming API for XML). SAX is a push-parser (it calls your handlers), StAX is a pull-parser (you ask it for the next event).</li>
</ul>
</li>
<li>you want to get information from some elements selected by weird conditions, where the weird conditions depend on ancestors/descendants/attributes and something else: you will have to create a DOM, no other way. How you create the DOM and which kind of DOM is debatable. If you say SAX is faster, go on: try keeping track of all the context by yourself, reinventing the wheel and missing your deadline. If you&#8217;re really good with SAX you&#8217;ll end up with your own DOM implementation and kick yourself. Of course, not all DOM implementations are created equal. I kind of like Saxon&#8217;s TinyTree for speed and low memory consumption.
<ul>
<li>Side note: The DOM API sucks, has always sucked and everybody knows it. If you want to easily understand/modify your code later on, do yourself some good and use XPath.</li>
</ul>
</li>
<li>you want freaky conversions from one DTD to another, weird processing, special output: go the XSLT/XSLT2/XPath/XQuery way and laugh at stupid Java programmers struggling with their DOMs.</li>
<li>and no, I will not discuss JAXB because I hate being considered a Java-specific person, but if you are one you should look at it.</li>
</ul>
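<p>A rough sketch of the SAX, pull-parser and tree/XPath bullets above, in Python (picked only to avoid being Java-specific). Everything here is an assumption made for illustration: the sample document, the function names, and the use of Python&#8217;s standard library, where <code>iterparse</code> stands in for a StAX-style pull parser and ElementTree&#8217;s limited XPath subset stands in for a real XPath engine.</p>

```python
import io
import xml.sax
import xml.etree.ElementTree as ET

# Hypothetical sample document, made up for this sketch.
SAMPLE = b"""<library>
  <shelf id="a">
    <book lang="en"><title>One</title></book>
    <book lang="ro"><title>Doi</title><note>signed</note></book>
  </shelf>
  <shelf id="b">
    <book lang="en"><title>Three</title><note>rare</note></book>
  </shelf>
</library>"""


class CountingHandler(xml.sax.ContentHandler):
    """SAX (push) handler: the parser calls us, we only keep two counters."""

    def __init__(self):
        super().__init__()
        self.elements = 0
        self.attributes = 0

    def startElement(self, name, attrs):
        self.elements += 1
        self.attributes += len(attrs)


def count_with_sax(data):
    """Count elements and attributes without building any tree."""
    handler = CountingHandler()
    xml.sax.parseString(data, handler)
    return handler.elements, handler.attributes


def first_shelf_id(data):
    """Pull style: we drive the loop and stop as soon as we have what we need."""
    for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
        if elem.tag == "shelf":
            return elem.get("id")
    return None


def english_books_with_notes(data):
    """Tree style: load the whole document, then select by attribute and child."""
    root = ET.fromstring(data)
    # [@lang='en'] filters on an attribute, [note] on the presence of a child.
    books = root.findall(".//book[@lang='en'][note]")
    return [book.findtext("title") for book in books]


if __name__ == "__main__":
    print(count_with_sax(SAMPLE))
    print(first_shelf_id(SAMPLE))
    print(english_books_with_notes(SAMPLE))
```

<p>The counting handler needs no context at all, which is why SAX fits that job. The last function depends on an attribute and on a child element at the same time, which is exactly the kind of context you would otherwise have to track by hand in SAX callbacks.</p>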
<p>And just to be very impartial, if you&#8217;re getting your XML from something you also built, then really really think about whether XML is the right choice. Although I <em>specialize</em> in XML, I feel so good when I can convert somebody who was thinking of using XML to JSON <img src='http://blog.deathy.info/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>&lt;/rant&gt;</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.deathy.info/2010/02/12/xml-speed-and-how-not-to-ask-questions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>
