On-the-fly validation of XML markup languages using off-the-shelf tools

Pekka Kilpeläinen

Why use Java? It has built-in XML and GUI support (JAXP and Swing), and its general usage and portability make it easy to deliver.

They created their own simple text editor, Xeditor, which validates constantly according to its settings (well-formedness only, or schema/DTD validity). He is clearly not talking about a finished product: it is missing the basic features of XML editing tools. I wonder why this would be better than oXygen, other than that it is likely free, and there are plenty of free XML editors available already.
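To make the "well-formed only" setting concrete for myself, here is a minimal sketch of what such a check could look like with plain JAXP. This is my own illustration, not code from the talk: the class name `WellFormedCheck` and the `check` method are made up, and I am assuming the editor simply hands the whole buffer to a SAX parser each time.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper, not from the talk: checks a text buffer for well-formedness.
class WellFormedCheck {

    /** Returns null if the text is well-formed XML, otherwise a short error message. */
    static String check(String text) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            // DefaultHandler ignores all content events and rethrows fatal
            // (well-formedness) errors, which is all this check needs.
            parser.parse(new InputSource(new StringReader(text)), new DefaultHandler());
            return null;
        } catch (SAXParseException e) {
            // The parser reports where the first well-formedness error occurred.
            return "line " + e.getLineNumber() + ", column " + e.getColumnNumber()
                    + ": " + e.getMessage();
        } catch (Exception e) {
            // Parser configuration or I/O problems.
            return e.toString();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("<a><b/></a>")); // well-formed
        System.out.println(check("<a><b></a>"));  // mismatched tags
    }
}
```

An editor could call something like `check` with the current buffer contents after each change and show the message, if any, in a status line. Note that this sketch builds a new parser on every call, which is the simple approach rather than the fast one.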

But that doesn’t seem to be the point of the talk. I now see that he is building this editor to learn about validation and perhaps to find new techniques that will advance the field. That’s why he looks at Clark’s nXML but dismisses it: it is 17k lines of Emacs Lisp versus his 1k lines of Java, so he can better understand what is going on inside his own product. Again, this makes sense, like the gentleman yesterday who wrote the XHTML validator. Writing from the ground up teaches you a lot about the underlying technologies, more than those of us who just read the books and the listservs will learn.

One of the problems I see is that he has a firewall between the editor and the XML handling. This makes things easier for research, but it rules out the markup completion and syntax highlighting that are demanded of XML editors these days. He accepts that. Since his goal is not a production-level editor but a tool for examining techniques and concepts for validating XML, those features are seemingly not very important.

He makes the point that MSV, Xerces, and Jing all validate xmlschema.xsd in times that do not differ noticeably. The difference is less than 0.1 seconds, although I would say that difference might be significant when run against large sets of large files (physical science journal volumes, perhaps?). In that case a good takeaway is that Jing performed much better overall: not noticeably so on one file, but possibly visibly so on the hypothetical large set I mentioned. I’ll have to keep that in mind.

Question: Why did you not use the DOM instead of re-parsing all the time? Answer: He admits taking the easy road. No problem there because they weren’t looking for the most efficient solution for editing but rather to examine the techniques of validation.

Question: If the validator is reporting errors on every keystroke, isn’t that inefficient? And is it waiting for me to type, or am I waiting for the validator to complete all the time (lag on the keystrokes)? The answer appears to be that the validator is always working, but validation is fast enough that any lag is not noticeable.

Michael Kay states that the validation times are completely reasonable if you take out the instantiation and initialization of the Java parser. (My comment: That’s why using the Saxon API is a much better idea than running the validation from the command line, or from a specific system call within a script. The speeds against large sets are astronomically different.)
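The general idea of paying the setup cost once and then reusing it can be sketched with the standard JAXP validation API. To be clear, this is not the Saxon API and not anything shown in the session. The class name `ReusedSchema` and the tiny inline schema are my own assumptions for illustration: the schema is compiled once in the constructor, and every later call only creates a lightweight validator.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import org.xml.sax.SAXException;

// Hypothetical helper: compile the schema once, validate many documents against it.
class ReusedSchema {

    // Toy schema for illustration: a single <note> element with string content.
    static final String XSD =
          "<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>"
        + "<xs:element name='note' type='xs:string'/>"
        + "</xs:schema>";

    private final Schema schema;

    ReusedSchema(String xsd) {
        try {
            // The expensive part: factory lookup and schema compilation, done once.
            SchemaFactory factory =
                    SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
            this.schema = factory.newSchema(new StreamSource(new StringReader(xsd)));
        } catch (SAXException e) {
            throw new IllegalArgumentException("schema could not be compiled", e);
        }
    }

    /** Validates one document against the already compiled schema. */
    boolean isValid(String xml) {
        try {
            // With no error handler set, validation errors are thrown as SAXException.
            schema.newValidator().validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        ReusedSchema checker = new ReusedSchema(XSD);
        System.out.println(checker.isValid("<note>hello</note>"));
        System.out.println(checker.isValid("<memo>hello</memo>"));
    }
}
```

Run over a large set of files inside one JVM, a loop calling `isValid` pays for startup and schema compilation only once, whereas a script that launches a validator per file pays for them every time.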
