Bob Balaban's Blog

     

     

    What if....XHTML?

    Bob Balaban  July 21 2007 05:37:38 PM
    Greetings, Geeks!!

    Today, Wizards and Witches, we are going to wave our magic wands and pretend that the Domino Web Server (among other improvements IBM might, or might not, make to the server, now or someday, or maybe never -- you know that the lawyers make me say that....) emits XHTML as a format (yes, yes, of course we'd keep the way it does it right now as an option!)

    There are many aspects of this possibility that interest me. But for today I am going to limit myself to a single question upon which I would like you-all to render opinions:

          In this hypothetical situation, the XHTML the server would render will be "well-formed", of course (i.e., it will be valid XML, with matching closing tags, etc etc)
          Is it a requirement, do you think, that the XHTML ALSO be "valid" (i.e., it conforms to a pre-defined schema, either DTD or XSD)?

    If you think so, please explain why, and what extra functionality do you get from that. I'd also be interested in hearing which schema (I assume there are more than one) you would advocate.
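
To make the well-formed versus valid distinction concrete, here is a minimal Python sketch (not anything Domino does). It uses only the standard library, which can check well-formedness but has no validating parser, so checking against a DTD or XSD would need an additional tool. The sample markup strings and the helper name `is_well_formed` are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML (well-formed), else False."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, for illustration only.
XHTML_NS = 'xmlns="http://www.w3.org/1999/xhtml"'

# Every tag is closed, so this is well-formed.
closed_tags = f"<html {XHTML_NS}><head><title>t</title></head><body><p>Hi<br/></p></body></html>"

# Classic HTML habit: <br> is never closed, so an XML parser rejects it.
unclosed_br = f"<html {XHTML_NS}><head><title>t</title></head><body><p>Hi<br></p></body></html>"

# Well-formed XML, yet not valid XHTML: there is no <head>, and <title>
# sits inside <body>. Only a validating parser with a DTD would catch this.
well_formed_not_valid = f"<html {XHTML_NS}><body><title>oops</title></body></html>"

for name, doc in [
    ("closed_tags", closed_tags),
    ("unclosed_br", unclosed_br),
    ("well_formed_not_valid", well_formed_not_valid),
]:
    print(name, is_well_formed(doc))
```

The third sample is the interesting case for the question above: it passes a well-formedness check but would fail validation against any of the XHTML DTDs.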

    Ah, standards! So many to choose from!

    OK, I thought I was going to limit myself to one question, but I just can't do it. So: is UTF-8 or UTF-16 as an output character set for our hypothetical XHTML output good enough? If not, what other character sets need to be supported, IYHO?

    (PS: Yes, I ordered 2 copies of Book 7, and they arrived bright and early today -- no WAY our house can live on only 1 copy. I'm STILL not going to get near one of them for a week or so)
    Comments

    1Andrew Price  7/21/2007 10:55:55 PM  What if....XHTML?

    Sounds like a good idea in general terms, though it doesn't seem likely to be very important since people could generate XML output anyway. :)

    2Matt White  7/22/2007 3:30:22 AM  What if....XHTML?

    XHTML output would be incredibly useful, I am not worried about a particular schema, as long as the output is consistent and predictable.

    In regards to the character set, I would prefer that we be able to configure it on a per design element (or even per database) basis. Normally UTF-8 would be fine, but I have worked on applications for European clients where I have had to change the character set for my own generated content to use ISO-8859-1 to support the larger character sets.

    Hope this helps.

    Matt

    3Michael Bourak  7/22/2007 3:43:18 AM  What if....XHTML?

    I can think about 2 things XHTML "could" bring :

    - W3C accessibility : but there, XHTML is not enough...but a great help

    - I met with some companies that would have liked Domino to generate XHTML so they could more easily reuse the content in other applications (for example, to generate a printable version)

    Those 2 things are rapidly growing in importance as, at least in Europe and France, government and related web sites are being asked to make all their sites "accessible" soon. And sure, we want Domino to be a credible web app platform in those areas

    PS : of course, besides XHTML generation, the important thing to me is the ability to control the XHTML generation as finely as possible...but you already knew that ;)

    Michael.

    4Bob Balaban  7/22/2007 3:55:56 AM  What if....XHTML?

    @1 - I'm a bit surprised at your comment. I agree some people will want XML, but plenty will still want the server to do a reasonable job with HTML, I suspect.

    @2 - It's my impression (I could be wrong) that with XML you get to specify the character set only in the <?xml> tag, thus only once per "page" of output. I don't see how per-element charsets could work.
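
As a rough illustration of the "once per page" point, the sketch below serializes the same tiny element twice with Python's standard `xml.etree.ElementTree`. The encoding is named a single time, in the XML declaration at the top, and every byte after it follows that encoding. The element, its text, and the helper name `serialize` are invented for the example; `xml_declaration=True` assumes Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def serialize(root: ET.Element, encoding: str) -> bytes:
    """Serialize with an explicit XML declaration naming the encoding."""
    # xml_declaration is available on ET.tostring from Python 3.8 onward.
    return ET.tostring(root, encoding=encoding, xml_declaration=True)


# Hypothetical one-element "page" containing a non-ASCII character.
page = ET.Element("p")
page.text = "caf\u00e9"

latin1_bytes = serialize(page, "ISO-8859-1")
utf8_bytes = serialize(page, "UTF-8")

# The declaration is the only place the encoding is stated.
print(latin1_bytes.split(b"\n", 1)[0])
print(utf8_bytes.split(b"\n", 1)[0])

# Same character, different bytes depending on the declared encoding.
print(b"\xe9" in latin1_bytes)      # single byte in ISO-8859-1
print(b"\xc3\xa9" in utf8_bytes)    # two bytes in UTF-8

# A parser reads the declaration and decodes the whole document with it.
print(ET.fromstring(latin1_bytes).text == ET.fromstring(utf8_bytes).text)
```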

    @3 - Printable!!! You want printable too?? :-)

    Accessibility is an important topic, thanks for mentioning.

    5Nathan T. Freeman  7/22/2007 6:50:08 AM  What if....XHTML?

    Meh. Maybe I'm missing something. Doesn't seem like a ring worth chasing to me.

    I know there's an experimental &OutputFormat=XHTML10 parameter right now, I believe for Notes documents, right? Does it see much use? There are 0 hits on Google for it inurl, and the only places it's generally covered are in an old interview with Jeff Calow and your own statement about ditching FONT tags.

    I know of a handful of Domino developers who have put a lot of energy into generating that kind of consistency, but there certainly hasn't been a lot of noise about it.

    WAY less energy has been put into that than into AJAX efforts, for sure.

    6Matt White  7/22/2007 8:24:11 AM  What if....XHTML?

    @4 - I meant that in the same way we can override the content type of a form or page at the moment, that we should also be able to define the character set being used by the form if it is generating XHTML.

    Matt

    7Kerr  7/22/2007 8:51:35 AM  What if....XHTML?

    @0, (original post). If you are saying that Domino will spit out XHTML, then you are saying that it will conform to an XHTML DTD, aren't you? Otherwise it's just some XML that looks a bit like it. Of course there are three DTDs for XHTML: strict, transitional and frameset. You will need to output frameset and transitional at least. It would also be best to allow the developer to override that. So if Domino thinks its output is conformant to the strict DTD, but the developer knows he has put something else in there that requires transitional, then he should be able to override that.

    @1, well you can output XHTML yourself, but I could also do all my web apps using nothing but agents for output to the user. Not a very nice way to do it.

    @2, Matt, UTF-8 can encode any character in Unicode, so unless you need something like Egyptian Hieroglyphics (proposed but not yet in Unicode) then you should be fine using UTF-8. Certainly every char in ISO-8859-1 can be encoded in UTF-8.

    UTF-16 can do the same thing, using a slightly different encoding scheme. The basic difference is that UTF-8 uses a single byte for chars in the ASCII range, 2 bytes for the next chunk (European, Russian, Greek, etc.) and 3 bytes for the top chunk (Chinese, Japanese etc.). UTF-16 uses 2 bytes for everything. (Actually there is really mad stuff higher up that requires more bytes for both UTF-8 and 16.) So a simple rule of thumb: Europe / west, use UTF-8; east, use UTF-16.
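
The byte counts described above are easy to check. The following Python sketch encodes one sample character from each range and counts the bytes; the sample characters and the helper name `byte_lengths` are arbitrary choices for illustration. It uses `utf-16-le` so that the byte-order mark is not counted.

```python
def byte_lengths(ch: str) -> tuple[int, int]:
    """Return (UTF-8 byte count, UTF-16 byte count) for a string."""
    # utf-16-le avoids the 2-byte byte-order mark that plain "utf-16" prepends.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


# One arbitrary sample per range mentioned in the comment.
samples = {
    "ASCII letter A": "A",
    "Latin e-acute": "\u00e9",
    "Cyrillic Zhe": "\u0416",
    "CJK ideograph": "\u4e2d",
    "Beyond the 16-bit range": "\U0001F600",
}

for label, ch in samples.items():
    utf8_len, utf16_len = byte_lengths(ch)
    print(f"{label:26} UTF-8: {utf8_len} bytes   UTF-16: {utf16_len} bytes")

# Every ISO-8859-1 character can be re-encoded as UTF-8 without loss.
latin1_round_trips = all(
    bytes([b]).decode("latin-1").encode("utf-8").decode("utf-8")
    == bytes([b]).decode("latin-1")
    for b in range(256)
)
print("All ISO-8859-1 characters survive UTF-8:", latin1_round_trips)
```

The last sample is the "really mad stuff higher up": characters outside the 16-bit range take 4 bytes in both encodings.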

    It would definitely be great to be able to specify use of XHTML on a per database level, to save having to argue with admins over server settings ;)

    8Rob McDonagh  7/22/2007 10:35:46 AM  What if....XHTML?

    I'm not much of a purist, more of a pragmatist. To me, the issue is around playing nice with other systems. This includes the various JavaScript frameworks (no, not going there, I promise Bob!). Most of them require their input to be well-formed (and sometimes valid) XML and/or XHTML. I once wasted a couple of hours trying to cram a Domino view into a RICO grid, only to realize it was impossible (in the versions at that time - it's been years) because Domino's output didn't meet RICO's requirements, and I couldn't customize the view output enough without hand-writing an agent. So to me the big issue here is that Domino should consistently generate well-formed and valid output that other systems can consume. I would expect the interest in composite apps and portal-type integration via SOA to make this a pretty obvious thing for IBM to do, though.

    9Dan Sickles  7/22/2007 11:59:35 PM  What if....XHTML?

    I'd like access to the XHTML output at the Doc/Collection level in Lotusscript/Java too:

    Doc.toXHTML(<encoding>) //default to "UTF-8"

    Same for collections, views, etc. This would give us browser-renderable output that plays nice with XML APIs and various DBs/CMSs and languages that like XML.

    The ability to override the toXHTML method or pass it a function would ice the cake.

    Same for toDXL(), toRSS/Atom(), toJSON() etc. Just give us reasonable default behavior and let us have at it.

    10Giuseppe Grasso  7/23/2007 1:26:49 AM  What if....XHTML?

    Valid, strict XHTML is an accessibility requirement by law here in Italy for government sites (and, in general, for government applications delivered in a browser), so an option to have Domino generate valid strict XHTML from Notes markup could help a lot.

    11Samuel deHuszar Allen  7/23/2007 6:44:14 PM  What if....XHTML?

    xHTML is paramount. Since XML is the data and xHTML is the presentation of the data, it makes sense to be as strict as the current DOCTYPES recommend. CSS and JavaScript WILL fall apart (as will XML) if things aren't just so. If the Domino server can't keep up, people won't use it.

    I think everyone's rabid frustration with CSS as it is currently implemented in Notes/Domino is as much a reason to do the next HTTP engine draft by the letter as any.

    I'm still in favor of (and will keep asking for until someone tells me why it can't be done) pulling in the Apache engine and writing a LotusScript parser module (and any other necessary bindings) instead of pushing the old HTTP engine up a sheer cliff face, but what do I know. ;)

    12Samuel deHuszar Allen  7/23/2007 6:48:25 PM  What if....XHTML?

    @5 You're kidding! How many flaming threads have appeared on poor Ed's doorstep every time a new feature gets announced that isn't xHTML/CSS/updated Designer client related? People shout from the rooftops. There just hasn't been much talk about it because outside of Ferdy Christant's post on manually writing xhtml into the form, there isn't another solution.

    As for the experimental parameter, I don't think most people know it exists. I know I've never heard of it.

    13Nathan T. Freeman  7/23/2007 7:45:40 PM  What if....XHTML?

    @12 - I'm not kidding at all. How does valid, strict XHTML solve people's rabid frustration with CSS? It wouldn't provide you better facilities to manage CSS classes, or apply them more consistently to certain objects.

    How would XHTML make it easier to, say, do alternating row colors on a view? I just don't see it.

    14Tim Tripcony  7/23/2007 9:07:54 PM  What if....XHTML?

    I'm late to the party again... hope there's still some punch left.

    I keep hearing a rumor that eventually all major browsers will drop the "transitional" and only display valid XHTML. I'll believe it when I see it; the sheer volume of existing content out there that would fail miserably in that event would seem to make that an impractical decision in the foreseeable future. If you do believe the rumor and see it on the near-to-mid-term horizon, then you'd have to move forward with ensuring that Domino is ready when it happens. But personally, I just want to see Domino move away from using by default tags that have long since been deprecated.

    15Erik Brooks  7/24/2007 11:00:33 AM  What if....XHTML?

    I'm with @5 and @14 - XHTML would have little short- or mid-term benefit.

    Eventually, as Trip states, the major browsers will drop the "transitional" doctype and only work with valid XHTML, but I too think that is a LONG way off. If NMFR is to ship in ~2 years, then XHTML definitely can take a back seat to the entire CSS / Designer updating that is badly needed. You can then re-evaluate the need for XHTML at that point.

    Of course, if any planned "Cujo" integration ;) requires XHTML then you might need to squeeze bits in here and there.

    As for the encoding bit - UTF-8 / UTF-16 is great. You're supporting Unicode, and therefore all "modern" languages (except for, if I recall, one regional dialect in Korea that affects < 100,000 people). It doesn't cover "ancient" languages (such as the Egyptian hieroglyphics mentioned earlier), but for the vast bulk of modern dev work it's fine.

    16Samuel deHuszar Allen  7/24/2007 2:09:02 PM  What if....XHTML?

    @13 A strongly typed document will translate best across different browsers on different platforms. Valid 4.01 transitional and even strict pages often do not render properly on a Palm or Blackberry device, whereas using xhtml DOCTYPES improves the rendering significantly. This is with CSS disabled on the device, so it's not people's CSS coding adapting to small screens.

    Such differences aren't as apparent using standard desktop browsers as much of the browser code is geared towards trying to guess what the document means, but being strict, or allowing the option to be strict and conforming to the xhtml spec makes the data more accessible by more browsers and more easily digested and reprocessed by Web Services and scripting languages.

    More importantly, I think Domino needs to conform its engine and design environment to encourage the use and generation of web standards coding in general.

    Still haven't heard back on why bundling Apache and just creating a LotusScript module is a no go.

    Anyone?

    17Samuel deHuszar Allen  7/24/2007 2:32:46 PM  What if....XHTML?

    ... I almost forgot, using xHTML as a DOCTYPE offers support for XSL transformation and better content definition.

    I would rather IBM make strides towards adopting the upcoming standards than just get current enough in time to be behind again in a year.

    If IBM isn't already a member of the W3C committee, I doubt anyone would reject their application. Their roadmaps are well defined and participation is always welcome so long as no one tries to railroad the process to their own exclusive advantage.

    If IBM generated xHTML markup and supported at least a few relevant pieces of the CSS3-draft spec (like every modern browser) then the amount of design flexibility and extensibility Notes will have as a Web Platform will increase exponentially. It might even generate some renewed interest from those who were either scared away or gave up long ago.

    ...and isn't that the whole idea underlying all these conversations?

    18Mikkel Heisterberg  7/25/2007 3:07:12 AM  What if....XHTML?

    As already mentioned, there are DTDs and schemas for XHTML as it is now, but for me the main benefit of Domino always emitting XHTML would be that it is "real" XML and hence is parsable with a run-of-the-mill XML parser.
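
A small sketch of what "parsable with a run-of-the-mill XML parser" buys you, using Python's standard `xml.etree.ElementTree`. The page content and link targets below are invented; the only fixed piece is the XHTML namespace URI. Note that named HTML entities such as `&nbsp;` would need the DTD to resolve, so this sample avoids them.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Hypothetical XHTML page, standing in for server output.
page = """<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Sample view</title></head>
  <body>
    <ul>
      <li><a href="/docs/1">First</a></li>
      <li><a href="/docs/2">Second</a></li>
    </ul>
  </body>
</html>"""

# No HTML-specific tolerance is needed: a plain XML parser reads it.
root = ET.fromstring(page)

# XHTML elements live in a namespace, so queries must name it.
ns = {"x": XHTML_NS}
title = root.findtext("x:head/x:title", namespaces=ns)
links = [(a.get("href"), a.text) for a in root.iterfind(".//x:a", ns)]

print(title)
print(links)
```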

    As for character encoding I think ISO-8859-1 would be nice for us over in Europe. I know that's what most of my customers are using internally and what I always use for XML encoding.

    19Andrew Tjecklowsky  7/25/2007 4:35:14 AM  What if....XHTML?

    We can already generate XHTML compliant content with formatting and styling if we use one of the top WYSIWYG editors out there [using web applications instead of Notes client], so in my opinion it would be better [as you are already talking about making Notes more web] to make sure that view content is generated in nested XML/JSON formats, that readers/authors fields are honored accordingly, and that you get rid of the annoying 16/32/64 k limits everywhere!

    @18

    I can understand why a lot of Europeans use the ISO-8859-1 encoding, as a lot of parsers and generators have been written without XML APIs (which do all the encoding). But... if you created UTF-8 content or consumed UTF-8 content then you would have no problems, as this encoding is the default and should be supported by all XML parsers and generators. So why - in your opinion - are people using ISO-8859-1 ? ;-)

    20Kerr  7/25/2007 1:49:37 PM  What if....XHTML?

    The only reason to use a character encoding other than UTF-8 or UTF-16 (in any practical application) is that you want a single byte charset and the chars you want to encode are outside the single byte UTF-8 range. But I'd have to ask, is space really that tight?

    Obviously you want your tools and servers to handle almost any charset you can throw at them, for compatibility reasons, but if you've got the choice why use anything other than UTF-8/16?

    21Erik Brooks  7/25/2007 10:11:22 PM  What if....XHTML?

    @20 - I think the only reason to work with any other charset would simply be for linking up with a legacy application.

    I would think UTF-8/16 is the way to go in every other case, especially in the case where you have cross-charset characters (e.g. ASCII, Chinese and Cyrillic) on the screen at the same time. Unicode's your only option at that point.

    22  7/30/2007 8:47:41 AM  What if....XHTML?

    UTF-8 is most widely used in specialized content management software afaik.

    question to @all: What character sets are not supported by UTF-8?

    If it's just some Inuuk characters or tribal characters from southern Papua New Guinea, we should forget about UTF-16 and those folks should learn some second language. Sorry.

    The projects I work on have mostly used UTF-8 and not ISO-8859-1, and I am German. The SAP Application server uses UTF-8 by default in its webservices. Let's not forget that economically eastern Europe is more and more integrated with western Europe, and their special characters are not part of ISO-8859-1. So I am kind of anti-ISO-8859-1.

    Schema would be a good thing. It may be possible to use an XML binding framework like JAXB to read and write the output more easily. Schema is also always good for finding bugs more easily.

    It's a great idea to serve xhtml, as this fits neatly into the idea of a REST architecture webservice. In my opinion this has real potential to make integration of Domino simpler for a lot of scenarios.

    regards Axel

    23Axel Janssen  7/30/2007 8:48:24 AM  What if....XHTML?

    22) was my posting (forgot name).

    24Kerr  7/30/2007 10:59:58 AM  What if....XHTML?

    @22. UTF-8 is an encoding of Unicode. UTF-16 is also an encoding of Unicode. Anything you can encode in UTF-16 you can also encode in UTF-8. Unicode just lists a series of glyphs (characters, symbols etc.) and gives them a number (called a code point). Up until recently there were fewer than 2^16 code points. This meant you could encode every code point by just writing it as 2 bytes. This is basically UTF-16. But that means that you need 2 bytes for every character, doubling the size of an ASCII document. So they came up with UTF-8, which encodes the first chunk of code points as a single byte and higher parts as 2 bytes. This also means that the highest range of code points must take up 3 bytes. UTF-8 also has the advantage that any document encoded in 7-bit ASCII is exactly the same as one encoded in UTF-8.
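
Two of the claims above can be checked in a few lines of Python: that an ASCII document is byte-for-byte identical in UTF-8, and that UTF-8 and UTF-16 carry exactly the same text. The sample strings and helper names are invented for illustration. The last character, U+13000, is taken from the Egyptian Hieroglyphs block that Unicode added after this thread was written.

```python
def same_bytes_as_ascii(text: str) -> bool:
    """True if the ASCII and UTF-8 encodings of the text are identical."""
    return text.encode("ascii") == text.encode("utf-8")


def round_trips(text: str) -> bool:
    """True if the text survives both UTF-8 and UTF-16 unchanged."""
    via_utf8 = text.encode("utf-8").decode("utf-8")
    via_utf16 = text.encode("utf-16").decode("utf-16")
    return via_utf8 == via_utf16 == text


# Hypothetical sample strings.
ascii_doc = "<p>Hello, Domino!</p>"
mixed = "na\u00efve \u0416 \u4e2d\u6587 \U00013000"

print(same_bytes_as_ascii(ascii_doc))
print(round_trips(mixed))

# Size trade-off for the plain ASCII document (utf-16-le: no byte-order mark).
print(len(ascii_doc.encode("utf-8")), len(ascii_doc.encode("utf-16-le")))
```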

    As for what character sets are not supported by UTF-8, really the question should be what character sets are not supported by Unicode. The answer to that is very few, mainly of historical interest, like Egyptian Hieroglyphics, and these are being added as Unicode is expanded.

    25Bo Frederiksen  7/30/2007 2:09:16 PM  What if....XHTML?

    Please give us the option to make valid XHTML 1.0 Strict. It must be time to abandon deprecated mark-up. You should also consider serving it as application/xhtml+xml, but only to user agents that request it in the HTTP Accept header. (Not for IE7 and below.)
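
The negotiation suggested above could look roughly like the Python sketch below. It is a simplified reading of the Accept header, not a full implementation of HTTP content negotiation (it ignores wildcards and relative preference), and the function name and the sample header strings are hypothetical.

```python
XHTML_TYPE = "application/xhtml+xml"
HTML_TYPE = "text/html"


def choose_content_type(accept_header: str) -> str:
    """Pick application/xhtml+xml only if the client explicitly lists it
    with a non-zero quality value; otherwise fall back to text/html."""
    for part in accept_header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if fields[0].lower() != XHTML_TYPE:
            continue
        quality = 1.0  # default when no q parameter is given
        for param in fields[1:]:
            if param.lower().startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0  # treat a malformed q value as "not accepted"
        if quality > 0:
            return XHTML_TYPE
    return HTML_TYPE


# Hypothetical Accept headers for illustration.
lists_xhtml = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
omits_xhtml = "image/gif, image/jpeg, */*"
refuses_xhtml = "text/html, application/xhtml+xml;q=0"

for header in (lists_xhtml, omits_xhtml, refuses_xhtml):
    print(choose_content_type(header))
```

Deliberately, a bare `*/*` does not count as a request for XHTML here, so a client that never names the type keeps getting text/html.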

    At some point we will be able to mix in MathML and/or SVG or some other namespace.

    If I have a web design that looks or works differently in different browsers, the first thing I do is make sure the markup is valid!

    When it is, I go for CSS hacks or browser specific CSS.

    If that fails, some Javascript may be used to fix differences.

    Try validating this page. It has & (ampersands) that are not entity encoded and tags that are not closed with > or />.

    The W3 validator is fine for testing, but I prefer a Firefox extension:

    { Link }

    It automatically validates all pages you visit and also works with intranets and internal applications.

    26Ben Poole  8/3/2007 12:19:07 PM  What if....XHTML?

    XHTML output makes sense for a variety of reasons, all espoused above. It's true that adopting XHTML makes no difference when it comes to CSS, but it DOES make a difference when it comes to styling with XSL or, of utmost importance, to accessibility-related laws in different territories.

    I suspect that there's been little noise on this because (a) Domino developers recognise that there are only a couple of viable techniques to work around the server's default HTML output and;

    (b) that the output querystring parameter Nathan mentions is simply an unknown entity to many Domino developers

    27Michael Gollmick  8/3/2007 5:27:38 PM  What if....XHTML?

    XHTML output is one of the things we have asked for for years. Well, getting valid output is one of the other things. I personally would so much appreciate having an option for valid XHTML from Domino's core instead of using various techniques and tricks to convert the sometimes invalid HTML 4.01 to XHTML.

    So in short: having XHTML as an option would really be cool, and being able to define the type of output on a per database or per "page" level would be even better. Being able to get the output in both the transitional and the strict DTD would be better still. Considering the big issue of accessibility, I believe strict XHTML output would be the solution of choice.

    @5) the mentioned parameter easily leads to invalid XHTML in combination with RichText and it is impossible to turn HTTP in the same mode without using the parameter on every single URL - at least none I know of.

    28John Foldager  8/8/2007 5:06:25 AM  What if....XHTML?

    @25 I agree somehow with you Bo. However... it IS possible to generate Strict XHTML 1.0 and also to set the content type as XML etc. It is NOT possible to get clean output from native content. You'll have to use a web-based editor or do your own agent output. But you know that! Just wanted to clarify.

    That said... let us be able to:

    - set the Content-Type ourselves with @SetHTTPHeader("Content-type"; "our-own-content-type")!

    - replace substrings in large (more than the 16/32/64 K limits) strings/richtext-content with @formula!

    I believe that if the above was fixed then we could do serious web applications!

    29Kerr  8/9/2007 6:05:14 AM  What if....XHTML?

    @28, you can set the content type by using: form properties dialog/advanced tab(2)/On Web Access/Content Type : Other. Whatever you put in the field that appears is what gets sent to the browser as the Content-Type header. Note that if you want to specify the charset in the Content-Type header and you are using the other field, you need to type it in yourself AND set the appropriate entry in the Character Set dropdown.

    As for your second point, native handling of larger plain text fields would make lots of web dev simpler, but we've been down that one recently. One thing that I'd love to see is an option for a rich text field to hold xml natively, with built-in parsing etc.

    30bruce lill  8/15/2007 6:41:43 PM  What if....XHTML?

    I would love it IF the domino generated components were xhtml.

    Currently I use pages and forms that are set to html so I can make sure it can be validated xhtml. I use forms with pass thru only when components like file uploading are needed (they never validate).

    Right now my life would be much easier if Domino would present forms that are set as html instead of generating the "can't edit html forms" error. Having to add ?readform to URLs is ugly and unnecessary.

    Another thing would be to have the field id default to the field name just like a column name does.

    31Benoit de TARADE  9/10/2007 10:11:39 AM  What if....XHTML?

    XHTML?!?

    doo!

    It could be useful in a way. However, well-formatted data needs a well-formatting browser ;-)

    When we see how free browsers feel with the W3C... (mainly "the one I don't need to name")

    32Bill McCuistion  11/15/2007 7:23:28 AM  What if....XHTML?

    Just give me a TIDY option. I'll take its word, and if I have a tidy config option, then I can make it bend to my will.

    This discussion should not have occurred in public, as it is obvious that IBM/Lotus should be standards-compliant, in all respects. Bookmarking this as a "WTF" item.