UseModWiki | WikiPatches | RecentChanges | Preferences

Hi there. I've been playing around with UseMod for a few days now, trying to implement 'safe tables', and this is pretty much what I came up with. Bear in mind that I had no experience with Perl at all before embarking on this 'adventure', so it's quite likely that this is highly buggy and probably not useful to anyone at all. Also, not every table attribute is implemented; 'nowrap', for example. I left it out mainly because people who want no wrapping can use a non-breaking space. It would be trivial, of course, to use this code as a template for stripping any HTML attribute.

Ahh, update! I fixed it so that it would work regardless of where the opening tags went. And yes, it was quite trivial. Just a while loop. I'm sure some of you saw it and knew the solution right away.

Quick edit: I'd copied an earlier incarnation with a few bugs (you couldn't split a tag with a lot of attributes into more than one line as per the spec- fixed).

Third edit; hmm, I was fooling around with it again, and had a few extra ideas. Nothing really, just made it so that it wouldn't render a table without a closing tag (though there's still a problem with tables within tables: if one tag is missing, it'll still render the row/column tags for both. Is it worth keeping this from occurring?). Also, maybe sped it up a little by not globally replacing closing tags in all cases (only when an opening tag exists).

Just cleaned it up a little more. Still annoyed about tables within tables, though.

Fixed a very horrible infinite loop bug in the closing tags. If anyone is actually using this, I'd urge you to update (it took me a while to figure out what I did wrong, too, even though it was like a five character bug). :(

Also fixed another bug related to the lowercasing bit I had. I don't like HTML tags which are all caps, so I lowercased them... before, though, it was lowercasing some other text too.

I began to make it so that attribute values didn't need to be strict (i.e., enclosed in "), but I couldn't pull it off without a code rewrite, I think.

I was right, a code rewrite, or a lot of additional code would have been necessary to make it non-strict, but compliant (ie, attributes could be enclosed in "" or not, but not halfway). So you can write something like this <table border=1"> and it will be rendered... pretty nasty, but hopefully no one will be silly enough to do that.

I also made it so that people couldn't write a simple table tag, yet fill it with garbage text. Before, you could 'hide' text within table tags; now there's a limit on how much garbage you can put in there (96 characters inside a table, td or th tag, 36 inside a tr tag). This still gives you the ability to 'hide' messages, but they'd be very small, e.g. <table this is a hidden message that will be parsed out border=1align=left> and so on. Some slack was left in so that, if you wanted, you could spread attributes out over multiple lines.
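The length limit can be sketched on its own. This is only an illustration, assuming the page text has already been HTML-quoted (so a tag shows up as &lt;table ...&gt;); the helper name raw_attributes is made up for the example.

```perl
use strict;
use warnings;

# Assumed pattern: at most 96 characters may sit between "&lt;table" and "&gt;".
# The /s flag lets "." match newlines, so attributes can span several lines.
my $open_table = qr/&lt;table(.{0,96}?)&gt;/is;

# Hypothetical helper: return the raw attribute text, or undef if the
# opening tag is longer than the limit (or missing altogether).
sub raw_attributes {
    my ($text) = @_;
    return $text =~ $open_table ? $1 : undef;
}

my $short = "&lt;table hidden message\nborder=1align=left&gt;";
my $long  = '&lt;table ' . ('x' x 100) . ' border=1&gt;';

print "short tag: ", (defined raw_attributes($short) ? 'accepted' : 'ignored'), "\n";
print "long tag: ",  (defined raw_attributes($long)  ? 'accepted' : 'ignored'), "\n";
```

The short tag is picked up even though it contains a newline and some junk text; the long one never matches, so it would simply stay on the page as literal text.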

Note that you don't have to put spaces between attributes (as suggested by my previous example), so <tableborder=1align=left> is perfectly valid. Why not? Might as well... it's not like everything else in UseMod is quirk free. Perfection and strictness aren't always necessary. :)
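Here is a rough, standalone sketch of why unspaced attributes and a stray quote both get through: each safe pattern is searched for anywhere in the raw attribute text, and only what it matches is copied into the new tag. The helper safe_tag and the two-pattern list are assumptions for illustration, not the full patch.

```perl
use strict;
use warnings;

# Hypothetical helper: rebuild an opening tag from whitelisted attributes only.
sub safe_tag {
    my ($name, $raw, @allowed) = @_;
    my $tag = "<$name";
    for my $pattern (@allowed) {
        # Keep the first match of each allowed pattern; everything else is dropped.
        $tag .= ' ' . lc $1 if $raw =~ /($pattern)/i;
    }
    return "$tag>";
}

# Two of the table patterns, with optional quotes on either side of the value.
my @table_attrs = (
    'border="?[0-9]{1,2}"?',
    'align="?(?:left|center|right)"?',
);

print safe_tag('table', 'border=1align=left', @table_attrs), "\n";
print safe_tag('table', ' onclick="x()" border=1"', @table_attrs), "\n";
```

The first call yields a tag with both attributes, spaced out properly. The second drops the onclick text but keeps the lopsided border=1" as it was written, which is the 'pretty nasty' case mentioned earlier.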

One last thing: I made it so that numeric attributes such as rowspan and colspan take at most two digits, so you can only span 99 rows/columns. I also limited the width to 800 pixels or 100 percent (200 pixels for individual cells). This is easy to edit, though.
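The width limits can be tried out in isolation. The patterns below are a sketch that assumes the limits stated above (800 pixels per table, 200 per cell, 100 percent for both); the variable names are made up, and the values are checked on their own rather than inside a tag.

```perl
use strict;
use warnings;

# Percentages up to 100%, shared by tables and cells.
my $percent = qr/(?:100|[0-9]{1,2})%/;

# Assumed limits taken from the text: 800px for a table, 200px for a cell.
my $table_width = qr/\A(?:$percent|800|[1-7]?[0-9]{1,2})\z/;
my $cell_width  = qr/\A(?:$percent|200|1?[0-9]{1,2})\z/;

for my $value (qw(100% 640 800 801 101%)) {
    printf "table width %-4s %s\n", $value, ($value =~ $table_width ? 'ok' : 'rejected');
}
for my $value (qw(50% 200 201 250)) {
    printf "cell width  %-4s %s\n", $value, ($value =~ $cell_width ? 'ok' : 'rejected');
}
```

To change a limit, adjust the literal maximum (800 or 200) and the leading digit class in front of it.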

This code goes above the return $_ in sub CommonMarkup. You can enclose it within an if ($doTables) { codehere } block, and create a $doTables variable if you want (set its value to 1), but that's up to you.

  # This code is pretty trivial. We take a tag's attributes and put them in a string.
  # Then we iterate over a list of 'safe' patterns, adding each match to another
  # string which ultimately becomes the actual tag. Simple, really. :) I think the
  # best strength of regexps is that we can 'execute' strings. This should render tables
  # with most of the HTML 3.0 table attributes. You can also put images in tables
  # by simply using the image URL inside a cell (the aligns, etc., should work).
  # Note: $_ is already HTML-quoted here, so tags show up as &lt;table ...&gt;.
  my $num    = '"?[0-9]{1,2}"?';
  my $align  = 'align="?(?:left|center|right)"?';
  my $valign = 'valign="?(?:top|middle|bottom|baseline)"?';
  my $color  = 'bgcolor="?#[0-9A-Fa-f]{6}"?';
  my $pct    = '(?:100|[0-9]{1,2})%';
  # Widths: up to 100 percent, else pixels (800 max for tables, 200 for cells).
  my $twidth = "width=\"?(?:$pct|(?:800|[1-7]?[0-9]{1,2})(?![0-9%]))\"?";
  my $cwidth = "width=\"?(?:$pct|(?:200|1?[0-9]{1,2})(?![0-9%]))\"?";
  my @cell   = ($align, $valign, "colspan=$num", "rowspan=$num", $cwidth, $color);
  # Each entry: tag name, max characters inside the opening tag, safe patterns.
  my @safe = (
    [ 'table', 96, "border=$num", "cellpadding=$num", "cellspacing=$num",
                   $twidth, $align, $color ],
    [ 'td', 96, @cell ],
    [ 'th', 96, @cell ],
    [ 'tr', 36, $align, $valign ],
  );
  foreach my $spec (@safe) {
    my ($name, $max, @allowed) = @$spec;
    my $converted = 0;
    # A tag is only rendered when it has a matching closing tag.
    while (/&lt;$name(.{0,$max}?)&gt;.*?&lt;\/$name&gt;/is) {
      my $raw = $1;
      my $tag = "<$name";
      foreach my $attr (@allowed) {
        $tag .= " \L$1\E" if ($raw =~ /($attr)/is);
      }
      # Replace exactly the pair matched above, so the loop always moves on.
      s/&lt;$name.{0,$max}?&gt;(.*?)&lt;\/$name&gt;/$tag>$1<\/$name>/is;
      $converted++;
    }
    # No complete table on the page: leave row and cell tags alone as well.
    last if ($name eq 'table' && !$converted);
  }
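For the optional $doTables switch, a minimal sketch of the wrapping might look like the following. The sub is a stand-in, not UseMod's real CommonMarkup, and the single substitution inside the if block is only a placeholder for the table code; it assumes the text is already HTML-quoted.

```perl
use strict;
use warnings;

our $doTables = 1;    # assumed flag; set to 0 to switch table markup off

# Stand-in for CommonMarkup, reduced to the part that matters here.
sub common_markup_sketch {
    local $_ = shift;
    if ($doTables) {
        # The table code would go here. As a placeholder, this only
        # converts bare, attribute-free table tags.
        s{&lt;(/?)(table|tr|td|th)&gt;}{<$1\L$2\E>}gi;
    }
    return $_;
}

print common_markup_sketch('&lt;TABLE&gt;&lt;tr&gt;&lt;td&gt;x&lt;/td&gt;&lt;/tr&gt;&lt;/TABLE&gt;'), "\n";
```

With the flag set to 0 the text comes back untouched, so the table tags stay visible as plain text.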

I can't believe I keep posting buggy code. :-P

Didn't do as many tests on this version, but I went over the code byte by byte about four times. It's probably a lot faster, since I made the groups non-capturing (no more unneeded backreferences). I also limited how much text could be inside a table tag (if someone had decided to paste in as much as the input box allows, it would surely have been slow; well, that won't happen now).

Enjoy. (I know I will!)

PS. Remember to 'edit page' to get at the actual code!

Last edited October 1, 2007 5:36 pm by MarkusLude (diff)