A Pike syntax highlighting module

So I thought I should try to port Syntaxer, my syntax highlighting script written in PHP, to Pike, mostly for the fun of it but also to improve my knowledge of string handling in Pike. The greatest concern is that PHP is a dynamic language and Pike is not (in the same sense), and the PHP version of Syntaxer depends heavily on dynamic loading of PHP files. The reason is that I generate the “syntax maps” dynamically from the syntax files of Edit+. That means that if you want support for a new language, you just drop a .stx file in the right location and there you go. My script converts it into a static PHP file, so the conversion only needs to be done once, and loads that file on the fly when that particular language is requested.

I thought this method would be hard to implement in Pike, although it might be possible, so I had to come up with a slightly different approach. Frankly, you don’t alter the .stx files or add support for new languages that often, so my solution is to create the definition for each language manually. I still use the .stx files from Edit+, although one needs to copy and paste a bit.
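The copy-and-paste step could probably be scripted. Here is a minimal sketch, not part of the module: the function name stx_keywords is made up, and the .stx layout is an assumption (a "#KEYWORD=Group name" line followed by one word per line, ";" starting a comment line, and any other "#..." line ending the group).

```pike
// Sketch only: collect the "#KEYWORD=" groups of an Edit+ .stx file into the
// mapping(string:multiset(string)) shape used by the language classes.
// Assumed layout: "#KEYWORD=Group name", then one word per line; lines that
// start with ";" are comments; any other "#..." line ends the current group.
mapping(string:multiset(string)) stx_keywords(string stx)
{
  mapping(string:multiset(string)) groups = ([]);
  string current = "";   // name of the group being read, "" when outside one

  foreach (replace(stx, "\r\n", "\n") / "\n", string line) {
    line = String.trim_all_whites(line);
    if (!sizeof(line) || line[0] == ';')
      continue;

    if (has_prefix(line, "#KEYWORD=")) {
      current = lower_case(line[sizeof("#KEYWORD=")..]);
      if (sizeof(current))
        groups[current] = (<>);
    }
    else if (line[0] == '#')
      current = "";                 // some other directive: the group is over
    else if (sizeof(current))
      groups[current][line] = 1;    // add the word to the current group
  }

  return groups;
}

int main()
{
  string sample = "#TITLE=C++\n"
                  "; sample only\n"
                  "#KEYWORD=Keywords\n"
                  "auto\nbool\nbreak\n"
                  "#KEYWORD=Compiler\n"
                  "define\ninclude\n";
  // Prints the mapping in Pike literal form, ready to paste into a class.
  write("%O\n", stx_keywords(sample));
  return 0;
}
```

Since %O prints the mapping in Pike’s own literal syntax, the output can be pasted more or less directly into a language definition.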

In the Pike solution each language is its own class that inherits the master class Hilite. Pretty much the only things you need to put in the derived class are a few members that specify what is what in the language, such as the keywords mapping and the linecomments array. For example, the C++ definition looks like this:

31 lines of Pike
  inherit .Hilite;
  public string title = "C++";
  //| Override the keywords mapping
  private mapping(string:multiset(string)) keywords = ([
    "keywords" : (<
      "auto","bool","break","case","catch","char","cerr","cin",
      "class","const","continue","cout","default","delete","do",
      "double","else","enum","explicit","extern","float","for",
      "friend","goto","if","inline","int","long","namespace","new",
      "operator","private","protected","public","register","return",
      "short","signed","sizeof","static","struct","switch","template",
      "this","throw","try","typedef","union","unsigned","virtual",
      "void","volatile","while","__asm","__fastcall","__based",
      "__cdecl","__pascal","__inline","__multiple_inheritance",
      "__single_inheritance" >),
    "compiler" : (<
      "define","error","include","elif","if","line","else","ifdef","pragma" >)
  ]);
  //| Override the default since # is no line comment in C++
  protected array(string) linecomments = ({ "//" });
  void create()
  {
    ::create();
    colors += ([ "compiler" : "#060" ]);
    styles += ([ "compiler" : ({ "<b>", "</b>" }) ]);
  }

And you really don’t need to make it fancier than that. For most C-based languages the definitions in the master class Hilite are enough. Just add the keywords to the keywords mapping and it looks better than nothing ;)
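To give an idea of how a master class and a derived language class can fit together, here is a rough, self-contained sketch. Note that MiniHilite and its highlight() method are made-up stand-ins: the real Hilite class is not shown in this post and certainly does more (comments, strings, styles and so on). The sketch only wraps known keywords in a coloured span.

```pike
// Hypothetical stand-in for the master class; not the real Hilite.
class MiniHilite
{
  string title = "Generic";
  mapping(string:multiset(string)) keywords = ([]);
  mapping(string:string) colors = ([ "keywords" : "#00c" ]);

  void create() {}

  protected string quote(string s)
  {
    return replace(s, ([ "&" : "&amp;", "<" : "&lt;", ">" : "&gt;" ]));
  }

  protected int(0..1) is_word_char(int c)
  {
    return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
           (c >= '0' && c <= '9') || c == '_';
  }

  // Wrap the word in a span if it belongs to any keyword group.
  protected string wrap(string word)
  {
    foreach (keywords; string group; multiset(string) words)
      if (words[word])
        return sprintf("<span style=\"color:%s\">%s</span>",
                       colors[group] || "inherit", word);
    return word;
  }

  // Walk the source one character at a time, collecting words and
  // HTML-quoting everything that is not part of a word.
  string highlight(string code)
  {
    string out = "", word = "";
    foreach ((array(int))code, int c) {
      if (is_word_char(c))
        word += sprintf("%c", c);
      else {
        out += wrap(word) + quote(sprintf("%c", c));
        word = "";
      }
    }
    return out + wrap(word);
  }
}

// A language definition only has to say what is what.
class Cpp
{
  inherit MiniHilite;

  void create()
  {
    ::create();
    title = "C++";
    keywords += ([ "keywords" : (< "int", "return", "while" >) ]);
  }
}

int main()
{
  write("%s\n", Cpp()->highlight("while (a < b) return 1;"));
  return 0;
}
```

Here the derived class fills in its members from create() rather than redeclaring them, which keeps the sketch simple. All the actual work stays in the base class.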

HTML parser

One thing that differs from the PHP version of Syntaxer is that SGML-based, or tag-based, languages are run through an HTML parser. The downside of the PHP version is that tag content gets highlighted as well, which of course isn’t what we want. Since Pike has a decent HTML parser that behaves like a SAX parser, I wrote a class that uses it for highlighting tag-based stuff. That class also inherits Hilite, so the methods and members are the same.
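As an illustration of the callback style, here is a small sketch using Pike’s Parser.HTML. It is not the class from the module: the function name highlight_markup and the colour are made up, and it simply colours every tag while passing the text between tags through untouched (apart from HTML quoting).

```pike
// Sketch only: colour the tags, leave the tag content alone.
string quote(string s)
{
  return replace(s, ([ "&" : "&amp;", "<" : "&lt;", ">" : "&gt;" ]));
}

string highlight_markup(string source)
{
  Parser.HTML p = Parser.HTML();

  // Called for tags that have no specific handler. Returning an array
  // puts the result on the output queue without reparsing it.
  p->_set_tag_callback(lambda (Parser.HTML parser, string tag) {
    return ({ "<span style=\"color:#00c\">" + quote(tag) + "</span>" });
  });

  // Called for the text between tags: only quote it, no highlighting.
  p->_set_data_callback(lambda (Parser.HTML parser, string data) {
    return ({ quote(data) });
  });

  return p->finish(source)->read();
}

int main()
{
  write("%s\n", highlight_markup("<p class=\"x\">if (a) return;</p>"));
  return 0;
}
```

The words if and return inside the paragraph are left alone here, which is exactly the difference from the PHP version described above.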

I wonder why there’s no built-in HTML parser for PHP?

A Roxen tag module

Of course I had to write a Roxen tag module so that we can highlight source code in Roxen web pages. This was the reason for writing the Pike module in the first place. The tag is named codify, which might not be the most innovative name but what the heck! The beauty of it is that I made it possible, in the module settings tab, to create a surrounding HTML template for the output. When you run some code through the parser you get the highlighted source code as well as the name of the language and the number of lines that were highlighted, and it might be nice to present that as well (just like the code blocks on this site). It’s tedious to write that surrounding HTML every time, so now you just put it in the settings and the code blocks will always look the same.
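The template idea itself is simple. Below is a minimal sketch in plain Pike, without any Roxen API. The {title}, {lines} and {code} placeholders and the function name apply_template are invented for this example, and the module’s real placeholder syntax may well differ.

```pike
// Sketch only: fill a surrounding HTML template with the highlighter's
// results. The placeholder names are assumptions made for this example.
string apply_template(string template, string title, int lines, string code)
{
  // replace() with a mapping substitutes all keys in a single pass, so a
  // "{title}" that happens to occur inside the code is not expanded again.
  return replace(template, ([
    "{title}" : title,
    "{lines}" : (string)lines,
    "{code}"  : code
  ]));
}

int main()
{
  string template =
    "<div class=\"code\">\n"
    "  <h4>{lines} lines of {title}</h4>\n"
    "  <pre>{code}</pre>\n"
    "</div>\n";
  write("%s", apply_template(template, "Pike", 1, "<b>inherit</b> .Hilite;"));
  return 0;
}
```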


There’s some stuff left to do but the code works well enough to be usable. And I must say that the Pike version is like a thousand times faster than the PHP version!

Oh, and I have implemented support for the following languages:

  • ActionScript
  • C
  • C++
  • C#
  • CSS
  • Java
  • JavaScript
  • HTML
  • Perl
  • PHP
  • Pike
  • Python
  • Ruby
  • RXML
  • XSL

And that’s that for now.

Codify RXML tag and Syntaxer.pmod 17:31, Sat 17 October 2009 :: 183.9 kB