I will be presenting the Kiwi wikitext parser at the Wikimedia Data Summit at O’Reilly’s headquarters in Sebastopol, CA on Friday. Kiwi is a formal parser written using a PEG and relying on the Peg/Leg tool from Ian Piumarta. It’s in C and supports most of MediaWiki’s syntax in a more or less tolerant manner. This was a side project at AboutUs that Thomas Luce started one Saturday. I joined up with him shortly afterward and we now have a mostly complete parser. There is a lot of room for improvement, but it is fast, works in semi-production at AboutUs.org, and we hope it is a promising example of what can be done with MediaWiki’s syntax. We’re hoping we can build some community support behind it by presenting at the Data Summit. If you want to check out the parser firsthand, you should visit Sam Goldstein’s Ruby/Sinatra-based wiki site at DrasticCode.com.
Kiwi has been released under the permissive BSD 3-clause license and can be found on GitHub.
2 Responses to Kiwi Wikitext Parser at Wikimedia Data Summit
I just found your project and now I have hope again! Approximately two years ago I spent a considerable amount of time building a good offline Wikipedia reader that mimics the online one in features. I used code from the gwtwiki library, which mostly works but falls short when trying to cope with the indescribable things MediaWiki and Wikipedia do. Localisation was also a big problem to handle. It was just one big mess, and I decided that I was not willing to write code to handle this, resulting in many ugly quirks in the final rendering. I became increasingly frustrated with the idea that the unbelievably awesome concept and institution of Wikipedia uses this crufty and rotten backend, especially thinking about the potential of the information stored in Wikipedia. Thank you for this. I will try the heck out of it when I get some time on my hands. This project means a lot to me; thanks for releasing the source.
Thanks for the nice comment. I hope it does what you need. If it doesn’t, feel free to hack on it and send a pull request on github!