Wednesday, March 3, 2010 at 10:25pm.
Portland Perl Mongers -- XML with Xtra X
Access Notes
Please register for class via Eventbrite: https://freegeek.eventbrite.com
Please check in at the front desk when you arrive to let them know you are here for the class.
Bags must be checked at the front entrance.
Description
How to learn to parse huge XML documents by doing it wrong for 5 years
Speaker: Tyler Riddle
When XML documents can't fit into memory, the vast majority of solutions on CPAN are no longer available to you; when the documents are so large that they take up to 16 hours to process with the standard tools for handling large documents, your hands are tied even more. Tyler will cover what he learned while creating the Parse::MediaWikiDump and MediaWiki::DumpFile modules, which are made to handle the 24-gigabyte English Wikipedia dump files in a reasonable time frame.
1) Real-world benchmarks of C and Perl libraries used to process huge
XML documents.
2) The dirty little secret about XS and what it means for you in this
context.
3) The evolution of the implementation of a nice interface around
event-oriented (SAX-style) XML parsing.
4) Why XML::LibXML::Reader and XML::CompactTree are your friends and
how to tame them.
As always, the meeting will be followed by social hour at the Lucky Lab.