Sydney PHP Group provides a community for PHP developers in Sydney, Australia.
We run regular meetings in the city and membership is free and open to anyone with an interest in web development.

How do I join?

Register an account on our blog

What next?

After registration, you can RSVP one or more attendees for events. If you wish to present, come to a meeting and have a chat with a group organiser.

Who are the organisers?

Currently Tim, James, Graham and Dean. At least one of us attends each meeting, and you can reach us by DM'ing sydphp on Twitter.

Get yourself known!

Do you provide web development related services in Sydney and want to be known in the Sydney PHP development community? You can reach our community by getting your RSS/ATOM feed syndicated on sydphp.org.

Current Events

View and RSVP to current events

Subscribe to the one-way announcement mailing list at Google Groups for updates.

PHP Jobs in Sydney

We provide a free, one-way mailing list at Google Groups. Posts are moderated before being published.

Consuming XML, fast, with PHP and XMLReader

Let’s face it, XML isn’t the lightest of data serialisation formats out there. Consider and compare this:


<alternate_description>something else</alternate_description>

against this, in JSON:


{ "alternate_description": "something else" }
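To put a rough number on the difference, here is a minimal sketch that measures both snippets in bytes. It assumes the JSON is written compactly, the way json_encode() emits it, so the exact figures depend on how each format is formatted.

```php
<?php
// The same key/value pair serialised both ways.
$xml  = '<alternate_description>something else</alternate_description>';
$json = json_encode(['alternate_description' => 'something else']);

// strlen() counts bytes, which is what has to be downloaded and parsed.
printf("XML:  %d bytes\n", strlen($xml));
printf("JSON: %d bytes\n", strlen($json));
```

For this pair the XML version comes to 61 bytes and the compact JSON to 42, and the gap comes entirely from the repeated closing tag.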

Those repetitive XML tags are really just extra bytes to download and parse. Unfortunately, we sometimes have to consume huge gobs of XML for a project, and for that we have XMLReader, the lesser-known cousin of SimpleXML.


Unlike SimpleXML, which loads the entire document into memory before making any of it available, XMLReader “acts as a cursor going forward on the document stream and stopping at each node on the way” (php.net/xmlreader). It's a bit like a line-by-line CSV parser, but acting on the nodes of an XML document.
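The cursor behaviour is easiest to see in a small loop. This is a minimal sketch using a made-up inline document rather than a real file; the element names are only for illustration.

```php
<?php
// Hypothetical sample data, inlined so the sketch is self-contained.
$xml = '<persons>'
     . '<person><name>John</name></person>'
     . '<person><name>Jane</name></person>'
     . '</persons>';

$reader = new XMLReader();
$reader->XML($xml); // use $reader->open($path) to stream from a file instead

$names = [];
// Each read() call advances the cursor by exactly one node.
while ($reader->read()) {
    // Only react to opening <name> tags; every other node is passed over.
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'name') {
        $name = $reader->readString(); // text content of the current element
        $names[] = $name;
        echo $name, PHP_EOL;
    }
}
$reader->close();
```

At any moment the script only sees the node under the cursor, which is why memory use stays low no matter how long the document is.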

Choosing the right XML parser for the job matters: the wrong choice can lead to avoidable performance problems on your server.


To illustrate this, I pointed both SimpleXML and XMLReader at the same 190MB XML document via a PHP shell script, ran two tests on each extension and profiled the results. Test one found a node at the start of the file; test two found a node at the end of the file.
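The original test script isn't shown, so the following is only a sketch of how such a comparison could be set up. The helper names findWithSimpleXml() and findWithXmlReader() are hypothetical, and a small generated file of 1000 records stands in for the 190MB document.

```php
<?php
// Build a small stand-in document (hypothetical data, 1000 records).
$file = tempnam(sys_get_temp_dir(), 'persons');
$fh = fopen($file, 'w');
fwrite($fh, "<persons>\n");
for ($i = 1; $i <= 1000; $i++) {
    fwrite($fh, "<person><name>Person $i</name></person>\n");
}
fwrite($fh, "</persons>\n");
fclose($fh);

// SimpleXML: the whole file is parsed before any node can be read.
function findWithSimpleXml(string $file, int $n): ?string
{
    $doc = simplexml_load_file($file);
    return isset($doc->person[$n - 1]) ? (string) $doc->person[$n - 1]->name : null;
}

// XMLReader: walk forward and stop as soon as the n-th <person> is reached.
function findWithXmlReader(string $file, int $n): ?string
{
    $reader = new XMLReader();
    $reader->open($file);
    $seen = 0;
    $found = null;
    while ($reader->read()) {
        if ($reader->nodeType === XMLReader::ELEMENT && $reader->name === 'person'
            && ++$seen === $n) {
            // Parse just this one record to pull out its name.
            $found = (string) simplexml_load_string($reader->readOuterXml())->name;
            break;
        }
    }
    $reader->close();
    return $found;
}

// Time both approaches against the first and the last record.
$results = [];
foreach ([1, 1000] as $n) {
    foreach (['findWithSimpleXml', 'findWithXmlReader'] as $fn) {
        $start = microtime(true);
        $results[$fn][$n] = $fn($file, $n);
        printf("%s, node %d: %s in %.5f seconds\n",
            $fn, $n, $results[$fn][$n], microtime(true) - $start);
    }
}
unlink($file);
```

Timings from a file this small won't resemble the figures below; the point is only the shape of the two lookups.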


The XML document in question is a standard XML document containing 21467 records. It looks something like this:


<persons>
<person>
<name>John</name>
<!-- other nodes -->
</person>
<!-- 21466 more person nodes -->
</persons>

Peak memory usage was measured with the “top” command (%MEM).


SimpleXML


Test One:
Nodes: 1
Peak Memory Usage: 18%
Processed 190MB of XML in 3.14164 seconds

Test Two:
Nodes: 21467
Peak Memory Usage: 18%
Processed 190MB of XML in 3.20796 seconds

XMLReader


Test One:
Nodes: 1
Peak Memory Usage: 0.3%
Processed 190MB of XML in 0.00128 seconds

Test Two:
Nodes: 21467
Peak Memory Usage: 0.7%
Processed 190MB of XML in 16.4478 seconds

These results show how differently the two extensions behave, and hint at where each one fits.


XMLReader found the first element almost instantly (0.00128 seconds), while SimpleXML took about the same time, a little over 3 seconds, to find either the first or the last element. The big difference is memory: XMLReader peaked at 0.3% to 0.7% against SimpleXML's 18%, roughly 26 to 60 times less.

Understandably, XMLReader took a lot longer to find the last node as it had to process each node in the document until it found a match. A seek() method on the XMLReader class would obviously be useful here to skip unwanted nodes.
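There is no seek(), but XMLReader does have a next() method that moves to the next node while skipping the current node's subtree, and it accepts an element name to stop at. The sketch below uses it to hop from one person record to the next without visiting their children; the inline document and the target position are made up for illustration. The underlying parser still has to read through the skipped markup, so this trims the work done in your PHP loop rather than the scan itself.

```php
<?php
// Hypothetical sample data with a few child nodes per record.
$xml = '<persons>'
     . '<person><name>John</name><email>john@example.com</email></person>'
     . '<person><name>Jane</name><email>jane@example.com</email></person>'
     . '<person><name>Jim</name><email>jim@example.com</email></person>'
     . '</persons>';

$reader = new XMLReader();
$reader->XML($xml);

// Advance with read() until the cursor sits on the first <person>.
while ($reader->read() && $reader->name !== 'person') {
    // nothing to do here, we are only moving the cursor
}

$wanted = 3;   // position of the record we are after
$position = 1; // the cursor is currently on record 1

// next('person') jumps to the following <person> without descending
// into the <name> and <email> children of the current one.
while ($position < $wanted && $reader->next('person')) {
    $position++;
}

// Parse only the record we stopped on.
$name = (string) simplexml_load_string($reader->readOuterXml())->name;
echo $name, PHP_EOL;
$reader->close();
```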


Use cases


For simple parsing such as RSS feed handling and small XML documents, SimpleXML is definitely the way to go. Its easy access to document nodes is a great advantage.
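As a quick illustration of that easy node access, here is a minimal sketch that reads item titles and links from a tiny RSS document. The feed content and example.com URLs are invented for the example.

```php
<?php
// Hypothetical feed, inlined so the sketch is self-contained.
$rss = <<<'RSS'
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/first</link></item>
    <item><title>Second post</title><link>http://example.com/second</link></item>
  </channel>
</rss>
RSS;

// The whole feed becomes one object tree; child elements are properties.
$feed = simplexml_load_string($rss);

$titles = [];
foreach ($feed->channel->item as $item) {
    $titles[] = (string) $item->title; // cast to get a plain PHP string
    echo $item->title, ' => ', $item->link, PHP_EOL;
}
```

For a document this size, holding the full tree in memory costs next to nothing and the code stays short.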

For larger document importing, XMLReader wins hands-down due to its ability to read the document node by node with limited impact on system memory. In fact, you can parse XML documents with XMLReader that are larger than the available system memory.


One final tip: avoid building large data structures while processing large XML documents with XMLReader as it defeats the purpose of using XMLReader in the first place — just grab the data needed to perform an operation and skip to the next iteration.
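One way to follow that tip is to hand records to the calling code one at a time and keep only running values. The sketch below does this with a generator; persons() is a hypothetical helper and the inline document is sample data, so treat it as one possible shape rather than the only approach.

```php
<?php
// persons() is a hypothetical helper: it yields one <person> at a time
// as a small SimpleXMLElement, so only a single record is held at once.
function persons(XMLReader $reader): Generator
{
    while ($reader->read() && $reader->name !== 'person') {
        // keep reading until the cursor reaches the first <person>
    }
    while ($reader->name === 'person') {
        yield simplexml_load_string($reader->readOuterXml());
        $reader->next('person'); // jump to the next <person>, if any
    }
}

$reader = new XMLReader();
// Inline sample data; for a large file call $reader->open($path) instead.
$reader->XML(
    '<persons>'
    . '<person><name>John</name></person>'
    . '<person><name>Jane</name></person>'
    . '<person><name>Alexandra</name></person>'
    . '</persons>'
);

// Keep running values only, never an array of every record.
$count = 0;
$longest = '';
foreach (persons($reader) as $person) {
    $count++;
    $name = (string) $person->name;
    if (strlen($name) > strlen($longest)) {
        $longest = $name;
    }
}
$reader->close();

printf("%d records, longest name: %s\n", $count, $longest);
```

Each record is discarded as soon as the loop moves on, so memory use depends on the size of one record rather than the size of the document.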



