Here is code for parsing (reading) large XML files in PHP.
Large XML files are parsed here with PHP's SAX-style, event-based XML parser. The built-in PHP functions that return the XML data directly load the whole document into memory, so they are practical mainly for small files (roughly 1 KB to 50 KB). For large files of 5 MB, 50 MB, and so on, we have to use this method.
The code is kept as simple as possible. Just set the path of your XML file in the $file variable near the top. The switch statement in startElement has one case per XML tag name, so any search, value check, or other operation on a tag can be done inside its case. The endElement function is where closing operations go. For example, if we are putting the data in an HTML table, we can close the table tag there.
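The table idea mentioned above can be sketched roughly as follows. This is only an illustration, not part of the script in this post: renderUrlTable is a hypothetical helper, it takes the XML as a string rather than reading a file, and it assumes the urllist/url tag names used in the sample file.

```php
<?php
// Hypothetical helper (an assumption for illustration): turns
// <urllist><url>...</url></urllist> into a simple HTML table.
function renderUrlTable(string $xml): string
{
    $html = "";   // HTML built up so far
    $text = "";   // text collected for the current element

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // start handler: one switch case per tag name
        function ($p, $name, $attribs) use (&$html, &$text) {
            $text = "";
            switch ($name) {
                case "urllist":
                    $html .= "<table>\n";   // open the table at the root tag
                    break;
                case "url":
                    $html .= "<tr><td>";
                    break;
            }
        },
        // end handler: the closing operations go here
        function ($p, $name) use (&$html, &$text) {
            switch ($name) {
                case "url":
                    $html .= htmlspecialchars(trim($text)) . "</td></tr>\n";
                    break;
                case "urllist":
                    $html .= "</table>\n";  // complete the table tag
                    break;
            }
            $text = "";
        }
    );
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$text) {
            $text .= $data;
        }
    );

    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $html;
}

echo renderUrlTable('<urllist><url>http://example.com/a</url><url>http://example.com/b</url></urllist>');
```

For a really large file you would feed the parser in chunks, as the main script does, and echo each row as it is completed instead of building one big string.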
CDATA sections are handled in the characterData function. All CDATA content passes through it, and we can process it however we want. The code here compares each tag's value with the url parameter from the request's query string ($_GET["url"]).
The code and a sample XML file are given below. Enjoy! If you have any problem, just leave a comment.
<?php
$debug = 0;
$file = "./short.xml"; // path to the XML file
$flag = 0;             // set to 1 when the requested URL is found
# use the 'current' vars to keep track of which tag/attribute
# the parser is currently processing
$currentTag = "";
$currentAttribs = [];
# called by the parser for every opening tag
function startElement($parser, $name, $attribs)
{
    global $currentTag, $currentAttribs;
    $currentTag = $name;
    $currentAttribs = $attribs;
    # define the HTML to use for the start tag
    switch ($name) {
        // case "url":
        //     foreach ($attribs as $key => $value) {
        //         echo("$key: $value\n");
        //     }
        //     break;
        case "verified":
            break;
        default:
            echo("\n");
            break;
    }
}
# called by the parser for every closing tag
function endElement($parser, $name)
{
    global $currentTag, $currentAttribs;
    # output closing HTML tags
    echo("\n");
    # clear current tag variables
    $currentTag = "";
    $currentAttribs = [];
}
# process data between tags (including CDATA sections)
function characterData($parser, $data)
{
    global $currentTag, $flag;
    # the URL to look for comes from the query string, e.g. ?url=...
    $url = $_GET["url"] ?? "";
    if ($data === $url) {
        switch ($currentTag) {
            case "url":
                $flag = 1;
                echo("$data\n");
                break;
            case "verified":
                //echo("$data");
                break;
            default:
                break;
        }
    }
}
# initialize parser
$xmlParser = xml_parser_create();
$caseFold = xml_parser_get_option($xmlParser, XML_OPTION_CASE_FOLDING);
$targetEncoding = xml_parser_get_option($xmlParser, XML_OPTION_TARGET_ENCODING);
if ($debug > 0) {
    echo("Debug is set to: $debug\n");
    echo("Case folding is set to: $caseFold\n");
    echo("Target Encoding is set to: $targetEncoding\n");
}
# disable case folding so tag names keep their original case
if ($caseFold == 1) {
    xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);
}
# set callback functions
xml_set_element_handler($xmlParser, "startElement", "endElement");
xml_set_character_data_handler($xmlParser, "characterData");
# open XML file
if (!($fp = fopen($file, "r"))) {
    die("Cannot open XML data file: $file");
}
# read and parse data in 4096-byte chunks, so the whole file is never in memory
while ($data = fread($fp, 4096)) {
    # error handling
    if (!xml_parse($xmlParser, $data, feof($fp))) {
        $error = sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xmlParser)),
            xml_get_current_line_number($xmlParser));
        # clean up before stopping (die() ends the script immediately)
        xml_parser_free($xmlParser);
        fclose($fp);
        die($error);
    }
}
fclose($fp);
# free up the parser
xml_parser_free($xmlParser);
# report the result of the search
if ($flag == 1) {
    echo("The item found.");
} else {
    echo("Sorry! Item not found.");
}
The sample XML file (short.xml):
<?xml version="1.0" encoding="utf-8" ?>
<urllist>
<url><![CDATA[http://us.battle.net.en.zh-vnf.in/login.html?app=wam&ref=https://www.worldofwarcraft.com/account/&eor=0&app=bam/]]></url>
<url><![CDATA[http://a-era.org//wp-content/themes/Nova/temp/aol.html]]></url>
<url><![CDATA[http://al-saqi.com/ali/administrator/templates/temp.html]]></url>
<url><![CDATA[http://www.astav.net/phpFormGenerator/use/index/form1.html]]></url>
<url><![CDATA[http://www.jmcgroup.com.tw/83617C429A994E009BA0B6DFB9916156C8AA27305BBB4AD7B769656766711E4BC8AA27305BBB4AD7B769656766711E/index.htm]]></url>
<url><![CDATA[http://mmpakistan.com/mmpnew/very.co.uk/index.html]]></url>
<url><![CDATA[http://glasgow.servershost.net/~iglesiap/old/pana/wp-content/plugins/aolservice/11my.screeenname@aim.com/my.screeenname@aim.com/my.screenname@aim.com/billingcenter/update/information/logon/]]></url>
<url><![CDATA[http://96.31.30.7/images/index.asp]]></url>
<url><![CDATA[http://sitekey.aol.com.login.psp.myaccount.default.bransonmissourivacationpackages.org/sitekey/sas/signon/do/detect/FY563DFP722FF56DS14FY563DFP722FF56DS14/update.php]]></url>
<url><![CDATA[http://84.14.252.138/icons/image3.php]]></url>
<url><![CDATA[http://www.servicodeapoioaoconsumidor.com.br/plugin/portal/wps/script/templates/GCMRequest.do?page=8488]]></url>
<url><![CDATA[http://www.athmainfosolutions.com/8m5y8k7/index.html]]></url>
<url><![CDATA[http://cielovantagens.org/cielo/cadastro.php]]></url>
<url><![CDATA[http://cielovantagens.org/cielo/]]></url>
<url><![CDATA[http://cielovantagens.org/]]></url>
<url><![CDATA[http://cielocadastrobr.com/cadastro.php]]></url>
<url><![CDATA[http://cielocadastrobr.com/cadastro.html]]></url>
<url><![CDATA[http://cielocadastrobr.com/]]></url>
<url><![CDATA[http://bpol-cartepre-poste-italiane-online-home.osa.pl/?TYPE=33554433&REALMOID=06-67b8b137-8480-11d6-ac6e-009027fd3897&GUID=SMAUTHREASON=0METHOD=GETSMAGENTNAME=-SM-?]]></url>
<url><![CDATA[http://oceanjewelsresort.net/ebay/]]></url>
<url><![CDATA[http://www.xzconsultores.pt/bios//plugins/editors/tinymce/jscripts/tiny_mce/plugins/advhr/langs/empresarial/]]></url>
<url><![CDATA[http://dc2usa.com/old_aol.1.3country/?Login=&Lis=10&LigertID=1993745&us=1]]></url>
<url><![CDATA[http://singin1987.smtp.ru/Elnlogenfranta1.txt]]></url>
</urllist>
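One caveat about the comparison in characterData: the parser is allowed to deliver the text of a single element in several pieces (for example, when the text straddles a 4096-byte read boundary), so comparing each piece with the search value can miss a match. A minimal sketch of one way around this is to collect the pieces in a buffer and compare in the end handler. The findUrl helper below is hypothetical, and it feeds the parser from a string in tiny chunks just to imitate chunked reading of a big file.

```php
<?php
// Hypothetical helper (an assumption for illustration): returns true if the
// full text of any <url> element equals $needle.
function findUrl(string $xml, string $needle, int $chunkSize = 16): bool
{
    $buffer = "";     // text collected for the current element
    $found  = false;

    $parser = xml_parser_create();
    xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

    xml_set_element_handler(
        $parser,
        // start of a tag: begin with an empty buffer
        function ($p, $name, $attribs) use (&$buffer) {
            $buffer = "";
        },
        // end of a tag: the buffer now holds the complete text
        function ($p, $name) use (&$buffer, &$found, $needle) {
            if ($name === "url" && trim($buffer) === $needle) {
                $found = true;
            }
            $buffer = "";
        }
    );
    // may be called several times per element, so append instead of compare
    xml_set_character_data_handler(
        $parser,
        function ($p, $data) use (&$buffer) {
            $buffer .= $data;
        }
    );

    // feed the document in small chunks, flagging the last one as final
    $length = strlen($xml);
    for ($offset = 0; $offset < $length; $offset += $chunkSize) {
        $chunk  = substr($xml, $offset, $chunkSize);
        $isLast = ($offset + $chunkSize >= $length);
        if (!xml_parse($parser, $chunk, $isLast)) {
            $message = xml_error_string(xml_get_error_code($parser));
            throw new RuntimeException("XML error: $message");
        }
    }
    return $found;
}

$xml = '<urllist><url><![CDATA[http://example.com/a?x=1&y=2]]></url></urllist>';
var_dump(findUrl($xml, "http://example.com/a?x=1&y=2")); // bool(true)
var_dump(findUrl($xml, "http://example.com/missing"));   // bool(false)
```

The same buffering can be added to the script above by appending to a global variable in characterData and doing the comparison in endElement.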