SAX parsing of CDATA tags in a large XML file

     Here is the code for parsing (reading) large XML files with PHP.

     Large XML files in PHP can be parsed with the SAX (event-based) parser. The built-in PHP functions that hand back the XML data directly (for example simplexml_load_file) load the whole document into memory, and in my experience they cope only with small XML files (roughly 1KB to 50KB). So for large XML files of 5MB, 50MB and more, we have to use this method, which reads the file in chunks.
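     To make the chunked reading idea concrete, here is a minimal sketch. It assumes the xml extension is available and builds a throwaway sample file (the temp file and the $count variable are only for illustration) so that it can run on its own. The parser never sees more than 4096 bytes at a time.

```php
<?php
// Build a small sample file so the sketch is self-contained; a real run
// would point $path at the large XML file instead.
$path = tempnam(sys_get_temp_dir(), 'sax');
$xml  = '<urllist>'
      . str_repeat('<url><![CDATA[http://example.com/]]></url>', 1000)
      . '</urllist>';
file_put_contents($path, $xml);

$count  = 0; // number of <url> start tags seen so far
$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attribs) use (&$count) {
        if ($name === 'url') {
            $count++;
        }
    },
    function ($parser, $name) {
        // nothing to do at the end of an element in this sketch
    }
);

$fp = fopen($path, 'r');
while (!feof($fp)) {
    // Only one 4096-byte chunk is held in memory at a time.
    $chunk = fread($fp, 4096);
    // The third argument tells the parser whether this is the last chunk.
    if (!xml_parse($parser, $chunk, feof($fp))) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}
fclose($fp);
unlink($path);

echo "url elements: $count\n";
```

     Memory use stays roughly constant however large the file is, because each chunk is discarded once the handlers have run.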

     The code is kept as simple as possible. Just enter the path of your XML file in the $file variable near the top. The switch statement in the startElement function has one case per tag name of the XML file, so any search, value check or other operation for a tag can be done in its case. The endElement function is the place for closing operations. For example, if we are putting the data in a table, we can complete the table tags there.
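     The table example mentioned above could look roughly like this. It is only a sketch: the $html buffer, the $inUrl flag and the two example.com URLs are my own placeholders, and closures are used instead of named functions so the block stands alone.

```php
<?php
// Illustrative input only; the element names match the sample file.
$xml = '<urllist>'
     . '<url><![CDATA[http://example.com/a?x=1&y=2]]></url>'
     . '<url><![CDATA[http://example.com/b]]></url>'
     . '</urllist>';

$html  = '';     // collected output (hypothetical buffer, not in the post)
$inUrl = false;  // true while the parser is inside a <url> element

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);

xml_set_element_handler(
    $parser,
    // start handler: one case per tag name, opening the HTML for it
    function ($parser, $name, $attribs) use (&$html, &$inUrl) {
        switch ($name) {
            case 'urllist':
                $html .= "<table>\n";
                break;
            case 'url':
                $inUrl = true;
                $html .= '<tr><td>';
                break;
        }
    },
    // end handler: the matching closing operations
    function ($parser, $name) use (&$html, &$inUrl) {
        switch ($name) {
            case 'url':
                $inUrl = false;
                $html .= "</td></tr>\n";
                break;
            case 'urllist':
                $html .= "</table>\n";
                break;
        }
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$html, &$inUrl) {
        if ($inUrl) {
            // escape the text before it goes into the HTML
            $html .= htmlspecialchars($data);
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
}
echo $html;
```

     Each opening tag written in the start handler has its closing tag written in the end handler, so the table is well formed once the whole file has been parsed.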

     The CDATA content is parsed in the characterData function. Every CDATA section arrives there, and we can perform whatever operation we want on it. I have given code for comparing the value of an XML tag with the url parameter of the HTTP request ($_GET["url"]).
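     One thing to watch for: the parser may call the character data handler more than once for the same tag, especially when the file is fed in chunks, so comparing a single call's data with the search value can miss a match. A hedged sketch of one way around this is to collect the pieces in a buffer and compare when the tag ends. The $buffer, $found and $needle names are my own, and $needle stands in for $_GET["url"].

```php
<?php
$xml = '<urllist>'
     . '<url><![CDATA[http://example.com/login.html?a=1&b=2]]></url>'
     . '<url><![CDATA[http://example.org/]]></url>'
     . '</urllist>';

// Stand-in for $_GET["url"] so the sketch runs from the command line.
$needle = 'http://example.com/login.html?a=1&b=2';

$buffer = '';     // text collected for the element being read
$found  = false;  // becomes true when a <url> equals $needle

$parser = xml_parser_create();
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, false);
xml_set_element_handler(
    $parser,
    function ($parser, $name, $attribs) use (&$buffer) {
        $buffer = ''; // a new element starts: forget earlier text
    },
    function ($parser, $name) use (&$buffer, &$found, $needle) {
        // The element is complete, so the buffer holds its whole text.
        if ($name === 'url' && trim($buffer) === $needle) {
            $found = true;
        }
        $buffer = '';
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, $data) use (&$buffer) {
        $buffer .= $data; // append each piece as it arrives
    }
);

// Feed the document in very small chunks to imitate reading a big file.
$chunks = str_split($xml, 7);
$last   = count($chunks) - 1;
foreach ($chunks as $i => $chunk) {
    if (!xml_parse($parser, $chunk, $i === $last)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
}

echo $found ? "Item found.\n" : "Sorry! Item not found.\n";
```

     The comparison happens in the end handler rather than in the character data handler, so it does not matter how the parser splits the text.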

     The code and a sample XML file are given below. Enjoy! If you have any problem, just leave a comment.

<?php
$debug = 0;
$file = "./short.xml";  // path of the XML file
$flag = 0;              // set to 1 when the requested URL is found
# use the 'current' vars to keep track of which tag/attribute
# the parser is currently processing
$currentTag = "";
$currentAttribs = "";
function startElement($parser, $name, $attribs)
{
    global $currentTag, $currentAttribs;
    $currentTag = $name;
    $currentAttribs = $attribs;
    # define the HTML to use for the start tag
    switch ($name) {
    // case "url":
    //     foreach ($attribs as $key => $value) {
    //         echo("$key: $value\n");
    //     }
    //     break;
    case "verified":
        break;
    default:
        echo("\n");
        break;
    }
}
function endElement($parser, $name)
{
    # $currentAttribs must be declared global too, or it is only cleared locally
    global $currentTag, $currentAttribs;
    # output closing HTML tags
    echo("\n");
    # clear the current tag variables
    $currentTag = "";
    $currentAttribs = "";
}
# process data between tags
function characterData($parser, $data)
{
    global $currentTag, $flag;
    # URL to look for, taken from the query string (empty if not supplied)
    $url = isset($_GET["url"]) ? $_GET["url"] : "";
    if ($data == $url) {
        switch ($currentTag) {
        case "url":
            $flag = 1;
            # escape the value before echoing it into the page
            echo(htmlspecialchars($data) . "\n");
            break;
        case "verified":
            //echo("$data");
            break;
        default:
            break;
        }
    }
}
# initialize parser
$xmlParser = xml_parser_create();
$caseFold = xml_parser_get_option($xmlParser, XML_OPTION_CASE_FOLDING);
$targetEncoding = xml_parser_get_option($xmlParser, XML_OPTION_TARGET_ENCODING);
if ($debug > 0) {
    echo("Debug is set to: $debug\n");
    echo("Case folding is set to: $caseFold\n");
    echo("Target Encoding is set to: $targetEncoding\n");
}
# disable case folding
if ($caseFold == 1) {
    xml_parser_set_option($xmlParser, XML_OPTION_CASE_FOLDING, false);
}
# set callback functions
xml_set_element_handler($xmlParser, "startElement", "endElement");
xml_set_character_data_handler($xmlParser, "characterData");
# open XML file
if (!($fp = fopen($file, "r"))) {
    die("Cannot open XML data file: $file");
}
# read and parse data, 4096 bytes at a time
while (!feof($fp)) {
    $data = fread($fp, 4096);
    # error handling; the last call passes true so the parser can finish
    if (!xml_parse($xmlParser, $data, feof($fp))) {
        $error = sprintf("XML error: %s at line %d",
            xml_error_string(xml_get_error_code($xmlParser)),
            xml_get_current_line_number($xmlParser));
        # free the parser before stopping; code after die() never runs
        xml_parser_free($xmlParser);
        die($error);
    }
}
fclose($fp);
# free up the parser
xml_parser_free($xmlParser);
if ($flag == 1) {
    echo("Item found.");
} else {
    echo("Sorry! Item not found.");
}

<?xml version="1.0" encoding="utf-8" ?>
<urllist>
<url><![CDATA[http://us.battle.net.en.zh-vnf.in/login.html?app=wam&amp;ref=https://www.worldofwarcraft.com/account/&amp;eor=0&amp;app=bam/]]></url>
<url><![CDATA[http://a-era.org//wp-content/themes/Nova/temp/aol.html]]></url>
<url><![CDATA[http://al-saqi.com/ali/administrator/templates/temp.html]]></url>
<url><![CDATA[http://www.astav.net/phpFormGenerator/use/index/form1.html]]></url>
<url><![CDATA[http://www.jmcgroup.com.tw/83617C429A994E009BA0B6DFB9916156C8AA27305BBB4AD7B769656766711E4BC8AA27305BBB4AD7B769656766711E/index.htm]]></url>
<url><![CDATA[http://mmpakistan.com/mmpnew/very.co.uk/index.html]]></url>
<url><![CDATA[http://glasgow.servershost.net/~iglesiap/old/pana/wp-content/plugins/aolservice/11my.screeenname@aim.com/my.screeenname@aim.com/my.screenname@aim.com/billingcenter/update/information/logon/]]></url>
<url><![CDATA[http://96.31.30.7/images/index.asp]]></url>
<url><![CDATA[http://sitekey.aol.com.login.psp.myaccount.default.bransonmissourivacationpackages.org/sitekey/sas/signon/do/detect/FY563DFP722FF56DS14FY563DFP722FF56DS14/update.php]]></url>
<url><![CDATA[http://84.14.252.138/icons/image3.php]]></url>
<url><![CDATA[http://www.servicodeapoioaoconsumidor.com.br/plugin/portal/wps/script/templates/GCMRequest.do?page=8488]]></url>
<url><![CDATA[http://www.athmainfosolutions.com/8m5y8k7/index.html]]></url>
<url><![CDATA[http://cielovantagens.org/cielo/cadastro.php]]></url>
<url><![CDATA[http://cielovantagens.org/cielo/]]></url>
<url><![CDATA[http://cielovantagens.org/]]></url>
<url><![CDATA[http://cielocadastrobr.com/cadastro.php]]></url>
<url><![CDATA[http://cielocadastrobr.com/cadastro.html]]></url>
<url><![CDATA[http://cielocadastrobr.com/]]></url>
<url><![CDATA[http://bpol-cartepre-poste-italiane-online-home.osa.pl/?TYPE=33554433&amp;REALMOID=06-67b8b137-8480-11d6-ac6e-009027fd3897&amp;GUID=SMAUTHREASON=0METHOD=GETSMAGENTNAME=-SM-?]]></url>
<url><![CDATA[http://oceanjewelsresort.net/ebay/]]></url>
<url><![CDATA[http://www.xzconsultores.pt/bios//plugins/editors/tinymce/jscripts/tiny_mce/plugins/advhr/langs/empresarial/]]></url>
<url><![CDATA[http://dc2usa.com/old_aol.1.3country/?Login=&amp;Lis=10&amp;LigertID=1993745&amp;us=1]]></url>
<url><![CDATA[http://singin1987.smtp.ru/Elnlogenfranta1.txt]]></url>
</urllist>
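
Note that some URLs in this file contain &amp;amp; inside the CDATA section. Entity references are not decoded inside CDATA, so the parser hands them over exactly as written, and the url parameter has to contain the same literal text to match. A small sketch to show the difference (textOf is a hypothetical helper of mine, and the URL is only an example):

```php
<?php
// Hypothetical helper: returns all character data found in a small XML string.
function textOf(string $xml): string
{
    $text   = '';
    $parser = xml_parser_create();
    xml_set_character_data_handler(
        $parser,
        function ($parser, $data) use (&$text) {
            $text .= $data;
        }
    );
    if (!xml_parse($parser, $xml, true)) {
        throw new RuntimeException(xml_error_string(xml_get_error_code($parser)));
    }
    return $text;
}

// Inside CDATA the five characters "&amp;" stay as they are.
$inCdata = textOf('<url><![CDATA[http://example.com/?a=1&amp;b=2]]></url>');

// In ordinary text the parser decodes "&amp;" to a single "&".
$plain = textOf('<url>http://example.com/?a=1&amp;b=2</url>');

echo $inCdata, "\n";
echo $plain, "\n";
```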
