Tuesday, May 14, 2013

Using WordPress export data with PHP SimpleXML

I had a site running on WordPress. It lived on a 256 MB slice, and WordPress 3.2+ is, relatively speaking, not light on resources. So I decided to move the site to my own code, which also meant moving the WordPress data to my own schema. I took an XML dump with the WordPress export tool and imported it using my own scripts built on PHP and SimpleXML.

The XML produced by the WordPress export tool uses namespaces and repeats elements of the same name inside a post, so I reckoned my skeleton script could be of use to someone. Here we grab the title, publication date, link (permalink), categories, tags and content of each original WordPress post.

The code follows:



 
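    <?php
    // report every PHP error, and collect libxml parse errors
    // so that we can print them ourselves
    error_reporting(-1);
    libxml_use_internal_errors(true);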

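    // $content and $link default to "" so that a call passing only the
    // first four arguments still works (the post is then skipped)
    function process_post($title,$category,$tags,$createdOn,$content = "",$link = "") {
        if(empty($content)) { return ; }
        // process post, e.g. insert it into your own schema

    }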


    // start:script 
    // wp.xml contains dump of wordpress posts

    if (file_exists('wp.xml')) {
        $doc = simplexml_load_file('wp.xml');

        if($doc === false) {
            echo "Failed loading XML\n";
            foreach(libxml_get_errors() as $error) {
                echo "\t", $error->message;
            }
            // nothing to iterate over, so stop here
            exit(1);
        }

    } else {
        echo('Failed to open wp.xml.');
        exit ;
    }


    foreach($doc->channel->item as $item) {

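        $title = (string) $item->title ;
        // content and other elements can be wrapped inside
        // a separate namespace. To deal with such elements we
        // use $item->children() with the namespace URI given in wp.xml
        // (check the xmlns:wp attribute at the top of your dump; other
        // WordPress versions may declare a different export version)

        $ns_wp = $item->children("http://wordpress.org/export/1.1/");
        // attachments carry an attachment_url element; we only want the other items
        $attachment = (string) $ns_wp->attachment_url ;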

        if(empty($attachment)) {
            $ns_content = $item->children("http://purl.org/rss/1.0/modules/content/");
            $content =  (string) $ns_content->encoded;
            $link = (string) $item->link ;

            // cast to string before handing the value to strtotime()
            $pubDate = (string) $item->pubDate ;
            $createdOn = date("Y-m-d", strtotime($pubDate));

            $tags = "" ;
            $category = "" ;

            // tags and category
            // we can have multiple category elements inside an item

            foreach($item->category as $elemCategory) {
                // attributes come back as SimpleXMLElement objects,
                // so cast them to strings before comparing
                $domain = (string) $elemCategory["domain"] ;
                $nicename = (string) $elemCategory["nicename"] ;

                if($domain === "category") {
                    $category = trim($category." ".$nicename) ;
                }

                if($domain === "post_tag") {
                    $tags = trim($tags." ".$nicename) ;
                }
            }

            printf("title = %s, category = %s, tags = %s, pub_date = %s\n",$title,$category,$tags,$createdOn);
            // pass the content and permalink along as well
            process_post($title,$category,$tags,$createdOn,$content,$link);
        }

    }
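
If you want to try the namespace handling without a real dump at hand, here is a minimal, self-contained sketch. The inline XML is a made-up fragment shaped like a WordPress export, and extract_posts() is a hypothetical helper name of mine, not something WordPress or SimpleXML provides. Instead of hard-coding the namespace URIs it reads them from the document with getDocNamespaces(), on the assumption that the dump declares the usual "wp" and "content" prefixes on the root element. It uses gmdate() rather than date() so the result does not depend on the server time zone.

```php
<?php
// Made-up sample shaped like a WordPress export; a real dump has many more elements.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
     xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.1/">
  <channel>
    <item>
      <title>Hello world</title>
      <link>http://example.com/hello-world/</link>
      <pubDate>Tue, 14 May 2013 10:00:00 +0000</pubDate>
      <category domain="category" nicename="php"><![CDATA[PHP]]></category>
      <category domain="post_tag" nicename="xml"><![CDATA[XML]]></category>
      <category domain="post_tag" nicename="simplexml"><![CDATA[SimpleXML]]></category>
      <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
      <wp:post_type>post</wp:post_type>
    </item>
  </channel>
</rss>
XML;

// Hypothetical helper: returns one array per item that is not an attachment.
function extract_posts($xml) {
    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        return array();
    }

    // prefix => URI map of the namespaces declared on the root element
    $ns = $doc->getDocNamespaces();
    if (!isset($ns['wp'], $ns['content'])) {
        return array(); // assumption violated: prefixes are not "wp" and "content"
    }

    $posts = array();
    foreach ($doc->channel->item as $item) {
        // skip attachments, which carry a wp:attachment_url element
        if ((string) $item->children($ns['wp'])->attachment_url !== '') {
            continue;
        }

        // an item can hold several <category> elements; the domain
        // attribute tells categories and tags apart
        $categories = array();
        $tags = array();
        foreach ($item->category as $cat) {
            $domain = (string) $cat['domain'];
            if ($domain === 'category') {
                $categories[] = (string) $cat['nicename'];
            } elseif ($domain === 'post_tag') {
                $tags[] = (string) $cat['nicename'];
            }
        }

        $posts[] = array(
            'title'      => (string) $item->title,
            'link'       => (string) $item->link,
            'created_on' => gmdate('Y-m-d', strtotime((string) $item->pubDate)),
            'categories' => $categories,
            'tags'       => $tags,
            'content'    => (string) $item->children($ns['content'])->encoded,
        );
    }
    return $posts;
}

foreach (extract_posts($xml) as $post) {
    printf("%s [%s] tags: %s\n", $post['title'], $post['created_on'], implode(', ', $post['tags']));
}
```

Collecting categories and tags in arrays instead of space-separated strings is just my preference here; implode() them if your schema wants a single column.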





© Life of a third world developer