EgonDev

June 27, 2008

Adding a Parser in PHP

Filed under: What I Did Today — Tags: — egoncasteel @ 3:34 pm

What is Parsing? (See the Wikipedia article on parsing for the full definition.)
In short, parsing here means finding chunks of data that match a pattern and replacing each one with something else, based on what the chunk contains.

So why is this useful, you may ask? Let's look at a problem the users of OpsWeb (the intranet site I develop) had. One of the largest parts of OpsWeb is the article section. There are over 600 technical articles, with more being added every week. Users often want to link from one article to another, and to do this they had to add a normal HTML link. Even though my user population is made up of IT staff and programmers, most of them have little or no experience with HTML. This made linking hard, and it should be easy: we want lots of links to help connect information together.

Here is the solution:
From now on, to add a link to an article you do not use HTML. Instead, you write the link as [[article#]] or [[article#:link text]].

Examples:
[[530]] -> OpsWeb Improvements and Changes (the article's title, shown as a link)
[[530:List of Changes]] -> List of Changes

Both examples make a link to article 530, and the only thing the user needs to know is the article's number. (The users know the article numbers well, since they are used as shorthand references on process flowcharts and in emails. Example: “Hey Bob, I need you to perform the reset procedure from article 126.”)

How to write a simple text parser in PHP

Enough intro; on to the good stuff. Let's look at some code.

public function body_parser($text){
	// parser to process the body text of an article
	// [[530]] becomes a link whose text is the article's title
	$result = preg_replace_callback("/\[\[(\d+)\]\]/", array($this, 'p_format_link'), $text);
	// [[530:link text]] becomes a link with the given text;
	// (.*?) is non-greedy so two tags on one line are matched separately
	$result = preg_replace_callback("/\[\[(\d+):(.*?)\]\]/", array($this, 'p_format_link_with_text'), $result);
	return $result;
}
public function p_format_link($given){
	// $given[1] is the article number captured by (\d+)
	return
	"<a href='http://egondev.com/opsweb/articles/articles_control.php?action=display_one&amp;record_id=".$given[1]."'>"
	.$this->get_title_from_id_DB($given[1])
	."</a>";
}
public function p_format_link_with_text($given){
	// $given[1] is the article number, $given[2] is the link text
	return
	"<a href='http://egondev.com/opsweb/articles/articles_control.php?action=display_one&amp;record_id=".$given[1]."'>"
	.$given[2]
	."</a>";
}

The body_parser function finds all of our link tags (the [[article#]] bits) using regular expressions and passes each one to one of the other two functions to be formatted into HTML. It then puts the formatted HTML into the body in place of the link tag.
Example:
$body = "hello I am a short article with a link to [[530]]";
$body = $this->body_parser($body);
echo $body;

>> “hello I am a short article with a link to OpsWeb Improvements and Changes”
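To try the tag syntax outside OpsWeb, here is a minimal standalone sketch of the same two-pass idea. It is not the OpsWeb code: a hypothetical in-memory $titles array stands in for get_title_from_id_DB(), the article.php?id= URL is made up, anonymous functions (PHP 5.3+) replace the class methods, and the "Article N" fallback for unknown numbers is my own assumption.

```php
<?php
// Hypothetical stand-in for the OpsWeb database lookup: article number => title.
$titles = array(530 => 'OpsWeb Improvements and Changes');

// Build an anchor tag; 'article.php?id=' is a made-up URL for this sketch.
function article_link($id, $text) {
    return "<a href='article.php?id=" . (int)$id . "'>" . $text . "</a>";
}

function body_parser($text, array $titles) {
    // Pass 1: [[530]] -> link whose text is the article's title.
    $text = preg_replace_callback('/\[\[(\d+)\]\]/', function ($given) use ($titles) {
        $id = (int)$given[1];
        // Assumed fallback for unknown numbers: "Article N".
        $title = isset($titles[$id]) ? $titles[$id] : "Article $id";
        // Titles are plain text, so escape them before putting them in HTML.
        return article_link($id, htmlspecialchars($title));
    }, $text);

    // Pass 2: [[530:link text]] -> link with the writer's own text.
    // (.*?) is non-greedy, so two tags on one line are matched separately.
    return preg_replace_callback('/\[\[(\d+):(.*?)\]\]/', function ($given) {
        return article_link($given[1], $given[2]);
    }, $text);
}

echo body_parser("hello I am a short article with a link to [[530]]", $titles), "\n";
// prints: hello I am a short article with a link to <a href='article.php?id=530'>OpsWeb Improvements and Changes</a>
```

The non-greedy (.*?) matters: with a greedy (.*), a line like "[[1:a]] and [[2:b]]" would be matched as a single tag whose link text is "a]] and [[2:b".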

The heart of this thing is the preg_replace_callback function. It searches a string using a regular expression. Each time it finds a match, it calls the function given as the second parameter and passes it the match as an array: element 0 is the whole matched text, and elements 1 onward are the parenthesized sub-patterns. It then replaces the matched part of the string with the function's return value.

mixed preg_replace_callback ( mixed $pattern , callback $callback , mixed $subject [, int $limit [, int &$count ]] )
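The match array the callback receives, and the optional $limit and &$count arguments from that signature, are easiest to see in a small run. This is an illustrative sketch: it uses an anonymous function (PHP 5.3+), the $seen array exists only to record what the callback was given, and the "#126" replacement format is arbitrary.

```php
<?php
$text = "See [[126]] and [[530]] and [[42]].";
$seen = array();  // records every match array the callback receives

$result = preg_replace_callback(
    '/\[\[(\d+)\]\]/',
    function ($given) use (&$seen) {
        $seen[] = $given;          // [0] => whole match, [1] => the (\d+) digits
        return "#" . $given[1];    // replacement text for this match
    },
    $text,
    2,       // $limit: replace at most 2 matches
    $count   // &$count: set to the number of replacements actually made
);

echo $result, "\n";  // See #126 and #530 and [[42]].
echo $count, "\n";   // 2
print_r($seen[0]);   // shows [0] => [[126]] and [1] => 126
```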

Let's take a look at one of the preg_replace_callback lines from my parser:
$result = preg_replace_callback("/\[\[(\d+)\]\]/", array($this, 'p_format_link'), $text);

  • "/\[\[(\d+)\]\]/" matches any chunk of text that starts with [[, is followed by one or more digits, and ends with ]].
  • The (\d+) part of the expression captures just the digits as a sub-match, which the callback receives as $given[1].
  • array($this, 'p_format_link') is the callback. The syntax looks odd because all three of these functions are methods of a class, and to reference a method as a callback you pass an array holding the object and the method name. Without a class structure, it would just be the function's name as a string, with no ()s.
  • $text is the string we are passing to preg_replace_callback.
  • $result is the new string, with each match replaced by the return value of the callback.
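Here are the callback forms from that list side by side: a plain function passed by name as a string, an array($this, 'method') pair used from inside a class, and (PHP 5.3 and later) an anonymous function. The bracket_number function and LinkFormatter class are made-up names for this sketch; all three calls produce the same result.

```php
<?php
// A plain function: passed by name as a string, with no parentheses.
function bracket_number($given) {
    return "(" . $given[1] . ")";
}

class LinkFormatter {
    // Inside a class, a method is passed as array($this, 'method_name').
    public function format($text) {
        return preg_replace_callback('/\[\[(\d+)\]\]/', array($this, 'p_format'), $text);
    }

    public function p_format($given) {
        return "(" . $given[1] . ")";
    }
}

$pattern = '/\[\[(\d+)\]\]/';
$text = "article [[126]]";

$a = preg_replace_callback($pattern, 'bracket_number', $text);

$formatter = new LinkFormatter();
$b = $formatter->format($text);

// An anonymous function needs no separate name at all.
$c = preg_replace_callback($pattern, function ($given) {
    return "(" . $given[1] . ")";
}, $text);

echo $a, " | ", $b, " | ", $c, "\n";  // article (126) | article (126) | article (126)
```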

May 27, 2008

Weighted Searches in MySQL

Filed under: What I Did Today — Tags: — egoncasteel @ 9:45 pm

MySQL has the MATCH ... AGAINST syntax for performing full-text searches, but what if you have multiple fields and would like to weight their values differently? A keyword in a title carries more meaning than a keyword in the body, right? Below is the function I am now using on the article component of OpsWeb, the intranet site I develop. I will step you through how it works.

public function search($search_string){
	//gets the adodb wrapper if it hasn't been loaded already
	require_once($this->paths['adodb_path']->get_primary().'adodb.inc.php');
	$DB = ADONewConnection($this->DNS);

	// quote and escape the user's search string so it can't break out of the SQL
	$q = $DB->qstr($search_string);

	$query = "
	SELECT
		ROWID
		,title
		,description
		,MATCH (title) AGAINST($q)*3
		+ MATCH (description) AGAINST($q)*2
		+ MATCH (body) AGAINST($q) AS score
	FROM articles
	WHERE MATCH (title, description, body) AGAINST($q IN BOOLEAN MODE)
	AND retired != '1'
	ORDER BY score DESC
	";

	$results = $DB->Execute($query);
	// GetRows() returns an empty array when nothing matched, so check for rows too
	$rows = $results ? $results->GetRows() : array();

	if (count($rows) > 0){
		return $rows;
	}else{
		return array(array('description' => "No Results Found for '$search_string'"));
	}
}

In the query string you can see that I am retrieving three normal fields and one computed field called score. You can learn more about the MATCH ... AGAINST syntax in the full-text search section of the MySQL manual. Simply put, the server looks in the MATCH columns for text that resembles the AGAINST string and returns a relevance score based on how well the string matched and how unique the words are. All I am doing here is adding multipliers to give the fields different weights: a title match counts three times, a description match twice, and a body match once. The sum is my weighted score.

The next part of the query string to note is the WHERE clause. Here I have another MATCH ... AGAINST statement, this one using the IN BOOLEAN MODE switch. Boolean searches support a lot of extra operators (such as + to require a word and - to exclude one) that you can add to a search to narrow down your results. The problem is that MySQL doesn't return a useful score for boolean searches: every matching row simply gets a score of 1. By adding this extra MATCH ... AGAINST to the WHERE clause, we run the weighted scoring only against the subset of rows that match the boolean search, so we keep the ability to narrow the search while still getting well-scored results.
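If you want to reuse this pattern on other tables, the weighted SELECT and the boolean WHERE can be generated from a column => weight map. This is a hypothetical helper, not part of ADOdb or OpsWeb, trimmed to the parts that matter here (no retired filter or extra columns). It takes a quoting callable, for example one wrapping ADOdb's $DB->qstr(), so the search string is never pasted into the SQL raw; the addslashes-based quoter below is only there so the sketch runs without a database. As with the original query, MySQL requires each natural-language MATCH to have a FULLTEXT index whose column list matches it exactly.

```php
<?php
// Hypothetical helper: builds a weighted full-text query from column => weight pairs.
// $quote must turn the search string into a safely quoted SQL literal.
function build_weighted_search($table, array $weights, $search, callable $quote) {
    $q = $quote($search);
    $terms = array();
    foreach ($weights as $column => $weight) {
        // Natural-language MATCH gives a relevance score; the weight scales it.
        $terms[] = "MATCH ($column) AGAINST($q)*$weight";
    }
    $columns = implode(', ', array_keys($weights));
    return "SELECT ROWID, " . implode(' + ', $terms) . " AS score"
         . " FROM $table"
         // The boolean-mode MATCH only filters rows; the weighted sum does the ranking.
         . " WHERE MATCH ($columns) AGAINST($q IN BOOLEAN MODE)"
         . " ORDER BY score DESC";
}

// Demo-only quoter; in real code use the connection's escaping, e.g. $DB->qstr().
$demo_quote = function ($s) { return "'" . addslashes($s) . "'"; };

echo build_weighted_search('articles',
    array('title' => 3, 'description' => 2, 'body' => 1),
    'reset procedure', $demo_quote), "\n";
```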

With this new search, the results have become a lot better.

November 21, 2007

The rest of Skyrates… for now

Filed under: What I Did Today — Tags: , , — egoncasteel @ 5:45 pm

The Cobalt Report is complete. I just finished putting in the last of the graphs, so users now have access to the trend information.

I switched to jpgraph because the PHP/SWF module wasn't able to handle all the data points. jpgraph may not be as pretty and it doesn't have animations, but it is really robust. Also, the documentation is extremely good. If you need to do graphs in PHP, check it out.

November 14, 2007

Still More Skyrates

Filed under: What I Did Today — Tags: , — egoncasteel @ 4:58 pm

A lot of work has been done on Skyrates since my last post on the subject. The site now logs data from the influence page every 30 minutes instead of reading it live. This has sped up the site noticeably, and will keep me out of trouble with the Skyrates devs (hi guys :). Also, the first round of charts is in. I will add line graphs to show trends after missions are turned back on in Skyrates; right now they would all be straight lines.

The charts are made with PHP/SWF Charts, a cool script that takes PHP arrays and converts them into Flash animations.

November 8, 2007

On-page links for each letter

Filed under: What I Did Today — Tags: , — egoncasteel @ 3:35 pm

I did a bit of work on OpsWeb yesterday that was neat. I have a page that lists our security codes, and someone requested links along the top for each letter, each going to that letter's part of the list. Well, there are a couple of problems with that:

  • You can get the first letter of a list item really easily, but if you just stuck an anchor tag on every item, you would get duplicate anchors for each letter. That's messy even if it did work.
  • If you go looking for the next letter, what happens if your list doesn't include an item starting with that letter?

Here is what I added to my class:

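	public function alpha_anchor($title){
		// first character of the title, uppercased
		$letter = strtoupper(substr($title, 0, 1));
		// is this letter still waiting for its anchor?
		$search_result = array_search($letter, $this->alpha, true);
		if ($search_result !== false){
			// remove the letter so later titles with the same letter get no anchor
			array_splice($this->alpha, $search_result, 1);
			return "<a name='$letter'></a>";
		}
		return '';
	}

public final function __construct() {
	// letters that have not been given an anchor yet
	$this->alpha = range('A', 'Z');
	// ... rest of the constructor
}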

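So when looping through the array returned from your DB query, you run each title through the alpha_anchor method, and only the first title of each letter gets an anchor tag. This relies on the query returning the titles in alphabetical order, so that the first title of each letter is also the start of that letter's section.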
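The method above takes care of the anchors; the second problem (letters with no items) shows up when building the row of links at the top. One way to handle both, sketched here as plain functions rather than the OpsWeb class, is to collect the first letters up front, link only the letters that exist, and then reuse the same "first title of each letter" check while printing the list. The function names and the $titles list are hypothetical, the titles are assumed to be already sorted, and the anchors use id, which HTML5 prefers over name.

```php
<?php
// Return the set of first letters (A-Z) that actually occur in the titles.
function letters_in_use(array $titles) {
    $used = array();
    foreach ($titles as $title) {
        $letter = strtoupper(substr($title, 0, 1));
        if (preg_match('/^[A-Z]$/', $letter)) {  // skip digits, punctuation, empty titles
            $used[$letter] = true;
        }
    }
    return $used;
}

// Top-of-page bar: a link for letters that have items, plain text for the rest.
function letter_bar(array $used) {
    $parts = array();
    foreach (range('A', 'Z') as $letter) {
        $parts[] = isset($used[$letter]) ? "<a href='#$letter'>$letter</a>" : $letter;
    }
    return implode(' ', $parts);
}

// Hypothetical, already-sorted list of titles.
$titles = array('Alarm codes', 'Archive door', 'Cage access', 'Dock gate');

echo letter_bar(letters_in_use($titles)), "\n";

$remaining = range('A', 'Z');
foreach ($titles as $title) {
    // Same idea as alpha_anchor(): only the first title of each letter gets an anchor.
    $letter = strtoupper(substr($title, 0, 1));
    $pos = array_search($letter, $remaining, true);
    if ($pos !== false) {
        array_splice($remaining, $pos, 1);
        echo "<a id='$letter'></a>";
    }
    echo htmlspecialchars($title), "<br>\n";
}
```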

November 3, 2007

Cobalt Intelligence Corps

Filed under: What I Did Today — Tags: , — egoncasteel @ 3:56 pm

So I play this game called Skyrates. It is Flash-based, but it plays like an MMO: you have an account, a faction, skills, upgrades on your plane, and so on. Well, today my faction lost several islands. In response, I spent the morning coding the Cobalt Report. It fetches the page on the Skyrates website that lists each faction's influence at each of the 38 islands, and spits out a report telling us which islands are best to attack and where we need to defend.

I would give you the link, but it is for blue eyes only. Top secret.

What did I learn…

All sorts of regular expression and array manipulation functions are needed to take HTML from a website that was never meant to be used as input data and turn it into something you can use.
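Since I can't share the real Skyrates page, here is a sketch of the kind of regex-plus-array work I mean, run against a made-up HTML table. The markup, island names, the "Crimson" faction, and the "weakest first" report rule are all hypothetical; a real page needs a pattern fitted to its own markup.

```php
<?php
// Made-up stand-in for a page listing faction influence per island.
$html = <<<HTML
<tr><td>Island A</td><td>Cobalt</td><td>62%</td></tr>
<tr><td>Island B</td><td>Crimson</td><td>55%</td></tr>
<tr><td>Island C</td><td>Cobalt</td><td>48%</td></tr>
HTML;

// Capture island, faction, and influence from each table row.
preg_match_all('#<tr><td>(.*?)</td><td>(.*?)</td><td>(\d+)%</td></tr>#', $html, $rows, PREG_SET_ORDER);

// Turn the raw matches into an island => [faction, influence] array.
$islands = array();
foreach ($rows as $row) {
    $islands[$row[1]] = array('faction' => $row[2], 'influence' => (int)$row[3]);
}

// Example report: our islands sorted weakest first (the ones to defend).
$ours = array_filter($islands, function ($i) { return $i['faction'] === 'Cobalt'; });
uasort($ours, function ($a, $b) { return $a['influence'] - $b['influence']; });

foreach ($ours as $name => $info) {
    echo "$name: {$info['influence']}%\n";
}
```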
