Deriving an HTML file name from the page content in PHP

Monday, February 15, 2010

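I ended up with an interesting bit of code, so I'm posting it.

Blogger already has this feature:

the code takes the HTML content
and derives a suitable file name from it.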

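// Title (an empty string if the field was not posted)
$title = isset($_POST["art_title"]) ? $_POST["art_title"] : "";
// Contents of the body
$html = isset($_POST["art_content"]) ? $_POST["art_content"] : "";

/// Decide the name the HTML file is saved under
$stripped_content = strip_tags($title." ".$html);
$stripped_content = strtolower($stripped_content);
// Collapse whitespace instead of deleting it; removing every space would
// glue neighbouring words together before the match below.
$stripped_content = preg_replace('/\s+/', ' ', $stripped_content);

// Collect runs of 3 to 10 letters (the text is already lowercase)
$match = array();
$hit_cnt = preg_match_all('/[a-z]{3,10}/',$stripped_content,$match);

// Drop duplicate words and reindex from 0
$match[0] = array_values(array_unique($match[0]));

$first_word = (isset($match[0][0])) ? $match[0][0] : "blog";
$second_word = (isset($match[0][1])) ? $match[0][1] : "article";
// mtime() is not a built-in PHP function; assuming microseconds were
// intended, use the fractional part of microtime() instead.
$usec = substr(microtime(), 2, 6);
$art_id = date("ymdHis").$usec.rand(1,100);

$saveName = "$first_word-$second_word-$art_id.html";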

That's about it.
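
To make the word-picking step easier to try out, it can be pulled into a small function. This is only a minimal sketch: the name pick_words is made up for this example, and it assumes the words should stay separated by spaces before matching.

```php
<?php
// Hypothetical helper (not from the original post): returns the first two
// distinct lowercase English words found in the title and body.
function pick_words($title, $html)
{
    // Put a space between title and body so their words do not merge.
    $text = strtolower(strip_tags($title . " " . $html));

    // Collect runs of 3 to 10 ASCII letters, in order of appearance.
    preg_match_all('/[a-z]{3,10}/', $text, $match);

    // Drop duplicates and reindex from 0.
    $words = array_values(array_unique($match[0]));

    // Fall back to fixed words when fewer than two were found.
    return array(
        isset($words[0]) ? $words[0] : "blog",
        isset($words[1]) ? $words[1] : "article",
    );
}

list($first, $second) = pick_words("Hello PHP", "<p>Hello world, this is a test.</p>");
echo "$first-$second\n"; // prints "hello-php"
```

The second "Hello" is dropped as a duplicate, and "is" and "a" are too short to match, so the first two words left are "hello" and "php".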

It doesn't do any morphological analysis,

so it doesn't support Japanese.

It only works on English words.
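
As a quick illustration of that limitation, here is what happens with Japanese-only input. This is a sketch with a made-up sample string; the pattern and fallback words are the ones used in the code.

```php
<?php
// Japanese-only input: it contains no run of 3 to 10 ASCII letters.
$text = strtolower(strip_tags("<p>こんにちは、世界</p>"));

$match = array();
$hit_cnt = preg_match_all('/[a-z]{3,10}/', $text, $match);

// With no matches, both words fall back to the defaults.
$first_word = isset($match[0][0]) ? $match[0][0] : "blog";
$second_word = isset($match[0][1]) ? $match[0][1] : "article";

echo "$hit_cnt $first_word-$second_word\n"; // prints "0 blog-article"
```

So every Japanese-only article ends up as blog-article plus the ID; only the ID part keeps the file names apart.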

This code hasn't been tested much,

so if you find any bugs, please leave a comment!

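Blogger also seems to look at how the words relate to each other,

and to choose between _ (underscore) and - (hyphen) accordingly.
~~(This is as far as I can take it, lol)~~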

kgbnBlog - A research diary on PHP, JavaScript, and web technology in general
