Make your own free website on Tripod.com

This is a summary of the process I used to convert an ezBoard bulletin board to a phpBB bulletin board.

 

  1. Get the raw HTML pages
  2. Convert HTML pages into single text file
  3. Create three separate files from prior single file to match phpBB DB layout
  4. Upload three files into phpBB database
  5. Miscellaneous “clean up”

 

Now for the real work!

 

Step 1) Get the raw HTML pages

 

This was kind of the easy part, although only because I came across an message board where someone had suggested a utility do this. It was TelePort Pro, available at www.download.com (a c-Net site). It is a shareware utility but it allows for 40 projects and each project is limited to 500 downloads. So try it out once or twice to get the exclusions working the way you want, i.e. don’t get images, or links to ezBoard control info like showProfile, showPrevMessage, showNextMessage, etc. So after a little work and like me a little manual work you should be left with only the HTML pages for each forum topic (note that a topic may have multiple posts, you want to know that for later). Before going to step 2) I would suggest renaming each page to match the new forum number in phpBB. For example:

 

ff_nnn.htm – ff = phpBB forum number, nnn=sequence number (i.e. start with 001, then 002, etc.)

 

The above step made it easier for me for two reasons:

  1. Keeping all HTML pages in a common directory for processing (step 2)
  2. Later identification of the new forum number in step 3

 

Step 2) Convert HTML pages into single text file

 

In this step I took the multiple HTML pages (ff_nnn.HTM) in a single directory and read each one using PERL, as I had a PERL server available. You can use certainly use an language that is available, Java, C, Basic. All that is required is to be able to do I/Os, string searches and math. So what I created was a text file with the phpBB forum number (see prior note in step 1), the HTML file name (ff_nnn.HTM), the username, the original date entered, subject and the message data (HTML encoded). I could have gone further and gotten date edited, edited by, user post count, etc. but I figured that those fields were the important ones. So to get those I went through several iterations of pattern searches. For example, the username that made the entry is always (I think J) proceeded by “<span title=class>” and enclosed by “</span>”. You’ll note that sometimes it has an anchor (“<a href” and “</a>”) inside as well, when the user is registered (not a guest). The date entered always follows the username and is enclosed by a font tag, “<font color=#FFFF00>” and “</font>”. The subject follows that and is also enclosed in a span tag, “<span title=class>” and enclosed by “</span>”. Finally the message follows and starts at a horizontal rule tag, “<hr size=1>” and ends at an end italics tag “</i>”. This however brings up the first lesson, many of which I won’t go into here because they have to do with my ignorance of PERL. Which is that you must have all the message before looking for the end. This is due to the fact that the end italics tag maybe part of the message and not the message “terminator”. So I needed to look for a unique terminator. This turned out to be either “</i></td></tr><tr bgcolor=#6A74B4” or </i></td></tr></table>”. The fact there are two is that either a message can end in another message beginning or it can end at the end of a table. Since I forgot to mention that the message are displayed in a table format, left side being user/posting info and the right being subject and message post. So when I constructed may logic to capture username, date, subject and message I made sure I had at least one complete message by checking the string to see if had a beginning “tag” and an ending one. Again what I choose was “<span class=title>” for the begin and either “</i></td></tr><tr bgcolor=#6A74B4” or </i></td></tr></table>” as the end.

 

At this point I also began the process of “cleaning up” the message (and subject noted below), of course I was learning what to do as I went so there were many steps here but I will summarize them for the readers benefit. For example:

 

 

 

 

I also did some manual clean up of the final combined file where I removed some 0x0d characters as they caused problems in the upload to MySQL. I also found that the “\” character was a problem, so since I was going to present my messages in HTML I changed any backslash to &#92; before I uploaded to MySQL. I also removed some remaining “tppabs” elements as they are note standard HTML.

 

At the end of all this processing I was left with a new file that was tab delimited with a CR at the end containing the phpBB forum number (see prior note in step 1), the HTML file name (ff_nnn.HTM), the username, the original date entered, subject and the message data (HTML encoded). This file feed into the next step.

 

Step 3) Create three separate files from prior single file to match phpBB DB layout

 

In this step I needed to learn about the MySQL DB that phpBB uses which ending up meaning three tables, and the one I should have included and will discuss later. The three tables are:

 

 

Each of these tables is related to one another in a hierarchy, so that a topic must exist before a post can exist and a post must exist before post text can exist. In actual fact there are no constraints to enforce the above, but I hope that explains how the tables are related.

 

In any case here is what I figured I had to do with the information I captured previously based on the phpBB DB layouts above:

 

phpBB_topics

Column Name

Type/Size

Assumption

Value Used

topic_id

 

Next sequence in table

Counter starting one greater than last entry

forum_id

 

Forum number from phpBB DB (see phpbb_forum table)

Number that matches phpBB forum that post belongs to

topic_title

Char(60)

First subject for post

Saved subject from first post in group, the reason why the HTML file name is important

topic_poster

 

Number representing user

Choose –1 which is anonymous in my DB, made this step easier

topic_time

 

Number of seconds since 1970 or something like that

Looked at the table after making a “test” post and used that entry for all, used 1068077771 which is Nov 5, 2003 4:16pm

topic_views

 

Number of times topic has been viewed/read

Set to zero

topic_replies

 

Number of replies to original post

Captured count of entries on same HTML page, subtracted one

topic_status

 

Assumed 0=open and 1=closed

Set to zero

topic_vote

 

Assumed 0=none vote and 1=vote

Set to zero

topic_type

 

Assumed 0=regular and ??

Set to zero

topic_first_post_id

 

post_id from first post in the topic

Saved first post_id when processing HTML pages

topic_last_post_id

 

post_id from last post in the topic

Saved last post_id when processing HTML pages

topic_moved_id

 

No clue

Set to zero, like the rest of the entries in my DB

 

 

phpbb_posts

Column Name

Type/Size

Assumption

Value Used

post_id

 

Next sequence in table

Counter starting one greater than last entry

topic_id

 

Match topic_id from phpbb_topic table

Saved value from phpbb_topic topic_id

forum_id

 

Forum number from phpBB DB (see phpbb_forum table)

Number that matches phpBB forum that post belongs to

poster_id

 

Number representing user

Choose –1 which is anonymous in my DB, made this step easier

post_time

 

Number of seconds since 1970 or something like that

Looked at the table after making a “test” post and used that entry for all, used 1068077771 which is Nov 5, 2003 4:16pm

poster_ip

 

Encoded hex value for IP address

Checked table and used a value from my “test” entry

post_username

VarChar(25)

Name of poster

Used username captured from HTML documents

enable_bbcode

 

0=no,1=yes?

Set to zero

enable_html

 

0=no,1=yes?

Set to one, since I had HTML content from the HTML pages

enable_smilies

 

0=no,1=yes?

Set to one as ezBoard uses smilies

enable_sig

 

0=no,1=yes?

Set to zero as the entries I added where all anonymous ones

post_edit_time

 

Again that number of seconds since 1970 or something for last edit date/time

Set to NULL (‘\N’ for import into MySQL)

post_edit_count

 

Times post edited

Set to zero

 

phpbb_posts_text

Column Name

Type/Size

Assumption

Value Used

post_id

 

Sequence number matching phpbb_posts post_id

phpbb_posts post_id

bbcode_uid

 

Not sure but “test” post with bbcode off left this blank

Set to blank as not using bbcodes

post_subject

Char(60)

Specific subject for post/reply

Subject from specific entry in HTML page

post_text

Text

Actual user message

Message text from HTML page, also added original poster name (username) and original post date, didn’t know how to take the mm/dd/yy hh:mm am/pm and calculate seconds since epoch

 

Note that you can get your listing of the phpBB DB by using the MySQL command mysqldump, as such:

 

mysql> mysqldump DATABASE (path to file)\filename

 

http://www.mysql.com/doc/en/mysqldump.html

 

The file that is created is a test file that can be viewed in a browser or text editor.

 

So now the work to do was to read a single file, created in step 2 and create three separate files to match the layout of the afore tables. So here goes some pseudo logic:

 

set post_id_start = last_post_id_in_DB + 1

set topic_id = last_topic_id_in_DB + 1

set reply_count = -1

 

while NOT EOF on mainfile {

read line from mainfile

split values from buffer into forumNbr, postID, username, date, subject, message on tab (‘\t’)

 

if NOT SET current_post_id {

current_postID = postID

post_id = post_id_start

saved_subject = subject

saved_forumNbr = forumNbr

first_post = post_id

} else {

post_id += 1

reply_count = reply_count + 1

}

 

if current_postID NE postID {

write topics_file ( topic_id + "\t" + saved_forumNbr + "\t" + saved_subject + "\t-1\t1068077771\t0\t" + reply_count + "\t0\t0\t0\t" + first_post + "\t" + last_post + "\t0\n")

current_postID = postID

topic_id = topic_id + 1

saved_subject = subject

saved_forumNbr = forumNbr

reply_count = -1

first_post = post_id

last_post = 0

} else {

last_post = post_id

}

 

## note that message includes username and original posting date

## and HTML code to format it

print posts_text_file (post_id + "\t \t" + subject + "\t" + username + "<br>" + $date + "<br>" + message."\n")

 

## note the \\ infront of N, to make sure ‘\N’ is written to the file

write posts_file (post_id + "\t" + topic_id + "\t" + forumNbr + "\t-1\t" +

"1068077771\ta9c709b0\t" + username +."\t0\t1\t1\t0\t" +

"\\N\t0\n")

 

}

 

reply_count = reply_count + 1

 

## write last topic file entry

write topics_file (topic_id + "\t" + saved_forumNbr + "\t" + saved_subject + "\t-1\t" + "1068077771\t0\t" + reply_count + "\t0\t0\t0\t" +

first_post + "\t" + last_post + "\t0\n")

Figure 1

 

So after running the above, maybe with a few tests like me, you’ll have three text files. Each will be tab delimited with ‘\n’ as the end of line. On to step 4!

 

Step 4) Upload three files into phpBB database

 

This is the easiest part of the process, as for my the last step was done in PERL and my knowledge of PERL (at the time of writing J was limited). In any case in this step you will import your three files into the phpBB DB. However I would suggest creating a “test” version of the phpBB DB so you can check your work! Myself, I was lucky enough to be able have a test phpBB to check my results on. Here is the syntax:

 

LOAD DATA INFILE 'c:/apache/htdocs/mytextfile.txt'

INTO TABLE mytable FIELDS TERMINATED BY '\t'

LINES TERMINATED BY '\n';

 

Note that ‘c:/apache/htdocs/mytextfile.txt’ should be replaced with drive/path/filename for your work. As well it is ‘/’ for Windows

 

See http://www.mysql.com/doc/en/Loading_tables.html for more details

 

So run the above command in MySQL and short of any errors your tables should now contain the messages you wanted to import.

 

Note that my results here get several warnings, and as I was on MySQL 4.0.14 I couldn’t display warnings or errors. So I ended up exporting the tables with a select into outfile like so:

 

SELECT a,b,a+b INTO OUTFILE "c:/temp/result.text"

FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'

LINES TERMINATED BY "\n"

FROM test_table;

 

See http://www.mysql.com/doc/en/SELECT.html for details

 

and importing it into MS access to check it (suckie MS J).

 

Once you are satisfied then the only thing left is a clean up step, which is the next and last step.

 

Step 5) Miscellaneous “clean up”

 

The one thing I discovered after importing my files into the phpBB DB, which was why it was nice to have a test phpBB setup, was that the phpbb_forums table also needed to be updated. So I ran a few queries over phpbb_posts to get the information and then used MySQLs batch update to run the created updates. So here is the idea:

 

select forum_id, count(*) from phpbb_posts

group by forum_id

order by forum_id

 

this gave me a count of total posts for each forum

 

select forum_id, count(*) from phpbb_topics

group by forum_id

order by forum_id

 

this gave me a count of total topics for each forum

 

select forum_id, max(post_id) from phpbb_posts

group by forum_id

order by forum_id

 

this gave me the last post_id for each forum

 

Now I took that and created the following batch file for MySQL:

 

# these SQLs will update the phpbb_forums DB

update phpbb_forums set forum_topics = 218 where forum_id = 2;

update phpbb_forums set forum_topics =  49 where forum_id = 4;

update phpbb_forums set forum_topics =  25 where forum_id = 5;

update phpbb_forums set forum_topics =  20 where forum_id = 6;

update phpbb_forums set forum_topics =  31 where forum_id = 7;

update phpbb_forums set forum_topics =  33 where forum_id = 8;

update phpbb_forums set forum_topics =  32 where forum_id = 9;

update phpbb_forums set forum_topics =  25 where forum_id = 10;

update phpbb_forums set forum_topics =  12 where forum_id = 11;

update phpbb_forums set forum_topics =  60 where forum_id = 12;

update phpbb_forums set forum_topics =   8 where forum_id = 13;

update phpbb_forums set forum_topics =   2 where forum_id = 14;

update phpbb_forums set forum_topics =   1 where forum_id = 15;

 

# these SQLs will update the phpbb_forums

update phpbb_forums set forum_posts = 456 where forum_id = 2;

update phpbb_forums set forum_posts =  71 where forum_id = 4;

update phpbb_forums set forum_posts =  48 where forum_id = 5;

update phpbb_forums set forum_posts =  28 where forum_id = 6;

update phpbb_forums set forum_posts =  51 where forum_id = 7;

update phpbb_forums set forum_posts =  97 where forum_id = 8;

update phpbb_forums set forum_posts =  56 where forum_id = 9;

update phpbb_forums set forum_posts =  35 where forum_id = 10;

update phpbb_forums set forum_posts =  13 where forum_id = 11;

update phpbb_forums set forum_posts =  75 where forum_id = 12;

update phpbb_forums set forum_posts =  12 where forum_id = 13;

update phpbb_forums set forum_posts =   7 where forum_id = 14;

update phpbb_forums set forum_posts =   2 where forum_id = 15;

 

# these SQLs will update the phpbb_forums

update phpbb_forums set forum_last_post_id = 608 where forum_id = 2;

update phpbb_forums set forum_last_post_id = 677 where forum_id = 4;

update phpbb_forums set forum_last_post_id = 725 where forum_id = 5;

update phpbb_forums set forum_last_post_id = 753 where forum_id = 6;

update phpbb_forums set forum_last_post_id = 804 where forum_id = 7;

update phpbb_forums set forum_last_post_id = 901 where forum_id = 8;

update phpbb_forums set forum_last_post_id = 957 where forum_id = 9;

update phpbb_forums set forum_last_post_id =  48 where forum_id = 10;

update phpbb_forums set forum_last_post_id =  61 where forum_id = 11;

update phpbb_forums set forum_last_post_id = 136 where forum_id = 12;

update phpbb_forums set forum_last_post_id = 148 where forum_id = 13;

update phpbb_forums set forum_last_post_id = 155 where forum_id = 14;

update phpbb_forums set forum_last_post_id =  12 where forum_id = 15;

Figure 2

 

The only other step I had to do was to enable HTML in the phpBB administrator section and that was that!

 

I hope this helps, I know it would have shorten four weeks of work into one or two for me!!