This is a summary of the process I used to convert an ezBoard bulletin board to a phpBB bulletin board.
Now for the real work!
This was kind of the easy part, although only because I came across an message board where someone had suggested a utility do this. It was TelePort Pro, available at www.download.com (a c-Net site). It is a shareware utility but it allows for 40 projects and each project is limited to 500 downloads. So try it out once or twice to get the exclusions working the way you want, i.e. don’t get images, or links to ezBoard control info like showProfile, showPrevMessage, showNextMessage, etc. So after a little work and like me a little manual work you should be left with only the HTML pages for each forum topic (note that a topic may have multiple posts, you want to know that for later). Before going to step 2) I would suggest renaming each page to match the new forum number in phpBB. For example:
ff_nnn.htm – ff = phpBB forum number, nnn=sequence number (i.e. start with 001, then 002, etc.)
The above step made it easier for me for two reasons:
In this step I took the multiple HTML pages (ff_nnn.HTM) in a single directory and read each one using PERL, as I had a PERL server available. You can use certainly use an language that is available, Java, C, Basic. All that is required is to be able to do I/Os, string searches and math. So what I created was a text file with the phpBB forum number (see prior note in step 1), the HTML file name (ff_nnn.HTM), the username, the original date entered, subject and the message data (HTML encoded). I could have gone further and gotten date edited, edited by, user post count, etc. but I figured that those fields were the important ones. So to get those I went through several iterations of pattern searches. For example, the username that made the entry is always (I think J) proceeded by “<span title=class>” and enclosed by “</span>”. You’ll note that sometimes it has an anchor (“<a href” and “</a>”) inside as well, when the user is registered (not a guest). The date entered always follows the username and is enclosed by a font tag, “<font color=#FFFF00>” and “</font>”. The subject follows that and is also enclosed in a span tag, “<span title=class>” and enclosed by “</span>”. Finally the message follows and starts at a horizontal rule tag, “<hr size=1>” and ends at an end italics tag “</i>”. This however brings up the first lesson, many of which I won’t go into here because they have to do with my ignorance of PERL. Which is that you must have all the message before looking for the end. This is due to the fact that the end italics tag maybe part of the message and not the message “terminator”. So I needed to look for a unique terminator. This turned out to be either “</i></td></tr><tr bgcolor=#6A74B4” or </i></td></tr></table>”. The fact there are two is that either a message can end in another message beginning or it can end at the end of a table. Since I forgot to mention that the message are displayed in a table format, left side being user/posting info and the right being subject and message post. So when I constructed may logic to capture username, date, subject and message I made sure I had at least one complete message by checking the string to see if had a beginning “tag” and an ending one. Again what I choose was “<span class=title>” for the begin and either “</i></td></tr><tr bgcolor=#6A74B4” or </i></td></tr></table>” as the end.
At this point I also began the process of “cleaning up” the message (and subject noted below), of course I was learning what to do as I went so there were many steps here but I will summarize them for the readers benefit. For example:
I also did some manual clean up of the final combined file where I removed some 0x0d characters as they caused problems in the upload to MySQL. I also found that the “\” character was a problem, so since I was going to present my messages in HTML I changed any backslash to \ before I uploaded to MySQL. I also removed some remaining “tppabs” elements as they are note standard HTML.
At the end of all this processing I was left with a new file that was tab delimited with a CR at the end containing the phpBB forum number (see prior note in step 1), the HTML file name (ff_nnn.HTM), the username, the original date entered, subject and the message data (HTML encoded). This file feed into the next step.
In this step I needed to learn about the MySQL DB that phpBB uses which ending up meaning three tables, and the one I should have included and will discuss later. The three tables are:
Each of these tables is related to one another in a hierarchy, so that a topic must exist before a post can exist and a post must exist before post text can exist. In actual fact there are no constraints to enforce the above, but I hope that explains how the tables are related.
In any case here is what I figured I had to do with the information I captured previously based on the phpBB DB layouts above:
phpBB_topics
|
Column Name |
Type/Size |
Assumption |
Value Used |
|
topic_id |
|
Next sequence in table |
Counter starting one greater than last entry |
|
forum_id |
|
Forum number from phpBB DB (see phpbb_forum table) |
Number that matches phpBB forum that post belongs to |
|
topic_title |
Char(60) |
First subject for post |
Saved subject from first post in group, the reason why the HTML file name is important |
|
topic_poster |
|
Number representing user |
Choose –1 which is anonymous in my DB, made this step easier |
|
topic_time |
|
Number of seconds since 1970 or something like that |
Looked at the table after making a “test” post and used that entry for all, used 1068077771 which is Nov 5, 2003 4:16pm |
|
topic_views |
|
Number of times topic has been viewed/read |
Set to zero |
|
topic_replies |
|
Number of replies to original post |
Captured count of entries on same HTML page, subtracted one |
|
topic_status |
|
Assumed 0=open and 1=closed |
Set to zero |
|
topic_vote |
|
Assumed 0=none vote and 1=vote |
Set to zero |
|
topic_type |
|
Assumed 0=regular and ?? |
Set to zero |
|
topic_first_post_id |
|
post_id from first post in the topic |
Saved first post_id when processing HTML pages |
|
topic_last_post_id |
|
post_id from last post in the topic |
Saved last post_id when processing HTML pages |
|
topic_moved_id |
|
No clue |
Set to zero, like the rest of the entries in my DB |
phpbb_posts
|
Column Name |
Type/Size |
Assumption |
Value Used |
|
post_id |
|
Next sequence in table |
Counter starting one greater than last entry |
|
topic_id |
|
Match topic_id from phpbb_topic table |
Saved value from phpbb_topic topic_id |
|
forum_id |
|
Forum number from phpBB DB (see phpbb_forum table) |
Number that matches phpBB forum that post belongs to |
|
poster_id |
|
Number representing user |
Choose –1 which is anonymous in my DB, made this step easier |
|
post_time |
|
Number of seconds since 1970 or something like that |
Looked at the table after making a “test” post and used that entry for all, used 1068077771 which is Nov 5, 2003 4:16pm |
|
poster_ip |
|
Encoded hex value for IP address |
Checked table and used a value from my “test” entry |
|
post_username |
VarChar(25) |
Name of poster |
Used username captured from HTML documents |
|
enable_bbcode |
|
0=no,1=yes? |
Set to zero |
|
enable_html |
|
0=no,1=yes? |
Set to one, since I had HTML content from the HTML pages |
|
enable_smilies |
|
0=no,1=yes? |
Set to one as ezBoard uses smilies |
|
enable_sig |
|
0=no,1=yes? |
Set to zero as the entries I added where all anonymous ones |
|
post_edit_time |
|
Again that number of seconds since 1970 or something for last edit date/time |
Set to NULL (‘\N’ for import into MySQL) |
|
post_edit_count |
|
Times post edited |
Set to zero |
phpbb_posts_text
|
Column Name |
Type/Size |
Assumption |
Value Used |
|
post_id |
|
Sequence number matching phpbb_posts post_id |
phpbb_posts post_id |
|
bbcode_uid |
|
Not sure but “test” post with bbcode off left this blank |
Set to blank as not using bbcodes |
|
post_subject |
Char(60) |
Specific subject for post/reply |
Subject from specific entry in HTML page |
|
post_text |
Text |
Actual user message |
Message text from HTML page, also added original poster name (username) and original post date, didn’t know how to take the mm/dd/yy hh:mm am/pm and calculate seconds since epoch |
Note that you can get your listing of the phpBB DB by using the MySQL command mysqldump, as such:
mysql> mysqldump DATABASE (path to file)\filename
http://www.mysql.com/doc/en/mysqldump.html
The file that is created is a test file that can be viewed in a browser or text editor.
So now the work to do was to read a single file, created in step 2 and create three separate files to match the layout of the afore tables. So here goes some pseudo logic:
set post_id_start = last_post_id_in_DB + 1
set topic_id = last_topic_id_in_DB + 1
set reply_count = -1
while NOT EOF on mainfile {
read line from mainfile
split values from buffer into forumNbr, postID, username, date, subject, message on tab (‘\t’)
if NOT SET current_post_id {
current_postID = postID
post_id = post_id_start
saved_subject = subject
saved_forumNbr = forumNbr
first_post = post_id
} else {
post_id += 1
reply_count = reply_count + 1
}
if current_postID NE postID {
write topics_file ( topic_id + "\t" + saved_forumNbr + "\t" + saved_subject + "\t-1\t1068077771\t0\t" + reply_count + "\t0\t0\t0\t" + first_post + "\t" + last_post + "\t0\n")
current_postID = postID
topic_id = topic_id + 1
saved_subject = subject
saved_forumNbr = forumNbr
reply_count = -1
first_post = post_id
last_post = 0
} else {
last_post = post_id
}
## note that message includes username and original posting date
## and HTML code to format it
print posts_text_file (post_id + "\t \t" + subject + "\t" + username + "<br>" + $date + "<br>" + message."\n")
## note the \\ infront of N, to make sure ‘\N’ is written to the file
write posts_file (post_id + "\t" + topic_id + "\t" + forumNbr + "\t-1\t" +
"1068077771\ta9c709b0\t" + username +."\t0\t1\t1\t0\t" +
"\\N\t0\n")
}
reply_count = reply_count + 1
## write last topic file entry
write topics_file (topic_id + "\t" + saved_forumNbr + "\t" + saved_subject + "\t-1\t" + "1068077771\t0\t" + reply_count + "\t0\t0\t0\t" +
first_post + "\t" + last_post + "\t0\n")
Figure 1
So after running the above, maybe with a few tests like me, you’ll have three text files. Each will be tab delimited with ‘\n’ as the end of line. On to step 4!
This is the easiest part of the process, as for my the last step was done in PERL and my knowledge of PERL (at the time of writing J was limited). In any case in this step you will import your three files into the phpBB DB. However I would suggest creating a “test” version of the phpBB DB so you can check your work! Myself, I was lucky enough to be able have a test phpBB to check my results on. Here is the syntax:
LOAD DATA INFILE 'c:/apache/htdocs/mytextfile.txt'
INTO TABLE mytable FIELDS TERMINATED BY '\t'
LINES TERMINATED BY '\n';
Note that ‘c:/apache/htdocs/mytextfile.txt’ should be replaced with drive/path/filename for your work. As well it is ‘/’ for Windows
See http://www.mysql.com/doc/en/Loading_tables.html for more details
So run the above command in MySQL and short of any errors your tables should now contain the messages you wanted to import.
Note that my results here get several warnings, and as I was on MySQL 4.0.14 I couldn’t display warnings or errors. So I ended up exporting the tables with a select into outfile like so:
SELECT a,b,a+b INTO OUTFILE "c:/temp/result.text"
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY "\n"
FROM test_table;
See http://www.mysql.com/doc/en/SELECT.html for details
and importing it into MS access to check it (suckie MS J).
Once you are satisfied then the only thing left is a clean up step, which is the next and last step.
The one thing I discovered after importing my files into the phpBB DB, which was why it was nice to have a test phpBB setup, was that the phpbb_forums table also needed to be updated. So I ran a few queries over phpbb_posts to get the information and then used MySQLs batch update to run the created updates. So here is the idea:
select forum_id, count(*) from phpbb_posts
group by forum_id
order by forum_id
this gave me a count of total posts for each forum
select forum_id, count(*) from phpbb_topics
group by forum_id
order by forum_id
this gave me a count of total topics for each forum
select forum_id, max(post_id) from phpbb_posts
group by forum_id
order by forum_id
this gave me the last post_id for each forum
Now I took that and created the following batch file for MySQL:
# these
SQLs will update the phpbb_forums DB
update
phpbb_forums set forum_topics = 218 where forum_id = 2;
update
phpbb_forums set forum_topics = 49
where forum_id = 4;
update
phpbb_forums set forum_topics = 25
where forum_id = 5;
update
phpbb_forums set forum_topics = 20
where forum_id = 6;
update
phpbb_forums set forum_topics = 31
where forum_id = 7;
update
phpbb_forums set forum_topics = 33
where forum_id = 8;
update
phpbb_forums set forum_topics = 32
where forum_id = 9;
update
phpbb_forums set forum_topics = 25
where forum_id = 10;
update
phpbb_forums set forum_topics = 12
where forum_id = 11;
update
phpbb_forums set forum_topics = 60
where forum_id = 12;
update
phpbb_forums set forum_topics = 8
where forum_id = 13;
update
phpbb_forums set forum_topics = 2
where forum_id = 14;
update
phpbb_forums set forum_topics = 1
where forum_id = 15;
# these
SQLs will update the phpbb_forums
update
phpbb_forums set forum_posts = 456 where forum_id = 2;
update
phpbb_forums set forum_posts = 71 where
forum_id = 4;
update
phpbb_forums set forum_posts = 48 where
forum_id = 5;
update
phpbb_forums set forum_posts = 28 where
forum_id = 6;
update
phpbb_forums set forum_posts = 51 where
forum_id = 7;
update
phpbb_forums set forum_posts = 97 where
forum_id = 8;
update
phpbb_forums set forum_posts = 56 where
forum_id = 9;
update
phpbb_forums set forum_posts = 35 where
forum_id = 10;
update
phpbb_forums set forum_posts = 13 where
forum_id = 11;
update
phpbb_forums set forum_posts = 75 where
forum_id = 12;
update
phpbb_forums set forum_posts = 12 where
forum_id = 13;
update
phpbb_forums set forum_posts = 7 where
forum_id = 14;
update
phpbb_forums set forum_posts = 2 where
forum_id = 15;
# these
SQLs will update the phpbb_forums
update
phpbb_forums set forum_last_post_id = 608 where forum_id = 2;
update
phpbb_forums set forum_last_post_id = 677 where forum_id = 4;
update
phpbb_forums set forum_last_post_id = 725 where forum_id = 5;
update
phpbb_forums set forum_last_post_id = 753 where forum_id = 6;
update
phpbb_forums set forum_last_post_id = 804 where forum_id = 7;
update
phpbb_forums set forum_last_post_id = 901 where forum_id = 8;
update
phpbb_forums set forum_last_post_id = 957 where forum_id = 9;
update
phpbb_forums set forum_last_post_id =
48 where forum_id = 10;
update
phpbb_forums set forum_last_post_id =
61 where forum_id = 11;
update
phpbb_forums set forum_last_post_id = 136 where forum_id = 12;
update
phpbb_forums set forum_last_post_id = 148 where forum_id = 13;
update
phpbb_forums set forum_last_post_id = 155 where forum_id = 14;
update
phpbb_forums set forum_last_post_id =
12 where forum_id = 15;
Figure 2
The only other step I had to do was to enable HTML in the phpBB administrator section and that was that!
I hope this helps, I know it would have shorten four weeks of work into one or two for me!!