I have a video on demand style site that hosts series of videos under different modules. However with the way I have designed the database it is proving to be very slow. I have asked this question before and someone suggested indexing, but i cannot seem to get my head around it. But I would like someone to help with the structure of the database here to see if it can be improved. The core table is Videos:
ID bigint(20) (primary key, auto-increment)
pID text
airdate text
title text
subject mediumtext
url mediumtext
mID int(11)
vID int(11)
sID int(11)
pID is a unique 5 digit string to each video that is a shorthand identifier. Airdate is the TS, (stored in text format, right there maybe I should change that to TIMESTAMP AUTO UPDATE), title is self explanatory, subject is self explanatory, url is the hard link on the site to the video, mID is joined to another table for the module title, vID is joined to another table for the language of the video, (english, russian, etc) and sID is the summary for the module, a paragraph stored in an external database.
The slowest part of the website is the logging part of it. I store the data in another table called ‘Hits’:
id mediumint(10) (primary key, auto-increment)
progID text
ts int(10)
Again, here (this was all made a while ago) but my Timestamp (ts) is an INT instead of ON UPDATE CURRENT TIMESTAMP, which I guess it should be. However This table is now 47,492 rows long and the script that I wrote to process it is very very slow, so slow in fact that it times out. A row is added to this table each time a user clicks ‘Play’ on the website and then so the progID is the same as the pID, and it logs the php time() timestamp in ts
.
Basically I load the entire database of ‘Hits’ into an array and count the hits in each day using the TS column. I am guessing (i’m quite slow at all this, but I had no idea this would happen when I built the thing) that this is possibly the worst way to go about this.
So my questions are as follows:
- Is there a better way of structuring the ‘Videos’ table, is so, what do you suggest?
- Is there a better way of structuring ‘hits’, if so, please help/tell me!
Or is it the fact that my tables are fine and the PHP coding is crappy?
7
Basically I load the entire database of ‘Hits’ into an array and count the hits in each day using the TS column.
You want speed, De-normalize.
This is one of those rare occasions when De-normalization is OK. to slow you site even a little bit for something as minor as hit counter is Crazy.
So create a Table like such:
Video bigint(20)
Date (Date Only)
HitCounter int
Each time you have a hit Update this table, adding one to the HitCounter
. (Insert a new record with the first hit of the day). Now you can always get your counts in an instant.
Important Note: Anytime you de-normalize like this make sure you have a routine that can repair the data in your summary table. This routine should run regularly.
As others have pointed out, because you have so little data, I am in agreement with them in doubting the counter is the cause of your performance problem. The solution above is most likely similar to what YouTube is doing for counts.
-
Why do you have a string pID when you already have a unique integer identifier in ID? The ID column, declared as a primary key is probably already indexed, and integer comparisons are much faster than string ones. Why can’t you just use ID instead of pID everywhere and drop the pID column?
-
I think #1 above is a minor error. Have you timed your application to prove to yourself exactly where the slowness is? Unless you are absolutely certain what is slow, you are wasting your time trying to speed up the wrong things.
In Java I use:
long startMs = System.currentTimeMs();
logger.info("Time before doing first thing: " + System.currentTimeMs() - startMs);
// do first thing.
logger.info("Time before doing second thing: " + System.currentTimeMs() - startMs);
// do second thing.
logger.info("Time after doing second thing: " + System.currentTimeMs() - startMs);
// etc.
logger.info("Total time: " + System.currentTimeMs() - startMs);
Run the slow part of your program at least 5 times, maybe 10 times to be sure that your test system isn’t busy doing something else and throwing your timings off. Write down your timings, specifically what is slow, exactly how slow, AND the total time. You don’t want to speed up one part by slowing down another part so much that you actually make the whole thing slower.
I can see a couple of things you should look at.
What is pID used for? If it’s just to link the 2 tables an int will be a lot faster and simplify the videos table.
You’re right, AirDate should be stored as a timestamp/datetime field, whenever possible you want to keep the text fields to the minimum required. I’d even consider moving title, subject (& a new description field) into a separate table (this keeps the main videos table as small and fast as possible):
ID bigint (same as videos.ID field)
title mediumtext
subject mediumtext
description mediumtext
For the hits table, I’d do exactly as @Morons suggested, possibly with an additional table to log each play separately (if this is important):
ID bigint (same as videos.ID field)
playedat timestamp/datetime
playedby int (if userid is known) / text (IP address if videos can be played anonymously)
@Morons suggestion will give the biggest speed increase though. As a basic rule, whenever possible, push as much data logic onto the DB server, it’s what they’re designed for and they’re very good (fast) at it, also minimize the number of times you hit the DB and the amount of information you’re transferring back over the network.