I am about to design a video conversion service that is scalable on the conversion side.
The architecture is as follows:
- Webpage for video upload
- When done, a message gets sent out to one of several resizing servers
- The server locates the video, saves it on disk, and converts it to several formats and resolutions
- The resizing server uploads the output to a content server, and messages back that the conversion is done.
Messaging is something I have covered, but right now I am transferring the files via FTP and wonder if there is a better way. Is there something faster or more reliable? All the servers will sit on the same gigabit switch, or a neighboring one, so fast transfer is expected.
EDIT:
The question concerns the server <-> server side of things. The servers are co-located on the same LAN, so the security of the interconnection is not expected to be the main issue.
6
I would use autonomous servers. Each server hosts the uploading frontend, the encoder and the download service. This way you don’t have to transfer files around. To scale, simply add more servers; it sounds like you don’t have any obstacles to doing that.
Research if you can stream the process; start encoding the file while it is still uploading, download the result while it is still being encoded. This won’t reduce the cost of any of the operations, but the end user will perceive a significant benefit.
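That streaming step can be sketched with a pipe into the encoder process. The ffmpeg arguments below are illustrative, not part of the question; any encoder that reads its input from stdin would do:

```python
import subprocess

def stream_to_encoder(upload_stream, cmd, chunk_size=64 * 1024):
    """Pipe an in-progress upload straight into an encoder process.

    Sketch under one assumption: the encoder reads its input from
    stdin (e.g. ffmpeg with "-i pipe:0"), so encoding starts while
    bytes are still arriving instead of after the upload completes.
    """
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE,
                            stdout=subprocess.DEVNULL)
    for chunk in iter(lambda: upload_stream.read(chunk_size), b""):
        proc.stdin.write(chunk)
    proc.stdin.close()
    return proc.wait()

# Illustrative call; the ffmpeg flags are placeholders:
# stream_to_encoder(request_body,
#                   ["ffmpeg", "-i", "pipe:0", "-c:v", "libx264",
#                    "-s", "1280x720", "out_720p.mp4"])
```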
Offer alternatives to plain HTTP upload if your files are large: if the upload stops for whatever reason, the user must otherwise restart it from scratch.
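One common alternative is offset-based resuming: the server reports how many bytes of an upload it already has, and the client continues from there. A minimal local sketch (file names and the 64 KB chunk size are arbitrary; a real version would expose the offset over the wire):

```python
import os

def resume_offset(received_path):
    # How many bytes of this upload the server already has.
    return os.path.getsize(received_path) if os.path.exists(received_path) else 0

def resume_upload(local_file, received_path, chunk_size=64 * 1024):
    # Skip what was already transferred, then append the rest.
    offset = resume_offset(received_path)
    with open(local_file, "rb") as src, open(received_path, "ab") as dst:
        src.seek(offset)
        for chunk in iter(lambda: src.read(chunk_size), b""):
            dst.write(chunk)
```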
Instead of sending the files to the servers, I would suggest using a partition that is mounted on all the servers, so you don’t have to move the file at all.
4
“Webpage for video upload” means the browser will actually upload the video to your web server. That’s just the way HTTP and an HTML form with a file upload work. Make sure to set the form’s enctype to multipart/form-data so the file is transmitted properly; other form fields are carried in the same request as separate parts of that multipart body. A bigger problem may be people with big files and/or slow internet that keep your HTTP connection open for 15 minutes or more. Many ISPs cap upload rates at 500 Kbps (bits/sec, not bytes/sec) or something similarly low. Long-lived connections also make you a target for DoS attacks.
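For what it’s worth, a multipart/form-data body is laid out as one part per form field plus one part for the file, so ordinary parameters and the upload travel together. A hand-rolled sketch (field names are made up; real code would normally let the framework build or parse this):

```python
import uuid

def multipart_body(fields, file_name, file_bytes):
    """Build a multipart/form-data body by hand to show its layout:
    each ordinary form field gets its own part, and the file is just
    one more part alongside them."""
    boundary = uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", value]
    lines += [f"--{boundary}",
              f'Content-Disposition: form-data; name="file"; '
              f'filename="{file_name}"',
              "Content-Type: application/octet-stream", ""]
    body = "\r\n".join(lines).encode() + b"\r\n" + file_bytes
    body += f"\r\n--{boundary}--\r\n".encode()
    content_type = f"multipart/form-data; boundary={boundary}"
    return content_type, body
```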
If you want the file to end up somewhere other than your web server, you’ll be writing server-side code: your web application will have to pipe/stream the upload to that other location.
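On the server side, that piping can be as simple as re-sending the request body with a streaming HTTP PUT. A standard-library sketch; the host, port and path are placeholders:

```python
import http.client

def relay_upload(src_stream, length, host, port, path):
    # Stream the incoming upload straight to the storage/encoding
    # host instead of spooling it to the web server's disk first.
    # http.client reads a file-like body in blocks, so the whole
    # file is never held in memory.
    conn = http.client.HTTPConnection(host, port)
    conn.request("PUT", path, body=src_stream,
                 headers={"Content-Length": str(length)})
    status = conn.getresponse().status
    conn.close()
    return status
```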
FTP is only very slightly faster than HTTP, maybe 5-10% at most, and in my experience it is closer to 1%. FTP might be better for transferring two or more files at the same time, but I don’t know for sure. All modern browsers support FTP natively as well as HTTP. To use any other protocol, as someone suggested, you need server-side code. I’d use HTTP for your first pass.
Someone raised a good point about encryption. You will want it if anyone sends confidential or non-public video. It will slow things down somewhat, but probably not by much; HTTPS with a server-side certificate will take care of it if you need it.
When transferring large files, use a protocol or convention that ensures the recipient does not use them until the copy is complete. Maybe you touch a marker file, or send an “all done transferring myFile2321.avi” message. Compressing large files before transfer and decompressing them afterwards is often faster than copying them uncompressed over a network or the internet. If you pay for bandwidth, it may be cheaper as well.
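The simplest form of that completion signal on a local or shared filesystem is write-then-rename; on POSIX filesystems the rename is atomic, so a consumer never sees a half-copied file. A sketch with illustrative paths and suffix:

```python
import os

TMP_SUFFIX = ".part"

def receive_and_publish(dest_path, chunks):
    # Write under a temporary name, then rename into place once the
    # transfer is complete.  rename() is atomic on POSIX filesystems.
    tmp_path = dest_path + TMP_SUFFIX
    with open(tmp_path, "wb") as f:
        for chunk in chunks:
            f.write(chunk)
    os.rename(tmp_path, dest_path)

def ready_files(directory):
    # Consumers simply ignore files still carrying the temp suffix,
    # so only completely transferred files are ever picked up.
    return [f for f in os.listdir(directory)
            if not f.endswith(TMP_SUFFIX)]
```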
So, all in all, those are some details to think about and some suggestions. I don’t see anything wrong with your overall plan. Take a look at how YouTube does this, because I think they already do everything you are planning, just in a different context or for a different reason.
1
Another option: Consider using a shared storage area, which all the servers can access.
For this, you would need a shared/network file system. There are several to choose from (NFS, SMB, etc.); which one is best would be another question :-).
Of course, if you use this route, take care to have some locking mechanism to avoid two servers working on the same file.
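On a POSIX system that locking can be done with advisory file locks; one assumption here is that the shared filesystem actually supports them (true for NFSv4, but verify for your setup). A minimal sketch:

```python
import fcntl
import os

def try_claim(path):
    """Try to claim a file on the shared mount for encoding.

    Uses a non-blocking exclusive flock so two servers never work on
    the same file: the first claimant wins, everyone else moves on.
    The caller must keep the returned fd open while working and close
    it when done, which releases the lock.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        return fd          # claimed; keep fd open while encoding
    except BlockingIOError:
        os.close(fd)
        return None        # another server already claimed it
```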
2