Can Python be used efficiently in the big data field? To be precise, I am building a web app that analyses really big data in the medical health care field, consisting of medical histories and a lot of personal information. I need some advice on how to handle very big data in Python efficiently and with high performance. Also, are there some open source packages available in Python that offer high performance and efficiency for big data handling?
About users and data:
Each user has about 3 GB of data. Users are grouped based on their family and friend circles, and the data is then analysed to predict important information and correlations. Currently I am talking about 10,000 users, and the number of users will be increasing rapidly.
That is a very vague question; there is no canonical definition of what constitutes big data. From a development point of view, the only thing that truly changes how you need to handle data is having so much of it that you can't fit it all in memory at once.
How much of a problem that is depends greatly on what you need to do with the data. For most jobs you can use a single-pass scheme: load a block of data, do whatever needs to be done with it, unload it, and go on to the next.
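A minimal sketch of that single-pass idea, assuming (purely as an example) that a user's measurements sit in a CSV with hypothetical measurement_type and value columns; pandas' chunksize option reads the file one block at a time, so only running aggregates stay in memory:

```python
import pandas as pd

# Hypothetical single-pass scheme: stream a large CSV in fixed-size blocks and
# keep only running aggregates, never the whole file, in memory.
totals = {}
for chunk in pd.read_csv("user_measurements.csv", chunksize=100_000):
    # process this block, then let it go before loading the next one
    for kind, value in chunk.groupby("measurement_type")["value"].sum().items():
        totals[kind] = totals.get(kind, 0.0) + value

print(totals)
```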
Sometimes the problem can be solved with an organization pass: first go through the data, grouping it into chunks that need to be handled together, then go through each chunk.
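One way to sketch such an organization pass (the file name and the user_id grouping key are assumptions) is to shard records into a small number of bucket files so that everything that belongs together lands in the same bucket, and then process each bucket on its own:

```python
import csv

# Hypothetical organization pass: hash a grouping key into a handful of bucket
# files so each bucket can later be handled independently and fits in memory.
NUM_BUCKETS = 16
buckets = [open(f"bucket_{i}.csv", "w", newline="") for i in range(NUM_BUCKETS)]
writers = [csv.writer(f) for f in buckets]

with open("events.csv", newline="") as src:
    reader = csv.reader(src)
    header = next(reader)
    key_index = header.index("user_id")
    for w in writers:
        w.writerow(header)
    for row in reader:
        writers[hash(row[key_index]) % NUM_BUCKETS].writerow(row)

for f in buckets:
    f.close()

# Second pass (not shown): load and process each bucket_<i>.csv separately.
```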
If that strategy doesn't fit your task, you can still get a long way with OS-handled disk swapping: handle the data in blocks as far as possible, and if you need a little arbitrary access here and there, it will still work.
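If the data can be dumped to a flat binary file, numpy's memmap gives you roughly that behaviour: the OS pages blocks in and out on demand, sequential block processing stays cheap, and the occasional random read still works. A small sketch, assuming a hypothetical measurements.dat file of raw float32 values:

```python
import numpy as np

# Memory-map a large binary file; nothing is loaded until it is touched.
data = np.memmap("measurements.dat", dtype=np.float32, mode="r")

# Sequential, block-wise processing stays fast...
block = 1_000_000
block_means = [data[i:i + block].mean() for i in range(0, len(data), block)]

# ...and a little arbitrary access here and there still works.
spot_check = data[123_456] if len(data) > 123_456 else None
```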
And of course an always excellent strategy when dealing with a lot of data is to dwarf it with hardware. You can get 64 GB of memory in 16 GB sticks for about $500; if you are working with that much data, it is an easily justified investment. Some good SSDs are a no-brainer.
Specific case:
A big part of this job will definitely be reducing those 3 GB of data per person. Figuring out what can be thrown away is often a bit of an art in its own right, but given the volume I must presume that you have a fair amount of bulk measurements. In general, you should first find patterns and aggregations within those data, and then use those results for comparing persons to one another. The majority of your raw data is noise, repetition or irrelevant detail; you have to cut that away.
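As an illustration of what that reduction might look like, here is a hedged sketch that collapses one person's bulk measurements into a small set of summary statistics (the file layout and the measurement_type/value columns are assumptions, not anything from your data):

```python
import pandas as pd

def reduce_person(path):
    """Turn one person's raw measurement file into a small feature vector."""
    raw = pd.read_csv(path)  # assumed columns: measurement_type, value
    summary = raw.groupby("measurement_type")["value"].agg(["mean", "std", "min", "max"])
    # One flat Series of (measurement_type, statistic) features per person,
    # which is what gets compared across persons later on.
    return summary.stack()
```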
This reduction process is well suited to a cluster, since you can simply give each process its own pile of persons.
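A minimal sketch of that, using the standard multiprocessing module on a single machine (the file names and the reduction itself are hypothetical; on a real cluster each node would get its own slice of the file list):

```python
from multiprocessing import Pool
import pandas as pd

def reduce_person(path):
    # Hypothetical per-person reduction; it only touches this one person's
    # file, so workers never need to share any data.
    raw = pd.read_csv(path)
    return raw.groupby("measurement_type")["value"].mean()

person_files = [f"person_{i}.csv" for i in range(10_000)]  # hypothetical paths

if __name__ == "__main__":
    with Pool(processes=8) as pool:
        reduced = pool.map(reduce_person, person_files)
```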
The processing thereafter is a bit trickier; what is optimal depends on a lot of factors, and you will probably have to do some trial and error. If you can make it fit the job, try to load selected pieces of data from all persons onto the same computer and compare those, and do the same with other pieces of data on other computers. Use those results as new data sets, and so on.
It depends on what you want from your handling of big data. This concept is relatively vague. For example, if you’re talking about MapReduce jobs across disparate data sources, then you may be interested in using Hadoop Streaming with the Dumbo library. If you’re talking about statistical analysis, then NumPy and SciPy (as mentioned by Akira71) are interesting, as well as pandas (a data analysis toolkit). If you want graphing, look into matplotlib.
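To make the Hadoop Streaming option concrete: a streaming job is just a pair of scripts that read lines on stdin and emit tab-separated key/value pairs on stdout, and Dumbo wraps that same idea in a more Pythonic API. A hypothetical mapper/reducer pair counting records per user (assuming the user id is the first CSV field) might look like this:

```python
# mapper.py -- emit one (user_id, 1) pair per input record
import sys

for line in sys.stdin:
    user_id = line.split(",")[0]
    print(f"{user_id}\t1")
```

```python
# reducer.py -- sum the counts per user; Hadoop delivers the input sorted by key
import sys

current, count = None, 0
for line in sys.stdin:
    key, value = line.rstrip("\n").split("\t")
    if key != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = key, 0
    count += int(value)
if current is not None:
    print(f"{current}\t{count}")
```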
However, if you’re talking about the storage and querying of big data, Python is not your best bet. You will want something like the Hadoop ecosystem to make this perform well, perhaps with layers on top for querying and building intermediate data sets. One project that really interests me is Spark; you may want to look at it as well. Unfortunately, this type of application framework does not play to Python’s strengths.
Python is used extensively in the big data field. There are a couple of packages that tend to get used quite a bit, and they are probably the main reason Python has made such deep inroads into big data:
- NumPy – the fundamental package for scientific computing with Python
- SciPy – a package for mathematics, science, and engineering
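As a tiny, purely illustrative example of the kind of analysis they make easy (the numbers below are synthetic, not from any real data set):

```python
import numpy as np
from scipy import stats

# Generate two synthetic measurement series and test how correlated they are.
rng = np.random.default_rng(0)
blood_pressure = rng.normal(120, 15, size=1_000)
heart_rate = 0.3 * blood_pressure + rng.normal(70, 10, size=1_000)

r, p_value = stats.pearsonr(blood_pressure, heart_rate)
print(f"correlation={r:.2f}, p-value={p_value:.3g}")
```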
Both are open source, and together with Python's popularity and ease of learning they have pretty much catapulted its use in academia. This in turn has caused it to be used more and more outside academia and in larger companies, as students moving into work roles bring these packages with them.
These are very good packages, and I have dabbled with them in a few projects. However, I have not used Python enough in big data projects to answer your ancillary question on how to handle big data efficiently with Python.