What are efficient ways to process huge lists (10+ million items), and what should be considered while manipulating them?
First question: when should I use recursion, and when shouldn't I? In both cases we allocate memory to store data (the memory complexity should be comparable), but a recursive approach over a huge list is costly in a different way: every recursive call adds a frame to the call stack, while the list itself lives on the heap. Which approach costs less? (Usually any recursive algorithm can be rewritten as an iterative one.)
Second question: when it comes to memory complexity, should I consider transforming the list into another data structure (a linked list, a BST, ... to manipulate the data more easily)? The problem with such solutions is that the memory allocation roughly doubles: every node holds a reference plus a value, instead of just a value in a list.
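A rough way to sanity-check that "doubled memory" intuition is to measure it with sys.getsizeof (the exact figures vary by interpreter and platform, and getsizeof does not follow references):

import sys

print(sys.getsizeof(range(1000)))  # the list's pointer array only; the
                                   # int objects it references are extra

class Node(object):
    __slots__ = ('value', 'next')  # without __slots__, every node also
                                   # carries a per-instance __dict__
    def __init__(self, value, next=None):
        self.value = value
        self.next = next

print(sys.getsizeof(Node(0)))      # per-node cost, before the value itself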
Or should I dump the list to disk (text or CSV) and process the data in portions? The problem with this solution is that some operations require correlation between values, so we can't process every portion independently and then reduce and concatenate the results.
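A minimal sketch of what I mean by processing in portions, assuming a file with one integer per line (the format is just an assumption):

def read_in_portions(path, portion_size=1000000):
    # Yield the file's numbers in fixed-size portions, so the whole
    # list is never resident in memory at once.
    portion = []
    with open(path) as f:
        for line in f:
            portion.append(int(line))
            if len(portion) == portion_size:
                yield portion
                portion = []
    if portion:  # trailing partial portion
        yield portion

As said above, this only works cleanly when portions can be processed independently; operations that correlate values across portion boundaries need an extra merge step.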
Scenario 1: Sorting a huge list (recursive Merge Sort vs. others)
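For comparison, one non-recursive direction here would be to sort manageable runs and merge them lazily with the standard library's heapq.merge; a toy sketch (with real data the runs could live on disk):

import heapq

# Sort small runs independently, then merge the sorted runs lazily.
runs = [sorted([5, 1, 9]), sorted([4, 8, 2]), sorted([7, 3, 6])]
merged = list(heapq.merge(*runs))
assert merged == [1, 2, 3, 4, 5, 6, 7, 8, 9]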
Scenario 2: consider this quick example (it's not the perfect or most relevant illustration of what I'm trying to ask, but it shows another case).
We add the product of each group of 4 successive elements to every element of that group.
L = range(100)

from operator import mul

def prodElement(L):
    # Iterative version: walk the list in blocks of 4 and add each block's
    # product to every element of that block, in place.
    for i in xrange(0, len(L) - 3, 4):
        p = reduce(mul, L[i:i+4])
        L[i], L[i+1], L[i+2], L[i+3] = L[i]+p, L[i+1]+p, L[i+2]+p, L[i+3]+p
    return L

def prodElement_rec(L, i=0):
    # Recursive version: one call per block of 4.
    if i >= len(L) - 3:  # stop: fewer than 4 elements remain
        return L
    p = L[i] * L[i+1] * L[i+2] * L[i+3]
    L[i], L[i+1], L[i+2], L[i+3] = L[i]+p, L[i+1]+p, L[i+2]+p, L[i+3]+p
    return prodElement_rec(L, i + 4)

# Both functions mutate their argument, so compare on fresh copies.
assert prodElement_rec(list(L)) == prodElement(list(L))
Again, I am not trying to solve one specific issue; I am trying to understand the best approaches to use when it comes to huge lists (10+ million items). What methods should I consider, and what critical issues should I avoid, when solving problems over huge lists?
In its current form your question is likely to be closed as “too broad”, but since you asked in this broad form, I'll try to give you a broad answer:
- recursion: don't use recursion when the expected recursion depth will be on the order of magnitude of your list's size, especially when your programming language does not optimize tail calls automatically (Python does not). And assuming your recursion depth stays reasonable: use it only when it makes the code simpler and easier to understand. In your example above, the recursion depth will be about list size / 4 (which will blow past Python's recursion limit when your list is as large as you wrote; see the check after this list), and the code is definitely not simpler than the non-recursive variant.
- writing data to disk: do this only when you expect to exceed the available main memory of your system (meaning the memory available to your program). This implies the need for a so-called external algorithm for the given task, which is almost always more complicated than an in-memory algorithm, so do this only if you must. EDIT, due to the comment above: using a database is a good alternative for a lot of scenarios (sketched after this list). It introduces some additional programming overhead on one hand, but can save a lot on the other.
- which data structure to use: start with the simplest data structure that is suitable for the given task, and then measure your performance (maybe with smaller lists first, but make sure you know the approximate order of growth of your algorithm's runtime before you extrapolate to bigger lists; a timing sketch follows after this list). Only when your code does not run fast enough should you try more complex data structures and algorithms (and don't forget to measure again).
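To make the recursion point concrete: CPython caps the recursion depth rather than growing the stack indefinitely, so the recursive variant from the question fails long before 10 million items. A quick check (the exact limit varies by installation):

import sys

print(sys.getrecursionlimit())  # typically 1000 by default
# prodElement_rec recurses once per 4 elements, so a 10-million-item list
# needs ~2.5 million frames; Python aborts with "maximum recursion depth
# exceeded" long before that.

Note that merge sort (Scenario 1) is different: its recursion depth is only about log2(n), roughly 24 levels for 10 million items, so depth is not the problem there.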
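On the database alternative, here is a minimal sketch using the standard library's sqlite3 module; the file name and table layout are just placeholders, not a recommendation for your actual schema:

import sqlite3

conn = sqlite3.connect('huge_list.db')  # placeholder file name
conn.execute('CREATE TABLE IF NOT EXISTS items (value INTEGER)')
conn.executemany('INSERT INTO items (value) VALUES (?)',
                 ((v,) for v in xrange(100)))  # stand-in for the real data
conn.commit()
# the sort now happens on disk, inside the engine:
for (v,) in conn.execute('SELECT value FROM items ORDER BY value LIMIT 5'):
    print(v)
conn.close()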
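And for the measuring step, the standard library's timeit is usually enough. A sketch that times the iterative version on growing input sizes, assuming prodElement from Scenario 2 is in scope:

import timeit

# If the timings grow roughly linearly with n, the algorithm looks O(n)
# and extrapolating to 10M+ items is reasonably safe.
for n in (10**4, 10**5, 10**6):
    data = range(n)
    t = timeit.timeit(lambda: prodElement(list(data)), number=3)
    print('n=%d: %.3fs' % (n, t))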
To sum it up: it depends on what you are going to do with those lists, and on your memory and time constraints.