I want to use the .NET TPL to asynchronously do a DIR /S
and search each subdirectory on a hard drive for a word in each file. What should my API look like?
In this scenario I know that each subdirectory will have 0..10000 files and 0..10000 directories. I know the tree is unbalanced and want to return data (along with its position in the hierarchy) as soon as it’s available. I am interested in getting data as quickly as possible, but also want to update that result if “better” data is found (“better” meaning closer to the root of C:).
I may also be interested in finding all matches in relation to their position in the hierarchy (akin to a report).
Question:
How should I return data to my caller?
My first guess is that I need a shared object that maintains the current “status” of the traversal (started | not started | complete), and I might base it on the System.Collections.Concurrent collections.

Another idea I’m considering is the producer/consumer pattern (which the concurrent collections can handle), but I’m not sure what the objects should “look” like.
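For what it’s worth, here is a minimal producer/consumer sketch along those lines. All names (`SearchHit`, `TreeSearch`) are hypothetical, the traversal is kept sequential for brevity, and `CompleteAdding` doubles as the “traversal complete” status you mention:

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;
using System.Linq;
using System.Threading.Tasks;

// Hypothetical result type: a match plus its depth in the hierarchy, so the
// consumer can decide whether a later hit is "better" (closer to the root).
public record SearchHit(string Path, int Depth);

public static class TreeSearch
{
    // Producer: a background task walks the tree and adds hits as it finds them.
    // Consumer: the caller drains the collection; CompleteAdding signals that
    // the traversal is finished.
    public static BlockingCollection<SearchHit> Search(string root, string word)
    {
        var hits = new BlockingCollection<SearchHit>();
        Task.Run(() =>
        {
            try { Walk(root, word, depth: 0, hits); }
            finally { hits.CompleteAdding(); }
        });
        return hits;
    }

    static void Walk(string dir, string word, int depth, BlockingCollection<SearchHit> hits)
    {
        foreach (var file in Directory.EnumerateFiles(dir))
            if (File.ReadAllText(file).Contains(word))
                hits.Add(new SearchHit(file, depth));

        foreach (var sub in Directory.EnumerateDirectories(dir))
            Walk(sub, word, depth + 1, hits);
    }
}
```

The caller consumes with `foreach (var hit in hits.GetConsumingEnumerable())`, which yields hits as they arrive and unblocks once `CompleteAdding` is called.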
Optional logical constraint: the API doesn’t have to address this, but in my “real world” design, if a directory has files, then only one file will ever contain the word I’m looking for. Someone literally doing a DIR /S as described above would need to account for more than one matching file per subdirectory.
More information:
I’m using Azure Tables to store a hierarchy of data using these TPL extension methods. A “node” is a table. Not only can each node in the hierarchy relate to any number of other nodes, but each node can also have a reciprocal link back to any other node. This could cause infinite recursion, but I’m addressing that with a shared object in my recursion loop.
Note that each “node” also has the ability to store local data unique to that node. It is this information that I’m searching for. In other words, I’m searching for a specific fixed RowKey in a hierarchy of nodes.
When I search for the fixed RowKey in the hierarchy I’m interested in getting the results FAST (first node found) but prefer data that is “closer” to the starting point of the hierarchy.
Since many nodes may have the particular RowKey I’m interested in, sometimes I may want to get a report of ALL the nodes that contain this RowKey.
So I would break this into two stages: the directory listing / traversal and the search function.
The listing / traversal is a classic tree-node problem and there’s a boodle of examples out there for solving it. What we’ve done to solve that problem is to identify the upper layers first, retrieve those, then work on retrievals within the lower layers. The approach lends itself well to recursion and you end up with some tight code.
I would use a skip / take model and pick a small page size for each request. That keeps the callback open and responsive while the traversal is still taking place.
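As a sketch of that skip / take idea (the `Paging.InPages` helper is made up for this example): each page is handed back as soon as it is full, so the caller can start showing the first hundred entries of a 10,000-file directory without waiting for the rest.

```csharp
using System.Collections.Generic;

// Hypothetical paging helper: batch a large (possibly lazy) listing into small
// pages so the consumer sees results incrementally instead of all at once.
public static class Paging
{
    public static IEnumerable<IReadOnlyList<T>> InPages<T>(IEnumerable<T> source, int pageSize)
    {
        var page = new List<T>(pageSize);
        foreach (var item in source)
        {
            page.Add(item);
            if (page.Count == pageSize)
            {
                yield return page;              // hand back a full page immediately
                page = new List<T>(pageSize);
            }
        }
        if (page.Count > 0) yield return page;  // final partial page
    }
}
```

Usage might look like `foreach (var page in Paging.InPages(Directory.EnumerateFileSystemEntries(dir), 100)) onPage(page);` — the lazy `Enumerate*` methods pair well with this, since nothing forces the full listing into memory first.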
For the API structure, I would expose a single function to retrieve the structure. That function provides the callback(s) to the calling function and essentially acts as a buffer over the actual traversal functions. Depending upon the performance of the listing, you can then easily move to a parallel traversal model and speed things up; .NET 4 and 4.5 have some pretty slick enhancements for parallelization.
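A rough sketch of that shape, assuming a callback-based entry point (the names `DirectoryApi` and `ListAsync` are invented here) and using `Parallel.ForEach` for the parallel traversal:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

// Hypothetical API surface: one public entry point, results delivered through
// a callback as each layer is listed, subdirectories traversed in parallel.
public static class DirectoryApi
{
    public static Task ListAsync(string root, Action<string, string[]> onDirectory)
    {
        // Note: the callback may be invoked from multiple threads concurrently,
        // so the caller must make it thread-safe.
        return Task.Run(() => Traverse(root, onDirectory));
    }

    static void Traverse(string dir, Action<string, string[]> onDirectory)
    {
        onDirectory(dir, Directory.GetFiles(dir));       // report this layer first...
        Parallel.ForEach(Directory.GetDirectories(dir),  // ...then recurse in parallel
                         sub => Traverse(sub, onDirectory));
    }
}
```

This matches the "ID the upper layers first" ordering: each directory is reported before its children are descended into, so shallower results reach the caller sooner.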
How you want to pass the tree structure back is going to depend upon what you’re using for the UI. Pick a data structure that will play nicely with your UI components.
The search function would be a second call, but it should be able to rely upon or mirror some of the functions you created for the directory listing. Since you now have the liberty of choosing your search order (i.e., you control where it starts from, based upon the directory listing), you can preferentially search the closer-to-root locations first.
Here, I would probably create two calls into the API, or just overload a single one. The overload would switch between getting the first, closest-to-root instance and getting all of them. As an aside, consider how you want to handle multiple hits at the same depth from the root.
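One cheap way to get the closest-to-root guarantee is a breadth-first traversal: the first hit enumerated is necessarily at the minimum depth, and the "all" variant is just the same enumeration run to completion. A sketch (names hypothetical; note that ties at the same depth simply come out in enumeration order, which is one answer to the aside above):

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical search pair: FindFirst returns the shallowest match,
// FindAll streams every match in breadth-first (closest-first) order.
public static class Finder
{
    public static string FindFirst(string root, string word) =>
        FindAll(root, word).FirstOrDefault();

    public static IEnumerable<string> FindAll(string root, string word)
    {
        var queue = new Queue<string>();
        queue.Enqueue(root);
        while (queue.Count > 0)
        {
            var dir = queue.Dequeue();
            foreach (var file in Directory.EnumerateFiles(dir))
                if (File.ReadAllText(file).Contains(word))
                    yield return file;   // shallower directories are always drained first
            foreach (var sub in Directory.EnumerateDirectories(dir))
                queue.Enqueue(sub);
        }
    }
}
```

Because `FindAll` is lazy, `FindFirst` stops traversing as soon as the first (shallowest) match is found, which fits the "results FAST, but prefer closer to the root" requirement.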
Finally, depending upon how performant the listing is, you may want to consider some form of caching for the directory structure. It shouldn’t be that expensive to keep in memory, and it should save you some time by avoiding redundant IO.
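A sketch of the caching idea, assuming a simple memoizing wrapper (no invalidation here — a real design would pair this with something like a `FileSystemWatcher` or a TTL):

```csharp
using System;
using System.Collections.Concurrent;
using System.IO;

// Hypothetical listing cache: each directory's children are read from disk
// once and served from memory afterwards. ConcurrentDictionary makes this
// safe to call from a parallel traversal.
public static class ListingCache
{
    static readonly ConcurrentDictionary<string, string[]> Children =
        new ConcurrentDictionary<string, string[]>();

    public static string[] GetChildren(string dir) =>
        Children.GetOrAdd(dir, d => Directory.GetFileSystemEntries(d));
}
```

The traversal and search stages would both go through `GetChildren`, so a "find all" report run right after a "find first" query pays no second round of directory IO.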