Socket Connecting to a Large Number of IPs

I have a text file of ~600 CIDR notation IP blocks which, when expanded, amount to ~17.5M IP addresses. I need to socket connect to each one. If it connects, I add it to a “live” list; if it returns an error/refusal, I add it to a “dead” list. Then the socket is closed. I don’t need to read from it or write to it. Obviously, this is a problem of scale: if we generously assume that each connection takes only one second to return success or failure, working through the list one address at a time would take about 200 days, and with realistic timeouts likely several years. I need to get it down to <24 hours.
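To put numbers on the scale, here is a back-of-the-envelope sketch. The 17.5M total and the one-second attempt time are the figures above; the function names are only illustrative.

```python
TOTAL_ADDRESSES = 17_500_000      # approximate figure from the question
SECONDS_PER_DAY = 24 * 60 * 60


def sequential_days(total, seconds_per_attempt):
    """Days needed when attempts run strictly one after another."""
    return total * seconds_per_attempt / SECONDS_PER_DAY


def required_rate(total, deadline_seconds):
    """Average completed attempts per second needed to meet the deadline."""
    return total / deadline_seconds


print(f"one at a time, 1 s each: {sequential_days(TOTAL_ADDRESSES, 1.0):.1f} days")
print(f"needed for a 24 h run:   {required_rate(TOTAL_ADDRESSES, SECONDS_PER_DAY):.1f} attempts/s")
```

So the real requirement is a sustained average of a bit over 200 completed attempts per second, however that is achieved.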

Right now I’m using Python to expand/count each of the IP addresses, because it is trivial to do so. I am writing a simple multi-threaded C program to address the above problem. There are a few ways I have thought of to tackle this:

  1. Purely using C: I have not found a way to expand a CIDR block in C (handling strings in general is a pain). I could probably cook something up, but if something already exists I’d love to hear about it.
    Will I be able to spawn enough threads? Even if I spawn a thread for each block, that’s 600 threads! I feel like I might need to shrink the stack space allotted to the threads to do this. Even so, I need to be able to handle a large number of strings because the blocks need to be expanded. Regardless, I have looked at the list by hand, and one of the blocks has a /10 CIDR notation, which amounts to >4M IPs by itself. This would still take far too long.

  2. Spawning C processes from Python: This would trivialize the string problem, and each individual IP could be sent to an instance of a C function called from Python, which would then end. The question I have is: when Python calls an external C function, does it continue running with the C process in parallel? Or does it wait for the C function to complete? I know Python does not allow multi-threading (or rather, it does, but it’s somewhat of a joke since only one line is interpreted at a time), so is this the correct way to “export” multi-threading?

  3. Vice versa: As above, but with C calling Python code. Is this “more” correct? Which is to say, can C initiate multiple Python processes and continue to do its own thing?

  4. Something completely different.

Any questions, suggestions, or concerns are welcome. Please point out anything I might be missing or incorrect assumptions I have made.

Thank you for your help.

2

You’re going to struggle to make this work as well as you’re hoping. The precise figures vary depending on the operating system, but if you try opening more than a few hundred sockets at a time on an ongoing basis you’re going to start running out of system resources pretty quickly. On Windows desktop machines the limit is lower still (Windows desktop prevents activities like this as part of an intentional plan to reduce the effectiveness of DDoS attacks and worms).

I would suggest:

  • use a single-threaded process and non-blocking I/O (e.g. select in C; I don’t know if Python supports this)

  • distribute your task over a small cluster so that you only need 100 or so sockets on each machine. A cloud service (e.g. Amazon EC2) may be your best option.
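As a rough illustration of the first suggestion, here is a minimal Python sketch of single-threaded, non-blocking connects using the standard `selectors` module. The function name `check_hosts`, the 3-second default timeout, and the choice to count timeouts as dead are assumptions for the example, not part of the question. It is written with Linux/POSIX behaviour in mind.

```python
import errno
import selectors
import socket
import time


def check_hosts(targets, timeout=3.0):
    """Try to connect to every (host, port) in targets from a single thread.

    Returns (live, dead). Targets that have not answered within
    `timeout` seconds are counted as dead (an assumption of this sketch).
    """
    sel = selectors.DefaultSelector()
    live, dead = [], []
    pending = 0

    # Start every connection attempt without waiting for any of them.
    for target in targets:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        err = sock.connect_ex(target)
        if err == 0:
            live.append(target)
            sock.close()
        elif err in (errno.EINPROGRESS, errno.EWOULDBLOCK):
            # Handshake is under way; the socket turns writable when it ends.
            sel.register(sock, selectors.EVENT_WRITE, target)
            pending += 1
        else:
            dead.append(target)
            sock.close()

    # Collect results as the handshakes complete or fail.
    deadline = time.monotonic() + timeout
    while pending:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        for key, _events in sel.select(remaining):
            sock = key.fileobj
            # SO_ERROR is 0 on success, otherwise the errno of the failure.
            err = sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR)
            (live if err == 0 else dead).append(key.data)
            sel.unregister(sock)
            sock.close()
            pending -= 1

    # Anything still registered never answered in time.
    for key in list(sel.get_map().values()):
        dead.append(key.data)
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return live, dead
```

A call such as `check_hosts([("192.0.2.10", 80), ("192.0.2.11", 80)])` handles one batch. For the full list you would feed it batches small enough to stay under the per-process file descriptor limit, which ties in with the resource limits mentioned above.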

Also see https://stackoverflow.com/a/3923785/441899 which has hints on tuning a Linux system to increase the number of parallel connection attempts you can make.

1

You need to break this down into two steps:
First, use Python to parse the text file and generate a list of IP addresses that is easy to consume in C.
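A minimal sketch of that first step, using Python's standard `ipaddress` module. It assumes one CIDR block per line in the input file; the function names, and the handling of blank lines and `#` comments, are hypothetical choices for the example.

```python
import ipaddress


def _blocks(lines):
    """Yield an IPv4Network/IPv6Network for each non-empty, non-comment line."""
    for line in lines:
        line = line.strip()
        if line and not line.startswith("#"):
            # strict=False tolerates blocks written with host bits set.
            yield ipaddress.ip_network(line, strict=False)


def expand_blocks(lines):
    """Lazily yield every address of every block as a plain string."""
    for network in _blocks(lines):
        for address in network:
            yield str(address)


def count_addresses(lines):
    """Total number of addresses without expanding anything."""
    return sum(network.num_addresses for network in _blocks(lines))


print(list(expand_blocks(["192.0.2.0/30"])))
print(count_addresses(["10.0.0.0/10"]))
```

Iterating a network includes its network and broadcast addresses; `network.hosts()` would skip them if that is preferred. Because `expand_blocks` is a generator, the output can be streamed to a file one address per line rather than held in memory.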

Second, let’s look at the exact problem.
You want to “connect” but you are not going to read or write. I am not sure what the purpose of this is. Couldn’t you use ping to accomplish the same thing?
If you still want to open a socket then you need to implement the three-way TCP handshake (SYN, SYN-ACK, ACK) in a single thread. You will be dealing directly with the underlying IP layer, essentially simulating what TCP does for you.
If you remember that each ‘connection’ is really just a pair of (address, port) combinations, and that you have 64k local ports at your disposal, then your speed is limited only by the latency of the handshake.
(This is starting to sound like a good homework question…)
If you can fire out the SYN packets at a rate of several hundred a second, and each transaction has a round trip latency of 200 ms… You can work out how long it will take you to work through your list of millions of addresses.
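Working that out explicitly, as a sketch: the 17.5M total comes from the question, the 200 ms round trip from the paragraph above, and the specific rates tried are just sample values.

```python
TOTAL_ADDRESSES = 17_500_000  # approximate figure from the question
ROUND_TRIP_SECONDS = 0.2      # the 200 ms latency assumed above


def scan_hours(total_addresses, syn_per_second):
    """Hours to send one SYN to every address at a steady rate."""
    return total_addresses / syn_per_second / 3600


def in_flight(syn_per_second, round_trip_seconds):
    """Handshakes outstanding at any moment (rate times latency)."""
    return syn_per_second * round_trip_seconds


for rate in (200, 500, 1000):
    hours = scan_hours(TOTAL_ADDRESSES, rate)
    outstanding = in_flight(rate, ROUND_TRIP_SECONDS)
    print(f"{rate:>5} SYN/s: {hours:5.1f} h, about {outstanding:.0f} handshakes in flight")
```

At 200 SYNs per second the list takes just over 24 hours, so the rate needs to be somewhat higher than that. The number of handshakes outstanding at once stays small compared with the 64k ports available.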

Here are some useful references.
You want to learn to use raw sockets. You will be implementing the TCP handshake yourself.

http://www.tenouk.com/Module43a.html

https://stackoverflow.com/questions/110341/tcp-handshake-with-sock-raw-socket

http://gonullyourself.org/library/A%20brief%20programming%20tutorial%20in%20C%20for%20raw%20sockets.txt

Good luck

2

Focus on the problem, not the solution. Taking this problem back to abstract terms, you’ve got a scenario that screams out for multiple processes communicating via queues. One process (the “input reader”) would loop reading items from your input list (CIDR blocks) and append them to a to-be-enumerated queue. A second set of processes (the “enumerators”) would loop grabbing the topmost to-be-enumerated item, expand it, and append the results (individual IP addresses), one at a time, to a to-be-checked queue. A third set of processes (the “checkers”) would loop grabbing the topmost to-be-checked item, perform the check, and append the result to a to-be-reported queue. The last process (the “reporter”) would loop grabbing the topmost item from the to-be-reported queue and writing it to the final results.
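A minimal sketch of that pipeline in Python. To keep it short and self-contained it uses threads and `queue.Queue` as a stand-in for separate processes (`multiprocessing` offers analogous queues). The `check` callable, the worker counts, and the `STOP` sentinel are all assumptions for illustration.

```python
import ipaddress
import queue
import threading

STOP = object()  # sentinel telling a worker that no more items will arrive


def enumerator(blocks_q, addrs_q):
    """Expand CIDR blocks into individual addresses, one at a time."""
    while (block := blocks_q.get()) is not STOP:
        for address in ipaddress.ip_network(block, strict=False):
            addrs_q.put(str(address))


def checker(addrs_q, results_q, check):
    """Run the (caller-supplied) check on each address."""
    while (address := addrs_q.get()) is not STOP:
        results_q.put((address, check(address)))


def reporter(results_q, live, dead):
    """Sort checked addresses into the final live/dead lists."""
    while (item := results_q.get()) is not STOP:
        address, ok = item
        (live if ok else dead).append(address)


def run_pipeline(blocks, check, n_enumerators=2, n_checkers=4):
    blocks_q = queue.Queue()
    addrs_q = queue.Queue(maxsize=10_000)  # bounded: enumerators wait for checkers
    results_q = queue.Queue()
    live, dead = [], []

    enumerators = [threading.Thread(target=enumerator, args=(blocks_q, addrs_q))
                   for _ in range(n_enumerators)]
    checkers = [threading.Thread(target=checker, args=(addrs_q, results_q, check))
                for _ in range(n_checkers)]
    report = threading.Thread(target=reporter, args=(results_q, live, dead))
    for thread in enumerators + checkers + [report]:
        thread.start()

    # The "input reader": feed the blocks, then shut each stage down in order.
    for block in blocks:
        blocks_q.put(block)
    for _ in enumerators:
        blocks_q.put(STOP)
    for thread in enumerators:
        thread.join()
    for _ in checkers:
        addrs_q.put(STOP)
    for thread in checkers:
        thread.join()
    results_q.put(STOP)
    report.join()
    return live, dead
```

For example, `run_pipeline(["192.0.2.0/30"], check=lambda addr: addr.endswith(".1"))` runs the whole chain with a dummy check; a real `check` would perform the connection attempt. The bounded address queue keeps the enumerators from expanding millions of addresses ahead of the checkers.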
