I am implementing a C preprocessor in C…
I have three functions:
- a trigraph-replacing function
- a line-splicing function
- a comment-removing function
However, these functions work separately on files, i.e.:
The first function takes a file and replaces the trigraphs, producing temp-file1 as output.
The second function takes temp-file1 as input, splices the lines, and produces temp-file2.
The third function takes temp-file2 as input, removes the comments, and produces yet another temp-file3.
The main preprocessing tasks are then performed on temp-file3, and a .i file is produced as the final output.
Now, I have 3 options:
- use temp files
- use pipes
- instead of intermediate temp files or pipes, use strings (i.e. temp-file1, 2 and 3 would be three big in-memory strings!)
I have three doubts…
- Option 1 seems less efficient than option 2.
- Option 2 seems perfect, but will I be limited by the size of the unnamed pipe? (I have a single process, i.e. functions 1, 2 & 3 are called one after another.) What if an intermediate output is larger than the pipe's total capacity?
- Option 3: is it efficient and easy compared to the previous two?
Please tell me, which option should I choose?
Option 4 is to refactor the functions so they work on a stream and only process data as needed.
In essence, you call function 3; when it needs more data, it calls function 2, and when that needs more data, it calls function 1, which reads directly from the input file. This transforms the preprocessor from the 4-pass design you have now into a single pass.
Option 5 is concurrent processing: put a producer-consumer queue between stage 1 (producer) and stage 2 (consumer), and another between stage 2 and stage 3, which in turn produces input for the main processing.
Option 5 will allow you to reuse more of your existing code, as you can just replace all `fwrite`s with pushes and all `fread`s with polls (each blocking as the buffer fills up or empties), but you'll need to spawn a thread for each function.
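The bounded queue described above could be sketched as follows, using POSIX threads; all the names here (`byte_queue`, `q_push`, `q_poll`, `q_close`) are invented for illustration:

```c
#include <pthread.h>
#include <string.h>

#define QCAP 4096

typedef struct {
    unsigned char buf[QCAP];
    size_t head, tail, count;
    int closed;                      /* producer finished */
    pthread_mutex_t lock;
    pthread_cond_t not_full, not_empty;
} byte_queue;

void q_init(byte_queue *q) {
    memset(q, 0, sizeof *q);
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->not_full, NULL);
    pthread_cond_init(&q->not_empty, NULL);
}

/* Blocks while the queue is full -- this replaces fwrite(). */
void q_push(byte_queue *q, unsigned char c) {
    pthread_mutex_lock(&q->lock);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->lock);
    q->buf[q->tail] = c;
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}

/* Blocks while the queue is empty -- this replaces fread().
 * Returns -1 (like EOF) once the producer has called q_close(). */
int q_poll(byte_queue *q) {
    pthread_mutex_lock(&q->lock);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->lock);
    if (q->count == 0) {             /* closed and fully drained */
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    int c = q->buf[q->head];
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->lock);
    return c;
}

void q_close(byte_queue *q) {
    pthread_mutex_lock(&q->lock);
    q->closed = 1;
    pthread_cond_broadcast(&q->not_empty);
    pthread_mutex_unlock(&q->lock);
}
```

Each stage runs in its own thread, pushing into the queue ahead of it and polling the one behind it; the blocking push/poll pair is what gives you back-pressure without an unbounded buffer.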
Option 1 is a way of allocating memory (in this case, from the page cache), with the following caveats:
- if your temporary files aren’t on a temporary (in-memory) file system, your data will also be written pointlessly to disk
- it’s possible for other processes to read and maybe modify your temporary files
Option 2 won’t work as stated: the pipe buffer will fill, and your write will block. Pipes are only safe if they’re being read & written concurrently (whether by different processes, different threads, or suitably co-ordinated co-routines).
Option 3 is reasonable. Note that if your three functions can only shorten the file, you could simply re-write a single buffer in-place.
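To illustrate the in-place idea: because comment removal only ever shortens the text, the write pointer can never overtake the read pointer, so one buffer suffices. A minimal sketch (it deliberately ignores string and character literals, which a real pass must skip):

```c
/* Rewrite a NUL-terminated buffer in place, replacing each C comment
 * with a single space. Sketch only: string/char literals not handled. */
void remove_comments_inplace(char *s) {
    char *r = s, *w = s;                       /* read and write pointers */
    while (*r) {
        if (r[0] == '/' && r[1] == '*') {      /* block comment */
            r += 2;
            while (*r && !(r[0] == '*' && r[1] == '/'))
                r++;
            if (*r) r += 2;                    /* skip the closing * and / */
            *w++ = ' ';                        /* comment becomes one space */
        } else if (r[0] == '/' && r[1] == '/') { /* line comment */
            while (*r && *r != '\n')
                r++;                           /* keep the newline itself */
            *w++ = ' ';
        } else {
            *w++ = *r++;                       /* ordinary character */
        }
    }
    *w = '\0';
}
```

Replacing each comment with a single space matches what the standard's translation phases require, and the same read/write-pointer pattern works for line splicing and trigraph replacement, since both also only shorten the text.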
All three options have serious drawbacks.
- As indicated by @Useless in his answer, temporary files have the drawback of doing needless disk access with the risk of external entities modifying the files.
- Both options 2 and 3 limit the size of the files you can process: option 2 is limited by the internal buffer of the pipe, and option 3 by the amount of free memory you have.
I would advise considering a fourth option:
You have listed four stages in the processing:
- trigraph processing
- line splicing
- comment removal
- main preprocessing
In option 4, each stage calls the preceding stage's function to obtain characters that have been processed up to that point.
So, the main preprocessing function requests characters from the comment removal function.
The comment removal function in turn requests characters from the line splicing function. If those characters indicate the start of a comment, more characters are requested until the entire comment has been seen. Those characters are discarded and a single space is returned to the caller. Characters outside comments are returned as-is.
The line splicing and trigraph processing functions work similarly, with the trigraph function being the only one that reads a file.
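The chain described above could be sketched like this, with the main preprocessing loop simply calling `comment_getc()` until EOF. All names are invented for illustration, and the sketch deliberately ignores string and character literals (inside which comments and trigraph-like sequences must be left alone):

```c
#include <stdio.h>

static FILE *src;               /* the input file, set by the caller */

/* -- Stage 1: read from the file, translating trigraphs. -- */
static int tg_buf[2];           /* tiny pushback stack (2 is enough) */
static int tg_n = 0;

static int  tg_get(void)     { return tg_n ? tg_buf[--tg_n] : getc(src); }
static void tg_unget(int c)  { tg_buf[tg_n++] = c; }

static int trigraph_getc(void) {
    int c = tg_get();
    if (c != '?') return c;
    int c2 = tg_get();
    if (c2 != '?') { tg_unget(c2); return '?'; }
    int c3 = tg_get();
    switch (c3) {
    case '=':  return '#';   case '/':  return '\\';
    case '(':  return '[';   case ')':  return ']';
    case '<':  return '{';   case '>':  return '}';
    case '!':  return '|';   case '\'': return '^';
    case '-':  return '~';
    default:                 /* not a trigraph: emit one '?' only */
        tg_unget(c3); tg_unget('?');
        return '?';
    }
}

/* -- Stage 2: pull from stage 1, deleting backslash-newline pairs. -- */
static int splice_buf = -2;     /* -2 = no pushed-back character */

static int splice_getc(void) {
    int c;
    if (splice_buf != -2) { c = splice_buf; splice_buf = -2; }
    else c = trigraph_getc();
    while (c == '\\') {
        int c2 = trigraph_getc();
        if (c2 == '\n') { c = trigraph_getc(); continue; }
        splice_buf = c2;        /* not a splice: push back, keep '\' */
        break;
    }
    return c;
}

/* -- Stage 3: pull from stage 2, replacing each comment with a space. -- */
static int com_buf = -2;

static int comment_getc(void) {
    if (com_buf != -2) { int c = com_buf; com_buf = -2; return c; }
    int c = splice_getc();
    if (c != '/') return c;
    int c2 = splice_getc();
    if (c2 == '*') {            /* block comment: eat up to closing */
        int prev = 0;
        while ((c = splice_getc()) != EOF) {
            if (prev == '*' && c == '/') break;
            prev = c;
        }
        return ' ';
    }
    if (c2 == '/') {            /* line comment: eat up to newline */
        while ((c = splice_getc()) != EOF && c != '\n')
            ;
        com_buf = c;            /* keep the newline itself */
        return ' ';
    }
    com_buf = c2;               /* plain '/': push back what followed */
    return '/';
}
```

Because the trigraph stage sits below the splicing stage, a `??/` at the end of a line correctly becomes a backslash and then splices with the newline, in the same order as the standard's translation phases.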