How can I best manage making open source code releases from my company’s confidential research code?

My company (let’s call them Acme Technology) has a library of approximately one thousand source files that originated in its Acme Labs research group, was incubated in a development group for a couple of years, and has more recently been provided to a handful of customers under non-disclosure. Acme is getting ready to release perhaps 75% of the code to the open source community. The other 25% would be released later, but for now it is either not ready for customer use or contains code related to future innovations that must be kept out of the hands of competitors.

The code is presently formatted with #ifdefs that let the same code base serve two purposes. It works with the pre-production platforms that will be available to university researchers and a much wider range of commercial customers once it goes open source, and it remains available for experimentation, prototyping, and forward compatibility testing with the future platform. Keeping a single code base is considered essential for the economics (and sanity) of my group, who would have a tough time maintaining two copies in parallel.

Files in our current base look something like this:

> // Copyright 2012 (C) Acme Technology, All Rights Reserved.
> // Very large, often varied and restrictive copyright license in English and French,
> // sometimes also embedded in make files and shell scripts with varied 
> // comment styles. 
> 
> 
>   ... Usual header stuff...
>
> void initTechnologyLibrary() {
>     nuiInterface(on);
> #ifdef  UNDER_RESEARCH
>     holographicVisualization(on);
> #endif
> }

And we would like to convert them to something like:

> // GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved.
> // Acme appreciates your interest in its technology, please contact [email protected] 
> // for technical support, and www.acme.com/emergingTech for updates and RSS feed.
> 
>   ... Usual header stuff...
>
> void initTechnologyLibrary() {
>     nuiInterface(on);
> }

Is there a tool, parse library, or popular script that can replace the copyright and strip out not just #ifdefs, but variations like #if defined(UNDER_RESEARCH), etc.?

The code is presently in Git and would likely be hosted somewhere that uses Git. Would there be a way to safely link repositories together so we can efficiently reintegrate our improvements with the open source versions? Advice about other pitfalls is welcome.

It seems like it wouldn’t be too difficult to write a script that parses the preprocessor directives, compares them to a list of defined constants (UNDER_RESEARCH, FUTURE_DEVELOPMENT, etc.) and, if a directive evaluates to false given what’s defined, removes everything up to the matching #endif.

In Python, I’d do something like this:

import os

src_dir = 'src/'
# True means "strip the code guarded by this switch from the open source release"
switches = {'UNDER_RESEARCH': True, 'OPEN_SOURCE': False}
new_header = """// GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved.
// Acme appreciates your interest in its technology, please contact [email protected] 
// for technical support, and www.acme.com/emergingTech for updates and RSS feed.
"""

for fn in os.listdir(src_dir):
    if fn.endswith('-open-source'):
        continue  # skip output left over from an earlier run
    with open(os.path.join(src_dir, fn), 'r') as infile:
        contents = infile.read().split('\n')
    with open(os.path.join(src_dir, fn + '-open-source'), 'w') as outfile:
        in_header = True
        depth = 0  # > 0 while inside a block that is being stripped
        for line in contents:
            stripped = line.strip()

            # remove original header (leading blank lines and // comments)
            if in_header and (stripped == "" or stripped.startswith('//')):
                continue
            elif in_header:
                in_header = False
                outfile.write(new_header)

            # skip between ifdef directives, tracking nested #if/#endif pairs
            if depth > 0:
                if stripped.startswith("#if"):
                    depth += 1
                elif stripped.startswith("#endif"):
                    depth -= 1
                continue

            # check
            if stripped.startswith("#ifdef"):
                # parse #ifdef (maybe should be more elegant)
                # this assumes a form of "#ifdef SWITCH" and nothing else
                parts = stripped.split()
                if len(parts) > 1 and switches.get(parts[1]):
                    depth = 1
                    continue

            # checking for other forms of directives is left as an exercise

            # got this far, nothing special - echo the line
            outfile.write(line)
            outfile.write('\n')

I’m sure there are more elegant ways to do it, but this is quick and dirty and seems to get the job done.
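The "other forms of directives" left as an exercise above could be handled along these lines. This is only a sketch under stated assumptions: the function name `strip_switches` and the regular expressions are hypothetical, each stripped switch is treated as undefined (the guarded branch is dropped and any `#else` branch is kept), and compound conditions such as `#if defined(A) && defined(B)`, `#ifndef`, `#elif`, and directives with trailing comments are left untouched rather than evaluated.

```python
import re

# Matches "#ifdef X", "#if defined(X)" and "#if defined X" (hypothetical pattern).
OPEN_RE = re.compile(
    r'^\s*#\s*(?:ifdef\s+(\w+)|if\s+defined\b\s*\(?\s*(\w+)\s*\)?)\s*$')
ANY_IF_RE = re.compile(r'^\s*#\s*if')   # any #if, #ifdef or #ifndef
ELSE_RE = re.compile(r'^\s*#\s*else\b')
ENDIF_RE = re.compile(r'^\s*#\s*endif\b')


def strip_switches(lines, switches):
    """Return lines with the blocks guarded by any name in `switches` removed."""
    out = []
    # One entry per open conditional: None for an unrelated conditional
    # (echoed untouched), False/True for a stripped one, meaning
    # "is the current branch of that conditional being kept?".
    stack = []
    for line in lines:
        # Decide visibility before this line changes the stack.
        hidden = any(frame is False for frame in stack)
        m = OPEN_RE.match(line)
        if m and (m.group(1) or m.group(2)) in switches:
            stack.append(False)          # drop the guarded branch
            continue
        if ANY_IF_RE.match(line):
            stack.append(None)           # unrelated conditional
        elif ELSE_RE.match(line) and stack and stack[-1] is not None:
            stack[-1] = not stack[-1]    # switch to the other branch
            continue
        elif ENDIF_RE.match(line) and stack:
            if stack.pop() is not None:
                continue                 # #endif of a stripped block
        if not hidden:
            out.append(line)
    return out
```

Called with the lines of a file and a set such as `{"UNDER_RESEARCH", "FUTURE_DEVELOPMENT"}`, it returns the lines to write to the release copy. Because unrecognized directives pass through unchanged, it is worth searching the output for the switch names afterwards to catch any form the pattern missed.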

I was thinking about passing your code through the preprocessor to only expand macros, thus outputting only the interesting part in the #ifdefs.

Something like this should work:

gcc -E yourfile.c

But:

You’ll lose all comments. You can use -CC to (kind of) preserve them, but then you’ll still have to strip off the old copyright notice.
#includes are expanded too, so you’ll end up with a big file containing all the content of the included header files.
You’ll lose “standard” macros, because they are expanded along with everything else.

There might be a way to limit which macros are expanded; however, my suggestion here is to split things up instead of doing (potentially hazardous) processing on the files. (By the way, how would you plan to maintain them afterwards, e.g. to reintroduce code from the open source version into your closed source?)

That is, try putting the code you want to open source in external libraries as much as possible, then use them as you would any other library, integrating them with other “custom” closed-source libraries.

It might take a bit longer at first to figure out how to restructure things, but it’s definitely the right way to accomplish this.

I have a solution, but it will require a little work.

pypreprocessor is a library that provides a pure C-style preprocessor for Python that can also be used as a GPP (General Purpose Pre-Processor) for other types of source code.

Here’s a basic example:

from pypreprocessor import pypreprocessor

pypreprocessor.input = 'input_file.c'
pypreprocessor.output = 'output_file.c'
pypreprocessor.removeMeta = True
pypreprocessor.parse()

The preprocessor is extremely simple. It makes a pass through the source and conditionally comments out source based on what is defined.

Defines can be set either through #define statements in the source or by setting them in the pypreprocessor.defines list.

Setting the input/output parameters allows you to explicitly define which files are being opened/closed, so a single preprocessor can be set up to batch process a large number of files if desired.

With the removeMeta parameter set to True, the preprocessor should automatically strip out any and all preprocessor statements, leaving only the post-processed code.

Note: Usually this wouldn’t need to be set explicitly, because Python removes commented code automatically during compilation to bytecode.

I only see one edge case. Because you’re looking to preprocess C source, you may want to set the processor defines explicitly (i.e. through pypreprocessor.defines) and tell it to ignore the #define statements in the source. That should keep it from accidentally removing any constants you may use in your project’s source code. There is currently no parameter to set this functionality, but it would be trivial to add.

Here’s a trivial example:

import sys

from pypreprocessor import pypreprocessor

# run the script in 'production' mode
if 'commercial' in sys.argv:
    pypreprocessor.defines.append('commercial')

if 'open' in sys.argv:
    pypreprocessor.defines.append('open')

pypreprocessor.removeMeta = True
pypreprocessor.parse()

Then the source:

#ifdef commercial
// Copyright 2012 (C) Acme Technology, All Rights Reserved.
// Very large, often varied and restrictive copyright license in English and French,
// sometimes also embedded in make files and shell scripts with varied 
// comment styles.
#endif
#ifdef open
// GPL Copyright (C) Acme Technology Labs 2012, Some rights reserved.
// Acme appreciates your interest in its technology, please contact [email protected] 
// for technical support, and www.acme.com/emergingTech for updates and RSS feed.
#endif

Note: Obviously, you’ll need to sort out a way to set the input/output files but that shouldn’t be too difficult.
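One way to sort out the input/output files is to walk the source tree and mirror each file into a separate release directory. The sketch below only computes the path pairs; the helper name `release_pairs`, the directory names, and the suffix list are assumptions for illustration and are not part of pypreprocessor.

```python
from pathlib import Path


def release_pairs(src_root, out_root, suffixes=(".c", ".h")):
    """Yield (input_path, output_path) pairs mirroring src_root under out_root."""
    src_root, out_root = Path(src_root), Path(out_root)
    for path in sorted(src_root.rglob("*")):
        # Only source files with the chosen suffixes are paired up.
        if path.is_file() and path.suffix in suffixes:
            yield path, out_root / path.relative_to(src_root)
```

Each pair could then be assigned to pypreprocessor.input and pypreprocessor.output, as in the basic example above, before calling parse(). The output directories would need to be created first, for example with `output_path.parent.mkdir(parents=True, exist_ok=True)`.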

Disclosure: I am the original author of pypreprocessor.

Aside: I originally wrote it as a solution to the dreaded Python 2.x/3.x maintenance issue. My approach was to do 2 and 3 development in the same source files and just include/exclude the differences using preprocessor directives. Unfortunately, I discovered the hard way that it’s impossible to write a true pure (i.e. doesn’t require C) preprocessor in Python, because the lexer flags syntax errors in incompatible code before the preprocessor gets a chance to run. Either way, it’s still useful under a wide range of circumstances, including yours.

It would probably be a good idea to:

1. Add comment tags like:

> // *COPYRIGHT-BEGIN-TAG*
> // Copyright 2012 (C) Acme Technology, All Rights Reserved.
> // Very large, often varied and restrictive copyright license in English and French,
> // sometimes also embedded in make files and shell scripts with varied 
> // comment styles. 
> // *COPYRIGHT-END-TAG*
>   ... Usual header stuff...
>
> void initTechnologyLibrary() {
>     nuiInterface(on);
> #ifdef  UNDER_RESEARCH
>     holographicVisualization(on);
> #endif
> }

2. Write a script for the open source builder that goes through all files
and replaces the text between the COPYRIGHT-BEGIN-TAG and COPYRIGHT-END-TAG tags.
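The replacement step in point 2 could be sketched as follows. The function name `replace_tagged_block` is hypothetical, and the two tag strings are passed in as parameters, so they should be spelled exactly as they appear in your files. The whole line carrying each tag is replaced, which means the same function works for `//`, `#`, or other comment styles, provided `new_block` is already written in the right comment style for that file.

```python
import re


def replace_tagged_block(text, begin_tag, end_tag, new_block):
    """Replace everything from the begin_tag line through the end_tag line."""
    pattern = re.compile(
        r'^[^\n]*' + re.escape(begin_tag)     # line that carries the begin tag
        + r'.*?'                              # shortest span up to the end tag
        + re.escape(end_tag) + r'[^\n]*\n?',  # rest of the end tag line
        re.DOTALL | re.MULTILINE,
    )
    # A function is used as the replacement so that backslashes in
    # new_block are inserted literally.
    return pattern.sub(lambda match: new_block, text)
```

Files that contain no tags are returned unchanged, so the open source builder can flag them for manual review instead of silently shipping the old license text.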

I’m not going to show you a tool to convert your codebase; plenty of answers have already done that. Rather, I’m answering your comment about how to handle branches for this.

You should have 2 branches:

Community (the open source version)
Professional (the closed source version)

The preprocessor conditionals shouldn’t exist. You have two different versions, and a cleaner codebase overall.

You’re afraid of maintaining two copies in parallel? Don’t worry, you can merge!

If you’re making modifications to the community branch, just merge them into the professional branch. Git handles this really well.

This way, you keep 2 maintained copies of your codebase. And releasing one for open source is easy as pie.

Filed under: softwareengineering - @ 20:03

Tags: licensing, open-source, parsing, version-control
