Program design with multiple data sets, algorithms and results

I have a question regarding the design of my software.

There are four different data sets and three consecutive algorithms operating on the data sets and on the results from the previous algorithm runs.

Let D1 – D4 be the set of data sets, A1 – A3 the set of algorithms and R1 – R3 the set of results from the respective algorithms.

A1 uses data sets D1, D2 and D3 and has to run on n different parameters.
A2 uses data sets D1 and D3 as well as R1_1 – R1_n.
A3 uses data sets D1, D2 and D4 as well as R2.

  D1    D1    D1
  D2    D3    D2
  D3          D4

  |     |     |
  v     v     v

n*A1    A2    A3
     _     _
  |  /| |  /| |
  v /   v /   v

R1_1    R2    R3
...
R1_n

Each of the results R1_1 – R1_n, R2 and R3 should have some output functionality.

So my question is how to organize the program and its classes. My guess is that I should create separate classes for each of D1 – D4, A1 – A3 and R1 – R3, and make the data and result objects attributes of the algorithm objects. Is that a good approach? Are there any reasons for or against creating n objects R1_1 – R1_n instead of one object R1 containing an array of n results?

Should the data classes include their own ‘readFromFile’ functions, or would it be better to have a global utility class that reads all data from files and generates the data objects? The same question applies to output functions: should they be placed in each of the result classes, or should they be organized globally?

Kind regards!

The answer depends on details that are not mentioned in the question.

It will depend on the exact details of D1, D2, …, A1, A2. Such details may not be worth including in your question, because they would make it too localized and few people would read all of them. Still, they may help you get a more tailored answer.

If you find my answer too simplistic, please add more detail to your question.

How much code could be shared between each type of data handler?

It depends on how much commonality exists between each type of data. By commonality I mean the similarity in structure, or sub-structure.

For example, given that

Each PersonContactInfo contains one or more TelephoneContact records.
Each BusinessDepartmentContactInfo also contains one or more TelephoneContact records.

Then any code written for the TelephoneContact record can be reused by every class that contains it.
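A minimal Java sketch of that idea. The fields label and number and the methods format, formatAll and describe are hypothetical (the question does not define them); only the three class names come from the example above. The telephone-record code is written once in TelephoneContact, and both container classes reuse it by composition.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Shared sub-structure: the code for a telephone record is written once.
final class TelephoneContact {
    private final String label;   // hypothetical, e.g. "mobile"
    private final String number;  // hypothetical, e.g. "555-0100"

    TelephoneContact(String label, String number) {
        this.label = label;
        this.number = number;
    }

    String format() {
        return label + ": " + number;
    }

    // Helper reused by every class that holds a list of TelephoneContact records.
    static String formatAll(List<TelephoneContact> contacts) {
        return contacts.stream()
                .map(TelephoneContact::format)
                .collect(Collectors.joining(", ", "[", "]"));
    }
}

// Both containers reuse TelephoneContact by composition instead of duplicating its code.
final class PersonContactInfo {
    private final String name;
    private final List<TelephoneContact> phones = new ArrayList<>();

    PersonContactInfo(String name) {
        this.name = name;
    }

    void addPhone(TelephoneContact phone) {
        phones.add(phone);
    }

    String describe() {
        return name + " " + TelephoneContact.formatAll(phones);
    }
}

final class BusinessDepartmentContactInfo {
    private final String department;
    private final List<TelephoneContact> phones = new ArrayList<>();

    BusinessDepartmentContactInfo(String department) {
        this.department = department;
    }

    void addPhone(TelephoneContact phone) {
        phones.add(phone);
    }

    String describe() {
        return department + " " + TelephoneContact.formatAll(phones);
    }
}
```

If the telephone format ever changes, only TelephoneContact needs to be edited.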

By similarity in structure, I do not mean similarity in values or statistical distribution. I can point you to a counterexample.

Code reuse is applicable to all programming paradigms. Some paradigms and programming languages have specific syntax and library features to help with code reuse.

Choosing a programming paradigm for your problem.

Decide which programming paradigm to use only after:

- You understand the size of the data (order of magnitude) that you intend to handle.
  - If you need to handle billions of records, you may need to consider alternative approaches, such as data-oriented design, a database, or distributed processing.
- You have a basic outline of what functionality the software needs, without going into too much detail.
- You have an outline or preliminary plan of how you want your software to be structured.
  - A very useful technique is “CRC cards”, where CRC stands for Class, Responsibility and Collaboration. They can be used for both object-oriented and non-object-oriented design.
- You have identified the commonalities, as described above.

Deciding where to use POD (plain old data structure) or Objects.

If you have decided to use Java, object-oriented design is recommended, as radarbob suggested above, because much of the advantage of using Java is lost if you do not take advantage of OOD.

This assumes you have finished the Class-Responsibility-Collaboration (CRC) step. In object-oriented design, the result of the CRC step becomes your initial Java class design.

You can then change your class design further as you see fit, that is, by refactoring.

Writing code to read data from a file, and save results to a file.

There are many choices, so many that it can be overwhelming.

- At a minimum, you need a data reader for the dataset format(s) used by D1, D2, D3 and D4.
- Java supports object serialization as part of the language.
- However, if the source code of a class is modified, previously serialized data may no longer deserialize, so it is effectively lost. For example, if the class does not declare a fixed serialVersionUID, many edits change the computed ID and reading old data fails with an InvalidClassException.
  - Because this is a very common issue, there are many solutions; to name just a few:
    - https://stackoverflow.com/questions/17914197/serializing-objects-with-changing-class-source-code
    - http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/serialization/id4397802
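A sketch of Java's built-in serialization with one common mitigation. AlgorithmResult and ResultCodec are hypothetical names. Declaring serialVersionUID explicitly lets old data survive compatible changes such as adding a field; incompatible changes (for example, changing a field's type) will still break it. Byte arrays are used instead of files only to keep the example short; wrapping a FileOutputStream or FileInputStream works the same way.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical result of one algorithm run.
final class AlgorithmResult implements Serializable {
    // A fixed ID keeps old streams readable after compatible edits. Without it,
    // the JVM derives the ID from the class structure, and many edits change it.
    private static final long serialVersionUID = 1L;

    private final String parameter;
    private final double score;

    AlgorithmResult(String parameter, double score) {
        this.parameter = parameter;
        this.score = score;
    }

    String parameter() { return parameter; }

    double score() { return score; }
}

final class ResultCodec {
    private ResultCodec() {}

    static byte[] toBytes(AlgorithmResult result) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(result);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    static AlgorithmResult fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return (AlgorithmResult) in.readObject();
        } catch (IOException e) {
            // InvalidClassException (an IOException) lands here on a version mismatch.
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```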

You may also want to consider the following trade-off:

Space-efficient: takes up as little disk space as possible, which often means binary, non-human-readable data.
Human-readable: uses text as much as possible (XML or JSON), but takes up more disk space.
Human-readable combined with standard compression, such as GZIP: a balanced approach between space and readability.
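A sketch of the third option using only the JDK (java.util.zip): results are written as plain text lines and GZIP-compressed on disk, so standard tools such as gunzip can still recover a readable file. The class name GzipTextFiles and the line contents used with it are illustrative assumptions.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Human-readable text lines, GZIP-compressed on disk.
final class GzipTextFiles {
    private GzipTextFiles() {}

    static void writeLines(Path path, List<String> lines) {
        // Closing the writer also finishes the GZIP stream and closes the file.
        try (BufferedWriter out = new BufferedWriter(new OutputStreamWriter(
                new GZIPOutputStream(Files.newOutputStream(path)), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                out.write(line);
                out.newLine();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static List<String> readLines(Path path) {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(path)), StandardCharsets.UTF_8))) {
            List<String> lines = new ArrayList<>();
            for (String line = in.readLine(); line != null; line = in.readLine()) {
                lines.add(line);
            }
            return lines;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```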

Should I use a database?

Unfortunately, I do not have much experience with databases. Others may be able to contribute ideas.

This question may be better suited to programmers.stackexchange.com or codereview.stackexchange.com.

However, here are some tips about when to break things into new classes:
https://stackoverflow.com/questions/15917696/is-it-better-to-have-more-java-classes-or-have-fewer-classes-doing-more-work

If I were you, I would keep at most three classes: A1, A2 and A3. Alternatively, you could make the three algorithms methods of the same class. To keep your code DRY, you should have one method responsible for reading files (and one for output), unless the input files are fundamentally different.
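A sketch of that single reading method, assuming (hypothetically) that all input files are comma-separated text; each data set only supplies a function that turns one row into its own object. DataIO, readRecords and writeRecords are illustrative names, not an existing API.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;
import java.util.stream.Collectors;

// One reading method and one writing method shared by every data set and result.
final class DataIO {
    private DataIO() {}

    // Parses comma-separated lines, skipping blank ones; the caller decides
    // how a row becomes an object.
    static <T> List<T> readRecords(BufferedReader in, Function<String[], T> rowMapper) {
        return in.lines()
                .filter(line -> !line.isBlank())
                .map(line -> line.split(","))
                .map(rowMapper)
                .collect(Collectors.toList());
    }

    static <T> List<T> readRecords(Path path, Function<String[], T> rowMapper) {
        try (BufferedReader in = Files.newBufferedReader(path)) {
            return readRecords(in, rowMapper);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Writes any result list, one toString() per line.
    static void writeRecords(Path path, List<?> records) {
        List<String> lines = records.stream().map(Objects::toString).collect(Collectors.toList());
        try {
            Files.write(path, lines);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

For example, a loader for one data set might be DataIO.readRecords(path, row -> Double.parseDouble(row[0])). If the file formats later diverge, only the row mapper changes.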

Every object you create has some memory overhead, so performance may be somewhat better if you create just one object to hold your results rather than n separate ones.
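A sketch of the single-object option (the "one R1 containing an array of n results" variant from the question). It assumes, hypothetically, that each A1 run yields one numeric score; keeping the n runs in primitive arrays avoids one object per run, though how much that matters depends on n.

```java
import java.util.Objects;

// Hypothetical: one R1 object holds all n runs of A1 in primitive arrays,
// instead of n separate R1_i objects. Assumes each run yields one score.
final class R1 {
    private final double[] parameters;
    private final double[] scores;
    private int count;

    R1(int n) {
        parameters = new double[n];
        scores = new double[n];
    }

    void add(double parameter, double score) {
        if (count == parameters.length) {
            throw new IllegalStateException("all " + count + " runs already recorded");
        }
        parameters[count] = parameter;
        scores[count] = score;
        count++;
    }

    int size() { return count; }

    double parameter(int i) { return parameters[Objects.checkIndex(i, count)]; }

    double score(int i) { return scores[Objects.checkIndex(i, count)]; }

    // Output for all n runs lives in one place.
    String toReport() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) {
            sb.append(parameters[i]).append('\t').append(scores[i]).append('\n');
        }
        return sb.toString();
    }
}
```

A2 then receives a single R1 instead of n separate objects.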

Should the data classes include their own respective ‘readFromFile’ functions or would it be better to have a global utility class that reads all data from file and generates the data objects?

Follow the Single Responsibility Principle (SRP).

Don’t put data-fetching code in your data class(es).

Give your data class all the functionality it needs, for example by implementing Comparable, Iterable (so clients get an Iterator), or equality (equals/hashCode) as needed. Don’t force the data client to do these kinds of things to the data object by brute force.
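A sketch with a hypothetical Sample record and SampleSet container showing those Java hooks. The container also sorts itself, so clients never reorder its internals directly.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.Objects;

// Hypothetical data record that supplies its own ordering and equality.
final class Sample implements Comparable<Sample> {
    private final String id;
    private final double value;

    Sample(String id, double value) {
        this.id = Objects.requireNonNull(id);
        this.value = value;
    }

    String id() { return id; }

    double value() { return value; }

    // Order by value, then by id, so compareTo agrees with equals.
    @Override
    public int compareTo(Sample other) {
        int byValue = Double.compare(value, other.value);
        return byValue != 0 ? byValue : id.compareTo(other.id);
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Sample)) return false;
        Sample s = (Sample) o;
        return id.equals(s.id) && Double.compare(value, s.value) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(id, value);
    }
}

// Hypothetical data set: clients iterate over it and ask it to sort itself.
final class SampleSet implements Iterable<Sample> {
    private final List<Sample> samples;

    SampleSet(List<Sample> samples) {
        this.samples = new ArrayList<>(samples);
    }

    void sort() {
        Collections.sort(samples);
    }

    @Override
    public Iterator<Sample> iterator() {
        // Read-only view: clients can iterate but not remove elements.
        return Collections.unmodifiableList(samples).iterator();
    }
}
```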

Neither the algorithm classes nor the data classes should know how to read the data; they should ask some other class. Likewise, the algorithm should ask the data class to sort itself (for example), rather than doing it for the data class.
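A sketch of keeping reading out of the data class, using D1 from the question with a hypothetical D1Reader interface and a TextD1Reader that assumes one number per line. Neither D1 nor any algorithm needs to know where the numbers come from.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Data class: holds values, knows nothing about files or formats.
final class D1 {
    private final List<Double> values;

    D1(List<Double> values) {
        this.values = List.copyOf(values);
    }

    List<Double> values() { return values; }
}

// Reading is a separate responsibility behind an interface.
interface D1Reader {
    D1 read();
}

// One possible reader (hypothetical format): one number per line from any text source.
final class TextD1Reader implements D1Reader {
    private final BufferedReader source;

    TextD1Reader(BufferedReader source) {
        this.source = source;
    }

    @Override
    public D1 read() {
        List<Double> values = new ArrayList<>();
        try {
            for (String line = source.readLine(); line != null; line = source.readLine()) {
                if (!line.isBlank()) {
                    values.add(Double.parseDouble(line.trim()));
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new D1(values);
    }
}
```

Because D1Reader has a single method, a test harness can supply data with a lambda, e.g. D1Reader fixed = () -> new D1(List.of(7.0));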

Minimize Coupling

For example, don’t “new-up” (instantiate) your data objects inside the algorithm class; hand them to the algorithm from outside.
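A sketch of constructor injection for A2 from the question. The minimal D1, D3, R1 and R2 classes and the summing inside run() are placeholders, since the question does not describe what A2 actually computes.

```java
import java.util.List;

// Hypothetical minimal stand-ins for the question's types.
final class D1 {
    final List<Double> values;
    D1(List<Double> values) { this.values = List.copyOf(values); }
}

final class D3 {
    final List<Double> values;
    D3(List<Double> values) { this.values = List.copyOf(values); }
}

final class R1 {
    final double score;
    R1(double score) { this.score = score; }
}

final class R2 {
    final double value;
    R2(double value) { this.value = value; }
}

// A2 receives D1, D3 and R1_1..R1_n through its constructor instead of
// creating or loading them itself, so tests can pass in small fixtures.
final class A2 {
    private final D1 d1;
    private final D3 d3;
    private final List<R1> r1Runs;

    A2(D1 d1, D3 d3, List<R1> r1Runs) {
        this.d1 = d1;
        this.d3 = d3;
        this.r1Runs = List.copyOf(r1Runs);
    }

    // Placeholder computation: the real A2 logic is not given in the question.
    R2 run() {
        double total = sum(d1.values) + sum(d3.values)
                + r1Runs.stream().mapToDouble(r -> r.score).sum();
        return new R2(total);
    }

    private static double sum(List<Double> xs) {
        return xs.stream().mapToDouble(Double::doubleValue).sum();
    }
}
```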

Loose coupling suggests that you may end up with a small framework of sorts to drive the whole thing: perhaps a class to read data, one to build specific algorithm/data object combinations (Factory pattern), one to “get the data and algorithm together”, one to coordinate algorithm execution, et cetera.
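A compact sketch of such a framework for the A1 → A2 → A3 chain. DataSets, AlgorithmFactory and Coordinator are hypothetical names, and each algorithm is reduced to placeholder arithmetic so the wiring stays visible: the factory alone knows which data sets each algorithm needs, and the coordinator runs A1 once per parameter before A2 and A3.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.DoubleUnaryOperator;
import java.util.function.ToDoubleFunction;

// Hypothetical holder, filled by whatever class reads the files.
final class DataSets {
    final double[] d1, d2, d3, d4;

    DataSets(double[] d1, double[] d2, double[] d3, double[] d4) {
        this.d1 = d1;
        this.d2 = d2;
        this.d3 = d3;
        this.d4 = d4;
    }
}

// Factory: the only place that knows which data sets each algorithm needs.
// The arithmetic is a placeholder for the real algorithms.
final class AlgorithmFactory {
    private final DataSets data;

    AlgorithmFactory(DataSets data) { this.data = data; }

    DoubleUnaryOperator a1() {                 // parameter -> R1_i, uses D1, D2, D3
        double base = sum(data.d1) + sum(data.d2) + sum(data.d3);
        return parameter -> parameter * base;
    }

    ToDoubleFunction<List<Double>> a2() {      // R1_1..R1_n -> R2, uses D1, D3
        double base = sum(data.d1) + sum(data.d3);
        return r1 -> base + r1.stream().mapToDouble(Double::doubleValue).sum();
    }

    DoubleUnaryOperator a3() {                 // R2 -> R3, uses D1, D2, D4
        double base = sum(data.d1) + sum(data.d2) + sum(data.d4);
        return r2 -> base + r2;
    }

    private static double sum(double[] xs) { return Arrays.stream(xs).sum(); }
}

// Coordinator: runs A1 once per parameter, then A2, then A3.
final class Coordinator {
    private final AlgorithmFactory factory;

    Coordinator(AlgorithmFactory factory) { this.factory = factory; }

    double run(double[] parameters) {
        DoubleUnaryOperator a1 = factory.a1();
        List<Double> r1 = new ArrayList<>();
        for (double p : parameters) {
            r1.add(a1.applyAsDouble(p));
        }
        double r2 = factory.a2().applyAsDouble(r1);
        return factory.a3().applyAsDouble(r2);
    }
}
```

With D1 = {1}, D2 = {2}, D3 = {3}, D4 = {4} and parameters {1, 2}, run returns 29.0 under this placeholder arithmetic.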

Think OO all the way down into details

Do not play a “more vs. fewer” classes game; that is simply wrong-headed. Design a good structure (class hierarchy) for your data that strongly adheres to SRP. That is, data of any complexity is likely to be a composite of other types, each handling its own behavior. Done well, you will be amazed at how simple, clean, and small the code becomes, both in the containing data class and in the data clients.


Filed under: softwareengineering - @ 12:07

Tags: data, design, java
