I know three terms that have to do with storing objects or data, but I'm not sure I understand the exact differences between them. Please confirm whether what I'm about to write is accurate.
A data structure is a group of data types or objects of the same type, grouped together in some formation.
A collection is the same as a data structure (I don’t mean a Java Collection, I mean a collection in the more general meaning).
An aggregate is an object containing a collection or data structure, providing an interface to communicate with the inner data structure. For example, the object someContainer, which contains an inner ArrayList, is an aggregate because it contains a data structure, encapsulates it, and provides an interface to manipulate it.
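A minimal Java sketch of what such a someContainer might look like. The class name SomeContainer and its methods are hypothetical, chosen only to show a private ArrayList hidden behind a small interface:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class AggregateDemo {

    // Hypothetical class standing in for the question's "someContainer":
    // it owns an ArrayList and exposes only a narrow interface to it.
    static class SomeContainer {
        // The inner data structure is private, so callers cannot reach it directly.
        private final List<String> items = new ArrayList<>();

        void add(String item) {
            items.add(item);
        }

        boolean contains(String item) {
            return items.contains(item);
        }

        int size() {
            return items.size();
        }

        // Hand out a read-only view instead of the list itself.
        List<String> view() {
            return Collections.unmodifiableList(items);
        }
    }

    public static void main(String[] args) {
        SomeContainer someContainer = new SomeContainer();
        someContainer.add("a");
        someContainer.add("b");
        System.out.println(someContainer.size());        // prints 2
        System.out.println(someContainer.contains("a")); // prints true
    }
}
```

Because callers only see add, contains, size and a read-only view, the ArrayList could later be swapped for another structure without changing them.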
Is all of this accurate?
A Data Structure is any structure that holds data. Data Structures are distinguished from each other by their memory usage and performance characteristics. For example, looking up a key in a Hash Table takes O(1) time on average, while in a balanced binary tree it takes O(log n).
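Java's standard library ships both structures, so the comparison can be tried directly: HashMap is backed by a hash table and TreeMap by a red-black tree (a balanced binary search tree). A small sketch; the index helper is made up for the example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class LookupDemo {
    // Puts each word into the given map, using its position as the value.
    static Map<String, Integer> index(Map<String, Integer> target, String... words) {
        for (int i = 0; i < words.length; i++) {
            target.put(words[i], i);
        }
        return target;
    }

    public static void main(String[] args) {
        // Hash table: get() is O(1) on average,
        // but slows down when many keys land in the same bucket.
        Map<String, Integer> hashed = index(new HashMap<>(), "pear", "apple", "fig");
        // Red-black tree: get() is O(log n), and keys are kept in sorted order.
        Map<String, Integer> sorted = index(new TreeMap<>(), "pear", "apple", "fig");

        System.out.println(hashed.get("fig")); // prints 2
        System.out.println(sorted.get("fig")); // prints 2
        System.out.println(sorted.keySet());   // prints [apple, fig, pear]
    }
}
```

Both maps answer the same queries; the trade-off is the tree's slower lookup in exchange for ordered iteration.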
A Collection is any data structure that can hold zero or more data items. Generally, the data items will be of the same type or, in languages supporting inheritance, derived from some common ancestor type. Collections can utilize certain data structures, such as Hash Tables and Balanced Binary Trees, to improve their memory and performance characteristics.
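As an illustration in Java, a List&lt;Number&gt; can hold Integer, Double and Long values because they share the ancestor type Number, and the same Set interface is offered over a hash table (HashSet) and a balanced tree (TreeSet). The total helper is hypothetical, written only for this sketch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class CollectionDemo {
    // Sums a collection whose elements share the common ancestor type Number.
    static double total(List<Number> numbers) {
        double sum = 0;
        for (Number n : numbers) {
            sum += n.doubleValue();
        }
        return sum;
    }

    public static void main(String[] args) {
        // Integer, Double and Long are different types, but all derive from Number,
        // so they can live in one List<Number>.
        List<Number> numbers = new ArrayList<>();
        numbers.add(1);
        numbers.add(2.5);
        numbers.add(3L);
        System.out.println(total(numbers)); // prints 6.5

        // The same Set interface, backed by different data structures:
        // HashSet uses a hash table (no ordering guarantee),
        // TreeSet uses a red-black tree (elements kept sorted).
        Set<String> hashSet = new HashSet<>(Arrays.asList("pear", "apple", "fig"));
        Set<String> treeSet = new TreeSet<>(Arrays.asList("pear", "apple", "fig"));
        System.out.println(hashSet.size()); // prints 3
        System.out.println(treeSet);        // prints [apple, fig, pear]
    }
}
```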
An Aggregate, in the general sense, is a whole combined from individual elements (the individual elements can form a collection). Examples include Sum and Average functions, the results of those functions, and the process by which data is combined in a data warehouse to speed up retrieval.
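The data-warehouse sense can be sketched as a pre-aggregation step: many detail rows are combined ahead of time into one total per group, so later queries read the small summary instead of rescanning every row. The Sale class and its region/amount fields are hypothetical, and a real warehouse would typically do this in SQL or an ETL tool rather than with Java streams:

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

public class PreAggregationDemo {
    // Hypothetical detail row: one sale, tagged with the region it happened in.
    static class Sale {
        final String region;
        final int amount;

        Sale(String region, int amount) {
            this.region = region;
            this.amount = amount;
        }
    }

    // Combine many rows into one total per region (a summary "table"),
    // kept in a TreeMap so the regions come out sorted.
    static Map<String, Integer> totalsByRegion(List<Sale> sales) {
        return sales.stream()
                .collect(Collectors.groupingBy(
                        (Sale s) -> s.region,
                        TreeMap::new,
                        Collectors.summingInt((Sale s) -> s.amount)));
    }

    public static void main(String[] args) {
        List<Sale> sales = Arrays.asList(
                new Sale("east", 10),
                new Sale("west", 5),
                new Sale("east", 7));
        System.out.println(totalsByRegion(sales)); // prints {east=17, west=5}
    }
}
```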
A Data Structure is a computer science term for a way of organizing data, together with the operations for storing data in it and retrieving data from it.
A Collection is a Data Structure which can hold an arbitrary number of objects. There are many types of collections, differentiated by their data organization and by their store/retrieve algorithms, giving different complexity in different use-cases. For example:
- an Array has random-access reads (if you know the index of your item, you can retrieve it in O(1)), but when you add an item beyond its capacity, the contents must be copied into a new array with double the capacity, so a single write is O(n) in the worst case (appends are still O(1) amortized).
- a Linked list, on the other hand, doesn't have random access, so retrieving an element by position takes O(n), but adding an element at the head (or next to a node you already hold) is O(1).
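Both behaviours can be seen in a minimal Java sketch. DynamicArray and LinkedIntList are hypothetical, stripped-down classes written for illustration; in practice you would use java.util.ArrayList and java.util.LinkedList, which follow the same ideas (ArrayList grows by about 1.5x rather than doubling):

```java
import java.util.Arrays;

public class ArrayVsListDemo {
    // Minimal growable array: O(1) indexed reads, and writes that occasionally
    // copy everything into an array of double capacity.
    static class DynamicArray {
        private int[] data = new int[2];
        private int size = 0;

        int get(int index) {
            if (index < 0 || index >= size) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return data[index]; // direct offset into the backing array: O(1)
        }

        void add(int value) {
            if (size == data.length) {
                // Full: copy all n elements into a twice-as-large array, O(n) for this call.
                data = Arrays.copyOf(data, data.length * 2);
            }
            data[size++] = value;
        }

        int capacity() {
            return data.length;
        }
    }

    // Minimal singly linked list: adding at the head is O(1),
    // but reaching the i-th element means walking i links, O(n).
    static class LinkedIntList {
        private static class Node {
            final int value;
            final Node next;

            Node(int value, Node next) {
                this.value = value;
                this.next = next;
            }
        }

        private Node head;

        void addFirst(int value) {
            head = new Node(value, head); // no copying, no shifting
        }

        int get(int index) {
            Node current = head;
            for (int i = 0; i < index && current != null; i++) {
                current = current.next;
            }
            if (index < 0 || current == null) {
                throw new IndexOutOfBoundsException("index " + index);
            }
            return current.value;
        }
    }

    public static void main(String[] args) {
        DynamicArray array = new DynamicArray();
        for (int i = 0; i < 5; i++) {
            array.add(i * 10);
        }
        System.out.println(array.get(3));     // prints 30
        System.out.println(array.capacity()); // prints 8 (grew 2 -> 4 -> 8)

        LinkedIntList list = new LinkedIntList();
        list.addFirst(3);
        list.addFirst(2);
        list.addFirst(1);
        System.out.println(list.get(2));      // prints 3
    }
}
```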
An Aggregate function is a database term for taking vector data (an arbitrary number of items) and, using some operation (addition, average, etc.), producing a scalar result (a single item, or a fixed number of items). Examples are count, sum, average and max. One feature of aggregate functions is that they do not retain all the information they receive, only some aspect of it.
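Outside a database, count, sum, average and max have a close analogue in Java's IntSummaryStatistics, which collapses any number of ints into a few scalars in one pass. A small sketch; the summarize helper is made up for the example:

```java
import java.util.IntSummaryStatistics;
import java.util.stream.IntStream;

public class AggregateFunctionsDemo {
    // Reduce an arbitrary number of values to a fixed handful of numbers.
    static IntSummaryStatistics summarize(int... values) {
        return IntStream.of(values).summaryStatistics();
    }

    public static void main(String[] args) {
        IntSummaryStatistics stats = summarize(4, 8, 15, 16, 23, 42);
        System.out.println(stats.getCount());   // prints 6
        System.out.println(stats.getSum());     // prints 108
        System.out.println(stats.getAverage()); // prints 18.0
        System.out.println(stats.getMax());     // prints 42
        // The individual values (and their order) cannot be recovered from these results.
    }
}
```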