When does ‘optimizing code’ == ‘structuring data’?

A recent article by Y Combinator lists a comment giving the principles of a great programmer.

#7. Good programmer: I optimize code. Better programmer: I structure data. Best programmer: What’s the difference?

Acknowledging that these are subjective and contentious concepts, does anyone have a position on what this means? I do, but I’d like to edit this question later with my thoughts so as not to predispose the answers.

2

Nine times out of ten, when you structure your code/models well, optimization will become obvious. How many times have you seen a hornet’s nest of code and found it totally suboptimal, where upon restructuring it, lots of redundancies became extremely obvious?

A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.
– Antoine de Saint-Exupery

A well-structured system will be minimal in nature, and because of its minimal nature it will be optimized: how little there is to it relates directly to how little it does to accomplish its goal.

Edit:
To expand on the point others have taken away from this, it’s also completely accurate to see the statement as identifying the relation between code and data. That relation is this: if you change the structure of your data, you will need to change your code to respect the altered structure. If you wish to optimize your code, chances are you will need to change the structure of your data so that your code can handle the data more efficiently.

That said, there is a totally separate possibility that was being alluded to here: this fellow, having relations with YCombinator, may be referring to code AS data in the LISP tradition of homoiconicity. In my mind it’s a stretch to surmise this as the meaning, but it is YCombinator, so I wouldn’t rule out that the quote is simply saying LISPers are the “Best Programmers”.

4

I think the author is hinting that any restructuring of the data leads to code restructuring. Therefore, restructuring the data with the goal of optimizing your system will force you to optimize your code as well, prompting the “what’s the difference?” response.

Note that an “uber-excellent programmer” may reply to “what’s the difference?” that there is some difference left in there: once you venture into optimizing for improved use of the CPU cache, you may keep the layout of your data structures the same, yet changing the order in which you access them can make a great deal of difference.
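To make the access-order point concrete, here is a minimal sketch. The `grid` data and both function names are made up for illustration. The data layout is identical in both cases; only the traversal order changes. In Python the interpreter overhead tends to hide the cache effect, so treat this as a picture of the idea rather than a benchmark; the difference is usually far more visible with contiguous arrays in a language like C.

```python
# Hypothetical data: a small 2D grid stored row by row.
ROWS, COLS = 300, 300
grid = [[r * COLS + c for c in range(COLS)] for r in range(ROWS)]


def sum_row_major(g):
    """Visit elements in storage order: finish one row before the next."""
    total = 0
    for row in g:
        for value in row:
            total += value
    return total


def sum_column_major(g):
    """Visit the same elements column by column, hopping between rows."""
    total = 0
    for c in range(len(g[0])):
        for r in range(len(g)):
            total += g[r][c]
    return total
```

Both functions return the same total; any speed difference comes purely from the order of access, not from the structure of the data.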

3

Consider the most obvious example of this – “searching for user data is too slow!”

If your user data is not indexed or at least sorted, then restructuring your data will quickly yield increased code performance. If the data is structured properly and you’re just iterating through the collection (rather than using the indexes or doing something like a binary search) then modifying the code yields increased code performance.
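A rough sketch of the two cases above, using made-up user records and hypothetical function names: a linear scan, a binary search over data that is already sorted (a code change only), and a dictionary index (a data change).

```python
from bisect import bisect_left

# Hypothetical user records (user_id, name), already sorted by user_id.
users = [(uid, f"user{uid}") for uid in range(0, 1000, 2)]


def find_linear(records, user_id):
    """Baseline: scan every record until a match turns up."""
    for uid, name in records:
        if uid == user_id:
            return name
    return None


# Same sorted data, better code: binary search over the ids.
ids = [uid for uid, _ in users]


def find_sorted(records, sorted_ids, user_id):
    i = bisect_left(sorted_ids, user_id)
    if i < len(sorted_ids) and sorted_ids[i] == user_id:
        return records[i][1]
    return None


# Restructured data: a hash index keyed by user_id.
index = dict(users)


def find_indexed(idx, user_id):
    return idx.get(user_id)
```

All three return the same answer for any id; what differs is how much work each one does to get there.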

Programmers are problem solvers. While it is useful to distinguish between algorithms and data structures, they cannot often exist in isolation. The best programmers know this, and don’t isolate themselves unnecessarily.

I don’t agree with the statement mentioned above, at least not without explanation. I see coding as an activity that involves using data structures. Data structures generally influence the code. So there is a difference between the two, in my opinion.

I think the author should have written the last part as “Best programmer: I optimize both.”

There is a great book (at least it was when published) by Niklaus Wirth called Algorithms + Data Structures = Programs.

Optimizing code can sometimes improve speed by a factor of two, and occasionally by a factor of ten or even twenty, but that’s about it. That may sound like a lot, and if 75% of a program’s execution time is spent in a five-line routine whose speed could easily be doubled, such an optimization may well be worth making. On the other hand, one’s selection of data structures may affect execution speed by many orders of magnitude. A modern hyper-optimized multi-threaded processor running super-optimized code to look up data by key in a 10,000,000-item linear linked list stored in RAM would be slower than a much slower processor running a rather simply coded nested hash table. Indeed, if one had the data laid out properly, even a 1980s computer fetching data from a hard drive might beat the modern CPU using the inferior data structure.
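The linked-list comparison can be sketched by counting work instead of timing it. Everything below is illustrative: the `Node` class, the helper names, and the size `N`, which is scaled down from the 10,000,000 items mentioned above so the example runs quickly. A plain `dict` stands in for the hash table.

```python
class Node:
    """One cell of a singly linked list holding a key/value pair."""

    def __init__(self, key, value, next_node=None):
        self.key = key
        self.value = value
        self.next = next_node


def build_linked_list(n):
    """Build a list with keys 0..n-1 in order by prepending from the end."""
    head = None
    for key in range(n - 1, -1, -1):
        head = Node(key, key * 2, head)
    return head


def linked_lookup(head, key):
    """Return (value, nodes_visited) for a lookup by key."""
    visited = 0
    node = head
    while node is not None:
        visited += 1
        if node.key == key:
            return node.value, visited
        node = node.next
    return None, visited


N = 10_000  # assumption: scaled down from 10,000,000
head = build_linked_list(N)
table = {key: key * 2 for key in range(N)}

# Worst case for the list: the key sits in the last node.
value, visited = linked_lookup(head, N - 1)
```

Here `visited` grows in step with `N`, while `table[N - 1]` is a single hashed lookup whose cost stays roughly constant as `N` grows. No amount of tuning inside `linked_lookup` changes that relationship.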

That having been said, designing efficient data structures often requires more complex trade-offs than optimizing code. For example, in many cases the data structures which allow data to be accessed most efficiently are less efficient to update (sometimes by orders of magnitude) than those which allow fast updates, and those which allow the fastest updates may allow the slowest access. Further, in many cases, data structures which are optimal for large data sets may be comparatively inefficient with small ones. A good programmer should weigh those competing factors against the amount of programmer time required to implement and maintain various data structures, and strike a decent balance among them.

To articulate my best guess at what the article means, I’ll assume an unspoken subtext (which seems to be missing in the article) that any programmer should understand about optimization:

  • optimization comes only after you’ve got the program up and running correctly:
    • make it run correctly, then make it run fast
    • this principle is the point of Knuth’s maxim, “premature optimization is the root of all evil”
  • if and when you’ve determined that optimization is not premature, you must measure properly first to determine what actually needs optimizing, and then again and again during optimization, to tell what effects your attempts at optimization are having.
    • if your code runs in development, the profiler is your friend in this.
    • if your code runs in production, you must instrument your code, and make friends with your logging system instead.
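As a minimal sketch of the “measure first” step in development, the standard-library `cProfile` and `pstats` modules can show where the time goes. The workload functions here (`slow_part`, `fast_part`, `workload`) and the `profile_report` helper are hypothetical stand-ins for real code.

```python
import cProfile
import io
import pstats


def slow_part(n):
    """Stand-in for the routine that actually dominates the runtime."""
    return sum(i * i for i in range(n))


def fast_part(n):
    """Stand-in for code that is not worth optimizing."""
    return n * 2


def workload():
    return slow_part(200_000) + fast_part(10)


def profile_report(func):
    """Run func under the profiler; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(10)
    return result, stream.getvalue()


result, report = profile_report(workload)
```

Printing `report` lists the functions by cumulative time, which is the evidence to collect before touching any code, and again after each change.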

Now, then: your measurements will tell you where in your code the machine is burning the most cycles. A “good” programmer will focus on optimizing those parts of the code, rather than wasting time optimizing the irrelevant parts.

However, you can often make larger gains by looking at the system as a whole, and finding some way to allow the machine to do less work. Frequently, these changes require reworking the organization of your data; thus, a “better” programmer will find himself structuring data more often than not.

The “best programmer” will have a thorough mental model of how the machine works, a good grounding in algorithm design, and a practical understanding of how they interact. This allows him to consider the system as an integrated whole — he will see no difference between optimizing the code and the data, because he evaluates them at an architectural level.

Data structures drive a lot of things related to performance. I think that we can look at problems long and hard with a preconceived idea about the ideal data structure and, within that frame of thinking, even create proofs (often by induction) of optimality. For example, if we put a sorted list into an array and evaluate things like the cost of inserting an element, we might decide that on average we need to shift half of the array for each insertion. With binary search, we can find a matching item (or determine there is none) in about log₂ n steps.
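The two costs just described can be counted directly. This is an illustrative sketch with hypothetical helper names; it counts element shifts and probes instead of measuring time.

```python
from bisect import bisect_left


def insert_sorted(arr, value):
    """Insert into a sorted list; return how many elements had to shift."""
    pos = bisect_left(arr, value)
    arr.insert(pos, value)
    # Everything that now sits after pos was moved one slot to the right.
    return len(arr) - 1 - pos


def binary_search_steps(arr, target):
    """Return (found, probes) for a binary search over a sorted list."""
    lo, hi, probes = 0, len(arr), 0
    while lo < hi:
        probes += 1
        mid = (lo + hi) // 2
        if arr[mid] == target:
            return True, probes
        if arr[mid] < target:
            lo = mid + 1
        else:
            hi = mid
    return False, probes


arr = list(range(0, 2000, 2))      # 1,000 sorted even numbers
shifts = insert_sorted(arr, 1)     # lands near the front, so almost all shift
found, probes = binary_search_steps(arr, 1998)
```

Inserting near the front shifts nearly the whole array and inserting at the end shifts nothing, which is where the “half on average” estimate comes from for uniformly distributed insert positions. The probe count for a list of about 1,000 items stays at or below 10.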

Alternatively, we can defer our decision about the data structure (avoiding premature optimization) and first study the data coming in and the context where we will use it: how big it is, what latencies occur and which ones matter to users, and how much memory we have compared with what we would use under the data representations we know or can devise.

In an area like sorting and searching, there is a lot to know. Truly great programmers have been working on this a long time. Understanding these problems well is useful, and it is a great thing if you know more methods than when you finished undergrad data structures class. Binary trees can provide superior performance for insertions in exchange for higher memory use. Hash tables provide even bigger improvements, but for more memory still. A radix tree and radix sort can carry improvements even further.

Creative structuring of the data can help reframe a problem and open the door to new algorithms that make hard applications faster and sometimes impossible tasks possible.

Best programmer: What’s the difference?

Best programmer? No. Lousy programmer. I’m assuming the word “optimization” means those things that programmers typically try to optimize: memory or CPU time. In this sense, optimization goes against the grain of almost every other software metric. Understandability, maintainability, testability, etc.: these all get short shrift when optimization is the goal, unless what one is trying to optimize is human understandability, maintainability, testability, etc. Not to mention cost. Writing a speed- or space-optimal algorithm costs considerably more in developer time than naively coding the algorithm as presented in some text or journal. A lousy programmer doesn’t know the difference. A good one does. The best programmer knows how to determine exactly what needs to be optimized and does so judiciously.
