How do I know if my code is running fast enough? Is there a measurable way to test the speed & performance of my code?
For example, I have a script that reads CSV files and writes new CSV files while using NumPy to calculate statistics. Below, I’m using cProfile on my Python script, but after seeing the resulting stats, what do I do next? In this case, I can see that the methods mean, astype and reduce from NumPy, the method writerow from csv, and the method append of Python lists take a significant portion of the time.
How can I know if my code can improve or not?
python -m cProfile -s cumulative OBSparser.py
176657699 function calls (176651606 primitive calls) in 528.419 seconds
Ordered by: cumulative time
  ncalls   tottime  percall   cumtime  percall  filename:lineno(function)
       1     0.003    0.003   528.421  528.421  OBSparser.py:1(<module>)
       1     0.000    0.000   526.874  526.874  OBSparser.py:45(start)
       1   165.767  165.767   526.874  526.874  OBSparser.py:48(parse)
 7638018     6.895    0.000   179.890    0.000  {method 'mean' of 'numpy.ndarray' objects}
 7638018    56.780    0.000   172.995    0.000  _methods.py:53(_mean)
 7628171    57.232    0.000    57.232    0.000  {method 'writerow' of '_csv.writer' objects}
 7700878    52.580    0.000    52.580    0.000  {method 'reduce' of 'numpy.ufunc' objects}
 7615219    50.640    0.000    50.640    0.000  {method 'astype' of 'numpy.ndarray' objects}
 7668436    28.595    0.000    36.853    0.000  _methods.py:43(_count_reduce_items)
15323753    31.503    0.000    31.503    0.000  {numpy.core.multiarray.array}
45751805    13.439    0.000    13.439    0.000  {method 'append' of 'list' objects}
Can somebody explain the best practices?
How do I know if my code is running fast enough?
That very much depends on your use case: your program runs for 1.4 hours, which might or might not be fast enough. If this is a one-time process, 1.4 hours is not that much, and spending any time on optimization is hardly worth the investment. On the other hand, if this is a process that should run, say, once every hour, it is clearly worth finding a less time-consuming approach.
Is there a measurable way to test the speed & performance of my code?
Yes: profiling, and you’ve already done that. That’s a good start.
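Alongside the profiler, it helps to record one reproducible baseline number you can compare against after every change. A minimal sketch, where process() is a hypothetical stand-in for your real workload:

```python
import time

def process(rows):
    # hypothetical stand-in for the real parse/compute work
    return [sum(r) / len(r) for r in rows]

def baseline(fn, data, repeats=3):
    # run the workload a few times and keep the best wall-clock time;
    # the minimum of several runs is the least noisy estimate of true cost
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        fn(data)
        best = min(best, time.perf_counter() - start)
    return best

rows = [list(range(1, 51))] * 10_000
print(f"baseline: {baseline(process, rows):.4f} s")
```

Re-run this after every optimization attempt so you can tell, in numbers, whether a change actually helped.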
what do I do next?
Best practices include:
1. measure baseline performance (before any optimization)
2. analyze the parts where the program spends most of its time
3. reduce run-time complexity (the Big-O kind)
4. check for the potential of parallel computation
5. compare against baseline performance
You have already done 1. So let’s move to 2.
Analysis
In your case the program spends most of its time in line OBSparser.py:48, of which a third is spent calculating the mean 7,638,018 times.
As the profiler output shows, this is on an ndarray, i.e. using numpy, and it doesn’t look like it’s taking a lot of time on a per-call basis. A quick calculation confirms that:
179.89 s / 7,638,018 calls ≈ 23.6 microseconds per call
Since that’s already implemented in C code (NumPy), there is likely not much you can do to improve the per-call performance by changing the actual mean code (or using another library).
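You can reproduce that per-call figure yourself with timeit; the exact numbers are machine-dependent, and the 50-element array below is only an assumption about the group size:

```python
import timeit
import numpy as np

a = np.random.rand(50)  # assumed group size, guessed from the profile

n = 20_000
per_call = timeit.timeit(a.mean, number=n) / n
print(f"{per_call * 1e6:.1f} microseconds per call (numpy)")

# compare against plain Python on the same data: for tiny arrays, the
# fixed dispatch overhead of the numpy call dominates the actual work
lst = a.tolist()
per_call_py = timeit.timeit(lambda: sum(lst) / len(lst), number=n) / n
print(f"{per_call_py * 1e6:.1f} microseconds per call (pure Python)")
```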
However, ask yourself several questions:
- How can the number of calls to .mean() be reduced?
- Can the calls to .mean() be implemented more efficiently?
- Could the data be grouped and each group be processed independently?
- ...ask more questions
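On the first question: if the groups happen to be equal-sized, all the per-group calls can be collapsed into a single reduction along an axis. A sketch with made-up data:

```python
import numpy as np

rng = np.random.default_rng(0)
groups = [rng.random(50) for _ in range(1000)]  # 1000 groups of 50 values

# per-group calls: 1000 separate invocations of .mean()
means_loop = np.array([g.mean() for g in groups])

# one call: stack the equal-sized groups into a 2-D array
# and reduce along axis 1, paying the dispatch overhead once
means_vec = np.stack(groups).mean(axis=1)

assert np.allclose(means_loop, means_vec)
```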
Other calls worth looking at are those to .astype() and reduce; I focused on .mean() simply for illustration.
Reducing complexity
Not knowing what your code actually does, here’s my five cents on the specifics anyway:
On 2., a quick check on my i7 core reveals that for ndarray.mean() to take 20-odd microseconds, the array needs to hold around 50 values. So I’m guessing you are grouping values and then calling .mean() on every group.
There might be more efficient ways – a search on numpy group aggregate performance or some variant of that might find you some helpful pointers.
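One such pointer, for the common case where each value carries an integer group id: np.bincount can aggregate all groups in two vectorized passes instead of one .mean() call per group. A sketch with made-up data:

```python
import numpy as np

# values labelled with an integer group id (hypothetical layout)
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
group_ids = np.array([0, 0, 1, 1, 1, 2])

# per-group sum and count in two vectorized passes, then divide:
# equivalent to calling .mean() on each group, without per-call overhead
sums = np.bincount(group_ids, weights=values)
counts = np.bincount(group_ids)
group_means = sums / counts
print(group_means)  # group means: 1.5, 4.0, 6.0
```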
Parallel computation
On 3., I’m guessing multi-processing is unlikely to be a solution here, since your computations seem mostly CPU-bound and the overhead of launching separate tasks and exchanging data probably outweighs the benefits.
However there might be some use of SIMD-approach, i.e. vectorization. Again, just a hunch.
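That hunch is easy to test: NumPy ufuncs run their loops in C, where SIMD can apply, so replacing Python-level element loops with whole-array expressions is usually the first win. A sketch:

```python
import numpy as np

x = np.arange(1_000_000, dtype=np.float64)

# element by element in Python: one interpreter round-trip per value
def scale_loop(arr):
    out = np.empty_like(arr)
    for i, v in enumerate(arr):
        out[i] = v * 2.0 + 1.0
    return out

# one vectorized expression: the loop runs in C, where SIMD can apply
def scale_vec(arr):
    return arr * 2.0 + 1.0

assert np.allclose(scale_loop(x[:1000]), scale_vec(x[:1000]))
```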
Compare against baseline performance
To reduce the time it takes to re-profile, consider subsetting your data such that the performance behavior is still visible (i.e. ~23 µs per call to .mean()) but the total running time stays under maybe 1-2 minutes, or even less. This will help you evaluate several approaches before applying them to your program in full. There is no use in running the full process over and over again just to test some small optimization.
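The subsetting can be done without touching the full pipeline by profiling only a slice of the input; parse() and read_rows() below are hypothetical stand-ins for your parser and CSV reader:

```python
import cProfile
import pstats
from itertools import islice

def parse(rows):
    # hypothetical stand-in for the real OBSparser parse step
    return [sum(map(float, r)) for r in rows]

def read_rows():
    # hypothetical stand-in for the CSV reader; yields rows lazily
    for i in range(1_000_000):
        yield [i, i + 1, i + 2]

# profile only the first 10,000 rows instead of the whole file
profiler = cProfile.Profile()
profiler.enable()
parse(islice(read_rows(), 10_000))
profiler.disable()

pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

Because the reader is a generator, islice stops it after 10,000 rows; the per-call timings stay representative while the total run is seconds, not hours.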
You have forgotten the most basic question:
Is the speed satisfactory for the use case?
- If the answer is “yes” -> don’t profile
- If no, you might look at your table.
But honestly, it does not look terribly useful, because almost all the time is spent in OBSparser.py:48(parse), which takes a LONG time. I would suggest you refactor that method into several smaller methods.
You might use a visualizer to inspect the results; PyCharm has good support for that use case.
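Even without a GUI, the profile can be dumped to a file and inspected with the standard pstats module; a self-contained sketch with a dummy work() function standing in for the real script:

```python
import cProfile
import pstats

def work():
    # dummy CPU-bound function standing in for the real script
    return sum(i * i for i in range(100_000))

# in-process equivalent of `python -m cProfile -o out.prof OBSparser.py`
profiler = cProfile.Profile()
profiler.runcall(work)
profiler.dump_stats("out.prof")

stats = pstats.Stats("out.prof")
stats.strip_dirs().sort_stats("tottime").print_stats(5)  # top 5 by own time
stats.print_stats("work")  # show only entries whose name matches "work"
```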
This is what non-functional requirements of performance are for.
The notion of fast enough is not technical per se. It depends on users’ perception of your product, and should be translated through the requirements. This is the only objective way for you to tell whether your actual implementation is fast enough or not.
If you don’t have those requirements, anything else is speculation and unconstructive.
- The user tells you that the app feels slow, but at no point does anybody specify what slow means in terms of milliseconds, on which hardware, and for which feature. Unconstructive: you can’t improve the code based on that, and you essentially can’t tell that a revision ago the code was unacceptably slow, and that now it’s fast enough.
- You think a specific feature can run faster than it currently does? That’s premature optimization, and it goes against your users, who may not care at all about the speed of this feature, and may prioritize a specific bug, need a new feature, or need something else to be faster.
How can I know if my code can improve or not?
Assume it always can. Some of the techniques include:
- Rewriting code to use more memory but less CPU, or more CPU but less memory. This often leads to code which is very difficult to read, understand and maintain; this is one of the reasons why premature optimization should be avoided.
- Using different data structures.
- Relying on caching, precomputing stuff, or using OLAP cubes.
- Moving low level, even down to assembler.
- Not doing the task. At all. That’s the ultimate optimization: from N seconds to zero.
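The caching point above can be sketched with functools.lru_cache from the standard library; stats_for() is a hypothetical expensive computation that gets run once per distinct input:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stats_for(key):
    # hypothetical expensive computation, done once per distinct key
    return sum(i * i for i in range(key))

stats_for(1000)   # computed
stats_for(1000)   # served from the cache, no recomputation
print(stats_for.cache_info())  # hits=1, misses=1 at this point
```

This trades memory for CPU, so it belongs to the first bullet as well: the cache grows with the number of distinct keys.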
As others have noted, don’t optimize unless the speed is unsatisfactory.
You’ve moved on to the next step which is to profile.
Once you’ve profiled its time to look for possible optimization candidates:
- Your process runs for 528 seconds in total.
- You have one call to OBSparser.py:48(parse) using 166 seconds of its own time. If you could totally eliminate that time, you would reduce the total time by only 31%.
- You have a number of calls to routines consuming between 50 and 60 seconds each. Eliminating the time spent in any one of those would save about 10% of the total.
I don’t see any place you can significantly improve performance. With a lot of work, you might be able to gain a 10 to 20% performance improvement. Unless there are strong reasons to improve performance, I would consider the optimization done.
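That ceiling can be checked with Amdahl’s-law arithmetic on the numbers from the profile above:

```python
def overall_speedup(fraction, local_speedup):
    # Amdahl's law: only `fraction` of the run time benefits
    # from a `local_speedup`-fold improvement
    return 1 / ((1 - fraction) + fraction / local_speedup)

total = 528.419  # total run time from the profile, in seconds

# even eliminating parse's own 166 s entirely leaves most of the time
print(f"{(1 - 166 / total) * 100:.0f}% of the run time remains")

# a 10x speedup on a routine taking ~57 s (about 11% of the run)
print(f"{overall_speedup(57 / total, 10):.2f}x overall speedup")
```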
I usually don’t find optimization very useful if I haven’t identified a routine using at least 80% of the time. A tenfold performance improvement on such a routine will cut the time to 30% or less.
If you do find such a routine, look for a better algorithm. If you don’t, don’t waste your time.
It’s not about testing, it’s about tuning.
Since you’re doing a lot of I/O, any sort of “CPU profiler” is not what you want.
The method I always use is manual sampling (random pausing).
Here’s what I would do if I were you: Tune the program until it is as fast as possible.
Then if it is not fast enough to be satisfactory, get faster hardware.
The way I would do it is take a number of samples manually.
Some of them will be in the process of doing I/O.
If they are mostly in I/O, then I would ask if there is any way to avoid some of that I/O.
(Don’t assume ahead of time that all I/O it’s doing is necessary. You may find that it’s doing something that could actually be avoided.)
If you can avoid some of the I/O, that will speed you up accordingly.
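One avoidable cost in the profile above is the 7.6 million Python-level writerow calls; csv.writer.writerows moves that loop into the csv module. A sketch writing to an in-memory buffer:

```python
import csv
import io

rows = [[i, i * 2] for i in range(5)]

buf = io.StringIO()
writer = csv.writer(buf)

# one writerow call per row: 7.6 million Python-level calls in the profile
# for row in rows:
#     writer.writerow(row)

# a single writerows call pushes the row loop into the csv module
writer.writerows(rows)
print(buf.getvalue())
```

For real files, also make sure output is buffered (the default for open()) rather than flushed per row.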
Now look at the samples landing in non-I/O processing.
Is it significant, like it takes more than 10% of the samples?
If so, is there any way to speed that up, by avoiding some of the work?
Each time you find something to improve, fix the program and run it all over again.
You may be pleasantly surprised that, since the last fix, some new thing shows up to fix, that you didn’t see before, but now it’s important.
When you can’t find anything more to fix, you can declare the program “as fast as you or probably anyone can make it”.
Then if it’s still not fast enough, your only option is faster CPU, solid-state disk drive, or whatever.