You can achieve the best performance from your C6000 code if you follow the flow below when you write and debug your code. Code development for the C6000 proceeds in three phases:
Phase 1: Develop C/C++ code. You can develop your C/C++ code for phase 1 without any knowledge of the C6000. Compile with the --opt_level=3 option and without any --debug option, then identify any inefficient areas in your C/C++ code. See Section 4.17 for more information about debugging and profiling optimized code. To improve the performance of your code, proceed to phase 2.
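As an illustration only, the following is the kind of straightforward C you might write in phase 1: a dot product that uses no C6000-specific features. The function name dot_product is hypothetical, and the build line in the comment assumes the cl6x compiler shell; it simply restates the --opt_level=3, no-debug guidance above.

```c
/* Phase 1 sketch: plain, portable C with no C6000-specific features.
 * Assumed build on the target: cl6x --opt_level=3 dotp.c (no debug option). */
int dot_product(const short *a, const short *b, int n)
{
    int sum = 0;
    int i;
    /* Each 16-bit product is promoted to int before it is accumulated. */
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Because this version makes no assumptions about the target, it can also serve later as a reference when you check refined or rewritten versions of the same loop.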
Phase 2: Refine C/C++ code. In phase 2, use the intrinsics and compiler options described in this book to improve your C/C++ code, then check the performance of the altered code. Refer to the TMS320C6000 Programmer's Guide for hints on refining C/C++ code. If your code is still not as efficient as you would like it to be, proceed to phase 3.
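A minimal sketch of a phase 2 refinement, assuming a loop that adds arrays of packed 16-bit pairs. It uses the C99 `restrict` qualifier to tell the compiler the arrays do not overlap, the `_add2` intrinsic to perform two 16-bit additions with one instruction, and the MUST_ITERATE pragma to pass trip-count information. The function name add_pairs, the trip-count bounds (8 to 1024, a multiple of 8), and the portable add2_emulated() fallback used off-target are hypothetical; the C6000-specific parts are guarded by `_TMS320C6X`, which the TI compiler predefines for C6000 targets. Check the intrinsic and pragma descriptions in this book for the exact forms your compiler version supports.

```c
#if defined(_TMS320C6X)
/* On the C6000, _add2 is a compiler intrinsic that maps to the ADD2 instruction. */
#define ADD2(a, b) _add2((a), (b))
#else
/* Hypothetical host fallback: add the upper and lower 16-bit halves
 * independently; a carry out of the lower half does not reach the upper half. */
static inline int add2_emulated(int a, int b)
{
    unsigned int ua = (unsigned int)a;
    unsigned int ub = (unsigned int)b;
    unsigned int lo = (ua + ub) & 0xFFFFu;
    unsigned int hi = ((ua >> 16) + (ub >> 16)) & 0xFFFFu;
    return (int)((hi << 16) | lo);
}
#define ADD2(a, b) add2_emulated((a), (b))
#endif

/* restrict promises that out, a, and b never overlap, so the compiler
 * does not have to assume a store to out[i] can change a[] or b[]. */
void add_pairs(int *restrict out, const int *restrict a,
               const int *restrict b, int n)
{
    int i;
#if defined(_TMS320C6X)
    /* Assumed caller guarantee: 8 <= n <= 1024 and n is a multiple of 8. */
    #pragma MUST_ITERATE(8, 1024, 8)
#endif
    for (i = 0; i < n; i++)
        out[i] = ADD2(a[i], b[i]);
}
```

On a host build the pragma and intrinsic are compiled out, so the same source can be unit-tested before you measure the refined loop on the target.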
Phase 3: Write linear assembly. In phase 3, you extract the time-critical areas from your C/C++ code and rewrite them in linear assembly. You can use the assembly optimizer to optimize this code. When you write your first pass of linear assembly, you do not need to be concerned with the pipeline structure or with assigning registers. Later, when you refine your linear assembly code, you might want to add more detail, such as partitioning registers.
Improving performance in this phase takes more time than in phase 2, so refine your C/C++ code as much as possible before moving to phase 3. That way, only smaller sections of code remain to be rewritten in linear assembly.
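One way to prepare for phase 3, sketched here in C under stated assumptions: isolate the time-critical loop behind a single function interface so that a linear-assembly version can replace it without touching the callers, and keep the C version as a reference for checking the rewrite. The names dotp_c, dotp_sa, energy, dotp_matches_reference, and the USE_LINEAR_ASM macro are hypothetical; dotp_sa stands for a C-callable routine you would write in a .sa file (for example, between .cproc and .endproc directives) and pass to the assembly optimizer.

```c
#if defined(USE_LINEAR_ASM)
/* Hypothetical C-callable routine implemented in linear assembly (.sa file). */
extern int dotp_sa(const short *a, const short *b, int n);
#define DOTP dotp_sa
#else
#define DOTP dotp_c
#endif

/* Reference C version, kept from phase 1/2 to validate the rewrite. */
int dotp_c(const short *a, const short *b, int n)
{
    int sum = 0;
    int i;
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Callers use DOTP only, so swapping implementations needs no caller changes. */
int energy(const short *x, int n)
{
    return DOTP(x, x, n);
}

/* Compares whichever implementation is selected against the C reference. */
int dotp_matches_reference(const short *a, const short *b, int n)
{
    return DOTP(a, b, n) == dotp_c(a, b, n);
}
```

On a host build USE_LINEAR_ASM is left undefined, so DOTP resolves to the C reference and the check passes trivially; on the target, defining it links the linear-assembly routine and the same check compares the two implementations.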