To optimize the performance of compiler-generated code, use OPTIMIZE(2) or OPTIMIZE(3) together with DFT(REORDER).
If you use OPTIMIZE(2) or OPTIMIZE(3) with DFT(ORDER) instead of DFT(REORDER), the generated code performs less well at run time, and the compile time might be much longer.
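
As a minimal sketch, one way to request the recommended combination is a *PROCESS statement at the top of the source file. The program below (SUMLOOP and its variables) is hypothetical and is there only to give the options something to apply to. DEFAULT is the unabbreviated form of DFT. The sketch assumes the default source margins, so the *PROCESS statement starts in column 1 and the program text starts in column 2.

```pli
*PROCESS OPTIMIZE(3) DEFAULT(REORDER);
 /* Hypothetical example program: only the *PROCESS line above   */
 /* illustrates the recommendation. OPTIMIZE(2) could be used    */
 /* in place of OPTIMIZE(3).                                     */
 SUMLOOP: PROCEDURE OPTIONS(MAIN);
   DECLARE (I, TOTAL) FIXED BINARY(31) INIT(0);
   /* A simple loop of the kind an optimizer can rearrange       */
   /* when blocks default to REORDER.                            */
   DO I = 1 TO 1000;
     TOTAL = TOTAL + I;
   END;
   PUT SKIP LIST(TOTAL);
 END SUMLOOP;
```

The same options can usually be supplied through the compiler invocation instead of in the source. Which route is appropriate depends on how your build is set up.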