Evaluating Flags in the Flag Status Register (FSR) After Floating Point Operations

The Flag Status Register (FSR) contains bits that represent floating point status. It can be read using the __get_FSR(type) API, which is defined in c7x.h. The type argument names the scalar or vector floating point type used by the floating point operation; any valid type is accepted except "float3".

The API returns an "OR" of the data bits for all pertinent vector lanes. The result is an 8-bit value containing the following fields:

For example:

    float4 a = ... ;
    float4 b = ... ;
    float4 c = a * b;
    uint8_t fsr_val = __get_FSR(float4);

The __get_FSR(type) API is provided to make accessing the FSR easier. The actual hardware register is a 64-bit value that is divided into eight 8-bit chunks. Each 8-bit chunk corresponds to one 64-bit vector slice of either the input or the output data, depending on the operation being performed. A 64-bit slice holds either one 64-bit double or two 32-bit float values; when it holds two floats, the hardware ORs their status bits together into that slice's chunk.

For vector operations, however, this "OR" is done only within each 64-bit slice; the hardware does not OR the results of all 64-bit slices together. The reason is that when partial vectors are used (fewer than 512 bits), the upper lanes of the vector are considered invalid and must not be reflected in the final FSR result. To ensure that only information pertinent to the valid lanes is reflected, the API lets users specify the scalar or vector type of the data they are working with. The API then emits a sequence of instructions that ORs together only the valid 64-bit vector slices to produce the final 8-bit result. All invalid lanes are ignored.
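The reduction described above can be illustrated with a small host-side model. This is only a sketch under stated assumptions, not the instruction sequence the compiler actually generates: the helper names fsr_reduce and valid_slices are hypothetical and are not part of c7x.h, the type is described by its width in bits rather than by a type name, and chunk i of the raw 64-bit value is assumed to occupy bits 8*i+7 down to 8*i, with the valid slices being the lowest-numbered ones.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical host-side model of the reduction; not part of c7x.h. */
#define FSR_CHUNKS     8u   /* eight 8-bit chunks in the 64-bit register */
#define FSR_CHUNK_BITS 8u
#define SLICE_BITS     64u  /* each chunk describes one 64-bit slice */

/* Number of 64-bit slices covered by a type of the given width.
   A 32-bit scalar float still occupies one slice. */
static unsigned valid_slices(unsigned type_bits)
{
    assert(type_bits > 0u && type_bits <= FSR_CHUNKS * SLICE_BITS);
    return (type_bits + SLICE_BITS - 1u) / SLICE_BITS;
}

/* OR together the status chunks of the valid slices only.
   Assumption: chunk i sits in bits [8*i+7 : 8*i] of raw_fsr and the
   valid slices are the lowest-numbered ones. */
static uint8_t fsr_reduce(uint64_t raw_fsr, unsigned type_bits)
{
    unsigned n = valid_slices(type_bits);
    uint8_t result = 0;

    for (unsigned i = 0; i < n; ++i) {
        result |= (uint8_t)(raw_fsr >> (i * FSR_CHUNK_BITS));
    }
    return result;
}
```

In this model a float4 (4 x 32 = 128 bits) covers two slices, so fsr_reduce(raw, 128) combines only the two lowest chunks, and any status left in the six upper chunks by invalid lanes does not reach the result. The bit patterns passed in are arbitrary; the model does not assign a meaning to individual status bits.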

NOTE

Using the __get_FSR(type) API degrades performance, because the API inserts a sequence of instructions to ensure that only the valid vector lanes are reflected in the final result. The API also prevents loop vectorization throughout any function in which it is used, because vectorization would change the number of valid vector lanes in ways the user cannot track.