Graphing

covid-ht graphing support is enabled by the GRAPHING setting.

It aims to provide “context” for the classification of an observation.

It generates a graph per pairwise combination of the variables specified in the GRAPHING_FIELDS setting. Note that this variables must be quantitative, categorical variables are not supported at this moment.

The coordinates of the observation being classified will be plotted on each graph with vertical and horizontal green lines.

Setting GRAPHING_DATASET to True will plot the dataset used for the classifier inference with blue for the NEGATIVE class (condition not present) and red for the POSITIVE.

Using GRAPHING_COND_DEC_FUNCTION will include in a contour form the Conditional Decision Function of the classifier, where darker blue implies more confidence on NEGATIVE while darker red on POSITIVE.

It is conditional because each plane is plotted considering the rest of the fields to be fixed at the value of the observation, i.e. for {rbc: 2, wbc: 2, plt: 100}, the plot of the Conditional Decision Function on (rbc, wbc) will show how the classifier considers each point of the graph given plt = 100, while the plot on (wbc, plt) will be given rbc = 2.

These planes aim to provide an approximation to the higher dimensional decision function of the classifier - which is not possible to graph - and should give you the idea of how “is” the observation relative to the classes (POSITIVE / NEGATIVE) in those variables, providing insights about the specifics - i.e. “This patient has these variables in line with the POSITIVE group yet these others are not, to be considered POSITIVE it would have had these variables at these values”.

The classifier takes into account all the variables together, its result may consider interactions among them which may not be reflected in the planes.

The score (Probability) of the decision is what should be taken into account for reliance, considering the metrics on the general perfomance of the classifier - i.e. accuracy, specificity, sensitivity, etc.

The Conditional Decision Function is expensive in computational terms, because of that the GRAPHING_MESH_STEPS can be specified to control the resolution of the mesh in which the function will be evaluated.

Also, techniques which do not support NA values natively and it is not implemented at an engine level have to resort to django-ai imputation - i.e. SVM - which increases the computational cost.

The implementation can be found here.