Measuring on-screen representation: A new approach outlined

Explore the second instalment of the blog series. This section focuses on establishing a system for evaluating character diversity on television screens, titled "On-Screen Representation Assessment Framework for TV."

Evaluation method for portrayal of characters on-screen
Computer vision is revolutionising the way we quantify and analyse on-screen representation, offering a powerful tool to systematically measure aspects like presence, prominence, and portrayal. This automated visual analysis technique can provide objective, scalable methods for detecting and characterising who appears on screen, how frequently and prominently they appear, and how they are visually portrayed.

Presence, Prominence, and Portrayal

Presence

Using object detection and facial recognition algorithms, computer vision can identify and count distinct characters or individuals appearing on screen over time, thus quantifying on-screen presence automatically.
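As a minimal illustration, presence can be tallied once a detector has produced per-frame bounding boxes. The sketch below assumes hypothetical detection output (a list of boxes per frame) rather than any specific detector API:

```python
# Sketch: quantifying presence from per-frame face detections.
# Assumes a detector has already produced bounding boxes per frame;
# the `detections` structure here is illustrative, not a real API.

def presence_counts(detections):
    """Return the number of detected faces in each frame and the
    fraction of frames in which at least one face appears."""
    per_frame = [len(boxes) for boxes in detections]
    frames_with_faces = sum(1 for n in per_frame if n > 0)
    coverage = frames_with_faces / len(per_frame) if detections else 0.0
    return per_frame, coverage

# Hypothetical detections: one list of (x, y, w, h) boxes per frame.
frames = [
    [(10, 10, 50, 50)],                      # one face
    [(12, 11, 48, 49), (200, 40, 30, 30)],   # two faces
    [],                                      # empty shot
    [(15, 12, 47, 47)],
]
counts, coverage = presence_counts(frames)
print(counts)    # [1, 2, 0, 1]
print(coverage)  # 0.75
```

In practice the box lists would come from a detector run over sampled video frames; the aggregation step is the same.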

Prominence

Techniques like semantic segmentation and attention-mapping can measure screen time, relative size, and positioning of subjects. For example, prominence can be inferred by analysing bounding box sizes around faces or bodies, duration of appearance, and spatial centrality on screen.
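A minimal sketch of this inference, assuming face bounding boxes are already available; the equal weighting of relative area and centrality is an illustrative assumption, not a standard:

```python
# Sketch: inferring prominence from bounding-box size and centrality.
# The 50/50 weighting of the two signals is an illustrative choice.

def prominence(box, frame_w, frame_h):
    """Score a face box by relative area and by distance from the
    frame centre (larger and more central => higher score)."""
    x, y, w, h = box
    rel_area = (w * h) / (frame_w * frame_h)
    # Centre of the box, normalised to [0, 1] on each axis.
    cx = (x + w / 2) / frame_w
    cy = (y + h / 2) / frame_h
    # 0.707 ~= the largest possible distance from the frame centre.
    dist = ((cx - 0.5) ** 2 + (cy - 0.5) ** 2) ** 0.5
    centrality = 1 - dist / 0.707
    return 0.5 * rel_area + 0.5 * centrality

# A large, centred face scores higher than a small corner face.
big_central = prominence((760, 340, 400, 400), 1920, 1080)
small_corner = prominence((0, 0, 60, 60), 1920, 1080)
print(big_central > small_corner)  # True
```

Duration of appearance, the third signal mentioned above, would be added by summing such scores over the frames in which a given character is detected.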

Portrayal

Beyond simple presence, computer vision combined with machine learning classifiers can analyse posture, facial expressions, gaze direction, and context cues to infer emotional or stereotypical portrayals. Advanced vision-language models can even interpret scenes with textual context to analyse narrative framing or roles portrayed.
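For example, once an expression classifier has labelled each appearance, portrayal can be summarised per character. The labels below are hypothetical classifier outputs; the aggregation step is the point, not the model:

```python
# Sketch: summarising portrayal from per-appearance classifier labels.
# (character, emotion) pairs stand in for real classifier output.
from collections import Counter

def emotion_profile(appearances):
    """Map each character to the relative frequency of emotions
    attached to their on-screen appearances."""
    profiles = {}
    for character, emotion in appearances:
        profiles.setdefault(character, Counter())[emotion] += 1
    return {
        c: {e: n / sum(counts.values()) for e, n in counts.items()}
        for c, counts in profiles.items()
    }

appearances = [
    ("A", "happy"), ("A", "happy"), ("A", "happy"), ("A", "angry"),
    ("B", "neutral"),
]
print(emotion_profile(appearances))
# {'A': {'happy': 0.75, 'angry': 0.25}, 'B': {'neutral': 1.0}}
```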

These capabilities rely on deep learning, convolutional neural networks (CNNs), and newer architectures such as multitask vision-language models that jointly consider visual data and semantic context.

Ethical and Logistical Considerations

While computer vision offers significant potential, its deployment must be ethical and responsible. Privacy and consent, bias and fairness, accuracy and interpretability, data volume and quality, context sensitivity, and responsibility and accountability are all critical factors to consider.

Privacy and Consent

Extracting detailed on-screen representation data may involve processing faces and identities, raising privacy concerns especially if used beyond publicly available media. Ethical use mandates strict compliance with privacy laws and consideration of consent where applicable.

Bias and Fairness

Computer vision models are susceptible to biased training data, which can skew interpretation of presence and portrayal towards stereotypes or underrepresentation. Ensuring diverse and balanced datasets and performing fairness audits is critical.

Accuracy and Interpretability

Automated methods must be validated for reliability, as misclassification can misrepresent groups or narratives. Developing interpretable models that provide insight into their decisions can increase trust and usefulness.

Data Volume and Quality

Large, annotated datasets are necessary to train accurate models. Logistically, acquiring, curating, and processing high volumes of video data with diverse contexts is resource-intensive.

Context Sensitivity

Portrayal nuances depend heavily on cultural, temporal, and contextual factors that may not be fully captured by visual data alone. Ethical frameworks should acknowledge these limitations and avoid overgeneralisation.

Responsibility and Accountability

Clear policies should govern who uses these analyses and for what purposes, preventing misuse in ways that could reinforce harmful biases or infringe on individual rights.

In summary, computer vision offers a valuable tool for quantifying on-screen presence, prominence, and portrayal at scale. However, ethical deployment requires attention to privacy, bias, accuracy, and contextual nuance, combined with transparent and responsible governance of the methodologies and findings. This balanced approach can significantly widen and deepen the empirical evidence base about representation in media.

Limitations and Future Directions

The framework advises against computationally inferring characters' or people's demographics. Researchers have found that many commercial face detection models are less accurate for darker-skinned women. More research is needed on when face detections are missed and why, and on the factors that cause different faces to be mistaken for the same face.

Programmes with many recurring frontal faces allow for easier clustering and identification of character appearances. Example metrics for each of the 3Ps include:

  - Presence: the makeup of the cast by gender or ethnicity.
  - Prominence: duration of screen time and likelihood of appearing as a solo face on screen.
  - Portrayal: the emotion of a character's face or words, and likelihood of appearing next to particular objects such as weapons or drinks.
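Two of the example metrics above, solo-face likelihood (prominence) and object co-occurrence (portrayal), can be sketched over hypothetical per-frame annotations:

```python
# Sketch of two of the metrics above, over hypothetical per-frame
# annotations (character ids and detected objects per frame).

def solo_rate(frames, character):
    """Prominence: fraction of a character's frames where theirs is
    the only face on screen."""
    present = [f for f in frames if character in f["faces"]]
    solo = sum(1 for f in present if len(f["faces"]) == 1)
    return solo / len(present) if present else 0.0

def cooccurrence_rate(frames, character, obj):
    """Portrayal: fraction of a character's frames in which a given
    object (e.g. a weapon or drink) is also detected."""
    present = [f for f in frames if character in f["faces"]]
    with_obj = sum(1 for f in present if obj in f["objects"])
    return with_obj / len(present) if present else 0.0

frames = [
    {"faces": {"A"}, "objects": {"drink"}},
    {"faces": {"A", "B"}, "objects": set()},
    {"faces": {"A"}, "objects": set()},
    {"faces": {"B"}, "objects": {"weapon"}},
]
print(solo_rate(frames, "A"))                   # 2 of A's 3 frames
print(cooccurrence_rate(frames, "A", "drink"))  # 1 of A's 3 frames
```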

The optimal model parameters for clustering faces vary by the type of programme being analysed. Interdisciplinary efforts are key to thoughtfully deploy computational methods to generate richer and more regular data about representation.
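The effect of one such parameter, the distance threshold used when grouping face embeddings, can be illustrated with a toy greedy clustering. Real pipelines use high-dimensional face embeddings and more robust linkage methods, so this is only a sketch:

```python
# Sketch: greedy clustering of face embeddings, showing how the
# distance threshold (a key tunable) changes the result.
# Embeddings are toy 2-D vectors; real face embeddings are ~128-512-D.
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def cluster(embeddings, threshold):
    """Assign each embedding to the first cluster whose representative
    (its first member) is within `threshold`, else start a new cluster."""
    reps = []    # first member of each cluster
    labels = []
    for e in embeddings:
        for i, rep in enumerate(reps):
            if cosine_distance(e, rep) < threshold:
                labels.append(i)
                break
        else:
            reps.append(e)
            labels.append(len(reps) - 1)
    return labels

faces = [(1.0, 0.1), (0.9, 0.15), (0.1, 1.0), (0.12, 0.95)]
print(cluster(faces, threshold=0.3))   # [0, 0, 1, 1]  two identities
print(cluster(faces, threshold=1.5))   # [0, 0, 0, 0]  everything merges
```

A threshold tuned for close-up drama may over-merge faces in a crowd-heavy programme, which is why the framework notes that optimal parameters vary by programme type.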

The framework suggests that measures of representation can be categorised under the '3Ps': presence, prominence, and portrayal. Programmes with higher variance in viewpoint, more crowds (and hence smaller faces), and darker lighting yield less reliable face clusters.

The framework aims to compare different methods of data compilation for representation and promote the use of quantitative analysis.

  1. Computer vision, through the use of object detection and facial recognition algorithms, can automatically quantify the presence of distinct characters or individuals on screen over time.
  2. Techniques like semantic segmentation and attention-mapping can measure screen time, relative size, and positioning of subjects to infer their prominence.
  3. Combined with machine learning classifiers, computer vision can analyse posture, facial expressions, gaze direction, and context cues to infer emotional or stereotypical portrayals.
  4. These capabilities rely on deep learning, convolutional neural networks (CNNs), and newer architectures such as multitask vision-language models.
  5. Ethical deployment of computer vision is crucial; key considerations include privacy and consent, bias and fairness, accuracy and interpretability, data volume and quality, context sensitivity, and responsibility and accountability.
  6. In terms of limitations, researchers have found that many commercial face detection models are less accurate for darker-skinned women, and there is a need for more research on this issue.
  7. Interdisciplinary efforts are key to thoughtfully deploying computational methods to generate richer and more regular data about representation, and the framework suggests that measures of representation can be categorised under the '3Ps': presence, prominence, and portrayal.
