Big Sistah — Quantifying the wellbeing of multilingual remote workers in real-time

PRIN 2022 Munoz Martin

Abstract

Remote workers’ wellbeing has mainly been studied with competing survey methods and introspective indicators, yielding scattered, hardly comparable results. Work psychology, HCI, usability, and writing process studies have developed some innovative indicators to study remote workers, but current research efforts rarely triangulate data to reliably derive new knowledge, especially from the perspective of remote workers’ cognition and wellbeing. Thus, the ongoing labor revolution towards remote working is mostly based on trial and error, often with mixed results. New consulting start-ups provide companies with guidance to switch to remote working, but they tend to focus on the control of employees and their productivity, and they generally disregard the deep changes in remote workers’ ways of working and in work setups.
The landslide shift to remote working setups due to the pandemic opened a unique window of opportunity. From a labor welfare perspective, it lets us study emerging behaviors in remote workers that are common to many job profiles, from healthcare providers to white-collar civil servants. Remote workers add human value (e.g., expert analysis and decision making) to intensive information-processing tasks through HCI, often in multilingual settings. In 2022, more than 50% of the world’s population was using the Internet ca. 7 hours a day (so, not only for leisure), and they access content in other languages daily. Gist machine translation is now part of people’s everyday lives.
Current tools to measure a few new indicators are often proprietary software, mainly for remote workers’ surveillance. Open-source prototypes are too generic, and combining them leads to clunky, unrealistic, and unreliable research settings and results. Social research inspired by situated cognition, like this project, demands non-invasive methods to access remote workers’ performance in their natural work environments and with full respect for their privacy.
To do so, this project created Big Sistah, a research suite to study and to perform real-time monitoring of multilingual remote workers’ activities at their workstations, covering their (1) profiles, (2) emerging working habits, (3) mental fatigue, stress, and motivation through attentional changes, and (4) their impact on efficiency, efficacy, and productivity. The key contributions are (a) a set of new and integrated indicators and (b) the technology to seamlessly collect data for such indicators at runtime, empowering the scientific community with an open-source, interdisciplinary research toolbox. The project was coordinated at the University of Bologna (Forlì campus, MC2 Lab) by Ricardo Muñoz Martín, with Sara Puerini and Du Zhiqiang, and carried out jointly with the University of Milano-Bicocca unit led by Giovanni Denaro, with Pietro Braione and Roberto Crotti.

Achieved results

Big Sistah 1.0 consists of four modules. Saga and Taylor are web applications; Echo and Munio are desktop applications. Together they cover the full research workflow, from project setup and participant profiling to behavioral data collection and analysis.
Saga is the research management environment. Principal investigators use it to configure studies, build and manage research teams, recruit and organize participants into groups, schedule activities through an integrated calendar, and handle informed consent through a built-in form builder or by uploading existing PDF forms. Saga also serves as the configuration hub for Taylor and Echo: it generates personalized configuration files controlling exactly what each participant’s Echo session records, applying privacy settings at the individual level, and configures Taylor instruments per group. Two access roles are supported: PI, with full administrative control, and Researcher, with scoped access to assigned projects and data only.
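To make the idea of a per-participant configuration file more concrete, here is a minimal Python sketch. It does not reproduce Saga's actual file format, which is not described on this page: the field names (`nickname`, `channels`, `expires_at`), the channel labels, and both helper functions are assumptions for illustration only. The sketch relies on two points stated in this abstract: each file selects which channels Echo records, and it carries an expiration date after which recording stops.

```python
import json
from datetime import datetime, timezone

# Hypothetical channel labels, loosely following the six channels listed for Echo.
KNOWN_CHANNELS = {
    "keyboard_mouse", "internet", "screen_log",
    "screen_capture", "screen_recording", "audio",
}


def build_config(nickname, channels, expires_at):
    """Return a JSON string describing one participant's recording setup.

    `expires_at` must be a timezone-aware datetime.
    """
    unknown = set(channels) - KNOWN_CHANNELS
    if unknown:
        raise ValueError(f"unknown channels: {sorted(unknown)}")
    return json.dumps({
        "nickname": nickname,
        "channels": sorted(channels),
        "expires_at": expires_at.astimezone(timezone.utc).isoformat(),
    })


def is_active(config_json, now):
    """True while the configuration has not passed its expiration date."""
    config = json.loads(config_json)
    return now < datetime.fromisoformat(config["expires_at"])


# Example: a participant who records only keyboard/mouse and internet activity.
cfg = build_config(
    "blue-heron",
    ["keyboard_mouse", "internet"],
    datetime(2026, 2, 28, tzinfo=timezone.utc),
)
still_valid = is_active(cfg, datetime(2026, 1, 1, tzinfo=timezone.utc))
```

Keeping the channel list and the expiration date in the same file means a recorder could decide locally whether it may record at all, without contacting a server. Whether Echo works this way is an assumption of this sketch.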
Taylor is the participant-facing profiling and assessment platform, delivered through a gamified interface designed to reduce the demands of what can be a lengthy session for working participants. Its instrument library is organized in three tiers. Basic calibration is divided into language-based instruments (the Language History Questionnaire 3 and the LexTALE lexical decision task) and non-language-based instruments (the Typing Task, an extended multilingual version of the Inputlog typing task adapted for language-specific characteristics and more realistic copying behavior, and the Big Five Markers personality questionnaire). The questionnaire suite is divided into affect instruments (the Positive and Negative Affect Schedule, Transportation Scale, Reduced Emotional Resonance in LX, and NASA Task Load Index) and work wellbeing instruments (the Oldenburg Burnout Inventory, Utrecht Work Engagement Scale, and Copenhagen Psychosocial Questionnaire III). The cognitive and psycholinguistic test battery covers five domains: memory (Consonant Trigram, Letter Memory, Digit Span, N-Back, Sternberg Memory, Change Detection, STM Binding), attention (Self-Paced Reading, Flanker, Inhibition of Return, Simon, Spatial Orientation, Sustained Attention to Response, Visual Search), inhibitory control (Go/No-Go, Negative Priming, Stop Signal, Stroop), cognitive flexibility (Cued Task Switching, Task Switching, Wisconsin Card Sorting), and planning (Tower of Hanoi). All tests are auto-scored against published norms where applicable, and each is introduced through a film reference that embodies its cognitive demand in a pop-art interface.
Echo is the cross-platform desktop data-collection application, running unobtrusively at system level across all applications a remote worker uses on Windows and macOS. It records up to six synchronized channels: keyboard and mouse interaction with millisecond precision, internet activity, screen logging with OCR support for nineteen language models including non-alphabetic scripts, event-triggered screen capture, full screen recording, and audio. All streams are Unix-timestamped for seamless multimodal alignment. Each participant receives a personalized configuration file that specifies active channels, encodes an expiration date after which Echo goes dark, and is bound to that participant’s credentials. Privacy is structural rather than procedural: Echo ignores private browsing, AES-encrypts all outgoing files, enforces consent before installation, anonymizes participants through a double-nickname system, and gives participants full control over recording, including password-protected pause. The University of Milano-Bicocca unit developed Hylog as a complementary hybrid keylogging add-on for IME-mediated non-alphabetic script input, submitted for publication to IEEE Access.
Munio is the analytical environment, operated exclusively by the researcher and running entirely offline with no network activity and no data transmission. It processes Echo’s encrypted session archives through four sequential stages. Preprocessing uses three parallel NLP engines (NLTK, spaCy, Stanza) with a weighted voting mechanism across eight operations, from tokenization to named entity recognition. Annotation lets researchers review, correct, merge, split, and retag automatic analyses, and add custom tagsets, across seventeen languages. Alignment links source texts, target texts, and Echo behavioral events across four columns through the Task Segment Framework, including non-consecutive linking and revisit tracking. Text profiling applies nineteen indicators, normalized to a 0–100 scale and computable from token to full-text level, covering lexical properties (lexical density, diversity, frequency, hapax legomena, named entities, numbers, n-gram proportion and diversity, nominality, descriptiveness, context dependency), cross-linguistic properties (cognates, false friends), syntactic properties (sentence length, syntactic complexity with cross-linguistic calibration), and discourse properties (explicit cohesion, implicit cohesion, punctuation). All outputs are produced in JSON and TSV formats and feed a normalized SQLite relational database that can be queried with SQL, R, Python, or any compatible tool. Version 2.0 will integrate Taylor profiling data into the same database.
Pilot studies confirmed the viability of the full workflow in natural working environments. Echo and Munio will be released as open-source software. Work is underway to localize the interface and the gamified assessment instruments into additional languages, extending the platform’s multilingual accessibility and supporting its adoption by a broader international research community.
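The abstract does not spell out the weighted voting mechanism used in Munio's preprocessing stage. One simple way such a vote could work is sketched below in Python, under stated assumptions: all three engines label the same tokenization, each engine has a fixed weight, and the label with the largest summed weight wins. The function name `weighted_vote`, the weights, and the sample tags are hypothetical, and no NLP library is called, so the example stays self-contained.

```python
from collections import defaultdict


def weighted_vote(predictions, weights):
    """Pick, for each token, the label with the highest total engine weight.

    predictions: dict mapping engine name -> list of labels (one per token).
    weights:     dict mapping engine name -> non-negative weight.
    Assumes all engines were run on the same tokenization.
    """
    lengths = {len(labels) for labels in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("engines must label the same number of tokens")

    result = []
    for position in range(lengths.pop()):
        scores = defaultdict(float)
        for engine, labels in predictions.items():
            scores[labels[position]] += weights[engine]
        # On ties, max() returns the label that was seen first, i.e. the one
        # proposed by the earlier-listed engine among the tied labels.
        result.append(max(scores, key=scores.get))
    return result


# Illustrative part-of-speech tags for "Remote workers type"; weights are made up.
pos_tags = {
    "nltk":   ["ADJ", "NOUN", "NOUN"],
    "spacy":  ["ADJ", "NOUN", "VERB"],
    "stanza": ["NOUN", "NOUN", "VERB"],
}
engine_weights = {"nltk": 0.2, "spacy": 0.4, "stanza": 0.4}
voted = weighted_vote(pos_tags, engine_weights)
```

In this sketch a disagreement is settled by the combined weight of the engines that agree, so two engines together can outvote a third. A real system would also have to reconcile differing tokenizations before voting, a step this example deliberately leaves out.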

Project details

Principal investigator: Ricardo Munoz Martin

Unibo structures involved:
Department of Interpreting and Translation

Coordinator:
ALMA MATER STUDIORUM - Università di Bologna (Italy)

Total project contribution: EUR 201,271.00
Total Unibo contribution: EUR 110,695.00
Project duration in months: 24
Start date: 15/10/2023
End date: 28/02/2026
