SensSoft Proposal

Abstract

The Software as a Sensor™ (SensSoft) Project offers an open-source (ALv2.0) usability testing platform for software tools. It includes a number of components that work together to provide a platform for collecting data about user interactions with software tools, as well as for archiving, analyzing, and visualizing that data. Additional components support web-based experiments, so that this data can be captured within a larger experimental framework for formal user testing. These components currently support JavaScript-based web applications, although the schema for “logging” user interactions can support mobile and desktop applications as well. Collectively, the Software as a Sensor™ Project provides an open-source platform for assessing how users interact with technology, not just what they interact with.

Proposal

The Software as a Sensor™ Project is a next-generation platform for analyzing how individuals and groups of people make use of software tools to perform tasks or interact with other systems. It is composed of a number of integrated components:

User ALE, a logging API and schema that instruments software tools so that user activities can be captured and time-stamped;
Distill, an analytic framework providing segmentation and graph analysis of user activity data;
the Test Application Portal (TAP), a single point of interface for registering applications, managing permissions, and visualizing data; and
STOUT, the Subject Tracking and Online User Testing application, which supports systematic, experimentally designed user studies.

Each component is available through its own repository to support organic growth for each component, as well as growth of the whole platform’s capabilities.

Background and Rationale

Any tool that people use to accomplish a task can be instrumented; once instrumented, those tools can report how they were used to perform that task. Software tools are ubiquitous interfaces through which people interact with data and other technology, and they can be instrumented for exactly this purpose. Tools are different from web pages or simple displays, however; they are not simply archives for information. Rather, they are ways of interfacing with and manipulating data and other technology. There are numerous consumer solutions for understanding how people move through web pages and displays (e.g., Google Analytics, Adobe Omniture). There are far fewer options for understanding how software tools are used. This requires understanding how users integrate a tool's functionality into usage strategies to perform tasks, how users sequence the functionality provided to them, and how users understand the features of the software as a cohesive tool. The Software as a Sensor™ Project is designed to address this gap, providing the public an agile, cost-efficient solution for improving software tool design, implementation, and usability.

Software as a Sensor™ Project Overview

Figure 1. User ALE Elastic Back End Schema, with Transfer Protocols.

Funded through the DARPA XDATA program and other sources, the Software as a Sensor™ Project provides an open-source (ALv2.0) solution for instrumenting web-based software tools so that user behavior is captured as users interact with them. User behavior, or user activities, are captured and time-stamped through a simple application program interface (API) called the User Analytic Logging Engine (User ALE). User ALE's key differentiator is the schema it uses to collect information about user activities; it provides sufficient context to understand activities within the software tool's overall functionality. User ALE captures each user-initiated action, or event (e.g., hover, click, etc.), as a nested action within a specific element (e.g., map object, drop-down item, etc.), which is in turn nested within an element group (e.g., map, drop-down list) (see Figure 1). This information schema provides sufficient context to understand and disambiguate user events from one another. In turn, this enables myriad analysis possibilities at different levels of tool design and offers more utility to end users than commercial services currently provide. Once instrumented with User ALE, software tools become human signal sensors in their own right. Most importantly, the data that User ALE collects is owned outright by adopters and can be made available to other processes through scalable Elastic infrastructure and easy-to-manage RESTful APIs.

Distill is the analytic framework of the Software as a Sensor™ Project, providing (at release) segmentation and graph analysis metrics describing users' interactions with the application. The segmentation features allow adopters to focus their analyses of user activity data on desired data attributes (e.g., certain interactions, elements, etc.), as well as on attributes describing the software tool's users, if that data was also collected. Distill's usage and usability metrics are derived from a representation of users' sequential interactions with the application as a directed graph. This provides an extensible framework for insight into how users integrate the functional components of the application to accomplish tasks.
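As an illustration, the sketch below shows what a single User ALE-style log message following the nested schema described above might look like, indexed into an Elastic back end over its REST API. The field names, host, and index are assumptions for illustration, not the official User ALE schema.

    # Hypothetical User ALE-style log entry; field names are illustrative,
    # not the official User ALE schema.
    import json
    import time

    import requests  # third-party HTTP client (pip install requests)

    log_entry = {
        "clientTime": int(time.time() * 1000),  # time stamp for the activity (ms)
        "activity": "click",                    # user-initiated event
        "element": "export-button",             # specific element acted upon
        "elementGroup": "map",                  # element group it belongs to
        "sessionID": "a1b2c3d4e5f6-task042",    # ties logs to a user/task (see STOUT)
        "toolVersion": "0.1.0",
    }

    # Index the event into Elasticsearch so downstream processes (e.g., Distill)
    # can query it; host and index/type names are assumptions.
    resp = requests.post(
        "http://localhost:9200/userale/logs",
        data=json.dumps(log_entry),
        headers={"Content-Type": "application/json"},
    )
    resp.raise_for_status()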

Figure 2. Software as a Sensor™ System Architecture with all components.

The Test Application Portal (TAP) provides a single point of interface for adopters of the Software as a Sensor™ Project. Through the Portal, adopters can register their applications, providing version data and permissions for others to access data. The Portal ensures that all components of the Software as a Sensor™ Project share the same information. The Portal also hosts a number of Python D3 visualization libraries, providing adopters with a customizable “dashboard” for analyzing and viewing user activity data, calling analytic processes from Distill. Finally, the Subject Tracking and Online User Testing (STOUT) application provides support for HCI/UX researchers who want to collect data from users in systematic ways or within experimental designs. STOUT supports user registration, anonymization, user tracking, tasking (see Figure 3), and data integration from a variety of services. STOUT allows adopters to perform research studies compliant with human-subjects review boards, using both between- and within-subjects designs. Adopters can add tasks, surveys, and questionnaires through third-party services (e.g., SurveyMonkey). STOUT tracks users' progress by passing a unique user ID to other services, allowing researchers to trace form data and User ALE logs to specific users and task sets (see Figure 4).

Figure 3. STOUT assigns participants to experimental conditions and ensures the correct task sequence. STOUT's Django back end provides data on task completion, which can be used to drive other automation, including unlocking different task sequences and/or achievements.

Figure 4. STOUT User Tracking. Anonymized user IDs (hashes) are concatenated with unique task IDs. This “Session ID” is appended to URLs (see highlighted region), custom variable fields, and User ALE logs, to provide an integrated user-testing data collection service.
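A minimal sketch of this session ID scheme follows, assuming a SHA-256 hash for anonymization; the exact hash function and URL parameter STOUT uses may differ.

    import hashlib

    def make_session_id(participant_email: str, task_id: str) -> str:
        """Anonymize a participant and bind them to a task (illustrative scheme)."""
        user_hash = hashlib.sha256(participant_email.encode("utf-8")).hexdigest()[:12]
        return "{}-{}".format(user_hash, task_id)

    # Append the session ID to a third-party survey URL as a custom variable,
    # so form data can later be joined with User ALE logs.
    session_id = make_session_id("participant@example.com", "task042")
    survey_url = "https://www.surveymonkey.com/r/EXAMPLE?c={}".format(session_id)
    print(survey_url)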

STOUT also provides data polling from third-party services (e.g., SurveyMonkey) and integration with Python or R scripts for statistical processing of data collected through STOUT. D3 visualization libraries embedded in STOUT allow adopters to view distributions of quantitative data collected from form data (see Figure 5).

Figure 5. STOUT Visualization. STOUT gives experimenters direct and continuous access to automatically processed research data.
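As a sketch of how polled form data might be processed downstream, the example below pulls responses from a hypothetical STOUT endpoint and summarizes a quantitative form item with pandas; the endpoint, parameters, and field names are assumptions, not STOUT's actual API.

    import pandas as pd
    import requests

    # Poll collected form data from a hypothetical STOUT endpoint.
    resp = requests.get(
        "https://stout.example.org/api/responses",
        params={"study": "usability-study-01"},
    )
    resp.raise_for_status()

    # Each record holds a participant's condition and a quantitative form item.
    df = pd.DataFrame(resp.json())

    # Distribution of the (hypothetical) questionnaire score per condition.
    print(df.groupby("condition")["questionnaire_score"].describe())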

Insights from User Activity Logs

The Software as a Sensor™ Project provides data collection and analytic services for user activities collected during interaction with software tools. However, the Software as a Sensor™ Project emerged from years of research focused on the development of novel, reliable methods for measuring individuals’ cognitive state in a variety of contexts. Traditional approaches to assessment in a laboratory setting include surveys, questionnaires, and physiology (Poore et al., 2016). Research performed as part of the Software as a Sensor™ project has shown that the same kind of insights derived from these standard measurement approaches can also be derived from users’ behavior. Additionally, we have explored insights that can only be gained by analyzing raw behavior collected through software interactions (Mariano et al., 2015). The signal processing and algorithmic approaches resulting from this research have been integrated into the Distill analytics stack. This means that adopters will not be left to discern for themselves how to draw insights from the data they gather about their software tools, although they will have the freedom to explore their own methods as well. Insights from user activities provided by Distill’s analytics framework fall under two categories, broadly classified as functional workflow and usage statistics:
Functional workflow insights tell adopters how user activities are connected, providing them with representations of how users integrate the application's features together in time. These insights are informative for understanding the step-by-step process by which users interact with certain facets of a tool. For example, questions like “How are my users constructing plots?” are addressable through workflow analysis. Workflows provide a granular understanding of process-level mechanics and can be modeled probabilistically through a directed graph representation of the data, and through identification of meaningful sub-sequences of user activities actually observed in the population (a minimal sketch of this representation appears below). Derived metrics provide insight into the structure and temporal features of these mechanics, and can help highlight efficiency problems within workflows. For example, workflow analysis could help identify recursive, repetitive behaviors, and might be used to define what “floundering” looks like for a particular tool. Functional workflow analysis can also support analyses with more breadth, addressing questions like “How are my users integrating my tool's features into a cohesive whole? Are they relying on the tool as a whole, or just using very specific parts of it?” Adopters will be able to explore how users think about software as cohesive tools and examine whether users rely on certain features as central navigation or analytic features. This allows for insight into whether tools are designed well enough for users to understand that they need to rely on multiple features together. Through segmentation, adopters can select the subset of the data (software element, action, user demographics, geographic location, etc.) they want to analyze. This will allow them to compare, for example, specific user populations against one another in terms of how they integrate software functionality. Importantly, the graph-based analytics approach provides a flexible representation of the time-series data that can capture and quantify canonical usage patterns, enabling direct comparisons between users based on attributes of interest. Other modeling approaches have been used to explore similar insights and may be integrated at a later date (Mariano et al., 2015).

Usage statistics derive metrics from simple frequentist approaches to understanding, coarsely, how much users are actually using applications. This is different from simple “traffic” metrics, however, which assess how many users are navigating to a page or tool. Rather, usage data provides insight into how much raw effort (e.g., number of activities) is being expended while users are interacting with the application. This allows “visitors” to be discriminated from “users” of software tools. Moreover, given the information schema User ALE provides, adopters will be able to delve into usage metrics related to specific facets of their application.

Given these insights, different sets of adopters (software developers, HCI/UX researchers, and project managers) may utilize the Software as a Sensor™ Project for a variety of different use cases.
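The following is a minimal sketch of the directed-graph workflow representation described above, using networkx; the event stream and metrics are illustrative, not Distill's actual implementation.

    # Sketch of a directed-graph workflow representation (illustrative only).
    from collections import Counter

    import networkx as nx  # pip install networkx

    # A user's time-ordered activity stream: (element_group, activity) pairs,
    # as captured by a User ALE-style schema.
    events = [
        ("search-form", "click"), ("search-form", "type"),
        ("results-list", "click"), ("map", "hover"),
        ("map", "click"), ("results-list", "click"),
    ]

    # Edge weights count how often one activity immediately follows another.
    transitions = Counter(zip(events, events[1:]))
    G = nx.DiGraph()
    for (src, dst), weight in transitions.items():
        G.add_edge(src, dst, weight=weight)

    # Usage statistics: raw effort (number of activities) per functional area.
    effort = Counter(group for group, _ in events)
    print("Most-used element groups:", effort.most_common())

    # Workflow structure: which activities bridge the tool's features?
    print("Betweenness centrality:", nx.betweenness_centrality(G))

Comparing such graphs across segments of users supports the population-level comparisons described above.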

Differentiators

The Software as a Sensor™ Project is ultimately designed to address the wide gaps between current best practices in software user testing and trends toward agile software development practices. Like much of the applied psychological sciences, user testing methods generally borrow heavily from basic research methods. These methods are designed to make data collection systematic and remove extraneous influences on test conditions. However, this usually means removing what we test from dynamic, noisy, real-life environments. The Software as a Sensor™ Project is designed to allow for the same kind of systematic data collection that we expect in the laboratory, but in real-life software environments, by making software environments data collection platforms. In doing so, we aim not only to collect data from more realistic environments and use cases, but also to integrate the testing enterprise into agile software development processes. Our vision for the Software as a Sensor™ Project is that it provides software developers, HCI/UX researchers, and project managers a mechanism for continuous, iterative usability testing of software tools in a way that supports the flow (and schedule) of modern software development practices: iterative, waterfall, spiral, and agile. This is enabled by a few discriminating facets:

Figure 6. Version-to-Version Testing for Agile, Iterative Software Development Methods. The Software as a Sensor™ Project enables new methods for collecting large amounts of data on software tools, deriving insights rapidly to inject into subsequent iterations.

Current Status

All components of the Software as a Sensor™ Project were originally designed and developed by Draper as part of DARPA's XDATA project, although User ALE is being used on other funded R&D projects, including DARPA RSPACE, an AFRL project, and Draper internally funded projects. Currently, only User ALE is publicly available; however, the Portal, Distill, and STOUT will be publicly available in the May/June 2016 timeframe. The last major release of User ALE was May 2015. All components are currently maintained in separate repositories on GitHub (github.com/draperlaboratory). Currently, only software tools developed with JavaScript are supported. However, we are working on PyQt implementations of User ALE that will support many desktop applications.

Meritocracy

The current developers are familiar with meritocratic open source development at Apache. Apache was chosen specifically because we want to encourage this style of development for the project.

Community

The Software as a Sensor™ Project is new and our community is not yet established. However, community building and publicity is a major thrust. Our technology is generating interest within industry, particularly in the HCI/UX community; Aptima and Charles River Analytics, for example, are interested in becoming adopters. We have also begun publicizing the project to software development companies and universities, recently hosting a public focus group for Boston, MA area companies. We are also developing communities of interest within the DoD and the Intelligence Community. The NGA Xperience Lab has expressed interest in becoming a transition partner, as has the Navy's HCIL group. We are also aggressively pursuing adopters at AFRL's Human Performance Wing, Analyst Test Bed. During incubation, we will explicitly seek to increase our adoption, including academic research, industry, and other end users interested in usability research.

Core Developers

The current set of core developers is relatively small, but includes Draper full-time staff. Community management will very likely be distributed across a few full-time staff who have been with the project for at least two years. Core personnel can be found on our website: http://www.draper.com/softwareasasensor

Alignment

The Software as a Sensor™ Project is currently Copyright (c) 2015-2016 The Charles Stark Draper Laboratory, Inc., all rights reserved, and is licensed under Apache v2.0.

Known Risks

Orphaned products

There are currently no orphaned products. Each component of the Software as a Sensor™ Project has roughly 1-2 dedicated staff, and there is substantial collaboration across components.

Inexperience with Open Source

Draper has a number of open source software projects available through www.github.com/draperlaboratory.

Relationships with Other Apache Products

The Software as a Sensor™ Project does not currently have any dependencies on Apache products. We are interested in coordinating with other projects, including Usergrid and others involving large-scale data processing, time-series analysis, and ETL processes.

Developers

The Software as a Sensor™ Project is primarily funded through contract work. There are currently no “dedicated” developers; however, the same core team will continue work on the project across different contracts that support different features. We intend to maintain a core set of key personnel engaged in community development and maintenance. In the future this may mean dedicated developers funded internally to support the project; at present, the project is tied to a business development strategy that maintains funding for its various facets.

Documentation

Documentation is available through GitHub; each repository under the Software as a Sensor™ Project has documentation available through wikis attached to the repositories.

Initial Source

Current source resides on GitHub (github.com/draperlaboratory).

External Dependencies

Each component of the Software as a Sensor™ Project has its own dependencies; documentation will be available for integrating them. The components with external dependencies are User ALE, STOUT, Distill, and the Portal. Licenses represented among these dependencies include GNU GPL 2, LGPL 2.1, Apache 2.0, and GNU GPL.

Required Resources

Initial Committers

The following is a list of the planned initial Apache committers (the active subset of the committers for the current repositories on GitHub).

Affiliations

Sponsors

Champion

Nominated Mentors

Sponsoring Entity

The Apache Incubator

References

Mariano, L. J., Poore, J. C., Krum, D. M., Schwartz, J. L., Coskren, W. D., & Jones, E. M. (2015). Modeling Strategic Use of Human Computer Interfaces with Novel Hidden Markov Models. Frontiers in Psychology, 6. doi: 10.3389/fpsyg.2015.00919

Poore, J., Webb, A., Cunha, M., Mariano, L., Chapell, D., Coskren, M., & Schwartz, J. (2016). Operationalizing Engagement with Multimedia as User Coherence with Context. IEEE Transactions on Affective Computing, PP(99), 1-1. doi: 10.1109/taffc.2015.2512867