CROSS-TOOL COLLABORATION DATA

Workplace Activity Graph

Name: Workplace Activity Graph
Creator: Gerra

Cross-tool operational data from collaboration systems, entity-resolved into a unified schema for AI training and workflow evals.

40+ event types · JSON · Parquet · Real-time streaming or daily batch

Tools normalized: 38
Chat messages: 38.4M
Emails: 11.2M
Resolved entity types: 5


01 Download

Inspect a representative sample

Representative records in the delivery format, ready to inspect before licensing the full dataset.

Chat mention that pings an owner about a task

Representative shape, not real customer data. actor_user_id and target_user_id are resolved user nodes; task_id links the chat event to the same work tracked in other tools.

activity_graph.jsonl (representative)

{
  "event_id": "cw-7f3a9c21",
  "org_id": "org_03",
  "tool_category": "chat",
  "tool": "slack",
  "action": "message.mention",
  "actor_user_id": "u_1182",
  "team_id": "t_44",
  "timestamp": "2026-03-11T14:22:08Z",
  "thread_id": "th_9920",
  "task_id": "k_3019",
  "target_user_id": ["u_2210"],
  "parent_event_id": "cw-7f3a9c10",
  "pii_status": "stripped_at_source"
}

Document comment resolved to the same task

Representative. It carries the same task_id as the chat event above; this cross-tool linkage through shared task and user nodes is the dataset's structural contribution.

activity_graph.jsonl (representative)

{
  "event_id": "cw-7f3aa118",
  "org_id": "org_03",
  "tool_category": "files",
  "tool": "google_drive",
  "action": "document.comment_added",
  "actor_user_id": "u_2210",
  "team_id": "t_44",
  "timestamp": "2026-03-11T15:01:44Z",
  "document_id": "d_5567",
  "task_id": "k_3019",
  "target_user_id": ["u_1182"],
  "pii_status": "stripped_at_source"
}

Task status transition in a tracker

Representative. This closes the cross-tool chain: the work discussed in chat and in a document changes state in the project tracker.

activity_graph.jsonl (representative)

{
  "event_id": "cw-7f3ab004",
  "org_id": "org_03",
  "tool_category": "project_tracking",
  "tool": "jira",
  "action": "task.status_changed",
  "actor_user_id": "u_1182",
  "team_id": "t_44",
  "timestamp": "2026-03-11T16:48:12Z",
  "task_id": "k_3019",
  "pii_status": "stripped_at_source"
}
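A minimal sketch of how the three sample records above chain together, assuming a .jsonl file holds one record per line (the pretty-printed samples are re-serialized that way here). Grouping by (org_id, task_id) keeps the org boundary intact; the helper names load_events and task_chains are hypothetical and not part of any published tooling.

```python
import json
from collections import defaultdict

# The three representative records above, one JSON object per line as in activity_graph.jsonl.
SAMPLE_JSONL = """\
{"event_id": "cw-7f3a9c21", "org_id": "org_03", "tool_category": "chat", "tool": "slack", "action": "message.mention", "actor_user_id": "u_1182", "team_id": "t_44", "timestamp": "2026-03-11T14:22:08Z", "thread_id": "th_9920", "task_id": "k_3019", "target_user_id": ["u_2210"], "parent_event_id": "cw-7f3a9c10", "pii_status": "stripped_at_source"}
{"event_id": "cw-7f3aa118", "org_id": "org_03", "tool_category": "files", "tool": "google_drive", "action": "document.comment_added", "actor_user_id": "u_2210", "team_id": "t_44", "timestamp": "2026-03-11T15:01:44Z", "document_id": "d_5567", "task_id": "k_3019", "target_user_id": ["u_1182"], "pii_status": "stripped_at_source"}
{"event_id": "cw-7f3ab004", "org_id": "org_03", "tool_category": "project_tracking", "tool": "jira", "action": "task.status_changed", "actor_user_id": "u_1182", "team_id": "t_44", "timestamp": "2026-03-11T16:48:12Z", "task_id": "k_3019", "pii_status": "stripped_at_source"}
"""


def load_events(text):
    """Parse JSONL text into a list of dicts, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def task_chains(events):
    """Group events by (org_id, task_id) and order each group by timestamp."""
    chains = defaultdict(list)
    for ev in events:
        if ev.get("task_id") is not None:
            chains[(ev["org_id"], ev["task_id"])].append(ev)
    for chain in chains.values():
        # Same-format UTC strings ending in "Z" sort lexicographically in time order.
        chain.sort(key=lambda ev: ev["timestamp"])
    return dict(chains)


events = load_events(SAMPLE_JSONL)
for (org, task), chain in task_chains(events).items():
    print(org, task)
    for ev in chain:
        print(" ", ev["timestamp"], ev["tool"], ev["action"], ev["actor_user_id"])
```

On the samples this prints a single chain for org_03 / k_3019: the Slack mention, then the Google Drive comment, then the Jira status change.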

02 Schema

Record shape

Every field, its type, whether it can be null, and a representative value.

Field	Type	Constraint	Description	Example
event_id	string	required	Stable identifier for one normalized cross-tool activity event.	cw-7f3a9c21
org_id	string	required	Partner org boundary. Entities and events are never merged across orgs.	org_03
tool_category	string	required	One of 10 normalized categories (chat, email, files, project_tracking, source_control, crm, ...).	chat
tool	string	required	Source product the event came from, within its category.	slack
action	string	required	Normalized action in the unified event vocabulary (40+ types).	message.mention
actor_user_id	string	required	Resolved user node: the same human across every tool they appear in.	u_1182
team_id	string	nullable	Resolved team node, when the event is attributable to a working group.	t_44
timestamp	string · ISO-8601	required	Event time, normalized to UTC for replay-safe point-in-time evals.	2026-03-11T14:22:08Z
thread_id	string	nullable	Conversation or thread node for chat and mail events.	th_9920
document_id	string	nullable	Resolved document node when a file is created, edited, commented on, or referenced.	d_5567
task_id	string	nullable	Resolved task node linking the event to a unit of work across tracking and service tools.	k_3019
target_user_id	string[]	nullable	Resolved recipients and mentions, as user nodes.	["u_2210"]
parent_event_id	string	nullable	Edge to a preceding event (reply-to, edit-of, status-change-from) for activity chains.	cw-7f3a9c10
pii_status	string	required	PII handling marker. PII is stripped at source before records leave the partner environment.	stripped_at_source
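The table can be turned into a lightweight record check. This sketch assumes that a nullable field may be either absent (as in the samples, which omit fields such as document_id) or explicitly null, and that timestamps use a trailing Z or an explicit offset; SCHEMA and validate are illustrative names only.

```python
from datetime import datetime, timedelta

# field -> (expected JSON-decoded Python type, required?), transcribed from the table above.
SCHEMA = {
    "event_id": (str, True), "org_id": (str, True), "tool_category": (str, True),
    "tool": (str, True), "action": (str, True), "actor_user_id": (str, True),
    "team_id": (str, False), "timestamp": (str, True), "thread_id": (str, False),
    "document_id": (str, False), "task_id": (str, False),
    "target_user_id": (list, False), "parent_event_id": (str, False),
    "pii_status": (str, True),
}


def validate(record):
    """Return a list of problems; an empty list means the record fits the table."""
    problems = []
    for field, (expected, required) in SCHEMA.items():
        value = record.get(field)
        if value is None:
            # Assumption: absent and null are treated the same for nullable fields.
            if required:
                problems.append(f"missing required field {field}")
            continue
        if not isinstance(value, expected):
            problems.append(f"{field}: expected {expected.__name__}")
        elif expected is list and not all(isinstance(v, str) for v in value):
            problems.append(f"{field}: expected string[]")
    ts = record.get("timestamp")
    if isinstance(ts, str):
        try:
            # fromisoformat() accepts "Z" only from Python 3.11, so map it to +00:00 first.
            parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp: not ISO-8601")
        else:
            # Naive timestamps (utcoffset() is None) also fail this UTC check.
            if parsed.utcoffset() != timedelta(0):
                problems.append("timestamp: not UTC")
    return problems


record = {
    "event_id": "cw-7f3a9c21", "org_id": "org_03", "tool_category": "chat",
    "tool": "slack", "action": "message.mention", "actor_user_id": "u_1182",
    "team_id": "t_44", "timestamp": "2026-03-11T14:22:08Z", "thread_id": "th_9920",
    "task_id": "k_3019", "target_user_id": ["u_2210"],
    "parent_event_id": "cw-7f3a9c10", "pii_status": "stripped_at_source",
}
print(validate(record))  # []
```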

03 What's included

Communication Fabric

Messages, threads, mentions, and meeting activity across chat and mail. Normalized author, thread, and channel identifiers.

Document Graph

Document creation, edits, comments, and share events across collaboration storage. Link-level traversal with access context.

Task & Project Graph

Task assignments, status transitions, and project movement across project-management tools. Timestamped for replay-safe evals.

04 Methodology

How it is built

01 Consented collection
A point-in-time snapshot drawn from consented partner companies. Each partner provides explicit consent; this is a documented snapshot, not a continuous feed.

02 Normalization into an activity graph
Heterogeneous, tool-specific records are mapped into a single graph where nodes are entities and edges are activities, using one normalized action vocabulary across 10 categories and 38 tools.

03 Entity resolution
The same person, team, document, or task collapses to one node regardless of how many tools reference it, across five entity types. Resolution runs strictly within each org boundary; entities are never merged across partners.

04 PII stripping at source
Personally identifiable information is removed before records leave the partner environment, so PII is not present in the normalized graph.

05 Per-partner provenance and isolation
Each org boundary is held isolated so the lineage of every node and edge back to its consenting partner is unambiguous; counts are reported as collected, not deduplicated across partners.

06 Point-in-time replay
Events are timestamped to a common UTC clock so a slice of the graph can be reconstructed as of a moment in time for replay-safe evaluation.
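The page does not say how entity resolution is implemented. Union-find is one common technique for collapsing many tool-level references into a single node, and the sketch below uses it purely as an illustration of the stated rule that merges never cross an org boundary. It assumes each tool-level reference is an (org_id, tool, local_id) tuple and that some upstream matcher supplies pairs believed to be the same entity; the class name and the raw IDs are hypothetical.

```python
class OrgScopedUnionFind:
    """Union-find over (org_id, tool, local_id) references that refuses cross-org merges."""

    def __init__(self):
        self.parent = {}

    def find(self, ref):
        """Return the root reference for ref, registering ref on first sight."""
        self.parent.setdefault(ref, ref)
        root = ref
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: repoint every reference on the path straight at the root.
        while ref != root:
            nxt = self.parent[ref]
            self.parent[ref] = root
            ref = nxt
        return root

    def union(self, a, b):
        """Merge the groups containing a and b; both must share the same org_id."""
        if a[0] != b[0]:
            raise ValueError(f"refusing cross-org merge: {a} vs {b}")
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def nodes(self):
        """Map each resolved root to the sorted list of references it absorbed."""
        groups = {}
        for ref in list(self.parent):
            groups.setdefault(self.find(ref), []).append(ref)
        return {root: sorted(refs) for root, refs in groups.items()}


uf = OrgScopedUnionFind()
# Hypothetical tool-local IDs for one person seen in Slack, Jira, and Google Drive.
uf.union(("org_03", "slack", "raw_a"), ("org_03", "jira", "raw_b"))
uf.union(("org_03", "jira", "raw_b"), ("org_03", "google_drive", "raw_c"))
try:
    uf.union(("org_03", "slack", "raw_a"), ("org_07", "slack", "raw_a"))
except ValueError as err:
    print(err)
print(len(uf.nodes()))  # 1
```

The org check runs before any lookup, so a rejected cross-org pair never even registers the foreign reference.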

05 Evals

How we validate

What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.

Cross-Tool Workflow Replay

Measures

Whether an agent, given a team's resolved activity graph up to a point in time, would route work the way the real team did next: who to ping, which doc to reference, how a task should move.

Method

Reconstruct the entity-resolved graph as of a cut time; pose the in-flight situation; withhold subsequent events; compare the agent's proposed next action with the events actually recorded afterward.

Result

Methodology-stage. The source data-availability report explicitly states that no signal, model, or predictive result exists yet, and none is claimed.
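No scoring rule is published for this eval, so the following only sketches the cut-and-withhold mechanics described in the method, using trimmed copies of the sample records. The matching rule (same action, actor, and task) and the function names are hypothetical choices for illustration, not the evaluation's definition of a correct next action.

```python
from datetime import datetime


def parse_ts(ts):
    """Parse a schema timestamp such as 2026-03-11T14:22:08Z into an aware datetime."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def replay_case(events, org_id, cut_time):
    """Return (visible history up to cut_time, first withheld event or None) for one org."""
    cut = parse_ts(cut_time)
    org_events = sorted(
        (e for e in events if e["org_id"] == org_id),
        key=lambda e: parse_ts(e["timestamp"]),
    )
    history = [e for e in org_events if parse_ts(e["timestamp"]) <= cut]
    withheld = [e for e in org_events if parse_ts(e["timestamp"]) > cut]
    return history, (withheld[0] if withheld else None)


def matches(proposed, actual):
    """Hypothetical rule: a proposal matches if action, actor, and task all agree."""
    keys = ("action", "actor_user_id", "task_id")
    return all(proposed.get(k) == actual.get(k) for k in keys)


# Trimmed copies of the three sample records (only the fields this sketch reads).
events = [
    {"event_id": "cw-7f3a9c21", "org_id": "org_03", "action": "message.mention",
     "actor_user_id": "u_1182", "timestamp": "2026-03-11T14:22:08Z", "task_id": "k_3019"},
    {"event_id": "cw-7f3aa118", "org_id": "org_03", "action": "document.comment_added",
     "actor_user_id": "u_2210", "timestamp": "2026-03-11T15:01:44Z", "task_id": "k_3019"},
    {"event_id": "cw-7f3ab004", "org_id": "org_03", "action": "task.status_changed",
     "actor_user_id": "u_1182", "timestamp": "2026-03-11T16:48:12Z", "task_id": "k_3019"},
]

history, actual_next = replay_case(events, "org_03", "2026-03-11T14:30:00Z")
# Stand-in for an agent's output; a real agent would derive this from `history`.
proposed = {"action": "document.comment_added", "actor_user_id": "u_2210", "task_id": "k_3019"}
print(len(history), actual_next["event_id"], matches(proposed, actual_next))  # 1 cw-7f3aa118 True
```

Cutting at 14:30 leaves only the Slack mention visible and withholds the Drive comment as the recorded next action.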

Entity-Resolution Integrity

Measures

Whether the same human, team, document, or task is correctly unified into one node across all tools it appears in, within an org boundary.

Method

Validate that nodes resolved across tools refer to one real entity and that no entity is merged across partner org boundaries.

Result

Methodology-stage. No agreement figure is published at this stage.
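One automatable part of this check is the cross-org rule. The sketch below flags any resolved node ID that appears under more than one org_id. It assumes node IDs are unique across the whole delivery, so a shared ID would indicate a cross-org merge; if IDs are only unique within an org, this check would not be meaningful as written. The function name and the leaked record are hypothetical.

```python
from collections import defaultdict

# Fields that hold resolved node IDs, per the schema table.
NODE_FIELDS = ("actor_user_id", "team_id", "thread_id", "document_id", "task_id", "target_user_id")


def cross_org_nodes(events):
    """Return {node_id: sorted org_ids} for every node ID seen in more than one org."""
    orgs_by_node = defaultdict(set)
    for ev in events:
        for field in NODE_FIELDS:
            value = ev.get(field)
            if value is None:
                continue
            # target_user_id is a list; the other node fields are single strings.
            for node_id in (value if isinstance(value, list) else [value]):
                orgs_by_node[node_id].add(ev["org_id"])
    return {node: sorted(orgs) for node, orgs in orgs_by_node.items() if len(orgs) > 1}


clean = [
    {"org_id": "org_03", "actor_user_id": "u_1182", "task_id": "k_3019", "target_user_id": ["u_2210"]},
    {"org_id": "org_03", "actor_user_id": "u_2210", "document_id": "d_5567", "task_id": "k_3019"},
]
# A hypothetical bad record reusing org_03's user node under a different org.
leaked = clean + [{"org_id": "org_07", "actor_user_id": "u_1182"}]
print(cross_org_nodes(clean))   # {}
print(cross_org_nodes(leaked))  # {'u_1182': ['org_03', 'org_07']}
```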

06 Graders

Ground truth

What correct means for this data, and how it is established.

Ground truth

The real, recorded cross-tool activity the team actually produced next (the events that genuinely followed in the partner's own systems), reconstructed point-in-time from the resolved graph.

How it is established

Replay-based comparison: cut the graph at a timestamp, withhold subsequent events, and compare an agent's proposed next action with what the org actually did. Integrity is anchored to consent and per-partner provenance, and to entity-resolution correctness within each org boundary.

Agreement

Correctness is anchored to the production record itself rather than a separate human-rater pass. No inter-rater agreement figure is published at this stage.

07 Application

Grounded training data for workflow agents

Real team behavior (who pings whom, which docs get referenced, how tasks move) as ground truth for agents that have to operate inside real companies.

Workflow evals with verifiable outcomes

Point-in-time snapshots let evals check whether an agent would have matched what a real team did, not what a synthetic benchmark assumes.

Unified retrieval surface for internal AI

A single normalized activity graph across mail, chat, docs, and tasks. Drop-in substrate for enterprise copilots and org-level search.

08 Environment & integration

How you load it

Delivery

S3, REST API, webhook, or data room, under a restricted training/eval license

Formats

JSONL event records, JSON, Parquet

Auth

Restricted access under a signed license; per-recipient export. PII is stripped at source; each partner org is held as an isolated boundary. Source identities are anonymous; full provenance under NDA.

Cadence

One-time point-in-time snapshot as documented. The delivery surface also supports streaming or daily/hourly batch where a live feed is licensed.

09 Related research

Operational Telemetry

Request access.

Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.

Talk to us: team@gerra.com
