CROSS-TOOL COLLABORATION DATA

Collaboration Workflow Intelligence

Cross-tool operational data from collaboration systems. Entity-resolved into a unified schema for AI training and workflow evals.

MULTI-TOOL · JSON · Parquet · Real-time streaming or daily batch
38 tools normalized · 38.4M chat messages · 11.2M emails · 5 resolved entity types
01 Sample

See the data

Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.

Chat mention that pings an owner about a task

Representative shape, not real customer data. actor and target are resolved user nodes; task_id links the chat event to the same work tracked elsewhere.

activity_graph.jsonl (representative)
{
  "event_id": "cw-7f3a9c21",
  "org_id": "org_03",
  "tool_category": "chat",
  "tool": "slack",
  "action": "message.mention",
  "actor_user_id": "u_1182",
  "team_id": "t_44",
  "timestamp": "2026-03-11T14:22:08Z",
  "thread_id": "th_9920",
  "task_id": "k_3019",
  "target_user_id": ["u_2210"],
  "parent_event_id": "cw-7f3a9c10",
  "pii_status": "stripped_at_source"
}

Document comment resolved to the same task

Representative. Same task_id as the chat event above - cross-tool linkage through shared task and user nodes is the structural contribution.

activity_graph.jsonl (representative)
{
  "event_id": "cw-7f3aa118",
  "org_id": "org_03",
  "tool_category": "files",
  "tool": "google_drive",
  "action": "document.comment_added",
  "actor_user_id": "u_2210",
  "team_id": "t_44",
  "timestamp": "2026-03-11T15:01:44Z",
  "document_id": "d_5567",
  "task_id": "k_3019",
  "target_user_id": ["u_1182"],
  "pii_status": "stripped_at_source"
}

Task status transition in a tracker

Representative. Closes the cross-tool chain: the work discussed in chat and a doc moves state in the project tool.

activity_graph.jsonl (representative)
{
  "event_id": "cw-7f3ab004",
  "org_id": "org_03",
  "tool_category": "project_tracking",
  "tool": "jira",
  "action": "task.status_changed",
  "actor_user_id": "u_1182",
  "team_id": "t_44",
  "timestamp": "2026-03-11T16:48:12Z",
  "task_id": "k_3019",
  "pii_status": "stripped_at_source"
}
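The linkage through a shared task_id can be sketched in a few lines of Python. This is an illustration only: it uses trimmed copies of the three representative records above, assumes the delivered file holds one JSON record per line, and the helper names (parse_ts, chains_by_task) are hypothetical rather than part of any delivered tooling.

```python
import json
from collections import defaultdict
from datetime import datetime

# Trimmed copies of the three representative records, deliberately out of order.
RECORDS = """\
{"event_id": "cw-7f3ab004", "tool": "jira", "action": "task.status_changed", "timestamp": "2026-03-11T16:48:12Z", "task_id": "k_3019"}
{"event_id": "cw-7f3a9c21", "tool": "slack", "action": "message.mention", "timestamp": "2026-03-11T14:22:08Z", "task_id": "k_3019"}
{"event_id": "cw-7f3aa118", "tool": "google_drive", "action": "document.comment_added", "timestamp": "2026-03-11T15:01:44Z", "task_id": "k_3019"}
"""


def parse_ts(value: str) -> datetime:
    # Swap the trailing "Z" for an explicit offset so fromisoformat
    # also works on Python versions before 3.11.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def chains_by_task(lines):
    """Group events by task_id and order each group by event time."""
    chains = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("task_id") is not None:  # task_id is nullable
            chains[event["task_id"]].append(event)
    for events in chains.values():
        events.sort(key=lambda e: parse_ts(e["timestamp"]))
    return dict(chains)


chains = chains_by_task(RECORDS.splitlines())
tools_in_order = [e["tool"] for e in chains["k_3019"]]
print(tools_in_order)  # ['slack', 'google_drive', 'jira']
```

Sorting by the UTC timestamp recovers the chat, document, tracker sequence for task k_3019 regardless of the order the records arrive in.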
02 Schema

Record shape

Every field, its type, whether it can be null, and a representative value.

| Field | Type | Constraint | Description | Example |
| --- | --- | --- | --- | --- |
| event_id | string | required | Stable identifier for one normalized cross-tool activity event. | cw-7f3a9c21 |
| org_id | string | required | Partner org boundary. Entities and events are never merged across orgs. | org_03 |
| tool_category | string | required | One of 10 normalized categories (chat, email, files, project-tracking, source-control, crm, ...). | chat |
| tool | string | required | Source product the event came from, within its category. | slack |
| action | string | required | Normalized action verb in the unified event vocabulary (40+ types). | message.mention |
| actor_user_id | string | required | Resolved user node - the same human across every tool they appear in. | u_1182 |
| team_id | string | nullable | Resolved team node, when the event is attributable to a working group. | t_44 |
| timestamp | string · ISO-8601 | required | Event time, normalized to UTC for replay-safe point-in-time evals. | 2026-03-11T14:22:08Z |
| thread_id | string | nullable | Conversation or thread node for chat and mail events. | th_9920 |
| document_id | string | nullable | Resolved document node when a file is created, edited, commented, or referenced. | d_5567 |
| task_id | string | nullable | Resolved task node linking the event to a unit of work across tracking and service tools. | k_3019 |
| target_user_id | string[] | nullable | Resolved recipients and mentions, as user nodes. | ["u_2210"] |
| parent_event_id | string | nullable | Edge to a preceding event (reply-to, edit-of, status-change-from) for activity chains. | cw-7f3a9c10 |
| pii_status | string | required | PII handling marker. PII is stripped at source before records leave the partner environment. | stripped_at_source |
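A record-shape check based on the table above might look like the following sketch. The required and nullable field sets come straight from the schema; the validate_event helper, its error strings, and the choice to flag unknown fields are assumptions made for illustration (Python 3.9+).

```python
from datetime import datetime

# Field sets taken from the schema table: 8 required, 6 nullable.
REQUIRED_FIELDS = (
    "event_id", "org_id", "tool_category", "tool",
    "action", "actor_user_id", "timestamp", "pii_status",
)
NULLABLE_FIELDS = (
    "team_id", "thread_id", "document_id",
    "task_id", "target_user_id", "parent_event_id",
)


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the shape is acceptable."""
    problems = []
    for name in REQUIRED_FIELDS:
        if event.get(name) is None:
            problems.append(f"missing required field: {name}")
    known = set(REQUIRED_FIELDS) | set(NULLABLE_FIELDS)
    for name in sorted(set(event) - known):
        problems.append(f"unknown field: {name}")
    targets = event.get("target_user_id")
    if targets is not None and not (
        isinstance(targets, list) and all(isinstance(t, str) for t in targets)
    ):
        problems.append("target_user_id must be a list of strings or null")
    timestamp = event.get("timestamp")
    if isinstance(timestamp, str):
        try:
            parsed = datetime.fromisoformat(timestamp.replace("Z", "+00:00"))
        except ValueError:
            problems.append("timestamp is not ISO-8601")
        else:
            offset = parsed.utcoffset()
            if offset is None or offset.total_seconds() != 0:
                problems.append("timestamp is not UTC")
    return problems


good = {
    "event_id": "cw-7f3a9c21", "org_id": "org_03", "tool_category": "chat",
    "tool": "slack", "action": "message.mention", "actor_user_id": "u_1182",
    "team_id": "t_44", "timestamp": "2026-03-11T14:22:08Z",
    "thread_id": "th_9920", "task_id": "k_3019",
    "target_user_id": ["u_2210"], "parent_event_id": "cw-7f3a9c10",
    "pii_status": "stripped_at_source",
}
bad = {k: v for k, v in good.items() if k != "actor_user_id"}
bad["target_user_id"] = "u_2210"  # a bare string instead of string[]

good_problems = validate_event(good)
bad_problems = validate_event(bad)
print(good_problems)  # []
print(bad_problems)
```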
03 What's included

Communication Fabric

Messages, threads, mentions, and meeting activity across chat and mail. Normalized author, thread, and channel identifiers.

Document Graph

Document creation, edits, comments, and share events across collaboration storage. Link-level traversal with access context.

Task & Project Graph

Task assignments, status transitions, and project movement across project-management tools. Timestamped for replay-safe evals.

04 Methodology

How it is built

  1. Consented collection

    A point-in-time snapshot drawn from consented partner companies. Each partner provides explicit consent; this is a documented snapshot, not a continuous feed.

  2. Normalization into an activity graph

    Heterogeneous, tool-specific records are mapped into a single graph where nodes are entities and edges are activities, using one normalized action vocabulary across 10 categories and 38 tools.

  3. Entity resolution

    The same person, team, document, or task collapses to one node regardless of how many tools reference it, across five entity types. Resolution runs strictly within each org boundary; entities are never merged across partners.

  4. PII stripping at source

    Personally identifiable information is removed before records leave the partner environment, so PII is not present in the normalized graph.

  5. Per-partner provenance and isolation

    Each org boundary is held isolated so the lineage of every node and edge back to its consenting partner is unambiguous; counts are reported as collected, not deduplicated across partners.

  6. Point-in-time replay

    Events are timestamped to a common UTC clock so a slice of the graph can be reconstructed as of a moment in time for replay-safe evaluation.
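The normalization and entity-resolution steps can be illustrated with a toy sketch. The source does not describe how matching is actually performed, so everything here is an assumption: the alias table keyed by (org_id, tool, native account id), the OrgScopedResolver class, and the raw action names, which are placeholders rather than real event names from any tool. The one property taken from the text is that lookups never cross an org boundary.

```python
from typing import Dict, Optional, Tuple

# Hypothetical raw-to-normalized action table. The raw names are placeholders.
ACTION_MAP: Dict[Tuple[str, str], str] = {
    ("slack", "raw_mention"): "message.mention",
    ("jira", "raw_status_update"): "task.status_changed",
}

AliasKey = Tuple[str, str, str]  # (org_id, tool, tool-native account id)


class OrgScopedResolver:
    """Maps tool-native accounts to user nodes, strictly within one org."""

    def __init__(self) -> None:
        self._aliases: Dict[AliasKey, str] = {}

    def link(self, org_id: str, tool: str, native_id: str, node_id: str) -> None:
        self._aliases[(org_id, tool, native_id)] = node_id

    def resolve(self, org_id: str, tool: str, native_id: str) -> Optional[str]:
        # org_id is part of the key, so an alias from one org can never
        # resolve an account that belongs to another org.
        return self._aliases.get((org_id, tool, native_id))


def normalize(raw: dict, resolver: OrgScopedResolver) -> dict:
    """Map one tool-specific record into the unified event shape (subset)."""
    org_id, tool = raw["org_id"], raw["tool"]
    return {
        "org_id": org_id,
        "tool": tool,
        "action": ACTION_MAP[(tool, raw["raw_action"])],
        "actor_user_id": resolver.resolve(org_id, tool, raw["native_actor"]),
        "timestamp": raw["timestamp"],
    }


resolver = OrgScopedResolver()
resolver.link("org_03", "slack", "S-ACCT-1", "u_1182")
resolver.link("org_03", "jira", "J-ACCT-9", "u_1182")

chat_event = normalize(
    {"org_id": "org_03", "tool": "slack", "raw_action": "raw_mention",
     "native_actor": "S-ACCT-1", "timestamp": "2026-03-11T14:22:08Z"},
    resolver,
)
tracker_event = normalize(
    {"org_id": "org_03", "tool": "jira", "raw_action": "raw_status_update",
     "native_actor": "J-ACCT-9", "timestamp": "2026-03-11T16:48:12Z"},
    resolver,
)
other_org_lookup = resolver.resolve("org_07", "slack", "S-ACCT-1")

print(chat_event["actor_user_id"], tracker_event["actor_user_id"])  # u_1182 u_1182
print(other_org_lookup)  # None
```

Two different tool accounts inside org_03 collapse to the same user node, while the same native account id queried under a different org resolves to nothing.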

05 Evals

How we validate

What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.

Cross-Tool Workflow Replay

Measures

Whether an agent, given a team's resolved activity graph up to a point in time, would route work the way the real team did next - who to ping, which doc to reference, how a task should move.

Method

Reconstruct the entity-resolved graph as of a cut time; pose the in-flight situation; withhold subsequent events; compare the agent's next action to the real recorded events that followed.

Result

Methodology-stage. The source data-availability report explicitly states no signal, model, or predictive result exists yet; none is claimed.
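The replay method described above can be sketched as follows. No scoring rule is published, so the exact-match comparison on three fields is an assumption chosen for illustration, as are the helper names split_at and next_action_match. The events are trimmed copies of the representative records.

```python
from datetime import datetime

EVENTS = [
    {"event_id": "cw-7f3a9c21", "action": "message.mention",
     "actor_user_id": "u_1182", "task_id": "k_3019",
     "timestamp": "2026-03-11T14:22:08Z"},
    {"event_id": "cw-7f3aa118", "action": "document.comment_added",
     "actor_user_id": "u_2210", "task_id": "k_3019",
     "timestamp": "2026-03-11T15:01:44Z"},
    {"event_id": "cw-7f3ab004", "action": "task.status_changed",
     "actor_user_id": "u_1182", "task_id": "k_3019",
     "timestamp": "2026-03-11T16:48:12Z"},
]


def parse_ts(value: str) -> datetime:
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def split_at(events, cut: datetime):
    """Return (visible, withheld): events up to the cut, and events after it."""
    ordered = sorted(events, key=lambda e: parse_ts(e["timestamp"]))
    visible = [e for e in ordered if parse_ts(e["timestamp"]) <= cut]
    withheld = [e for e in ordered if parse_ts(e["timestamp"]) > cut]
    return visible, withheld


def next_action_match(predicted: dict, withheld,
                      fields=("action", "actor_user_id", "task_id")) -> bool:
    """Assumed scoring rule: exact match against the first withheld event."""
    if not withheld:
        return False
    actual = withheld[0]
    return all(predicted.get(f) == actual.get(f) for f in fields)


cut = parse_ts("2026-03-11T16:00:00Z")
visible, withheld = split_at(EVENTS, cut)

# Stand-in for an agent's proposal; a real run would produce this from `visible`.
predicted = {"action": "task.status_changed",
             "actor_user_id": "u_1182", "task_id": "k_3019"}
matched = next_action_match(predicted, withheld)
print(len(visible), len(withheld), matched)  # 2 1 True
```

The agent only ever sees `visible`; `withheld` stays with the grader, which is what keeps the comparison replay-safe.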

Entity-Resolution Integrity

Measures

Whether the same human, team, document, or task is correctly unified into one node across all tools it appears in, within an org boundary.

Method

Validate that nodes resolved across tools refer to one real entity and that no entity is merged across partner org boundaries.

Result

Methodology-stage. No agreement figure is published at this stage.
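One half of this check, that no entity is merged across org boundaries, lends itself to a mechanical test. The sketch below flags any node id observed under more than one org_id. It assumes node ids are not reused across orgs, which the source does not state; if ids are only unique within an org, the check would need to key on (org_id, node id) instead. The other half, whether a resolved node really is one entity, needs labeled reference data and is not covered here.

```python
from collections import defaultdict

# Scalar fields that carry resolved node ids, per the schema.
NODE_FIELDS = ("actor_user_id", "team_id", "thread_id", "document_id", "task_id")


def cross_org_nodes(events):
    """Return {node_id: [org_ids]} for nodes seen in more than one org."""
    orgs_by_node = defaultdict(set)
    for event in events:
        node_ids = [event.get(field) for field in NODE_FIELDS]
        node_ids.extend(event.get("target_user_id") or [])  # nullable string[]
        for node_id in node_ids:
            if node_id is not None:
                orgs_by_node[node_id].add(event["org_id"])
    return {
        node_id: sorted(orgs)
        for node_id, orgs in orgs_by_node.items()
        if len(orgs) > 1
    }


clean = [
    {"org_id": "org_03", "actor_user_id": "u_1182", "task_id": "k_3019",
     "target_user_id": ["u_2210"]},
    {"org_id": "org_03", "actor_user_id": "u_2210", "task_id": "k_3019"},
]
# Hypothetical violation: the same user node appears under a second org.
leaky = clean + [{"org_id": "org_07", "actor_user_id": "u_1182"}]

clean_report = cross_org_nodes(clean)
leaky_report = cross_org_nodes(leaky)
print(clean_report)  # {}
print(leaky_report)  # {'u_1182': ['org_03', 'org_07']}
```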

06 Graders

Ground truth

What correct means for this data, and how it is established.

Ground truth

The real, recorded cross-tool activity the team actually produced next - the events that genuinely followed in the partner's own systems - reconstructed point-in-time from the resolved graph.

How it is established

Replay-based comparison: cut the graph at a timestamp, withhold subsequent events, and compare an agent's proposed next action to what the org actually did. Integrity is anchored to consent and per-partner provenance, and to entity-resolution correctness within each org boundary.

Agreement

Correctness is anchored to the production record itself rather than a separate human-rater pass. No inter-rater agreement figure is published at this stage.

07 Application

Grounded training data for workflow agents

Real team behavior - who pings whom, which docs get referenced, how tasks move - as ground truth for agents that have to operate inside real companies.

Workflow evals with verifiable outcomes

Point-in-time snapshots let evals check whether an agent would have matched what a real team did, not what a synthetic benchmark assumes.

Unified retrieval surface for internal AI

A single normalized activity graph across mail, chat, docs, and tasks. Drop-in substrate for enterprise copilots and org-level search.

08 Environment & integration

How you load it

Delivery

S3, REST API, webhook, or data room, under a restricted training/eval license

Formats

JSONL event records, JSON, Parquet

Auth

Restricted access under a signed license; per-recipient export. PII is stripped at source; each partner org is held as an isolated boundary. Source identities are anonymous; full provenance under NDA.

Cadence

One-time point-in-time snapshot as documented. The delivery surface also supports streaming or daily/hourly batch where a live feed is licensed.
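For the JSONL format, a loader needs nothing beyond the standard library. The sketch below assumes a file that has already been downloaded locally with one record per line; it writes a tiny stand-in file to a temporary directory so the example runs on its own. The iter_events helper is hypothetical, and fetching from S3 or the REST API is out of scope because the source gives no endpoint details.

```python
import json
import os
import tempfile
from collections import Counter
from typing import Iterator


def iter_events(path: str) -> Iterator[dict]:
    """Stream records from a JSONL file without loading it all into memory."""
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                raise ValueError(f"{path}:{line_number}: invalid JSON") from exc


# Stand-in records, trimmed from the representative samples.
SAMPLE = [
    {"event_id": "cw-7f3a9c21", "tool_category": "chat", "tool": "slack"},
    {"event_id": "cw-7f3aa118", "tool_category": "files", "tool": "google_drive"},
]

with tempfile.TemporaryDirectory() as workdir:
    path = os.path.join(workdir, "activity_graph.jsonl")
    with open(path, "w", encoding="utf-8") as handle:
        for record in SAMPLE:
            handle.write(json.dumps(record) + "\n")
    # Consume the generator while the temporary file still exists.
    counts = Counter(event["tool_category"] for event in iter_events(path))

print(dict(counts))  # {'chat': 1, 'files': 1}
```

Streaming line by line keeps memory flat for large exports; for columnar analysis the Parquet delivery would be the more natural starting point.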

Request access.

Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.