CROSS-TOOL COLLABORATION DATA
Collaboration Workflow Intelligence
Cross-tool operational data from collaboration systems. Entity-resolved into a unified schema for AI training and workflow evals.
See the data
Representative records in the exact shape we deliver. Real provenance and full slices are shared under license.
Chat mention that pings an owner about a task
Representative shape, not real customer data. actor_user_id and target_user_id are resolved user nodes; task_id links the chat event to the same work tracked elsewhere.
{
"event_id": "cw-7f3a9c21",
"org_id": "org_03",
"tool_category": "chat",
"tool": "slack",
"action": "message.mention",
"actor_user_id": "u_1182",
"team_id": "t_44",
"timestamp": "2026-03-11T14:22:08Z",
"thread_id": "th_9920",
"task_id": "k_3019",
"target_user_id": ["u_2210"],
"parent_event_id": "cw-7f3a9c10",
"pii_status": "stripped_at_source"
}
Document comment resolved to the same task
Representative. Same task_id as the chat event above - cross-tool linkage through shared task and user nodes is the structural contribution.
{
"event_id": "cw-7f3aa118",
"org_id": "org_03",
"tool_category": "files",
"tool": "google_drive",
"action": "document.comment_added",
"actor_user_id": "u_2210",
"team_id": "t_44",
"timestamp": "2026-03-11T15:01:44Z",
"document_id": "d_5567",
"task_id": "k_3019",
"target_user_id": ["u_1182"],
"pii_status": "stripped_at_source"
}
Task status transition in a tracker
Representative. Closes the cross-tool chain: the work discussed in chat and a doc moves state in the project tool.
{
"event_id": "cw-7f3ab004",
"org_id": "org_03",
"tool_category": "project_tracking",
"tool": "jira",
"action": "task.status_changed",
"actor_user_id": "u_1182",
"team_id": "t_44",
"timestamp": "2026-03-11T16:48:12Z",
"task_id": "k_3019",
"pii_status": "stripped_at_source"
}
Record shape
Every field, its type, whether it can be null, and a representative value.
| Field | Type | Constraint | Description |
|---|---|---|---|
| event_id | string | required | Stable identifier for one normalized cross-tool activity event. e.g. cw-7f3a9c21 |
| org_id | string | required | Partner org boundary. Entities and events are never merged across orgs. e.g. org_03 |
| tool_category | string | required | One of 10 normalized categories (chat, email, files, project_tracking, source_control, crm, ...). e.g. chat |
| tool | string | required | Source product the event came from, within its category. e.g. slack |
| action | string | required | Normalized action verb in the unified event vocabulary (40+ types). e.g. message.mention |
| actor_user_id | string | required | Resolved user node - the same human across every tool they appear in. e.g. u_1182 |
| team_id | string | nullable | Resolved team node, when the event is attributable to a working group. e.g. t_44 |
| timestamp | string · ISO-8601 | required | Event time, normalized to UTC for replay-safe point-in-time evals. e.g. 2026-03-11T14:22:08Z |
| thread_id | string | nullable | Conversation or thread node for chat and mail events. e.g. th_9920 |
| document_id | string | nullable | Resolved document node when a file is created, edited, commented, or referenced. e.g. d_5567 |
| task_id | string | nullable | Resolved task node linking the event to a unit of work across tracking and service tools. e.g. k_3019 |
| target_user_id | string[] | nullable | Resolved recipients and mentions, as user nodes. e.g. ["u_2210"] |
| parent_event_id | string | nullable | Edge to a preceding event (reply-to, edit-of, status-change-from) for activity chains. e.g. cw-7f3a9c10 |
| pii_status | string | required | PII handling marker. PII is stripped at source before records leave the partner environment. e.g. stripped_at_source |
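The table above can be turned into a simple conformance check on incoming records. The following is a minimal sketch, not a delivered validator: the function name `validate_event` is hypothetical, and it assumes that an absent nullable field is equivalent to an explicit null and that every required field is a non-empty string.

```python
from datetime import datetime

# Field lists copied from the table above; the checks themselves are illustrative.
REQUIRED = ("event_id", "org_id", "tool_category", "tool", "action",
            "actor_user_id", "timestamp", "pii_status")
NULLABLE = ("team_id", "thread_id", "document_id", "task_id",
            "target_user_id", "parent_event_id")


def validate_event(event: dict) -> list[str]:
    """Return a list of problems; an empty list means the record fits the table."""
    problems = []
    for field in REQUIRED:
        if not isinstance(event.get(field), str) or not event[field]:
            problems.append(f"{field}: required non-empty string")
    for field in NULLABLE:
        value = event.get(field)  # assumption: absent and null are treated alike
        if value is None:
            continue
        if field == "target_user_id":
            if not (isinstance(value, list) and all(isinstance(v, str) for v in value)):
                problems.append("target_user_id: expected a list of strings")
        elif not isinstance(value, str):
            problems.append(f"{field}: expected a string or null")
    unknown = set(event) - set(REQUIRED) - set(NULLABLE)
    if unknown:
        problems.append(f"unknown fields: {sorted(unknown)}")
    ts = event.get("timestamp")
    if isinstance(ts, str):
        try:
            # Samples end in "Z"; swap it for an explicit offset so that
            # fromisoformat also accepts it on Python versions before 3.11.
            parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
            offset = parsed.utcoffset()
            if offset is None or offset.total_seconds() != 0:
                problems.append("timestamp: expected UTC")
        except ValueError:
            problems.append("timestamp: not ISO-8601")
    return problems
```

Returning a list of problems rather than raising on the first one makes it easier to report every issue in a record at once.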
Communication Fabric
Messages, threads, mentions, and meeting activity across chat and mail. Normalized author, thread, and channel identifiers.
Document Graph
Document creation, edits, comments, and share events across collaboration storage. Link-level traversal with access context.
Task & Project Graph
Task assignments, status transitions, and project movement across project-management tools. Timestamped for replay-safe evals.
How it is built
- 01
Consented collection
A point-in-time snapshot drawn from consented partner companies. Each partner provides explicit consent; this is a documented snapshot, not a continuous feed.
- 02
Normalization into an activity graph
Heterogeneous, tool-specific records are mapped into a single graph where nodes are entities and edges are activities, using one normalized action vocabulary across 10 categories and 38 tools.
- 03
Entity resolution
The same person, team, document, or task collapses to one node regardless of how many tools reference it, across five entity types. Resolution runs strictly within each org boundary; entities are never merged across partners.
- 04
PII stripping at source
Personally identifiable information is removed before records leave the partner environment, so PII is not present in the normalized graph.
- 05
Per-partner provenance and isolation
Each org boundary is held isolated so the lineage of every node and edge back to its consenting partner is unambiguous; counts are reported as collected, not deduplicated across partners.
- 06
Point-in-time replay
Events are timestamped to a common UTC clock so a slice of the graph can be reconstructed as of a moment in time for replay-safe evaluation.
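To make steps 02 and 03 concrete, here is a minimal sketch of how the three representative records shown earlier could be read as edges in an activity graph and then followed as one cross-tool chain. The helper names `build_edges` and `chain_for_task` and the edge tuple layout are assumptions for illustration, not the production pipeline. Sorting on the raw timestamp string assumes every value uses the same UTC format as the samples.

```python
# Trimmed copies of the three representative records shown earlier.
EVENTS = [
    {"event_id": "cw-7f3a9c21", "org_id": "org_03", "tool": "slack",
     "action": "message.mention", "actor_user_id": "u_1182",
     "timestamp": "2026-03-11T14:22:08Z", "thread_id": "th_9920",
     "task_id": "k_3019", "target_user_id": ["u_2210"]},
    {"event_id": "cw-7f3aa118", "org_id": "org_03", "tool": "google_drive",
     "action": "document.comment_added", "actor_user_id": "u_2210",
     "timestamp": "2026-03-11T15:01:44Z", "document_id": "d_5567",
     "task_id": "k_3019", "target_user_id": ["u_1182"]},
    {"event_id": "cw-7f3ab004", "org_id": "org_03", "tool": "jira",
     "action": "task.status_changed", "actor_user_id": "u_1182",
     "timestamp": "2026-03-11T16:48:12Z", "task_id": "k_3019"},
]


def build_edges(events):
    """Return (org_id, actor_node, action, target_node, event_id) tuples.

    Each event becomes one edge from the acting user to every entity node it
    references. This edge layout is an illustrative assumption.
    """
    edges = []
    for ev in events:
        targets = [ev.get("thread_id"), ev.get("document_id"), ev.get("task_id")]
        targets += ev.get("target_user_id") or []
        for node in targets:
            if node is not None:
                edges.append((ev["org_id"], ev["actor_user_id"],
                              ev["action"], node, ev["event_id"]))
    return edges


def chain_for_task(events, org_id, task_id):
    """Return the events touching one task inside one org, oldest first."""
    hits = [e for e in events
            if e["org_id"] == org_id and e.get("task_id") == task_id]
    # Uniformly formatted UTC timestamps sort correctly as plain strings.
    return sorted(hits, key=lambda e: e["timestamp"])
```

Filtering on `org_id` before `task_id` keeps the lookup inside a single org boundary, in line with the isolation described in steps 03 and 05.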
How we validate
What each evaluation measures and how it is run. Where no benchmark is published, we show the methodology and say so.
Cross-Tool Workflow Replay
Measures
Whether an agent, given a team's resolved activity graph up to a point in time, would route work the way the real team did next - who to ping, which doc to reference, how a task should move.
Method
Reconstruct the entity-resolved graph as of a cut time; pose the in-flight situation; withhold subsequent events; compare the agent's next action to the real recorded events that followed.
Result
Methodology-stage. The source data-availability report explicitly states no signal, model, or predictive result exists yet; none is claimed.
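The replay method can be sketched in a few lines. This is an illustration under stated assumptions, not a published scoring procedure: the names `parse_utc`, `cut_graph`, and `next_action_matches` are hypothetical, events stamped exactly at the cut are treated as visible, and the comparison is an exact match on a chosen set of fields against the first withheld event. No comparison metric is defined in the source material.

```python
from datetime import datetime, timezone


def parse_utc(ts: str) -> datetime:
    """Parse an ISO-8601 timestamp such as 2026-03-11T14:22:08Z into UTC."""
    return datetime.fromisoformat(ts.replace("Z", "+00:00")).astimezone(timezone.utc)


def cut_graph(events, cut_time: datetime):
    """Split events into (visible, withheld) at cut_time, each oldest first.

    Assumption: an event stamped exactly at the cut is visible to the agent.
    """
    ordered = sorted(events, key=lambda e: parse_utc(e["timestamp"]))
    visible = [e for e in ordered if parse_utc(e["timestamp"]) <= cut_time]
    withheld = [e for e in ordered if parse_utc(e["timestamp"]) > cut_time]
    return visible, withheld


def next_action_matches(proposed: dict, withheld: list,
                        fields=("action", "actor_user_id", "task_id")) -> bool:
    """Compare a proposed next action with the first withheld event.

    Exact match on `fields` is an illustrative choice, not a defined metric.
    """
    if not withheld:
        return False
    actual = withheld[0]
    return all(proposed.get(f) == actual.get(f) for f in fields)
```

The cut time passed to `cut_graph` should be timezone-aware, for example `datetime(2026, 3, 11, 15, 30, tzinfo=timezone.utc)`, so that it compares cleanly with the parsed event times.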
Entity-Resolution Integrity
Measures
Whether the same human, team, document, or task is correctly unified into one node across all tools it appears in, within an org boundary.
Method
Validate that nodes resolved across tools refer to one real entity and that no entity is merged across partner org boundaries.
Result
Methodology-stage. No agreement figure is published at this stage.
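One part of this check, that no entity is merged across partner org boundaries, lends itself to a mechanical test. The sketch below rests on a loud assumption that the schema does not confirm: node identifiers are unique across the whole delivery, so the same identifier appearing under two org_id values would indicate a cross-org merge. If identifiers are only unique within an org, the check should key on (org_id, node id) pairs instead. The function name `cross_org_nodes` is hypothetical.

```python
from collections import defaultdict

# Fields that carry a single resolved node id, per the record shape table.
NODE_FIELDS = ("actor_user_id", "team_id", "thread_id", "document_id", "task_id")


def cross_org_nodes(events):
    """Return {node_id: set of org_ids} for ids seen under more than one org.

    Assumption: node ids are globally unique, so any entry here is suspect.
    """
    orgs_by_node = defaultdict(set)
    for ev in events:
        ids = [ev.get(f) for f in NODE_FIELDS]
        ids += ev.get("target_user_id") or []
        for node_id in ids:
            if node_id is not None:
                orgs_by_node[node_id].add(ev["org_id"])
    return {n: orgs for n, orgs in orgs_by_node.items() if len(orgs) > 1}
```

An empty result is necessary but not sufficient for integrity: it says nothing about whether nodes resolved within an org refer to one real entity, which needs a separate validation pass.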
Ground truth
What correct means for this data, and how it is established.
Ground truth
The real, recorded cross-tool activity the team actually produced next - the events that genuinely followed in the partner's own systems - reconstructed point-in-time from the resolved graph.
How it is established
Replay-based comparison: cut the graph at a timestamp, withhold subsequent events, and compare an agent's proposed next action to what the org actually did. Integrity is anchored to consent and per-partner provenance, and to entity-resolution correctness within each org boundary.
Agreement
Correctness is anchored to the production record itself rather than a separate human-rater pass. No inter-rater agreement figure is published at this stage.
Grounded training data for workflow agents
Real team behavior - who pings whom, which docs get referenced, how tasks move - as ground truth for agents that have to operate inside real companies.
Workflow evals with verifiable outcomes
Point-in-time snapshots let evals check whether an agent would have matched what a real team did, not what a synthetic benchmark assumes.
Unified retrieval surface for internal AI
A single normalized activity graph across mail, chat, docs, and tasks. Drop-in substrate for enterprise copilots and org-level search.
How you load it
Delivery
S3, REST API, Webhook, Restricted training/eval license / data room
Formats
JSONL event records, JSON, Parquet
Auth
Restricted access under a signed license; per-recipient export. PII is stripped at source; each partner org is held as an isolated boundary. Source identities are anonymous; full provenance under NDA.
Cadence
One-time point-in-time snapshot as documented. The delivery surface also supports streaming or daily/hourly batch where a live feed is licensed.
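For the JSONL format, a reader needs only the standard library. This is a minimal sketch assuming one JSON object per line; the function name `parse_jsonl` and the file name in the comment are hypothetical, and nothing here reflects the actual S3 layout or REST API, which are not described in this document.

```python
import json
from typing import Iterable, Iterator


def parse_jsonl(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one event dict per non-empty line, failing loudly on bad input."""
    for line_number, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            raise ValueError(f"line {line_number}: invalid JSON") from exc
        if not isinstance(record, dict):
            raise ValueError(f"line {line_number}: expected a JSON object")
        yield record


# Usage with a local export (hypothetical file name):
#     with open("events.jsonl", encoding="utf-8") as handle:
#         events = list(parse_jsonl(handle))
```

Taking any iterable of lines keeps the reader independent of where the data comes from, so the same function works on an open file, a decoded HTTP response, or an in-memory list.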
Request access.
Restricted-scope evaluation access for qualified teams. We share real samples, full schema, and provenance under a mutual NDA.