Abstract
We describe a cross-tool operational dataset assembled from 5 consented partner companies, spanning 10 tool categories and 38 tools. The raw inventory includes 38.4M chat messages, 11.2M emails, 3.6M stored files, 312K Jira tickets, 248.3K Linear tickets, 2,140 repositories, 18.6M HubSpot database queries, 2.6M Salesforce contacts, and 81K service and workflow tickets. These records are normalized into a single activity graph and entity-resolved across five entity types: user, team, org, document, and task.
This is a data-availability report. Its purpose is to document what the dataset contains, how it is structured, and how it is governed. It is the substrate we intend to use for grounded training data and workflow evaluations for agents that operate inside real companies. We are explicit up front: this paper reports no signal, model, or predictive result. None exist yet, and we do not claim any.
Motivation
Agents that are meant to work inside real organizations are mostly trained and tested against synthetic benchmarks: hand-written task suites, scripted tool environments, and toy company simulations. These are useful, but they do not capture how real teams actually behave. Real organizations are messy and cross-tool. A single decision touches a chat thread, an email, a document, a ticket, and a CRM record, authored by overlapping people across overlapping teams over time.
That cross-tool, multi-actor texture is exactly what synthetic benchmarks lack and what an agent must understand to be useful in production. The motivation for this dataset is to make that real organizational behavior observable in a structured, privacy-safe form, so that training and evaluation can be grounded in how work actually happens rather than in how we imagine it happens. This follows the model of Mercor-style enterprise data partnerships: consented, bounded, and built for downstream agent work.
The Data
The dataset is a one-time snapshot drawn across 10 tool categories from 5 partner companies. Each category contributes a primary record type. The aggregate inventory across all partners is summarized below.
| Tool Category | Primary Record | Count |
|---|---|---|
| Chat (Slack / Teams) | Messages | 38.4M |
| Email | Emails | 11.2M |
| Files (Google Drive / OneDrive) | Files | 3.6M |
| Project tracking (Jira) | Tickets | 312K |
| Project tracking (Linear) | Tickets | 248.3K |
| Source control | Repositories | 2,140 |
| CRM database (HubSpot) | Database queries | 18.6M |
| CRM (Salesforce) | Contacts | 2.6M |
| Service / workflow | Tickets | 81K |
5 partner companies: distinct organizations, each consented, kept isolated as separate org boundaries.
10 tool categories: covering chat, email, file storage, project tracking, source control, CRM, and service workflows.
38 tools: individual products spanning the 10 categories across the 5 partner organizations.
The inventory reflects raw record counts at the time of snapshot. Source control contributes repository metadata only; no code bodies are included. Counts are reported as collected and are not deduplicated across partners, since each partner is held as a separate org boundary.
Normalization and Entity Resolution
Records arrive in heterogeneous, tool-specific shapes. We normalize them into a single activity graph where the nodes are entities and the edges are activities between them. The graph is entity-resolved across five entity types, so that the same person, team, document, or task is represented as one node regardless of how many tools reference it.
| Entity | Resolution Scope |
|---|---|
| User | A single human actor, resolved across every tool they appear in. |
| Team | A working group, resolved across project, chat, and CRM tools. |
| Org | The partner company boundary, kept isolated per partner. |
| Document | A file or artifact, resolved across storage and reference. |
| Task | A unit of work, resolved across tracking and service tools. |
Resolution is performed within each partner org boundary; entities are never merged across partners. The result is a unified representation in which a chat message, an email, a ticket, and a CRM record that all concern the same task can be connected through shared user, team, and document nodes. This is the structural contribution of the dataset: not the individual records, which already exist inside each tool, but the resolved cross-tool linkage between them.
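The structure described above can be illustrated with a minimal sketch. It is not the pipeline used to build the dataset: the class names (`Entity`, `Activity`, `ActivityGraph`), the explicit alias table, and the identifiers in the usage example are all hypothetical, and the source does not state how aliases are actually discovered. The sketch only shows the two properties the text commits to: one node per entity regardless of how many tools reference it, and no resolution across partner org boundaries.

```python
from __future__ import annotations

from dataclasses import dataclass

# The five entity types named in the text.
ENTITY_TYPES = {"user", "team", "org", "document", "task"}


@dataclass(frozen=True)
class Entity:
    """A resolved node. org_id is the partner org boundary it belongs to."""
    org_id: str
    entity_type: str
    canonical_id: str  # hypothetical canonical identifier


@dataclass(frozen=True)
class Activity:
    """An edge: something one entity did to or with another, in one tool."""
    tool: str
    kind: str
    source: Entity
    target: Entity


class ActivityGraph:
    def __init__(self) -> None:
        # Assumption: resolution is modeled as a lookup table keyed by
        # (org_id, tool, tool-local id). Keying on org_id means an
        # identifier from one partner can never resolve inside another.
        self._alias_to_node: dict[tuple[str, str, str], Entity] = {}
        self.edges: list[Activity] = []

    def register_alias(self, node: Entity, tool: str, local_id: str) -> None:
        if node.entity_type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {node.entity_type}")
        self._alias_to_node[(node.org_id, tool, local_id)] = node

    def resolve(self, org_id: str, tool: str, local_id: str) -> Entity:
        # Raises KeyError if the alias is unknown within this org boundary.
        return self._alias_to_node[(org_id, tool, local_id)]

    def add_activity(self, org_id: str, tool: str, kind: str,
                     source_local_id: str, target_local_id: str) -> Activity:
        edge = Activity(
            tool=tool,
            kind=kind,
            source=self.resolve(org_id, tool, source_local_id),
            target=self.resolve(org_id, tool, target_local_id),
        )
        self.edges.append(edge)
        return edge

    def tools_touching(self, node: Entity) -> set[str]:
        """Every tool in which this resolved node participates."""
        return {e.tool for e in self.edges if node in (e.source, e.target)}


# Usage with made-up identifiers: one user and one task seen in two tools.
graph = ActivityGraph()
alice = Entity("org_a", "user", "user-001")
task = Entity("org_a", "task", "task-042")
graph.register_alias(alice, "chat", "U123")
graph.register_alias(alice, "jira", "alice.j")
graph.register_alias(task, "chat", "thread-9")
graph.register_alias(task, "jira", "PROJ-7")
graph.add_activity("org_a", "chat", "posted_in", "U123", "thread-9")
graph.add_activity("org_a", "jira", "assigned_to", "alice.j", "PROJ-7")
```

In this toy example the chat message and the Jira ticket end up connected through the same user and task nodes, which is the cross-tool linkage the section describes. A lookup of the same chat identifier under a different `org_id` fails, mirroring the rule that entities are never merged across partners.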
Privacy and Provenance
The dataset is governed by an explicit privacy and provenance model. Every partner company consented to inclusion. Collection is a one-time snapshot rather than an ongoing feed. Personally identifiable information is stripped at the source, before records leave the partner environment, so that PII is not present in the normalized graph.
Consent: all partner companies provided explicit consent for inclusion in the dataset.
Snapshot: a one-time point-in-time capture, not a continuous or live data feed.
Source control: repository metadata only. No code bodies are collected or stored.
For source control, only repository metadata is retained; code bodies are excluded entirely. Provenance is preserved per partner: each org boundary is kept isolated, so the lineage of every node and edge back to its consenting partner is unambiguous. This model is a precondition of the data partnership, not an afterthought.
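As a rough illustration of what stripping PII at the source could look like, the sketch below runs a record through a cleaning step before it would leave the partner environment. Everything in it is an assumption rather than the method used for this dataset: the field names in `PII_FIELDS`, the `author_id` key, the per-partner HMAC key, and the email regex are hypothetical, and a real stripping pass would need far broader coverage than one pattern.

```python
import hashlib
import hmac
import re

# Hypothetical field names; the source does not list which fields count as PII.
PII_FIELDS = {"name", "email", "phone"}

# Simple email pattern, used here only to show free-text redaction.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def pseudonymize(value: str, partner_key: bytes) -> str:
    """Replace an identifier with a keyed hash.

    Assumption: each partner holds its own key, so the same raw identifier
    yields different pseudonyms in different org boundaries.
    """
    digest = hmac.new(partner_key, value.encode("utf-8"), hashlib.sha256)
    return "u_" + digest.hexdigest()[:16]


def strip_pii(record: dict, partner_key: bytes) -> dict:
    """Return a copy of the record with direct identifiers removed."""
    cleaned = {}
    for key, value in record.items():
        if key in PII_FIELDS:
            continue  # drop direct identifiers entirely
        if key == "author_id":
            cleaned[key] = pseudonymize(str(value), partner_key)
        elif isinstance(value, str):
            cleaned[key] = EMAIL_RE.sub("[REDACTED_EMAIL]", value)
        else:
            cleaned[key] = value
    return cleaned
```

Pseudonymizing the author identifier, instead of dropping it, keeps records from the same person linkable within one partner, which entity resolution needs. The keyed hash is one way to do that without carrying the raw identifier into the normalized graph.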
Use for Training and Workflow Evals
The intended use of the activity graph is twofold. First, as grounded training data for agents that operate inside real companies: the resolved cross-tool structure lets a model learn from how real organizations route work across chat, email, documents, tickets, and CRM, rather than from synthetic approximations of it. Second, as a substrate for workflow evaluations: realistic, company-grounded tasks that test whether an agent can navigate the same cross-tool reality that the data captures.
Both uses follow the Mercor-style enterprise data partnership model: consented data from real organizations, structured for downstream agent training and evaluation. We are describing the design intent of the dataset here. We are not, in this paper, reporting any trained model, any eval score, or any measured outcome from these uses.
Limitations and Conclusion
The central limitation is the scope of this paper itself. This is a data-availability report and reports no signal, model, or predictive result. We have not trained agents on the graph, have not run workflow evals against it, and make no claim about its predictive or downstream value. Any such claim would require separate work and separate evidence, which we do not present here.
Further limitations follow from the data model. The dataset is a one-time snapshot, so it captures a single point in time rather than longitudinal behavior. It spans 5 partner companies, which is a meaningful but bounded sample of organizational diversity. PII stripping at source, while necessary, removes information that some tasks might otherwise use. Finally, source control is metadata only, so code-level reasoning is out of scope.
In conclusion, we document a normalized, entity-resolved, cross-tool activity graph drawn from 5 consented partner companies across 38 tools and 10 categories, governed by a strict consent, snapshot, and PII-stripping model. We present it as available infrastructure for grounded agent training and workflow evaluation. We make no performance claim of any kind, and we will hold ourselves to reporting results only when results actually exist.