Research Data Services

What if your data were ready
in days instead of months?

Research data preparation — organizing files, extracting identifiers, building the index your analysis tools need — typically consumes months of researcher time before the actual science can begin. That timeline is not inevitable.

A real Sunnybrook project

Case Study — Sunnybrook Health Sciences Centre, 2025

Neurosurgery MRI Study:
5 months compressed to 3 weeks

500
patients in the study cohort
~2,000
MRI intake forms to process
5 mo.
original timeline estimate
3 wks
actual time with systematic processing

A Sunnybrook neurosurgery research team needed to prepare data for a longitudinal MRI study comparing imaging results across a 500-patient cohort, each with 3 to 5 MRI sessions. The workflow required extracting patient identifiers and unique codes from approximately 2,000 paper intake forms, matching each form to the corresponding data folder in the research directory, organizing all files into a structured longitudinal format, and producing a master index spreadsheet linking patient identity, MRI sequence, and file location.

The original estimate using manual methods with limited student resources was 5 months. Using automated OCR for handwritten code extraction, file system scripting, and systematic validation, the same work was completed in under 3 weeks — with full processing logs, a validated master index, and no manual transcription errors in the final dataset.

The research team did not change their science. They changed who was handling the data infrastructure.

What I do

I am a data systems specialist, not a domain scientist. I do not need to understand your research to organize your data correctly. I need to understand your data structure, your naming conventions, and your analysis requirements — which I learn in a 20-minute conversation. The scientific interpretation stays with you. The data infrastructure comes from me.

Intake Form Processing

Extract identifiers from scanned paper forms using automated OCR. Match records to data directories. Handle handwritten codes, mixed formats, and inconsistent naming.
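As a sketch of what the matching step looks like in practice: once OCR has produced candidate codes, matching them to data directories reduces to normalization plus a dictionary lookup, with every failure surfaced rather than silently dropped. The patient codes and folder names below are invented for illustration.

```python
import re

def normalize(code: str) -> str:
    """Collapse the formatting variation typical of handwritten codes:
    case, separators, and stray whitespace."""
    return re.sub(r"[\s_\-]+", "", code).upper()

def build_index(extracted_codes, data_dirs):
    """Match OCR-extracted codes to data directories. Returns the
    matched index plus the list of codes that found no directory,
    so nothing fails silently."""
    by_norm = {normalize(d): d for d in data_dirs}
    index, unmatched = {}, []
    for code in extracted_codes:
        key = normalize(code)
        if key in by_norm:
            index[code] = by_norm[key]
        else:
            unmatched.append(code)
    return index, unmatched

# Hypothetical codes as OCR might read them, and the on-disk folders.
codes = ["ns-0042", "NS 0107", "ns_9999"]
dirs = ["NS0042", "NS0107"]
index, unmatched = build_index(codes, dirs)
```

The point of returning `unmatched` explicitly is that every form that cannot be matched becomes a line in the validation report, not a silently missing record.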

Directory Organization

Transform ad hoc file structures into clean, consistent, navigable datasets organized by subject, timepoint, and study arm. Rename, reorganize, and validate at scale.

Research Index Construction

Produce a master spreadsheet linking subject identity, data location, timepoints, and metadata. The index your analysis tools need to run, built correctly the first time.

Data Validation

Flag missing records, inconsistent naming, duplicate entries, and structural anomalies before they contaminate your analysis. Every processing run produces a validation report.
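A minimal Python sketch of that validation pass: compare the subject IDs the study design expects against the IDs actually found on disk, and report the three classic failure modes. The IDs below are invented for illustration.

```python
from collections import Counter

def validation_report(expected_ids, found_ids):
    """Compare expected subject IDs against IDs found on disk and
    report missing records, unexpected records, and duplicates."""
    counts = Counter(found_ids)
    return {
        "missing": sorted(set(expected_ids) - set(found_ids)),
        "unexpected": sorted(set(found_ids) - set(expected_ids)),
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
    }

report = validation_report(
    expected_ids=["P001", "P002", "P003"],
    found_ids=["P001", "P001", "P003", "P004"],
)
```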

Longitudinal Structure

Group multi-session data correctly by subject and chronological order. Build the per-subject timeline your longitudinal analysis requires without manual sorting.
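In code terms, the grouping step is a sort-and-group over (subject, session date, file path) records. The session records below are hypothetical, sketching the pattern rather than any particular study's layout.

```python
from collections import defaultdict
from datetime import date

def build_timelines(sessions):
    """Group (subject_id, session_date, path) records into a
    per-subject timeline sorted chronologically."""
    timelines = defaultdict(list)
    for subject, when, path in sessions:
        timelines[subject].append((when, path))
    # Sorting the (date, path) pairs orders each subject's sessions.
    return {s: [p for _, p in sorted(v)] for s, v in timelines.items()}

sessions = [
    ("P001", date(2025, 3, 1), "scan_b"),
    ("P001", date(2025, 1, 15), "scan_a"),
    ("P002", date(2025, 2, 2), "scan_c"),
]
timelines = build_timelines(sessions)
```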

Processing Automation

Build reusable scripts for recurring data processing tasks. Run once manually, then automatically for every subsequent data collection cycle.

What data I work with

The organizational and structural layer of research data processing is separable from domain-specific scientific interpretation. I handle the infrastructure layer across all common research data types.

Medical Imaging

DICOM, MRI series, scan archives, intake form extraction, patient-to-image matching

Genomic & Molecular

FASTQ, BAM, VCF file organization, sample tracking, experimental condition mapping, metadata linking

Clinical Trial Data

REDCap exports, eCRF records, patient cohort organization, longitudinal visit structures

Preclinical / Mouse

Experimental cohort tracking, phenotype measurement aggregation, specimen records, treatment-to-outcome linking

Flow Cytometry

FCS file organization, sample metadata linking, batch processing and validation

Other Structured Data

Any research dataset where the problem is organization, indexing, extraction, or validation rather than scientific interpretation

How an engagement works

Every engagement follows the same structured process. You stay in control at every step. Nothing moves forward without your sign-off.

01
You

Book a discovery call

Use the Calendly link below to book a free 20-minute call. No preparation required.

02
Together

Discovery call — scope and estimate

We discuss your data structure, volume, and analysis requirements. I provide a time estimate and a fixed project price within 24 hours.

03
You

Receive secure upload link and sign contract

You receive a secure link to upload your data files. You sign a standard Data Processing Agreement covering confidentiality, data handling, and destruction on completion. Processing begins only after both steps are complete.

04
Me

Processing begins

I process your data according to the agreed specification. A processing log records every transformation made.

05
You — optional

Request a mid-processing check

At any point you can request a pause to review the work in progress. Processing stops, you examine the current output, and confirm it matches your expectations before I continue.

06
Together — if check requested

Review and sign off on interim output

You examine the partial dataset. If adjustments are needed, we discuss them and I make corrections before resuming. Your sign-off releases the pause.

07
Me

Processing completes

The full dataset is processed, validated, and moved to the output directory. A complete processing log and validation report are included.

08
You

Final review and sign-off

You examine the complete output. Confirm the structure, the index, and the validation report meet your requirements.

09
You

Download your data

You download the complete processed dataset from the secure output directory.

10
Me

Data deleted — destruction certificate issued

All source and processed data is permanently deleted from my systems. I issue a signed data destruction certificate confirming deletion date and method.

11
You

Invoice payment

The invoice is issued on completion and is payable within 30 days. First-time engagements are invoiced at 50% of the quoted project price.

Transparent, fixed-price engagements

Every project is quoted as a fixed price after the discovery call. No hourly billing surprises. No scope creep without a revised quote. The price you are quoted is the price you pay.

How pricing works

Discovery call: Free — 20 minutes
Project quote: Within 24 hours of call
Billing model: Fixed price per project
Retainer option: Available for ongoing lab needs
Payment timing: On completion — net 30
First Engagement Offer

50% of the quoted project price for your first engagement. No conditions. The full price applies to all subsequent projects. This offer exists because the best way to understand what systematic data processing produces is to experience it on your own data.

Why a data systems specialist

I am not a bioinformatician or a domain scientist. I am a data systems specialist with 30 years of experience building data processing infrastructure for large financial institutions — Morgan Stanley, Citibank, CIBC, RBC, Canada Life.

Every research data problem has the same underlying structure when you strip away the domain specifics: source data in some format, a target structure the researcher needs for analysis, and a transformation required to get from one to the other. That transformation — parsing, matching, renaming, organizing, validating, indexing — is what I do. The domain knowledge required to do it correctly is shallow and learnable. The systems thinking required to do it reliably at scale is what I bring.

Researchers understand their science. I understand their data. Together we eliminate the months between data collection and analysis.

Book a free discovery call

20 minutes. No preparation required. You describe your data situation — I tell you what systematic processing would look like and what it would cost.

Book on Calendly