Research data preparation — organizing files, extracting identifiers, building the index your analysis tools need — typically consumes months of researcher time before the actual science can begin. That timeline is not inevitable.
A Sunnybrook neurosurgery research team needed to prepare data for a longitudinal MRI study comparing imaging results across a 500-patient cohort, each patient with 3 to 5 MRI sessions. The workflow required extracting patient identifiers and unique codes from approximately 2,000 paper intake forms, matching each form to the corresponding data folder in the research directory, organizing all files into a structured longitudinal format, and producing a master index spreadsheet linking patient identity, MRI sequence, and file location.
The original estimate using manual methods with limited student resources was 5 months. Using automated OCR for handwritten code extraction, file system scripting, and systematic validation, the same work was completed in under 3 weeks — with full processing logs, a validated master index, and no manual transcription errors in the final dataset.
The research team did not change their science. They changed who was handling the data infrastructure.
I am a data systems specialist, not a domain scientist. I do not need to understand your research to organize your data correctly. I need to understand your data structure, your naming conventions, and your analysis requirements — which I learn in a 20-minute conversation. The scientific interpretation stays with you. The data infrastructure comes from me.
Extract identifiers from scanned paper forms using automated OCR. Match records to data directories. Handle handwritten codes, mixed formats, and inconsistent naming.
Transform ad hoc file structures into clean, consistent, navigable datasets organized by subject, timepoint, and study arm. Rename, reorganize, and validate at scale.
Produce a master spreadsheet linking subject identity, data location, timepoints, and metadata. The index your analysis tools need to run, built correctly the first time.
Flag missing records, inconsistent naming, duplicate entries, and structural anomalies before they contaminate your analysis. Every processing run produces a validation report.
Group multi-session data correctly by subject and chronological order. Build the per-subject timeline your longitudinal analysis requires without manual sorting.
Build reusable scripts for recurring data processing tasks. Run once manually, then automatically for every subsequent data collection cycle.
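To make the organizing-and-indexing work above concrete, here is a minimal sketch of the kind of reusable script involved. The naming convention, file extension, and directory layout are illustrative assumptions, not a real project's specification: files arrive as a flat pile named like P0123_s2_T1w.dat, and the script sorts them into a subject/session hierarchy, writes the master index, and flags anything that does not match the convention for review.

```python
import csv
import re
import shutil
from pathlib import Path

# Hypothetical naming convention: "<subject>_s<session>_<sequence>.dat",
# e.g. "P0123_s2_T1w.dat". Real conventions vary per study.
FILENAME_RE = re.compile(r"^(?P<subject>P\d{4})_s(?P<session>\d+)_(?P<sequence>\w+)\.dat$")

def organize_and_index(source_dir: Path, target_dir: Path) -> list[dict]:
    """Copy files into target_dir/<subject>/session_<nn>/ and build the index.

    Files that do not match the convention are written to a validation
    report so they can be reviewed rather than silently dropped.
    """
    index, anomalies = [], []
    for path in sorted(source_dir.iterdir()):
        m = FILENAME_RE.match(path.name)
        if not m:
            anomalies.append(path.name)
            continue
        subject, session, sequence = m["subject"], int(m["session"]), m["sequence"]
        dest = target_dir / subject / f"session_{session:02d}" / path.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        index.append({"subject": subject, "session": session,
                      "sequence": sequence, "location": str(dest)})
    # Master index: one row per file, in chronological session order per subject.
    index.sort(key=lambda r: (r["subject"], r["session"]))
    with open(target_dir / "master_index.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["subject", "session",
                                               "sequence", "location"])
        writer.writeheader()
        writer.writerows(index)
    if anomalies:
        (target_dir / "validation_report.txt").write_text(
            "Unmatched files:\n" + "\n".join(anomalies) + "\n")
    return index
```

Once a script like this exists, each subsequent data collection cycle is a single run rather than a round of manual sorting.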
The organizational and structural layer of research data processing is separable from domain-specific scientific interpretation. I handle the infrastructure layer across all common research data types.
DICOM, MRI series, scan archives, intake form extraction, patient-to-image matching
FASTQ, BAM, VCF file organization, sample tracking, experimental condition mapping, metadata linking
REDCap exports, eCRF records, patient cohort organization, longitudinal visit structures
Experimental cohort tracking, phenotype measurement aggregation, specimen records, treatment-to-outcome linking
FCS file organization, sample metadata linking, batch processing and validation
Any research dataset where the problem is organization, indexing, extraction, or validation rather than scientific interpretation
Every engagement follows the same structured process. You stay in control at every step. Nothing moves forward without your sign-off.
Use the Calendly link below to book a free 20-minute call. No preparation required.
We discuss your data structure, volume, and analysis requirements. I provide a time estimate and a fixed project price within 24 hours.
You receive a secure link to upload your data files. You sign a standard Data Processing Agreement covering confidentiality, data handling, and destruction on completion. Processing begins only after both are done.
I process your data according to the agreed specification. A processing log records every transformation made.
At any point you can request a pause to review the work in progress. Processing stops, you examine the current output, and confirm it matches your expectations before I continue.
You examine the partial dataset. If adjustments are needed we discuss and I correct before resuming. Your sign-off releases the pause.
The full dataset is processed, validated, and moved to the output directory. A complete processing log and validation report are included.
You examine the complete output. Confirm the structure, the index, and the validation report meet your requirements.
You download the complete processed dataset from the secure output directory.
All source and processed data is permanently deleted from my systems. I issue a signed data destruction certificate confirming deletion date and method.
The invoice is issued on completion and is payable within 30 days. First-time engagements are invoiced at 50% of the quoted project price.
Every project is quoted as a fixed price after the discovery call. No hourly billing surprises. No scope creep without a revised quote. The price you are quoted is the price you pay.
50% of the quoted project price for your first engagement. No conditions. The full price applies to all subsequent projects. This offer exists because the best way to understand what systematic data processing produces is to experience it on your own data.
I am not a bioinformatician or a domain scientist. I am a data systems specialist with 30 years of experience building data processing infrastructure for large financial institutions — Morgan Stanley, Citibank, CIBC, RBC, Canada Life.
Every research data problem has the same underlying structure when you strip away the domain specifics: source data in some format, a target structure the researcher needs for analysis, and a transformation required to get from one to the other. That transformation — parsing, matching, renaming, organizing, validating, indexing — is what I do. The domain knowledge required to do it correctly is shallow and learnable. The systems thinking required to do it reliably at scale is what I bring.
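The "matching" and "validating" parts of that transformation can be sketched in a few lines. This is an illustrative example only, with hypothetical identifiers: reconciling the set of subject codes extracted from intake forms against the set of data folders actually on disk, and reporting discrepancies in both directions before any processing begins.

```python
def reconcile(form_ids: set[str], directory_ids: set[str]) -> dict[str, set[str]]:
    """Match form-derived identifiers against on-disk data directories.

    Returns the matched set plus both kinds of discrepancy, so nothing
    proceeds to processing until every mismatch is explained.
    """
    return {
        "matched": form_ids & directory_ids,              # safe to process
        "forms_without_data": form_ids - directory_ids,   # form scanned, no folder found
        "data_without_forms": directory_ids - form_ids,   # folder exists, no form matched
    }
```

The point of surfacing both discrepancy sets, rather than just the matches, is that each direction indicates a different failure: a missing folder versus a missed or misread form.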
Researchers understand their science. I understand their data. Together we eliminate the months between data collection and analysis.
20 minutes. No preparation required. You describe your data situation — I tell you what systematic processing would look like and what it would cost.
Book on Calendly