Research data preparation — organizing files, extracting identifiers, building the index your analysis tools need — typically consumes months of researcher time before the actual science can begin. That timeline is not inevitable.
A Sunnybrook neurosurgery research team needed to prepare data for a longitudinal MRI study comparing imaging results across a 500-patient cohort, each patient with 3 to 5 MRI sessions. The workflow required extracting patient identifiers and unique codes from approximately 2,000 paper intake forms, matching each form to the corresponding data folder in the research directory, organizing all files into a structured longitudinal format, and producing a master index spreadsheet linking patient identity, MRI sequence, and file location.
The original estimate using manual methods with limited student resources was 5 months. Using automated OCR for handwritten code extraction, file system scripting, and systematic validation, the same work was completed in under 3 weeks — with full processing logs, a validated master index, and no manual transcription errors in the final dataset.
The research team did not change their science. They changed who was handling the data infrastructure.
I am a data systems specialist, not a domain scientist. I do not need to understand your research to organize your data correctly. I need to understand your data structure, your naming conventions, and your analysis requirements — which I learn in a 20-minute conversation. The scientific interpretation stays with you. The data infrastructure comes from me.
Extract identifiers from scanned paper forms using automated OCR. Match records to data directories. Handle handwritten codes, mixed formats, and inconsistent naming.
Transform ad hoc file structures into clean, consistent, navigable datasets organized by subject, timepoint, and study arm. Rename, reorganize, and validate at scale.
Produce a master spreadsheet linking subject identity, data location, timepoints, and metadata. The index your analysis tools need to run, built correctly the first time.
Flag missing records, inconsistent naming, duplicate entries, and structural anomalies before they contaminate your analysis. Every processing run produces a validation report.
Group multi-session data correctly by subject and chronological order. Build the per-subject timeline your longitudinal analysis requires without manual sorting.
Build reusable scripts for recurring data processing tasks. Run once manually, then automatically for every subsequent data collection cycle.
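To make the organizing-and-indexing work above concrete, here is a minimal sketch of the kind of reusable script involved. The naming convention, file extension, and directory layout are illustrative assumptions, not a real project's specification: files arrive as a flat pile named like P0123_s2_T1w.dat, and the script sorts them into a subject/session hierarchy, writes the master index, and flags anything that does not match the convention for review.

```python
import csv
import re
import shutil
from pathlib import Path

# Hypothetical naming convention: "<subject>_s<session>_<sequence>.dat",
# e.g. "P0123_s2_T1w.dat". Real conventions vary per study.
FILENAME_RE = re.compile(r"^(?P<subject>P\d{4})_s(?P<session>\d+)_(?P<sequence>\w+)\.dat$")

def organize_and_index(source_dir: Path, target_dir: Path) -> list[dict]:
    """Copy files into target_dir/<subject>/session_<nn>/ and build the index.

    Files that do not match the convention are written to a validation
    report so they can be reviewed rather than silently dropped.
    """
    index, anomalies = [], []
    for path in sorted(source_dir.iterdir()):
        m = FILENAME_RE.match(path.name)
        if not m:
            anomalies.append(path.name)
            continue
        subject, session, sequence = m["subject"], int(m["session"]), m["sequence"]
        dest = target_dir / subject / f"session_{session:02d}" / path.name
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(path, dest)
        index.append({"subject": subject, "session": session,
                      "sequence": sequence, "location": str(dest)})
    # Master index: one row per file, in chronological session order per subject.
    index.sort(key=lambda r: (r["subject"], r["session"]))
    with open(target_dir / "master_index.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["subject", "session",
                                               "sequence", "location"])
        writer.writeheader()
        writer.writerows(index)
    if anomalies:
        (target_dir / "validation_report.txt").write_text(
            "Unmatched files:\n" + "\n".join(anomalies) + "\n")
    return index
```

Once a script like this exists, each subsequent data collection cycle is a single run rather than a round of manual sorting.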
The organizational and structural layer of research data processing is separable from domain-specific scientific interpretation. I handle the infrastructure layer across all common research data types.
DICOM, MRI series, scan archives, intake form extraction, patient-to-image matching
FASTQ, BAM, VCF file organization, sample tracking, experimental condition mapping, metadata linking
REDCap exports, eCRF records, patient cohort organization, longitudinal visit structures
Experimental cohort tracking, phenotype measurement aggregation, specimen records, treatment-to-outcome linking
FCS file organization, sample metadata linking, batch processing and validation
Any research dataset where the problem is organization, indexing, extraction, or validation rather than scientific interpretation
Every engagement follows the same structured process. You stay in control at every step. Nothing moves forward without your sign-off.
Use the Calendly link below to book a free 20-minute call. No preparation required.
We discuss your data structure, volume, and analysis requirements. I provide a time estimate and a fixed project price within 24 hours.
You receive a secure link to upload your data files. You sign a standard Data Processing Agreement covering confidentiality, data handling, and destruction on completion. Processing begins only after both are done.
I process your data according to the agreed specification. A processing log records every transformation made.
At any point you can request a pause to review the work in progress. Processing stops, you examine the current output, and confirm it matches your expectations before I continue.
You examine the partial dataset. If adjustments are needed we discuss and I correct before resuming. Your sign-off releases the pause.
The full dataset is processed, validated, and moved to the output directory. A complete processing log and validation report are included.
You examine the complete output. Confirm the structure, the index, and the validation report meet your requirements.
You download the complete processed dataset from the secure output directory.
All source and processed data is permanently deleted from my systems. I issue a signed data destruction certificate confirming deletion date and method.
The invoice is issued on completion and is payable within 30 days. First-time engagements are invoiced at 50% of the quoted project price.
Every project is quoted as a fixed price after the discovery call. No hourly billing surprises. No scope creep without a revised quote. The price you are quoted is the price you pay.
50% of the quoted project price for your first engagement. No conditions. The full price applies to all subsequent projects. This offer exists because the best way to understand what systematic data processing produces is to experience it on your own data.
I am not a bioinformatician or a domain scientist. I am a data systems specialist with 30 years of experience building data processing infrastructure for large financial institutions — Morgan Stanley, Citibank, CIBC, RBC, Canada Life.
Every research data problem has the same underlying structure when you strip away the domain specifics: source data in some format, a target structure the researcher needs for analysis, and a transformation required to get from one to the other. That transformation — parsing, matching, renaming, organizing, validating, indexing — is what I do. The domain knowledge required to do it correctly is shallow and learnable. The systems thinking required to do it reliably at scale is what I bring.
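The "matching" and "validating" parts of that transformation can be sketched in a few lines. This is an illustrative example only, with hypothetical identifiers: reconciling the set of subject codes extracted from intake forms against the set of data folders actually on disk, and reporting discrepancies in both directions before any processing begins.

```python
def reconcile(form_ids: set[str], directory_ids: set[str]) -> dict[str, set[str]]:
    """Match form-derived identifiers against on-disk data directories.

    Returns the matched set plus both kinds of discrepancy, so nothing
    proceeds to processing until every mismatch is explained.
    """
    return {
        "matched": form_ids & directory_ids,              # safe to process
        "forms_without_data": form_ids - directory_ids,   # form scanned, no folder found
        "data_without_forms": directory_ids - form_ids,   # folder exists, no form matched
    }
```

The point of surfacing both discrepancy sets, rather than just the matches, is that each direction indicates a different failure: a missing folder versus a missed or misread form.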
Researchers understand their science. I understand their data. Together we eliminate the months between data collection and analysis.
20 minutes. No preparation required. You describe your data situation — I tell you what systematic processing would look like and what it would cost.
Book on Calendly