Synthetic Cancer Prediction Dataset for Research

This dataset is synthetically generated to simulate a cancer prediction scenario for research purposes. It comprises 10,000 pseudo-patients, each described by five features (Gender, Age, Smoking, Fatigue, and Allergy) and a binary indicator of the presence or absence of cancer. It is intended as a tool for researchers to explore and experiment with predictive models for cancer detection.

The 'Gender' column is binary: 0 corresponds to male and 1 to female. 'Age' is the patient's age in years and ranges from 18 to 100. 'Smoking' is binary, with 0 indicating a non-smoker and 1 a history of smoking. 'Fatigue' is likewise binary, with 0 denoting the absence of fatigue and 1 its presence. 'Allergy' is a binary variable indicating the presence or absence of allergies in the patient.
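
The column encodings described above can be checked programmatically before any modeling. The following is a minimal sketch using only the Python standard library. The function name `validate_row`, the exact header spellings, and the two sample rows are assumptions made for illustration and are not taken from the dataset file. The 'Allergy' column is assumed to use the same 0/1 coding as the other binary columns, which the description does not state explicitly.

```python
import csv
import io

# Assumed column names; the actual CSV header may be spelled differently.
# 'Allergy' is assumed to be coded 0/1 like the other binary columns.
BINARY_COLUMNS = ["Gender", "Smoking", "Fatigue", "Allergy", "Cancer"]
AGE_RANGE = (18, 100)  # inclusive bounds stated in the description


def validate_row(row):
    """Return a list of problems found in one record (empty if valid)."""
    problems = []
    for name in BINARY_COLUMNS:
        if row.get(name) not in ("0", "1"):
            problems.append(f"{name} must be 0 or 1, got {row.get(name)!r}")
    try:
        age = int(row.get("Age"))
    except (TypeError, ValueError):
        problems.append(f"Age is not an integer: {row.get('Age')!r}")
    else:
        if not AGE_RANGE[0] <= age <= AGE_RANGE[1]:
            problems.append(f"Age {age} outside {AGE_RANGE[0]}-{AGE_RANGE[1]}")
    return problems


# Hypothetical sample rows, made up for illustration only.
SAMPLE_CSV = """Gender,Age,Smoking,Fatigue,Allergy,Cancer
0,45,1,0,1,0
1,17,0,2,0,1
"""

# Data rows start on line 2 of the file, after the header.
for line_no, row in enumerate(csv.DictReader(io.StringIO(SAMPLE_CSV)), start=2):
    for problem in validate_row(row):
        print(f"line {line_no}: {problem}")
```

To check the real file, the `io.StringIO(SAMPLE_CSV)` argument would be replaced with an open file handle for the downloaded CSV.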

The 'Cancer' column is the target variable: 0 signifies the absence of cancer and 1 a simulated case of cancer. The dataset is entirely synthetic and is not derived from actual clinical records. Researchers are encouraged to use it for exploratory purposes, model development, and algorithm testing, but results obtained from it should not be extrapolated to real-world medical scenarios without validation on authentic clinical data. Its synthetic nature allows for controlled experimentation, which makes it a resource for preliminary research in cancer prediction. - Safiul
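
When using the 'Cancer' column as a target for model development, one common sanity check is to hold out a test split and compare any model against a majority-class baseline. The sketch below illustrates that idea with the standard library only. It is not a method described by the dataset author; the helper names `train_test_split` and `majority_baseline_accuracy`, the 80/20 split, and the label list are assumptions made for illustration.

```python
import random
from collections import Counter


def train_test_split(labels, test_fraction=0.2, seed=0):
    """Shuffle indices reproducibly and return (train, test) index lists."""
    indices = list(range(len(labels)))
    random.Random(seed).shuffle(indices)
    n_test = int(len(indices) * test_fraction)
    return indices[n_test:], indices[:n_test]


def majority_baseline_accuracy(labels, train_idx, test_idx):
    """Test accuracy of always predicting the most common training label."""
    majority = Counter(labels[i] for i in train_idx).most_common(1)[0][0]
    return sum(labels[i] == majority for i in test_idx) / len(test_idx)


# Hypothetical labels, made up for illustration; in practice these would be
# the values of the 'Cancer' column.
labels = [0] * 70 + [1] * 30
train_idx, test_idx = train_test_split(labels)
accuracy = majority_baseline_accuracy(labels, train_idx, test_idx)
print(f"majority-class baseline accuracy: {accuracy:.2f}")
```

A predictive model trained on the five features is only informative if it beats this baseline on the same held-out split.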

Additional Info

Field         Value
Source        https://www.kaggle.com/datasets/ohinhaque/synthetic-cancer-prediction-dataset-for-research
Author        Safiul Haque Chowdhury
Last Updated  October 8, 2024, 09:05 (UTC)
Created       October 8, 2024, 09:05 (UTC)