Context

First, I am new to ML, so apologies in advance in case I slip up! I am taking an online ML course, and this is an assignment in which we practice scikit-learn's PCA routine. The course has been ARCHIVED, which means discussion posts are no longer answered, so I am posting the problem here.

What better way to learn than getting feedback from so many experts… right?

Content

The data was collected over a 2-month period in India and has 25 attributes: 24 features (e.g., red blood cell count, white blood cell count) plus the target, 'classification', which is either 'ckd' or 'notckd' (ckd = chronic kidney disease). There are 400 rows.
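When coloring the plots by class, it helps to turn the text labels into 0/1. A minimal sketch, assuming the target column is named `classification` as in the data dictionary; `encode_target` is a hypothetical helper, the tiny frame is made up, and stripping whitespace is a defensive assumption in case some labels carry stray characters.

```python
import pandas as pd

def encode_target(df: pd.DataFrame, col: str = "classification") -> pd.Series:
    """Map 'ckd' -> 1 and 'notckd' -> 0; any other label becomes NaN."""
    # Strip stray whitespace (e.g. a trailing tab) before mapping
    labels = df[col].astype(str).str.strip()
    return labels.map({"ckd": 1, "notckd": 0})

# Made-up labels for illustration, not rows from the dataset
demo = pd.DataFrame({"classification": ["ckd", "notckd", "ckd\t", "notckd"]})
y = encode_target(demo)
print(y.tolist())  # [1, 0, 1, 0]
```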

The data needs cleaning: it contains NaNs, and the numeric features need to be coerced to floats (the data dictionary lists some measurements, such as pcv, wc and rc, as text). We were instructed to get rid of ALL ROWS with NaNs, with no threshold, meaning any row that has even one NaN gets deleted.
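A minimal pandas sketch of this cleaning step, assuming the CSV has already been read into a DataFrame; `clean_ckd` is a hypothetical helper and the small frame is made up for illustration. The order matters: a placeholder string such as '?' (an assumption about how bad values may appear) only becomes NaN after `pd.to_numeric(..., errors="coerce")`, so dropping NaNs before coercing would miss it.

```python
import numpy as np
import pandas as pd

def clean_ckd(df: pd.DataFrame, numeric_cols: list) -> pd.DataFrame:
    """Coerce numeric_cols to float, then drop every row that has any NaN."""
    out = df.copy()
    for col in numeric_cols:
        # Strip stray whitespace from string cells; leave numbers and NaN untouched
        stripped = out[col].map(lambda v: v.strip() if isinstance(v, str) else v)
        # Unparseable strings (e.g. '?') become NaN instead of raising an error
        out[col] = pd.to_numeric(stripped, errors="coerce")
    # No threshold: a single NaN anywhere in a row removes that row
    return out.dropna(axis=0, how="any")

# Made-up values for illustration only
raw = pd.DataFrame({
    "bgr": [120.0, np.nan, 400.0, 110.0],
    "wc": ["8000", "6000", "\t?", "7000"],
    "rc": ["5.0", "4.5", "4.0", "4.0"],
    "classification": ["ckd", "ckd", "ckd", "notckd"],
})
clean = clean_ckd(raw, ["bgr", "wc", "rc"])
print(clean.shape)  # (2, 4)
print(clean.dtypes.to_dict())
```

Row 1 is dropped for its missing bgr, and row 2 because its wc placeholder could not be parsed as a number.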

Part 1: We are asked to choose three features (bgr, rc, wc), visualize them, and then run PCA with n_components=2. PCA is to be run twice: once without scaling and once WITH scaling. This is where my issue starts: after scaling, I can hardly see any difference!
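A minimal sketch of the two runs, comparing `explained_variance_ratio_` and `components_` rather than only the scatter plot. Because the real CSV is not included here, it uses synthetic, independent columns named after bgr, rc and wc with rough, assumed magnitudes (hundreds, single digits, thousands); `run_pca` is a hypothetical helper and StandardScaler is one common choice of scaling.

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def run_pca(X: pd.DataFrame, scale: bool, n_components: int = 2):
    """Fit PCA on X, optionally standardizing each column to mean 0, variance 1."""
    data = StandardScaler().fit_transform(X) if scale else X.to_numpy()
    pca = PCA(n_components=n_components)
    projected = pca.fit_transform(data)
    return pca, projected

# Synthetic stand-in for (bgr, rc, wc): assumed magnitudes, not real measurements
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "bgr": rng.normal(150, 75, n),    # hundreds
    "rc": rng.normal(4.5, 1.0, n),    # single digits
    "wc": rng.normal(8000, 3000, n),  # thousands
})

for scale in (False, True):
    pca, proj = run_pca(X, scale)
    print("scaled" if scale else "unscaled")
    print("  explained variance ratio:", pca.explained_variance_ratio_.round(3))
    # Rows of components_ are the principal directions; columns follow X's order
    print(pd.DataFrame(pca.components_, columns=X.columns).round(3))
```

Without scaling, PCA maximizes raw variance, so the column measured in thousands dominates the first component almost entirely; after standardizing, every column has unit variance and the loadings can change substantially even when the 2-D scatter looks similar in shape. On the cleaned data, `X` would simply be the three selected columns, e.g. `df[["bgr", "rc", "wc"]]` for a hypothetical cleaned frame `df`.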

I will stop here for now till I get feedback and then move to Part 2.

Acknowledgements

The dataset is available at: https://archive.ics.uci.edu/ml/datasets/Chronic_Kidney_Disease

Inspiration

I would like to get an intuitive and practical understanding of PCA.
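For intuition about what PCA computes, here is a sketch that does it by hand (center the data, eigendecompose the covariance matrix, project onto the top eigenvectors) and checks the result against scikit-learn's PCA on toy data. `pca_by_eigendecomposition` is a hypothetical helper; this is one textbook formulation, and scikit-learn may use a different solver internally (e.g. SVD).

```python
import numpy as np
from sklearn.decomposition import PCA

def pca_by_eigendecomposition(X: np.ndarray, n_components: int):
    """Manual PCA: center, eigendecompose the covariance matrix, project."""
    Xc = X - X.mean(axis=0)                  # PCA always centers the data
    cov = np.cov(Xc, rowvar=False)           # (n_features, n_features)
    eigvals, eigvecs = np.linalg.eigh(cov)   # symmetric matrix; ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:n_components]
    components = eigvecs[:, order].T         # rows = principal directions
    explained_ratio = eigvals[order] / eigvals.sum()
    return Xc @ components.T, explained_ratio

# Correlated 3-D toy data (not the CKD data)
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3)) @ np.array([[3.0, 1.0, 0.0],
                                         [0.0, 1.0, 0.5],
                                         [0.0, 0.0, 0.2]])

scores, ratio = pca_by_eigendecomposition(X, 2)
sk = PCA(n_components=2).fit(X)
sk_scores = sk.transform(X)

# Eigenvectors are only defined up to sign, so compare absolute values
print(np.allclose(np.abs(scores), np.abs(sk_scores)))    # True
print(np.allclose(ratio, sk.explained_variance_ratio_))  # True
```

The sign ambiguity is also why PCA scatter plots can come out mirrored between runs or libraries without anything being wrong.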

Field	Value
Data last updated	September 21, 2024
Metadata last updated	September 21, 2024
Created	September 21, 2024
Format	CSV
License	Creative Commons Attribution
Datastore active	True
Has views	True
Id	6bd22f52-9b61-4c36-9984-168a4845ce31
Mimetype	text/csv
Package id	8ab5865a-ac21-4b6b-9031-bd203fe64331
Position	0
Size	47.4 KiB
State	active
Url type	upload

Chronic Kidney Disease dataset

Data Dictionary

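1. id: numeric
2. age: numeric
3. bp (blood pressure): numeric
4. sg (specific gravity): numeric
5. al (albumin): numeric
6. su (sugar): numeric
7. rbc (red blood cells): text
8. pc (pus cell): text
9. pcc (pus cell clumps): text
10. ba (bacteria): text
11. bgr (blood glucose random): numeric
12. bu (blood urea): numeric
13. sc (serum creatinine): numeric
14. sod (sodium): numeric
15. pot (potassium): numeric
16. hemo (hemoglobin): numeric
17. pcv (packed cell volume): text
18. wc (white blood cell count): text
19. rc (red blood cell count): text
20. htn (hypertension): text
21. dm (diabetes mellitus): text
22. cad (coronary artery disease): text
23. appet (appetite): text
24. pe (pedal edema): text
25. ane (anemia): text
26. classification (target: ckd or notckd): text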
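The dictionary can be turned into column groups for the cleaning and PCA steps. A minimal sketch: the names are transcribed from the list, while the constant names are hypothetical and treating pcv, wc and rc as numbers is an assumption based on what they measure (packed cell volume and blood cell counts) rather than their listed type.

```python
# Column names in dictionary order (1..26)
DICTIONARY = ["id", "age", "bp", "sg", "al", "su", "rbc", "pc", "pcc", "ba",
              "bgr", "bu", "sc", "sod", "pot", "hemo", "pcv", "wc", "rc",
              "htn", "dm", "cad", "appet", "pe", "ane", "classification"]

NUMERIC = ["age", "bp", "sg", "al", "su", "bgr", "bu", "sc", "sod", "pot", "hemo"]
TEXT_BUT_NUMERIC = ["pcv", "wc", "rc"]  # listed as text, but hold measurements
CATEGORICAL = ["rbc", "pc", "pcc", "ba", "htn", "dm", "cad", "appet", "pe", "ane"]
TARGET = "classification"

features = NUMERIC + TEXT_BUT_NUMERIC + CATEGORICAL
print(len(features))  # 24

# Every dictionary column except id and the target is a feature, exactly once
assert sorted(features + ["id", TARGET]) == sorted(DICTIONARY)
```

The columns in `NUMERIC + TEXT_BUT_NUMERIC` are the ones that would be coerced to float before PCA; the three chosen for Part 1 (bgr, rc, wc) include two of the text-typed ones.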

Additional Information