Principal Component Analysis (PCA) is a statistical technique for dimensionality reduction. It transforms a large set of possibly correlated variables into a smaller set of uncorrelated variables called principal components.
The first component captures the maximum variance in the data.
The second component captures the next largest variance, subject to being orthogonal (uncorrelated) to the first.
The process continues, with each new component orthogonal to all previous ones, until enough of the variation in the data is explained; in practice only the first few components are kept.
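The steps above can be sketched with a NumPy eigendecomposition of the covariance matrix; the function name `pca` and the synthetic data are illustrative, not part of any particular library:

```python
import numpy as np

def pca(X, n_components):
    """Project X onto its top principal components.

    Steps: center the data, compute the covariance matrix,
    eigendecompose it, and sort axes by the variance they capture.
    """
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # returned in ascending order
    order = np.argsort(eigvals)[::-1]        # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    components = eigvecs[:, :n_components]
    return X_centered @ components, eigvals

# Toy data: the second column is strongly correlated with the first
rng = np.random.default_rng(0)
x = rng.normal(size=200)
X = np.column_stack([x,
                     2 * x + rng.normal(scale=0.1, size=200),
                     rng.normal(size=200)])
scores, variances = pca(X, n_components=2)
print(variances / variances.sum())  # share of variance per component
```

Because the first two columns vary together, the first component absorbs most of the total variance.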
PCA serves several purposes:
Dimensionality reduction: Simplifies high-dimensional datasets into fewer variables.
Visualization: Helps plot high-dimensional data in 2D or 3D.
Noise reduction: Removes less informative variability.
Pattern detection: Reveals hidden structure, clusters, or correlations.
Preprocessing: Improves efficiency of machine learning algorithms.
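As a sketch of the visualization use case, the snippet below (synthetic data, assumed for illustration) builds 10-dimensional data whose variation lives mostly in a 2-D subspace, then uses the SVD of the centered data to recover 2-D coordinates suitable for plotting:

```python
import numpy as np

rng = np.random.default_rng(1)
# 10-dimensional data generated from a hidden 2-D signal plus small noise
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + rng.normal(scale=0.05, size=(300, 10))

# The SVD of the centered data matrix gives the principal axes directly
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
coords_2d = Xc @ Vt[:2].T                 # the 2-D coordinates one would plot
explained = (S**2) / (S**2).sum()
print(f"variance kept by 2 components: {explained[:2].sum():.3f}")
```

Because the true signal is 2-dimensional, two components retain nearly all of the variance, so the 2-D scatter plot is a faithful picture of the 10-D data.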
Its main advantages:
Reduces dataset size while keeping most of the information.
Removes multicollinearity by producing uncorrelated components.
Makes visualization of complex data possible.
Speeds up computation and improves performance of some algorithms.
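The multicollinearity claim can be checked directly: after projecting onto the principal axes, the covariance matrix of the component scores is diagonal up to floating-point error. The data here is synthetic and chosen to be highly collinear:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=500)
# Three highly collinear features
X = np.column_stack([x,
                     3 * x + rng.normal(scale=0.2, size=500),
                     -x + rng.normal(scale=0.2, size=500)])

Xc = X - X.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
scores = Xc @ eigvecs                      # component scores

# Off-diagonal covariances of the scores vanish: components are uncorrelated
cov_scores = np.cov(scores, rowvar=False)
off_diag = cov_scores - np.diag(np.diag(cov_scores))
print(np.max(np.abs(off_diag)))            # ~0 up to rounding error
```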
Its main limitations:
Interpretability: Principal components are linear combinations of the original variables, often hard to explain in real-world terms.
Information loss: Some variance is always lost when reducing dimensions.
Scaling sensitive: Results depend heavily on standardizing variables.
Assumption-driven: Assumes that variance equals importance, which may not always match domain needs.
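The scaling sensitivity noted above is easy to demonstrate. In this sketch (synthetic, independent variables on very different scales), the large-scale variable dominates the first component of the raw data, while standardizing restores a balanced split:

```python
import numpy as np

rng = np.random.default_rng(3)
# Two independent variables measured on very different scales
X = np.column_stack([rng.normal(scale=1000, size=400),  # e.g. salary
                     rng.normal(scale=1.0, size=400)])  # e.g. a 1-5 rating

def top_ratio(X):
    """Fraction of total variance captured by the first component."""
    eigvals = np.linalg.eigvalsh(np.cov(X - X.mean(axis=0), rowvar=False))
    return eigvals.max() / eigvals.sum()

raw = top_ratio(X)
standardized = top_ratio((X - X.mean(axis=0)) / X.std(axis=0))
print(f"raw: {raw:.3f}, standardized: {standardized:.3f}")
```

On the raw data the first component explains essentially all the variance only because of units, not because the second variable is unimportant; after standardizing, the two independent variables contribute roughly equally.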
A report on this method, in Arabic, is available at this link.