What is Machine Learning?
- Machine learning (ML) enables computers to learn from data rather than being explicitly programmed.
- It has become mainstream with applications such as spam filters, voice recognition, recommendation systems, and self-driving cars.
- Tom Mitchell's definition (1997): a program learns from experience E with respect to a task T and a performance measure P if its performance on T, as measured by P, improves with E.
Why Use Machine Learning?
- Traditional rule-based programming is limited because it requires hardcoded rules that are difficult to maintain.
- ML is useful when:
  - The problem is too complex for traditional programming.
  - Rules change frequently (e.g., spam detection adapting to new tricks).
  - Hidden patterns in data can be discovered (data mining).
Types of Machine Learning Systems
1. Supervised Learning
   - Uses labeled data (input-output pairs).
   - Examples:
     - Classification (e.g., spam detection).
     - Regression (e.g., predicting house prices).
2. Unsupervised Learning
   - Uses unlabeled data; the model finds patterns without predefined categories.
   - Examples:
     - Clustering (e.g., customer segmentation).
     - Anomaly detection (e.g., fraud detection).
     - Dimensionality reduction (e.g., PCA for visualization).
3. Semi-supervised Learning
   - Uses a mix of labeled and unlabeled data (e.g., Google Photos face recognition).
4. Self-supervised Learning
   - A special case of unsupervised learning where a model generates its own labels (e.g., language models like GPT).
5. Reinforcement Learning
   - Uses agents that learn by interacting with an environment and receiving rewards or penalties (e.g., AlphaGo, robotic control).
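To make the difference between supervised and unsupervised learning concrete, here is a minimal sketch in plain Python. The data, the feature ("spamminess" score), and the function names are all hypothetical and chosen only for illustration. The supervised part uses the labels to build a nearest-centroid classifier; the unsupervised part sees only the scores and groups them with a basic k-means loop (k = 2).

```python
# Toy 1-D data: hypothetical "spamminess" scores for six messages.
scores = [0.1, 0.2, 0.3, 0.8, 0.9, 1.0]
labels = ["ham", "ham", "ham", "spam", "spam", "spam"]  # used only by the supervised part


def fit_centroids(xs, ys):
    """Supervised: average the feature value of each labeled class."""
    groups = {}
    for x, y in zip(xs, ys):
        groups.setdefault(y, []).append(x)
    return {label: sum(vals) / len(vals) for label, vals in groups.items()}


def classify(centroids, x):
    """Predict the label whose centroid is closest to x."""
    return min(centroids, key=lambda label: abs(centroids[label] - x))


def two_means(xs, iterations=10):
    """Unsupervised: split xs into two clusters with a basic k-means loop.

    Assumes xs contains at least two distinct values and that no cluster
    becomes empty, which holds for the toy data above.
    """
    centers = [min(xs), max(xs)]  # simple deterministic initialization
    clusters = [[], []]
    for _ in range(iterations):
        clusters = [[], []]
        for x in xs:
            nearest = 0 if abs(x - centers[0]) <= abs(x - centers[1]) else 1
            clusters[nearest].append(x)
        centers = [sum(c) / len(c) for c in clusters]
    return clusters


centroids = fit_centroids(scores, labels)
print(classify(centroids, 0.75))  # spam
print(two_means(scores))          # [[0.1, 0.2, 0.3], [0.8, 0.9, 1.0]]
```

Both approaches find the same two groups here, but only the supervised model can name them: the clusters come back without any notion of "spam" or "ham".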
Batch vs. Online Learning
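- Batch Learning: The model is trained offline on the full dataset and then deployed without learning any further; incorporating new data means retraining on the full dataset and redeploying.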
- Online Learning: The model learns incrementally from a stream of data, making it adaptable to changes.
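The contrast can be sketched with a one-parameter model y = w * x. This is an illustrative example only: the data, the class name OnlineLinearModel, and the learning rate are assumptions, not part of the original notes. The batch version computes w once from the whole dataset; the online version takes one gradient step per incoming example and therefore follows the data when the underlying relationship drifts.

```python
def fit_batch(pairs):
    """Batch learning: closed-form least squares for y = w * x on the full dataset."""
    return sum(x * y for x, y in pairs) / sum(x * x for x, _ in pairs)


class OnlineLinearModel:
    """Online learning: one stochastic gradient step per incoming example."""

    def __init__(self, learning_rate=0.05):
        self.w = 0.0
        self.learning_rate = learning_rate

    def learn_one(self, x, y):
        error = y - self.w * x
        self.w += self.learning_rate * error * x


old_data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # generated by y = 2x
new_data = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # the relationship drifts to y = 3x

w_batch = fit_batch(old_data)  # stays fixed until the model is retrained

online = OnlineLinearModel()
for _ in range(10):                # replay the old stream a few times
    for x, y in old_data:
        online.learn_one(x, y)
w_before_drift = online.w          # close to 2

for _ in range(10):                # keep learning as new data arrives
    for x, y in new_data:
        online.learn_one(x, y)
w_after_drift = online.w           # close to 3

print(w_batch, round(w_before_drift, 3), round(w_after_drift, 3))
```

The learning rate controls how quickly the online model adapts: a higher value tracks changes faster but also reacts more strongly to noisy examples.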
Instance-Based vs. Model-Based Learning
- Instance-Based Learning: The system memorizes examples and classifies new instances based on similarity (e.g., k-nearest neighbors).
- Model-Based Learning: The system creates a mathematical model to generalize from training data (e.g., linear regression).
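A small sketch of the two approaches on the same toy data (hypothetical points lying on y = 2x + 1; the function names are illustrative). The instance-based predictor averages the targets of the k nearest stored examples, while the model-based one fits a line by least squares and then discards the examples.

```python
train = [(1.0, 3.0), (2.0, 5.0), (3.0, 7.0), (4.0, 9.0)]  # points on y = 2x + 1


def knn_predict(examples, x, k=2):
    """Instance-based: average the targets of the k stored examples nearest to x."""
    nearest = sorted(examples, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k


def fit_line(examples):
    """Model-based: least-squares fit of y = slope * x + intercept."""
    n = len(examples)
    mean_x = sum(x for x, _ in examples) / n
    mean_y = sum(y for _, y in examples) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in examples)
    sxx = sum((x - mean_x) ** 2 for x, _ in examples)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


slope, intercept = fit_line(train)

# Inside the range of the training data the two predictions are close.
print(knn_predict(train, 2.4), slope * 2.4 + intercept)

# Far outside it, k-NN can only repeat what it has stored.
print(knn_predict(train, 10.0), slope * 10.0 + intercept)  # 8.0 21.0
```

This does not mean one approach is better in general: the line extrapolates well here only because the data really is linear.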
Challenges in Machine Learning
- Insufficient Training Data: More data generally leads to better models, especially in deep learning.
- Nonrepresentative Data: The training data must reflect the real-world scenarios the model will encounter.
- Poor-Quality Data: Noisy or incorrect labels can mislead the model.
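- Irrelevant Features: A model can only learn well if the training data contains enough relevant features and not too many irrelevant ones; feature engineering (selecting and extracting useful features) is critical to improving performance.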
Overfitting vs. Underfitting (Basic Terms)
- Overfitting: The model learns noise instead of patterns, performing well on training data but poorly on new data.
- Underfitting: The model is too simple to capture patterns in the data.
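- Solutions: For overfitting, gather more data, simplify the model, or apply regularization. For underfitting, use a more powerful model, engineer better features, or reduce regularization.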
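The following sketch illustrates both failure modes on a tiny, hand-made dataset (all values are hypothetical). The training targets follow y = 2x with alternating noise of +1 and -1; the held-out targets are noise-free. Three models are compared: a 1-nearest-neighbor "memorizer" (prone to overfitting), a constant predictor (underfitting), and a least-squares line.

```python
train = [(0.0, 1.0), (1.0, 1.0), (2.0, 5.0), (3.0, 5.0), (4.0, 9.0), (5.0, 9.0)]
held_out = [(0.4, 0.8), (1.4, 2.8), (2.4, 4.8), (3.4, 6.8), (4.4, 8.8)]


def mse(predict, data):
    """Mean squared error of a prediction function on (x, y) pairs."""
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)


def memorizer(x):
    """Overfits: returns the target of the single closest training example."""
    return min(train, key=lambda p: abs(p[0] - x))[1]


mean_y = sum(y for _, y in train) / len(train)


def constant(x):
    """Underfits: ignores x and always predicts the mean training target."""
    return mean_y


# Least-squares line fitted on the training set.
mean_x = sum(x for x, _ in train) / len(train)
slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
         / sum((x - mean_x) ** 2 for x, _ in train))
intercept = mean_y - slope * mean_x


def line(x):
    return slope * x + intercept


for name, model in [("memorizer", memorizer), ("constant", constant), ("line", line)]:
    print(f"{name:10s} train MSE={mse(model, train):.2f}  "
          f"held-out MSE={mse(model, held_out):.2f}")
```

On this data the memorizer scores 0.00 on the training set but 1.32 on the held-out set, the constant predictor is poor on both (10.67 and 8.04), and the line is reasonable on both (0.91 and 0.06). A large gap between training and held-out error points to overfitting; high error on both points to underfitting.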