All about Artificial Intelligence, Machine Learning, Deep Learning and Data Science

Sunday, 16 November 2025

Logistic Regression Explained

Logistic regression is a supervised learning algorithm used for classification tasks.

Below are the topics discussed in the video:

Introduction

Importance

Mathematics Behind Logistic Regression

Logistic Regression Workflow

Binary vs Multiclass Logistic Regression

Assumptions

Evaluation Metrics

Regularization in Logistic Regression

Practical Implementation in Python

Challenges and Best Practices

Summary and Q&A

Extension to Logistic Regression
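As a minimal sketch of the core idea (not the code used in the video), the following fits a binary logistic regression with batch gradient descent on the log loss, using only the Python standard library. The model passes a weighted sum of the features through the sigmoid function to get a probability, then predicts class 1 when that probability is at least 0.5. The toy data, learning rate, and epoch count are made-up assumptions. For real work, scikit-learn's LogisticRegression class is the usual choice.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the mean log loss."""
    n_samples, n_features = len(X), len(X[0])
    w = [0.0] * n_features
    b = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            error = p - yi  # gradient of the log loss w.r.t. the linear score
            for j in range(n_features):
                grad_w[j] += error * xi[j]
            grad_b += error
        w = [wj - lr * gj / n_samples for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b

def predict_proba(w, b, x):
    """Estimated probability that x belongs to class 1."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)

def predict(w, b, x, threshold=0.5):
    """Class label obtained by thresholding the predicted probability."""
    return 1 if predict_proba(w, b, x) >= threshold else 0

# Made-up toy data: hours studied -> failed (0) or passed (1)
X = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic_regression(X, y)
print(predict(w, b, [1.0]), predict(w, b, [6.0]))
```

The decision boundary is where the linear score is zero, i.e. x = -b / w for a single feature. On this toy data it settles between the two classes.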

Saturday, 15 November 2025

Linear Regression Explained

Below are the topics discussed in the video:

Introduction
Key Concepts
Types of Linear Regression
How Linear Regression Works
Assumptions
Code Example - Simple Linear Regression
Visualization of Results
Advantages and Limitations
Applications
Conclusion
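The video's simple linear regression code is not reproduced here. As a minimal stand-in sketch (an assumption, not the video's code), the following fits y = intercept + slope * x with the closed-form ordinary least squares formulas and reports R², using plain Python. The data is made up and lies exactly on a line so the result is easy to check. scikit-learn's LinearRegression is the usual tool for real datasets.

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares for y = intercept + slope * x (closed form)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

def r_squared(x, y, slope, intercept):
    """Share of the variance in y explained by the fitted line."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]  # made-up data lying exactly on y = 1 + 2x
slope, intercept = fit_simple_linear_regression(x, y)
print(slope, intercept)                   # 2.0 1.0
print(r_squared(x, y, slope, intercept))  # 1.0
```

With noisy real data, R² falls below 1, and the residuals can be plotted to check the linearity assumption.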

Wednesday, 25 December 2024

Python for Data Science

Introduction

Why Python for Data Science?

Versatility and ease of use
Extensive libraries and tools
Community support

Python's Role in Data Science:

Data collection, cleaning, visualization, and machine learning

Key Features of Python

Open-source and cross-platform
Supports procedural, object-oriented, and functional programming
Rich ecosystem of libraries:

NumPy for numerical computing
Pandas for data manipulation
Matplotlib and Seaborn for visualization

Python Workflow for Data Science

Data Collection:

Sources: CSV, APIs, Databases
Tools: requests, pandas

Data Cleaning:

Handling missing data and duplicates
Libraries: pandas

Data Visualization:

Charts and plots using matplotlib and seaborn

Machine Learning:

Build predictive models using scikit-learn
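Here is a minimal sketch of the collection and cleaning steps with pandas. It loads a small CSV that is embedded as a string so the example is self-contained; the columns and values are made up. It then drops duplicate rows and fills missing values. Filling numeric gaps with the column mean is only one common strategy; dropping incomplete rows with dropna() is another.

```python
import io

import pandas as pd

# Made-up raw data standing in for a CSV file, API response, or database export
raw_csv = io.StringIO(
    "name,age,city\n"
    "Asha,29,Pune\n"
    "Ravi,,Delhi\n"
    "Asha,29,Pune\n"
    "Meera,35,\n"
)

# Data collection: load the CSV into a DataFrame
df = pd.read_csv(raw_csv)

# Data cleaning: remove exact duplicate rows and renumber the index
df = df.drop_duplicates().reset_index(drop=True)

# Data cleaning: fill missing ages with the mean age, missing cities with a placeholder
df["age"] = df["age"].fillna(df["age"].mean())
df["city"] = df["city"].fillna("Unknown")

print(df)
```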

Monday, 25 March 2024

Small Language Models (SLMs) are compact versions of Large Language Models (LLMs). SLMs have fewer parameters than LLMs. Like LLMs, they are designed to perform tasks such as sentiment analysis and text generation, but with a smaller parameter count. Having fewer parameters improves computational efficiency, accessibility, and adaptability.

Some examples of SLMs are:

Llama 2 7B - Released by Meta in July 2023, it has 7 billion parameters.
Phi-2 - A 2.7 billion parameter model developed by Microsoft.
Stable Beluga 7B - Developed by Stability AI, it is an auto-regressive language model fine-tuned from Llama 2.
XGen - Developed by Salesforce. It is a smaller-scale model that is customized for particular domains.
Alibaba's Qwen - Developed by Alibaba Cloud.
Alpaca 7B - Fine-tuned from Meta's LLaMA 7B model on 52K instruction-following demonstrations.
Falcon 7B - A 7 billion parameter causal decoder-only model developed by TII and trained on 1,500B tokens.
MPT - Developed by MosaicML.
Zephyr - Zephyr 7B is a model created by the Hugging Face H4 (Helpful, Honest, Harmless, Huggy) team with the objective of creating a smaller language model that is aligned with user intent and outperforms even bigger models.
MobileBERT
GPT-Neo and GPT-J
T5-Small - Text-To-Text Transfer Transformer (T5) is a pre-trained encoder-decoder model that handles all NLP tasks in a unified text-to-text format, where the input and output are always text strings. T5-Small is the checkpoint with 60 million parameters.

Thursday, 28 December 2023

Fundamentals and Key Aspects of Building a Streamlit App

Streamlit is an open-source Python library used to create custom web applications for data science and machine learning.

Following are the steps to build a Streamlit app:

Package installation

To use Streamlit, first install the streamlit Python package. The following pip command installs it:

pip install streamlit

Sample program - Hello World

Create a Python script app.py containing a simple "Hello World" Streamlit app, as shown below, and launch it from the terminal with streamlit run app.py

import streamlit as st

def main():
    st.title("Hello Streamlit!")
    st.write("This is a simple Streamlit app.")

if __name__ == "__main__":
    main()

Following are the essential Streamlit components:

1) Creating Widgets

Widgets are interactive components allowing user input.

Slider

import streamlit as st

value = st.slider('Select a value', min_value=0, max_value=100)

Text Input

text = st.text_input('Enter text')

Checkbox

option = st.checkbox('Show/hide')

2) Displaying Data

Presenting data in various formats

Display Dataframe

import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})  # example data; use your own DataFrame
st.dataframe(df)

Display Charts

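import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(data)  # data: a list or array of values to plot
st.pyplot(fig)  # pass the figure explicitly; calling st.pyplot() with no figure is deprecated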

3) Layouts and Styling

Providing structure to app and styling

Columns

col1, col2 = st.columns(2)

with col1:
    st.write("Column 1")

with col2:
    st.write("Column 2")

Markdown and HTML

st.markdown('**Bold** text')

st.write("This is a regular text")

st.write("<p style='color:red'>This is HTML</p>", unsafe_allow_html=True)

4) Handling User Inputs and Events

Button Click

if st.button('Click me'):
    st.write('Button clicked!')

Event Handling

options = ['Option 1', 'Option 2', 'Option 3']  # example choices
selection = st.selectbox('Choose an option', options)

if selection == 'Option 1':
    st.write('Option 1 selected!')

5) File Upload and Download

Managing file uploads/downloads

File Upload

uploaded_file = st.file_uploader("Upload file", type=['csv', 'txt'])

File Download

csv_data = df.to_csv(index=False)  # df: the DataFrame to export
st.download_button('Download', data=csv_data, file_name='data.csv', mime='text/csv')

6) Deployment Configuration

Configuration settings for deployment

Page Title

st.set_page_config(page_title='My Streamlit App', layout='wide')

Caching

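@st.cache_data  # replaces the deprecated @st.cache for caching computed data
def expensive_computation(input_value):
    # Perform costly computation; the result is cached for each distinct input
    result = input_value ** 2  # placeholder for the real work
    return result

result = expensive_computation(10)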

Tuesday, 19 September 2023

Named Entity Recognition (NER) in Natural Language Processing

What is Named Entity Recognition (NER)?

Named entity recognition (NER) is a vital sub-task of natural language processing (NLP). The objective of NER is to identify named entities in text data and classify them into predefined categories such as person, organization, location, date, and percentage. NER helps with information extraction, text understanding, and document summarization. NER models enable organizations to extract valuable insights, automate information retrieval, and improve search functionality.

NER Categorization

The following are the primary NER categories:

Persons: Names of people
Organizations: Companies, government bodies, political groups
Locations: Names of places, including cities, countries, monuments, etc.
Dates: Temporal expressions such as years, months, and specific dates
Numbers: Numerical values such as percentages, currencies, measurements, etc.
Miscellaneous: Other named entities such as product names, event titles, skills, etc.

Importance of NER

Information Extraction: Extracting names of people, organizations, locations, etc. from unstructured text
Question Answering: Chatbots identify entities mentioned in user queries and retrieve relevant information.
Document Summarization: Identifying and highlighting the key named entities in a document
Sentiment Analysis: Identifying which organizations and products are discussed in customer reviews, so that the expressed sentiment can be attributed to them

Techniques in NER

Rule-based NER: These systems rely on predefined patterns, regular expressions, or dictionaries to identify named entities.
Statistical NER: These systems use ML algorithms such as conditional random fields (CRF) and hidden Markov models (HMM), and require labelled training data.
Deep learning-based NER: Models such as RNNs and transformers have gained popularity for their ability to capture contextual information and achieve state-of-the-art results.
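To make the rule-based approach concrete, here is a minimal sketch that combines a small dictionary (gazetteer) with regular expressions, in plain Python. The entity names, labels, and patterns are hypothetical examples. Statistical and deep learning NER replace such hand-written rules with models learned from labelled data.

```python
import re

# Hypothetical gazetteer (dictionary) mapping known names to categories
GAZETTEER = {
    "Microsoft": "ORGANIZATION",
    "Salesforce": "ORGANIZATION",
    "India": "LOCATION",
    "Paris": "LOCATION",
}

MONTHS = "January|February|March|April|May|June|July|August|September|October|November|December"

# Regular-expression patterns for entities with predictable surface forms
PATTERNS = [
    ("DATE", re.compile(r"\b\d{1,2} (?:" + MONTHS + r") \d{4}\b")),
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?%")),
]

def rule_based_ner(text):
    """Return (entity_text, label, start_offset) tuples, ordered by position in text."""
    entities = []
    # Dictionary lookup: exact, whole-word matches of known names
    for name, label in GAZETTEER.items():
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            entities.append((match.group(), label, match.start()))
    # Pattern matching: dates and percentages
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            entities.append((match.group(), label, match.start()))
    return sorted(entities, key=lambda entity: entity[2])

print(rule_based_ner("On 19 September 2023, Microsoft reported 12.5% growth in India."))
# [('19 September 2023', 'DATE', 3), ('Microsoft', 'ORGANIZATION', 22),
#  ('12.5%', 'PERCENT', 41), ('India', 'LOCATION', 57)]
```

Rules like these are precise for forms they anticipate but miss variants they were not written for, such as "MSFT" for Microsoft or a date written as "19/09/2023".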

Challenges in NER

Ambiguity: Text data often contains ambiguous references to entities, which makes it difficult to determine the correct category. For example, "Washington" may refer to a person or a location.
Named Entity Variability: The same entity can appear in various forms, such as abbreviations, misspellings, and synonyms.
Domain Specificity: NER models trained on general text often perform worse in domains with unique vocabularies and contexts, such as medicine or law.

Applications of NER

Healthcare: NER is used to extract medical entities like patient names, diseases, and treatment information from electronic health records.
Finance: NER is used to identify entities such as company names, stock symbols, and financial metrics from reports and news articles.
Legal: NER assists in recognizing legal entities, case names, and references to legal documents in legal texts.

Monday, 18 September 2023

Sentiment Analysis in Natural Language Processing

What is Sentiment Analysis?

Sentiment analysis, also known as opinion mining, is used to extract sentiments or opinions from text data such as feedback, comments, and tweets. The objective of sentiment analysis is to classify text as positive or negative.

Key steps involved in sentiment analysis

Text Preprocessing

Tokenization - Splitting the text into individual words or tokens
Lowercasing - Converting the text to lowercase for consistency across the whole dataset
Stop word removal - Removing common words that carry little meaning, such as 'a', 'an', 'the', etc.
Stemming or Lemmatization - Reducing words to their root form, such as 'playing' to 'play'
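Here is a minimal sketch of these preprocessing steps in plain Python. The stop-word list is a small hypothetical sample. crude_stem is a deliberately simplistic suffix stripper written for this illustration, not a real algorithm such as the Porter stemmer. Libraries such as NLTK provide proper stop-word lists, stemmers, and lemmatizers.

```python
import re

# Small, hypothetical stop-word list; libraries such as NLTK provide fuller lists
STOP_WORDS = {"a", "an", "the", "is", "was", "were", "and", "or", "to", "of", "it", "this"}

def crude_stem(word):
    """Strip a few common suffixes; a toy stand-in for a real stemmer such as Porter's."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    # Lowercasing: "GREAT" and "great" become the same token
    text = text.lower()
    # Tokenization: keep runs of letters and apostrophes, dropping punctuation
    tokens = re.findall(r"[a-z']+", text)
    # Stop-word removal
    tokens = [token for token in tokens if token not in STOP_WORDS]
    # Stemming: reduce each token to a rough root form
    return [crude_stem(token) for token in tokens]

print(preprocess("The kids were playing and it was GREAT fun!"))
# ['kid', 'play', 'great', 'fun']
```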

Feature Extraction

Bag of words - Each text is represented by the frequency of each word, ignoring word order
Term Frequency-Inverse Document Frequency (TF-IDF) - Each word is weighted by its frequency in a document, discounted by how many documents in the corpus contain it, so that very common words get lower weight
Word Embeddings - Pre-trained models such as Word2Vec and GloVe map words to dense vectors that capture semantic meaning
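The first two feature-extraction methods can be sketched in plain Python on already tokenized documents. This uses one common TF-IDF variant: term frequency = count / document length, and idf = log(N / document frequency). Library implementations differ. For example, scikit-learn's TfidfVectorizer uses a smoothed idf by default. The three "reviews" are made up.

```python
import math
from collections import Counter

def bag_of_words(docs):
    """Represent each tokenized document as raw word counts over a shared, sorted vocabulary."""
    vocab = sorted({word for doc in docs for word in doc})
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[word] for word in vocab])
    return vocab, vectors

def tf_idf(docs):
    """Plain TF-IDF: tf = count / document length, idf = log(N / document frequency)."""
    vocab, count_vectors = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: number of documents that contain each vocabulary word
    doc_freq = [sum(1 for vec in count_vectors if vec[j] > 0) for j in range(len(vocab))]
    idf = [math.log(n_docs / df) for df in doc_freq]
    weights = []
    for doc, vec in zip(docs, count_vectors):
        length = len(doc) or 1  # guard against empty documents
        weights.append([(count / length) * idf[j] for j, count in enumerate(vec)])
    return vocab, weights

docs = [["good", "movie"], ["bad", "movie"], ["good", "good", "plot"]]
vocab, bow = bag_of_words(docs)
print(vocab)  # ['bad', 'good', 'movie', 'plot']
print(bow)    # [[0, 1, 1, 0], [1, 0, 1, 0], [0, 2, 0, 1]]
vocab, tfidf = tf_idf(docs)
```

In the second review, "movie" appears in two of the three documents. It therefore gets a lower TF-IDF weight than "bad", which appears only there, even though both occur once in that review.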

Model Selection

Lexicon-based - Text is scored using a sentiment lexicon, a predefined list of words labelled with positive or negative polarity
Machine learning models - Supervised or unsupervised machine learning models such as Naive Bayes, support vector machines, LSTMs, or transformer-based models such as BERT
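To illustrate the machine learning option, here is a minimal multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written from scratch for already tokenized text. The tiny training set is made up and far too small for real use. In practice, a library implementation such as scikit-learn's MultinomialNB, trained on a labelled dataset, would be the usual choice.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(docs, labels):
    """Collect log class priors and per-class word counts from tokenized documents."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> Counter of word frequencies
    vocab = set()
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)
        vocab.update(doc)
    log_priors = {label: math.log(n / len(labels)) for label, n in class_counts.items()}
    return log_priors, word_counts, vocab

def predict_naive_bayes(model, doc):
    """Return the label with the highest log posterior score for a tokenized document."""
    log_priors, word_counts, vocab = model
    best_label, best_score = None, -math.inf
    for label, log_prior in log_priors.items():
        total_words = sum(word_counts[label].values())
        score = log_prior
        for word in doc:
            if word not in vocab:
                continue  # words never seen in training are ignored
            # Add-one smoothing stops an unseen (word, class) pair from zeroing the score
            score += math.log((word_counts[label][word] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

train_docs = [["great", "movie"], ["loved", "it"], ["terrible", "plot"], ["awful", "movie"]]
train_labels = ["positive", "positive", "negative", "negative"]
model = train_naive_bayes(train_docs, train_labels)
print(predict_naive_bayes(model, ["loved", "great"]))  # positive
print(predict_naive_bayes(model, ["awful", "plot"]))   # negative
```

Working in log probabilities turns the product of many small word probabilities into a sum, which avoids numerical underflow on longer texts.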

Applications of Sentiment Analysis

Customer Feedback Analysis - Analyzing customer review data
Social Media Monitoring - Tracking and analyzing sentiment expressed on social media
Market Research - Understanding public sentiment towards specific products