Sunday, 16 November 2025

Logistic Regression Explained

Logistic regression is a supervised learning algorithm used for classification tasks. It models the probability that an input belongs to a particular class.

Below are the topics discussed in the video:


  • Introduction
  • Importance
  • Mathematics Behind Logistic Regression
  • Logistic Regression Workflow
  • Binary vs Multiclass Logistic Regression
  • Assumptions
  • Evaluation Metrics
  • Regularization in Logistic Regression
  • Practical Implementation in Python
  • Challenges and Best Practices
  • Summary and Q&A
  • Extension to Logistic Regression
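As a minimal sketch of the mathematics behind binary logistic regression, the snippet below (assuming NumPy is available) passes a linear score through the sigmoid function and fits the weights by batch gradient descent on the average log-loss. The toy data, the learning rate `lr`, the number of `epochs`, and the function names are hypothetical choices for illustration, not the method used in the video.

```python
import numpy as np


def sigmoid(z):
    # Maps any real-valued score to (0, 1), read as P(y = 1 | x).
    return 1.0 / (1.0 + np.exp(-z))


def fit_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b by batch gradient descent on the mean log-loss."""
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)   # predicted probabilities
        error = p - y            # derivative of the log-loss w.r.t. the linear score
        w -= lr * (X.T @ error) / n_samples
        b -= lr * error.mean()
    return w, b


def predict(X, w, b, threshold=0.5):
    # Class 1 when the predicted probability reaches the threshold, else class 0.
    return (sigmoid(X @ w + b) >= threshold).astype(int)


# Hypothetical toy data: one already-standardized feature, binary labels.
X = np.array([[-3.0], [-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0], [3.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

w, b = fit_logistic_regression(X, y)
print(predict(X, w, b))  # [0 0 0 0 1 1 1 1]
```

In practice the same model is available ready-made as scikit-learn's `sklearn.linear_model.LogisticRegression`, which applies L2 regularization by default.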


Saturday, 15 November 2025

Linear Regression Explained

Below are the topics discussed in the video:

  • Introduction
  • Key Concepts
  • Types of Linear Regression
  • How Linear Regression Works
  • Assumptions
  • Code Example - Simple Linear Regression
  • Visualization of Results
  • Advantages and Limitations
  • Applications
  • Conclusion
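As a minimal sketch of how simple linear regression works, the snippet below fits y = slope * x + intercept with the closed-form ordinary least squares formulas and scores the fit with R². It uses plain Python; the data and function names are hypothetical, chosen so the result is easy to check by hand.

```python
def fit_simple_linear_regression(x, y):
    """Return (slope, intercept) minimising the sum of squared residuals."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel.
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept


def r_squared(x, y, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_mean = sum(y) / len(y)
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_mean) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Hypothetical data lying exactly on y = 2x + 1.
x = [1, 2, 3, 4, 5]
y = [3, 5, 7, 9, 11]

slope, intercept = fit_simple_linear_regression(x, y)
print(slope, intercept)                   # 2.0 1.0
print(slope * 6 + intercept)              # 13.0 (prediction for x = 6)
print(r_squared(x, y, slope, intercept))  # 1.0
```

With noisy real-world data R² falls below 1; scikit-learn's `LinearRegression` performs the same least squares fit and also handles multiple features.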


Wednesday, 25 December 2024

Python for Data Science


Introduction

Why Python for Data Science?
  • Versatility and ease of use

  • Extensive libraries and tools

  • Community support

Python's Role in Data Science:

  • Data collection, cleaning, visualization, and machine learning


Key Features of Python
  • Open-source and cross-platform

  • Supports procedural, object-oriented, and functional programming

  • Rich ecosystem of libraries:

    • NumPy for numerical computing

    • Pandas for data manipulation

    • Matplotlib and Seaborn for visualization

Python Workflow for Data Science

Data Collection:
  • Sources: CSV, APIs, Databases

  • Tools: requests, pandas

Data Cleaning:

  • Handling missing data, duplicates

  • Libraries: pandas

Data Visualization:

  • Charts and plots using matplotlib and seaborn

Machine Learning:

  • Build predictive models using scikit-learn
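The collection and cleaning steps above can be sketched with pandas as follows. An in-memory `io.StringIO` buffer stands in for a real CSV file, API response, or database query, and the city/temperature data is hypothetical; filling missing values with the column mean is one common choice, not the only one.

```python
import io

import pandas as pd

# Hypothetical raw data standing in for a CSV file or API download.
raw_csv = io.StringIO(
    "city,temperature\n"
    "Pune,31\n"
    "Delhi,\n"
    "Pune,31\n"
    "Mumbai,29\n"
)

# Data collection: load the CSV into a DataFrame (the empty field becomes NaN).
df = pd.read_csv(raw_csv)

# Data cleaning: drop exact duplicate rows, then fill missing temperatures
# with the mean of the observed values.
df = df.drop_duplicates().reset_index(drop=True)
df = df.assign(temperature=df["temperature"].fillna(df["temperature"].mean()))

print(df)
```

From here, the cleaned DataFrame can be plotted with matplotlib or seaborn, or passed to scikit-learn for modelling.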


Monday, 25 March 2024

Small Language Models (SLM)

Small Language Models (SLMs) are compact versions of Large Language Models (LLMs). SLMs have fewer parameters than LLMs and are designed to perform similar tasks, such as sentiment analysis and text generation. The smaller parameter count improves computational efficiency, accessibility, and adaptability.

Some examples of SLMs are:

  1. Llama 2 7B - Released by Meta in July 2023, it has 7 billion parameters.
  2. Phi-2 - A 2.7-billion-parameter model developed by Microsoft.
  3. Stable Beluga 7B - Developed by Stability AI, it is an autoregressive language model fine-tuned on Llama 2.
  4. XGen - Developed by Salesforce. It is a smaller-scale model that can be customized for particular domains.
  5. Alibaba's Qwen - Developed by Alibaba Cloud.
  6. Alpaca 7B - Fine-tuned from Meta's LLaMA 7B model on 52K instruction-following demonstrations.
  7. Falcon 7B - A 7-billion-parameter causal decoder-only model developed by TII and trained on 1,500B tokens.
  8. MPT - Developed by MosaicML Foundation.
  9. Zephyr - Zephyr 7B is a model created by the Hugging Face H4 (Helpful, Honest, Harmless, Huggy) team with the objective of creating a smaller language model that is aligned with user intent and outperforms even bigger models.
  10. MobileBERT
  11. GPT-Neo and GPT-J
  12. T5-Small - Text-To-Text Transfer Transformer (T5) is a pre-trained encoder-decoder model that handles all NLP tasks in a unified text-to-text format, where the input and output are always text strings. T5-Small is the checkpoint with 60 million parameters.
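One way to see why fewer parameters improve efficiency and accessibility is a back-of-the-envelope estimate of the memory needed just to hold a model's weights: parameters × bytes per parameter. The sketch below assumes 2 bytes per parameter (16-bit floats) and 4 bytes (32-bit floats); it ignores activations, the KV cache, and framework overhead, so real memory use is higher. Parameter counts come from the list above, with Llama 2 70B added as an LLM for comparison.

```python
def weight_memory_gb(num_params, bytes_per_param=2):
    """Approximate memory in GB (1 GB = 1e9 bytes) needed to store the weights only."""
    return num_params * bytes_per_param / 1e9


models = {
    "T5-Small": 60e6,
    "Phi-2": 2.7e9,
    "Llama 2 7B": 7e9,
    "Llama 2 70B (LLM, for comparison)": 70e9,
}

for name, params in models.items():
    fp16 = weight_memory_gb(params, bytes_per_param=2)
    fp32 = weight_memory_gb(params, bytes_per_param=4)
    print(f"{name}: ~{fp16:.1f} GB (16-bit), ~{fp32:.1f} GB (32-bit)")
```

By this estimate, Phi-2's weights need about 5.4 GB at 16-bit precision, small enough for many consumer GPUs, whereas a 70B-parameter model needs about 140 GB.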

 

Thursday, 28 December 2023

Fundamentals and Key Aspects of Building a Streamlit App

Streamlit is an open-source Python library used to create and develop custom web applications for data science and machine learning.

The following steps can be used to build a Streamlit app:

  • Package installation

To build a Streamlit app, first install the streamlit Python package. It can be installed with pip using the following command:

pip install streamlit

  • Sample program - Hello World

Create a Python script app.py and start with a simple "Hello World" Streamlit app:

import streamlit as st

def main():
    st.title("Hello Streamlit!")
    st.write("This is a simple Streamlit app.")

if __name__ == "__main__":
    main()

Run the app with the command streamlit run app.py. The following are the essential Streamlit components:

1) Creating Widgets

Widgets are interactive components allowing user input.

  • Slider

import streamlit as st

value = st.slider('Select a value', min_value=0, max_value=100)

  • Text Input

text = st.text_input('Enter text')

  • Checkbox

option = st.checkbox('Show/hide')

2) Displaying Data

Presenting data in various formats:

  • Display Dataframe

import pandas as pd
import streamlit as st

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})  # replace with your DataFrame
st.dataframe(df)

  • Display Charts

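import matplotlib.pyplot as plt
import streamlit as st

fig, ax = plt.subplots()
ax.plot([1, 3, 2, 4])  # replace with your data
st.pyplot(fig)  # pass the figure explicitly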

3) Layouts and Styling

Providing structure and styling to the app:

  • Columns

col1, col2 = st.columns(2)
with col1:
    st.write("Column 1")
with col2:
    st.write("Column 2")


  • Markdown and HTML

st.markdown('**Bold** text')

st.write("This is a regular text")

st.write("<p style='color:red'>This is HTML</p>", unsafe_allow_html=True)

4) Handling User Inputs and Events

  • Button Click

if st.button('Click me'):

    st.write('Button clicked!')

  • Event Handling

options = ['Option 1', 'Option 2', 'Option 3']  # example choices
selection = st.selectbox('Choose an option', options)
if selection == 'Option 1':
    st.write('Option 1 selected!')

5) File Upload and Download

Managing file uploads and downloads:

  • File Upload

uploaded_file = st.file_uploader("Upload file", type=['csv', 'txt'])

  • File Download

csv_text = "name,score\nA,10\n"  # replace with your data (str or bytes)
st.download_button('Download', data=csv_text, file_name='data.csv', mime='text/csv')

6) Deployment Configuration

Configuration settings for deployment:

  • Page Title

st.set_page_config(page_title='My Streamlit App', layout='wide')

  • Caching

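@st.cache_data  # replaces the deprecated @st.cache
def expensive_computation(n):
    # Perform a costly computation; results are cached per distinct argument
    return sum(i * i for i in range(n))

result = expensive_computation(1_000_000)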

 

Tuesday, 19 September 2023

Named Entity Recognition (NER) in Natural Language Processing

What is Named Entity Recognition (NER)?

Named entity recognition (NER) is a key subtask of natural language processing (NLP). Its objective is to identify named entities in text data and classify them into predefined categories such as person, organization, location, date, and percentage. NER helps with information extraction, text understanding, and document summarization, and NER models enable organizations to extract valuable insights, automate information retrieval, and improve search functionality.

NER Categorization

The following are the primary NER categories:

  • Persons: Names of people
  • Organizations: Companies, government bodies, political groups
  • Locations: Names of places, including cities, countries, monuments, etc.
  • Dates: Specific dates and time expressions such as years, months, and days
  • Numbers: Numerical values such as percentages, currencies, measurements, etc.
  • Miscellaneous: Other named entities such as product names, event titles, skills, etc.

Importance of NER

  • Information Extraction: Extracting names of people, organizations, locations, etc.
  • Question Answering: Chatbots identify entities mentioned in user queries and retrieve relevant information.
  • Document Summarization: Identifying and highlighting key named entities.
  • Sentiment Analysis: Identifying which organizations and products are discussed in customer reviews, so that sentiment can be attributed to them.

Techniques in NER

  • Rule-based NER: These systems rely on predefined patterns, regular expressions, or dictionaries to identify named entities.
  • Statistical NER: These models use ML algorithms such as conditional random fields (CRF) and hidden Markov models (HMM), and require labelled training data for learning.
  • Deep learning-based NER: Models such as recurrent neural networks (RNNs) and transformers have gained popularity for their ability to capture contextual information and achieve state-of-the-art results.
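The rule-based approach can be sketched in a few lines of Python using only the standard `re` module: a small dictionary (gazetteer) of known names plus regular expressions for entities with a predictable form, such as dates and percentages. The gazetteer entries, patterns, labels, and sample sentence are hypothetical; a real system would use far larger resources, or a statistical or deep learning model.

```python
import re

# Hypothetical gazetteer: known entity names mapped to their category.
GAZETTEER = {
    "Microsoft": "ORGANIZATION",
    "Paris": "LOCATION",
    "Ada Lovelace": "PERSON",
}

MONTHS = (
    "January|February|March|April|May|June|July|"
    "August|September|October|November|December"
)

# Regular expressions for entities with a predictable surface form.
PATTERNS = [
    ("DATE", re.compile(rf"\b\d{{1,2}} (?:{MONTHS}) \d{{4}}\b")),
    ("PERCENT", re.compile(r"\b\d+(?:\.\d+)?%")),
]


def rule_based_ner(text):
    """Return (entity_text, label) pairs in order of appearance."""
    found = []
    # Dictionary lookup: exact, whole-word matches of known names.
    for name, label in GAZETTEER.items():
        for match in re.finditer(rf"\b{re.escape(name)}\b", text):
            found.append((match.start(), match.group(), label))
    # Pattern matching: dates and percentages.
    for label, pattern in PATTERNS:
        for match in pattern.finditer(text):
            found.append((match.start(), match.group(), label))
    found.sort()  # order by start position in the text
    return [(entity, label) for _, entity, label in found]


text = "On 19 September 2023, Microsoft opened an office in Paris and its revenue rose 2.5%."
print(rule_based_ner(text))
# [('19 September 2023', 'DATE'), ('Microsoft', 'ORGANIZATION'), ('Paris', 'LOCATION'), ('2.5%', 'PERCENT')]
```

Such rules are precise but brittle: they miss names that are not in the dictionary and cannot resolve ambiguity ("Paris" the city versus a person named Paris), which is why statistical and deep learning models are widely used.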

Challenges in NER

  • Ambiguity: Text data often contains ambiguous references to entities, which makes it difficult to determine the correct category.
  • Named Entity Variability: The same entity can appear in various forms, such as abbreviations, misspellings, and synonyms.
  • Domain Specificity: NER performance varies across domains with unique vocabularies and contexts, so a model trained on one domain may underperform on another.

Applications of NER

  • Healthcare: NER is used to extract medical entities like patient names, diseases, and treatment information from electronic health records.
  • Finance: NER is used to identify entities such as company names, stock symbols, and financial metrics from reports and news articles.
  • Legal: NER assists in recognizing legal entities, case names, and references to legal documents in legal texts.
 

Monday, 18 September 2023

Sentiment Analysis in Natural Language Processing

What is Sentiment Analysis?

Sentiment analysis, also known as opinion mining, is used to extract sentiment or opinions from text data such as feedback, comments, and tweets. The objective of sentiment analysis is to classify the text as positive or negative.

Key steps involved in sentiment analysis:

  • Text Preprocessing
    1. Tokenization - Splitting the text into individual words or tokens.
    2. Lowercasing - Converting the text to lower case so that it is consistent across the whole dataset.
    3. Stop word removal - Removing common words that carry little meaning, such as 'a', 'an', 'the', etc.
    4. Stemming or Lemmatization - Reducing words to their root form, e.g., playing to play.
  • Feature Extraction
    1. Bag of words - Text is represented by the frequency of each word.
    2. Term Frequency-Inverse Document Frequency (TF-IDF) - Weights each word by how often it appears in a document, discounted by how common it is across all documents.
    3. Word Embeddings - Pre-trained models such as Word2Vec and GloVe can be used to create word vectors that capture semantic meaning.
  • Model Selection
    1. Lexicon-based - Uses sentiment lexicons, i.e., lists of words with associated sentiment scores.
    2. Machine learning models - Supervised or unsupervised models such as Naive Bayes, support vector machines, LSTMs, or transformer-based models such as BERT.
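The preprocessing steps and a lexicon-based model can be sketched in plain Python as below. The stop-word list and sentiment lexicon are tiny hypothetical examples, and `crude_stem` is a rough stand-in for a real stemmer (such as Porter's) or a lemmatizer; returning "neutral" when the scores cancel out is likewise an assumption of this sketch.

```python
import re

# Hypothetical, deliberately small stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "was", "this", "it", "and", "but", "i"}

# Hypothetical toy lexicon: word (in stemmed form) -> sentiment score.
LEXICON = {"love": 1, "great": 1, "good": 1, "bad": -1, "terrible": -1, "hate": -1}


def crude_stem(token):
    """Strip one common suffix; a rough stand-in for Porter stemming or lemmatization."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    tokens = re.findall(r"[a-z']+", text.lower())        # tokenization + lowercasing
    tokens = [t for t in tokens if t not in STOP_WORDS]   # stop-word removal
    return [crude_stem(t) for t in tokens]                # stemming


def lexicon_sentiment(text):
    # Sum the lexicon scores of all tokens; unknown words score 0.
    score = sum(LEXICON.get(token, 0) for token in preprocess(text))
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


print(preprocess("The kids were playing games"))  # ['kid', 'were', 'play', 'game']
print(lexicon_sentiment("I love this phone, the camera is great!"))  # positive
print(lexicon_sentiment("The battery life is terrible and the screen is bad."))  # negative
```

A machine learning approach would instead turn the preprocessed tokens into features (bag of words or TF-IDF) and train a classifier such as Naive Bayes on labelled examples.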

Applications of Sentiment Analysis

  • Customer Feedback Analysis - Analyze customer review data
  • Social Media Monitoring - Track and analyze sentiment expressed on social media
  • Market Research - Understand public sentiment towards specific products