Thursday, 28 December 2023

Fundamental and key aspects of building a Streamlit app

Streamlit is an open-source Python library used to create and develop custom web applications for data science and machine learning.

Following are the steps one can follow to build a Streamlit app.

  • Package installation

To build a Streamlit app, one has to install the streamlit Python package. The following pip command can be used to install it:

pip install streamlit

  • Sample program - Hello World

Create a Python script app.py and start with a simple "Hello World" Streamlit app:

import streamlit as st 

def main():

    st.title("Hello Streamlit!")

    st.write("This is a simple Streamlit app.") 

if __name__ == "__main__":

    main()

Following are the essential Python components of a Streamlit app.

1) Creating Widgets

Widgets are interactive components allowing user input.

  • Slider

import streamlit as st

value = st.slider('Select a value', min_value=0, max_value=100)

  • Text Input

text = st.text_input('Enter text')

  • Checkbox

option = st.checkbox('Show/hide')

2) Displaying Data

 Presenting data in various formats 

  • Display Dataframe

import pandas as pd

df = pd.DataFrame(data)  # your DataFrame
st.dataframe(df)

  • Display Charts

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(data)
st.pyplot(fig)

3) Layouts and Styling

Providing structure and styling to the app

  • Columns

col1, col2 = st.columns(2)

with col1:
    st.write("Column 1")

with col2:
    st.write("Column 2")


  • Markdown and HTML

st.markdown('**Bold** text')

st.write("This is a regular text")

st.write("<p style='color:red'>This is HTML</p>", unsafe_allow_html=True)

 4) Handling User Inputs and Events

  • Button Click

if st.button('Click me'):

    st.write('Button clicked!')

  • Event Handling

options = ['Option 1', 'Option 2', 'Option 3']
selection = st.selectbox('Choose an option', options)

if selection == 'Option 1':

    st.write('Option 1 selected!')

 5) File Upload and Download

 Managing file uploads/downloads

  • File Upload

uploaded_file = st.file_uploader("Upload file", type=['csv', 'txt'])

  • File Download

download_button = st.download_button('Download', data=your_data, file_name='data.csv')

 6) Deployment Configuration

 Configuration settings for deployment

  • Page Title

st.set_page_config(page_title='My Streamlit App', layout='wide')

  • Caching

@st.cache_data  # use @st.cache on older Streamlit versions
def expensive_computation(input_value):
    # Perform the costly computation only once per unique input
    return result

result = expensive_computation(input_value)

 

Tuesday, 19 September 2023

Named Entity Recognition (NER) in Natural Language Processing

What is Named Entity Recognition (NER)?

Named entity recognition (NER) is a vital sub-task of natural language processing (NLP). The objective of NER is to identify named entities in text data and classify them into predefined categories such as person, organization, location, date, percentage, etc. NER helps with information extraction, text understanding, and document summarization, and NER models empower organizations to extract valuable insights, automate information retrieval, improve search functionality, etc.

NER Categorization

Following are the primary categories of NER.

  • Persons: names of people
  • Organizations: companies, government bodies, political groups
  • Locations: names of places, including cities, countries, monuments, etc.
  • Dates: specific dates and date parts such as years and months
  • Numbers: numerical values such as percentages, currencies, measurements, etc.
  • Miscellaneous: other named entities such as product names, event titles, skills, etc.
Importance of NER

  • Information Extraction: Extracting names of people, organizations, locations, etc.
  • Question Answering: Chatbots identify entities mentioned in user queries and retrieve relevant information.
  • Document Summarization: Helpful in identifying and highlighting key named entities
  • Sentiment Analysis: Understanding which organizations and products are being discussed in customer reviews before the sentiment is scored
Techniques in NER
  • Rule-based NER: These systems rely on predefined patterns, regular expressions, or dictionaries to identify named entities.
  • Statistical NER: These models use ML algorithms such as conditional random fields (CRF) and hidden Markov models (HMM). They require labelled training data for learning.
  • Deep learning-based NER: RNN- and transformer-based models have gained popularity for their ability to capture contextual information and achieve state-of-the-art results (a spaCy sketch follows this list).
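
As a concrete illustration of an off-the-shelf statistical/deep-learning NER model, the following is a minimal sketch using the spaCy library (an assumption, not mentioned above; it requires installing spaCy and downloading the small English model with "pip install spacy" and "python -m spacy download en_core_web_sm").

import spacy

# Load spaCy's small English pipeline, which includes a pretrained NER component.
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple opened a new office in Mumbai in January 2023.")

# Print each recognized entity with its predicted category.
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple -> ORG, Mumbai -> GPE, January 2023 -> DATE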
Challenges in NER
  • Ambiguity: Text data often contains ambiguous references to entities, which makes it difficult to define the correct category.
  • Named Entity Variability: Various forms of entities can exist, such as abbreviations, misspellings, synonyms, etc.
  • Domain Specificity: NER models perform differently based on domains with unique vocabularies and contexts.
Applications of NER
  • Healthcare: NER is used to extract medical entities like patient names, diseases, and treatment information from electronic health records.
  • Finance: NER is used to identify entities such as company names, stock symbols, and financial metrics from reports and news articles.
  • Legal: NER assists in recognizing legal entities, case names, and references to legal documents in legal texts.
 

Monday, 18 September 2023

Sentiment Analysis in Natural Language Processing

What is Sentiment Analysis ?

Sentiment analysis, also known as opinion mining, is used to extract sentiment or opinions from text data such as feedback, comments, tweets, etc. The objective of sentiment analysis is to classify the text as positive or negative.

Key steps involved in sentiment analysis

  • Text Preprocessing
  1. Tokenization - Splitting the text data into individual words or tokens
  2. Lowercasing - Transforming the text to lower case so that there is consistency across the entire dataset
  3. Stop word removal - Removing very common words that add little meaning, such as 'a', 'an', 'the', etc.
  4. Stemming or Lemmatization - Reducing words to their root form, such as 'playing' to 'play'
  • Feature Extraction
  1. Bag of words - Text data is represented as the frequency of each word
  2. Term Frequency-Inverse Document Frequency (TF-IDF) - Weights words by their importance within a document relative to the overall corpus
  3. Word Embeddings - Pre-trained models such as Word2Vec or GloVe can be used to create word vectors that capture semantic meaning
  • Model Selection
  1. Lexicon based - Uses sentiment lexicons that map words to sentiment scores
  2. Machine learning models - Supervised or unsupervised models such as Naive Bayes, support vector machines, LSTMs, or transformer-based models such as BERT (a minimal sketch combining these steps follows this list)
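
Putting these steps together, the following is a minimal sketch using scikit-learn (an assumption; the reviews and labels below are made-up toy examples): TF-IDF features from the text feed a Naive Bayes classifier.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled data (hypothetical reviews).
reviews = ["great product, loved it",
           "terrible quality, very disappointed",
           "works as expected, quite happy",
           "worst purchase ever"]
labels = ["positive", "negative", "positive", "negative"]

# Lowercasing and stop-word removal are handled by TfidfVectorizer.
model = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"),
                      MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["really happy with this purchase"]))  # likely ['positive'] on this toy data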
Applications of Sentiment Analysis

  • Customer feedback analysis - Analyze customer reviews data
  • Social Media Monitoring - Track and analyze sentiment expressed on social media
  •  Market Research - Understand public sentiment towards specific products

Topic Modelling in Natural Language Processing

What is topic modelling?

Topic modelling is a machine learning technique whose objective is to find underlying patterns and hidden topics in unstructured text data. Due to the exponential growth of data, it has become significantly more important to identify meaningful insights from data and use them to understand the business.

What are topics in the context of topic modelling?

Topics are underlying patterns or themes that represent a group of words that frequently occur together in the document. For example, in a news article, there can be an underlying topic such as entertainment, politics, foreign relations, etc., or a combination of these topics.

Steps in Topic Modelling

  •  Data Preparation 

This step involves cleaning and preprocessing text data. The tasks associated with preprocessing include tokenization, removing punctuation, removing stop words, stemming, and lemmatization.
  •  Building a Document Term Matrix (DTM)
A DTM is the format used to feed documents into the model. Rows represent documents, columns represent words, and each cell holds the frequency of the word in that document if the term-frequency method is used; otherwise other values are used depending on the algorithm applied.
  • Selecting Topic Modelling Algorithm
Different topic modelling algorithms can be used, such as NMF (Non-Negative Matrix Factorization) or LDA (Latent Dirichlet Allocation). LDA is more common compared to NMF.
  •  Model Training
The model is trained after a train/test split. The DTM is used as input to discover the topics within the corpus, and hyperparameter tuning is done to identify a suitable number of topics.
 
  • Topic interpretation and evaluation
Understand the words associated with each topic to interpret its theme (see the scikit-learn sketch after this list).
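
As a minimal sketch of these steps with scikit-learn (an assumption; the documents below are toy examples): build the document-term matrix with CountVectorizer, fit LDA, and inspect the top words of each topic.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the election results and the new government policy",
        "the new movie and its soundtrack were great",
        "parliament debated the policy bill this week",
        "the actor won an award for the film"]

# Document-term matrix: rows are documents, columns are words.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit LDA with a chosen number of topics (a hyperparameter).
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(dtm)

# Interpret each topic by its most probable words.
words = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-5:]]
    print(f"Topic {idx}: {top_words}")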

 Applications of Topic Modelling

  • Content recommendations: recommending content on the same topic to users based on their history
  • Customer Insights: Identify the topics from reviews shared by customers.
  • Text Categorization: Categorization of text data based on topics interpreted from topic modelling

 

Friday, 16 June 2023

What is Generative AI ?

Generative AI is a branch of artificial intelligence that is used to generate new text, videos, audio, images, code, or other synthetic data. The technology was initially introduced to automate repetitive tasks in digital audio and image correction.

How does Generative AI work?

A generative AI algorithm is trained using a neural network, through which it identifies patterns and structures in existing data. The objective is to use those learned patterns to create new data.

Currently, the two most widely used types of Gen AI models are:

  • Generative Adversarial Networks (GANs): GANs are a class of machine learning models used to generate new data samples that resemble a given training dataset. They were first introduced by Ian Goodfellow and his colleagues in 2014. A GAN consists of two major components: a generator and a discriminator. The generator produces synthetic data such as text, video, or images, while the discriminator's task is to distinguish real samples from fake ones (a minimal sketch follows this list).
  • Transformer-Based Models: These models follow an encoder-decoder architecture and build on the attention mechanism introduced in "Attention Is All You Need," a paper published by Google researchers in 2017.
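
As an illustration of the generator/discriminator idea, here is a minimal, untrained sketch assuming PyTorch (the layer sizes and the flat 784-dimensional "image" are arbitrary choices for the example). In practice the two networks are trained against each other: the generator tries to fool the discriminator, and the discriminator tries to tell real samples from generated ones.

import torch
import torch.nn as nn

# Generator: maps a random noise vector to a synthetic flat "image" vector.
generator = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Tanh(),
)

# Discriminator: scores a sample as real (close to 1) or fake (close to 0).
discriminator = nn.Sequential(
    nn.Linear(784, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

noise = torch.randn(16, 64)           # a batch of random noise vectors
fake_samples = generator(noise)       # generator produces synthetic samples
scores = discriminator(fake_samples)  # discriminator rates them as real/fake
print(scores.shape)                   # torch.Size([16, 1])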
Applications of Generative AI
  • Text Generation: Gen AI platforms such as ChatGPT are used to generate text for articles, blogs, and content creation for marketing.
  • Code Generation: Code completion, code generation, test case generation, bug fixing, model integration
  • Visual Content: Image Generation and Enhancement, Video Creation, 3D Shape Generation
  • Audio Generation: Creating Music, Text-to-Speech Generators, and Speech-to-Text Converters



Sunday, 23 April 2023

Reinforcement Learning : Agent - Environment Interaction

In the last RL article, we learned about different terms associated with RL, such as action, environment, state, etc.

Today we will learn how an agent observes the environment and takes an action, and how the environment responds to that action with a positive or negative reward.

Supervised vs Unsupervised vs Reinforcement learning 

In supervised learning, labelled data is available and a model is trained on it, while in unsupervised learning there is no labelled data and clusters or segments are discovered by the model.

In reinforcement learning, there is no labelled data, and the model learns things based on its own experiences and actions. The objective in RL is to maximize the cumulative rewards based on the sequence of actions.

There are two types of tasks

  • Continuous: Tasks that do not have a definite end (e.g., learning to walk, driving a car)
  • Episodic: Tasks that have a definite end (e.g., games such as chess or Ludo), where the final outcome is a win or a loss (a short interaction-loop sketch follows this list)
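
A minimal sketch of the agent-environment loop, assuming the Gymnasium library and its CartPole environment (an episodic task; the "agent" here simply samples random actions):

import gymnasium as gym

env = gym.make("CartPole-v1")   # episodic task: the episode ends when the pole falls
state, info = env.reset()

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                              # agent chooses an action
    state, reward, terminated, truncated, info = env.step(action)   # environment returns new state and reward
    total_reward += reward                                          # accumulate the reward signal
    done = terminated or truncated

print("Episode finished, cumulative reward:", total_reward)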

Monday, 27 February 2023

Bias Variance Tradeoff

Before we get into bias and variance, it's important to understand the different types of error in machine learning.

  • Reducible error 
These errors can be reduced to improve the model's accuracy. This error can be classified as follows:
    • Bias
    • Variance
  • Irreducible error 
The errors that cannot be reduced and will always be present in models
 
Bias and variance explained
 
Bias is defined as the difference between the actual and predicted values; a model with high bias is oversimplified and less complex, and it performs poorly on both the training and test data.
 
Variance is defined as the amount of variation in predictions when different training data is used; ideally, predictions should not change much when the training data changes. High variance implies that the model has fit the training data too closely, capturing noise rather than the true input-output relationship, so it fails to generalize to unseen data (see the sketch below).
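
A small sketch of the tradeoff using scikit-learn (an assumption; the data is a noisy sine curve generated for the example): a degree-1 polynomial underfits (high bias, poor on both sets), while a degree-15 polynomial overfits (high variance, a large gap between training and test error).

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic noisy sine data.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # High bias: both errors high. High variance: train error low, test error much higher.
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")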


Tuesday, 14 February 2023

Best Practices: How to Write Clean Python Code

Following are things one should consider while writing Python and machine learning code.

1) Meaningful names

Give descriptive names to variables, functions, data frames, etc. so that their purpose is easy to interpret.

2) Consistent naming conventions

Always follow the same naming convention throughout the codebase.

3) Proper documentation and comments

Add comments and docstrings that describe the task performed by the code.

4) Avoid redundant text: Avoid redundant text in code and descriptions; names must be easily interpretable.

5) Avoid duplication: Avoid duplicating code by reusing one function for a given task instead of writing several variants; the same applies to other repeated logic (see the sketch below).
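
A small illustrative sketch applying these points (the function name and numbers are made up): a descriptive snake_case name, a docstring describing the task, and a single reusable function instead of duplicated logic.

def calculate_average_order_value(order_amounts):
    """Return the average order value for a list of order amounts."""
    if not order_amounts:                 # guard against an empty list
        return 0.0
    return sum(order_amounts) / len(order_amounts)

# Reuse the same function for each region instead of duplicating the calculation.
north_region_avg = calculate_average_order_value([250.0, 310.5, 180.0])
south_region_avg = calculate_average_order_value([410.0, 295.5])
print(north_region_avg, south_region_avg)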

Monday, 6 February 2023

What is Natural Language Processing ?

Natural Language Processing (NLP) is a field of study that focuses on the interactions between human natural language and computers. The goal of NLP is to enable computers to understand, interpret, and generate human language. It is an interdisciplinary field that encompasses computer science, artificial intelligence, linguistics, and cognitive psychology. NLP techniques use a combination of machine learning, deep learning, and computational linguistic methods to process and analyze natural language text or speech.

These applications allow computers to understand and interpret human language, making it possible for machines to interact with humans in a more natural way.

Applications of NLP 

  1. Text to speech: Transforming text data into speech. Example - Reading an article aloud by generating speech from its text.
  2. Speech to text: Transforming speech into text. Example - Subtitles in movies.
  3. Language translation: Translating text from one language to another. Example - Google Translate.
  4. Sentiment analysis: Analyzing text to identify its emotional tone. Example - Analyzing product reviews and classifying the sentiment as positive, negative, or neutral.
  5. Text summarization: Condensing a large text into a short summary. Example - A book summary in a few sentences.
  6. Named Entity Recognition: Identifying key information in text and classifying it into predefined categories. Example - "Shahrukh Khan belongs to India" (Person - Shahrukh Khan, Location - India).
  7. Question Answering: Providing answers to questions. Example - Q: What is the capital of India? A: Delhi.
  8. Dialogue Systems: Systems that interact with humans in natural language. Example - ChatGPT.
  9. Language Generation: Natural Language Generation (NLG) using machine learning and artificial intelligence. Example - Product descriptions.
  10. Language Understanding: Natural Language Understanding (NLU) is used to understand a user's input, either text or speech, by identifying its intent. Example - "Need to travel to USA": NLU understands the intent (travel) and the location (USA).



Thursday, 19 January 2023

What is Reinforcement Learning ?

Reinforcement learning is a machine learning method in which the machine learns by experimenting and receiving positive or negative rewards. If an experiment is performed and the results are fruitful, it is a positive reward; if the results are not as expected, it is a negative reward.

Let's understand this with an example: When humans learn to ride a bicycle, they pedal the bicycle and it moves, which gives a positive experience, but when balance is not maintained, they fall, which is a negative experience. Hence,  learning from experiences is known as "reinforcement learning."

Terms associated with reinforcement learning

  • Agent: The entity that explores the environment and then acts on it.
  • Environment: The surroundings in which the agent operates, which are generally stochastic.
  • Action: A move taken by the agent within the environment.
  • State: The situation returned by the environment after the agent takes an action.
  • Reward: The feedback received from the environment after the agent has taken an action.
  • Policy: The strategy the agent applies to choose the next action based on the current state.
  • Value: The expected long-term return for the agent, taking the discount factor into account.
  • Q-Value: Similar to value, but it also takes the current action into account (see the sketch below).
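
To make the value and Q-value terms concrete, here is a minimal sketch of the standard tabular Q-learning update (an illustration; the states, actions, and numbers are made up):

from collections import defaultdict

alpha, gamma = 0.1, 0.9                      # learning rate and discount factor
Q = defaultdict(lambda: defaultdict(float))  # Q[state][action] -> expected long-term return

def update_q(state, action, reward, next_state, actions):
    # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(Q[next_state][a] for a in actions)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# One hypothetical transition: taking "right" in state "s0" yields reward 1.0 and lands in "s1".
update_q("s0", "right", 1.0, "s1", actions=["left", "right"])
print(Q["s0"]["right"])   # 0.1 after the first update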