Thursday, 28 December 2023

Fundamental and key aspects of building a Streamlit app

Streamlit is an open-source Python library used to create and develop custom web applications for data science and machine learning.

Following are the steps one can follow to build a Streamlit app.

  • Package installation

To build a Streamlit app, one has to install the streamlit Python package. The following pip command can be used to install it:

pip install streamlit

  • Sample program - Hello World

Create a Python script app.py and start with a simple "Hello World" Streamlit app:

import streamlit as st 

def main():

    st.title("Hello Streamlit!")

    st.write("This is a simple Streamlit app.") 

if __name__ == "__main__":

    main()

Following are the essential Python components of a Streamlit app.

1) Creating Widgets

Widgets are interactive components allowing user input.

  • Slider

import streamlit as st

value = st.slider('Select a value', min_value=0, max_value=100)

  • Text Input

text = st.text_input('Enter text')

  • Checkbox

option = st.checkbox('Show/hide')

2) Displaying Data

 Presenting data in various formats 

  • Display Dataframe

import pandas as pd

df = pd.DataFrame(data)  # your DataFrame
st.dataframe(df)

  • Display Charts

import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot(data)
st.pyplot(fig)

3) Layouts and Styling

Providing structure and styling to the app

  • Columns

col1, col2 = st.columns(2)

with col1:
    st.write("Column 1")

with col2:
    st.write("Column 2")


  • Markdown and HTML

st.markdown('**Bold** text')

st.write("This is a regular text")

st.write("<p style='color:red'>This is HTML</p>", unsafe_allow_html=True)

 4) Handling User Inputs and Events

  • Button Click

if st.button('Click me'):

    st.write('Button clicked!')

  • Event Handling

options = ['Option 1', 'Option 2', 'Option 3']
selection = st.selectbox('Choose an option', options)

if selection == 'Option 1':

    st.write('Option 1 selected!')

 5) File Upload and Download

 Managing file uploads/downloads

  • File Upload

uploaded_file = st.file_uploader("Upload file", type=['csv', 'txt'])

  • File Download

download_button = st.download_button('Download', data=your_data, file_name='data.csv')

 6) Deployment Configuration

 Configuration settings for deployment

  • Page Title

st.set_page_config(page_title='My Streamlit App', layout='wide')

  • Caching

@st.cache_data  # use @st.cache on older Streamlit versions
def expensive_computation(input_value):
    # Perform the costly computation only once per unique input
    return result

result = expensive_computation(input_value)

 

Tuesday, 19 September 2023

Named Entity Recognition (NER) in Natural Language Processing

What is Named Entity Recognition (NER)?

Named entity recognition (NER) is a vital sub-task of natural language processing (NLP). The objective of NER is to identify named entities in text data and classify them into predefined categories such as person, organization, location, date, percentage, etc. NER helps with information extraction, text understanding, and document summarization, and NER models empower organizations to extract valuable insights, automate information retrieval, improve search functionality, etc.

NER Categorization

Following are the primary categories of NER.

  • Persons: names of people
  • Organizations: companies, government bodies, political groups
  • Locations: names of places, including cities, countries, monuments, etc.
  • Dates: specific dates and date parts such as years and months
  • Numbers: numerical values such as percentages, currencies, measurements, etc.
  • Miscellaneous: other named entities such as product names, event titles, skills, etc.
Importance of NER

  • Information Extraction: Extracting names of people, organizations, locations, etc.
  • Question Answering: Chatbots identify entities mentioned in user queries and retrieve relevant information.
  • Document Summarization: Helpful in identifying and highlighting key named entities
  • Sentiment Analysis: Understanding which organizations and products are being discussed in customer reviews before the sentiment is scored
Techniques in NER
  • Rule-based NER: These systems rely on predefined patterns, regular expressions, or dictionaries to identify named entities.
  • Statistical NER: These models use ML algorithms such as conditional random fields (CRF) and hidden Markov models (HMM). They require labelled training data for learning.
  • Deep learning-based NER: RNN- and transformer-based models have gained popularity for their ability to capture contextual information and achieve state-of-the-art results (a spaCy sketch follows this list).
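
As a concrete illustration of an off-the-shelf statistical/deep-learning NER model, the following is a minimal sketch using the spaCy library (an assumption, not mentioned above; it requires installing spaCy and downloading the small English model with "pip install spacy" and "python -m spacy download en_core_web_sm").

import spacy

# Load spaCy's small English pipeline, which includes a pretrained NER component.
nlp = spacy.load("en_core_web_sm")

doc = nlp("Apple opened a new office in Mumbai in January 2023.")

# Print each recognized entity with its predicted category.
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Apple -> ORG, Mumbai -> GPE, January 2023 -> DATE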
Challenges in NER
  • Ambiguity: Text data often contains ambiguous references to entities, which makes it difficult to define the correct category.
  • Named Entity Variability: Various forms of entities can exist, such as abbreviations, misspellings, synonyms, etc.
  • Domain Specificity: NER models perform differently based on domains with unique vocabularies and contexts.
Applications of NER
  • Healthcare: NER is used to extract medical entities like patient names, diseases, and treatment information from electronic health records.
  • Finance: NER is used to identify entities such as company names, stock symbols, and financial metrics from reports and news articles.
  • Legal: NER assists in recognizing legal entities, case names, and references to legal documents in legal texts.
 

Monday, 18 September 2023

Sentiment Analysis in Natural Language Processing

What is Sentiment Analysis ?

Sentiment analysis, also known as opinion mining, is used to extract sentiment or opinions from text data such as feedback, comments, tweets, etc. The objective of sentiment analysis is to classify the text as positive or negative.

Key steps involved in sentiment analysis

  • Text Preprocessing
  1. Tokenization - Splitting the text data into individual words or tokens
  2. Lowercasing - Transforming the text to lower case so that there is consistency across the entire dataset
  3. Stop word removal - Removing very common words that add little meaning, such as 'a', 'an', 'the', etc.
  4. Stemming or Lemmatization - Reducing words to their root form, such as 'playing' to 'play'
  • Feature Extraction
  1. Bag of words - Text data is represented as the frequency of each word
  2. Term Frequency-Inverse Document Frequency (TF-IDF) - Weights words by their importance within a document relative to the overall corpus
  3. Word Embeddings - Pre-trained models such as Word2Vec or GloVe can be used to create word vectors that capture semantic meaning
  • Model Selection
  1. Lexicon based - Uses sentiment lexicons that map words to sentiment scores
  2. Machine learning models - Supervised or unsupervised models such as Naive Bayes, support vector machines, LSTMs, or transformer-based models such as BERT (a minimal sketch combining these steps follows this list)
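
Putting these steps together, the following is a minimal sketch using scikit-learn (an assumption; the reviews and labels below are made-up toy examples): TF-IDF features from the text feed a Naive Bayes classifier.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy labelled data (hypothetical reviews).
reviews = ["great product, loved it",
           "terrible quality, very disappointed",
           "works as expected, quite happy",
           "worst purchase ever"]
labels = ["positive", "negative", "positive", "negative"]

# Lowercasing and stop-word removal are handled by TfidfVectorizer.
model = make_pipeline(TfidfVectorizer(lowercase=True, stop_words="english"),
                      MultinomialNB())
model.fit(reviews, labels)

print(model.predict(["really happy with this purchase"]))  # likely ['positive'] on this toy data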
Applications of Sentiment Analysis

  • Customer feedback analysis - Analyze customer reviews data
  • Social Media Monitoring - Track and analyze sentiment expressed on social media
  •  Market Research - Understand public sentiment towards specific products

Topic Modelling in Natural Language Processing

What is topic modelling?

Topic modelling is a machine learning technique whose objective is to find underlying patterns and hidden topics in unstructured text data. Due to the exponential growth of data, it has become significantly more important to identify meaningful insights from data and use them to understand the business.

What are topics in the context of topic modelling?

Topics are underlying patterns or themes that represent a group of words that frequently occur together in the document. For example, in a news article, there can be an underlying topic such as entertainment, politics, foreign relations, etc., or a combination of these topics.

Steps in Topic Modelling

  •  Data Preparation 

This step involves cleaning and preprocessing text data. The tasks associated with preprocessing include tokenization, removing punctuation, removing stop words, stemming, and lemmatization.
  •  Building a Document Term Matrix (DTM)
A DTM is the format used to feed documents into the model. Rows represent documents, columns represent words, and each cell holds the frequency of the word in that document if the term-frequency method is used; otherwise other values are used depending on the algorithm applied.
  • Selecting Topic Modelling Algorithm
Different topic modelling algorithms can be used, such as NMF (Non-Negative Matrix Factorization) or LDA (Latent Dirichlet Allocation). LDA is more common compared to NMF.
  •  Model Training
The model is trained after a train/test split. The DTM is used as input to discover the topics within the corpus, and hyperparameter tuning is done to identify a suitable number of topics.
 
  • Topic interpretation and evaluation
Understand the words associated with each topic to interpret its theme (see the scikit-learn sketch after this list).
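
As a minimal sketch of these steps with scikit-learn (an assumption; the documents below are toy examples): build the document-term matrix with CountVectorizer, fit LDA, and inspect the top words of each topic.

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = ["the election results and the new government policy",
        "the new movie and its soundtrack were great",
        "parliament debated the policy bill this week",
        "the actor won an award for the film"]

# Document-term matrix: rows are documents, columns are words.
vectorizer = CountVectorizer(stop_words="english")
dtm = vectorizer.fit_transform(docs)

# Fit LDA with a chosen number of topics (a hyperparameter).
lda = LatentDirichletAllocation(n_components=2, random_state=42)
lda.fit(dtm)

# Interpret each topic by its most probable words.
words = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [words[i] for i in topic.argsort()[-5:]]
    print(f"Topic {idx}: {top_words}")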

 Applications of Topic Modelling

  • Content recommendations: recommending content on the same topic to users based on their history
  • Customer Insights: Identify the topics from reviews shared by customers.
  • Text Categorization: Categorization of text data based on topics interpreted from topic modelling

 

Friday, 16 June 2023

What is Generative AI ?

Generative AI is a branch of artificial intelligence that is used to generate new text, videos, audio, images, code, or other synthetic data. The technology was initially introduced to automate repetitive tasks in digital audio and image correction.

How does Generative AI work?

A generative AI algorithm is trained using a neural network, through which it identifies patterns and structures in existing data. The objective is to use those learned patterns to create new data.

Currently, the two most widely used types of Gen AI models are:

  • Generative Adversarial Networks (GANs): GANs are a class of machine learning models used to generate new data samples that resemble a given training dataset. They were first introduced by Ian Goodfellow and his colleagues in 2014. A GAN consists of two major components: a generator and a discriminator. The generator produces synthetic data such as text, video, or images, while the discriminator's task is to distinguish real samples from fake ones (a minimal sketch follows this list).
  • Transformer-Based Models: These models follow an encoder-decoder architecture and build on the attention mechanism introduced in "Attention Is All You Need," a paper published by Google researchers in 2017.
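
As an illustration of the generator/discriminator idea, here is a minimal, untrained sketch assuming PyTorch (the layer sizes and the flat 784-dimensional "image" are arbitrary choices for the example). In practice the two networks are trained against each other: the generator tries to fool the discriminator, and the discriminator tries to tell real samples from generated ones.

import torch
import torch.nn as nn

# Generator: maps a random noise vector to a synthetic flat "image" vector.
generator = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, 784), nn.Tanh(),
)

# Discriminator: scores a sample as real (close to 1) or fake (close to 0).
discriminator = nn.Sequential(
    nn.Linear(784, 128), nn.LeakyReLU(0.2),
    nn.Linear(128, 1), nn.Sigmoid(),
)

noise = torch.randn(16, 64)           # a batch of random noise vectors
fake_samples = generator(noise)       # generator produces synthetic samples
scores = discriminator(fake_samples)  # discriminator rates them as real/fake
print(scores.shape)                   # torch.Size([16, 1])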
Applications of Generative AI
  • Text Generation: Gen AI platforms such as ChatGPT are used to generate text for articles, blogs, and content creation for marketing.
  • Code Generation: Code completion, code generation, test case generation, bug fixing, model integration
  • Visual Content: Image Generation and Enhancement, Video Creation, 3D Shape Generation
  • Audio Generation: Creating Music, Text-to-Speech Generators, and Speech-to-Text Converters



Sunday, 23 April 2023

Reinforcement Learning : Agent - Environment Interaction

In the last RL article, we learned about different terms associated with RL, such as action, environment, state, etc.

Today we will learn how an agent observes the environment and takes an action, and how the environment responds to that action with a positive or negative reward.

Supervised vs Unsupervised vs Reinforcement learning 

In supervised learning, labelled data is available and a model is trained on it, while in unsupervised learning there is no labelled data and clusters or segments are discovered by the model.

In reinforcement learning, there is no labelled data, and the model learns things based on its own experiences and actions. The objective in RL is to maximize the cumulative rewards based on the sequence of actions.

There are two types of tasks

  • Continuous: Tasks that do not have a definite end (e.g., learning to walk, driving a car)
  • Episodic: Tasks that have a definite end (e.g., games such as chess or Ludo), where the final outcome is a win or a loss (a short interaction-loop sketch follows this list)
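
A minimal sketch of the agent-environment loop, assuming the Gymnasium library and its CartPole environment (an episodic task; the "agent" here simply samples random actions):

import gymnasium as gym

env = gym.make("CartPole-v1")   # episodic task: the episode ends when the pole falls
state, info = env.reset()

total_reward = 0.0
done = False
while not done:
    action = env.action_space.sample()                              # agent chooses an action
    state, reward, terminated, truncated, info = env.step(action)   # environment returns new state and reward
    total_reward += reward                                          # accumulate the reward signal
    done = terminated or truncated

print("Episode finished, cumulative reward:", total_reward)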

Monday, 27 February 2023

Bias Variance Tradeoff

Before we get into bias and variance, it's important to understand the different types of error in machine learning.

  • Reducible error 
These errors can be reduced to improve the model's accuracy. This error can be classified as follows:
    • Bias
    • Variance
  • Irreducible error 
The errors that cannot be reduced and will always be present in models
 
Bias and variance explained
 
Bias is defined as the difference between the actual and predicted values; a model with high bias is oversimplified and less complex, and it performs poorly on both the training and test data.
 
Variance is defined as the amount of variation in predictions when different training data is used; ideally, predictions should not change much when the training data changes. High variance implies that the model has fit the training data too closely, capturing noise rather than the true input-output relationship, so it fails to generalize to unseen data (see the sketch below).
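
A small sketch of the tradeoff using scikit-learn (an assumption; the data is a noisy sine curve generated for the example): a degree-1 polynomial underfits (high bias, poor on both sets), while a degree-15 polynomial overfits (high variance, a large gap between training and test error).

import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Synthetic noisy sine data.
rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, 80)).reshape(-1, 1)
y = np.sin(X).ravel() + rng.normal(scale=0.3, size=80)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for degree in (1, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    test_err = mean_squared_error(y_test, model.predict(X_test))
    # High bias: both errors high. High variance: train error low, test error much higher.
    print(f"degree={degree}: train MSE={train_err:.3f}, test MSE={test_err:.3f}")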


Tuesday, 14 February 2023

Best Practices: How to Write Clean Python Code

Following are things one should consider while writing Python and machine learning code.

1) Meaningful names

Give descriptive names to variables, functions, data frames, etc. so that their purpose is easy to interpret.

2) Consistent naming conventions

Always follow the same naming convention throughout the codebase.

3) Proper documentation and comments

Add comments and docstrings that describe the task performed by the code.

4) Avoid redundant text: Avoid redundant text in code and descriptions; names must be easily interpretable.

5) Avoid duplication: Avoid duplicating code by reusing one function for a given task instead of writing several variants; the same applies to other repeated logic (see the sketch below).
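
A small illustrative sketch applying these points (the function name and numbers are made up): a descriptive snake_case name, a docstring describing the task, and a single reusable function instead of duplicated logic.

def calculate_average_order_value(order_amounts):
    """Return the average order value for a list of order amounts."""
    if not order_amounts:                 # guard against an empty list
        return 0.0
    return sum(order_amounts) / len(order_amounts)

# Reuse the same function for each region instead of duplicating the calculation.
north_region_avg = calculate_average_order_value([250.0, 310.5, 180.0])
south_region_avg = calculate_average_order_value([410.0, 295.5])
print(north_region_avg, south_region_avg)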

Monday, 6 February 2023

What is Natural Language Processing ?

Natural Language Processing (NLP) is a field of study that focuses on the interactions between human natural language and computers. The goal of NLP is to enable computers to understand, interpret, and generate human language. It is an interdisciplinary field that encompasses computer science, artificial intelligence, linguistics, and cognitive psychology. NLP techniques use a combination of machine learning, deep learning, and computational linguistic methods to process and analyze natural language text or speech.

These applications allow computers to understand and interpret human language, making it possible for machines to interact with humans in a more natural way.

Applications of NLP 

  1. Text to speech: Transforming text data into speech. Example - Reading an article aloud by generating speech from its text.
  2. Speech to text: Transforming speech into text. Example - Subtitles in movies.
  3. Language translation: Translating text from one language to another. Example - Google Translate.
  4. Sentiment analysis: Analyzing text to identify its emotional tone. Example - Analyzing product reviews and classifying the sentiment as positive, negative, or neutral.
  5. Text summarization: Condensing a large text into a short summary. Example - A book summary in a few sentences.
  6. Named Entity Recognition: Identifying key information in text and classifying it into predefined categories. Example - "Shahrukh Khan belongs to India" (Person - Shahrukh Khan, Location - India).
  7. Question Answering: Providing answers to questions. Example - Q: What is the capital of India? A: Delhi.
  8. Dialogue Systems: Systems that interact with humans in natural language. Example - ChatGPT.
  9. Language Generation: Natural Language Generation (NLG) using machine learning and artificial intelligence. Example - Product descriptions.
  10. Language Understanding: Natural Language Understanding (NLU) is used to understand a user's input, either text or speech, by identifying its intent. Example - "Need to travel to USA": NLU understands the intent (travel) and the location (USA).



Thursday, 19 January 2023

What is Reinforcement Learning ?

Reinforcement learning is a machine learning method in which the machine learns by experimenting and receiving positive or negative rewards. If an experiment is performed and the results are fruitful, it is a positive reward; if the results are not as expected, it is a negative reward.

Let's understand this with an example: When humans learn to ride a bicycle, they pedal the bicycle and it moves, which gives a positive experience, but when balance is not maintained, they fall, which is a negative experience. Hence,  learning from experiences is known as "reinforcement learning."

Terms associated with reinforcement learning

  • Agent: The entity that explores the environment and then acts on it.
  • Environment: The surroundings in which the agent operates, which are generally stochastic.
  • Action: A move taken by the agent within the environment.
  • State: The situation returned by the environment after the agent takes an action.
  • Reward: The feedback received from the environment after the agent has taken an action.
  • Policy: The strategy the agent applies to choose the next action based on the current state.
  • Value: The expected long-term return for the agent, taking the discount factor into account.
  • Q-Value: Similar to value, but it also takes the current action into account (see the sketch below).
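
To make the value and Q-value terms concrete, here is a minimal sketch of the standard tabular Q-learning update (an illustration; the states, actions, and numbers are made up):

from collections import defaultdict

alpha, gamma = 0.1, 0.9                      # learning rate and discount factor
Q = defaultdict(lambda: defaultdict(float))  # Q[state][action] -> expected long-term return

def update_q(state, action, reward, next_state, actions):
    # Q(s, a) += alpha * (r + gamma * max_a' Q(s', a') - Q(s, a))
    best_next = max(Q[next_state][a] for a in actions)
    Q[state][action] += alpha * (reward + gamma * best_next - Q[state][action])

# One hypothetical transition: taking "right" in state "s0" yields reward 1.0 and lands in "s1".
update_q("s0", "right", 1.0, "s1", actions=["left", "right"])
print(Q["s0"]["right"])   # 0.1 after the first update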