Thursday, 4 August 2016

Paper Review: Knowledge Representation In Sanskrit And Artificial Intelligence - Author: Rick Briggs


This is an interesting paper that discusses how Sanskrit could serve as a natural language for computer processing and Artificial Intelligence (AI). For decades the scientific community has been trying to design systems that can represent and process natural language. English is a widely spoken language, and we want machines to learn English and process its data. However, one cannot program systems in natural English; we have to rephrase it systematically so that a system can understand it. In this paper the author explains how Sanskrit analysis is significantly more precise than that of English and can be put to use in the field of artificial intelligence. The paper has three parts: first, a knowledge representation scheme is discussed using semantic nets; second, the author outlines the methods used by the ancient Indian grammarians to analyze sentences unambiguously; finally, an equivalence is established between Sanskrit language analysis and the techniques used in AI applications.

When attempts at machine translation failed to teach a computer to understand natural language, AI turned to knowledge representation. Teaching a machine a natural language should not be a matter of word-to-word mapping alone: one has to overcome the ambiguity of words and the interference of syntax. To overcome lexical ambiguity, there should be a representation of meaning independent of the words used. The author takes three sentences as examples to demonstrate a prototypical semantic net system.

1. “John gave the ball to Mary”
The grammatical information can be transformed into arcs and nodes, and the sentence can be stored as triples:

give, agent, John
give, object, ball
give, recipient, Mary
give, time, past

This can be schematically represented as below:

Figure 1: Schematic representation of the sentence “John gave the ball to Mary” (Rick Briggs, 1985).
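To make the triple notation concrete, here is a minimal Python sketch (my illustration, not code from the paper) that stores the triples and retrieves them by arc:

# Hypothetical triple store for the sentence's semantic net.
triples = [
    ("give", "agent", "John"),
    ("give", "object", "ball"),
    ("give", "recipient", "Mary"),
    ("give", "time", "past"),
]

def query(triples, node=None, arc=None):
    # Return every triple matching the given node and/or arc.
    return [t for t in triples
            if (node is None or t[0] == node) and (arc is None or t[1] == arc)]

print(query(triples, arc="agent"))  # [('give', 'agent', 'John')]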

2. “John told Mary that the train moved out of the station at 3 o’clock.”
As the figure below shows, there is a change of state: the train moved from the station to an unspecified location, leaving the former and arriving at the latter at 3:00. We can now convert this into triples as in the previous example. Here the verb is given significance and is considered the focus and distinguishing aspect of the sentence.


Figure 2: Schematic representation of the sentence “John told Mary that the train moved out of the station at 3 o’clock.” (Rick Briggs, 1985).
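One plausible set of triples for this sentence (my reconstruction, not the paper's exact figure; the arc names "source" and "destination" are my own labels for the two ends of the state change):

# Hypothetical triples: the act of telling, and the event told about.
triples = [
    ("tell", "agent", "John"),
    ("tell", "recipient", "Mary"),
    ("tell", "object", "move"),
    ("move", "agent", "train"),
    ("move", "source", "station"),
    ("move", "destination", "unspecified-location"),
    ("move", "time", "3:00"),
]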

Other sentences, when drawn as nets in this way, represent only a state of a thing or an event.

3. “John, a programmer living at Maple St., gives a book to Mary, who is a lawyer.”
If read back from its semantic net, the above statement yields an awkward and cumbersome rendering. The degree to which a semantic net sounds cumbersome and odd when read back in a natural language is the degree to which that language is “natural” and deviates from the precise, “artificial” representation. The figure below illustrates this.


Figure 3: Schematic representation of the sentence “John, a programmer living at Maple St., gives a book to Mary, who is a lawyer.” (Rick Briggs, 1985).

The author then gives a brief history of the Sanskrit grammarians. Panini, who lived during the 4th century BCE, gave Sanskrit grammar its strong foundation. Panini's successors, such as Bhartrhari, gave the grammar an algebraic formulation and tried to improve upon it. During the 16th and 17th centuries Bhattoji Dikshita and Kaundabhatta gave a new touch to the existing grammar, notably through the Vaiyakarana-bhusanasara. Similarly, in the 17th century Nagesha contributed his major work, the Vaiyakaranasiddhantamanjusa, or “Treasury of definitive statements of grammarians.” The author cites these grammarians to make a strong point: Sanskrit is not simply a spoken language but has a scientific and mathematical backbone.



Part 2: Sanskrit Language Analysis and Its Equivalence with Techniques Used in AI Applications


Sanskrit grammar is unique and advanced because, unlike other linguistic theories, it does not work on a noun-phrase model. In the Indian analysis, a sentence expresses an action that is conveyed by a verb and a set of auxiliaries. The verbal action is represented by the root of the verbal form, and the auxiliary activities by nominals (nouns, adjectives, etc.) and their case endings.

The meaning of a verb in Sanskrit is vyapara (action) + phala (result).

In general, a verb is defined simply as “to do.” However, Sanskrit is architected in such a way that the sentence provides not only the action but other details as well, such as tense, the number of the agent (singular, dual, plural), and the person of the agent (first, second, third).
Ex: Gramam Gacchati Chaitra (Chaitra is going to the village) – “an act of going taking place in the present, of which the agent is no one other than Chaitra, qualified by singularity, and whose object is something not different from the village.”
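A rough sketch (my own notation, not the paper's) of the information packed into the single verb form gacchati:

# Hypothetical structured reading of "Gramam Gacchati Chaitra".
verb_analysis = {
    "form": "gacchati",
    "root": "gam",        # the verbal root, "to go"
    "tense": "present",
    "person": "third",
    "number": "singular",
}
sentence = {
    "action": verb_analysis,
    "agent": "Chaitra",   # qualified by singularity
    "object": "grama",    # the village
}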

“John gave the ball to Mary” – this sentence has the verbal meaning “to give,” but it also involves many auxiliary activities: John holding the ball, an act of movement starting from John, an act of giving, an act of receiving, and so on. It is important to know where to stop such subdivision; in defining the verb, the Sanskrit grammarians clarify that the name “action” cannot be applied to the solitary point reached by extreme subdivision. In sentences of this type, the auxiliary activities become subordinated to the main sentence meaning, and they are represented by case endings. There are seven case endings in Sanskrit, of which six are definable representations of auxiliary activities (agent, object, instrument, recipient, point of departure, and locality); the seventh, the genitive, does not represent an auxiliary activity.

The case endings are explained by taking the sentence below as an example:
“Out of friendship, Maitra cooks rice for Devadatta in a pot, over a fire.”

Here the total process of cooking is rendered by the verb form “cooks” together with a number of auxiliary activities:
1. An Agent represented by the person Maitra
2. An Object by the “rice”
3. An Instrument by the “fire”
4. A Recipient by the person Devadatta
5. A Point of Departure (which includes the causal relationship) by the “friendship” (which is between Maitra and Devadatta)
6. The Locality by the “pot”
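As a simple illustration (mine, not the paper's), the six roles map onto the sentence's constituents like this:

# Hypothetical mapping of the six karaka roles for the Maitra sentence.
karaka_roles = {
    "agent": "Maitra",
    "object": "rice",
    "instrument": "fire",
    "recipient": "Devadatta",
    "point_of_departure": "friendship",  # carries the causal relation
    "locality": "pot",
}
action = {"verb": "cook", "roles": karaka_roles}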

This explanation shows how the Sanskrit analysis is advanced and stands apart from that of other languages.
The author gives another example to show how detailed Sanskrit sentence formation is when compared to English. Consider the sentence below analyzed in the Sanskrit fashion:
“Because of the wind, a leaf falls from a tree to the ground.” Here the wind is the instrument bringing the leaf down, the tree is the point of departure, the ground is the locality, and the leaf is the agent.
If we phrase the same idea in a typically English way, “The wind blows a leaf from the tree,” the wind becomes the agent and the leaf is considered the object. This sentence is transitive, whereas the earlier one was intransitive.

In the final section the author establishes an equivalence between Sanskrit language analysis and the techniques used in AI (semantic nets). Both systems rest on an extensive degree of specification, which is crucial to understanding the real meaning of a sentence, to the extent that it allows inferences to be made about facts not explicitly stated in the sentence.

“Out of friendship, Maitra cooks rice for Devadatta in a pot over a fire” – when represented as a semantic net, this sentence yields the triples below:

cause, event, friendship
friendship, object1, Devadatta
friendship, object2, Maitra
cause, result, cook
cook, agent, Maitra
cook, recipient, Devadatta
cook, instrument, fire
cook, object, rice
cook, on-loc, pot.

The same sentence in Sanskrit can be rendered as

cook, agent, Maitra
cook, object, rice
cook, instrument, fire
cook, recipient, Devadatta
cook, because-of, friendship
friendship, Maitra, Devadatta
cook, locality, pot.

The author makes the point that to improve AI representations one should adopt the phala/vyapara (result/action) distinction found in Sanskrit. This helps in elaborating a sentence: in the case above we could include the process of “heating” and the process of “making palatable.” These comparisons suggest that Sanskrit is the natural language closest to what such systems can represent directly. Below is a simple semantic net for the sentence, followed by a rough sketch of the phala/vyapara decomposition.

Figure 4: Schematic representation of the sentence “Out of friendship, Maitra cooks rice for Devadatta in a pot over a fire.” (Rick Briggs, 1985).
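A speculative sketch (mine, not the paper's) of what the phala/vyapara split could add to the representation:

# Hypothetical decomposition: vyapara (activities) plus phala (result).
cook = {
    "vyapara": ["place rice in pot", "heat pot over fire", "stir"],
    "phala": "rice becomes soft and palatable",
}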

My Views On This Paper: This is a fairly old but very interesting paper in which the author draws an equivalence between AI techniques and Sanskrit grammar. Industry and the scientific community would benefit hugely if we were able to represent a natural language directly for machine processing. To enjoy the content of the paper one should have some idea of the Sanskrit language (that said, I studied Sanskrit in school and college). The idea of using Sanskrit as a natural language for machines is laid out very nicely. The author shows how cumbersome a semantic net can be to read back and how much simpler Sanskrit makes the same representation. According to the paper, Sanskrit is a very descriptive language and beats English for AI purposes; however, very little research has been done in this area. Another hurdle is how many of us would be willing to adopt Sanskrit, since English is so widely spoken. A suggestion would be a two-layered system into which we can input any natural language and which processes it internally in terms of Sanskrit; this sounds crazy, but it could be wonderful if we succeeded. The author states that Sanskrit has an affinity with mathematics, which is true: in Sanskrit there is a way of analyzing words via sandhi, by which we can split any word systematically and group the parts under predefined categories. There is also a scoring system for each letter in a sentence and for grouping them, and I can see this kind of approach being useful in AI. The concept of using Sanskrit as a natural language for systems/AI requires huge research, but it would be worth it, as it could show us an easier way to design system representations. Overall, the author makes us think in a different direction with his research and views.

References:

1. Briggs, Rick (1985). Knowledge Representation in Sanskrit and Artificial Intelligence. AI Magazine, 6(1).
2. Bhatta, Nagesha (1963). Vaiyakarana-Siddhanta-Laghu-Manjusa. Benares: Chowkhamba Sanskrit Series Office.
3. Nilsson, Nils J. (1980). Principles of Artificial Intelligence. Palo Alto: Tioga Publishing Co.
4. Bhatta, Nagesha (1974). Parama-Laghu-Manjusa.

Tuesday, 28 June 2016

Plotting Heat Map Using Python

For Machine Learning, practitioners commonly use Python and R because they are open source languages. I learned Python using PyCharm, which I would strongly recommend to any Python beginner.

I was given a challenge to create:
(i) a 2-dimensional array of size 100x100 populated with random floating-point numbers between 0 and 1 inclusive (i.e., [0,1]);
(ii) a plot of the 2-D array, using any Python library, as a visual “heat map” representation of the data;
(iii) a loop that refreshes the numbers in the array and replots the heat map each time the array is repopulated.

Stretch assignment: Create a movie of the changing heat maps by playing each heat map frame by frame in a sequence. 

I was able to generate a heat map, as shown in the picture, with the following code:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.animation as animation

def animate(data, im):
    # Redraw the image with the latest random array.
    im.set_data(data)

def step():
    # Generator: yield a fresh 100x100 array of floats in [0, 1) per frame.
    while True:
        yield np.random.rand(100, 100)

fig, ax = plt.subplots()
im = ax.imshow(np.random.rand(100, 100), interpolation='nearest')
# Repopulate the array and redraw the heat map every 100 ms.
ani = animation.FuncAnimation(
    fig, animate, frames=step, interval=100, repeat=True, fargs=(im, ))
plt.show()
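For the stretch assignment, the same animation object can be written out as a movie file. This is a minimal sketch assuming the ffmpeg binary is installed and on the PATH; call it in place of plt.show(). Because the frame source is an endless generator, matplotlib writes only a finite number of frames (its save_count, 100 by default).

# Save the changing heat maps as a movie (assumes ffmpeg is available).
ani.save('heatmap.mp4', writer='ffmpeg', fps=10)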






Thursday, 16 June 2016

Paper Review: A Few Useful Things to Know about Machine Learning - Author: Pedro Domingos

In this paper the author summarizes twelve key lessons that are useful for Machine Learning researchers and practitioners. In the last decade the use of Machine Learning has spread rapidly, from spam filters to drug design. The purpose of this paper is to provide the folk knowledge of Machine Learning that is not available in current Machine Learning textbooks. The author focuses on classification, as it is the most mature and widely used type of machine learning. A classifier is a system that inputs a vector of discrete and/or continuous feature values and outputs a single discrete value, the class (e.g., a spam filter for e-mail).

1. Learning = Representation + Evaluation + Optimization: In the first lesson the author highlights the criteria for selecting among algorithms. Thousands of learning algorithms are available, but all of them combine three vital components: representation, evaluation, and optimization. A classifier must be represented in some formal language that the computer can handle; choosing a representation for a learner fixes the set of classifiers it can possibly learn, called its hypothesis space, and a classifier outside the hypothesis space cannot be learned. The evaluation function plays the important role of distinguishing good classifiers from bad ones. Optimization, the method of searching among the classifiers for the highest-scoring one, determines the efficiency of the learner. The author gives examples of each component, such as k-nearest neighbor, hyperplanes, and decision trees.

2. It Is Generalization That Counts: In this section the author emphasizes the need to keep training and test data separate. It is important to generalize beyond the examples in the training set, because we are unlikely to encounter those exact examples again at test time. The classifier can become contaminated if one uses test data to tune parameters. To mitigate this we can use cross-validation: randomly divide the training data into (say) ten subsets, hold out each one while training on the rest, test each learned classifier on the examples it did not see, and average the results to see how well a particular parameter setting does, as the sketch below illustrates. This separation of data must be respected whether one uses a flexible classifier (e.g., decision trees) or a linear one.
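A minimal cross-validation sketch using scikit-learn (my illustration, not the paper's; the dataset and model choices are arbitrary):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# 10-fold CV: train on nine folds, evaluate on the held-out fold, repeat.
X, y = load_iris(return_X_y=True)
scores = cross_val_score(DecisionTreeClassifier(max_depth=3), X, y, cv=10)
print(scores.mean())  # average accuracy over the ten held-out folds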

3. Data Alone Is Not Enough: The availability of huge amounts of data alone will not make machine learning work. One has to apply general assumptions such as smoothness, similar examples having similar classes, limited dependences, or limited complexity. The author notes that learning is the inverse of deduction: induction infers general rules from specific facts, while deduction goes from general rules to specific cases. The most useful learners are those that do not just have assumptions hard-wired into them but leave room for us to state and tweak assumptions explicitly. The author compares machine learning to farming: farmers combine seeds with nutrients to grow crops, and learners combine knowledge with data to grow programs.

4. Overfitting Has Many Faces: Hallucinating a classifier that is not grounded in reality, because we have insufficient knowledge and data to determine the correct one, is termed overfitting. When a classifier is 100% accurate on training data but only 50% accurate on test data, it would have been better off being 75% accurate on both. The best way to understand overfitting is to decompose generalization error into bias and variance, which the author explains with the analogy of throwing darts at a board. A linear learner has high bias, while a decision tree has low bias; similarly, in optimization, beam search has lower bias than greedy search but higher variance. In machine learning, strong false assumptions can be better than weak true ones, because a learner with the latter needs more data to avoid overfitting. Overfitting can be combated with cross-validation, a regularization term, or statistical tests such as chi-square, but beware that curing overfitting (variance) can tip the learner into underfitting (bias). The problem of multiple testing is closely related to overfitting. A small demonstration follows.
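A small sketch of overfitting (my example, with synthetic data): an unconstrained decision tree memorizes noisy training labels but generalizes worse than a much simpler one.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(400, 5)
y = (X[:, 0] > 0.5).astype(int)
y[rng.rand(400) < 0.2] ^= 1  # flip 20% of labels: irreducible noise
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier().fit(X_tr, y_tr)              # low bias, high variance
stump = DecisionTreeClassifier(max_depth=1).fit(X_tr, y_tr)  # high bias, low variance
print(deep.score(X_tr, y_tr), deep.score(X_te, y_te))    # near 1.0 train, much lower test
print(stump.score(X_tr, y_tr), stump.score(X_te, y_te))  # similar train and test scores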

5. Intuition Fails in High Dimensions: Many algorithms that work fine in low dimensions fail in high dimensions; this is termed the curse of dimensionality. In high dimensions all examples look alike: if the examples are laid out on a d-dimensional grid, a test example xt's 2d nearest examples are all at the same distance from it, so as the dimension increases more and more examples become nearest neighbors of xt. Our intuitions come from a three-dimensional world and often do not apply in high-dimensional ones; for instance, most of the mass of a high-dimensional multivariate Gaussian distribution is not near the mean but in an increasingly distant shell around it. There is an effect that partially counteracts the curse, the "blessing of non-uniformity": in most applications examples are not spread uniformly throughout the instance space but are concentrated on or near a lower-dimensional manifold.
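A quick numerical sketch of distance concentration (my example): as the dimension grows, the nearest and farthest neighbors of a point become nearly equidistant.

import numpy as np

rng = np.random.RandomState(0)
for d in [2, 10, 100, 1000]:
    X = rng.rand(1000, d)                          # 1000 uniform points in d dimensions
    dists = np.linalg.norm(X - X[0], axis=1)[1:]   # distances from the first point
    print(d, dists.min() / dists.max())            # ratio creeps toward 1 as d grows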

6. Theoretical Guarantees Are Not What They Seem: Learning is a complex phenomenon, and theoretical guarantees cannot always justify it; the guarantees we can give for induction are probabilistic. The author describes a bound of this kind: given a large enough training set, with high probability the learner will either return a hypothesis that generalizes well or be unable to find a consistent hypothesis. Unfortunately, such bounds are accepted in theory but are of limited use in practice, as they say little about a learner's actual accuracy. Another type of theoretical guarantee the author mentions is asymptotic: if learner A is better than learner B given infinite data, B is often better than A given finite data.

7. Feature Engineering Is the Key: The ease of a machine learning project is determined by the features it uses. Learning is easy when the features correlate well with the class, but often the raw data is not in a form suitable for learning, and one has to spend time constructing features that make learning easy. The major chunk of the time in a machine learning project is spent on feature design, not on learning itself; it goes into gathering the data, integrating it, cleaning and pre-processing it, and into trial and error in feature design. Feature engineering is difficult because it is domain-specific, which is also why it is important to automate more and more of the process. Features that look irrelevant in isolation may be relevant in combination, so one has to master the art of feature engineering.

8. More Data Beats a Cleverer Algorithm: In this section the author discusses the importance of gathering data: a dumb algorithm with lots and lots of data beats a clever one with a modest amount of it. The challenge, however, is to design classifiers that learn from large data in a small amount of time. One should try the simplest learners first (naive Bayes before logistic regression, k-nearest neighbor before support vector machines), as in the sketch below; sophisticated learners are harder to use because they have more knobs that need turning to get good results. Learners can be divided into those whose representation has a fixed size (e.g., linear classifiers) and those whose representation grows with the data (e.g., decision trees). Variable-size learners can in principle learn more, but in practice they are limited by the algorithm, computational cost, and the curse of dimensionality. Hence clever algorithms, those that make the most of the available data and computing resources, often pay off in machine learning.
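A "try the simple thing first" sketch (my example): naive Bayes as a quick baseline before reaching for heavier models.

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

# A fast, nearly knob-free baseline; beat this before adding complexity.
X, y = load_iris(return_X_y=True)
print(cross_val_score(GaussianNB(), X, y, cv=10).mean())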

9. Learn Many Models, Not Just One: The author highlights how combining many variations of a learner produces better results; creating model ensembles is now standard in machine learning. In bagging, we generate random variations of the training set by resampling, learn a classifier on each, and combine the results by voting, which greatly reduces variance while only slightly increasing bias. In boosting, training examples carry weights that are varied so that each new classifier focuses on the examples the previous ones got wrong. In stacking, the outputs of individual classifiers become the inputs of a higher-level learner that figures out how best to combine them. The author also mentions how teams combined their learners to get the best results in the Netflix Prize. Model ensembles should not be confused with Bayesian model averaging (BMA): ensembles change the hypothesis space and can take a wide variety of forms, whereas BMA assigns weights to the hypotheses in the original space according to a fixed formula.
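A bagging sketch (my example): resample the training set, fit a tree on each replicate, and let the ensemble vote.

from sklearn.datasets import load_iris
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
single = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=10).mean()
bagged = cross_val_score(
    BaggingClassifier(DecisionTreeClassifier(), n_estimators=50, random_state=0),
    X, y, cv=10).mean()
print(single, bagged)  # voting over 50 resampled trees usually reduces variance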

10. Simplicity Does Not Imply Accuracy: It is tempting to assume that, given two classifiers with the same training error, the simpler of the two will have the lower test error. There is some evidence for this, but there are also counter-examples; one of them is the generalization error of a boosted ensemble, which continues to improve as classifiers are added even after the training error has reached zero. A more sophisticated view equates complexity with the size of the hypothesis space, since hypotheses in smaller spaces can be represented by shorter codes. If we prefer simpler hypotheses and they turn out to be accurate, that is because our preferences are accurate, not because the hypotheses are simple. The author concludes by noting that simpler hypotheses should still be preferred, because simplicity is a virtue in its own right.

11. Representable Does Not Imply Learnable: Just because a function can be represented does not mean it can be learned; for example, standard decision tree learners cannot learn trees with more leaves than there are training examples. Given finite data, time, and memory, standard learners can learn only a small subset of all possible functions, and these subsets differ for learners with different representations. Some representations are exponentially more compact than others for certain functions, and may therefore require exponentially less data to learn them. Finding methods to learn these deeper representations is a major research area in machine learning.

12. Correlation Does Not Imply Causation: In the last lesson the author brings up the interesting topic of correlation versus causation. A correlation a machine learning system finds need not be causal. Suppose retail data from a supermarket shows that beer and diapers are often bought together; perhaps putting the beer section next to the diapers section would increase sales, but without an actual experiment it is hard to tell. Some learning algorithms can potentially extract causal information from observational data, but their applicability is restricted. Machine learning researchers should be aware of whether their predictions capture causal relations or mere correlations between variables.

The author concludes the paper by pointing to various resources that can help in developing machine learning skills.

Paper Review: The Discipline of Machine Learning - Author: Tom M. Mitchell


This is one of the best papers with which to start understanding Machine Learning. In it the author explains what machine learning is, its current progress, long-term research directions, and real-world applications. After reading this paper one can understand the basics of Machine Learning and its application in various areas. Machine Learning asks how to build systems that automatically improve their performance (P) at some task (T) with experience (E). The learning task may be of various types: data mining, database updating, programming by example, and so on. Machine Learning couples the fields of Computer Science and Statistics: computer science focuses on how to program, while statistics helps in drawing the best inferences from data.

Currently machine learning is used for speech recognition, the medical and biological sciences, robot control, bio-surveillance, image identification and classification, and more. With medical data (structured data), machine learning can be used to predict the outcome of a patient under a particular treatment. Speech and face recognition technologies are widely used in mobile and desktop applications and in social media (e.g., Facebook face tagging), and they can help greatly in surveillance systems as well. Machine learning is used by the US Post Office to automatically sort letters bearing handwritten addresses, and learned models are applied to everything from gene expression to astronomical data.

Machine learning methods play a key role in computer science because they make us think beyond normal programming: the question shifts from "how to program computers" to "how to get computers to program themselves," which would help systems self-diagnose and self-repair. Current challenges in machine learning include how to reduce the need for supervised learning with the help of unlabeled data, whether a machine can acquire its own best training data, and how to make systems understand the relationships between different algorithms. Another challenge is data privacy: we might want to train a medical diagnosis system on data from all the hospitals in the world, yet it must maintain the privacy of each subject. There are also long-run research questions, such as how we would build a never-ending learner. Finally, theories and algorithms from machine learning are being used to understand human learning itself.

The author concludes the paper by discussing ethical issues that may arise from machine learning technology. Machine learning will be useful in clinical research and medical fields, but the question arises: how would we protect the data privacy of individuals in such studies? Similarly, one should have enough understanding to maintain the privacy of data collected for law enforcement or marketing purposes.