ID3 is a machine-learning algorithm that generates a decision tree from a set of examples.
Many legal codes are in effect functions of several parameters and can be expressed as decision trees. For example, consider a gun control law. Some of the parameters might be the type of gun (is it a BB gun, an antique gun, a non-functional gun, an assault weapon, a pistol, etc.?) and whether the weapon is concealed. Then there are the characteristics of the possessor: are they mentally ill, convicted of a crime, subject to an order of protection, etc.? Was the person's life threatened? Does the person's job require a gun, e.g. security guard? And lastly, what is the place: a school, a bar, a place where beer is sold such as some supermarkets?
The answer could be a simple boolean, "permitted" or "not permitted." Or we could get a classification such that the possession might be a misdemeanor or a felony. Many states have categories of felonies. For example, in the State of New York, the term for a class A felony is life imprisonment, while the term for a class B felony does not exceed twenty-five years, etc. (There are other special rules for dispositions, such as those for offenders with prior convictions or youthful offenders.) The bulk of the penal code then defines each offense to be a class D felony, a class B felony, and so on.
A possible decision tree:

| Gun Type | Location | Age | Personal Status | Penalty, if any |
|---|---|---|---|---|
| Antique | | | | No offense |
| BB gun | | | | No offense |
| Pistol | | 15-20 | | Class A misdemeanor |
| Pistol | | 20-30 | | Class B felony |
| Pistol | | 30-50 | | Felony |
| Pistol | | 50-70 | | Class A misdemeanor |
| Rifle | Bar | | | Class B felony |
| Rifle | School | | | Class C felony |
| Rifle | Place where beer is sold | | | Class A misdemeanor |
| Rifle | Other | | | No offense |
| Assault weapon | | | | Class B felony |
| Rifle | Other | | Mentally ill | Class D felony |
| Rifle | Other | | Convicted of a felony | Class C felony |
| Rifle | Other | | Other | Class B misdemeanor |
An ID3 algorithm takes a series of examples such as those in the table above and creates a decision tree that gives the same answers as the examples. It needs to pick which parameter to use for the root node; in the above tree, it was Gun Type. It picks the one that best divides the examples among the categories, often using a formula from information theory (a sketch of this calculation follows the table below). In the above example, the tree was precisely determined from the input data. That is not always true. In a diagnosis application, there would likely be lines that contradict each other, or there might be several lines with very similar attributes, all with the same classification or penalty except one. Is that due to a real difference, or just a random error?
| A | B | C | D | Result |
|---|---|---|---|---|
| 1 | 2 | 3 | 44 | ii |
| 1 | 2 | 4 | 44 | ii |
| 1 | 2 | 5 | 44 | ii |
| 1 | 2 | 3 | 45 | ii |
| 1 | 2 | 4 | 45 | ii |
| 1 | 2 | 5 | 45 | ii |
| 1 | 2 | 3 | 46 | ii |
| 1 | 2 | 4 | 46 | ii |
| 1 | 2 | 5 | 46 | iii |
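To make the root-node selection concrete, here is a minimal sketch in Python of the information-gain calculation, applied to the table above. The encoding as a list of dicts and the function names are my own, not part of ID3 itself.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target="Result"):
    """How much splitting `rows` on column `attr` reduces entropy."""
    labels = [r[target] for r in rows]
    gain, n = entropy(labels), len(rows)
    for value in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == value]
        gain -= (len(subset) / n) * entropy(subset)
    return gain

# The table above, encoded as a list of dicts.
rows = [
    {"A": 1, "B": 2, "C": 3, "D": 44, "Result": "ii"},
    {"A": 1, "B": 2, "C": 4, "D": 44, "Result": "ii"},
    {"A": 1, "B": 2, "C": 5, "D": 44, "Result": "ii"},
    {"A": 1, "B": 2, "C": 3, "D": 45, "Result": "ii"},
    {"A": 1, "B": 2, "C": 4, "D": 45, "Result": "ii"},
    {"A": 1, "B": 2, "C": 5, "D": 45, "Result": "ii"},
    {"A": 1, "B": 2, "C": 3, "D": 46, "Result": "ii"},
    {"A": 1, "B": 2, "C": 4, "D": 46, "Result": "ii"},
    {"A": 1, "B": 2, "C": 5, "D": 46, "Result": "iii"},
]

# ID3 would pick the column with the largest gain for the root node.
for attr in ("A", "B", "C", "D"):
    print(attr, round(information_gain(rows, attr), 3))
# A and B never vary, so their gain is 0; C and D tie at about 0.197.
```

With only one "iii" example, the tie between C and D (and the small gain itself) may reflect a random error rather than a real difference, which is exactly the question raised above.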
Mr. Yeluri set up an example to illustrate these ideas. Since it is long, I have put it in an appendix at the end.
In participatory democracy, we do it a little differently. For each node in the tree, the users vote on what the next parameter should be. Thus, at the start, the individuals would vote on which of the above questions regarding place, type of gun, and person should be asked first. Then the group of examples in each resulting category would be classified again. For any node, when it comes up, the individuals can vote to classify it once and for all. E.g., if the category of assault weapon shows up, individuals might vote to always ban it, and for the category of BB gun, they may vote not to consider any other factors and always permit it. This makes it a leaf node, and the people then vote on the classification from the possibilities available.
We established the following terminology for the input to our experiment:

- title: a thing on which one chooses, like gun type.
- attribute: one of the possibilities for a title; thus the attributes of gun type could be antique, BB, assault, rifle, or shotgun.
- node: a series of pairs of title and attribute. For example, the voting might bring us to making a decision on what to do with those caught with an antique gun, who have a history of mental illness, and who are sixteen to twenty years old. This is a list of (title, attribute) pairs: gun type and antique, mental illness status and history of mental illness, and age (which is divided into ranges) and sixteen to twenty.
- root node: the top of the tree, which corresponds to no titles and attributes selected so far.
After the root node, the users can assign a penalty. Observe in the figure that we immediately assign "no offense" to those possessing antique guns. On the other hand, there are three levels until we get to gun type = "rifle", location = "other", and mental health status = "felony conviction". At each node, the users have the option to "vote now"; if that gets more votes than any title, the users vote on a penalty for that node. (A minimal sketch of one possible representation follows.)
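To make the terminology concrete, here is a sketch of one way a node and the "vote now" option could be represented. The names and the simple plurality rule are assumptions of mine, not the experiment's actual software.

```python
from collections import Counter

Node = list  # a node is a list of (title, attribute) pairs; the root is []

def next_step(node, votes):
    """Tally one round of voting at `node`: each voter names either a
    title to split on next, or "vote now" to classify this node as a leaf."""
    tally = Counter(votes.values())
    choice, _count = tally.most_common(1)[0]
    return choice

# The root node: no titles and attributes selected so far.
root: Node = []

# A hypothetical first round: most voters want to split on gun type first.
votes = {"voter1": "gun type", "voter2": "gun type", "voter3": "vote now"}
print(next_step(root, votes))  # -> gun type

# A deeper node, e.g. antique gun + history of mental illness + age 16-20:
node = [("gun type", "antique"),
        ("mental illness status", "history of mental illness"),
        ("age", "16-20")]
```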
How would we apply this to tax law? The ID3 algorithm is not designed to deal with cases that produce a continuous function, like the amount of tax that the taxpayer should pay. (Note that ID3 can be used when there are input parameters that are numeric. The algorithm sorts the examples in numeric order of the parameter and then finds the decision point that gives the greatest information-theory gain, as sketched below.)
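Here is a sketch of that sorting-and-splitting step for a numeric parameter. Taking candidate thresholds at the midpoints between consecutive distinct values is one common convention; the data and names are my own illustrations.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_threshold(examples):
    """Sort (value, label) examples by the numeric value, then try the
    midpoint between each pair of consecutive distinct values and keep
    the threshold with the greatest information gain."""
    examples = sorted(examples)
    labels = [label for _value, label in examples]
    base, n = entropy(labels), len(examples)
    best_gain, best_t = 0.0, None
    for i in range(1, n):
        if examples[i - 1][0] == examples[i][0]:
            continue  # identical values admit no threshold between them
        gain = (base
                - (i / n) * entropy(labels[:i])
                - ((n - i) / n) * entropy(labels[i:]))
        if gain > best_gain:
            best_gain = gain
            best_t = (examples[i - 1][0] + examples[i][0]) / 2
    return best_t, best_gain

# Hypothetical ages labeled with a permit decision:
print(best_threshold([(15, "N"), (18, "N"), (25, "Y"), (40, "Y"), (65, "N")]))
# -> (21.5, ~0.42): the best cut falls between ages 18 and 25
```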
Thus, for a tax system built by ID3, at each leaf node there would be a formula. The titles in a tax problem might be various income ranges, the number of children, and marital status. The users might then vote, for an unmarried head-of-household with three children and income in the range $20,000 to $40,000, to make the tax = $500.00 + 0.10 × (Income − $20,000). That is, $500.00 on the first $20,000 plus ten percent of the remaining income. They might put other components into the formulae for the amount of state tax paid, mortgage interest, etc.
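As a worked check of that leaf formula (the bracket, rate, and function name are just the ones this paragraph proposes, written out as a sketch):

```python
def leaf_tax(income):
    """Tax formula voted in at the hypothetical leaf node: unmarried
    head-of-household, three children, income $20,000 to $40,000."""
    assert 20_000 <= income <= 40_000, "outside this leaf's income range"
    return 500.00 + 0.10 * (income - 20_000)

print(leaf_tax(20_000))  # 500.0  -- $500.00 on the first $20,000
print(leaf_tax(35_000))  # 2000.0 -- $500.00 plus 10% of the remaining $15,000
```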
Reference

Quinlan, J. Ross, C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.

Appendix: The Information Theory Formula
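For reference, the formula in question is the standard ID3 entropy and information gain (this statement of it is mine, in the usual notation):

$$\mathrm{Entropy}(S) = -\sum_i p_i \log_2 p_i$$

$$\mathrm{Gain}(S, T) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{values}(T)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$

Here $p_i$ is the fraction of examples in $S$ that have the $i$-th classification, and $S_v$ is the subset of $S$ whose attribute for title $T$ is $v$. At each node, ID3 picks the title $T$ with the greatest gain.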
We have three titles, A, B, and C. The possible choices, or parameters, for each of these are:

| Title | Attributes |
|---|---|
| A | F, G, H |
| B | I, J, K |
| C | M, N, P |
The examples are:

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | I | M | N |
| F | I | N | Y |
| F | I | P | N |
| G | I | M | N |
| G | I | P | Y |
| G | I | M | N |
| G | I | P | Y |
| G | I | N | Y |
| G | J | M | Y |
| G | J | P | Y |
| F | J | M | N |
| F | J | N | Y |
| F | J | P | Y |
| H | I | N | N |
| H | I | N | N |
| H | J | N | N |
| H | I | P | Y |
| H | J | M | Y |
| H | J | N | N |
| H | J | P | Y |
| H | K | M | N |
| H | K | N | Y |
| H | K | P | Y |
| F | K | M | N |
| F | K | N | Y |
| F | K | P | Y |
The sample sets for classifying by A are:

A="F"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | I | M | N |
| F | I | N | Y |
| F | I | P | N |
| F | J | M | N |
| F | J | N | Y |
| F | J | P | Y |
| F | K | M | N |
| F | K | N | Y |
| F | K | P | Y |
A="G"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| G | I | M | N |
| G | I | P | Y |
| G | I | N | Y |
| G | J | M | Y |
| G | I | M | N |
| G | I | P | Y |
| G | J | P | Y |
A="H"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| H | I | M | N |
| H | I | N | N |
| H | J | N | N |
| H | I | P | Y |
| H | J | M | Y |
| H | J | N | N |
| H | J | P | Y |
| H | K | M | N |
| H | K | N | Y |
| H | K | P | Y |
By comparison, the sample sets for classifying by C are:
C="M"
A | B | C | Classification Yes /No |
F | I | M | N |
G | I | M | N |
G | J | M | Y |
F | J | M | N |
H | I | M | N |
H | I | N | N |
H | J | M | Y |
H | K | M | N |
F | K | M | N |
G | I | M | N |
Splitting the C="M" set further by A:

A="F"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | I | M | N |
| F | J | M | N |
| F | K | M | N |

A="G"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| G | I | M | N |
| G | J | M | Y |
| G | I | M | N |

A="H"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| H | J | M | Y |
| H | K | M | N |
C="N"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | I | N | Y |
| G | I | N | Y |
| F | J | N | Y |
| H | I | N | N |
| H | J | N | N |
| H | K | N | Y |
| F | K | N | Y |
Splitting the C="N" set further by A:

A="F"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | I | N | Y |
| F | J | N | Y |
| F | K | N | Y |

A="G"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| G | I | N | Y |

A="H"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| H | I | N | N |
| H | J | N | N |
| H | K | N | Y |
Or we can split it by B, giving us:
B="I"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | I | N | Y |
| G | I | N | Y |
| G | I | M | N |
| H | I | N | N |

B="J"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| F | J | N | Y |
| G | J | P | Y |
| H | J | N | N |

B="K"

| A | B | C | Classification (Yes/No) |
|---|---|---|---|
| H | K | N | Y |
| F | K | N | Y |