ID3 is a machine-learning algorithm that generates a decision tree from a set of examples.
Many legal codes are some sort of function of several parameters and
are of a nature that can be defined as a decision tree.
For example, suppose one is making a gun control law. Some of the parameters might
be the type of gun: is this a BB gun, an antique gun, a non-functional
gun, an assault gun, a pistol, etc.? Is the weapon concealed?
Then there are the characteristics of the possessor: are they
mentally ill, convicted of a crime, subject to an order of protection, etc.?
Was the person's life threatened? Does the person's job require it, e.g.,
a security guard?
And lastly, what is the place: a school, a bar, a place where beer is sold
such as some supermarkets?
The answer could be a simple boolean, "permitted" or "not permitted."
Or we could get a classification such that the possession might be
a misdemeanor or a felony. Many states have categories for felonies.
For example, in the State of New York, the term for a class A felony is life imprisonment,
while the term for a class B felony does not exceed twenty-five years, etc.
(There are other special rules for dispositions, such as for those who have
prior convictions or are youthful offenders.)
The bulk of the penal code then defines offenses to be a class D felony,
a class B felony, etc.
A possible decision tree:
Gun Type | Location | Age | Personal Status | Penalty, if any |
Antique | | | | No Offense |
BB gun | | | | No Offense |
Pistol | | 15-20 | | Class A Misdemeanor |
Pistol | | 20-30 | | Class B Felony |
Pistol | | 30-50 | | Felony |
Pistol | | 50-70 | | Class A Misdemeanor |
Rifle | Bar | | | Class B Felony |
Rifle | School | | | Class C Felony |
Rifle | Place where beer is sold | | | Class A Misdemeanor |
Rifle | Other | | | No Offense |
Assault | | | | Class B Felony |
Rifle | Other | | Mentally ill | Class D Felony |
Rifle | Other | | Convicted of a felony | Class C Felony |
Rifle | Other | | Other | Class B Misdemeanor |
An ID3 algorithm would take a series of examples such as those in the table
above and create a decision tree that gives the same answers as the examples.
It needs to pick which parameter to use for the root node; in the above
tree, it was Gun Type. It picks the one that best divides the examples
among the categories. Often the entropy formula from information theory is used.
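As a rough sketch of that selection step, the following Python fragment computes the entropy of a set of classifications and the information gain of splitting on a parameter, then takes the parameter with the largest gain as the root. The miniature example rows and the names "Gun Type", "Location", and "Penalty" are invented for illustration; this is not a full ID3 implementation.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of classifications."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(examples, attribute, target="Penalty"):
    """Reduction in entropy obtained by splitting `examples` on `attribute`."""
    before = entropy([e[target] for e in examples])
    after = 0.0
    for value in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == value]
        after += len(subset) / len(examples) * entropy(subset)
    return before - after

# Hypothetical miniature of the table above (invented rows, for illustration).
examples = [
    {"Gun Type": "Antique", "Location": "Other",  "Penalty": "No Offense"},
    {"Gun Type": "Rifle",   "Location": "Bar",    "Penalty": "Class B Felony"},
    {"Gun Type": "Rifle",   "Location": "School", "Penalty": "Class C Felony"},
    {"Gun Type": "Assault", "Location": "Other",  "Penalty": "Class B Felony"},
]

# ID3 takes the attribute with the greatest gain as the root of the tree.
root = max(["Gun Type", "Location"], key=lambda a: information_gain(examples, a))
print(root)   # "Gun Type" (ties broken by list order in this tiny example)
```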
In the above example, the tree was precisely determined from the input data.
That is not always true. In a diagnosis application, there would likely be
lines that contradict each other, or there might be several lines with
very similar attributes, all with the same classification or penalty
except one. Is that due to a real difference, or just a random error?
A | B | C | D | Result |
1 | 2 | 3 | 44 | ii |
1 | 2 | 4 | 44 | ii |
1 | 2 | 5 | 44 | ii |
1 | 2 | 3 | 45 | ii |
1 | 2 | 4 | 45 | ii |
1 | 2 | 5 | 45 | ii |
1 | 2 | 3 | 46 | ii |
1 | 2 | 4 | 46 | ii |
1 | 2 | 5 | 46 | iii |
There is a danger of overfitting when there
are lots of examples. Thus, a
pruning process removes those divisions that
don't have much information gain.
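A minimal sketch of such a pruning pass, assuming a hypothetical Node structure that records the information gain of the split made at each node and the classifications of the examples that reached it; the field names and the threshold value are assumptions for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field

@dataclass
class Node:
    # Hypothetical tree node: `gain` is the information gain of the split made
    # here, `labels` are the classifications of the examples that reached it,
    # and `children` maps attribute values to subtrees.
    gain: float = 0.0
    labels: list = field(default_factory=list)
    children: dict = field(default_factory=dict)
    label: str = ""

def prune(node: Node, min_gain: float = 0.05) -> Node:
    """Remove divisions that don't contribute much information gain,
    replacing them with a leaf carrying the majority classification."""
    for child in node.children.values():
        prune(child, min_gain)
    if node.children and node.gain < min_gain:
        node.children = {}                                   # drop the weak split
        node.label = Counter(node.labels).most_common(1)[0][0]
    return node
```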
Mr. Yeluri set up an example to illustrate these ideas. Since it
is long, I put it at the end in an appendix.
In participatory democracy, we do it a little bit differently. For each
node in the tree, the users vote on what the next parameter should be. Thus, at the start
the individuals would vote on which of the above questions regarding
place, type of gun, and person should be asked first. Then the group for each
resulting category would be classified again. For any node, when it comes up,
the individuals can vote to classify it once and for all.
For example, if the category of assault weapon shows up, individuals might vote
to always ban it, and for the category of BB gun, they may vote not to
consider any other factors and always permit it.
Such a node is a leaf node, and the people then vote on the classification
from the possibilities available.
We established the following terminology for the input to our experiment:
- title: a thing on which one chooses, like gun type
- attribute: one of the possibilities for a title; thus the attributes of gun type could be antique, BB, assault, rifle, or shotgun
- node: a series of pairs of title and attribute. For example, the voting might bring us to making a decision on what to do with those caught with an antique gun, who have a history of mental illness, and who are sixteen to twenty years old. This is a list of pairs: the title (nature of gun) and its attribute (antique); mental illness status and the kind of history of mental illness; and age, which is divided into attributes, or ranges.
- root node: the top of the tree, which corresponds to no titles and attributes selected so far
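One possible, purely illustrative encoding of this terminology in Python; the class and field names are assumptions and do not describe any particular implementation.

```python
from dataclasses import dataclass, field

@dataclass
class Title:
    name: str                  # e.g. "gun type"
    attributes: list[str]      # e.g. ["antique", "BB", "rifle", ...]

@dataclass
class Node:
    # A node is the series of (title, attribute) pairs chosen so far;
    # the root node is simply the node with no pairs at all.
    choices: list[tuple[str, str]] = field(default_factory=list)

gun_type = Title("gun type", ["antique", "BB", "pistol", "rifle", "assault", "shotgun"])
root = Node()                  # no titles and attributes selected so far
example_node = Node([("gun type", "antique"),
                     ("mental illness status", "history of mental illness"),
                     ("age", "16-20")])
```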
The voters first vote on which title to choose as the root parameter. Then,
for each attribute of that title, they vote on how next to classify it.
As each of these votes achieves a quorum, the corresponding node becomes accessible;
that is, the voters might vote on what title to use for gun-type = "antique",
choosing from the remaining titles of personal status, age range, and location.
Then they vote on what title to use for gun-type = "bb", gun-type = "pistol",
gun-type = "rifle", and gun-type = "assault".
Each vote creates a new node which is marked accessible,
and individuals vote on each in turn.
One of my Masters students, Mr. Chaitanya Yeluri, has just
implemented this. More details are in his Masters project.
After the root node, the users can assign a penalty. Observe in the figure
that we immediately assign "no offense" to those possessing antique guns.
On the other hand, there are three levels until we get to gun type = "rifle",
location = "other", and personal status = "convicted of a felony".
At each node, the users have the option to "vote now", and if that gets more
votes than any title, the users vote on a penalty for that node.
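A sketch of how one voting round at an accessible node might be tallied under this scheme; the function name, the ballot format, and the sample ballots are hypothetical.

```python
from collections import Counter

def resolve_round(ballots):
    """Tally one round of voting at an accessible node.  Each ballot is either
    the name of a remaining title or the special option "vote now"."""
    winner, _ = Counter(ballots).most_common(1)[0]
    if winner == "vote now":
        return ("leaf", None)     # next, the group votes on a penalty for this node
    return ("split", winner)      # next, one new accessible node per attribute of the winner

# For example, at the node gun type = "antique":
print(resolve_round(["vote now", "vote now", "age", "location", "vote now"]))
# ('leaf', None) -- the group chose to classify this node once and for all
```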
How would we apply this to tax law? The ID3 algorithm is not designed to
deal with cases that produce a continuous function, like the amount
of tax that the taxpayer should pay.
(Note that ID3 can be used when there are input parameters that are numeric.
The algorithm sorts the examples in numeric order of the parameter
and then finds the decision point that gives the greatest information-theoretic gain.)
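A sketch of that numeric-splitting step, assuming a hypothetical best_threshold helper; the ages and classifications fed to it are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def best_threshold(values, labels):
    """Sort examples by a numeric parameter and return the (gain, threshold)
    pair for the decision point with the greatest information gain."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best = (-1.0, None)
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                # no boundary between equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        gain = base - (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        best = max(best, (gain, threshold))
    return best

# Invented ages and whether possession was classified as an offense.
print(best_threshold([15, 17, 22, 35, 61], ["Y", "Y", "N", "N", "Y"]))
```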
Thus, for a tax system produced by ID3,
at each leaf node there would be a formula. The titles
in a tax problem might be the various income ranges, the number of children,
and marital status. And then the users might vote, for those who are an
unmarried head of household with three children and with income in the range $20,000
to $40,000, to make the tax = $500.00 + 0.10 * (Income - $20,000), that is,
$500.00 on the first $20,000 plus ten percent of the remaining income.
They might put other components into the formulae for the amount of state tax paid,
the mortgage interest, etc.
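For instance, the formula voted for that leaf node could be evaluated as in the following sketch (a hypothetical helper, with the income range of the example hard-coded).

```python
def tax_for_leaf(income):
    """Hypothetical voted formula for the leaf node: unmarried head of
    household, three children, income in the range $20,000 to $40,000."""
    assert 20_000 <= income <= 40_000
    # $500.00 on the first $20,000, plus ten percent of the remaining income.
    return 500.00 + 0.10 * (income - 20_000)

print(tax_for_leaf(30_000))   # 1500.0
```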
Reference
Quinlan, J. R., C4.5: Programs for Machine Learning, Morgan Kaufmann, San Mateo, CA, 1993.
Appendix: The Information Theory Formula
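For reference, the measure ID3 ordinarily uses (Quinlan 1993) is the entropy of a set of examples, and a candidate title is scored by the gain it yields. In standard notation (stated here for orientation, not taken from the worked figures below), with p_i the proportion of examples in class i and S_v the subset whose chosen title has value v:

H(S) = -(p_1 log2(p_1) + p_2 log2(p_2) + ...)
Gain(S, A) = H(S) - sum over values v of A of (|S_v| / |S|) * H(S_v)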
We have three titles, A, B, and C. The possible choices, or parameters, for each of these are:
Title | Parameters |
A | F, G, H |
B | I, J, K |
C | M, N, P |
Assume the input examples are:
A | B | C | Classification (Yes/No) |
F | I | M | N |
F | I | N | Y |
F | I | P | N |
G | I | M | N |
G | I | P | Y |
G | I | M | N |
G | I | P | Y |
G | I | N | Y |
G | J | M | Y |
G | J | P | Y |
F | J | M | N |
F | J | N | Y |
F | J | P | Y |
H | I | N | N |
H | I | N | N |
H | J | N | N |
H | I | P | Y |
H | J | M | Y |
H | J | N | N |
H | J | P | Y |
H | K | M | N |
H | K | N | Y |
H | K | P | Y |
F | K | M | N |
F | K | N | Y |
F | K | P | Y |
The information content of the entire set is 0.3149.
We try classifying it on the basis of A, and we get an information content of 1.3328.
It happens, by coincidence, that if we tried B for the root node we would also get 1.3228.
The three nodes for A would have been:
A | B | C | Classification (Yes/No) |
F | I | M | N |
F | I | N | Y |
F | I | P | N |
F | J | M | N |
F | J | N | Y |
F | J | P | Y |
F | K | M | N |
F | K | N | Y |
F | K | P | Y |
giving an information content of 4/9 log2(4/9) + 5/9 log2(5/9).
A | B | C | Classification (Yes/No) |
G | I | M | N |
G | I | P | Y |
G | I | N | Y |
G | J | M | Y |
G | I | M | N |
G | I | P | Y |
G | J | P | Y |
giving an information content of 5/7 log2(5/7) + 2/7 log2(2/7).
A | B | C | Classification (Yes/No) |
H | I | M | N |
H | I | N | N |
H | J | N | N |
H | I | P | Y |
H | J | M | Y |
H | J | N | N |
H | J | P | Y |
H | K | M | N |
H | K | N | Y |
H | K | P | Y |
giving an information content of 5/10 log2(5/10) + 5/10 log2(5/10).
By comparison, the sample sets for classifying by C are:
C="M"
A | B | C | Classification (Yes/No) |
F | I | M | N |
G | I | M | N |
G | J | M | Y |
F | J | M | N |
H | I | M | N |
H | I | N | N |
H | J | M | Y |
H | K | M | N |
F | K | M | N |
G | I | M | N |
And, if we tried to
split this by
A, we would have:
A | B | C | Classification (Yes/No) |
F | I | M | N |
F | J | M | N |
F | K | M | N |
A | B | C | Classification (Yes/No) |
G | I | M | N |
G | J | M | Y |
G | I | M | N |
A | B | C | Classification (Yes/No) |
H | J | M | Y |
H | K | M | N |
giving a total information content of 0.6438.
Next consider C="N". The sample set is:
A | B | C | Classification (Yes/No) |
F | I | N | Y |
G | I | N | Y |
F | J | N | Y |
H | I | N | N |
H | J | N | N |
H | K | N | Y |
F | K | N | Y |
Then there are two possibilities for breaking this down further. If we split it by A, we have:
A | B | C | Classification (Yes/No) |
F | I | N | Y |
F | J | N | Y |
F | K | N | Y |
A | B | C | Classification (Yes/No) |
G | I | N | Y |
A | B | C | Classification (Yes/No) |
H | I | N | N |
H | J | N | N |
H | K | N | Y |
This gives us an information value of 1.5579.
Or we can split it by B, giving us:
A | B | C | Classification (Yes/No) |
F | I | N | Y |
G | I | N | Y |
G | I | M | N |
H | I | N | N |
A | B | C | Classification (Yes/No) |
F | J | N | Y |
G | J | P | Y |
H | J | N | N |
A | B | C | Classification (Yes/No) |
H | K | N | Y |
F | K | N | Y |
This gives us an information value of 1.6253.
The resulting tree chosen is given below: