Decision Rule Extraction from Trained Neural Networks Using Rough Sets

Alina Lazar and Ishwar K. Sethi
Vision and Neural Networks Laboratory
Department of Computer Science
Wayne State University
Detroit, MI 48

ABSTRACT

The ability of artificial neural networks to learn and generalize complex relationships from a collection of training examples has been established through numerous research studies in recent years. The knowledge acquired by neural networks, however, is considered incomprehensible and not transferable to other knowledge representation schemes such as expert or rule-based systems. Furthermore, the incomprehensibility of the knowledge acquired by a neural network prevents users from gaining a better understanding of the classification task learned by the network. The aim of the present paper is to describe a method that can help to make the knowledge embedded in a trained neural network comprehensible, and thus transform neural networks into a powerful knowledge acquisition tool. Our method is based on rough sets, which offer a useful framework to reason about classification knowledge but lack generalization capabilities. Unlike many existing methods that require training examples as well as the trained network to extract the knowledge embedded in numerical weights, our method works only with the weight matrix of a trained network. No training examples are required. The suggested method has been applied to several trained neural networks with great success.

Keywords: Rough Sets and Data Mining, Artificial Neural Network, Rule Extraction.

1. Introduction

Knowledge can be seen as organized data sets, with the ability to perform classification. Hence a formal framework capable of reasoning about classifications and delivering implicit facts from explicit knowledge would be very helpful. Artificial neural networks and expert systems can be combined to obtain such a framework. For neural networks the knowledge database is the neural network itself, which is automatically created from the training examples when the learning algorithm is run.
The knowledge is represented by the weights of the connections between the neurons, the threshold values and the activation function. Because of the implicit knowledge representation, it is not possible to identify a description of the problem at the neuron level. This is why neural networks are often called "black boxes". On the other hand, in conventional expert systems knowledge representation is based on rules. Because of the explicit knowledge representation, these systems have explanatory abilities and are readable and interpretable, but they are difficult to construct.

The advantages and disadvantages of neural network systems and rule-based systems are complementary, so it is desirable to integrate the advantages of both types of systems. One way is to extract input-output decision rules from a trained feedforward neural network. The extracted rules can be used to better understand the internal representation developed by the network, to extend neural network systems to "safety-critical" problem domains, to improve the generalization of neural network solutions, for data mining, and for knowledge acquisition for symbolic artificial intelligence systems.

Several methods to perform rule extraction have been suggested in the literature. Sethi and Yoo described an approach for symbolic mapping 9, of neurons in neural networks, based on the assumption that hidden units often take on extreme values and thus can be approximated as perceptrons. This assumption allows each neuron and the whole network to be represented in symbolic Boolean form or in symbolic multivalued form. A backtracking search procedure is used to perform the symbolic mapping. The aim of this paper is to improve Sethi and Yoo's approach 9, by using rough logic instead of the backtracking search for the symbolic mapping.

The proposed method consists of three main stages. During the first stage, a completely specified decision table is constructed using the trained neural network and knowledge about the domain of each of the neural network inputs. The second stage consists of applying rough logic to the completely specified decision table to find a reduced decision table. The third stage consists of applying a greedy algorithm to select the minimal set of decision rules exhibiting the classification behavior of the trained neural network.

The organization of the paper is as follows. Section 2 briefly describes the symbolic mapping approach to rule extraction; Section 3 presents some theoretical facts about rough sets; Section 4 describes our new method for rule extraction; Section 5 contains an experimental comparison of the two symbolic mapping methods on two data sets. Finally, Section 6 contains the summary of the work, the discussion and the future work.

2. Symbolic Rule Extraction

The symbolic approach 9, for rule extraction uses a trained neural network, the neural network weights and the threshold values in order to generate an equivalent propositional logic rule set. First, an admissible transformation of threshold logic allows all negative weights to be converted to positive weights for unified treatment. Then, the transformed weights are used as inputs for a backtracking search procedure. The extension from the Boolean logic representation to the multiple-valued logic (MVL) representation allows integer multiple values for each input neuron, not only binary values. An encoding scheme is used to represent the multiple-valued variables in the [0, 1] interval.

2.1. Symbolic Representation of a Neuron (SM)

The symbolic mapping of a neuron is based on the assumption that the neurons in a feedforward neural network, after training, take on extreme values and effectively implement a threshold function, even if they have either a logistic or a hyperbolic tangent activation function.
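The threshold-function view of a neuron and the conversion of negative weights can be illustrated with a minimal Python sketch. It assumes binary inputs and the standard complement substitution x' = 1 - x, which the text does not spell out; the function names and the example weights are hypothetical and not taken from the paper.

```python
from itertools import product


def threshold_unit(weights, theta, x):
    """Return 1 if the weighted sum of the binary inputs x reaches theta, else 0."""
    return int(sum(w * xi for w, xi in zip(weights, x)) >= theta)


def make_weights_positive(weights, theta):
    """Complement every input that has a negative weight.

    Substituting x' = 1 - x turns w*x into |w|*x' + w, so the weight
    becomes |w| and the threshold is raised by |w|.
    Returns the new weights, the new threshold and the complemented indices.
    """
    new_weights, new_theta, flipped = [], theta, []
    for i, w in enumerate(weights):
        if w < 0:
            new_weights.append(-w)
            new_theta -= w  # theta - w equals theta + |w|
            flipped.append(i)
        else:
            new_weights.append(w)
    return new_weights, new_theta, flipped


def truth_table(weights, theta):
    """List every binary input combination with the unit's output."""
    n = len(weights)
    return [(x, threshold_unit(weights, theta, x)) for x in product([0, 1], repeat=n)]


if __name__ == "__main__":
    # Hypothetical weights and threshold, chosen only for illustration.
    for x, out in truth_table([2, -3, 1], 1):
        print(x, out)
```

After the transformation, the unit computes the same function on the complemented inputs, so a search procedure only has to deal with non-negative weights.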
Given the input weights and the threshold value along with the encoding scheme, the Boolean function represented by a perceptron can be obtained as follows: (1) list all possible input combinations and the corresponding summations of weights; (2) for each summation greater than or equal to the threshold, set the Boolean function value to 1 and form a product term representing the input combination; (3) express the product terms in simplified disjunctive normal form. When the number of neurons increases, this kind of exhaustive enumeration procedure becomes unmanageable. A backtracking search procedure seems to be a good solution, both for binary inputs 9 and for MVL inputs.

2.2. Backtracking Search for Symbolic Mapping

The backtracking search makes use of two functions: the bounding function and the solution function. The bounding function eliminates the possibility of generating any unnecessary subpath that cannot lead to a solution, while the solution function ensures that no unnecessary subpath that cannot lead to a solution is generated. The algorithm first converts any negative weight to a positive weight using an admissible transformation. For the binary case, the inputs are sorted by the corresponding weight values. For multivalued inputs the ordering is based not only on the corresponding weights but also on the encoding of each multivalued variable, as different variables may not have the same encoding representation. Then a backtracking search tree is grown using the bounding and the solution functions. From the tree a binary or multivalued Boolean function is obtained. Because it is a backtracking search, the maximum time is exponential.

3. Rough Sets

Rough set theory 8 was designed to analyze the classification of imprecise, uncertain or incomplete data. This theory is based on finite sets, equivalence relations and cardinalities. It can be used for decision making or classification, data analysis, discovering dependencies in data, and data reduction or information extraction.

3.1. Basic Notions

An information system is a data table where each column is labeled by an attribute, and an object labels each row. Each row represents some piece of information about the corresponding object.

Formally, an information system can be defined as a pair $(U, A)$ where $U$ is a nonempty, finite set called the universe and $A$ is a nonempty, finite set of primitive attributes. A decision table (Table ) is a simple way of looking at an information system. Let $S = (U, A)$ be an information system and let $C$, $D$ be two subsets of attributes. $C$ and $D$ are called condition and decision attributes, respectively. A decision table is an information system with distinguished condition and decision attributes. Usually the set of decision attributes has cardinality one.

One of the most important concepts in rough set theory is the indiscernibility relation. For each possible subset of condition attributes $B \subseteq A$, an indiscernibility relation is an equivalence relation, where two objects are in the same partition if and only if they cannot be discerned from each other on the basis of the attributes in $B$. Subsets of interest are sets of objects with the same value for the decision attribute. We can define the lower approximation of a set as the set formed from the objects that "certainly" belong to the subset of interest, and the upper approximation of a set as the set formed from the objects that "possibly" belong to the subset of interest. A rough set is then any subset defined through its lower and upper approximations.

3.2. Reducts

All the subsets $B \subseteq A$ that preserve the indiscernibility relation and are minimal are called reducts. We can define reducts relative to the decision attribute $d$ or relative to an object. A reduct of a decision table is said to be relative to the decision attribute $d$ if it preserves the partition induced by the decision classes, but not necessarily the full indiscernibility relation. An $x_k$-relative reduct keeps the minimum amount of information needed to discern the object $x_k$. This kind of reduction preserves the object-related discernibility.

3.3. Decision Systems

(Table ) A decision rule is an implication of type $p \rightarrow s$. For each object, certain values of the condition attributes determine the value of the decision attribute.
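The indiscernibility classes and the lower and upper approximations from Section 3.1 can be illustrated with a short Python sketch. The toy decision table, the attribute names `a`, `b`, `d` and the function names are hypothetical and serve only as an illustration; they are not data from the paper.

```python
from collections import defaultdict


def indiscernibility_classes(table, attrs):
    """Partition object ids by their values on the attributes in attrs."""
    classes = defaultdict(set)
    for obj, row in table.items():
        classes[tuple(row[a] for a in attrs)].add(obj)
    return list(classes.values())


def approximations(table, attrs, target):
    """Return the lower and upper approximation of target with respect to attrs."""
    lower, upper = set(), set()
    for block in indiscernibility_classes(table, attrs):
        if block <= target:   # every object in the block is in the target
            lower |= block
        if block & target:    # at least one object in the block is in the target
            upper |= block
    return lower, upper


# Hypothetical toy decision table: condition attributes a, b; decision d.
table = {
    1: {"a": 0, "b": 1, "d": 1},
    2: {"a": 0, "b": 1, "d": 0},
    3: {"a": 1, "b": 0, "d": 1},
    4: {"a": 1, "b": 1, "d": 0},
}

# Subset of interest: objects whose decision value is 1.
target = {obj for obj, row in table.items() if row["d"] == 1}
lower, upper = approximations(table, ["a", "b"], target)
print(lower, upper)
```

In this toy table objects 1 and 2 are indiscernible on `a` and `b` but have different decisions, so object 1 belongs only to the upper approximation, while object 3 belongs to both.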
We define a decision system as a finite collection or set of decision rules. The order of the rules does not matter, unlike in the case of an algorithm. Any subset of a decision system is itself a decision system.

3.4. Decision System Minimization

In order to obtain a decision system with a minimum number of rules, superfluous decision rules associated with the same decision class should be eliminated without disturbing the decision making process. Let $D$ be a basic decision system and let $S = (U, A)$ be an information system. The set of all decision rules in $D$ having the same successor $s$, as decision attribute value, is denoted by $D_s$. We can say that $D_s$ is the decision subsystem associated with class $s$. The set of all predecessors, the condition attribute values, of the decision rules belonging to $D_s$ is denoted by $P_s$. We say that a decision rule $p \rightarrow s$ from $D$ is dispensable in $D$ if the disjunction of all elements in $P_s$ is equivalent to the disjunction of all elements in $P_s \setminus \{p\}$. Otherwise the rule is indispensable. If every decision rule belonging to $D_s$ is indispensable, the decision system $D_s$ is said to be independent. A subset $D'$ of $D$ is a reduct decision system if $D'$ is independent and the disjunction of all elements in $P_s$ is equivalent to the disjunction of all elements in $P_s'$. A decision subsystem $D_s$ associated with class $s$ is reduced if the reduct of $D_s$ is $D_s$ itself. If all the decision subsystems $D_s$ of a decision system are reduced, we say that the decision system is minimal.

Let us consider the decision table from Table. The associated decision system is shown in Table. The above theory implies that there are two minimal decision subsystems for class:

b. b d d

The set of objects from class is {, }. Rule number one covers the set {,}. Rule number two also covers the set {,}, where the set {} is covered by the condition bd and the set {} is covered by d. We have to choose between the two minimal decision systems that are shown below:

b b d d d b d d b d

3.5. Minimization Algorithm for Decision Systems (DSM)

Several minimal decision systems can be derived from a decision table. For general applications it is necessary to compute all the decision systems. However, if we are interested in a compact decision system we will look not only at the number of decision rules but also at the number of conditions in each rule. We want our decision system to be minimal in terms of both the number of rules and the number of conditions. For the example given above, the first decision system has fewer conditions than the second one.

The steps of the Decision System Minimization Algorithm are:

1. Having the reduced decision table for a set of objects U, establish the initial decision subsystems for each class;
2. For each decision subsystem, that is, for each class, repeat steps 3 to 8;
3. Initialize an empty set S;
4. Compute the union R of the set S with the set of objects covered by each decision rule of the initial decision system;
5. Find the decision rule whose associated set R has the greatest cardinality, that is, the rule that together with S covers the greatest number of objects. If there are several such rules, choose the rule with the minimum number of conditions. Memorize the decision rule;
6. Replace S by the union of S and the set of objects covered by the chosen decision rule;
7. If S is not equal to the set of objects U, go to step 4;
8. Generate the symbolic representation in disjunctive normal form of the minimal decision system for this class.

4. Extraction of Knowledge

Having in mind the rough set approach, and having the input weights and the threshold values for the given trained feedforward neural network, the decision rules represented by a network of perceptrons can be obtained in the following way:

Complete the decision table by listing all input data and the corresponding summations of weights. For each summation that is greater than or equal to the threshold value set the Boolean function to 1, else to 0.

Simplify the decision table. First, find the reducts for the condition attribute set (full discernibility). Remove duplicate rows. Find the value-reducts of the condition attributes (object discernibility). Again, remove redundant decision rules. We used the Rosetta system 8, 9 for the computation of the reducts and for the rule generation (Table ).

Apply the Decision System Minimization Algorithm (DSM) and find a minimal decision rule set.

5. Examples

5.1. Lupus Example

In the Lupus decision table every object corresponds to a patient's record and every attribute is a diagnostic criterion used to determine the presence of Systemic Lupus Erythematosus (SLE). There are 11 symptoms that are used as diagnostic criteria. The presence of four or more of these criteria in a patient implies the diagnosis of SLE. patient cases were considered, half with the diagnosis and half without, under the assumption of independence of each symptom. 5 cases were used for training the feedforward neural network and obtaining the weights.

We considered two representations. First, each patient is represented as an 11-component binary vector $X' = [x_1, x_2, \ldots, x_{11}]$. Applying the binary SM procedure 9 we found an expression of 5 terms, 56 terms of 3 conditions, 67 terms of 4 conditions and terms of conditions, a total of 44 conditions. Applying the new DSM we found an expression of 5 terms, 3 terms of 5 conditions, 64 terms of 4 conditions, and 59 terms of conditions, a total of 439 conditions.
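The greedy selection at the core of the DSM algorithm (steps 3 to 7 of Section 3.5) can be sketched in Python for a single decision class. This is a minimal sketch under stated assumptions, not the authors' implementation: the rule tuples, the rule names and the function `dsm_select` are hypothetical, and the rule coverage sets are assumed to have been computed beforehand from the reduced decision table.

```python
def dsm_select(rules, objects):
    """Greedily pick rules until every object of the class is covered.

    rules:   list of (name, n_conditions, covered_objects) tuples.
    objects: set of objects of the class that must be covered.
    Returns the names of the chosen rules in the order they were picked.
    """
    covered, chosen = set(), []
    while covered != objects:
        # Prefer the rule that adds the most uncovered objects;
        # break ties in favor of the rule with fewer conditions.
        best = max(rules, key=lambda r: (len((r[2] & objects) - covered), -r[1]))
        gain = (best[2] & objects) - covered
        if not gain:
            raise ValueError("remaining objects are not covered by any rule")
        chosen.append(best[0])
        covered |= gain
    return chosen


# Hypothetical rules for one class: (name, number of conditions, covered objects).
rules = [
    ("r1", 2, {1, 2}),
    ("r2", 1, {2, 3}),
    ("r3", 1, {3}),
]
print(dsm_select(rules, {1, 2, 3}))
```

In this toy run `r1` and `r2` both cover two new objects at first, so the tie is broken in favor of `r2`, which has fewer conditions; `r1` is then needed to cover object 1.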

For this first representation the performance of the two methods is almost the same. The rule expression is big and therefore hard to handle.

The second representation consisted of a 4-component compact MVL representation $Y = [y_1, y_2, y_3, y_4]$ for each patient. The relationship between the two representations is:

$y_1 = x_1, \quad y_2 = \sum_{i=2}^{5} x_i, \quad y_3 = \sum_{i=6}^{10} x_i, \quad y_4 = x_{11}.$

The input domain for $y_1$ and $y_4$ is {0, 1}, for $y_2$ is {0, 1, 2, 3, 4} and for $y_3$ is {0, 1, 2, 3, 4, 5}.

Applying MVL symbolic mapping, a 7 terms expression with a total of 5 conditions was found. For the DSM, a 7 terms expression with a total of 4 conditions was obtained. The rules obtained for the class, are shown in Table 3. Putting together all these rules we obtain the following expression:

{3,4} {5} {} {} {} {,3,4} {} {3,4} {} {,,3,4} {} {} {} d = y y y y y y y y y y y y y3

In this case the number of rules in the decision system is much smaller than for the binary representation, no matter which of the two methods is used. Shorter decision systems are better from a practical point of view, because an expert can use them much more easily. Comparing the two methods, the expression generated by our DSM method is shorter than the expression obtained with the MVL SM approach.

5.2. User Example

The User data 7 consists of a set of 49 two-dimensional continuous examples and is shown in Figure. This data belongs to two classes of computer usage at the University of London Computer Center. As for the MVL SM approach, in order to apply the DSM method we have to quantize the data first. After quantization 7 the decision table from Table 4 was obtained. The following rules were generated by applying the MVL SM approach:

{3} {,3} {,} {} {} {} {3} c = c = {} {,} {,3} {} {} {} {}

Applying our DSM method we obtained better results in terms of the length of the expression. The generated rules are shown in Table 5. The expressions for both classes are:

{3} {,,3} {} {} {} c = c = {} {,,3} {} {} {}

Again, we can observe that our DSM method generates shorter expressions or decision systems, even in this case when the data size is small.

6. Summary

This paper briefly presented the symbolic mapping approach to rule extraction and then the rough set approach and the DSM algorithm. Both approaches reveal the knowledge embedded in the numerical weights of a feedforward neural network with binary, MVL, or continuous inputs. We have shown on two examples that our method generates better results in terms of the number of rules and the number of conditions. For the binary example, the binary Lupus data, the results are almost the same. The second example, the MVL Lupus data, gives a much shorter expression, which can be used more easily by an expert. Also, the expression generated by the DSM is shorter than the one generated by the MVL SM. The same thing was observed in the case of the continuous example, the User data. In this case, for both approaches the continuous inputs were first transformed into MVL inputs through a quantization process.

The rules generated by the rough sets approach and the DSM algorithm covered all the objects from the initial decision table. The only open question is how accurate the output generated by the neural network was. Our method can be viewed as a rule-based classifier system based on neural networks, rough set theory and Boolean reasoning. Neural networks provide good generalization for data. Rough set theory handles MVL data very well in order to perform classifications and extract rules.

Further work will be done in order to show the importance of the neural networks in this hybrid system. We want to generate two decision systems: one from the decision table that contains all combinations of input values and the other from the
