Class Notes of DATA 525 (Fall 2023)
===================================

Monday, October 23
------------------

Slide 9.4:
----------
(5 - 1)**2 + (6 - 4)**2 = 20
(5 - 6)**2 + (6 - 5)**2 = 2
(5 - 1)**2 + (6 - 5)**2 = 17
....

Monday, October 16
------------------

Friday, October 13
------------------

Wednesday, October 11
---------------------
Entropy              = Σ[ -pj x log2(pj) ] for all j
Gini Index           = 1 - Σ( pj**2 ) for all j
Classification Error = 1 - max{ pj }

P(Bus) = 0.4    P(Car) = 0.3    P(Train) = 0.3

Entropy = -0.4xlog2(0.4) - 2x0.3xlog2(0.3)
        = -0.4xlog(0.4)/log(2) - 2x0.3xlog(0.3)/log(2)
        = 0.159/log(2) + 0.314/log(2)
        = 0.473 / 0.301
        = 1.57
Gini    = 1 - 0.4x0.4 - 0.3x0.3x2 = 0.66
Class   = 1 - 0.4 = 0.6

log2(x) = log10(x) / log10(2)

Travel Cost / Km:

Cheap:
  P(Bus) = 0.8    P(Car) = 0.0    P(Train) = 0.2
  Entropy = -0.8xlog2(0.8) - 0.2xlog2(0.2) = 0.72
  Gini    = 1 - 0.8x0.8 - 0.2x0.2 = 0.32
  Class   = 1 - 0.8 = 0.2

Standard:
  P(Bus) = 0.0    P(Car) = 0.0    P(Train) = 1.0
  Entropy = -1.0xlog2(1.0) = 0
  Gini    = 1 - 1.0x1.0 = 0
  Class   = 1 - 1.0 = 0

Expensive:
  P(Bus) = 0.0    P(Car) = 1.0    P(Train) = 0.0
  Entropy = -1.0xlog2(1.0) = 0
  Gini    = 1 - 1.0x1.0 = 0
  Class   = 1 - 1.0 = 0

Information gain( Travel )
  = Entropy of parent table D - Σ( |k|/|n| × Entropy of each value k of subset table Si )
  = 1.57 - ( 5/10 x 0.72 )
  = 1.21
Gini:
Class:

Monday, October 09
------------------

Friday, October 06
------------------
PageRank

1. Web Hyperlink Matrix

          0     1     0     0
   H = [  0     0     1     0    ]
          0.5   0     0     0.5
          0     0     0     0

2. Dangling Node Fix

   S = H + d x w

          0     1     0     0          0
   S = [  0     0     1     0    ] + [ 0 ] x [ 1/4  1/4  1/4  1/4 ]
          0.5   0     0     0.5        0
          0     0     0     0          1

          0     1     0     0           0     0     0     0
     = [  0     0     1     0    ] + [  0     0     0     0    ]
          0.5   0     0     0.5         0     0     0     0
          0     0     0     0           1/4   1/4   1/4   1/4

          0       1       0       0
     = [  0       0       1       0       ]
          0.5     0       0       0.5
          0.25    0.25    0.25    0.25

3. G (Google Matrix) = αS + (1 - α)Iv

                 0       1       0       0
     = 0.85 x [  0       0       1       0       ] +
                 0.5     0       0       0.5
                 0.25    0.25    0.25    0.25

                 1
       0.15 x [  1  ] x [ 0.25  0.25  0.25  0.25 ]
                 1
                 1

          0       0.85    0       0
     = [  0       0       0.85    0       ] +
          0.425   0       0       0.425
          0.2125  0.2125  0.2125  0.2125

          0.0375  0.0375  0.0375  0.0375
       [  0.0375  0.0375  0.0375  0.0375  ]
          0.0375  0.0375  0.0375  0.0375
          0.0375  0.0375  0.0375  0.0375

          3/80    71/80   3/80    3/80
     = [  3/80    3/80    71/80   3/80    ]
          37/80   3/80    3/80    37/80
          1/4     1/4     1/4     1/4

4. PageRank vector = Pi**0 x G = Pi**1

                                   3/80    71/80   3/80    3/80
     = [ 1/4  1/4  1/4  1/4 ] x [  3/80    3/80    71/80   3/80    ]
                                   37/80   3/80    3/80    37/80
                                   1/4     1/4     1/4     1/4

       (1 x 4) x (4 x 4) = (1 x 4)

     = [ (3/80+3/80+37/80+20/80)/4 ... ]
     = [ 63/320 ... ]
     = [ 0.197 ... ]
     = Pi**1

   Pi**1 x G = Pi**2

                                           3/80    71/80   3/80    3/80
     = [ 0.197  0.303  0.303  0.197 ] x [  3/80    3/80    71/80   3/80    ]
                                           37/80   3/80    3/80    37/80
                                           1/4     1/4     1/4     1/4

     = [ (3/80)x0.197+(3/80)x0.303+(37/80)x0.303+(1/4)x0.197 ... ]
     = Pi**2

Slide 7.5
---------
0.023 + 0.023 = 0.046 =~ 0.045
0.166 =~ 0.061 + 0.035 + 0.071 = 0.167
0.304 / 5 = 0.0608 =~ 0.061

Friday, September 29
--------------------

Slide 6.17
----------
Age: 40
Salary: 90
Class: G

Wednesday, September 27
-----------------------

Slide 6.7
---------
1: the->of->and
2: by->on
3: not->must->each
4: ...

Slide 6.5
---------
                 a    a    b    b
aabb: start => 0 => 0 => 1 => 2 => 3 (accepted)

                 a    a    a
aaab: start => 0 => 0 => 1 => (failed)

                 a    a    b    b
aabb: start => 0 => 1 => 1 => 2 => 3 (accepted)

Friday, September 08
--------------------
rwx rwx rwx
--- --- ---
me   g  3rd
--- --- ---
111 001 110
--- --- ---
 7   1   6

Wednesday, August 30
--------------------
Page 1    Page 2    Page 3    ...    Page 500

Search for HTML on the 500 pages.
_________________________________

Inverted list:
C:    210, 390
HTML: 10, 15, 400, 460
Java: 4, 28, 300
...
XML:  70, 31, 300, 400, 410

Search for HTML on the keywords.
_________________________________

Web page structure as a tree:

             und.edu
 |     |       |       |     |
 DS  sports research  CEM   ...
 |     |       |
 class degree ... ...
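
The impurity measures and the information gain for Travel Cost / Km can be checked with a short script. This is a minimal sketch, not a full decision-tree implementation: the function names are made up for illustration, and only the Cheap subset (weight 5/10, from the notes) is passed in because the Standard and Expensive subsets are pure and contribute zero entropy.

```python
import math

def entropy(probs):
    """Entropy in bits; classes with probability 0 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini index: 1 minus the sum of squared class probabilities."""
    return 1 - sum(p * p for p in probs)

def classification_error(probs):
    """Classification error: 1 minus the largest class probability."""
    return 1 - max(probs)

def information_gain(parent, subsets):
    """subsets is a list of (weight, probs) pairs, weight = subset size / parent size."""
    return entropy(parent) - sum(w * entropy(p) for w, p in subsets)

parent = [0.4, 0.3, 0.3]   # P(Bus), P(Car), P(Train)
cheap = [0.8, 0.0, 0.2]

print(round(entropy(parent), 2))               # 1.57
print(round(gini(parent), 2))                  # 0.66
print(round(classification_error(parent), 2))  # 0.6
print(round(entropy(cheap), 2))                # 0.72
# Pure subsets (Standard, Expensive) have entropy 0, so they are omitted here.
print(round(information_gain(parent, [(5 / 10, cheap)]), 2))  # 1.21
```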
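
The four PageRank steps (hyperlink matrix, dangling node fix, Google matrix, PageRank vector) can be reproduced in plain Python. This is a sketch under the same assumptions as the notes: α = 0.85 and a uniform vector for both w and v. The helper names google_matrix and step are illustrative only.

```python
def google_matrix(H, alpha=0.85):
    """Build G = alpha*S + (1 - alpha)*(1/n), where S is H with dangling rows made uniform."""
    n = len(H)
    G = []
    for row in H:
        # Dangling node fix: a row with no out-links becomes [1/n, ..., 1/n].
        s_row = row if any(row) else [1.0 / n] * n
        G.append([alpha * s + (1 - alpha) / n for s in s_row])
    return G

def step(pi, G):
    """One iteration: row vector pi times matrix G."""
    n = len(G)
    return [sum(pi[i] * G[i][j] for i in range(n)) for j in range(n)]

H = [
    [0,   1, 0, 0],
    [0,   0, 1, 0],
    [0.5, 0, 0, 0.5],
    [0,   0, 0, 0],    # dangling node
]

G = google_matrix(H)
pi0 = [0.25, 0.25, 0.25, 0.25]
pi1 = step(pi0, G)
pi2 = step(pi1, G)

print([round(x, 3) for x in pi1])  # [0.197, 0.303, 0.303, 0.197]
print([round(x, 3) for x in pi2])
```

Calling step repeatedly until the vector stops changing gives the final PageRank vector.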
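
The rwx table (111 001 110 giving 7 1 6) is a binary-to-octal conversion per group. A small sketch, with hypothetical helper names, that converts the bit groups both ways:

```python
def to_octal(bits):
    """Convert space-separated 3-bit groups, e.g. '111 001 110', to octal digits."""
    return "".join(str(int(group, 2)) for group in bits.split())

def to_symbolic(bits):
    """Show each 3-bit group as r/w/x flags, with '-' for a cleared bit."""
    groups = []
    for group in bits.split():
        groups.append("".join(flag if bit == "1" else "-" for flag, bit in zip("rwx", group)))
    return " ".join(groups)

print(to_octal("111 001 110"))     # 716
print(to_symbolic("111 001 110"))  # rwx --x rw-
```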
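
The inverted list idea (look up the keyword instead of scanning all 500 pages) can be sketched as follows. The three page texts are made-up sample data, not the pages from class, and build_inverted_index is an illustrative name.

```python
def build_inverted_index(pages):
    """Map each keyword to the sorted list of page numbers that contain it."""
    index = {}
    for page_no in sorted(pages):
        for word in set(pages[page_no].split()):
            index.setdefault(word, []).append(page_no)
    return index

# Hypothetical pages, for illustration only.
pages = {
    1: "HTML and XML basics",
    2: "Java and C",
    3: "HTML forms with Java",
}

index = build_inverted_index(pages)
print(index["HTML"])            # [1, 3]
print(index.get("Python", []))  # []
```

A search then costs one dictionary lookup rather than a pass over every page.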