# Data Warehousing

COMP9318 (12S1) ASSIGNMENT 1, DUE ON 23:59 21 APR, 2012 (SAT)

Q1. (30 marks)

Consider the restaurant section of http://www.eatability.com.au. One of its basic functions is that a registered user can rate restaurants. E.g., look at the "Add a Review for 168" part at http://www.eatability.com.au/au/sydney/168/.

You are asked to help the company design a star schema to analyze various ratings of restaurants. You also need to list or draw the hierarchies associated with each dimension. You only need to show up to three hierarchies for any dimension.

Write down an MDX query that lists the top-5 restaurants in NSW in terms of food scores in 2011; for each of the restaurants listed,

Sort the data:

13 15 16 16 19 20 20 21 22 22 25 25 25 25 33 33 35 35 35 35 36 40 45 46 52 70

Smallest value: A=13

Biggest value: B=70

Total number of values: N=26

Partition into (equi-depth) bins:

Smoothing by bin means:
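The two steps named above, equi-depth partitioning and smoothing by bin means, can be sketched in Python as follows. This is only a minimal illustration: the bin depth is not stated in the text here, so `depth=13` is an assumed value picked because it divides N=26 evenly, and the function names are hypothetical.

```python
from statistics import mean

def equi_depth_bins(values, depth):
    """Split the sorted values into consecutive bins of `depth` items each.

    The last bin holds fewer items when len(values) is not a multiple of depth.
    """
    ordered = sorted(values)
    return [ordered[i:i + depth] for i in range(0, len(ordered), depth)]

def smooth_by_bin_means(bins):
    """Replace every value in a bin with that bin's mean."""
    return [[mean(b)] * len(b) for b in bins]

data = [13, 15, 16, 16, 19, 20, 20, 21, 22, 22, 25, 25, 25, 25,
        33, 33, 35, 35, 35, 35, 36, 40, 45, 46, 52, 70]

# depth=13 is an assumption for illustration, not a value given in the question
bins = equi_depth_bins(data, depth=13)
for original, smoothed in zip(bins, smooth_by_bin_means(bins)):
    print(original, "->", smoothed[0])
```

Changing `depth` to the value required by the question gives the corresponding partition; the smoothing step is unchanged.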

(2) Use the MaxDiff method to obtain the bin boundaries (using bin means to smooth the data within a bin and using a total of 9 bins). Also calculate its SSE (Sum of Square Error).

You need to illustrate your steps.

Unsorted data: 15, 19, 22, 25, 25, 45, 13, 16, 16, 33, 20, 20, 21, 22, 52, 70, 25, 25, 40, 46, 35, 35, 35, 33, 36, 35.

Sort the data:

13 15 16 16 19 20 20 21 22 22 25 25 25 25 33 33 35 35 35 35 36 40 45 46 52 70

Using bin means to smooth the data within a bin:
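As a rough sketch of part (2), the Python below assumes that MaxDiff places the bin boundaries at the largest gaps between adjacent sorted values, and that ties between equal gaps are broken in favour of the earlier position. Both the tie-breaking rule and the function names are assumptions made for illustration; the course's definition may differ. The SSE is computed against each bin's mean, matching the smoothing method the question asks for.

```python
def maxdiff_bins(values, n_bins):
    """Cut the sorted values at the n_bins - 1 largest adjacent gaps."""
    ordered = sorted(values)
    # gap at index i lies between ordered[i] and ordered[i + 1]
    gaps = [(ordered[i + 1] - ordered[i], i) for i in range(len(ordered) - 1)]
    # largest gaps first; equal gaps go to the earlier position (assumed rule)
    largest = sorted(gaps, key=lambda g: (-g[0], g[1]))[:n_bins - 1]
    cuts = sorted(i + 1 for _, i in largest)
    edges = [0] + cuts + [len(ordered)]
    return [ordered[a:b] for a, b in zip(edges, edges[1:])]

def sse(bins):
    """Sum of squared errors of each value against its bin mean."""
    total = 0.0
    for b in bins:
        m = sum(b) / len(b)
        total += sum((x - m) ** 2 for x in b)
    return total

data = [15, 19, 22, 25, 25, 45, 13, 16, 16, 33, 20, 20, 21, 22,
        52, 70, 25, 25, 40, 46, 35, 35, 35, 33, 36, 35]

bins = maxdiff_bins(data, n_bins=9)
for b in bins:
    print(b, "mean =", sum(b) / len(b))
print("SSE =", sse(bins))
```

With this data the eighth boundary falls on a tie between two gaps of size 2, so the resulting bins and SSE depend on the tie-breaking rule chosen.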

Q4. (20 marks)

Consider the following database of strings.

| ID | String   |
|----|----------|
| 1  | ababa    |
| 2  | abcaba   |
| 3  | aaacabc  |
| 4  | abcdefgh |

(1) What is the edit distance between the first and the third string? You need to show the complete D(i, j) matrix.

    int LevenshteinDistance(char s[1..m], char t[1..n])
    {
        declare int d[0..m, 0..n]
        for i from 0 to m
            d[i, 0] := i
        for j from 0 to n
            d[0, j] := j
        for j from 1 to n
        {
            for i from 1 to m
            {
                if s[i] = t[j] then
                    d[i, j] := d[i-1, j-1]   // no operation required
                else
                    d[i, j] :=
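The pseudocode above breaks off in the `else` branch. A minimal Python sketch of the same dynamic program follows, assuming the missing branch is the standard unit-cost Levenshtein recurrence (one plus the minimum of the deletion, insertion, and substitution cells); that completion is an assumption, not something shown in the text. The function name is hypothetical, and it returns the whole matrix so that D(i, j) can be displayed.

```python
def edit_distance_matrix(s, t):
    """Return the full (len(s)+1) x (len(t)+1) edit-distance matrix."""
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i              # delete the first i characters of s
    for j in range(n + 1):
        d[0][j] = j              # insert the first j characters of t
    for j in range(1, n + 1):
        for i in range(1, m + 1):
            if s[i - 1] == t[j - 1]:
                d[i][j] = d[i - 1][j - 1]        # no operation required
            else:
                # assumed completion: unit-cost deletion, insertion, substitution
                d[i][j] = 1 + min(d[i - 1][j],
                                  d[i][j - 1],
                                  d[i - 1][j - 1])
    return d

matrix = edit_distance_matrix("ababa", "aaacabc")
for row in matrix:
    print(" ".join(str(cell) for cell in row))
print("edit distance:", matrix[-1][-1])
```

Rows correspond to prefixes of the first string and columns to prefixes of the third; the bottom-right cell is the edit distance.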
