Vat ly dai cuong

Chia sẻ bởi Trần Văn Hậu | Ngày 18/03/2024 | 26

Chia sẻ tài liệu: Vat ly dai cuong thuộc Vật lý 11

Nội dung tài liệu:

1
Rule Discovery from Time Series
Vo Thi Ngoc Chau
2009
Seminar on Time Series Data Mining
Faculty of Computer Science and Engineering
Ho Chi Minh City University of Technology
2
Content
Introduction
Rule Discovery Process
Review on Rule Discovery from Time Series
Problem Statement
Conclusion
3
We have seen the Classic Time Series Data Mining Tasks…
Clustering
Classification
Query by Content
4
Let’s look at some problems which are more interesting…
Rule Discovery
10

s = 0.5
c = 0.3

Novelty
Detection
Motif Discovery
5
Introduction
Time Series
Temporal aspect (history orientation)
Huge data volume
A broad range of application domains
Rule Discovery from Time Series
Interesting
But tough
6
Rule Discovery Process
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
7
Association
Rules
Items
Transactional
Data
Rule Discovery Process - Traditional
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
Transaction Items_bought
---------------------------------
2000 A, B, C
1000 A, C
4000 A, D
5000 B, E, F
…
A, B, C, D, F, …
A  C (50%, 66.6%)
C  A (50%, 100%)
…
8
Time Series
Patterns
Rules
Rule Discovery Process – Time Series
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
10

s = 0.5
c = 0.3
…
…
…
9
Review on Rule Discovery from Time Series
I. Patterns from Time Series
II. Rules from Time Series
III. Algorithms of Rule Discovery from Time Series
IV. User Support
10
Time Series
Patterns
Rules
I. Patterns from Time Series
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
10

s = 0.5
c = 0.3
…
…
…
I.
11
I. Patterns from Time Series
Meaning
Length
Shape
Overlapping
Temporal aspect
Extraction (Preprocessing Phase)
12
Time Series
Patterns
Rules
II. Rules from Time Series
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
10

s = 0.5
c = 0.3
…
…
…
II.
13
II. Rules from Time Series
Point/Interval-based rules
Inter/Intra-transaction rules
Predictive rules
Single/Multiple time series rules
Association (dependence, correlation, casual, classification) rules
14
Time Series
Patterns
Rules
III. Algorithms of Rule Discovery from Time Series
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
10

s = 0.5
c = 0.3
…
…
…
III.
15
III. Algorithms of Rule Discovery from Time Series
In-memory algorithms
Index structures
Measures
Algorithm analysis
16
Time Series
Patterns
Rules
IV. User Support
Raw Data
Items of Interest
Relationships among Items
(Rules)
User
Pre-processing
Mining
Post-processing
10

s = 0.5
c = 0.3
…
…
…
IV.
17
IV. User Support
Rule interpretation
Rule visualization
End-users/Data mining application developers
18
Review on Rule Discovery from Time Series
[Das98] G. Das, K-I. Lin, H. Mannila, G. Renganathan, P. Smyth, Rule discovery from time series, 1998.
[Au04] W. H. Au, K. C. C. Chan, Mining fuzzy rules for time series classification, 2004.
[Daf05] P. Dafas, A. S. Garcez, Applied temporal rule mining to time series, 2005.
[Sac07] L. Sacchi, C. Larizza, C. Combi, R. Bellazzi, Data mining with temporal abstractions: learning rules from time series, 2007.
[Nam08] H. Nam, K. Lee, D. Lee, Identification of temporal association rules from time-series microarray data set, 2008.
[Ha09] Y. Ha, S. Park, S. Kim, J. Won, J. Yoon, A stock recommendation system exploiting rule discovery in stock databases, 2009.
19
Review on [Das98] - Rule discovery from time series, 1998
20
Rule format:

The frequency F(A) = the number of occurrences of A =

The frequency F(A, B, T) = the number of occurrences of A that are followed by B within T units =

The confidence c of the rule = the fraction of occurrences of A that are followed by B within T units = F(A, B, T)/F(A)
J-measure for rule-ranking:
If A occurs,
then B occurs within time T.
Review on [Das98] - Rule discovery from time series, 1998
21
Review on [Das98] - Rule discovery from time series, 1998
22
Review on [Das98] - Rule discovery from time series, 1998
23
Review on [Das98] - Rule discovery from time series, 1998
24
Review on [Au04] - Mining fuzzy rules for time series classification, 2004
Given
n time series: S1, …, Sn
Si = (si1, si2, …, siw)
m predefined classes: C1, …, Cm
h fuzzy sets: F1, …, Fh
Given by human experts or generated from data
Membership function μFk of Fk, k=1..h

where [l1, l2]  R
25
Rule format: Fp  Cq [wFp,Cq]
Fp: a set of fuzzy sets F1, …, Fh w.r.t. , a set of points, where |F| >= 1 and || >= 1

Fp = {Fk | (Fk, μijk)  gij  j    k  {1, …, h}}

Cq: a class label where q = 1..m
wFp,Cq : weight of evidence  the importance of fuzzy rules
Given time series Si  fuzzy time series Gi
Si = (si1, si2, …, siw)
Gi = (gi1, gi2, …, giw)
where gij = {(F1, μij1), …, (Fh, μijh)},
μij1 = μF1(sij), …, μijh = μFh(sij), j = 1..w
Review on [Au04] - Mining fuzzy rules for time series classification, 2004
26
The fuzzy rule mining algorithm
Review on [Au04] - Mining fuzzy rules for time series classification, 2004
27
First-order fuzzy rule: a rule involving one fuzzy set in its antecedent
Interesting (Fp, Cq) = dFpCq
A normal distribution
|dFpCq|>1.96  the association between Fp and Cq is interesting.
dFpCq > + 1.96  the presence of Fp implies the presence of Cq.
dFpCq < -1.96  the presence of Fp implies the absence of Cq.
Review on [Au04] - Mining fuzzy rules for time series classification, 2004
28
Review on [Daf05] - Applied temporal rule mining to time series, 2005
Patterns
Subsequences of arbitrary length
Clustering these subsequences
K-means technique
K is predefined.
Each cluster C is associated with a symbol a.
In the frequency domain
Subsequences are transformed using the discrete cosine transform.
Noise reduction
Not overlapping or a little (user-dependent)
29
Review on [Daf05] - Applied temporal rule mining to time series, 2005
Rule format:
Ai: a pattern for time i, Bj: a pattern for time j
Interpretation: if Ai occurs, then Bi occurs in exactly T time units.
The frequency F(Ai) = the number of occurrences of Ai divided by n
The confidence c of a rule = the fraction of occurrences of Ai which are followed by another shape Bj in T time units

The J-measure for rule ranking
30
Review on [Daf05] - Applied temporal rule mining to time series, 2005
A time series of the cylinder, bell, funnel patterns and its corresponding discrete version FFBCFFFBC… using subsequence clustering on the DCT domain (pattern length = 100, k = 3)
31
Review on [Daf05] - Applied temporal rule mining to time series, 2005
Theoretical rules
Discovered rules
32
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Pattern
A behavior or property that we may want to distinguish in the data, e.g. an increasing trend, an up/down behavior
Associated with a time interval in which such behavior occurs
Related to a qualitative representation of the property that we are looking for, which may be interesting in the problem domain
Temporal abstraction
A description of a (set of) time series through sequences of temporal intervals corresponding to relevant patterns detected in their time courses
33
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Basic temporal abstractions (TAs)
State TAs extract the intervals in which values are within a predefined range.
State TAs correspond to expressions like “high arterial/blood pressure” or “low temperature”.
Trend TAs represent increase, decrease, and stationary patterns in a numerical time series.
Common parameters
Granularity
Minimal extent: the permitted minimal time-span of an episode
34
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
35
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Complex temporal abstractions
Used to detect patterns characterized by behaviors which can’t be represented by basic TAs
Corresponding to patterns whose time intervals are based on temporal relationships defined in Allen’s algebra (13 temporal operators)
Originating from the same and from different time series
36
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
37
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Allen’s temporal operators on two periods of time
BEFORE (period1, period2) 11111 22222

EQUAL (period1, period2) 11111
22222

MEETS (period1, period2) 1111122222

OVERLAPS (period1, period2) 11111
22222

DURING (period1, period2) 11111
22222222

STARTS (period1, period2) 11111
22222222

FINISHES (period1, period2) 11111
22222222
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
TRUE
38
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Temporal association rule
Temporal relationships (OVERLAPS, FINISHED-BY, MEETS, BEFORE, EQUALS, STARTS  PRECEDES) between the complex temporal patterns
Rule format: A  pC
A: the set {a1, a2, a3, …, an}  AoI (set of abstractions of interest)
p: a triple (LS-left shift, G-gap, RS-right shift)
C: an abstraction  AoI
Interpretation: a temporal association rule holds when all the patterns in the antecedent intersect and when the relation PRECEDES holds between the intervals of this intersection and an episode of the pattern in the consequent.
39
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Definition of the set AoI of abstractions of interest
40
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
41
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
42
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
43
Support = RTS / TSO
RTS: the rule time span corresponding to the union of the episodes in which both the antecedent and the consequent of the rule occur
TSO: the time span, i.e. the total duration, of the observation period over which the rule is derived
Interpretation: a measure of how the rule is ‘spread’ over the observation time span
Confidence = NARTS / NAT
NARTS: the number of times (episodes) the antecedent occurs during RTS
NAT: the number of time (episodes) the antecedent occurs during TSO
Interpretation: the frequency of the rule over the total amount of episodes of the antecedent
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
44
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
45
Review on [Sac07] - Data mining with temporal abstractions: learning rules from time series, 2007
Algorithm for extracting temporal association rules
46
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
47
Discrete values: up, down, and none
Temporal item: a discrete value + its position in rules + time stamp
Example: g1L is an “up” discrete value at the time point “1” expected to appear on the left hand side of temporal association rules.
Temporal item set: a non-empty set of temporal items
Temporal frequent item set: a temporal item set whose support is greater than or equal to a support threshold (MinSup)
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
48
Rule format: LHS  (∆) RHS
LHS & RHS: disjoint temporal frequent item sets.
∆ (association interval): the interval of different two time stamps of the LHS and RHS. ∆ = 0..n.
Interpretation: RHS follows LHS after ∆.
Temporal transaction set: a combination of the transaction at the time t (t=0..n-∆) and the transaction at the time t + ∆.
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
49
Support of a temporal item set = the number of occurrences of the temporal item set / the number of temporal transaction sets
Confidence of a temporal association rule = support (LHS  RHS)/support (LHS)
Precision = (# of matched rules)/(# of extracted rules)
A measure of correctness of the extracted rules
Precision  [0, 1]
The KEGG cell cycle regulation path (known)
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
50
Review on [Nam08] - Identification of temporal association rules from time-series microarray data set, 2008
a temporal transaction set
# of temporal transaction sets = 5
Support ({g1L}) = 3/5 = 60%
Support ({g2R}) = 3/5 = 60%
Support ({g3R}) = 3/5 = 60%
Support ({g2L}) = 3/5 = 60%
Support ({g1L, g2R}) = 3/5 = 60%
Support ({g1L, g3R}) = 1/5 = 20%
Confidence (g1  (2) g2) =
Support ({g1L, g2R}) / support({g1L})
Confidence (g1  (2) g2) = 60%/60% Confidence (g1  (2) g2) = 1
51
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Rule format:
H: an event corresponding to an appearance of a pattern P within the time range of the rule head
B: an event corresponding to the characteristics of stock prices within the time range of the rule body. Investor-specified rule body conditions decide the characteristics of stock prices (INCREASE, DECREASE, UNCHANGED).
s = support (H) = (number of occurrences of a pattern P corresponding to H)*100/(number of occurrences of patterns whose lengths are same as that of the pattern P corresponding to H)
c = confidence (H, B) = (number of occurrences of patterns that are matched to H and also satisfied with the condition on B)*100/(number of occurrences of patterns that are matched to H)
Interpretation: B happens after time t since H has occurred.
52
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Changing patterns of stock prices
53
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Process flow for finding frequent patterns
54
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
55
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
Categorization
 A symbol sequence S’’ =
56
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
The Apriori algorithm to discover frequent patterns
Discover a set of frequent patterns F1 of length 1 by scanning S’’
Perform the self-join of Fk-1 (2<=k<=length(S’’)) to produce a set of candidate patterns Ck of length k
For each pattern  in Ck, scan S’’ to decide if the support of  is greater than or equal to MinSup
If yes, then  is inserted into Fk.
57
Review on [Ha09] - A stock recommendation system exploiting rule discovery in stock databases, 2009
58
Review – Summary on Patterns
59
Review – Summary on Rules
60
Review – Summary on Algorithms
61
Review – Summary on User Support
62
Problem Statement
Temporal aspect investigation for rules
Items (Patterns) of interest
Rule model
Rule discovery technique evaluation (analysis)
Choice of measures (support, confidence, j-measure, weight of evidence, …)
User support
63
Conclusion
Rule discovery from time series
Rule discovery process
Several issues regarding time series rule discovery
Patterns
Rules
Rule discovery algorithms
User support
Problem statement
64
Thank you.

* Một số tài liệu cũ có thể bị lỗi font khi hiển thị do dùng bộ mã không phải Unikey ...

Người chia sẻ: Trần Văn Hậu

Dung lượng: | Lượt tài: 1

Loại file:

Nguồn : Chưa rõ

(Tài liệu chưa được thẩm định)

Tải tài liệu

Báo cáo sai sót

Đề suất chỉnh sửa , phân lại mục