#LyX 1.4.3 created this file. For more info see http://www.lyx.org/
\lyxformat 245
\begin_document
\begin_header
\textclass article
\language english
\inputencoding auto
\fontscheme default
\graphics default
\paperfontsize default
\papersize default
\use_geometry false
\use_amsmath 1
\cite_engine basic
\use_bibtopic false
\paperorientation portrait
\secnumdepth 3
\tocdepth 3
\paragraph_separation indent
\defskip medskip
\quotes_language english
\papercolumns 1
\papersides 1
\paperpagestyle default
\tracking_changes false
\output_changes false
\end_header
\begin_body
\begin_layout Title
FireStats scalability solution
\end_layout
\begin_layout Section
Problem
\end_layout
\begin_layout Standard
As users visit the site, FireStats collects data.
The more data the database holds, the slower the admin page gets and the
larger the database becomes.
\end_layout
\begin_layout Section
Solution requirements
\end_layout
\begin_layout Standard
The idea is to aggregate data in a way that supports existing needs and
the current APIs while using less database storage.
The aggregated, or 'compressed', data will
\emph on
not
\emph default
be as detailed as the recent data, but this is acceptable.
\newline
The solution should
support the following future features as well:
\end_layout
\begin_layout Itemize
Dynamic graphs: dynamic graphs will need to query the database for hits
in a time range (between X and Y).
The time range can be completely outside the archived period, completely
inside it, or partially inside it.
All of these cases must be handled.
\end_layout
\begin_layout Itemize
The archive design should be flexible enough to allow the addition of more
data; examples of such data are user agents or the number of hits per URL.
\end_layout
\begin_layout Itemize
The design should make it easy to set a baseline value for variables, for
example to set the initial number of hits and unique hits per site.
\end_layout
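\begin_layout Standard
The three cases in the dynamic graphs requirement can be told apart with
 a simple interval test.
 Below is a minimal Python sketch, assuming half-open time windows and an
 archive that covers a single contiguous period; the names Overlap and classify
 are hypothetical and not part of FireStats.
\end_layout
\begin_layout LyX-Code
```python
from datetime import datetime
from enum import Enum


class Overlap(Enum):
    OUTSIDE = "outside"  # the query does not touch the archived period
    INSIDE = "inside"    # the query is fully covered by the archive
    PARTIAL = "partial"  # the query needs both archived and live data


def classify(query_start: datetime, query_end: datetime,
             archive_start: datetime, archive_end: datetime) -> Overlap:
    """Classify the half-open query window [query_start, query_end)
    against the archived period [archive_start, archive_end)."""
    if query_end <= archive_start or query_start >= archive_end:
        return Overlap.OUTSIDE
    if archive_start <= query_start and query_end <= archive_end:
        return Overlap.INSIDE
    # Straddles a boundary, or contains the whole archived period.
    return Overlap.PARTIAL
```
\end_layout
\begin_layout Standard
A PARTIAL result means the query has to be split at the archive boundary,
 with each part answered from its own storage.
\end_layout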
\begin_layout Section
New tables schema
\end_layout
\begin_layout Standard
NOTE: THE REST OF THE DOCUMENT IS OUTDATED.
\end_layout
\begin_layout Standard
The main idea is to create an archive based on time ranges.
The granularity of each range is not yet decided, but it will be no shorter
than one day.
Each such range will have several data elements associated with it, such
as page views, unique visitors, the number of views of each page during
that range, and so on.
\end_layout
\begin_layout Subsection
archive_ranges
\end_layout
\begin_layout Standard
This table holds the time ranges that the archive data is grouped into.
\end_layout
\begin_layout Standard
\begin_inset Tabular
|
\begin_inset Text
\begin_layout Standard
Name
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Type
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Description
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
range_id
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Key
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Range ID
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
range_start
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
DateTime
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
The time this range begins
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
range_end
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
DateTime
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
The time this range ends
\end_layout
\end_inset
|
\end_inset
\end_layout
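\begin_layout Standard
A minimal sketch of the archive_ranges table, using Python's built-in sqlite3
 module so it runs without a MySQL server.
 The SQL types, the CHECK constraint and the helper ranges_overlapping are
 assumptions, not part of this design; DateTime values are stored as 'YYYY-MM-DD
 HH:MM:SS' text, which sorts chronologically.
\end_layout
\begin_layout LyX-Code
```python
import sqlite3

# Hypothetical SQLite DDL for archive_ranges; column names follow the table
# above, the types and the CHECK constraint are assumptions.
ARCHIVE_RANGES_DDL = """
CREATE TABLE archive_ranges (
    range_id    INTEGER PRIMARY KEY,  -- the "Key" column
    range_start TEXT NOT NULL,        -- DateTime as 'YYYY-MM-DD HH:MM:SS'
    range_end   TEXT NOT NULL,
    CHECK (range_start < range_end)
)
"""


def create_archive_ranges(conn: sqlite3.Connection) -> None:
    conn.execute(ARCHIVE_RANGES_DDL)


def ranges_overlapping(conn: sqlite3.Connection, start: str, end: str) -> list:
    """Return the ids of ranges that intersect the half-open window [start, end)."""
    rows = conn.execute(
        "SELECT range_id FROM archive_ranges "
        "WHERE range_start < ? AND range_end > ? ORDER BY range_start",
        (end, start),
    )
    return [row[0] for row in rows]
```
\end_layout
\begin_layout Standard
ranges_overlapping returns the ranges a time window touches, which is the
 lookup a dynamic graph query starts from.
\end_layout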
\begin_layout Subsection
archive_data
\end_layout
\begin_layout Standard
This table holds the archive data.
\end_layout
\begin_layout Standard
\begin_inset Tabular
|
\begin_inset Text
\begin_layout Standard
Name
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Type
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Description
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
range_id
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Key
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Range ID matching a row in the archive_ranges table
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
site_id
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Key
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
The ID of the site this row belongs to
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
datype
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
int
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
The type of the data element
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
key_value
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
int
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Optional key (-1 when unused); its meaning depends on the value of the datype
field in the row.
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
value
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
int
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
The actual value of the data; its meaning depends on the datype and
key_value fields.
\end_layout
\end_inset
|
\end_inset
\newline
\end_layout
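\begin_layout Standard
A possible SQLite definition of archive_data, written with Python's sqlite3
 module.
 The composite primary key over (range_id, site_id, datype, key_value) is
 an assumption: the table above only marks range_id and site_id as keys,
 but one row per data element also needs datype and key_value to be part
 of the key.
 The helper total is hypothetical.
\end_layout
\begin_layout LyX-Code
```python
import sqlite3

# Hypothetical SQLite DDL; the composite primary key is an assumption.
SCHEMA = """
CREATE TABLE archive_ranges (
    range_id    INTEGER PRIMARY KEY,
    range_start TEXT NOT NULL,
    range_end   TEXT NOT NULL
);
CREATE TABLE archive_data (
    -- REFERENCES is enforced only with PRAGMA foreign_keys = ON
    range_id  INTEGER NOT NULL REFERENCES archive_ranges(range_id),
    site_id   INTEGER NOT NULL,
    datype    INTEGER NOT NULL,
    key_value INTEGER NOT NULL DEFAULT -1,  -- -1 when the datype uses no key
    value     INTEGER NOT NULL,
    PRIMARY KEY (range_id, site_id, datype, key_value)
);
"""


def total(conn: sqlite3.Connection, site_id: int, datype: int, key_value: int = -1) -> int:
    """Sum one data element over every archived range of a site."""
    (result,) = conn.execute(
        "SELECT COALESCE(SUM(value), 0) FROM archive_data "
        "WHERE site_id = ? AND datype = ? AND key_value = ?",
        (site_id, datype, key_value),
    ).fetchone()
    return result
```
\end_layout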
\begin_layout Standard
This table describes possible types for the
\emph on
datype
\emph default
column and the meaning of the key_value column for each one.
\end_layout
\begin_layout Description
\begin_inset Tabular
|
\begin_inset Text
\begin_layout Standard
Constant (value)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
key_value
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Description
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
SITE_VIEWS(1)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Unused (-1)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Number of page views for this site in the matching time range
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
SITE_VISITORS(2)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Unused (-1)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Number of unique visitors to this site in the matching time range
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
PAGE_VIEWS(3)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
url_id
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Number of times a particular page has been viewed in the matching time
range; the key_value is the url_id of the page
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
PAGE_VISITORS(4)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
url_id
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
Number of visitors to this particular page in the matching time range
\end_layout
\end_inset
|
|
\begin_inset Text
\begin_layout Standard
FILE_DOWNLOAD(5)
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
TBD
\end_layout
\end_inset
|
\begin_inset Text
\begin_layout Standard
An event type specifying a file download; the key_value specifies which
file this refers to (its exact meaning will be defined when the download-counter
mechanism is designed)
\end_layout
\end_inset
|
\end_inset
\end_layout
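\begin_layout Standard
The datype constants can be mirrored in code so that rows are checked before
 they reach the database.
 This Python sketch is an assumption about how that could look: DataType,
 make_row and the validation rules (site-level types must leave key_value
 at -1, page-level types must carry a url_id) are hypothetical, derived from
 the table above.
 FILE_DOWNLOAD is left unchecked because its key_value is still TBD.
\end_layout
\begin_layout LyX-Code
```python
from enum import IntEnum


class DataType(IntEnum):
    """Values of the datype column, as listed in the table above."""
    SITE_VIEWS = 1
    SITE_VISITORS = 2
    PAGE_VIEWS = 3
    PAGE_VISITORS = 4
    FILE_DOWNLOAD = 5  # key_value meaning still TBD


UNUSED_KEY = -1
SITE_LEVEL = {DataType.SITE_VIEWS, DataType.SITE_VISITORS}    # key_value unused
KEYED_BY_URL = {DataType.PAGE_VIEWS, DataType.PAGE_VISITORS}  # key_value = url_id


def make_row(range_id, site_id, datype, value, key_value=UNUSED_KEY):
    """Build an archive_data tuple (range_id, site_id, datype, key_value, value),
    rejecting key_value values that do not fit the datype."""
    datype = DataType(datype)  # raises ValueError for unknown types
    if datype in SITE_LEVEL and key_value != UNUSED_KEY:
        raise ValueError(f"{datype.name} does not use key_value")
    if datype in KEYED_BY_URL and key_value == UNUSED_KEY:
        raise ValueError(f"{datype.name} needs a url_id in key_value")
    return (range_id, site_id, int(datype), key_value, value)
```
\end_layout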
\begin_layout Section
Implementation notes
\end_layout
\begin_layout Enumerate
The first range will act as the baseline.
Its start time will be 0-0-0000 00:00 and its end time will be the FireStats
installation time.
This will allow us to set initial values for archived data elements (for
example, to start the page hit count at a certain value).
\end_layout
\begin_layout Enumerate
Initially, each range (besides the first one) will cover a single day, but
it will be possible to merge ranges by summing their values and collapsing
them into a single range that covers a longer period of time.
\end_layout
\begin_layout Enumerate
With this design, adding a new data type only requires defining a new datype
value; the table structure does not change.
\end_layout
\begin_layout Enumerate
When archiving data, the following actions will be performed within an atomic
transaction (all or nothing):
\end_layout
\begin_deeper
\begin_layout Enumerate
Summarize the source data into archive data elements, grouped into ranges
\end_layout
\begin_layout Enumerate
Store the data in the archive, creating ranges if needed
\end_layout
\begin_layout Enumerate
Delete the source data from the live storage
\end_layout
\end_deeper
\begin_layout Enumerate
Graphs will be supported by merging data from the live storage and the archive
when needed (that is, when the queried time range overlaps the archive).
\end_layout
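\begin_layout Standard
The baseline range and the merging of ranges from the implementation notes
 can be sketched with Python's sqlite3 module.
 This is a minimal sketch, not FireStats code: create_baseline and merge_ranges
 are hypothetical helpers, the tables are SQLite stand-ins for the schema
 described above, and merge_ranges only accepts contiguous ranges (it refuses
 gaps).
 Each function runs inside a single transaction.
\end_layout
\begin_layout LyX-Code
```python
import sqlite3

# Hypothetical SQLite version of the two archive tables.
SCHEMA = """
CREATE TABLE archive_ranges (
    range_id INTEGER PRIMARY KEY, range_start TEXT NOT NULL, range_end TEXT NOT NULL);
CREATE TABLE archive_data (
    range_id INTEGER NOT NULL, site_id INTEGER NOT NULL, datype INTEGER NOT NULL,
    key_value INTEGER NOT NULL DEFAULT -1, value INTEGER NOT NULL,
    PRIMARY KEY (range_id, site_id, datype, key_value));
"""

# MySQL-style zero date; as text it sorts before every real timestamp.
BASELINE_START = "0000-00-00 00:00:00"


def create_baseline(conn, install_time, site_id, initial_values):
    """Add the baseline range ending at install_time, seeded with
    initial_values = {(datype, key_value): value}."""
    with conn:
        range_id = conn.execute(
            "INSERT INTO archive_ranges (range_start, range_end) VALUES (?, ?)",
            (BASELINE_START, install_time)).lastrowid
        conn.executemany(
            "INSERT INTO archive_data VALUES (?, ?, ?, ?, ?)",
            [(range_id, site_id, d, k, v) for (d, k), v in initial_values.items()])
    return range_id


def merge_ranges(conn, range_ids):
    """Collapse contiguous ranges into one new range, summing values
    that share (site_id, datype, key_value)."""
    ids = sorted(set(range_ids))
    if not ids:
        raise ValueError("no ranges given")
    marks = ",".join("?" * len(ids))  # e.g. "?,?" for two ids
    with conn:  # commit on success, roll back on any exception
        rows = conn.execute(
            f"SELECT range_start, range_end FROM archive_ranges "
            f"WHERE range_id IN ({marks}) ORDER BY range_start", ids).fetchall()
        if len(rows) != len(ids):
            raise ValueError("unknown range_id")
        for (_, prev_end), (next_start, _) in zip(rows, rows[1:]):
            if prev_end != next_start:
                raise ValueError("ranges are not contiguous")
        new_id = conn.execute(
            "INSERT INTO archive_ranges (range_start, range_end) VALUES (?, ?)",
            (rows[0][0], rows[-1][1])).lastrowid
        conn.execute(
            f"INSERT INTO archive_data "
            f"SELECT ?, site_id, datype, key_value, SUM(value) FROM archive_data "
            f"WHERE range_id IN ({marks}) GROUP BY site_id, datype, key_value",
            [new_id, *ids])
        conn.execute(f"DELETE FROM archive_data WHERE range_id IN ({marks})", ids)
        conn.execute(f"DELETE FROM archive_ranges WHERE range_id IN ({marks})", ids)
    return new_id
```
\end_layout
\begin_layout Standard
Because merging sums values, merged SITE_VISITORS and PAGE_VISITORS rows
 hold the sum of the per-day unique-visitor counts, which can be larger than
 the number of distinct visitors over the whole merged period.
 This is part of the loss of detail the requirements accept.
\end_layout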
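\begin_layout Standard
The archiving transaction and the graph queries from the implementation
 notes can be sketched with Python's sqlite3 module.
 This is a minimal sketch under stated assumptions: live_hits and its columns
 stand in for FireStats' live storage, whose real schema this document does
 not give, and archive_range and site_views are hypothetical helpers.
 site_views counts only archive ranges that lie fully inside the requested
 window, because a partially covered range cannot be split once it is aggregated;
 with one-day ranges, day-aligned windows are answered exactly.
\end_layout
\begin_layout LyX-Code
```python
import sqlite3

# Hypothetical schema; live_hits is a stand-in for the live storage.
SCHEMA = """
CREATE TABLE archive_ranges (
    range_id INTEGER PRIMARY KEY, range_start TEXT NOT NULL, range_end TEXT NOT NULL);
CREATE TABLE archive_data (
    range_id INTEGER NOT NULL, site_id INTEGER NOT NULL, datype INTEGER NOT NULL,
    key_value INTEGER NOT NULL DEFAULT -1, value INTEGER NOT NULL,
    PRIMARY KEY (range_id, site_id, datype, key_value));
CREATE TABLE live_hits (
    site_id INTEGER, url_id INTEGER, visitor_id TEXT, ts TEXT NOT NULL);
"""

# (datype, key_value expression, value expression, GROUP BY columns)
SUMMARIES = [
    (1, "-1", "COUNT(*)", "site_id"),                                # SITE_VIEWS
    (2, "-1", "COUNT(DISTINCT visitor_id)", "site_id"),              # SITE_VISITORS
    (3, "url_id", "COUNT(*)", "site_id, url_id"),                    # PAGE_VIEWS
    (4, "url_id", "COUNT(DISTINCT visitor_id)", "site_id, url_id"),  # PAGE_VISITORS
]


def archive_range(conn, start, end):
    """Summarize, store, and delete the live hits in [start, end)
    as one all-or-nothing transaction."""
    with conn:  # any exception rolls back every step below
        range_id = conn.execute(
            "INSERT INTO archive_ranges (range_start, range_end) VALUES (?, ?)",
            (start, end)).lastrowid
        for datype, key_expr, value_expr, group_by in SUMMARIES:
            conn.execute(
                f"INSERT INTO archive_data SELECT ?, site_id, {datype}, "
                f"{key_expr}, {value_expr} FROM live_hits "
                f"WHERE ts >= ? AND ts < ? GROUP BY {group_by}",
                (range_id, start, end))
        conn.execute("DELETE FROM live_hits WHERE ts >= ? AND ts < ?", (start, end))
    return range_id


def site_views(conn, site_id, start, end):
    """SITE_VIEWS in [start, end): archived ranges lying fully inside the
    window, plus live hits inside the window."""
    archived = conn.execute(
        "SELECT COALESCE(SUM(d.value), 0) FROM archive_data d "
        "JOIN archive_ranges r ON r.range_id = d.range_id "
        "WHERE d.site_id = ? AND d.datype = 1 "
        "AND r.range_start >= ? AND r.range_end <= ?",
        (site_id, start, end)).fetchone()[0]
    live = conn.execute(
        "SELECT COUNT(*) FROM live_hits WHERE site_id = ? AND ts >= ? AND ts < ?",
        (site_id, start, end)).fetchone()[0]
    return archived + live
```
\end_layout
\begin_layout Standard
Any error inside archive_range, for example a constraint violation while
 storing a summary, rolls back the new range and leaves the live hits in
 place, which is the all-or-nothing behavior the notes call for.
\end_layout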
\end_body
\end_document