#LyX 1.4.3 created this file. For more info see http://www.lyx.org/ \lyxformat 245 \begin_document \begin_header \textclass article \language english \inputencoding auto \fontscheme default \graphics default \paperfontsize default \papersize default \use_geometry false \use_amsmath 1 \cite_engine basic \use_bibtopic false \paperorientation portrait \secnumdepth 3 \tocdepth 3 \paragraph_separation indent \defskip medskip \quotes_language english \papercolumns 1 \papersides 1 \paperpagestyle default \tracking_changes false \output_changes false \end_header \begin_body \begin_layout Title FireStats scalability solution \end_layout \begin_layout Section Problem \end_layout \begin_layout Standard As users visit the site, FireStats collects data. The more data there is in the database, the slower the admin page gets and the larger the database becomes. \end_layout \begin_layout Section Solution requirements \end_layout \begin_layout Standard The idea is to aggregate data in a way that supports existing needs and current APIs while using less database storage. The aggregated, or 'compressed', data will \emph on not \emph default be as detailed as the recent data, but this is acceptable. \newline The solution should also support the following future features: \end_layout \begin_layout Itemize Dynamic graphs: these will need to query the database for hits in a time range (between X and Y). The time range can be completely outside the aggregation, completely inside it, or partially inside it. All of these cases should be considered. \end_layout \begin_layout Itemize The archive design should be flexible enough to allow more kinds of data to be added, for example user agents or the number of hits per URL. \end_layout \begin_layout Itemize The design should make it easy to set a baseline number for variables, for example to set the initial number of hits or unique hits per site. 
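\end_layout \begin_layout Standard The three time-range cases named in the dynamic-graphs requirement above (completely outside the aggregation, completely inside it, or partially inside it) can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that all aggregated data lies before a single cutoff time. The function name split_query_range, the archive_end parameter and the half-open range convention are hypothetical and are not part of FireStats.

```python
from datetime import datetime
from typing import Optional, Tuple

# A half-open time range [start, end).
Range = Tuple[datetime, datetime]


def split_query_range(
    x: datetime, y: datetime, archive_end: datetime
) -> Tuple[Optional[Range], Optional[Range]]:
    """Split the query range [x, y) at the (assumed) archive cutoff.

    Returns (archived_part, live_part); either part may be None.
    """
    if x >= y:
        raise ValueError("query range must satisfy x < y")
    if y <= archive_end:
        # Completely inside the aggregation.
        return (x, y), None
    if x >= archive_end:
        # Completely outside the aggregation.
        return None, (x, y)
    # Partially inside: one part per storage.
    return (x, archive_end), (archive_end, y)
```

Under these assumptions, a graph query would answer the archived part from the aggregated data and the live part from the detailed recent data, then combine the two results.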
\end_layout \begin_layout Section New tables schema \end_layout \begin_layout Standard NOTE: THE REST OF THE DOCUMENT IS OUTDATED. \end_layout \begin_layout Standard The main idea is to create an archive based on time ranges. The granularity of each range is not yet decided, but it will be no smaller than one day. Each range will have several data elements associated with it, such as page views, unique visitors, the number of views of each page during that range, and so on. \end_layout \begin_layout Subsection archive_ranges \end_layout \begin_layout Standard This table holds the information about the ranges of archive elements. \end_layout \begin_layout Standard \begin_inset Tabular \begin_inset Text \begin_layout Standard Name \end_layout \end_inset \begin_inset Text \begin_layout Standard Type \end_layout \end_inset \begin_inset Text \begin_layout Standard Description \end_layout \end_inset \begin_inset Text \begin_layout Standard range_id \end_layout \end_inset \begin_inset Text \begin_layout Standard Key \end_layout \end_inset \begin_inset Text \begin_layout Standard Range ID \end_layout \end_inset \begin_inset Text \begin_layout Standard range_start \end_layout \end_inset \begin_inset Text \begin_layout Standard DateTime \end_layout \end_inset \begin_inset Text \begin_layout Standard The time this range begins \end_layout \end_inset \begin_inset Text \begin_layout Standard range_end \end_layout \end_inset \begin_inset Text \begin_layout Standard DateTime \end_layout \end_inset \begin_inset Text \begin_layout Standard The time this range ends \end_layout \end_inset \end_inset \end_layout \begin_layout Subsection archive_data \end_layout \begin_layout Standard This table holds the archive data. 
\end_layout \begin_layout Standard \begin_inset Tabular \begin_inset Text \begin_layout Standard Name \end_layout \end_inset \begin_inset Text \begin_layout Standard Type \end_layout \end_inset \begin_inset Text \begin_layout Standard Description \end_layout \end_inset \begin_inset Text \begin_layout Standard range_id \end_layout \end_inset \begin_inset Text \begin_layout Standard Key \end_layout \end_inset \begin_inset Text \begin_layout Standard Range ID matching a row in the archive_ranges table \end_layout \end_inset \begin_inset Text \begin_layout Standard site_id \end_layout \end_inset \begin_inset Text \begin_layout Standard Key \end_layout \end_inset \begin_inset Text \begin_layout Standard The ID of the site this row's data belongs to \end_layout \end_inset \begin_inset Text \begin_layout Standard datype \end_layout \end_inset \begin_inset Text \begin_layout Standard int \end_layout \end_inset \begin_inset Text \begin_layout Standard The type of the data \end_layout \end_inset \begin_inset Text \begin_layout Standard key_value \end_layout \end_inset \begin_inset Text \begin_layout Standard int \end_layout \end_inset \begin_inset Text \begin_layout Standard Optional key (-1 for unused); its meaning depends on the value of the datype field in the row. \end_layout \end_inset \begin_inset Text \begin_layout Standard value \end_layout \end_inset \begin_inset Text \begin_layout Standard int \end_layout \end_inset \begin_inset Text \begin_layout Standard The actual value of the data; its meaning depends on the datype and key_value fields. \end_layout \end_inset \end_inset \newline \end_layout \begin_layout Standard This table describes the possible types for the \emph on datype \emph default column and the meaning of the key_value column for each one. 
\end_layout \begin_layout Description \begin_inset Tabular \begin_inset Text \begin_layout Standard Constant (value) \end_layout \end_inset \begin_inset Text \begin_layout Standard key_value \end_layout \end_inset \begin_inset Text \begin_layout Standard Description \end_layout \end_inset \begin_inset Text \begin_layout Standard SITE_VIEWS(1) \end_layout \end_inset \begin_inset Text \begin_layout Standard Unused (-1) \end_layout \end_inset \begin_inset Text \begin_layout Standard Number of page views for this site in the matching time range \end_layout \end_inset \begin_inset Text \begin_layout Standard SITE_VISITORS(2) \end_layout \end_inset \begin_inset Text \begin_layout Standard Unused (-1) \end_layout \end_inset \begin_inset Text \begin_layout Standard Number of unique visitors for this site in the matching time range \end_layout \end_inset \begin_inset Text \begin_layout Standard PAGE_VIEWS(3) \end_layout \end_inset \begin_inset Text \begin_layout Standard url_id \end_layout \end_inset \begin_inset Text \begin_layout Standard Number of times a particular page has been viewed; the key_value is the url_id of the page \end_layout \end_inset \begin_inset Text \begin_layout Standard PAGE_VISITORS(4) \end_layout \end_inset \begin_inset Text \begin_layout Standard url_id \end_layout \end_inset \begin_inset Text \begin_layout Standard Number of visitors for this particular page \end_layout \end_inset \begin_inset Text \begin_layout Standard FILE_DOWNLOAD(5) \end_layout \end_inset \begin_inset Text \begin_layout Standard TBD \end_layout \end_inset \begin_inset Text \begin_layout Standard An event type specifying a file download; the key_value is a value that specifies which file this refers to (the exact meaning will be defined when the download-counter mechanism is designed) \end_layout \end_inset \end_inset \end_layout \begin_layout Section Implementation notes \end_layout \begin_layout 
Enumerate The first range will act as the baseline. Its start time will be 0-0-0000 00:00 and its end time will be the FireStats installation time. This will allow us to set initial values for archived data elements (for example, to start the page hit count at a certain value). \end_layout \begin_layout Enumerate Initially, each range (besides the first one) will cover a single day, but it will be possible to merge ranges by summing all their values and collapsing them into a single range that covers a larger period of time. \end_layout \begin_layout Enumerate With this design, adding additional data types should be easy. \end_layout \begin_layout Enumerate When archiving data, the following actions will be performed within an atomic transaction (all or nothing): \end_layout \begin_deeper \begin_layout Enumerate Summarize all the data elements being archived, grouped into ranges \end_layout \begin_layout Enumerate Store the data in the archive, creating ranges if needed \end_layout \begin_layout Enumerate Delete the source data from the live storage \end_layout \end_deeper \begin_layout Enumerate Graphs will be supported by merging data from the live storage and the archives when needed (that is, when the query overlaps the archive). \end_layout \end_body \end_document