The Database


Especially in this time of economic turmoil, popular media present statements with well-defined truth values as true without empirical support. This is especially troubling and confusing when the statements contradict one another. The main problem, at least for me, is the lack of access to the raw data and to the tools needed to perform the analyses the media claim to rely on.

One possible solution to this problem is a single database that holds as many disparate data points as possible, with enough metadata to make discrete and varied analyses possible and, more importantly, fast and simple. The database itself is extremely simple in design. It consists of three tables: keys (or nodes, or ids), values, and sources.

The keys table is simply a list of pairs of numbers. The first number is a unique identifier for a data point and is what the values table (described next) references. The second number references an entry in the sources table.

The values table consists of three numbers followed by a descriptor. The first number is a unique identifier for that entry in the values table. The second number is the identifier of the data point to which the value belongs. The third number is the actual data value for that part of the data point (for the linear-algebra-inclined, if the data point exists in an N-dimensional space, this is the value for one of the point's dimensions). The descriptor is a short, typically single-word description of the data (e.g. year, annual income, monthly income, height, weight, price, demand, supply, etc.).

The sources table has as many columns as necessary, starting with the unique identifier column (referenced by the keys table) and the organization or individual who produced the data point.
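The three-table layout described above could be sketched roughly as follows. This is only an illustration, not a settled design: the choice of SQLite (via Python's standard `sqlite3` module), the column names (`source_id`, `key_id`, `value`, `descriptor`, `producer`), and the helper functions `create_schema`, `add_point`, and `fetch_point` are all assumptions made for the example. Note that `values` is a reserved word in SQL, so the table name has to be quoted.

```python
import sqlite3


def create_schema(conn):
    """Create the keys, values, and sources tables (column names are assumed)."""
    conn.executescript('''
        CREATE TABLE sources (
            id       INTEGER PRIMARY KEY,   -- referenced by keys.source_id
            producer TEXT NOT NULL          -- organization or individual
        );
        CREATE TABLE keys (
            id        INTEGER PRIMARY KEY,  -- unique id of one data point
            source_id INTEGER NOT NULL REFERENCES sources(id)
        );
        CREATE TABLE "values" (
            id         INTEGER PRIMARY KEY, -- unique id of this entry
            key_id     INTEGER NOT NULL REFERENCES keys(id),
            value      REAL NOT NULL,       -- one dimension of the data point
            descriptor TEXT NOT NULL        -- e.g. 'year', 'price'
        );
    ''')


def add_point(conn, source_id, dimensions):
    """Store one data point; `dimensions` maps descriptor -> numeric value."""
    cur = conn.execute("INSERT INTO keys (source_id) VALUES (?)", (source_id,))
    key_id = cur.lastrowid
    conn.executemany(
        'INSERT INTO "values" (key_id, value, descriptor) VALUES (?, ?, ?)',
        [(key_id, value, descriptor) for descriptor, value in dimensions.items()],
    )
    return key_id


def fetch_point(conn, key_id):
    """Reassemble a data point as a descriptor -> value dictionary."""
    rows = conn.execute(
        'SELECT descriptor, value FROM "values" WHERE key_id = ?', (key_id,)
    )
    return dict(rows.fetchall())
```

A data point with N dimensions becomes one row in keys and N rows in values, so new kinds of data can be added without changing the schema.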

Beyond the database itself, a library of number-crunching tools is necessary to make sense of the data. Among these are visualization methods and curve-fitting techniques. Building this library is the bulk of the work that would be required to create a true solution to the problem outlined above.
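As a small illustration of what one such tool might look like, the sketch below pairs up two descriptors across data points and fits a straight line through them by ordinary least squares. The function names (`pair_dimensions`, `fit_line`) and the assumed row shape `(data point id, descriptor, value)` are hypothetical, and a straight line is only the simplest possible choice of curve.

```python
def pair_dimensions(rows, x_descriptor, y_descriptor):
    """Collect (x, y) pairs from rows of (data point id, descriptor, value).

    Only data points that have both descriptors contribute a pair.
    """
    points = {}
    for key_id, descriptor, value in rows:
        points.setdefault(key_id, {})[descriptor] = value
    xs, ys = [], []
    for key_id in sorted(points):
        dims = points[key_id]
        if x_descriptor in dims and y_descriptor in dims:
            xs.append(dims[x_descriptor])
            ys.append(dims[y_descriptor])
    return xs, ys


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept
```

Because every value carries a descriptor, the same two functions could be pointed at any pair of dimensions (price against year, demand against price, and so on) without knowing in advance what the data means.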

The Data Harvester

Beyond the construction of the database itself, an automated system will be necessary to populate it with existing data from a variety of sources. This Data Harvester will need to adapt immediately to the structure of any new data set. Concretely, with little to no human input, the Harvester will need to intuit how data is organized in whatever new sources it is given or discovers on its own.
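A real Harvester would be far harder than anything shown here, but a toy version of the "intuit the layout" step might look like the sketch below. It assumes the source is delimited text, guesses the delimiter from a short candidate list, treats the first row as a header only if it contains no numbers, and keeps only the numeric cells. The function names (`guess_delimiter`, `harvest`), the candidate list, and the fallback names `column_0`, `column_1`, ... are all assumptions for illustration.

```python
import csv

# Assumed set of delimiters worth trying; a real harvester would need more.
CANDIDATE_DELIMITERS = [",", ";", "\t", "|"]


def _is_number(cell):
    """Return True if the cell parses as a float."""
    try:
        float(cell)
        return True
    except ValueError:
        return False


def guess_delimiter(lines):
    """Pick the candidate that appears equally often on every line.

    Among consistent candidates, prefer the one yielding the most fields.
    Falls back to a comma if nothing splits the lines consistently.
    """
    best, best_fields = ",", 1
    for delimiter in CANDIDATE_DELIMITERS:
        counts = {line.count(delimiter) for line in lines}
        if len(counts) == 1:
            fields = counts.pop() + 1
            if fields > best_fields:
                best, best_fields = delimiter, fields
    return best


def harvest(text):
    """Turn delimited text into a list of descriptor -> value dictionaries."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    delimiter = guess_delimiter(lines)
    rows = [[cell.strip() for cell in row]
            for row in csv.reader(lines, delimiter=delimiter)]
    # Heuristic: a first row with no numeric cells is taken to be a header.
    if not any(_is_number(cell) for cell in rows[0]):
        header, body = rows[0], rows[1:]
    else:
        header = [f"column_{i}" for i in range(len(rows[0]))]
        body = rows
    points = []
    for row in body:
        point = {name: float(cell)
                 for name, cell in zip(header, row) if _is_number(cell)}
        if point:
            points.append(point)
    return points
```

Each dictionary returned by `harvest` corresponds to one data point, with the header names serving as descriptors, which is the shape the values table expects.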
