You will also
demonstrate that you understand the strengths and weaknesses of each system with respect to
certain query workload features. You will be given a real-world data set from a question-and-answer
site and a set of target queries. The target queries include very basic OLTP-type
queries as well as analytic queries.
You are asked to design two storage options: one with MongoDB as the storage system
and the other with Neo4j as the storage system. For each option, you need to store a full
copy of the data in the system and implement a subset of the target queries.
Data set
The data that you will use is the latest dump (publication date: 2018-06-05) of the Artificial
Intelligence Stack Exchange question and answer site (https://ai.stackexchange.com/).
The dump is released and maintained by Stack Exchange: https://archive.org/details/stackexchange.
The original dump contains many files in XML format. The assignment uses a subset of the data
stored in four TSV files. The data files and the description
(readme.txt) can be downloaded from Canvas.
The assignment data set contains the following files:
Posts.tsv stores information about posts; each row represents a post, which can be
a question or an answer.
Users.tsv stores user profiles; each row represents a user.
Votes.tsv stores detailed vote information about posts; each row represents a vote,
including the vote type, the date the vote was made and a few other details.
Tags.tsv contains a summary of tag usage on this site.
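As a starting point for inspecting the four files, the following is a minimal Python sketch that reads a tab-separated file into a list of dictionaries. It assumes each file has a header row and that fields are not quoted; neither is confirmed here, so check readme.txt first. The helper name load_tsv and the column names in the sample (Id, DisplayName, CreationDate) are illustrative assumptions, not the actual schema.

```python
import csv
import io


def load_tsv(handle):
    """Read tab-separated text with a header row into a list of dicts.

    Assumption: the first line holds the column names and values are unquoted.
    """
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Tiny in-memory stand-in for Users.tsv; the column names are assumptions.
sample = io.StringIO(
    "Id\tDisplayName\tCreationDate\n"
    "1\talice\t2016-08-02T15:39:14\n"
    "2\tbob\t2016-08-03T09:10:00\n"
)
users = load_tsv(sample)
print(len(users), users[0]["DisplayName"])
```

For a real file, the same helper could be called as load_tsv(open("Users.tsv", newline="", encoding="utf-8")). All values come back as strings, so ids and dates would still need converting before being stored in MongoDB or Neo4j.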
Target Queries
Simple queries
[SQ1] Find all users involved in a given question (identified by id) and their respective
profiles, including the creationDate, DisplayName, upVote and DownVote. Note that we
are only interested in existing users who either posted the question or answered it. You
may ignore users that do not have an id.
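To make the intended semantics of SQ1 concrete, here is a system-independent sketch in plain Python over in-memory rows. It is not a MongoDB or Neo4j solution. The field names (Id, ParentId, OwnerUserId, UpVotes, DownVotes) and the function name users_involved are assumptions chosen for illustration; the real column names are given in readme.txt.

```python
def users_involved(question_id, posts, users):
    """Return profiles of existing users who asked or answered a question.

    Assumptions: an answer points at its question through ParentId, and a
    post written by a user without an id has an empty or missing OwnerUserId.
    """
    involved_ids = set()
    for post in posts:
        is_question = post["Id"] == question_id
        is_answer = post.get("ParentId") == question_id
        owner = post.get("OwnerUserId")
        if (is_question or is_answer) and owner:
            involved_ids.add(owner)

    fields = ("Id", "DisplayName", "CreationDate", "UpVotes", "DownVotes")
    # Only users present in the users table count as "existing" users.
    return [
        {field: user.get(field) for field in fields}
        for user in users
        if user["Id"] in involved_ids
    ]


# Made-up sample rows, for illustration only.
posts = [
    {"Id": "10", "ParentId": "", "OwnerUserId": "1"},     # the question
    {"Id": "11", "ParentId": "10", "OwnerUserId": "2"},   # an answer
    {"Id": "12", "ParentId": "10", "OwnerUserId": ""},    # author has no id
    {"Id": "13", "ParentId": "10", "OwnerUserId": "99"},  # author not in users
    {"Id": "20", "ParentId": "", "OwnerUserId": "3"},     # unrelated question
]
users = [
    {"Id": "1", "DisplayName": "alice", "CreationDate": "2016-08-02",
     "UpVotes": "5", "DownVotes": "0"},
    {"Id": "2", "DisplayName": "bob", "CreationDate": "2016-08-03",
     "UpVotes": "1", "DownVotes": "2"},
    {"Id": "3", "DisplayName": "carol", "CreationDate": "2016-08-04",
     "UpVotes": "0", "DownVotes": "0"},
]
result = users_involved("10", posts, users)
print([user["DisplayName"] for user in result])
```

The sample shows the two filtering rules from the query text: the answer with no owner id is skipped, and the owner id that matches no row in the users table is dropped because that user does not exist.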
[SQ2] Assuming each tag represents a topi