rnowling edited this page Sep 7, 2014 · 1 revision

Welcome to the bigpetstore-data-generator wiki! BigPetStore is a big data example application and blueprint for Hadoop (and soon Spark!) that centers around transaction data for a fictional chain of pet stores. BigPetStore is part of the Apache BigTop project.

Here, we are working on a new data generator for BPS that will increase the complexity of the generated data through ab initio modeling of customer behavior. As a result, BPS will be able to support meaningful analytics examples.

The data generator is currently being developed in Python. Our long-term plan is to use the Python sandbox for rapid development and prototyping, while maintaining a separate JVM port that will be stable for community use. (A work-in-progress Java port is currently available.) As new features stabilize in the Python implementation, they will be migrated to the JVM version. We aim to include the JVM version in Apache BigTop.

Roadmap

  • v0.2 -- Focus: code cleanup
    • Clean up Python code for readability / maintainability
    • Add unit tests
    • Add documentation (docstrings, architecture overview)
  • v0.3 -- Focus: improving purchasing profiles
    • Add product generator so we have more fields to use in modeling purchasing behavior
    • Allow purchasing profiles to be shared by multiple customers (saves on computation)
    • Change MSM transition probability calculation to incorporate stationary PDFs that are not dependent on previous purchase. This will allow us to encode global preferences such as brands.
  • v0.4 -- Focus: Time-dependent events
    • Add sales
    • Add simple weather simulation and incorporate effects into customer purchasing behaviors
    • Add store hours PDF to transaction time sampling
    • Add customer work hours profiles
  • v0.5 -- Focus: Customer product ratings
    • Add support for generating customer product ratings
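
As an illustration of the v0.3 item on transition probabilities, here is a minimal sketch of one way a distribution that does not depend on the previous purchase could be folded into a Markov transition matrix. It assumes "MSM" refers to a Markov state model. The function name `blend_transition_matrix`, the mixing weight `alpha`, and the linear blend itself are hypothetical choices made for this example; they are not taken from the generator's code.

```python
def blend_transition_matrix(transitions, stationary, alpha):
    """Mix every row of a transition matrix with one fixed distribution.

    transitions: list of rows; row i holds P(next = j | previous = i)
    stationary:  probabilities that ignore the previous purchase
                 (for example, a global brand preference)
    alpha:       weight in [0, 1] given to the stationary distribution
                 (hypothetical parameter, chosen for this sketch)
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    blended = []
    for row in transitions:
        if len(row) != len(stationary):
            raise ValueError("row and stationary distribution differ in length")
        # A convex combination of two probability distributions is itself
        # a probability distribution, so each blended row still sums to 1.
        blended.append(
            [(1.0 - alpha) * p + alpha * q for p, q in zip(row, stationary)]
        )
    return blended


# Two products; the customer tends to repeat the previous purchase.
transitions = [[0.9, 0.1],
               [0.2, 0.8]]
# Global preference with no dependence on the previous purchase.
stationary = [0.5, 0.5]

blended = blend_transition_matrix(transitions, stationary, alpha=0.5)
```

With `alpha=0` the original matrix is returned unchanged, and with `alpha=1` every row equals the stationary distribution, so the weight controls how strongly the global preference overrides purchase history.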
