5 Comments

  • Ann, I personally LOVE (and agree) this blog post. The insights , and easy way that you’ve shown users how to manage a contiguous dataset, are invaluable. Thank you for all that you do!

  • Jon Peltier says:

    Great post, I wish people would arrange their data like this. I tell them, you can spend five minutes making your data right first, or five hours fixing your analyses and visualizations later.

    My only question is, why is all your data left aligned? Using the General alignment shows at a glance whether the numbers are actual numbers or stored as text. (This is the second thing I wish people would do.)

  • Vivian Jefferson says:

    This is a great explanation of the importance of structuring datasets for efficiency. I see non-contiguous datasets a lot from beginning spreadsheet users because it hasn’t occurred to them to combine data collected separately into one table. A few weeks ago I helped someone create a contiguous dataset by making their mini sets into tables, then using the VSTACK function to append them together. That avoided possible errors from copying and pasting data.

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    Two Types of Datasets: Contiguous vs. Non-Contiguous

    Updated on: Nov 5th, 2024
    Data Analysis
    , , , , , ,

    “Ann, I loved your training, but I’m having trouble applying what I learned. Something’s off with my datasets, and the graphs are taking forever!”

    This past year, I’ve spent more time teaching about data management than data visualization.

    When I look under the hood of companies’ spreadsheets, I’ve noticed way too many data management issues that could be avoided altogether.

    In this article, you’ll learn about a prerequisite for data visualization: contiguous datasets.

    Benefits of Contiguous Datasets

    In this video, you’ll see how a single contiguous dataset lets you use:

    • ONE set of formulas for data cleaning, recoding, and analyses
    • ONE set of pivot tables
    • ONE set of charts

    Then, at the end, you can slice and dice your charts with a variety of filters.

    Mini Datasets Spread Across One Sheet – NO!

    Here’s what I often see:

    Separate datasets for each time period.

    NOOOOOOOOOOOOOOOOO!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Sometimes there are dozens of mini datasets, like this:

    Mini Datasets Spread Over Multiple Sheets – NO!

    Or, just as terrible for graphs and dashboards — one mini dataset per sheet, like this.

    NOOOOOO!!!!!!!!!!!!!!!!!!!!!!

    Or, separate mini datasets spread across different Excel files altogether.

    NOOOOOOOOO!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

    Ann, What’s So Bad about Mini Datasets?!

    Separate mini datasets (“non-contiguous” or “non-touching” datasets) mean that we can only look at one time period at a time.

    We have to make a bunch of mini charts.

    It takes forever to make these the first time, and they’re a huge pain to update over time.

    It’s also tougher for our viewers to find patterns because the numbers are scattered across too many charts.

    NOOOOOO!!!!!!!!!!!!!!!

    Dataviz Prerequisite: A Single Contiguous Dataset

    Instead, the numbers should be stored in a single dataset, with the timeframe as its own column, like this:

    This running list of new entries — a log — is going to get very long.

    In real-life projects, the logs might have hundreds of thousands of entries.

    That’s okay!!!!!!!!!!!!!!!!! That’s preferred!!!!!!!!!!!!!!!!!!!!

    It’s counterintuitive, but contiguous logs make dataviz faster, not slower.

    Excel can handle millions of entries.

    The length of a dataset won’t make your analysis or visualization take any longer. Repeat after me: Contiguous logs make dataviz faster, not slower.

    However…

    The width — the number of columns — can certainly take a while, because there are so many different variables to consider.

    Bonus: Save Your table as an Excel Table for Easier Updating

    A table is the generic term for a collection of rows and columns.

    An Excel Table is a special feature that makes it faster and easier to update our log.

    In other words, Excel Tables make it easier to append our contiguous logs as we get new data.

    How to Turn tables into Excel Tables

    You’ll simply click on your contiguous log — your generic table.

    Then, go to the Insert tab.

    Choose a Table.

    Click OK.

    You’ll recognize the banded rows.

    Adding New Entries to Datasets Stored as Excel Tables

    Adding new entries — or appending — is easy.

    Let’s pretend you’re downloading data from your organization’s database. You might only be able to download one month at a time into its own sheet. That’s okay!

    We’ll simply copy and paste those new entries into our running log.

    Then, we’ll add the timeframe to that right-most column, too.

    Excel is smart, and it’ll know that your new entries are part of your new dataset. In other words, your new entries will feed into pivot tables and formulas seamlessly.

    Contiguous Datasets are Required for Static Dashboards

    Want a short handout, PDF, or email attachment to share with others?

    Maybe you’d want to see how all the projects combined are doing.

    Or, maybe you’d want a breakdown of the different projects.

    You could even add quick vizzes like sparklines to see trends, like this:

    Contiguous datasets are required in order to make static dashboards.

    Otherwise the sumifs, countifs, and averageifs behind the scenes will be impossible. Or, the formulas will be painfully slow to set up.

    Static dashboards should take less than an hour to design from start to finish.

    If it’s taking longer than that, it’s probably because (a) you don’t have a contiguous dataset or (b) you need more practice with formulas.

    Contiguous Datasets are Required for Interactive Dashboards

    Want to make interactive dashboards in Excel?

    Your technical coworkers will love exploring the insights for themselves.

    Interactive dashboards involve four pieces:

    1. A single contiguous dataset stored as a regular ol’ table or an Excel Table. You already know I prefer Excel Tables for datasets that are going to be added to or appended in the future.
    2. Pivot tables to tabulate the numbers (and bypass formulas, which can be tricky for novices).
    3. Pivot charts to, you know, visualize the numbers.
    4. Slicers (a fancy name for the filters).

    Once again, contiguous datasets are the foundation of data visualization.

    Have I sold you on contiguous datasets yet???

    Contiguous datasets are required for:

    • Making a single graph to show comparisons over time (not January, February, and March in separate graphs that take three times as long to create and update);
    • Making static dashboards with formulas and trendlines that’ll update (nearly) automatically as you add new entries to your log; and
    • Making interactive dashboards with charts that’ll update (nearly) automatically as you add new entries to your log.

    If your data visualization is taking too long… it’s usually a data management problem.

    And it can be easily fixed!!!

    Start storing all your non-contiguous datasets as a single contiguous dataset.

    More about Ann K. Emery
    Ann K. Emery is a sought-after speaker who is determined to get your data out of spreadsheets and into stakeholders’ hands. Each year, she leads more than 100 workshops, webinars, and keynotes for thousands of people around the globe. Her design consultancy also overhauls graphs, publications, and slideshows with the goal of making technical information easier to understand for non-technical audiences.

    5 Comments

  • Ann, I personally LOVE (and agree) this blog post. The insights , and easy way that you’ve shown users how to manage a contiguous dataset, are invaluable. Thank you for all that you do!

  • Jon Peltier says:

    Great post, I wish people would arrange their data like this. I tell them, you can spend five minutes making your data right first, or five hours fixing your analyses and visualizations later.

    My only question is, why is all your data left aligned? Using the General alignment shows at a glance whether the numbers are actual numbers or stored as text. (This is the second thing I wish people would do.)

  • Vivian Jefferson says:

    This is a great explanation of the importance of structuring datasets for efficiency. I see non-contiguous datasets a lot from beginning spreadsheet users because it hasn’t occurred to them to combine data collected separately into one table. A few weeks ago I helped someone create a contiguous dataset by making their mini sets into tables, then using the VSTACK function to append them together. That avoided possible errors from copying and pasting data.

  • Leave a Reply

    Your email address will not be published. Required fields are marked *

    You Might Like

    Our complimentary mini course for beginners to dataviz. Takes 45 minutes to complete.

    Enroll

    Need some time-savers in Excel? Feel like all the formulas are taking forever? Learn how to clean, analyze, and tabulate your dataset.

    Enroll

    Subscribe

    Not another fluffy newsletter. Get actionable tips, videos and strategies from Ann in your inbox.