top of page

Generate Large Dataset for Power BI

Updated: Jun 25

I this blog I want to show how you can use the Contoso Generator from SQLBI - Contoso Data Generator


There are quite a few options that they have provided -


  1. Pre-generated backups of different sizes

  2. PowerShell Script

  3. C# project


I am going to cover the PowerShell version in the blog. Before you even get started you need to ensure 3 things:


  1. SQL Server is installed (Use Developer editon, it has everything for free)- https://www.microsoft.com/en-in/sql-server/sql-server-2019

  2. .NetCore 3.1 is installed - https://dotnet.microsoft.com/en-us/download/dotnet/3.1

  3. 7 zip is installed - https://www.7-zip.org/


Once SQL Server is installed you need to give an alias as Demo as it is what is used in the code, you can change the alias and code if you want.


To create alias launch SQL Server Configuration Manager.


For SQL Server 2022 or above if alias field is disabled then refer to this - https://dba.stackexchange.com/questions/319901/how-to-create-alias-in-sql-server-2022


Navigate to SQL Server Network Configuration -> Protocols for SQL2019 -> TCP/IP -> Enable it

ree

Next right click on TCP/IP -> Properties -> IP Addresses -> Scroll to bottom -> IPAll -> TCP Port and provide a port number just make sure it is something that isn't already used. Let's go for 20231 for January 2023. After this click Apply and OK.


ree

Now we need to provide alias for both 32bit and 64bit clients.

ree

Then provide Alias, Port No, Server and SQL Instance name

ree

If you are not sure what is the name of SQL Server instance just open SSMS you will see the server name after it opens up or just look into the project explorer.

ree

Now we need to do the same for 64bit clients like Power BI

ree
ree

Once everything is done restart SQL Server from SQL Server Services

ree

To check if the alias is working just launch Power BI & SSMS:

ree
ree

Now that the setup is done, we can download the files.


The files that we need are here - https://github.com/sql-bi/Contoso-Data-Generator/releases/tag/v1.1.0 we need to download the zip folder Contoso-Data-Generator-exe-1.1.0 and place it in a folder, I kept it in the downloads folder and extracted the contents.


This is what is in that folder:

ree

Now create a folder named TEMP in the C Drive


Go back to the folder containing files so navigate to Downloads\Contoso-Data-Generator-exe-1.1.0\Contoso-Data-Generator\Scripts


There is a QuickRun PowerShell script that creates folders in the TEMP folder and runs the GenerateDatabases.ps1 file


The DatasetSqlbi folder contains some config files in which you can modify several parameters.

ree

Next under Scripts folder > GenerateDatabases.ps1 contains PowerShell script that also allows you to modify and overwrite some of the parameters of Config JSON file.

ree

At a time you need to have only one line uncommented to generate the database, I have the 3rd highlighted line uncommented so that means we are going to generate 1M database, using this you can generate 1B rows database as well.


If you want to understand more about what these parameters are and what they do you can read this: https://docs.sqlbi.com/contoso-data-generator/databasegenerator


Now we need to run the PowerShell script but for some of you this might be the first encounter with PowerShell so your system will have restrictions on running the scripts.


Just go to start and search for PowerShell and right click and run as Administrator


ree

In the PowerShell window just paste this code and press enter

Set-ExecutionPolicy Bypass -Scope CurrentUser -Force
ree

So now what you need to do is go back to the Scripts folder and on the empty space press SHIFT + Right Click and select open PowerShell window here.

ree

Just type Q in the PS window and press TAB for auto completeand you will get this:

ree

Press enter and it will start executing the code and generate the database.


While it generates the database you will be tempted to refresh the SQL Server and you will notice that a new database is created and the tables are added, but there is no use of it for now because after generating the database the script creates a backup file and deletes the databases from SQL Server so that you can restore it from the backup file directly. Also, don't use SQL Server while running this script, the script may not work or continue.


Watch this video:

Once everything is done it is time to restore the database, just go to the TEMP folder and navigate to C:\TEMP\ContosoDataset\SqlBackup here you will find Contoso 1M.bak file we need to use this to restore it into SQL Server.


Now got to SQL Server and Right click on Database and select Restore Database

ree

Select Device -> Elipses -> Add -> Just copy the path and navigate to the folder

ree

Finally in that folder place the BAK file generated by PowerShell script:


ree

Go back to SQL Server and refresh:

ree

Click on OK for all open windows.

ree

And the Database has been generated!

ree

Now you can generated even larger databases and practice stuff like - Incremental refresh, DirectQuery Optimizations, Composite Models on large database, DAX Query optimizations, Data model optimization, Dataflows, Aggregations, Hybrid tables etc.

2 Comments


_ _
_ _
Feb 18, 2024

Nice work! I have been trying to figure out how to use the generate contoso database for a while. Your step by step process finally worked. I will be using the data as sample data on my website and learn how to create incremental data updates. Thanks.

Like

Zhiqiang Zhuang
Zhiqiang Zhuang
Feb 11, 2024

Brilliant! Thank you so much!

Like
bottom of page