
SQL Server DBA Responsibilities and Roles


SQL Server DBA Responsibilities

Key Responsibilities – DBA

We can categorize SQL Server DBA responsibilities into the following areas:

  • Capacity Management
  • Security Management
  • High Availability Management
  • Backup and Recovery Management
  • Performance Tuning
  • Process Improvements
  • Daily, Weekly and Monthly maintenance
  • Installations / Upgrades / Patching
  • Health Checks / Report Analysis

DBA Daily Responsibilities

On a daily basis a DBA should monitor the below:

Backups

  • Confirm that backups have been made and successfully saved to a secure location
  • Check the backup failure alerts, correct the errors and rerun the backups
  • Review the average backup duration; if there is a significant change, investigate it. Most of the time it is caused by low network bandwidth.
  • Validate the backup files using RESTORE VERIFYONLY (see the sketch after this list). We can create jobs to take care of the task and to send a notification if any backup fails verification.
  • Monitor that backup and log history cleanup is running, if that cleanup has been designed.
  • Find out the newly added databases and define a backup plan for them
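A minimal T-SQL sketch of the verification step mentioned above; the backup file path is a placeholder for illustration, and RESTORE VERIFYONLY only checks that the backup is readable, it does not restore it:

```sql
-- Verify a backup file without restoring it.
-- The file path below is a placeholder; point it at an actual backup file.
RESTORE VERIFYONLY
FROM DISK = N'G:\SQLBackups\SalesDB_FULL.bak'
WITH CHECKSUM;  -- also validates page checksums if the backup was taken WITH CHECKSUM
```

A statement like this can be wrapped in an agent job that loops over the latest backup files and raises a notification on failure.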

Disk Space

  • Verify the free space on each drive on all servers. If there is a significant variance in free space from the day before, research the cause of the fluctuation and resolve it if necessary. Often, log files grow because of monthly jobs.
  • Automate this through a job. The job runs every hour and reports any drive that has less than 15% free space (see the sketch below). We can design an SSRS report to showcase and review the delta values.
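A hedged sketch of such a check using sys.dm_os_volume_stats (available from SQL Server 2008 R2 SP1 onwards); the 15% threshold matches the bullet above:

```sql
-- Report volumes hosting database files with less than 15% free space.
SELECT DISTINCT
       vs.volume_mount_point,
       vs.total_bytes     / 1073741824.0 AS total_gb,
       vs.available_bytes / 1073741824.0 AS free_gb,
       CAST(vs.available_bytes * 100.0 / vs.total_bytes AS DECIMAL(5, 2)) AS free_pct
FROM sys.master_files AS mf
CROSS APPLY sys.dm_os_volume_stats(mf.database_id, mf.file_id) AS vs
WHERE vs.available_bytes * 100.0 / vs.total_bytes < 15;
```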

Jobs / Maintenance plans

  • Check for failed jobs, investigate the root cause and resolve the issue. The native SQL notification alert sends a mail to the DBA team, and we can also design a customized SSRS report which shows all failed/successful job details with delta values (a query sketch follows this list).
  • Check the job execution duration and, if there is any significant change, find the root cause and resolve the issue.
  • Make sure that all process-related jobs / maintenance plans completed. That includes data pulling jobs, data update jobs etc.
  • Confirm all cleanup jobs are running fine: temp tables, logs, history, backup files etc.
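A simple query sketch for the failed-job check mentioned above, reading SQL Agent history from msdb (run_status = 0 is a failure, step_id = 0 is the job-level outcome row):

```sql
-- SQL Agent jobs that failed in the last day.
SELECT  j.name AS job_name,
        h.run_date,
        h.run_time,
        h.message
FROM    msdb.dbo.sysjobs       AS j
JOIN    msdb.dbo.sysjobhistory AS h ON h.job_id = j.job_id
WHERE   h.step_id    = 0            -- job outcome rows only
  AND   h.run_status = 0            -- 0 = failed
  AND   h.run_date  >= CONVERT(INT, CONVERT(CHAR(8), DATEADD(DAY, -1, GETDATE()), 112))
ORDER BY h.run_date DESC, h.run_time DESC;
```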

Servers/Databases

  • Confirm all servers/databases are up and running fine.
  • Usually in an enterprise database environment third-party tools are used to monitor servers (e.g. "WhatsUp")
  • For database monitoring we can design a native SQL Server solution using T-SQL code and a maintenance plan; it runs every minute and sends an email to the DBA team if it cannot connect to any database in the instance (a simple check is sketched below).
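A minimal sketch of the availability check described above; scheduled through an agent job, anything returned by this query would trigger the notification:

```sql
-- Databases that are not ONLINE (RESTORING, RECOVERY_PENDING, SUSPECT, OFFLINE, etc.).
SELECT  name,
        state_desc,
        user_access_desc,
        is_read_only
FROM    sys.databases
WHERE   state_desc <> N'ONLINE';
```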

Performance

  • Regularly monitor and identify blocking issues. We can design a procedure that runs continuously on all PROD servers and notifies the DBA team of any blocking or long-running queries/transactions (see the sketch after this list).
  • Check performance counters on all production servers and verify that all counters are within the normal range. We can design an SSRS metrics report to review the counters every hour.
  • Throughout the day, periodically monitor performance using both System Monitor and DMVs.
  • Check the fragmentation and rebuild/reorganize the indexes. We can use a native procedure which takes care of the task.
  • Make sure all Stats Update / nightly_checkalloc / Index_Rebuild jobs are completed without any issue.
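A basic sketch of the blocking check referenced above, using the sys.dm_exec_requests DMV; a production version would add thresholds and e-mail notification:

```sql
-- Sessions that are currently blocked, who is blocking them, and the blocked statement.
SELECT  r.session_id,
        r.blocking_session_id,
        r.wait_type,
        r.wait_time AS wait_time_ms,
        t.text      AS blocked_sql
FROM    sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE   r.blocking_session_id <> 0;
```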

Logs

  • Have a look at both SQL Server logs and Windows logs. If you find any strange issues, notify the network or storage teams. Most of the time we find network-related or I/O-related issues.
  • Check the centralized error logs if any.

Security

  • Check the error logs for failed logins and notify the audit team if necessary (a sample check follows this list)
  • Security Logs – Review the security logs from a third party solution or from the SQL Server Error Logs to determine if you had a breach or a violation in one of your policies.
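A quick sketch for pulling failed logins out of the current error log; xp_readerrorlog is undocumented but widely used, and its parameters here are (log file number, log type 1 = SQL Server, search string):

```sql
-- Scan the current SQL Server error log for failed login attempts.
EXEC master.dbo.xp_readerrorlog 0, 1, N'Login failed';
```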

High-Availability

  • High Availability or Disaster Recovery Logs – Check your high availability and/or disaster recovery process logs. The solution you are using (Log Shipping, Clustering, Replication, Database Mirroring, CDC, etc.) dictates what needs to be checked.
  • We can design native scripts using T-SQL to monitor Replication, Mirroring and Log Shipping (see the sketch after this list)
  • Monitor log shipping and mirroring using customized stored procedures.
  • In most environments we see third-party tools monitoring clusters, or we can design our own native scripts using Windows batch programming, PowerShell and T-SQL.
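A hedged example of native monitoring queries for mirroring and log shipping; replication and clustering generally need their own checks:

```sql
-- Database mirroring: current role, state and safety level for each mirrored database.
SELECT  DB_NAME(database_id) AS database_name,
        mirroring_role_desc,
        mirroring_state_desc,
        mirroring_safety_level_desc
FROM    sys.database_mirroring
WHERE   mirroring_guid IS NOT NULL;

-- Log shipping: built-in status report (run on the monitor, primary or secondary server).
EXEC sp_help_log_shipping_monitor;
```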

Request Handling:

  • Check the escalated issues first
  • Check the current queue for requests and identify requests to be processed and work on the issue.
  • We usually process the requests based on the SLA.

Weekly / Monthly Checklist

  • Backup Verification (Comprehensive) – Verify your backups and test them on a regular basis to ensure the overall process works as expected. Contact your off-site tape vendor and validate that the tape does not have any restore errors
  • Check the logins and service accounts for expiry dates
  • Backup Verification – Verify your backups on a regular basis. Randomly choose one or two backups and run RESTORE VERIFYONLY against them.
  • Windows, SQL Server or Application Updates – Check for service packs/patches that need to be installed on your SQL Server from either a hardware, OS, DBMS or application perspective
  • Capacity Planning – Perform capacity planning to ensure you will have sufficient storage for a specific period of time such as for 3, 6, 12 or 18 months.
  • Fragmentation – Review the fragmentation of your databases to determine whether particular indexes must be rebuilt, based on analysis from a backup SQL Server (see the sketch after this list).
  • Maintenance – Schedule an official maintenance, do all required health checks on all premium databases and servers.
  • Security – Remove unneeded logins and users for individuals that have left the organization, had a change in position, etc.
  • Moving data from production to archive: If your environment requires keeping any specific DBA-related data for a long time, plan an archival procedure. We can archive data from the actual OLTP system to a dedicated DBA server/database.
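For the fragmentation review mentioned above, a hedged sketch using sys.dm_db_index_physical_stats; the thresholds reflect only the common rule of thumb (reorganize between 5 and 30 percent, rebuild above 30):

```sql
-- Fragmentation overview for the current database, largest offenders first.
SELECT  OBJECT_NAME(ips.object_id) AS table_name,
        i.name                     AS index_name,
        ips.avg_fragmentation_in_percent,
        ips.page_count
FROM    sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'LIMITED') AS ips
JOIN    sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id  = ips.index_id
WHERE   ips.avg_fragmentation_in_percent > 5
  AND   ips.page_count > 1000   -- ignore very small indexes
ORDER BY ips.avg_fragmentation_in_percent DESC;
```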

Monitoring Infrastructure

  • We need to work with the other teams to make sure that all servers are in a healthy condition and to balance the infrastructure.
  • Usually we approach other teams for the below (CR/IR/SR – Change Request / Incident Request / Service Request):
  • Found I/O errors
  • Networking issues
  • F/R – Flatten and Rebuild
  • Adding space / drives – SAN (Storage Area Network)
  • Starting a Server – When a manual interaction needed
  • Backup Tapes – Archived backups
  • Escalating an issue

Documentation

  • Document all changes you make to the environment that includes:
  • Installations / Upgrades
  • Service packs / HotFixes applied
  • New databases added
  • Logins \ Roles added / removed
  • Check lists
  • Infrastructure changes
  • Process improvements
  • Maintain a centralized inventory with all details for the below items:
  • Database Servers
  • Application Servers
  • Web Servers
  • Logins (Both SQL Server and Windows)
  • Database Instances
  • Databases (Dev / Stag / QA / PROD)

Note: Document as much information as possible. It really helps in critical situations. Try to add an owner at each level, for example application owners, server owners etc. We can easily reach them in case of any emergencies / maintenance.

Third Party Tools

  • MOM 2005 – Microsoft Operations Manager, for monitoring IT infrastructure
  • ApexSQL Monitor
  • SQL Spotlight by Quest
  • Microsoft Systems Center 2012 (SCOM)
  • SQL Monitor by RedGate
  • LiteSpeed
  • Netbackup
  • RoboCopy – Copying data, files across servers
  • Heartbeat – Database monitoring tool
  • WhatsUp – Server monitoring tool
  • Octopus – Application deployment tool
  • Team Viewer – Remote server connector
  • VNC – Remote server connector
  • SQL Doc 2 – Documenting database architecture

Thank You



SSIS Interview Questions and Answers for Experienced and Freshers


SSIS – Part 1

SSIS Interview Questions and Answers for Experienced and Freshers

 

Here we are publishing a series of posts on SSIS interview questions with answers for experienced candidates and freshers. Below is series 1.

Q. Define SSIS?

Ans:

SQL Server Integration Services — commonly known as SSIS — is the platform introduced in SQL Server 2005 for data transformation and data integration solutions. It replaced DTS (Data Transformation Services) from SQL Server 2000.

Q. Name a few SSIS components?

Ans:

  • Integration Services Projects
  • Integration Services Packages
  • Control Flow Elements
  • Data Flow Elements
  • Integration Services Connections
  • Integration Services Variables
  • Integration Services Event Handlers
  • Integration Services Log Providers

Q. What is a project and Package in SSIS?

Ans:

A project is a container for developing packages. A package is an object that implements the functionality of ETL — Extract, Transform and Load.

Q. What are the 4 elements (tabs) that you see on a default package designer in BIDS?

Ans:

Control Flow, Data Flow, Event Handlers and Package Explorer. (A Parameters tab was added in SQL Server 2012 Data Tools.)

Q. What is a Control flow and Data Flow elements in SSIS?

Ans:

Control Flow:

A control flow element is one that performs a function, provides structure or controls the flow of the other elements. There must be at least one control flow element in the SSIS package. In SSIS a workflow is called a control flow. A control flow links together our modular data flows as a series of operations in order to achieve a desired result.

A control flow consists of one or more tasks and containers that execute when the package runs. Precedence constraints are used to control the order and define the conditions for running the next task or container in the package control flow.

Data Flow:

All ETL tasks related to data are done by data flow elements. It is not necessary to have a data flow element in the SSIS package. A data flow consists of the sources and destinations that extract and load data, the transformations that modify and extend data, and the paths that link sources, transformations, and destinations. Before you can add a data flow to a package, the package control flow must include a Data Flow task. The Data Flow task is the executable within the SSIS package that creates, orders, and runs the data flow. A separate instance of the data flow engine is opened for each Data Flow task in a package.

Q. What are the 3 different types of control flow elements in SSIS?

Ans:

  • Structures provided by Containers
  • Functionality provided by Tasks
  • Precedence constraints that connect the executables, containers, and tasks into an ordered control flow.

Q. What are the 3 data flow components in SSIS?

Ans:

  • Source
  • Transformation
  • Destination

Q. What are connections and connection managers in SSIS?

Ans:

A connection, as its name suggests, is a component to connect to any source or destination from SSIS — such as SQL Server, a flat file, or many other options that SSIS provides. A connection manager is a logical representation of a connection.

Q. What is the use of Check Points in SSIS?

Ans:

SSIS provides a Checkpoint capability which allows a package to restart at the point of failure.

Q. What are the command line tools to execute SQL Server Integration Services packages?

Ans:

DTExecUI – When this command line tool is run, a user interface is loaded in order to configure each of the applicable parameters to execute an SSIS package.

DTExec – This is a pure command line tool where all of the needed switches must be passed into the command for successful execution of the SSIS package.
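A small illustration of running a file-system package with dtexec; here it is wrapped in T-SQL via xp_cmdshell purely for illustration (xp_cmdshell must be enabled, and the package path is a placeholder) — the same command can be run directly from a command prompt or an Agent job step:

```sql
-- Execute a package stored on the file system; /REPORTING E limits console output to errors.
EXEC master.dbo.xp_cmdshell
     'dtexec /FILE "D:\SSIS\Packages\LoadSales.dtsx" /REPORTING E';
```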

Q. Can you explain the SQL Server Integration Services functionality in Management Studio?

Ans:

You have the ability to do the following:

  • Login to the SQL Server Integration Services instance
  • View the SSIS log
  • View the packages that are currently running on that instance
  • Browse the packages stored in MSDB or the file system
  • Import or export packages
  • Delete packages
  • Run packages

Q. Can you name some of the core SSIS components in the Business Intelligence Development Studio you work with on a regular basis when building an SSIS package?

Ans:

  • Connection Managers
  • Control Flow
  • Data Flow
  • Event Handlers
  • Variables window
  • Toolbox window
  • Output window
  • Logging
  • Package Configurations

Q. Name Transformations available in SSIS?

Ans:

DATACONVERSION: Converts columns data types from one to another type. It stands for Explicit Column Conversion.

DATAMININGQUERY: Used to perform data mining query against analysis services and manage Predictions Graphs and Controls.

DERIVEDCOLUMN: Create a new (computed) column from given expressions.

EXPORTCOLUMN: Used to export an image/BLOB column from the database to a flat file.

FUZZYGROUPING: Used for data cleansing by finding rows that are likely duplicates.

FUZZYLOOKUP: Used for Pattern Matching and Ranking based on fuzzy logic.

AGGREGATE: It applies aggregate functions to Record Sets to produce new output records from aggregated values.

AUDIT: Adds Package and Task level Metadata: such as Machine Name, Execution Instance, Package Name, Package ID, etc..

CHARACTERMAP: Performs SQL Server column level string operations such as changing data from lower case to upper case.

MULTICAST: Sends a copy of supplied Data Source onto multiple Destinations.

CONDITIONALSPLIT: Separates available input into separate output pipelines based on Boolean Expressions configured for each output.

COPYCOLUMN: Adds a copy of a column to the output; we can later transform the copy, keeping the original for auditing.

IMPORTCOLUMN: Reads data from files and loads it into an image/BLOB column in the data flow.

LOOKUP: Performs the lookup (searching) of a given reference object set to a data source. It is used for exact matches only.

MERGE: Merges two sorted data sets into a single data set in a single data flow.

MERGEJOIN: Merges two sorted data sets into a single dataset using a join.

ROWCOUNT: Stores the resulting row count from the data flow / transformation into a variable.

ROWSAMPLING: Captures sample data by using a row count of the total rows in dataflow specified by rows or percentage.

UNIONALL: Merge multiple data sets into a single dataset.

PIVOT: Used to de-normalize data by converting rows into columns.

UNPIVOT: Used to normalize the data structure by converting columns into rows, as when building data warehouses.

References:

For more MSBI stuff please have a look at below references:

http://blogs.msdn.com/b/business-intelligence

https://sreenivasmsbi.wordpress.com

http://www.msbiguide.com

http://msbiravindranathreddy.blogspot.in

http://sqlschool.com/MSBI-Interview-Questions.html

https://www.katieandemil.com

http://www.venkateswarlu.co.in

http://www.sqlserverquest.com

http://www.msbiguide.com


SSIS Interview Questions and Answers Part 2


 

SSIS – Part 2

SSIS Interview Questions and Answers for Experienced and Freshers

Here we are publishing a series of posts on SSIS interview questions and answers for experienced candidates and freshers. Below is series 2.

Q. What is a breakpoint in SSIS?

Ans:

A breakpoint is a stopping point in the code. The breakpoint can give the Developer\DBA an opportunity to review the status of the data, variables and the overall status of the SSIS package.
Breakpoints are set up in BIDS: navigate to the control flow interface, right-click on the object where you want to set the breakpoint and select the 'Edit Breakpoints…' option.

Q. Can you name 5 or more of the native SSIS connection managers?

Ans:

  • OLEDB connection – Used to connect to any data source requiring an OLEDB connection (i.e., SQL Server)
  • Flat file connection – Used to make a connection to a single file in the File System. Required for reading information from a File System flat file
  • ADO.Net connection – Uses the .Net Provider to make a connection to SQL Server 2005 or other connection exposed through managed code (like C#) in a custom task
  • Analysis Services connection – Used to make a connection to an Analysis Services database or project. Required for the Analysis Services DDL Task and Analysis Services Processing Task
  • File connection – Used to reference a file or folder. The options are to either use or create a file or folder
  • Excel
  • FTP
  • HTTP
  • MSMQ
  • SMO
  • SMTP
  • SQL Mobile
  • WMI

Q. How do you eliminate quotes from being uploaded from a flat file to SQL Server?
Ans:

In the SSIS package on the Flat File Connection Manager Editor, enter quotes into the Text qualifier field then preview the data to ensure the quotes are not included.

Q. Can you name 5 or more of the main SSIS tool box widgets and their functionality?

Ans:

  • ActiveX Script Task
  • Analysis Service Processing Task
  • Analysis Services Execute DDL Task
  • Backup Database Task
  • Bulk Insert Task
  • CDC Control Task
  • Check Database Integrity Task
  • Data Flow Task
  • Data Mining Query Task
  • Data Profiling Task
  • Execute DTS 2000 Package Task – Till 2008
  • Execute Package Task
  • Execute Process Task
  • Execute SQL Server Agent Job Task
  • Execute SQL Task
  • Execute T-SQL Statement Task
  • Expression Task
  • File System Task
  • For Loop Container
  • Foreach Loop Container
  • FTP Task
  • History Cleanup Task
  • Maintenance Cleanup Task
  • Message Queue Task
  • Notify Operator Task
  • Rebuild Index Task
  • Reorganize Index Task
  • Script Task
  • Send Mail Task
  • Sequence Container
  • Shrink Database Task
  • Transfer Database Task
  • Transfer Error Messages Task
  • Transfer Jobs Task
  • Transfer Logins Task
  • Transfer Master Stored Procedures Task
  • Transfer SQL Server Object Task
  • Update Statistics Task
  • Web Service Task
  • WMI Datareader Task
  • WMI Event Watcher Task
  • XML Task

Q. Can you explain one approach to deploy an SSIS package?

Ans:

  • One option is to build a deployment manifest file in BIDS, then copy the directory to the applicable SQL Server then work through the steps of the package installation wizard
  • A second option is using the dtutil utility to copy, paste, rename, delete an SSIS Package
  • A third option is to log in to SQL Server Integration Services via SQL Server Management Studio, then navigate to the 'Stored Packages' folder, then right-click on one of the child folders or an SSIS package to access the 'Import Packages…' or 'Export Packages…' option.
  • A fourth option in BIDS is to navigate to File | Save Copy of Package and complete the interface.

Q. Can you explain how to setup a checkpoint file in SSIS?

Ans:
The following items need to be configured on the properties tab for the SSIS package:

CheckpointFileName – Specify the full path to the Checkpoint file that the package uses to save the value of package variables and log completed tasks. Rather than using a hard-coded path, it’s a good idea to use an expression that concatenates a path defined in a package variable and the package name.

CheckpointUsage – Determines if/how checkpoints are used. Choose from these options: Never (default), If Exists, or Always. Never indicates that you are not using Checkpoints. “If Exists” is the typical setting and implements the restart at the point of failure behavior. If a Checkpoint file is found it is used to restore package variable values and restart at the point of failure. If a Checkpoint file is not found the package starts execution with the first task. The Always choice raises an error if the Checkpoint file does not exist.

SaveCheckpoints – Choose from these options: True or False (default). You must select True to implement the Checkpoint behavior.

Q. Would you recommend using “Check Points” in SSIS?

Ans:

From my experience I would say "NO", as there are compatibility issues with various options, hence using checkpoints may give unpredictable results. Checkpoints don't work properly when an SSIS package contains:

  • Complex logic
  • Iterations/Loops
  • Transactions Enabled
  • “Object” type variables
  • Parallel execution

Checkpoints work fine when the package has a straightforward control flow with a single thread.

 

Q. Can you explain different options for dynamic configurations in SSIS?

Ans:

  • Use an XML file
  • Use custom variables
  • Use a database per environment with the variables
  • Use a centralized database with all variables

Q. How do you upgrade an SSIS Package?

Ans:

Depending on the complexity of the package, one or two techniques are typically used:

  • Recode the package based on the functionality in SQL Server DTS.
  • Use the Migrate DTS 2000 Package wizard in BIDS and then recode any portion of the package that is not accurate

Q. Can you name five of the Perfmon counters for SSIS and the value they provide?

Ans:

SQLServer: SSIS Service

SSIS Package Instances – Total number of simultaneous SSIS Packages running

SQLServer: SSIS Pipeline

BLOB bytes read – Total bytes read from binary large objects during the monitoring period.

BLOB bytes written – Total bytes written to binary large objects during the monitoring period.

BLOB files in use – Number of binary large objects files used during the data flow task during the monitoring period.

Buffer memory: The amount of physical or virtual memory used by the data flow task during the monitoring period.

Buffers in use – The number of buffers in use during the data flow task during the monitoring period.

Buffers spooled – The number of buffers written to disk during the data flow task during the monitoring period.

Flat buffer memory – The total amount of memory, in bytes, in use by flat buffers during the monitoring period.

Flat buffers in use – The number of blocks of memory in use by the data flow task at a point in time.

Private buffer memory – The total amount of physical or virtual memory used by data transformation tasks in the data flow engine during the monitoring period.

Private buffers in use – The number of blocks of memory in use by the transformations in the data flow task at a point in time.

Rows read – Total number of rows produced by sources during the monitoring period.

Rows written – Total number of rows offered to destinations during the monitoring period.

Q. How do you handle errors in ssis?

Ans:

When a data flow component applies a transformation to column data, extracts data from sources, or loads data into destinations, errors can occur. Errors frequently occur because of unexpected data values.

Errors typically fall into one the following categories:

Data conversion errors: occur if a conversion results in the loss of significant digits, the loss of insignificant digits, or the truncation of strings. Data conversion errors also occur if the requested conversion is not supported.

Expression evaluation errors: occur if expressions that are evaluated at run time perform invalid operations or become syntactically incorrect because of missing or incorrect data values.

Lookup errors: occur if a lookup operation fails to locate a match in the lookup table.

Many data flow components support error outputs, which let you control how the component handles row-level errors in both incoming and outgoing data. You specify how the component behaves when truncation or an error occurs by setting options on individual columns in the input or output.

Q. How do you do Logging in SSIS?

Ans:

  • SSIS includes logging features that write log entries when run-time events occur and can also write custom messages.
  • The Integration Services log providers can write log entries to text files, SQL Server Profiler, SQL Server, Windows Event Log, or XML files.
  • Logs are associated with packages and are configured at the package level. Each task or container in a package can log information to any package log. The tasks and containers in a package can be enabled for logging even if the package itself is not.

To enable logging in a package:

  • In Business Intelligence Development Studio, open the Integration Services project that contains the package you want.
  • On the SSIS menu, click Logging.
  • Select a log provider in the Provider type list, and then click Add.

 

Q. Demonstrate how you would suggest using configuration files in packages.  Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package?

Ans:

There should be a single configuration file for each connection manager in your packages that stores their connection string information.  So if you have 6 connection managers then you have 6 config files.  You can use the same config file across all your packages that use the same connections.

If you have a single config file that stores all your connection managers then all your packages must contain the connection managers that are stored in that config file. This means you may have to put connection managers in your package that you don't even need.

 

Q. Demonstrate how checkpoints work in a package.

Ans:

When checkpoints are enabled on a package and the package fails, it saves the point at which the package failed. This way you can correct the problem and then rerun from the point of failure instead of rerunning the entire package. The obvious benefit is that if you load a million-record file just before the package fails, you don't have to load it again.

Q. Demonstrate how transactions work in a package.

Ans:

If transactions are enabled on your package and tasks then when the package fails it will rollback everything that occurred during the package. First make sure MSDTC (Microsoft Distributed Transaction Coordinator) is enabled in the Control Panel -> Administrative Tools -> Component Services. Transactions must be enabled not only on the package level but also on each task you want included as part of the transaction. To have the entire package in a transaction set Transaction Option at the package level to Required and each task to Supported.

Q. If you have a package that runs fine in Business Intelligence Development Studio (BIDS) but fails when running from a SQL Agent Job what would be your first guess on what the problem is?

Ans:

The account that runs SQL Agent Jobs likely doesn’t have the needed permissions for one of the connections in your package. Either elevate the account permissions or create a proxy account.

To create a proxy account you need to first create new credentials with the appropriate permissions. Next assign those credentials to a proxy account. When you run the job now you will select Run As the newly created proxy account.

 

Q. What techniques would you consider to add auditing to your packages?  You’re required to log when a package fails and how many rows were extracted and loaded in your sources and destinations.

Ans:

I like to create a database that is designated for package auditing. Track row counts coming from a source and which actually make it to a destination. Row counts and package execution should be all in one location and then optionally report off that database.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).

 

Q. Demonstrate or whiteboard techniques you would use to for CDC (Change Data Capture)?  Tell how you would write a package that loads data but first detects if the data already exists, exists but has changes, or is brand new data for a destination.

Ans:

For small amounts of data I may use the Slowly Changing Dimension. More often than not the data is too large to use in such a slow transform.

I prefer to do a lookup on the key of the target table and rows that don’t match are obviously new rows that can be inserted. If they do match it’s possible they are updates or duplicates. Determine this by using a conditional split comparing rows from the target to incoming rows. Send updates to a staging table that can then be updated in an Execute SQL Task.

Explain that putting updates in a staging table instead of updating using the OLE DB Command is much better for performance, because the Execute SQL Task performs a single set-based operation rather than a row-by-row update.
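A hedged sketch of the set-based statement such an Execute SQL Task could run; the table and column names (dbo.DimCustomer, stg.CustomerUpdates) are purely illustrative:

```sql
-- Apply all staged changes in one set-based statement
-- instead of firing an OLE DB Command per row.
UPDATE  tgt
SET     tgt.CustomerName = stg.CustomerName,
        tgt.City         = stg.City,
        tgt.ModifiedOn   = GETDATE()
FROM    dbo.DimCustomer     AS tgt
JOIN    stg.CustomerUpdates AS stg
        ON stg.CustomerKey = tgt.CustomerKey;
```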

Q. Explain what breakpoints are and how you would use them.

Ans:
Breakpoints put pauses in your package. It’s a great tool for debugging a package because you can place a breakpoint on a task and it will pause the package based on execution events.

One situation in which I have used breakpoints is when I have a looping container and I want to see how my variables are changed by the loop. I would place a watch window on the package and type the variable name in, then set a breakpoint on the container to stop after each iteration of the loop.

Q. What are the main components involved in SSIS?

Ans:

  • SSIS is not an improved version of DTS
  • SSIS is completely redesigned and built from the ground up using .NET code.
  • SSIS is mainly divided into two parts:
  • Data Transformation Pipeline (DTP) – Data Flow
  • Data Transformation Runtime (DTR) – Control Flow
  • In SQL Server 7.0 / 2000 the data flow was stronger than the control flow, but in SSIS both are at the same level

Q. What is the work of DTP Engine?

Ans:

  • DTP consists of DTP Engine and DTP Object model
  • DTP uses Data Adapters to connect source and destination
  • DTP engine uses DTP Object Model which is nothing but an API.
  • SSIS comes with adapters for SQL Server databases, XML, flat files, and other OLE DB–compliant data sources
  • The job of the data adapters is to make connections to the data’s source and destination endpoints
  • The job of the transformations is to move and optionally manipulate the data as it’s moved between the source and destination endpoints.

Q. How the DTR works in SSIS?

Ans:

  • The DTR consists of the DTR engine and the DTR components.
  • DTR components are objects that enable you to govern the execution of SSIS packages.
  • The primary DTR components are packages, containers, and tasks.
  • DTR engine stores package layout; runs packages; and provides debugging, logging, and event handling services.
  • The DTR is accessed using the DTR object framework. The DTR run-time object framework is the API that supports the Integration Services Import/Export Wizard and the Integration Services Designer in addition to the command-line dtexec tool.

Q. Can you explain the SSIS Architecture?

Ans:

Runtime engine

The Integration Services runtime saves the layout of packages, runs packages, and provides support for logging, breakpoints, configuration, connections, and transactions.

API or object model

The Integration Services object model includes managed application programming interfaces (API) for creating custom components for use in packages, or custom applications that create, load, run, and manage packages. Developer can write custom applications or custom tasks or transformations by using any common language runtime (CLR) compliant language.

Integration Services service: It is a Windows service, monitors running SSIS packages and manages the storage of packages.

Data flow: It contains a data flow engine that manages the data flow components. There are 3 types of data flow components – source components (which extract the data from a system), transformation components (which perform transformations and modifications on the extracted data) and destination/load components (which load the data into the destination systems). Besides the available data flow components, we can write our own custom data flow components to accomplish any custom requirements.

References:

For more MSBI stuff please have a look at below references:

http://blogs.msdn.com/b/business-intelligence

https://sreenivasmsbi.wordpress.com

http://www.msbiguide.com

http://msbiravindranathreddy.blogspot.in

http://sqlschool.com/MSBI-Interview-Questions.html

https://www.katieandemil.com

http://www.venkateswarlu.co.in

http://www.sqlserverquest.com

http://www.msbiguide.com


SSIS Interview Questions and Answers Part 3


SSIS – PART 3

SSIS – Performance Tuning

SSIS Interview Questions and Answers Part 3

Here we are publishing a series of posts on SSIS interview questions and answers for experienced candidates and freshers. Below is series 3.

Q. How to quickly load data into sql server table?

Ans:

Use the Fast Load option on the OLE DB Destination. This option is not set by default, and most developers know this answer because otherwise the load is very slow.

Q. What are the fast load options available in SSIS?

Ans:

The OLE DB Destination provides more than one way to load data into the destination (5 types of Data Access Mode). Use the Fast Load option while loading data into the destination.

  • Data Access Mode – It allows to define the method to upload data into the destination. The fast load option will use BULK INSERT statement instead of INSERT statement. If the fast load option is not selected then by default INSERT is used.
  • Keep Identity – If selected, the identity values of source are preserved and the same are uploaded into the destination table. Else destination table will create its own identity values if there is any column of identity type.
  • Keep Nulls – If selected, the null values of the source are preserved and are uploaded into the destination table. Else if any column has default constraint defined at destination table and NULL value is coming from the source for that column then in that case, default value will be inserted into the destination table.
  • Table Lock – If selected, the TABLOCK is acquired on the table during data upload. It is the recommended option if table is not being used by any other application at the time of data upload as it removes the overhead of lock escalation.
  • Check Constraints – Check constraints will always check for any constraint for the data that is coming through pipeline. It is preferable to uncheck this option if constraint checking is not required. This will reduce the overhead for the pipeline engine.
  • Rows per batch – RowsPerBatch is the number of rows you would want in one buffer; SSIS sizes buffers automatically based on the row size and the maximum buffer rows property. The number of rows coming from the pipeline per batch can be defined by the user; the default value is -1 if it is kept blank. You can specify the number of rows as a positive integer (N) so that the records come through as small segments or batches, each segment containing N rows.
  • Maximum insert commit size – You can specify the batch size that the OLE DB destination tries to commit during fast load operations; it actually splits up chunks of data as they are inserted into your destination. If you provide a value for this property, the destination commits rows in batches that are the smaller from either (a) the Maximum insert commit size, or (b) the remaining rows in the buffer that is currently being processed.
  • Network limitations: You can transfer data only as fast as your network supports, but use it efficiently; you can customize SSIS to use the maximum bandwidth of your network by setting the Packet Size property of the connection manager to an integer value that suits you. The maximum value that you can set is 32767.

Q. What are the lookup cache modes available and how to use them?

Ans:

In 2008 we have three different cache modes for lookup transformations.

  • Full Cache – Default
  • Partial Cache
  • No Cache

Full Cache:

The database is queried once during the pre-execute phase of the data flow. The entire reference set is pulled into memory. This approach uses the most memory, and adds additional startup time for your data flow. Lookup will not swap memory out to disk, so your data flow will fail if you run out of memory.

When to use full cache mode:

  • When you're accessing a large portion of your reference set
  • When you have a small reference table
  • When your database is remote or under heavy load, and you want to reduce the number of queries sent to the server

Partial Cache:

In this mode, the lookup cache starts off empty at the beginning of the data flow. When a new row comes in, the lookup transform checks its cache for the matching values. If no match is found, it queries the database. If the match is found at the database, the values are cached so they can be used the next time a matching row comes in.

In 2008 there is a new Miss Cache feature that allows you to allocate a certain percentage of your cache to remembering rows that had no match in the database. This is useful in a lot of situations, as it prevents the transform from querying the database multiple times for values that don’t exist

When to use partial cache mode:

  • When you're processing a small number of rows and it's not worth the time to charge the full cache
  • When you have a large reference table
  • When your data flow is adding new rows to your reference table

No Cache:

As the name implies, in this mode the lookup transform doesn’t maintain a lookup cache (actually, not quite true – we keep the last match around, as the memory has already been allocated). In most situations, this means that you’ll be hitting the database for every row.

When to use no-cache mode:

  • When you're processing a small number of rows
  • When you have non-repeating lookup indexes
  • When your reference table is changing (inserts, updates, deletes)
  • When you have severe memory limitations

Q. What are the different types of Transformations in SSIS?

Ans:

Non-Blocking: No blocking; rows are processed and passed downstream as they arrive.

Partial Blocking: The downstream transformations wait for certain periods; it follows a start, then stop, then start-over technique.

Full Blocking: The downstream transformation has to wait until all the data has been released from the upstream transformation.

Non-blocking transformations

  • Audit
  • Cache Transform
  • Character Map
  • Conditional Split
  • Copy Column
  • Data Conversion
  • Derived Column
  • Export Column
  • Import Column
  • Lookup
  • Multicast
  • OLE DB Command
  • Percentage Sampling
  • Script Component
  • Slowly Changing Dimension

Partial blocking transformations

  • Data Mining
  • Merge
  • Merge Join
  • Pivot
  • Unpivot
  • Term Lookup

Fully Blocking Transformations

  • Aggregate
  • Fuzzy grouping
  • Fuzzy lookup
  • Row Sampling
  • Sort
  • Term Extraction

Note that Sort is a fully blocking transformation, so it is better to sort your data using a SQL command in the OLE DB Source instead of using the Sort transformation. The Merge transform requires sorted input but Union All does not, so use Union All wherever possible.

Q. Consider a scenario where I am using the "Sort" transformation and, after the sort operation completes, I have to remove all duplicate records. Duplicates are defined based on the sort columns: for example, I am sorting the result set on three columns, and when those 3 columns have the same values the rows are considered duplicates. Which transformation do we have to use to ignore these duplicate records?

Ans:

We need not use any separate transformation to remove duplicate records based on the sort columns; the feature is available in the "Sort" transformation itself. We can find the option "Remove rows with duplicate sort values" at the bottom left corner of the Sort transformation editor. Just check that box.

 

Q. How to avoid the sort transformation in SSIS?

Ans:

Input datasets have to be in sorted order when dealing with the "Merge" or "Merge Join" transformations. To avoid the Sort transformation we can directly use a query with an ORDER BY clause at the data source. But remember we can do that only when the data source is OLE DB or Excel; when it comes to flat files we don't have that choice, so choose the best way to implement the Sort transformation.

For example, if an aggregate is required, apply the aggregate before applying the sort. If possible, load flat file data into staging tables, apply the sort at the database level and then load the destination. That way we have two data flows — one to load data from flat files into staging tables and another to load data into the destination from the staging tables — and hence we can make use of parallelism.
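For example, a source query along these lines (table and column names are illustrative) lets the database engine do the sorting; the OLE DB Source output is then marked as sorted as described in the next answer:

```sql
-- Sort at the source so Merge / Merge Join inputs arrive pre-sorted;
-- remember to set IsSorted and SortKeyPosition on the OLE DB Source output.
SELECT  CustomerKey,
        OrderDate,
        SalesAmount
FROM    dbo.FactSales
ORDER BY CustomerKey, OrderDate;
```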

 

Q. How an SSIS package or a data flow knows that the input dataset / source dataset is in sorted order?

Ans:

If the source dataset is in sorted order or we are using a query with order by at source we have to explicitly mention this information at “OLEDB” source.

There are two properties that we need to change at OLEDB source.

1. Open OLEDB source advanced editor

2. Go to the "Input and Output Properties" tab

3. Select “OLEDB source OUTPUT”

4. In the properties select the value “True” for the property “IsSorted”

5. Expand Output Column list and select the column name and set “SortKeyPosition” value to one.

6. Repeat step 5 for all the columns in the ORDER BY clause, giving the appropriate priority

Q. Which data providers are supported by the OLE DB connection manager for the cache option in the Lookup transformation?

Ans:

SQL Server

Oracle

DB2

Q. One of the architects on the customer side asks you for the below information: "I just want to know how many execution trees are being created for the SSIS package which loads data on a daily basis."

How do we get this information?

Ans:
We can actually use custom log events to capture this information.

The log entry "PipelineExecutionTrees" helps us know about the execution trees created at run time. It includes lots of info, for example the number of rows stored in a buffer while executing a transformation, etc.

For more info please have a look at below link

http://msdn.microsoft.com/en-us/library/ms345174.aspx

Q. Do you know when an execution tree is created and when it ends in a data flow? Simply, what is the scope of an execution tree?

Ans:

The work to be done in the data flow task is divided into multiple chunks, which are called execution units, by the data flow pipeline engine. Each represents a group of transformations. The individual execution unit is called an execution tree, which can be executed by a separate thread along with other execution trees in a parallel manner. The memory structure is also called a data buffer; it gets created by the data flow pipeline engine and has the scope of each individual execution tree. An execution tree normally starts at either the source or an asynchronous transformation and ends at the first asynchronous transformation or a destination. During execution of the execution tree, the source reads the data, then stores the data in a buffer, executes the transformations in the buffer and passes the buffer to the next execution tree in the path by passing pointers to the buffers.

Q. While running an SSIS package, after 15 minutes of execution it went into a hung state. How do you troubleshoot?

Ans:

There are three common reasons that can hang SSIS execution:

  1. Resource bottleneck: Memory / CPU / IO / Network
  2. Blocking / Deadlock: Blocking happens at the database level, or while accessing a file, or while reading/writing variables from a script task.
  3. Poorly performing query: If SSIS stops at an Execute SQL Task, look at the query used inside the task and tune it.

Looking through the above aspects one can identify the issue and, based on that, provide the resolution. If everything looks good but SSIS is still in a hung state, check that the latest service pack is applied; if that also passes, collect a hang dump file using ADPlus and contact the Microsoft support center.

Q. SSIS 2008 uses all available RAM, and after package completes Memory is not released?

Ans:

This is not actually a problem. You have allowed SQL Server to use x amount of memory, so it does. SQL Server takes that memory as required, up to the limit set, but it does not release it. It can respond to request from OS, again read up on the fine details, but by default once it has got hold of some memory it will keep it even if it is not using it currently. The simple reason is that finding and taking hold of memory is quite expensive to do, so once it has it it keeps it and then any subsequent operations that need memory will have it available much faster. This makes perfect sense when you remember that SQL Server is a service application and more often than not runs on a dedicated machine.

Q. What is the property "RunInOptimizedMode"? How do you set this property?

Ans:

If this property is set to true then the SSIS engine ignores unused/unmapped columns, meaning it does not allocate memory to store data for those columns. At the compilation phase itself the SSIS engine identifies which columns from the source are used across the package; if it finds columns that are neither used nor mapped to the destination, it simply ignores them.

We can set this property at two levels “Project Level” and “Package Level”.

Project Level: From project properties → Debugging → RunInOptimizedMode. By default "FALSE".

Package Level: Can be found in the Data Flow Task properties. By default "TRUE".

Q. Does using “RowCount” transformation affects the package performance?

Ans:

The Row Count component is a synchronous component and it doesn't do anything particularly resource-intensive, which means the performance degradation of your package should be negligible.

We use this component to capture the number of inserts, deletes and updates from each data flow, and then in the "OnPostExecute" event this information is written to a SQL Server table.

 

Q. A SSIS 2008 package has been crashed due to low memory. How to resolve low memory issues with SSIS package?

Ans:

1. Add more memory to the physical machine

2. Run SSIS package on a computer that is not running an instance of SQL Server

3. When SSIS and SQL instance on the same machine, balance the memory allocated to SQL Server instance using “MAX Server Memory” option.

4. Run SSIS package components in series instead of parallel

 

Q. How to identify the SSIS processes?

Ans:

SSIS run-time processes include the DTExec.exe process and the DTSHost.exe process.

 

Q. How do you enable a container to continue running even if a task fails inside it? Suppose you have an application where we need to loop through a log table based on IDs and load data into the destination. In this scenario some of the tasks in the Foreach Loop container may fail, but the requirement is that even though the inner tasks fail we should process the other sources available to us.

Ans:

We can do this by setting the "Propagate" property of a task's/container's OnError event handler to "False". It means that the loop or sequence container ignores the failure of an internal task.

Assume we have designed a Foreach Loop container with a data flow task. As per our requirement the DFT is loading 100 files into the database; if the DFT fails to load the 47th file, it should skip the error and continue loading from the 48th file.

Steps to accomplish this are:

Select the Data Flow Task and go to the Event Handlers tab.

Enable the OnError event handler.

In the Event Handlers tab, click on "Show System Variables".

Now select the "Propagate" variable & change its value to "False".

This will ensure that the parent control, i.e. the Foreach Loop, will not know about the error in the child task.

If the Foreach Loop container has more than one task, instead of setting the property on each of these tasks, add them all to a Sequence container and change the "Propagate" property of the Sequence container's OnError handler.

Note: When this kind of situation applies to a single task instead of a loop, we can set the property "ForceExecutionValue" to "True" and set "ForcedExecutionValue" to "1". This means that irrespective of the execution result the SSIS engine forces the outcome to success.

 

Q. What is ForceExecution property in SSIS component properties?

Ans:

ForceExecution is a family of properties on control flow elements in SSIS. If it is enabled on an element then the SSIS engine reports the execution result as per the given parameters; in other words, we can use these properties to control the execution result of any control flow element.

ForceExecutionValue: True or False

ForcedExecutionValueType: <Datatype>

ForcedExecutionValue: <Value>; we usually give 1 to mark the outcome as a success.

 

Q. How to improve the performance of a SSIS package?

Ans:

1- Utilize parallelism: It is easy to utilize parallelism in SSIS. All you need to do is to recognize which Data Flow Tasks (DFTs) could be started at the same time and set the control flow constraints of your package in the way that they all can run simultaneously.

 2- Synchronous vs. Asynchronous components: A synchronous transformation in SSIS takes a buffer, processes it, and passes the result through without waiting for the next buffer to come in. On the other hand, an asynchronous transformation needs to process all of its input data before it can produce any output. This can cause serious performance issues when the size of the input data to the asynchronous transformation is too big to fit into memory and needs to be transferred to disk at multiple stages.

 3- Execution tree: An execution tree starts where a buffer starts and ends where the same buffer ends. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread. When a new buffer is created such as when a partially blocking or blocking transformation is added to the pipeline, additional memory is required to handle the data transformation; however, it is important to note that each new tree may also give you an additional worker thread.

 4- OLE DB Command transformation: OLE DB Command is a row-by-row transformation, meaning that it runs the command on each one of its input rows. This makes it far too slow when the number of rows goes up. The solution for boosting performance is to stage the data into a temporary table and use an Execute SQL Task outside that DFT.

 5- SQL Server Destination vs. OLE DB Destination: There are multiple reasons to use the OLE DB Destination and not the SQL Server Destination:

  • OLE DB Destination is mostly faster,
  • OLE DB Destination is a lot clearer when it fails (The error message is more helpful),
  • SQL Server Destination works only when SSIS is installed on the destination server.

6- Change Data Capture (CDC): Try to reduce the amount of data to be transferred to the maximum level you can, and do it as close to the source as you can. A Modified On column on the source table(s) helps a lot in this case.

 7- Slowly Changing Dimension (SCD) transformation: There is only one piece of advice about SSIS's Slowly Changing Dimension transformation, and that is: get rid of it! The reasons are:

  • It doesn’t use any cached data, and goes to the data source every single time it is called,
  • It uses many OLE DB Command transformations,
  • Fast Data Load is off by default on its OLE DB Destination.

 8. Choose the best way when designing data flow between SQL and SSIS: Remember SSIS is good at row-by-row operations whereas SQL is not. So, depending on the situation, design the data flow using DFT components instead of executing a query using an "Execute SQL Task".

 9. Use queries for selecting data rather than selecting a table and checking off the columns you want. This will reduce the initial record set before SSIS gets it rather than ignoring the fields

 10. Carefully deal with your connections. By default, your connection manager will connect to the database as many times as it wants to. You can set the RetainSameConnection property so it will only connect once. This can allow you to manage transactions using an ExecuteSQL task and BEGIN TRAN / COMMIT TRAN statements avoiding the overhead of DTC.

 11. While running the package within BIDS ensure you set the package to run in optimized mode.

 12. While loading data into destination tables it’s helpful to use the “Fast Load option”.

13. Wherever possible consider aggregating and (un)pivoting in SQL Server instead of doing it in the SSIS package – SQL Server outperforms Integration Services in these tasks.

 14. Avoid manipulating large datasets using T-SQL statements. All T-SQL statements cause changed data to write out to the transaction log even if you use Simple Recovery Model.

 15. For large datasets, do data sorts at the source if possible.

 16. Use the SQL Server Destination if you know your package is going to run on the destination server, since it offers roughly 15% performance increase over OLE DB because it shares memory with SQL Server.

 17. Increase the network packet size to 32767 on your database connection managers. This allows large volumes of data to move faster from the source servers.

 18. If using Lookup transforms, experiment with cache sizes – between using a Cache connection or Full Cache mode for smaller lookup datasets, and Partial / No Cache for larger datasets. This can free up much needed RAM.

 19. Make sure the table lock option is used while loading very large datasets, as a bulk insert happens when the below conditions are satisfied:

a) Destination table is empty

b) Destination database recovery model is either simple or bulk-logged

c) The table lock option is specified

20. Experiment with the DefaultBufferSize and DefaultBufferMaxRows properties. You'll need to monitor your package's "Buffers Spooled" performance counter using Perfmon.exe, and adjust the buffer sizes upwards until you see buffers being spooled (paged to disk), then back off a little.

 21. Do all set-based operations, aggregations and sorts at the source or destination using T-SQL (see the sketch after this list).

 22. If possible, use "NOLOCK" hints at the source and a table lock ("TABLOCK") at the destination.

 23. When loading data warehouses, try to disable the indexes during the load and rebuild them afterwards.
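As items 9, 13 and 21 above suggest, aggregation can often be pushed into the source query instead of using the fully blocking Aggregate transformation; a small illustrative sketch (the table name is hypothetical):

```sql
-- Aggregate at the source rather than in an Aggregate transformation.
SELECT  ProductKey,
        SUM(SalesAmount) AS total_sales,
        COUNT(*)         AS order_lines
FROM    dbo.FactSales
GROUP BY ProductKey;
```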

Q. Can you explain the settings “Rows Per Batch” and “Maximum Insert Commit Size”?

Ans:

These options are available at “OLEDB destination” in DFT.

Rows per batch – The default value for this setting is -1 which specifies all incoming rows will be treated as a single batch. You can change this default behaviour and break all incoming rows into multiple batches. The allowed value is only positive integer which specifies the maximum number of rows in a batch.

Maximum insert commit size – The default value for this setting is ‘2147483647’ (largest value for 4 byte integer type) which specifies all incoming rows will be committed once on successful completion. You can specify a positive value for this setting to indicate that commit will be done for those number of records. You might be wondering, changing the default value for this setting will put overhead on the dataflow engine to commit several times. Yes that is true, but at the same time it will release the pressure on the transaction log and tempdb to grow tremendously specifically during high volume data transfers.

 

Q. Can you explain the DFT properties “DefaultBufferMaxRows” and “DefaultBufferMaxSize”?

Ans:

The data flow task in SSIS (SQL Server Integration Services) sends data in series of buffers. How much data does one buffer hold? This is bounded by DefaultBufferMaxRows and DefaultBufferMaxSize, two Data Flow properties. They have default values of 10,000 and 10,485,760 (10 MB), respectively. That means, one buffer will contain either 10,000 rows or 10 MB of data, whichever is less.

You can adjust these two properties based on your scenario. Setting them to a higher value can boost performance, but only as long as all buffers fit in memory. In other words, no swapping please!

 

Q. How can we connect to Oracle, DB2 and MySQL from SSIS?

Ans:

Oracle:

  • Native OLE DB\Microsoft OLE DB Provider for Oracle
  • Native .Net Providers, or
  • .Net Providers for OLE DB

MySQL:

  • .Net Providers\MySQL Data Provider, or
  • .Net Providers\ODBC

DB2:

  • Native OLE DB\Microsoft OLE DB Provider for DB2
  • Native .Net Providers,
  • .Net Providers\ODBC, or
  • .Net Providers for OLE DB

 

Q. Can't we do a fast load using the "ADO NET Destination"?

Ans:

Yes, there is an option called "Use Bulk insert when possible" that needs to be ticked at the time of mapping.

 

Q. How to check whether SSIS transformations are using memory or spilling to Disk due to huge loads and asynchronous transformations?

Ans:

A great way to check if your packages are staying within memory is to review the SSIS performance counter Buffers spooled, which has an initial value of 0; above 0 is an indication that the engine has started swapping to disk.

 

Q. How to find how much of total memory allocated to SSIS and SQL Server?

Ans:

Below are the performance counters which can help us in finding memory details.

Process / Private Bytes (DTEXEC.exe): The amount of memory currently in use by Integration Services.

Process / Working Set (DTEXEC.exe): The total amount of memory allocated to Integration Services.

SQL Server: Memory Manager / Total Server Memory: The total amount of memory allocated by SQL Server. Because SQL Server has another way to allocate memory using the AWE API, this counter is the best indicator of total memory used by SQL Server.

Memory / Page Reads / sec: Represents the total memory pressure on the system. If this consistently goes above 500, the system is under memory pressure.
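
As a quick T-SQL alternative for the SQL Server side of this, the same memory counters are exposed through a DMV; a minimal sketch (the Memory Manager object name varies with the instance name, hence the LIKE):

SELECT counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE object_name LIKE '%Memory Manager%'
  AND counter_name IN ('Total Server Memory (KB)', 'Target Server Memory (KB)');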

 

Q. Consider this scenario: a package has to be opened and modified using BIDS / SSDT, but the machine where the package is opened does not have permissions to access the databases, so all connection managers and other external references fail during the validation phase, and validating all of these connections takes a lot of time. Do you have any idea how to control this validation phase?

Ans:

Below are the different methods to switch off the package validation.

Work Offline: There is an option called Work Offline. It doesn't try to locate/validate the connections. Once the package is ready, uncheck the Work Offline option from the SSIS menu.

Delay Validation: Set the value to "True" to skip validation while opening the package. It applies only to executables / control flow elements, including the package itself.

ValidateExternalMetadata: Set this property to "False" to disable validation for data flow components.

Q. SSIS performance matters:

For more details on SSIS performance tuning you can check below references.

http://consultingblogs.emc.com/jamiethomson/archive/2007/12/18/SSIS_3A00_-A-performance-tuning-success-story.aspx
http://www.mssqltips.com/sqlservertip/1867/sql-server-integration-services-ssis-performance-best-practices/
http://stackoverflow.com/questions/2678119/is-there-a-reason-why-ssis-significantly-slows-down-after-a-few-minutes
http://stackoverflow.com/questions/1093021/tracking-down-data-load-performance-issues-in-ssis-package

References:

For more MSBI stuff please have a look at below references:

http://blogs.msdn.com/b/business-intelligence

https://sreenivasmsbi.wordpress.com

http://www.msbiguide.com

http://msbiravindranathreddy.blogspot.in

http://sqlschool.com/MSBI-Interview-Questions.html

https://www.katieandemil.com

http://www.venkateswarlu.co.in

http://www.sqlserverquest.com


SSIS Interview Questions and Answers Part 4


SSIS – Part 4

SSIS Interview Questions and Answers for Experienced and Freshers

Here we are publishing a series of posts on SSIS interview questions and answers for experienced professionals and freshers. Below is part 4 of the series.

Q. Difference between Unionall and Merge Join?

Ans:

  • Merge transformation can accept only two inputs whereas Union all can take more than two inputs
  • Data has to be sorted before Merge Transformation whereas Union all doesn’t have any condition like that.

Q. What is difference between Multicast and Conditional Split?

Ans:

The Multicast transformation distributes its input to one or more outputs. This transformation is similar to the Conditional Split transformation. Both transformations direct an input to multiple outputs. The difference between the two is that the Multicast transformation directs every row to every output, and the Conditional Split directs a row to a single output.

Q. What is the difference between DTS and SSIS?

Ans:

Well, both are Microsoft SQL Server products and both are ETL tools, but we can list the differences if asked:

1. DTS stands for Data Transformation Services; SSIS stands for SQL Server Integration Services.
2. DTS uses ActiveX script; SSIS uses .NET scripting languages.
3. DTS has no deployment wizard; SSIS provides a deployment wizard.
4. DTS has a limited set of transformations; SSIS has a huge set of transformations.
5. DTS does not support BI functionality; SSIS completely supports the end-to-end BI process.
6. DTS runs a single task at a time; SSIS can run multiple tasks in parallel.
7. DTS scripts are unmanaged; SSIS is managed by the CLR.
8. DTS packages are developed through Enterprise Manager; SSIS packages are developed through Business Intelligence Development Studio (BIDS, the newer version of the VS IDE).
9. DTS packages can be deployed only on the local server; SSIS packages can be deployed to multiple servers using BIDS.
10. The DTS designer contains a single pane; the SSIS designer contains 4 design panes: a) Control Flow, b) Data Flow, c) Event Handlers and d) Package Explorer.
11. DTS has no event handlers; SSIS has event handlers.
12. DTS has no Solution Explorer; SSIS has a Solution Explorer with packages, connections and Data Source Views (DSV).
13. In DTS, connection and other values are static and not controlled at runtime; in SSIS they can be controlled dynamically using configurations.

Q. What is the difference between Fuzzy Lookup and Fuzzy Grouping?

Ans:

The Fuzzy Grouping task performs the same operations as the Fuzzy Lookup task but instead of evaluating input records against an outside reference table, the input set becomes the reference. Input records are therefore evaluated against other records in the input set and evaluated for similarity and assigned to a group.

Q. What’s the difference between Control Flow and Data Flow?

Ans:

Control Flow:

  • Process oriented
  • Doesn't manage or pass data between components.
  • It functions as a task coordinator.
  • In control flow, tasks require completion (success, failure or completion) before the flow moves on.
  • Synchronous in nature; a task requires completion before moving to the next task. Even if the tasks are not connected with each other, they are still synchronous in nature.
  • Tasks can be executed both in parallel and serially.
  • Three types of control flow elements in SSIS 2005:
  • Containers: Provide structure in the packages
  • Tasks: Provide functionality in the packages
  • Precedence Constraints: Connect containers, executables and tasks into an ordered control flow.
  • It is possible to include nested containers as the SSIS architecture supports nesting of containers. A control flow can include multiple levels of nested containers.

Data Flow

  • Streaming in nature
  • Information oriented
  • Passes data between other components
  • Transformations work together to manage and process data. This means the first set of data from the source may be at the final destination step while, at the same time, another set of data is still flowing. All the transformations do their work at the same time.
  • Three types of Data Flow components:
  • Sources: Extract data from various sources (databases, text files etc.)
  • Transformations: Clean, modify, merge and summarize the data
  • Destinations: Load data into destinations like databases, files or in-memory datasets

Q. What is difference between For Loop and For Each Loop?

Ans:

A for loop will execute the tasks a specified number of times, for example 10 times or 25 times, and the number of times is specified in the definition of the container. You can use a variable to specify what that count is.

A for each loop will execute once for each item in the collection of items that it is looking at. A good example would be if users are putting an Excel file into a directory for import into the DB. You cannot tell ahead of time how many will be in the directory, because a user might be late, or there might be more than one file from a given user. When you define the ForEach container, you would tell it to execute for each *.xls in the directory and it will then loop through, importing each one individually, regardless of how many files are actually there.

Q. What is the difference between “OLEDB command” transformation and “OLEDB” destination in dataflow?

Ans:

The OLE DB Command is a pretty simple transformation that's available within a Data Flow and can run a SQL statement to insert, update, or delete records in a desired table. It's good to keep in mind that this transformation initiates a row-by-row operation, so you may experience some performance limitations when dealing with large amounts of data.
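
As an illustration of the row-by-row behaviour, the statement configured on an OLE DB Command is typically a parameterized DML statement like the sketch below, where each ? is mapped to an input column and the statement fires once per incoming row; the table and column names here are only examples.

UPDATE dbo.Customer
SET    Email        = ?,
       ModifiedDate = GETDATE()
WHERE  CustomerID   = ?;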

The OLEDB destination, on the other hand, can use the Fast Load option and hence perform bulk uploads.

Q. What is the Difference between merge and Merge Join Transformation?

Ans:

  • Merge Transformation:
  • The data from 2 input paths is merged into one
  • Works like UNION ALL
  • Metadata for all columns needs to be the same
  • Use when merging data from 2 data sources
  • Merge Join Transformation:
  • The data from 2 inputs is merged based on a common key
  • Works like a JOIN (LEFT, RIGHT or FULL)
  • Key column metadata needs to be the same
  • Use when data from 2 tables having a foreign key relationship needs to be presented based on the common key

Q. What is the difference between “ActiveX Script” and “Script Task”?

Ans:

  • We could say the "Script Task" is the successor to the deprecated "ActiveX Script" task. Both are used to implement extended functionality in SSIS.
  • ActiveX Script supports VBScript and JScript, whereas the Script Task supports VB.Net and C#.Net.
  • The Script Task is preferable as the ActiveX Script task has been removed in SQL Server 2012.
  • The Script Task comes with integrated help, IntelliSense and debugging, and can reference external .NET assemblies.

Q. What is the difference between “Script Task” and “Script Component”?

Ans:

  • Both are used to extend the native functionality of SSIS.
  • The "Script Task" is used to enhance the functionality of the control flow, whereas the "Script Component" is used to enhance the functionality of the data flow.
  • The "Script Task" can handle the execution of parts of the package, whereas the "Script Component" can handle the data flow and transformations by processing data row by row.

Q. What is the difference between “Execute SQL Task” and “Execute T-SQL statement” Task?

Ans:

  • The Execute T-SQL Statement task takes less memory, parse time, and CPU time than the Execute SQL task, but is not as flexible.
  • If you need to run parameterized queries, save the query results to variables, or use property expressions, you should use the Execute SQL task instead of the Execute T-SQL Statement task
  • Execute T-SQL Statement task supports only the Transact-SQL version of the SQL language
  • Execute SQL task supports many connection types but the Execute T-SQL Statement task supports only ADO.NET

Q. What is the difference between “Data Conversion” and “Derived Column” transformations?

Ans:

The Data Conversion transformation is used to convert the datatype of a column. The same operation can be done using the Derived Column transformation with a typecast, but Derived Column can also add / create a new column by manipulating existing columns based on expressions.

We choose Data Conversion when the requirement is only to change the datatype. In other words, Data Conversion was introduced just for developer convenience as it is a direct method, whereas in Derived Column we have to use an expression to change the datatype of a column.

From 2008 onwards, in the Derived Column transformation the datatype and length information is read-only; when we create a new column or derive one from an existing column, the data type is assigned based on the expression outcome and is a read-only property.

To change the datatype we have to use the Data Conversion transformation.
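
When the source is relational, the same datatype change can often be pushed into the source query instead of using either transformation (in line with the earlier performance tip about doing set-based work in T-SQL); a minimal sketch with purely illustrative table and column names:

SELECT CAST(OrderAmount AS DECIMAL(10, 2))   AS OrderAmount,
       CONVERT(NVARCHAR(50), OrderDate, 120) AS OrderDateText
FROM   dbo.Orders;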

 

Q. What is the difference between “Copy Column” and “Derived Column”?

Ans:

Both transformations can add new columns.

Copy Column can add new columns only by copying existing columns, but Derived Column can add new columns without any help from existing columns.

Derived Column can apply different transformations and data types to the new columns whereas Copy Column cannot.

Derived Column supports an error output whereas Copy Column does not.

Q. What is the difference between UNIONALL and MERGE transformations?

Ans:

The Merge transformation combines two sorted datasets into a single dataset. The rows from each dataset are inserted into the output based on values in their key columns.

The Merge transformation is similar to the Union All transformations. Use the Union All transformation instead of the Merge transformation in the following situations:

  • The transformation inputs are not sorted.
  • The combined output does not need to be sorted.
  • The transformation has more than two inputs.

 

Q. What is the difference between for loop and for each loop container?

Ans:

The "For Loop Container" executes a specified number of times, like 10 times or 20 times, until the specified condition is met.

The “Foreach Loop Container” runs over an iterator. This iterator can be files from a folder, records from ADO, data from a variable etc.

Q. How to pass property value at Run time? How do you implement Package Configuration?

Ans:

A property value, like the connection string for a Connection Manager, can be passed to the package using package configurations. Package Configurations provide different options like XML file, environment variable, SQL Server table, registry value or parent package variable.

Q. How would you deploy a SSIS Package on production?

Ans:

  • Using the Deployment Manifest:
  • Create a deployment utility by setting its project property to true.
  • It will be created in the bin folder of the solution as soon as the package is built.
  • Copy all the files in the utility and use the manifest file to deploy it on Prod.
  • Using Import/Export and scheduling a job

Q. What are the new features added in SQL Server 2008 SSIS?

Ans:

  • Improved Parallelism of Execution Trees
  • .NET language for Scripting
  • New ADO.NET Source and Destination Component
  • Improved Lookup Transformation
  • New Data Profiling Task
  • New Connections Project Wizard
  • DT_DBTIME2, DT_DBTIMESTAMP2, and DT_DBTIMESTAMPOFFSET data types

Improved Parallelism of Execution Trees: The biggest performance improvement in SSIS 2008 is the incorporation of parallelism in the processing of execution trees. In SSIS 2005, each execution tree used a single thread, whereas in SSIS 2008 the data flow engine was redesigned to utilize multiple threads and take advantage of dynamic scheduling to execute multiple components in parallel, including components within the same execution tree.

.NET language for Scripting: SSIS 2008 incorporates the new Visual Studio Tools for Applications (VSTA) scripting engine. The advantage of VSTA is that it enables users to use any .NET language for scripting.

New ADO.NET Source and Destination Component: SSIS 2008 gets new Source and Destination components for ADO.NET record sets.

Improved Lookup Transformation: In SSIS 2008, the Lookup Transformation has faster cache loading and lookup operations. It has new caching options, including the ability for the reference dataset to use a cache file (.caw) accessed by the Cache Connection Manager. In addition same cache can be shared between multiple Lookup Transformations.

New Data Profiling Task: SSIS 2008 has a new debugging aid, the Data Profiling Task, that can help users analyze the data flowing through the package. The Data Profiling Task can help users discover the cause of errors by giving better visibility into the data.

New Connections Project Wizard: One of the main usability enhancements in SSIS 2008 is the new Connections Project Wizard. The Connections Project Wizard guides the user through the steps required to create sources and destinations.

DT_DBTIME2, DT_DBTIMESTAMP2, and DT_DBTIMESTAMPOFFSET data types – facilitate data type mapping to equivalent T-SQL date/time data types introduced in SQL Server 2008. Their primary purpose is to provide support for more accurate time measurements.

Q. What are Synchronous and Asynchronous transformations in SSIS?

Ans:

Synchronous Transformations:

A synchronous transformation processes incoming rows and passes them on in the data flow one row at a time. Output is synchronous with input; it occurs at the same time. Therefore, to process a given row, the transformation does not need information about other rows in the data set. When a transform can modify the row in place so as to not change the physical layout of the result set, it is said to be a synchronous transformation. The output of a synchronous component uses the same buffer as the input and does not require data to be copied to a new buffer to complete the transformation. Reuse of the input buffer is possible because the output of a synchronous component usually contains the same number of records as the input.

An example of a synchronous transformation is the Data Conversion transformation. For each incoming row, it converts the value in the specified column and sends the row on its way. Each discrete conversion operation is independent of all the other rows in the data set.

Asynchronous Transformations:

The output buffer or output rows are not in sync with the input buffer; output rows use a new buffer. In these situations it’s not possible to reuse the input buffer because an asynchronous component can have more, the same or less output records than input records.

  • The component has to acquire multiple buffers of data before it can perform its processing. An example is the Sort transformation, where the component has to process the complete set of rows in a single operation.
  • The component has to combine rows from multiple inputs. An example is the Merge transformation, where the component has to examine multiple rows from each input and then merge them in sorted order.
  • There is no one-to-one correspondence between input rows and output rows. An example is the Aggregate transformation, where the component has to add a row to the output to hold the computed aggregate values.

Asynchronous components can further be divided into the two types described below:

  • Partially Blocking Transformation – the output set may differ in terms of quantity from the input set. Thus new buffers need to be created to accommodate the newly created set.
  • Blocking Transformation – a transformation that must read and process all input rows before it can pass any buffer down the pipeline. For example, a Sort transformation must see all rows before sorting, and it blocks any data buffers from being passed down the pipeline until the output is generated.

Note:

Synchronous components reuse buffers and therefore are generally faster than asynchronous components

Q. Any Idea About execution tree?

Ans:

At run time, the data flow engine breaks down Data Flow task operations into execution trees. These execution trees specify how buffers and threads are allocated in the package. Each tree creates a new buffer and may execute on a different thread.

Execution trees are enormously valuable in understanding buffer usage. They can be displayed for packages by turning on package logging for the Data Flow task

Q. Where are SSIS package stored in the SQL Server?

Ans:

  • SQL Server 2000: MSDB..sysdtspackages
  • SQL Server 2005: MSDB..sysdtspackages90
  • SQL Server 2008: MSDB..sysssispackages

 

These tables store the actual package content, and the following tables play supporting roles:

  • Sysdtscategories
  • sysdtslog90
  • sysdtspackagefolders90
  • sysdtspackagelog
  • sysdtssteplog
  • sysdtstasklog

2008:

  • sysssispackagefolders
  • sysssislog

Q. How to achieve parallelism in SSIS?

Ans:

Parallelism is achieved using the MaxConcurrentExecutables property of the package. Its default value is -1, which is interpreted as the number of processors + 2.

Q. Differences between dtexec.exe and dtexecui.exe

Ans:

Both dtexec.exe and dtexecui.exe execute SSIS packages in the same manner. The difference is that dtexecui provides a graphical user interface to construct the command line arguments for dtexec. The command string that is generated with dtexecui can be used as command line arguments to dtexec.

Q. Demonstrate or whiteboard how you would suggest using configuration files in packages.  Would you consider it a best practice to create a configuration file for each connection manager or one for the entire package?

Ans:

There should be a single configuration file for each connection manager in your packages that stores their connection string information.  So if you have 6 connection managers then you have 6 config files.  You can use the same config file across all your packages that use the same connections.

If you have a single config file that stores all your connection managers, then all your packages must contain the connection managers that are stored in that config file. This means you may have to put connection managers in your package that you don't even need.

Q. Demonstrate or whiteboard using a loop in a package so each file in a directory with the .txt extension is loaded into a table.  Before demonstrating this tell which task/container accomplishes this and which enumerator will be used. 

Ans:

This would require a Foreach Loop using the Foreach File Enumerator.  Inside the Foreach Loop Editor you need to set a variable to store the directory of the files that will be looped through.  Next select the connection manager used to load the files and add an expression to the connection string property that uses the variable created in the Foreach Loop.

Q. What techniques would you consider to add notification to your packages?  You’re required to send emails to essential staff members immediately after a package fails.

Ans:

This could either be set in the SQL Agent when the package runs or actually inside the package you could add a Send Mail Task in the Event Handlers to notify when a package fails.

There are also third party tools that can accomplish this for you (Pragmatic Works BI xPress).

Q. Have you used SSIS Framework?

Ans:

This is a common term in the SSIS world which just means that you have templates set up to perform routine tasks like logging, error handling etc. A 'yes' answer would usually indicate an experienced person; a 'no' answer is still fine if your project is not very mission critical.

Q. How many difference source and destinations have you used?

Ans:

It is very common to get all kinds of sources, so the more sources the person has worked with, the better. Common ones are SQL Server, CSV/TXT, flat files, Excel, Access, Oracle and MySQL, but also Salesforce and web data scraping.

Q. What configuration options have you used?

Ans:

This is an important one. Configuration should always be dynamic and is usually done using XML and/or environment variables and a SQL Server table with all configurations.

Q. How do you apply business rules in SSIS (transformations, specific calculations, but also cleansing)?

Ans:

Some people use SSIS only to extract data and then go with stored procedures only; they are usually missing the point of the power of SSIS, which allows creating "a flow" where each step applies certain rules. This greatly simplifies the ETL process.

Q. Give example of handling data quality issues?

Ans:

Data quality is almost always a problem and SSIS handles it very well. Examples include importing customers from different sources where customer names can be duplicated. For instance, you can have as a company name: SQL Server Business Intelligence, but also SQL Server BI, SQL Server BI LTD, SQL Server BI Limited or inteligence (with one l). There are different ways to handle it. The robust but time-consuming way is to create a table with all possible scenarios and update it after each load. You can also use Fuzzy Grouping, which is usually easy to implement and will usually make very good decisions, but it is not 100% accurate, so this approach has to be justified.

Other typical quality issues are nulls (missing values), outliers (dates like 2999, or typos like 50000 instead of 5000, which is especially important if someone is adjusting the value to get a bigger bonus) and incorrect addresses. These are either corrected during ETL, ignored, redirected for further manual updates, or they fail the package, which for big processes is usually not practised.

Q. When to use Stored Procedures?

Ans:

This one is very important but also tricky. All SSIS developers have a SQL Server background, and that is sometimes not very good if they take a SQL approach rather than an SSIS approach.

Let's start with when you typically use SPs: for preparing tables (truncate), audit tasks (usually part of an SSIS framework), getting configuration values for loops and a few other general tasks.
During the ETL extract you usually write simple SQL because it comes from other sources, and over-complicating it is usually not a good choice (make it dynamic), because any change usually affects the package, which has to be updated as well.

During the transformation phase (business rules, cleansing, core work) you should use transformation tasks, not stored procedures! There are loads of tasks that make the package much easier to develop, but another very important reason is readability, which matters a lot for other people who need to change the package, and it obviously reduces the risk of making errors. Performance is usually very good with SSIS as it is a memory/flow based approach. So when should you use stored procedures for transformations? When you don't have strong SSIS developers or you have performance reasons to do it. In some cases SPs can be much faster (usually this only applies to very, very large datasets). Most important is to have reasons for which approach is better for the situation.

Q. What is your approach for ETL with data warehouses (how many packages do you develop during a typical load, etc.)?

Ans:

This is a rather generic question. A typical approach (for me) when building ETL is to have one package to extract data per source, with extract-specific transformations (lookups, business rules, cleansing), which loads data into a staging table. Then a package does a simple merge from staging to the data warehouse (stored procedure), or a package takes the data from staging and performs extra work before loading it to the data warehouse. I prefer the first one, and due to this approach I occasionally consider having an extract phase (as well as a stage phase), which gives me more flexibility with transformations (per source) and makes it simpler to follow (not everything in one go). So, to summarize, you usually have one package per source and one package per data warehouse destination table. Other approaches may be valid as well, so ask for the reasons.

Q. What is XMLify component?

Ans:

It is a free 3rd-party component, used rather frequently, that outputs errors into an XML field, which saves development time.

Q. What command line tools do you use with SSIS?

Ans:

dtutil (deployment), dtexec (execution), dtexecui (generation of execution code)

Q. What is data cleansing?

Ans:

Used mainly in databases, the term refers to identifying incomplete, incorrect, inaccurate, irrelevant, etc. parts of the data and then replacing, modifying, or deleting this dirty data.

Q. Any Idea what is ETI?

Ans:

Yes! ETI (Error Tolerant Index) is a technique used in Fuzzy Lookup / Fuzzy Grouping for data cleansing operations. The ETI is a decomposition of the field values contained within a reference table into smaller tokens; it is essentially a match index.

For example, instead of searching for a street address that contains the value “112 Sunny Vail Ln.”, smaller components of the reference value might be used, such as “sunn”, “nyva”, and “112”.

These individual words are called tokens; the tokens in the index are separated using special delimiter characters and searched against the reference table.

Q. What is Fuzzy Lookup? Can you demonstrate it?

Ans:

The Fuzzy Lookup transformation is a data cleansing task that helps match incoming data against a reference table holding the actual values. This transformation tries to find an exact or similar value as a result. The result data set also depends on the fuzzy matching configuration of the Fuzzy Lookup transformation. The Fuzzy Lookup task is most helpful when you have typo issues in the source data.

The Fuzzy Lookup transformation creates temporary objects, such as tables and indexes, in the SQL Server tempdb. So, make sure that the SSIS user account has sufficient access to the database engine to create and maintain these temporary objects. The Fuzzy Lookup transformation has 3 main settings.

  • Maximum number of matches to return to the output – it starts at 1, which is the recommended value.
  • Token delimiters – it has a set of predefined delimiters and we can also add our own.
  • Similarity score – the fuzzy algorithm input for matching the input row against the reference row. This value is between 0 and 1; the higher the value, the more accurate the result. 0.60 is usually a good value for the similarity score.

Q. What shape would you use to concatenate two input fields into a single output field?

Ans:

The Derived Column transformation – the two input columns can be concatenated into a new output column using an expression.

Q. What is the Multicast Shape used for?

Ans:

The Multicast transformation distributes its input to one or more outputs. This transformation is similar to the Conditional Split transformation. Both transformations direct an input to multiple outputs. The difference between the two is that the Multicast transformation directs every row to every output, and the Conditional Split directs a row to a single output.

Q. What types of things can I pass between packages in SSIS?

Ans:

We can primarily pass variables between packages. A variable can be of any available type, so if you were to create an object variable, although memory consuming, we could potentially pass a table that is held in memory. Granted, in SQL Server 2012 (Denali) this is much, much easier now with parameters. Actually, this was almost a relief in a way: configuring packages to consume parent variables was a time consuming and, in some cases, confusing exercise when many variables were in the process.

Q. How do you accomplish incremental loads? (Load the destination table with new records and update the existing records from the source, if any updated records are available.)

Ans:

There are a few methods available:

  • Use a Lookup transformation, where you compare source and destination data based on some ID/code and get the new and updated records, and then use a Conditional Split to separate the new and updated rows before loading the table. However, I don't recommend this approach, especially when the destination table is very large and the volume of delta is very high.
  • Use an Execute SQL Task with a staging table (see the MERGE sketch after this list):
  • Find the maximum ID and last ModifiedDate from the destination and store them in package variables (Control Flow).
  • Pull the new and updated records from the source and load them into a staging table (a data-load table created in the destination database) using the above variables (Data Flow).
  • Insert and update the records using an Execute SQL Task (Control Flow).
  • Use the CDC (Change Data Capture) feature from SQL Server 2008:
  • Use a Conditional Split to split the data into inserts, updates and deletes.
  • For inserts, redirect to an OLEDB destination.
  • For updates and deletes, redirect using an OLEDB Command transformation.
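
For the Execute SQL Task + staging table approach, the insert/update step is often a single MERGE statement along these lines; a rough sketch only, the table and column names are illustrative, and MERGE requires SQL Server 2008 or later.

MERGE dbo.DimCustomer AS tgt                -- destination table
USING dbo.Stage_Customer AS src             -- staging table loaded by the data flow
      ON tgt.CustomerID = src.CustomerID
WHEN MATCHED AND src.ModifiedDate > tgt.ModifiedDate THEN
    UPDATE SET tgt.CustomerName = src.CustomerName,
               tgt.ModifiedDate = src.ModifiedDate
WHEN NOT MATCHED BY TARGET THEN
    INSERT (CustomerID, CustomerName, ModifiedDate)
    VALUES (src.CustomerID, src.CustomerName, src.ModifiedDate);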

Q. How can you enable the CDC for a table?

Ans:

To enable CDC on a table, the feature should first be enabled on the corresponding database. Both can be done using the procedures below.

exec sys.sp_cdc_enable_db

exec sys.sp_cdc_enable_table
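
A minimal sketch of enabling and verifying CDC follows; the database, schema and table names are illustrative, and sp_cdc_enable_table accepts several more optional parameters than shown here.

USE SourceDB;   -- illustrative database
GO
EXEC sys.sp_cdc_enable_db;
GO
EXEC sys.sp_cdc_enable_table
     @source_schema = N'dbo',
     @source_name   = N'Employee',
     @role_name     = NULL;      -- no gating role
GO
-- Verify that CDC is on at the database and table level
SELECT name, is_cdc_enabled    FROM sys.databases WHERE name = DB_NAME();
SELECT name, is_tracked_by_cdc FROM sys.tables    WHERE name = N'Employee';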

Q. How can you debug Dataflow?

Ans:

Microsoft Integration Services and the SSIS Designer include features and tools that you can use to troubleshoot the data flows in an Integration Services package.

  • SSIS Designer provides data viewers.
  • SSIS Designer and Integration Services transformations provide row counts.
  • SSIS Designer provides progress reporting at run time.
  • Redirect to specified points using error output

Q. How to debug control flow?

Ans:

  • Integration Services supports breakpoints on containers and tasks.
  • SSIS Designer provides progress reporting at run time.
  • Business Intelligence Development Studio provides debug windows.

Q. What can you tell me about Ralph Kimball?

Ans:

Ralph Kimball is an author on the topic of data warehousing and BI. He has been regarded as one of the original architects of data warehousing. Kimball has always had the firm belief that data warehouses should be fast and understandable. Oh, and he developed this whole methodology of dimensional modeling. There is that. (It's also probably a good idea to know the basic idea and structure of dimensional modeling.)

Q. Are you familiar with Package Configurations?

Ans:
Yes. Recently I was working on a project where we used the SQL Server table package configuration to store values for the package parameters. That allowed me to build a GUI for the users to update the package variables each month with new values.

Q. Have you ever used the XML package configuration?

Ans:

Yes. In fact, that is the method we use for storing the connection string used by the sql server table package configuration for the project I just mentioned. We have a dev/production environment, so using an xml file with the connection string (and pointing to that XML file from an environment variable) makes it easy to switch between the two servers.

References:

For more MSBI stuff please have a look at below references:

http://blogs.msdn.com/b/business-intelligence

https://sreenivasmsbi.wordpress.com

http://www.msbiguide.com

http://msbiravindranathreddy.blogspot.in

http://sqlschool.com/MSBI-Interview-Questions.html

https://www.katieandemil.com

http://www.venkateswarlu.co.in

http://www.sqlserverquest.com


SSIS Interview Questions and Answers Part 5


SSIS – Part 5

SSIS Interview Questions and Answers for Experienced and Freshers

SSIS Interview Questions and Answers Part 5

Here we are publishing a series of posts on SSIS interview questions and answers for experienced professionals and freshers. Below is part 5 of the series.

Q. What are the SSIS package protection levels?

Ans:

There are 6 different types of protection levels.

  • Do not save sensitive – (When exporting using DTUTIL specify for protection- 0)
  • Encrypt sensitive with user key – 1
  • Encrypt sensitive with password – 2
  • Encrypt all with password -3
  • Encrypt all with user key – 4
  • Rely on server storage

Do not save sensitive: makes the sensitive data unavailable to other users. If a different user opens the package, the sensitive information is replaced with blanks and the user must provide the sensitive information.

Encrypt sensitive with user key: Uses a key that is based on the current user profile to encrypt only the values of sensitive properties in the package. Only the same user who uses the same profile can load the package. If a different user opens the package, the sensitive information is replaced with blanks and the current user must provide new values for the sensitive data. If the user attempts to execute the package, package execution fails.

Encrypt sensitive with password: Uses a password to encrypt only the values of sensitive properties in the package. To open the package in SSIS Designer, the user must provide the package password. If the password is not provided, the package opens without the sensitive data and the current user must provide new values for sensitive data. If the user tries to execute the package without providing the password, package execution fails.

Encrypt all with password: Uses a password to encrypt the whole package. The user must provide the package password. Without the password the user cannot access or run the package.

Encrypt all with user key: Uses a key that is based on the current user profile to encrypt the whole package. Only the user who created or exported the package can open the package in SSIS Designer or run the package by using the dtexec command prompt utility

Rely on server storage: Protects the whole package using SQL Server database roles. This option is supported only when a package is saved to the SQL Server msdb database.

When it is time to deploy the packages, you have to change the protection level to one that does not depend on the developer’s user key. Therefore you typically have to select EncryptSensitiveWithPassword, or EncryptAllWithPassword. Encrypt the packages by assigning a temporary strong password that is also known to the operations team in the production environment.

Q. What are the phases of execution of a package when running with DTEXEC?

Ans:

  • Command sourcing phase: The command prompt reads the list of options and arguments
  • Package load phase: The package specified by the /SQL, /FILE, or /DTS option is loaded.
  • Configuration phase: Options are processed in this order:
    • Options that set package flags, variables, and properties.
    • Options that verify the package version and build.
    • Options that configure the run-time behavior of the utility, such as reporting.
  • Validation and execution phase: The package is run, or validated without running if the /VALIDATE option is specified.

Q. What are the exit codes from DTEXEC?

Ans:

  • 0: The package executed successfully.
  • 1: The package failed.
  • 3: The package was canceled by the user.
  • 4: The utility was unable to locate the requested package.
  • 5: The utility was unable to load the requested package.
  • 6: The utility encountered an internal error of syntactic or semantic errors in the command line.

Q. Can you demonstrate the DTEXEC?

Ans:

Execute a package located on file system:

DECLARE @returncode int

EXEC @returncode = xp_cmdshell 'dtexec /f "C:\UpsertData.dtsx"'

To execute an SSIS package saved to SQL Server using Windows Authentication:

dtexec /sq pkgOne /ser productionServer

 

To execute an SSIS package saved to the File System folder in the SSIS Package Store:

dtexec /dts "\File System\MyPackage"

To validate a package that uses Windows Authentication and is saved in SQL Server without executing the package:

dtexec /sq pkgOne /ser productionServer /va

To execute an SSIS package that is saved in the file system, and specify logging options:

dtexec /f "c:\pkgOne.dtsx" /l "DTS.LogProviderTextFile;c:\log.txt"

To execute an SSIS package that is saved in the file system and configured externally:

dtexec /f "c:\pkgOne.dtsx" /conf "c:\pkgOneConfig.cfg"

Q. Process to upgrade DTS TO SSIS?

Ans:

1. Choosing a DTS to SSIS migration strategy (reactive/proactive)

2. Capturing SSUA DTS package alerts (all categories of notifications)

3. Building a dev/test environment

4. Migrating the packages using the selected DTS to SSIS migration strategy

5. Testing/correcting the resulting SSIS 2008 packages in the dev/test environment

6. Deploying and reconfirming the resulting SSIS 2008 packages work in production as expected

7. Removing the old DTS packages from production, with optional SQL Server Agent jobs

Q. Does all components are converted automatically from DTS TO SSIS?

Ans:

Not all components can be upgraded. ActiveX transforms, for instance, present a challenge for the upgrade wizard, and may not be able to be migrated.

  • Delete and recreate ODBC connections after package migration
  • Reconfigure transaction settings after package migration
  • Replace functionality of ActiveX script attached to package steps after package migration. Use Script task
  • After migration, convert the Execute DTS 2000 Task that encapsulates the Analysis Services task to an Integration Services Analysis Services Processing task.
  • After migration, re-create the functionality of the Dynamic Properties task by using Integration Services features such as variables, property expressions, and package configurations.

Q. Why is the need for data conversion transformations?

Ans:

This transformation converts the datatype of input columns to different datatype and then route the data to output columns. This transformation can be used to:

  • Change the datatype
  • If datatype is string then for setting the column length
  • If datatype is numeric then for setting decimal precision.

This Data Conversion transformation is very useful when you want to merge data from different sources into one. This transformation can remove abnormality from the data. Example → the company's offices are located in different parts of the world. Each office has a separate attendance tracking system in place. Some offices store data in an Access database, some in Oracle and some in SQL Server. Now you want to take data from all the offices and merge it into one system. Since the datatypes in all these databases vary, it would be difficult to perform the merge directly. Using this transformation, we can normalize them into a single datatype and perform the merge.

Q. Explain why variables called the most powerful component of SSIS.

Ans:

Variables allow us to dynamically control the package at runtime. Example: you have some custom code or script that determines the query parameter's value; we cannot have a fixed value for the query parameter. In such scenarios, we can use a variable and map the variable to the query parameter. We can use variables for things like:

  • Updating the properties at runtime,
  • Populating the query parameter value at runtime,
  • Used in script task,
  • Error handling logic
  • With various looping logic.

 

Q. What are the for each loop enumerators available in SSIS?

Ans:

Below is the list of the various types of enumerators provided by the SSIS Foreach Loop Container:

  • Foreach File Enumerator: It enumerates files in a folder. The plus point here is that it can also traverse subfolders.
  • Foreach Item Enumerator: It enumerates items in a collection, like enumerating rows and columns in an Excel sheet.
  • Foreach ADO Enumerator: Useful for enumerating rows in tables.
  • Foreach ADO.NET Schema Rowset Enumerator: To enumerate schema information about a data source, for example to get the list of tables in a database.
  • Foreach From Variable Enumerator: Used to enumerate the object contained in a variable (if the object is enumerable).
  • Foreach NodeList Enumerator: Used to enumerate the result set of an XML Path Language (XPath) expression.
  • Foreach SMO Enumerator: It enumerates SQL Server Management Objects (SMO) objects.

Q. We have a situation that needs to be push data into DB2 database from SQL Server. What connection manager you use to connect to DB2 running on AS/400?

Ans:

Primary method to connect to DB2 is “Microsoft OLE DB Provider for DB2”. There is one more method using ADO.NET data providers \ ODBC Data provider.

OLEDB is always faster than ODBC, but there might be issues with OLEDB to DB2 while dealing with parameters in queries.

Q. What is “ActiveX Script” task? Does it available in SQL Server 2012?

Ans:

  • The ActiveX Script task provides a way to continue using custom code that was developed with ActiveX script. The ActiveX Script task supports writing scripts using VBScript, JScript and other scripting languages installed on the local computer.
  • This task exists just to support backward compatibility with the deprecated DTS packages.
  • In SQL Server 2012 the ActiveX Script task has to be upgraded to the Script Task.
  • The Script Task supports VB.Net and C#.Net.

Q. What is the use of either “Script task” or “ActiveX Script”?

Ans:

  • Implementing customized business logic in SSIS packages. For example, using the Script Task we can access table values, apply logic and add those values to SSIS variables.
  • Performing complex computations, for example modifying date formats using date functions.
  • Accessing data from sources for which there is no support from built-in connections; for example, a script can use Active Directory Service Interfaces (ADSI) to access user names from AD.
  • Creating package-specific performance counters; for example, a script can create a performance counter that is updated when a complex or poorly performing task executes.

Q. What is “Script Component”?

Ans:

Script component is like a “Script Task” but it is designed for “Data Flow”. It can be useful in below scenarios.

  • Apply multiple transformations to data instead of using multiple transformations in the data flow. For example, a script can add the values in two columns and then calculate the average of the sum.
  • Use custom formulas and functions for example, validate passport numbers.
  • Validate incoming column data and skip unmatched records
  • Script Component support for inputs and outputs
  • If used as a source: Supports multiple outputs
  • If used as a transformation: Supports one input and multiple outputs
  • If used as a destination: Supports one input

Q. Can we call a web service from SSIS? If Yes how?

Ans:

Yes! We can call a web service using “Web Service” task. We have to provide HTTP connection manager and WebServiceDescriptionLanguage (WSDL) file. The output can be stored either in a variable or on file system (In XML, TXT etc)

Q. What is the use of derived column in SSIS?

Ans:

The Derived Column transformation can process existing column data and apply some functionality to it.

For example, to change the case of a string column we can replace the actual column by applying the expression UPPER(COLUMN) or LOWER(COLUMN).

It can also be useful when we need to calculate summed values, for example adding a new column "Gross Value" by applying the expression (Column1 + Column2).

It can apply arithmetic operations like "Round" and calculate date and time differences etc.

In addition to this, it can deal with "NULL" values, for example when NULL values need to be populated with blanks.

If we can't perform a required operation with the Derived Column transformation, we have another option called "Script Transform".
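
Where the source is a database, the same manipulations can alternatively be pushed into the source query (as the performance tips earlier suggest); a sketch with purely illustrative column and table names:

SELECT UPPER(LastName)                    AS LastNameUpper,
       SalesAmount + TaxAmount            AS GrossValue,
       ROUND(Discount, 2)                 AS DiscountRounded,
       DATEDIFF(DAY, OrderDate, ShipDate) AS DaysToShip,
       ISNULL(MiddleName, '')             AS MiddleName
FROM   dbo.SourceOrders;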

Q. Which config file we should use for storing SSIS package configurations?

Ans:

There are different ways to do this, but it all depends on the requirement and environment. I haven't seen a problem that can be resolved with one config type and can't be resolved with another config option.

  • If you are using a file system deployment, it probably makes more sense to use XML configuration files.
  • If you are using a SQL Server deployment, it probably makes more sense to use SQL Server configurations.
  • If your ETL solution is managed by the application owner or server administrator, it probably makes more sense to use XML configuration files.
  • If your ETL solution is managed by the database administrator, it probably makes more sense to use SQL Server configurations.
  • If project team members and/or administrators have past experience and success with a given configuration type, it probably makes sense to use that type unless there is some compelling project-specific reason to do otherwise.

Q. What are the possible issues in handling SSIS packages?

Ans:

Mostly data conversion errors due to datatype mismatches – truncation of strings, losing some decimal points etc.

Expression evaluation errors at run time: unable to evaluate expressions at run time due to wrong comparisons etc.

Package validation errors: when we configure a variable to locate a file on the file system which is actually created at run time, the package fails during debugging because initially the file is not present at the specified path. To avoid these issues set the property "DelayValidation" to "True".

Package configuration issues: always make sure that we are using the right package in the right environment. It always depends on the package configuration. A package will be used on dev, test and prod environments with different config values. If wrong config values are passed to an SSIS package, it may lead to loss of data or data corruption.

To avoid these issues a few things have to be considered:

1. Use a centralized database to store all SSIS package config values

2. Use different accounts (either domain or SQL) for different environments

3. Tighten security by assigning only the required permissions to the SSIS user accounts.

So even if a dev package runs with the prod credentials, it fails to connect to the instance.

When a Variable is using another Variable:

We usually give a variable as the source for an "Execute SQL Task", but the variable's value is set by evaluating an expression which uses another variable.

For example, we have created a variable called "FileName" and it is being used in an Execute SQL Task, but the file name should be evaluated as "B2B_Reports_" + @[User::BatchID]. Here BatchID is another variable.

By default SSIS fails to evaluate this expression; to fix this we have to change the variable property "EvaluateAsExpression" to "True".

Running SSIS packages on 64 bit and dealing with Excel files:

Typically, Excel files do not come with 64-bit drivers (Excel 2010 has a 64-bit driver, but not earlier versions).

So dealing with Excel files from SSIS running on 64-bit is a bit of a difficult task.

There is an option in SSIS which allows an SSIS package to run in 32-bit mode on a 64-bit environment.

In the project properties, on the debugging page, there is an option called "Run64BitRuntime". By default it is set to true for SSIS running on 64-bit. We have to set this to false to handle 32-bit-only activities. Below are more reasons to use this option.

For SSIS running on 64-bit:

We can't call an Execute DTS 2000 Package task as it doesn't support 64-bit.

It may raise errors while using the Script Task or Script Component, which might be using .NET assemblies or COM objects for which there is no 64-bit support available or for which the drivers are not installed.

Case sensitivity issues:

One of the popular data transformations is the Lookup. It compares column values from two tables. Unlike T-SQL, SSIS performs a case-sensitive comparison, so we have to be careful when handling these transformations and tasks.

Q. What are event handlers in SSIS?

Ans:

Event handlers allow MSBI developers to monitor and audit SSIS packages. Event handlers can be associated with SSIS components and executables. Any component that can be added to the control flow, plus the package itself, is called an "executable". A child component is considered a child executable and the parent is known as the parent executable.

Q. What are the different types of event handlers?

Ans:

  • OnPreValidate
  • OnPostValidate
  • OnProgress
  • OnPreExecute
  • OnPostExecute
  • OnError
  • OnWarning
  • OnInformation
  • OnQueryCancel
  • OnTaskFailed
  • OnVariableValueChanged
  • OnExecStatusChanged

Q. What are the general cases that event handlers can be helpful in?

Ans:

Cleanup stage tables after a bulk load completed

Send an email when a specific component failed

Load lookup tables after a task completed

Retrieve system / resource information before starting a task.

Q. How to implement event handlers in SSIS?

Ans:

Create log tables (As per the requirement) on centralized logging database

On BIDS / SSDT add event handler

Add control flow elements. Most of the times “Execute SQL Task”

Store the required information (RowCounts – Messages – Time durations – System / resource information).

We can use expressions and variables to capture this information.

Q. What is container hierarchy in attaching event handlers?

Ans:

Container hierarchy plays a vital role in implementing event handlers in SSIS. If an event handler is attached to a package (a package itself is a container), then the event handler applies to all associated components of that package; we need not attach the event handler to each of them separately. But if we want to switch off the event handlers for any specific component in a container, simply change that component's property "DisableEventHandlers" to "TRUE".

Q. How to implement SCD (Slowly changing dimension) type 2 using SSIS?

Ans:

Type 2 means we have to keep historical data.

Assume that we have a table called "Employee_Stage" on the stage server and "Employee_Archived" on the archive server.

Now we have to read data from stage and insert it into the archive instance.

We have to implement SCD type 2, which means we have to keep the history of changed records. For example, if the column "Designation" has changed for an employee, then a new row has to be inserted into the archive.

While inserting, there are three columns that help us identify the old and current records for a given employee.

StartDate – start date for the given designation

EndDate – end date for the given designation

IsCurrent – bit column: 1 – current; 0 – history
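
Before walking through the wizard, it may help to see what the generated OLEDB Command and OLEDB destination effectively do, expressed as T-SQL against the stage and archive tables named above (a rough sketch, assuming EmpID and Designation columns):

-- Expire the current archive row when the designation has changed
UPDATE a
SET    a.EndDate   = GETDATE(),
       a.IsCurrent = 0
FROM   dbo.Employee_Archived AS a
JOIN   dbo.Employee_Stage    AS s ON s.EmpID = a.EmpID
WHERE  a.IsCurrent = 1
  AND  s.Designation <> a.Designation;

-- Insert a new current row for changed and brand-new employees
INSERT INTO dbo.Employee_Archived (EmpID, Designation, StartDate, EndDate, IsCurrent)
SELECT s.EmpID, s.Designation, GETDATE(), NULL, 1
FROM   dbo.Employee_Stage AS s
WHERE  NOT EXISTS (SELECT 1
                   FROM dbo.Employee_Archived AS a
                   WHERE a.EmpID = s.EmpID AND a.IsCurrent = 1);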

Let’s start designing SSIS package:

As usual create a SSIS project

  1. Create two connection managers. 1. Staging, 2 – Archive
  2. Drag and drop a dataflow task at control flow
  3. Open data flow task and add a OLEDB source and map it with stage connection manager
  4. Drag and drop “SCD transformation” to data flow task.
  5. Double click and configure SCD as below.
  6. Map “Archive” connection manager and choose the “Business Key” in the archive table.

Business key is nothing but a column which can be used to compare / lookup with stage table. Here I have given “EmpID” as a BusinessKey.

We have to mention “Change Type” for SCD columns.

There are three change types available as below

‘Fixed attribute’, ‘Changing attribute’ and ‘Historical Attribute’.

I do choose “Historical Attribute” for the column Designation. Based on this a new record will be inserted into archive if the column value is changed in stage table.

  1. Now set the historical attribute options. There are two options available to identify current and historical records: based on a single column, or based on two date values.
  2. Here I choose the first option and give "1" for current and "0" for expiration.

Don’t select “Inferred member support” as this is not useful in this scenario.

  1. Click Finish; the wizard automatically creates some transformations and a destination, which include a Derived Column to add flag values for the "IsCurrent" column, an OLEDB Command to update the "IsCurrent" column, and an OLEDB destination to insert new records into the archive table.

Note 1: To implement SCD type 1 (meaning overwrite the values), follow the same steps as above, but instead of choosing "Historical Attribute" choose "Changing Attribute".

Note 2: "Fixed Attribute" is useful in situations where a domain rule must be enforced, for example the column "NationalNumber" has to stay fixed. If the column is forced to be overwritten, an error is raised or the row is redirected, but the value is never allowed to change.

Q. Can we use a temp table is data flow that is created in control flow?

Ans:

Yes, we can.

Assume we are executing a stored procedure from an "Execute SQL Task". That stored procedure creates a global temp table in the database, and the same temp table has to be used in the data flow; while creating the OLEDB source, we can give a query like "SELECT * FROM ##TempTable".

To use a temp table in SSIS from the same connection, some properties have to be set as below.

In the properties of the OLEDB connection manager, change the value of the property "RetainSameConnection" to "TRUE".

For the OLEDB source in the data flow, make sure the property "ValidateExternalMetadata" is set to "False", as otherwise the package fails to locate the temp table during the validation phase.

http://stackoverflow.com/questions/5631010/how-to-create-a-temporary-table-in-ssis-control-flow-task-and-then-use-it-in-dat
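
A minimal sketch of the pattern (the procedure and table names are hypothetical): the Execute SQL Task runs a procedure like the one below, and the OLEDB source then issues SELECT * FROM ##TempTable over the same retained connection.

CREATE PROCEDURE dbo.usp_BuildTempResults     -- hypothetical procedure name
AS
BEGIN
    IF OBJECT_ID('tempdb..##TempTable') IS NOT NULL
        DROP TABLE ##TempTable;

    SELECT CustomerID, CustomerName            -- illustrative columns
    INTO   ##TempTable
    FROM   dbo.Customer;
END;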

Q. Have you ever create templates in SSIS?

Ans:

Yes! I have created templates for SSIS new package designs.

In environments where SSIS packages are utilized often, creating templates is very useful and saves development time.

To create an SSIS package template, create an SSIS package with all the required defaults, environment settings, connection managers and essential data flows, and save the package to disk.

Copy the package file (.dtsx file) to the location below.

2012:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\PrivateAssemblies\ProjectItems\DataTransformationProject\DataTransformationItems

The path might be different based on OS / 32-bit / 64-bit / SQL Server version etc.

Once the package is copied, create a new SSIS project, right click on the project name → Add Item → from there you can see the template. Select it and add it to the project.

Q. Which is the best method for storing package configurations?

Ans:

Where package configurations are stored depends on the requirement and the operations team. I mean, it always depends on the type of config values being stored and the team which controls the configuration.

There are two famous methods, XML and SQL Server. I suggest "SQL Server": the first aspect we have to consider is security, and SQL Server is the best place to store package configurations from the security perspective.

Best Approach:

1. Store all package configurations in SQL Server

2. Store the SQL Server connection (where the config table exists) in an XML file

3. Store the XML file location in an environment variable

So that for every deployment, we just need to change the table values and environment variable value.

For example, when we are using the same package for the development, test and stage servers, we do not need different packages every time we execute it; we just need different configurations, by pointing to the proper config table through the XML file and choosing the proper XML file through the environment variable.
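For illustration, the indirect configuration usually ends up looking like this (a sketch; [SSIS Configurations] is the table the configuration wizard typically creates, and the filter, package path and connection string values are assumptions):

-- SQL Server configuration table holding the package configuration values
SELECT ConfigurationFilter, PackagePath, ConfiguredValue, ConfiguredValueType
FROM   dbo.[SSIS Configurations]
WHERE  ConfigurationFilter = 'LoadDW';

-- Per-environment deployment: only the configured value changes
UPDATE dbo.[SSIS Configurations]
SET    ConfiguredValue = 'Data Source=PRODSQL01;Initial Catalog=DW;Integrated Security=SSPI;'
WHERE  ConfigurationFilter = 'LoadDW'
  AND  PackagePath = '\Package.Connections[DW].Properties[ConnectionString]';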

 

Q. Can we validate data in SSIS package?

Ans:

Yes we can validate data in SSIS using Data Flow Transformations.

But I suggest doing validation on the database side. For example, instead of applying validation rules at the package level, use a stored procedure at the source to apply / check all validations and then select only the data which can be directly loaded to the destination.

If the source is not a database but a flat file, get all the data from the flat file, stage it on SQL Server, apply all validations and then load that data to the destination table.

By doing this there might be some overhead on the database, but the operation is faster because the validations are applied to all rows in a bulk, set-based operation, whereas in SSIS the same validation has to be applied row by row. And if any modifications to the validations are required, we can simply modify the stored procedure and need not touch the SSIS package.

The Data Profiling task can also be used to assess data quality.
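A minimal sketch of the staging-plus-validation approach (the table, column and procedure names are hypothetical, and IsValid is assumed to default to 1 in the stage table):

CREATE PROCEDURE dbo.usp_Validate_Customer_Stage
AS
BEGIN
    SET NOCOUNT ON;

    -- Flag invalid rows in one set-based pass instead of validating row by row in the package
    UPDATE s
    SET    s.IsValid = 0,
           s.ErrorReason = CASE
                               WHEN s.CustomerName IS NULL      THEN 'Missing customer name'
                               WHEN s.Email NOT LIKE '%_@_%._%' THEN 'Invalid email'
                               WHEN s.BirthDate > GETDATE()     THEN 'Future birth date'
                               ELSE 'Unknown'
                           END
    FROM   dbo.Customer_Stage AS s
    WHERE  s.CustomerName IS NULL
       OR  s.Email NOT LIKE '%_@_%._%'
       OR  s.BirthDate > GETDATE();

    -- Only clean rows are handed over to the destination load
    SELECT CustomerID, CustomerName, Email, BirthDate
    FROM   dbo.Customer_Stage
    WHERE  IsValid = 1;
END;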

Q. How to store redirects error information in SQL Server?

Ans:

We can use an OLEDB destination pointing to a SQL Server log table, and the error output (red arrow) can be mapped to this destination.

But to get a meaningful error description we have to use a Script Component between them. To capture the exact error message, use the below code in the script component:

public override void Input0_ProcessInputRow(Input0Buffer Row)
{
    // Translate the numeric ErrorCode of the failed row into a readable
    // description and expose it through the ErrorDescription output column.
    Row.ErrorDescription = this.ComponentMetaData.GetErrorDescription(Row.ErrorCode);
}

Q. Give some simple expressions those are used in SSIS?

Ans:

We usually use “Derived Column” to validate data and we use “Data Conversion” to convert datatype of a column.

Remove Leading and Trailing Spaces: LTRIM(RTRIM(<Column Name>)) – the SSIS expression language does not have a single TRIM function.

http://www.sqlservercentral.com/Forums/Topic733263-148-1.aspx

Check NULL existence:

This example returns “Unknown last name” if the value in the LastName column is null, otherwise it returns the value in LastName.

ISNULL(LastName)? “Unknown last name”:LastName

This example always returns TRUE if the DaysToManufacture column is null, regardless of the value of the variable AddDays.

ISNULL(DaysToManufacture + @AddDays)

Q. What is character Map transformation used for?

Ans:

This transformation is used for applying string transformations to column data, such as changing characters from lowercase to uppercase, uppercase to lowercase, half width, full width, byte reversal etc.

For example, when we are using a Lookup on columns from source and destination, we know that the SSIS Lookup is case sensitive, unlike T-SQL. So before comparing the two columns we can design the data flow to pass them through a “Character Map” and convert the data into a common format, either lowercase or uppercase.

Q. What are import and export column transformations?

Ans:

Import Column Transformation – The Import Column transformation reads data from files and adds the data to columns in a data flow. Using this transformation, a package can add text and images stored in separate files to a data flow.

Export Column Transformation – The Export Column transformation reads data in a data flow and inserts the data into a file.

Q. How matching happens inside the lookup transformation?

Ans:

The Lookup transformation tries to perform an equi-join between the transformation input and the reference dataset. By default an unmatched row is considered an error; however, we can configure the Lookup to redirect such rows to the “no match output” (from 2008 and above).

If the reference data set has multiple matches, it returns only the first match. If the reference set is a cache, it raises a warning or error in case of multiple matches.

 

Q. What are all the inputs and outputs of a lookup transformation?

Ans:

Input: Dataset from data source

Match Output: All matched rows

No Match Output: All unmatched rows, when the Lookup is configured to redirect them to the no match output instead of the error output

Error Output: Rows that failed during the lookup, or unmatched rows when they are not redirected to the no match output

 

Q. Have you ever used Dataflow Discoverer (DFLD) in SSIS? If yes, can you describe why and how?

Ans:

Please have a look at below links to get the detailed explanation about DFLD

http://bennyaustin.wordpress.com/2011/02/04/ssis-name-of-errorcolumn/
http://dfld.codeplex.com/

Q. How to transfer logins using SSIS?

Ans:

It can be done using the Transfer Logins task, but there are limitations.

Transferring Windows authentication logins across domains: the logins have to be dropped and recreated.

Transferring SQL logins: the password needs to be reset, as a random password is assigned while moving the login from source to destination.

The best way to move logins is using scripts: logins, users and role-mapping scripts.
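For example, a scripted SQL login that preserves the original SID (and so keeps the database users mapped) looks roughly like this; the login name, SID and password values below are placeholders, not real values:

-- Generated on the source server (for example with the well-known sp_help_revlogin script)
-- and replayed on the destination server
CREATE LOGIN [app_user]
WITH PASSWORD = N'StrongP@ssw0rd!',
     SID = 0x9B1D5C633F9E3A4BB2E5D0F8C1A2B3C4,   -- SID copied from the source server
     DEFAULT_DATABASE = [master],
     CHECK_POLICY = OFF;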

Q. How to troubleshoot connection error regarding

“DTS_E_CANNOTACQUIRECONNECTIONFROMCONNECTIONMANAGER”?

Ans:

1. Incorrect Provider for the connection:

A) Lack of a 64-bit provider: remember, if the package has to run in 32-bit mode on the target server, make sure the execution option “Use 32 bit runtime” is selected while creating the job that executes the SSIS package.

B) Lack of client binary installed: Make sure client binaries installed on target server.

2. Incorrect connection parameters settings:

A) Typo in password

B) Password is not stored in configuration

3. Failed to decrypt the sensitive information: this usually happens when an SSIS package is executed from a SQL Server Agent job, the package is saved with the protection level “EncryptSensitiveWithUserKey” and the SQL Agent service account is different from the package creator.

4. Oracle Data Provider Limitation:

Another common scenario happens when you use the Microsoft OLE DB Provider for Oracle or the Microsoft ODBC Driver for Oracle to connect to an Oracle 9i or later database. The recommendation is to use the Oracle OLE DB Provider for Oracle 9i or later versions.

Q. What are the logs available to check if a SSIS package fails?

Ans:

1. Windows Event Log & Job History: When SSIS package scheduled from a SQL Job

2. Logs from SSIS logging audit: when a log provider is configured for the package

3. Logs from an SSIS event handler: when an event handler is designed to capture the log

4. Logs from the SSIS components: when custom logging is configured using a Script Task

5. Logs from the underlying data sources: check the error log at the data source, for example SQL Server, Oracle etc.


References:

For more MSBI stuff please have a look at below references:

http://blogs.msdn.com/b/business-intelligence

https://sreenivasmsbi.wordpress.com

http://www.msbiguide.com

http://msbiravindranathreddy.blogspot.in

http://sqlschool.com/MSBI-Interview-Questions.html

https://www.katieandemil.com

http://www.venkateswarlu.co.in

http://www.sqlserverquest.com


SSIS Interview Questions and Answers  Part 5

The post SSIS Interview Questions and Answers Part 5 appeared first on udayarumilli.com.

SSIS Interview Questions and Answers Part 6



SSIS – Part 6

SSIS Interview Questions and Answers for Experienced and Fresher’s

SSIS – Links to SSIS questions

SSIS Interview Questions and Answers  Part 6

We usually go through various blogs and community forums as part of analysis and problem solving. Herewith we are posting some informative material from various blogs.

Q. SSIS slowdown issue

Ans:
​​ http://stackoverflow.com/questions/2678119/is-there-a-reason-why-ssis-significantly-slows-down-after-a-few-minutes

Q. Issues with SSIS service in cluster environment.

Ans:

http://www.mytechmantra.com/forums/index.php?topic=22.0

Q. How to handle date columns in flat file while loading?

Ans:

http://social.technet.microsoft.com/wiki/contents/articles/18943.ssis-flat-file-source-datetime-column-format-issue-solution.aspx

​Q. Common requirements and solutions in SSIS?

Ans:
​​ http://ssisjunkie.blogspot.com/2012/09/common-ssis-problems-and-solutions-we.html

Q. How to attach configuration files to SSIS packages?

Ans:
http://www.mssqltips.com/sqlservertip/2450/ssis-package-deployment-model-in-sql-server-2012-part-1-of-2/
http://sqlscape.wordpress.com/2010/06/29/deploying-ssis-packages-with-xml-configurations/

Q. Have you ever tried logging in SSIS? What are the tables in which log information is stored? Can we enable logging for only selected items of an SSIS package? If yes, how?

Ans:

http://stackoverflow.com/questions/15004109/can-you-monitor-the-execution-of-an-ssis-package-in-bids-as-it-runs-on-the-ser

Q. How to load a list of files from a folder to sql server using ssis?

Ans:

​Explain how to load a list of files from a specific folder to sql server

http://simonlv.blogspot.in/2012/06/ssis-step-by-step-003-load-multiple.html

Q. Design a package to load .xlsx files from a folder. File names must include either “Finance_” or “Local_” and load files only from the last week.

​Ans:

http://stackoverflow.com/questions/8360117/loading-last-7-days-files-with-specific-names-in-ssis

http://www.mssqltips.com/sqlservertip/2874/loop-through-flat-files-in-sql-server-integration-services/

http://stackoverflow.com/questions/8831060/import-most-recent-csv-file-to-sql-server-in-ssis

 

Q. How to capture column name and row number when something failed in dataflow?

Ans:

http://informatics.northwestern.edu/blog/edw/2012/01/etl-assistant-getting-error-row-description-and-column-dynamically/

http://blogs.msdn.com/b/helloworld/archive/2008/08/01/how-to-find-out-which-column-caused-ssis-to-fail.aspx

Q. How to schedule a job in SQL agent to execute SSIS package?
Ans:
http://www.codeproject.com/Articles/14401/How-to-Schedule-and-Run-a-SSIS-package-DTS-Job-in
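In short, the job creation boils down to T-SQL like this (a sketch; the job name, package path and dtexec switches are assumptions):

USE msdb;
GO
EXEC dbo.sp_add_job       @job_name = N'Load_DW_Nightly';
EXEC dbo.sp_add_jobstep   @job_name = N'Load_DW_Nightly',
                          @step_name = N'Run SSIS package',
                          @subsystem = N'SSIS',
                          @command   = N'/FILE "D:\Packages\LoadDW.dtsx" /REPORTING E';
EXEC dbo.sp_add_jobserver @job_name = N'Load_DW_Nightly';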

Q. What are the common problems in SSIS?

Ans:
http://www.rodcolledge.com/rod_colledge/2010/02/fun-with-ssis-part-1-troubleshooting.html

Q. Demonstrate SSIS Solutions for requirements?
Ans:
http://ssisjunkie.blogspot.in/2012/09/common-ssis-problems-and-solutions-we.html

Q. How to create a template for SSIS packages
Ans:
http://www.mssqltips.com/sqlservertip/2841/creating-ssis-package-templates-for-reusability/

Q. SQL Server Integration Services (SSIS) Top 10 best practices:
Ans:

http://sqlcat.com/sqlcat/b/top10lists/archive/2008/10/01/top-10-sql-server-integration-services-best-practices.aspx
http://sqlmag.com/sql-server-integration-services/designing-ssis-packages-high-performance

Q. What are the SSIS performance issues and possible solutions?

Ans:
http://consultingblogs.emc.com/jamiethomson/archive/2007/12/18/SSIS_3A00_-A-performance-tuning-success-story.aspx
http://www.mssqltips.com/sqlservertip/1867/sql-server-integration-services-ssis-performance-best-practices/
http://stackoverflow.com/questions/2678119/is-there-a-reason-why-ssis-significantly-slows-down-after-a-few-minutes
http://stackoverflow.com/questions/1093021/tracking-down-data-load-performance-issues-in-ssis-package

Q. How to import multiple text files from a folder?

Ans:

http://codejotter.wordpress.com/2010/04/06/importing-multiple-text-files-using-ssis/
http://simonlv.blogspot.in/2012/06/ssis-step-by-step-003-load-multiple.html

​Q. How to speed up the bulk insert in SSIS?
Ans:
http://henkvandervalk.com/speeding-up-ssis-bulk-inserts-into-sql-server

Q. How to use profiler to capture the trace from SSIS?
Ans:
http://henkvandervalk.com/speeding-up-reading-from-a-sql-ole-db-or-ado-net-data-source

Q. What is the fast parse option?
Ans:
http://msdn.microsoft.com/en-us/library/ms139833.aspx


References:

For more MSBI stuff please have a look at below references:

http://blogs.msdn.com/b/business-intelligence

https://sreenivasmsbi.wordpress.com

http://www.msbiguide.com

http://msbiravindranathreddy.blogspot.in

http://sqlschool.com/MSBI-Interview-Questions.html

https://www.katieandemil.com

http://www.venkateswarlu.co.in

http://www.sqlserverquest.com


 

SSIS Interview Questions and Answers  Part 6

The post SSIS Interview Questions and Answers Part 6 appeared first on udayarumilli.com.

Windows 10 – New Features


Windows 10 from Microsoft Product Family

Windows 10 is released; the most interesting updates are the new user interface and, finally, the new browser “Edge”. Here you can find the top 10 features in Windows 10.

(Screenshots: Windows 10 new features)

The post Windows 10 – New Features appeared first on udayarumilli.com.


Database Career Paths


Database Career Paths


As a blogger I usually get in touch with followers to discuss various database related issues. If I had to rank the questions I answer, the top one would be “How to become a successful Database Admin / Developer?” I tried my best in answering them, and now I thought of making it a blog post which can be helpful for others as well.

If you are interested in database systems and want to make your career in database path, first you should get clarity on “DATABASE ROLES”

There are three basic paths available to make your career in database systems. Below are the 3 paths.

Database Designing & Development:

Database Designers and Developers design and develop a database to hold and process data in support of a front-end application which enables end users to do transactions online.

Database Administration:

Database Administrators maintain the designed / developed systems to prevent the interruptions during the transactions.

Data Warehousing:

Data Warehouse teams analyze the captured data and process it to find out the area where the business can be extended or improved.

First let’s have a look on what are the various roles available in each path.

Database Environment Roles

Database Designing & Development:

  • Database Architect
  • Data Modeler
  • Database Designer
  • Database Developer / Engineer

Database Administration:

  • Application / Development DBA
  • Core DBA

Data Warehousing:

  • ETL Developer
  • Database Analyst
  • Report Developer
  • Data Scientist
  • + Roles under Database Design and Development may also apply to this category

Now that you have some idea of the roles available, we’ll look into each role and its responsibilities. If you get a chance to choose, select the path that suits your interest; hopefully the points below help you in choosing the right one.

Database Designing and Development


Nature:

They architect, design and develop database systems that support On-Line Transaction Processing (OLTP) and On-Line Analytical Processing (OLAP). Most of these environments follow either the SDLC or the Agile framework.

Database Architect (Business + Structure + Data + Operations):

Plan and execute the entire project and should have knowledge on all phases (Business + Technology). He / She should be able to answer all the questions related to database system.

Ex: Analyzing client operations and customer requirements, mapping business requirements to technology, designing secure and optimized database systems.

Data Modeler (Business + Data + Structure):

Works on mass / raw data and gives it a structure. Simply put, he / she acts as a bridge between business and IT: they understand the data and convert business requirements into conceptual, logical and physical models that suit the requirement.

Ex: Separating data and operations, Identifying Entities and Relations etc

Database Designer (Data + Structure):

From the requirement analysis he / she should be able to design the database by following best practices.

Ex: Designing Databases, Tables, Datatypes, Capacity Planning etc

Database Developer/ Engineer (Operations):

Based on the design, the developer / engineer writes database code to fulfill the actual business requirement.

Ex: Creating Procedures, Functions, Views etc

These People…………………………………….

  • Closely work with client / business team
  • More chances to work at onsite
  • More programming experience
  • Can be expertise on a particular domain which is an added advantage
  • Work is planned and mostly long term challenges
  • Can see experts in SQL programming and business functionality
  • Plays key role in building database systems

Database Administration


Nature:

They maintain database systems to make sure databases / database servers are up and online 24×7. Mostly DBAs work in ITIL environments.

Application DBA:

Usually they work on Development, Test and Staging environments to support the database systems. Apart from database systems they should have knowledge of application configurations and, to some extent, the business.

Ex: Troubleshooting App-DB connectivity issues, Deploying Scripts, Debugging Scripts etc.

Core DBA:

Core DBAs are responsible for PRODUCTION database servers / databases.

Ex: Running Health Checks, High Availability, Troubleshooting issues, handles Service Requests, Problem Requests etc.

These People……………………………………………..

  • Closely work with end customers / users
  • Can be expertise in Technology Infrastructure field
  • Mostly work from offshore
  • Have to face unplanned outages
  • Mostly have to face the daily challenges
  • Most of DBA’s work in shifts
  • Usually do not have much knowledge on business functionality
  • Would see more experts in server and database internals
  • Plays key role in database maintenance

Data Warehousing


Nature:

Designing and creating a centralized repository and processing past trends to predict future trends.

ETL Developer:

Design and develop an ETL (Extract, Transform, Load) process to integrate data between various systems.

Ex: Developing SSIS packages to integrate data from legacy systems to SQL Server 2014.

Database Analyst:

Analyzes the business requirements and confirms the project requirements. He / she analyzes and monitors data feeds and tunes database systems when required.

Ex: Monitor test strategies to check they are matching with the requirements

Report Developer:

Designs and creates business reports that help management take the right decisions.

Ex: Creating sales reports using SSRS

Data Scientist:

The Data Scientist is responsible for designing and implementing processes and layouts for complex, large-scale data sets used for modeling, data mining, and research purposes.

These People:

  • Closely work with business team and architects
  • More chances to work at onsite
  • More analysis experience and having knowledge on business functionality
  • Can be expertise on a particular domain which is an added advantage
  • Work is planned and mostly long term challenges
  • Plays key role in decision making systems
  • Mostly work with OLAP systems.
  • Can see experts in data and business analysis
  • Work with huge datasets

Resource Utilization

Remember, these roles and responsibilities vary based on organization policies, management and environment. Below are the various phases in designing and developing a database.

  • Requirement Gathering and Analysis
  • Conceptual Design
  • Logical Design
  • Physical Design
  • SQL Coding
  • Testing
  • Optimizing
  • Version Maintenance
  • Build
  • Deploy
  • Maintenance

Let’s see how resources allocated in different environments:

Enterprise Environment

  • Database Architect
  • Data Modeler
  • Database Designer
  • Database Developer
  • Build Engineer
  • Database tester
  • DBA

Mid-level Environment

  • Database Architect
  • Database Developer
  • DBA

Start-Up

  • Database Engineer
  • DBA

This is just an example of how resources are utilized in various environments. It always depends on the business and the budget.

Famous Database Systems

  • Oracle
  • Microsoft SQL Server
  • IBM DB2
  • MySQL
  • SAP Sybase ASE
  • PostgreSQL
  • Teradata
  • Informix
  • Ingres
  • MariaDB Enterprise

The post Database Career Paths appeared first on udayarumilli.com.

Upgrade to Windows 10 For Free



Upgrade to Windows 10 For Free

Haaaaaay Just my PC got upgraded to Windows 10…….

How I have upgraded?

  • Got a Notification a week back
  • Logged into Microsoft account to reserve Windows 10
  • It checked my PC compatibility and reserved the version upgrade.
  • Again a Notification “Will notify once it is ready”
  • Today in the morning it started downloading
  • After the download I confirmed the upgrade and it started installing
  • It took 35 min to finish everything
  • My PC has upgraded to Windows 10

What are all the features I loved in Windows 10?

  • First time a windows version is free for all Non Enterprise users
  • The overall look and the colorful desktop
  • Finally the customized Start Menu
  • No more IE Now it’s Edge
  • Most beautiful thing is “Performance Improved”
  • Now it’s easy and handy to change settings

Instructions:http://www.microsoft.com/en-us/windows/windows-10-upgrade

The post Upgrade to Windows 10 For Free appeared first on udayarumilli.com.

How to become an Expert in your Career



How to become an Expert in your Career

In the previous post we saw the various paths available in a database career.

Recently we conducted interviews for a SQL Developer with 6–8 years of experience. We could hardly shortlist 2 candidates out of 18. These 2 are experts, at least at their level. We can’t say that the remaining 16 people are not good; they can manage and deliver the task. But we still need experts – why?

Experienced people can manage their work and deliver the project on-time.

Experts tackle problems that increase their expertise and approach a task in a way that maximizes their opportunities for growth, which can take the business to the next level.

We had discussions with various engineers and architects from different organizations and prepared a list of points which can help an experienced person or a fresher become an expert.

Key points that make you an Expert in your chosen Path:

  • Know the Internals
  • Expertise Your Profile
  • Performance Tuning
  • Accurate Estimations
  • Never React Always Respond
  • Keep your own best practices and Troubleshooting guide
  • Training, Learning and Knowledge Sharing

 

 

Know the Internals

We work on various items in a typical working day. While working on an activity, apply the formula WHW (What-How-Why). Most people know the “What” and the “How” but ignore the “Why” – except experts. Have a look at the examples below.

Let’s say you are a fresher and joined in an organization:

Database Administrator:

Your lead asked you to create a database in SQL Server and keep .mdf on X drive and .ldf on Y drive.

WHAT is mdf and ldf? Your co-worker can help you, or you can google it.

HOW to do this? Again google can help you on this

Your work is actually done. If you want to become an expert, ask the next question: “WHY?”

“Expert Zone”

WHY should we keep mdf and ldf in different drives?

Ans: To improve the accessibility

HOW it improves accessibility?

Ans: Each drive has its own set of I/O buses. Since the two files are on different drives, more buses are available to complete the task.

What is the problem if both are on the same drive?

Ans: The task has to be completed using the limited I/O buses of a single drive, which increases the execution time.
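A minimal sketch of the task itself, using the hypothetical X: and Y: drives from the example:

CREATE DATABASE SalesDB
ON PRIMARY
    (NAME = SalesDB_Data, FILENAME = 'X:\SQLData\SalesDB.mdf', SIZE = 512MB, FILEGROWTH = 256MB)
LOG ON
    (NAME = SalesDB_Log,  FILENAME = 'Y:\SQLLogs\SalesDB.ldf', SIZE = 256MB, FILEGROWTH = 128MB);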

Database Developer:

Your lead asked you to create a procedure that uses a temp table to hold some data and process it.

WHAT is a Temp Table?

How to create a Temp Table?

Expert Zone

WHY Temp Table? Why not a Table Variable?

WHAT actually happens in database when we use Temp Table?

WHAT is the most optimized way of holding temporary data?

Note: Ask “WHY” if you want to become an expert in your path and know the internals. Since timelines really matter in IT, we can’t always spend extra time; but remember, you just have to know the answer to “WHY”. A small illustration follows below.
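A tiny illustration of what that “WHY” digs into (a sketch with hypothetical names; both objects live in tempdb but behave differently around statistics and indexing):

-- Temp table: supports statistics and extra indexes, usually better for larger row counts
CREATE TABLE #OrderBatch (OrderID INT PRIMARY KEY, Amount MONEY);
CREATE INDEX IX_OrderBatch_Amount ON #OrderBatch (Amount);

-- Table variable: traditionally no statistics, fine for small row counts
DECLARE @OrderBatch TABLE (OrderID INT PRIMARY KEY, Amount MONEY);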

Expertise Your Profile

Your profile is the first piece of information that shows what you are. Let’s say Persons A and B each have 8 years of experience. A worked on Oracle for 8 years; B worked on SQL Server for 3 years and on Oracle for 5 years. Can you guess who has more chances of good career opportunities and growth? It’s absolutely person B. Do not stick to one technology, and never miss a chance to work on another (similar) technology.

Database Administration:

A SQL Server DBA Expert should have experience in:

  • Database Server Installation and Upgrades
  • Database Capacity Planning
  • Day-to-Day Database Monitoring and Maintenance
  • Backup and Recovery
  • High-Availability Solutions
  • Disaster Recovery Solutions
  • Performance Tuning (Server and Database Level)
  • Troubleshooting and providing instant solutions
  • Planning, Testing, Implementation and Maintenance of a SQL Server Instance
  • Database Maintenance Automations
  • ITIL Standards
  • Securing Database servers
  • Good Contacts with Technology Masters
  • Recommended Certifications
  • T-SQL
  • Powershell Script
  • Native Plus Third party Monitoring Tools usage
  • Using Version Tools – SVN, TFS
  • DBA experience in other RDBMS (Oracle, MySQL, DB2)
  • Able to Work in a Team or Individual

A DBA should be an expert in one or two areas and should have experience in the remaining ones.

Database Design and Development:

An Expert SQL Server Database Engineer should have experience in below:

  • Requirement Gathering and Analysis
  • Designing a database
  • Good Understanding of the Basics – Joins, Procedures, Functions, Triggers, Views, Synonyms etc
  • Database Level Performance Tuning
  • Execution Plan analysis
  • Index Tuning
  • T-SQL / PL/SQL
  • Writing Secure and Optimized database code
  • Integrity /Domain / Functional Testing for Database Code
  • Using Version Tools – SVN, TFS
  • Preventing / Handling SQL Injections
  • Good Contacts with Technology Masters
  • Recommended Certifications
  • Knowledge on Front End Technology (.Net, Java etc)
  • Expertise in a Domain (Banking, Financial, Insurance, Healthcare, Media, Law etc)
  • At least work in two RDBMS (Oracle PL/SQL, SQL Server T-SQL)
  • Should be experienced in Code Reviews
  • Able to Work in a Team or Individual
  • Familiar with SDLC / Agile

 

Performance Tuning

Performance tuning is the key aspect that actually tells whether you are an expert or just experienced. It’s all about the satisfaction of the end users who use the applications for which you design and maintain the database; they don’t bother about the technology, they just see how the application is performing. This applies to designing, development, maintenance, administration and warehousing alike. Whenever you design or create something in a database you should think of the below aspects:

Database Administrator:

Your manager asked you to prepare a new SQL Instance and configure it. What are all the things you should consider to build an optimized and secure instance?

What is the business all about?

Ex: OLTP or OLAP

What is priority – Concurrency or Huge Data Processing?

Ex: DB to support online transactions or it for business reports

What is the Minimum and Maximum Memory required?

Ex: You should predict/determine the future usage based on the existing environment and the requirement

How many processors required?

Ex: You should predict/determine the future usage based on the existing environment and the requirement

What is the data growth expected?

Ex: Based on this we can allocate data and log files to drives also decide on Auto Grow option.

Decide on TempDB usage

Ex: It depends on lot many things. You should be able to predict version store, internal and external objects.

What is the Index fill factor percentage to choose?

Ex: It depends on data modifications. For example when expecting more data modifications it can be 80 to 90 percent, if it is static / archive data it can be 100%.

What is the disk system required?

Ex: You should have command on RAID levels

What is the backup policy required?

Ex: It depends on Service Level Agreement (SLA) between organization and vendor.

What are the disaster recovery / High – Availability options required, if not now in future?

Ex: It again depends on the business and the down time accepted.

Can we have a shared instance on the same machine?

Ex: Decide whether the business allows us to have more than one instance on the same machine.

Is the instance under the organization’s security policy?

Ex: Get clarity on whether any special permissions are required as per the business requirement.

What exactly is the basic maintenance required for the new instance?

Ex: Apart from basic maintenance any data feed jobs required etc

What is the cost estimation?

Ex: It’s not the DBA’s role to decide on cost and licensing, but a DBA should still know about this, as he / she has to suggest the best available choice to the team. Also work out and give an accurate time estimation.

A DBA should know answers for all these questions to build an optimized and secure instance.

SQL Developer / Designer:

Now we’ll see what a SQL Developer should know before starting a task. Let’s assume that your manager asked you to create a table and create a stored procedure to insert data. You should be able to answer below questions:

Check if it can be normalized

Ex: Figure out the key column and make sure all columns are fully dependent on the key column. There are some cases where we need not bother about normalization, for example when a table is very small but is expected to participate in joins with huge tables; in that case we gain performance by not normalizing the small table. This is just an example – we have to predict the table usage in the environment and design it accordingly.

Check any partitions required?

Ex: If the table is going to hold millions of rows and other tables are already partitioned, check with your manager whether it needs to be partitioned.

Follow the correct naming conventions:

Ex: Always use the schema name with the object name. If we do not use a schema name, the database engine first searches the default schema for the object. Do not use reserved keywords.

Use the correct data type for columns

Ex: Always choose the right data type for a column. We have to consider below while assigning a datatype

Data Range – Minimum and Maximum expected values

Data Length – Fixed or Variable

Null value acceptance – Depends on business requirement

Domain value acceptance – Check your requirement document

Determine the null acceptability

Ex: Correctly define the NULL acceptability in a table. We should predict the future when we are designing NULLABLE columns. Let’s say we have designed a column as CHAR(10) NULL. Now imagine the table has 25 million rows and this column is filled for only 10 K of them. Since CHAR(10) is fixed length, the column still occupies roughly (25,000,000 − 10,000) × 10 bytes ≈ 238 MB, almost all of it for NULLs. In that situation we can consider creating a separate table holding the primary key from this table and this column, and save the space. This is just an example to say that we need to predict / estimate the future while creating any single object in the database.

Decide on table usage

Ex: Determine the security level; see if we need to create a view with limited columns

Choose the correct key column

Ex: Follow best practices in choosing key column, usually it should be a numeric.

Determine the index possibilities

Ex: Choose a correct index when it needs to be created.

Choose the right constraints

Ex: Always choose the correct constraint. On an Employee table there is a rule on Age (> 22). This can be implemented using a CHECK constraint or a TRIGGER, but CHECK is the right choice here – see the sketch below.
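A minimal sketch of that rule as a CHECK constraint (table and column names are illustrative):

ALTER TABLE dbo.Employee
ADD CONSTRAINT CK_Employee_Age CHECK (Age > 22);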

Are you following any template for writing procedures?

Ex: Define a master template based on your business requirement and follow the template across the database.

Any specific SET options we need to use?

Ex: Based on your environment enable all SET options. Ex: SET NOCOUNT ON

Any chances that your procedure execution will result into a Deadlock?

Ex: If yes, take the steps to prevent deadlocks.

How you are going to handle errors?

Ex: Are you using any centralized table to capture error information? If not start a framework that suits your environment.

How we are handling datasets inside procedure?

Ex: Choose the best available object based on the requirement and the size of data. Temp Table, Table Variable, CTE, Derived Table etc.

What is the acceptable maximum response time for stored procedure?

Ex: Have you defined maximum response time in your environment, if not determine it and tune all objects which are not under the defined time.

What is the default ISOLATION level as per the business requirement?

Ex: Decide what ISOLATION level your business required.

Are you closing/de-allocating all blocks, temp objects and transaction in procedure?

Ex: It’s always a best practice to close / delete temp objects, cursors and loops once they are done.

These are the various questions for which a Database Developer should know the answers before starting the task.

It is not easy to consider all these aspects every time you are assigned a task. An enterprise practice is to design a template and follow that template for requirement gathering and implementation.

Accurate Estimations

The most common question that you hear from your reporting manager / client is “How long will you take to get it done?”. Here come the time estimates; this is also a key area which tells whether you are an expert or just experienced. Below are the points that you should consider before giving the time estimation for any task / project.

Do you have the Exact Requirement?

Ex: You should have clarity on what is exactly the expected output.

Do you have all required Inputs?

Ex: Once you know answer for the above point then list out all required inputs to finish your task that includes “GAP Analysis in requirement”, “Resources Required” etc. It always depends on the task and the environment.

Do you have priorities with you?

Ex: First break your task down into small chunks and give an estimate for each one; together they make up the final estimate for the complete task. Once you do that, prioritize the chunks and finish the high-priority ones first. It all depends on the criticality of the task and the environment.

Always Expect a Delay:

Ex: Make sure you are including some extra buffer time when giving estimates. Below points will give us why we should need buffer time:

  • Expect a meeting or discussion
  • If any unexpected issues comes in
  • Resource and Infrastructure availability

Review the progress:

Ex: Review the task progress and give updates to the customer / task owner periodically. We can put some extra effort if you observe any differences.

Note: Accuracy in estimation comes from experience. Maintain a template for giving estimates; accuracy then comes naturally when estimating similar tasks.

Never React, Always Respond

One of my senior managers taught me how to habituate this. One should think of three points when something wrong happens in your environment.

  • Humans do Mistakes – Think of a situation where you did a mistake
  • We can’t control the past
  • What is the solution?

To be successful in your career this is the first point you should remember “Never React Always Respond”. This is not easy as I said; there is a simple technique to habituate this: ”Separate the Person from the Problem”. Do not blame / point out a person who did a mistake as it doesn’t take you towards the solution. First separate the person from the problem, and then start asking questions to find the possible solutions. Ok, now let’s see some examples as below.

Ex: One of your team members accidentally deleted a table from a production instance and didn’t notice it until the customer complained that they were getting errors on the application side. How do you deal with this?

“Why did you run that command, I have told you so many times……” only increases the problem density and makes the situation more complex.

Do not concentrate on “WHY” or on the person who made the mistake.

Start thinking towards the solution:

“Is there any way that we can stop /rollback that command?”

“Is that happened during implementation of any specific request?”

“If Yes! Did you take any backup or snapshot before running the script?”

“Can someone quickly get the table data into a temp table from the latest backup?”

“What kind of data is it – a static or a transactional data table?”

“If it is static / lookup can we get it from other similar environment?”

Instead of reacting if you start asking questions towards the solution, people around you will start thinking in a positive way and they too work on the same goal.

You can take the proper action once the solution is in place and the situation is back to normal. Ask your people to provide a complete report with a detailed RCA, and then take the proper action to prevent that kind of mistake in future.

Keep Your Own Best Practices and Troubleshooting Guide

Maintain your own document / guide and create a template for documenting problems. Whenever you come across a problem, note it down along with the possible solutions and the actual solution that worked in that situation. You can save time and energy if the same problem comes up again. A personalized knowledge / troubleshooting guide made by an expert can create wonders and can be a reference guide for the next generation of DBAs / DB Developers.

Training, Learning and Knowledge Sharing

Irrespective of the role you are working in, you should go through some training on a monthly / quarterly / yearly basis. Learning and trainings should include the below.

  • Latest Technology (SQL Server 2014, Oracle 12c, Big Data Platform, Cloud Technologies)
  • Process Oriented (Six Sigma, Information Security, Agile)
  • Personality Development
  • Communication

Do share your knowledge; even if you are busy, you can choose a way that suits your role and environment.

  • Create a blog / website
  • Participate in Technology Forums
  • Attend the local Tech-Eds
  • Share Key Data Points using e-Mail

 

Finally: identify your strengths and choose the career path that suits you most. When you love and enjoy what you do, dedication comes automatically, productivity follows, you build expertise in your path, you grow, your organization grows, the best opportunities come to you and you reach your goal. Maintain a well-balanced life between the personal and the professional, enjoy lots of beautiful moments, earn more, save more and give something back. Wishing you all the very best for your career.

The post How to become an Expert in your Career appeared first on udayarumilli.com.

Freelancing with toptal


Freelancing with Toptal

One of the best Freelancing Zone TOPTAL

Have you ever worked as a freelancer? Recently I came across a unique freelancing site. Let’s see how it is different from other freelancing sites.

  • Here we need not chase clients: you need not worry about bidding and all that stuff; just go through the interview process and join the community.
  • Comparatively you can opt for high billing rates (Long Term, Short Term and Hourly)
  • You still can work with the best clients: AXEL SPRINGER, J.P.MORGAN, KDDI America and a lot more

It says “Hire Top 3% Freelance Developers” as they follow a typical process to allow a developer into the community. Process to Join TopTal:

  • Language & Personality: HR will check your communication and English speaking skills
  • Timed Algorithm Test: Online test
  • Technical Screening: By sharing your screen with their senior engineers
  • Building a Test Application: You need to do this and should present the project.

Please have a look at here to know more about the hiring process

The test pattern might change based on your background and technical skills, but broadly it is a 4-step process. Once you clear the interview process, you can be a part of the community. People experienced in scripting and programming languages can give it a try.

Freelancing can add some extra dollars to your income, but that is tiny compared to the knowledge you gain by working on and handling a project / task independently.

By the way the “toptal engineering team” has published few SQL Server interview questions. Have a look at here for those questions and answers.

4 Essential SQL Server Interview Questions

The post Freelancing with toptal appeared first on udayarumilli.com.

Stored Procedure Code Review Checklist



Stored Procedure Code Review Checklist

In any enterprise environment, developing a database system includes database code review. We have to document and follow best practices in designing and developing database code. While interacting with customers we often need to answer the question “What do you consider when reviewing code?” Here I am going to showcase the stored procedure code review checklist that we use for reviewing or unit testing a stored procedure.

Stored Procedure Development Life Cycle:

First let’s see what are the steps followed in developing a stored procedure:

  • Get the requirement – all required inputs and expected outcome
  • Write the test cases to validate the business requirement
  • Design and create the code / stored procedure
  • Debug, Compile and Run the procedure
  • Cross check all test cases passed
  • Send it for code review
  • Apply suggested changes if any from code review
  • Deploy it to the intended environment
  • Document the process

Why Best Practices / Code Review?

In an enterprise environment we follow a best practices guide for developing and reviewing database code. Now we’ll see the advantages of following the best practices guide and reviewing the code.

Dependency:

It minimizes resource dependency, as all database objects follow a specific standard; anyone can easily understand and deal with the code changes.

Integration:

Easily integrated with the existing environment

Optimization:

We can see the best optimized and error free code

Stored Procedure Code Review Check List

We have defined the checklist category wise. Below are the various categories in the stored procedure code review checklist.

  • General Standards
  • Scalability
  • Security
  • Transactions
  • Performance
  • Functionality
  • Environment Based Settings

General Standards(Code Format, Naming Conventions, Datatype and Data Length, Syntax):

  • Always follow a template in designing stored procedures so that it makes the developer’s job easier while designing and integrating. For example, each stored procedure should be divided into blocks such as “Comments Section”, “Variable Declaration”, “Actual Body”, “Audit”, “Exception Handling”, “Temp_Obj_Removal”, and define environment sections if required.
  • Check proper comments are used or not. Always describe procedure, inputs and expected output in comments section.
  • Check naming conventions are used properly for procedure name, variables and other internal objects.
  • Check all objects used inside the procedure are prefixed with the schema name and column names are referencing with table alias.
  • Check all table columns used / mapped are using the correct datatypes and column length.
  • Check whether all required SET options are enabled or not.
  • Check if there are any temporary objects (temporary tables, cursors etc.) used; if yes, make sure these objects are closed / removed once their usage is done.
  • Make sure errors are handled properly.
  • Define NULL acceptance in the procedure and code accordingly.
  • Lining up parameter names, data types, and default values.
  • Check spaces and line breaks are using properly.
  • Check BEGIN / END are using properly.
  • Check parentheses are using properly around AND / OR blocks.

Scalability:

  • Use complete keywords (Ex: PROCEDURE instead of PROC) for better readability and integration.
  • Check if we are using any deprecated features.
  • Check if any other / nested dependent objects are used; if so, make sure that all of them are available in the DB and functioning properly, and add them to the dependency list.
  • Never use “SELECT *” instead use all required columns.
  • If there are any triggers defined on tables used inside the procedure, make sure these triggers are working as expected.

Security:

  • In case of any errors, make sure that the complete error information is not thrown to the application; instead use a centralized table to hold the error information and send a custom error message to the application.
  • Apply encryption procedures while dealing with sensitive information (Ex: credit card numbers, pass codes etc.).
  • If any dynamic SQL is used, make sure it executes only through SP_EXECUTESQL.
  • Prefer views instead of tables wherever is possible.
  • Document all permissions required to run the procedure.

Transactions:

  • If any transactions are used, check it is following ACID properties as per the business requirement.
  • Keep the transaction length as short as possible and do not select data within the transaction; rather, select the required data before starting the transaction and only process it inside the transaction.
  • Check that COMMIT and ROLLBACK happen as expected; cross check when using nested stored procedures.
  • Avoid transactions that require user input to commit.

Performance:

  • Cross check we are selecting / retrieving only required data throughout the procedure, always use Column names instead of “SELECT *”.
  • Check the column order in the WHERE clause; remember it impacts index usage, so change the order if required.
  • Avoid using functions / conversions while selecting and comparing. If the result set has 30 rows, that function is called >= 30 times. For example, for “WHERE <TAB.ColName> = MONTH (@DateParam)” we can assign the value to a local variable and use that variable in the WHERE clause (see the sketch after this list).
  • Cross check if we can have a better / short way to get the same outcome with fewer joins.
  • Always filter the data as much as we can before applying the required operations.
  • Have a look at aggregations if any; always do aggregations on the smallest possible dataset. Example: the requirement is “We want to know the top selling product details on each eStore”. Do not join Product_Orders and Product_Details and then group by e-store name selecting the max revenue; instead, first get the ProductIDs with the highest income per e-store and only then map them to Product_Details.
  • Check if there is any chance of bad parameter sniffing: make sure that procedure parameters are assigned to local variables and that the queries refer to these variables instead of referring directly to the procedure parameters.
  • Choose the best temporary object (Temp_Table, Table_Variable, CTE and Derived_Table) based on the requirement, here we should predict the near future.
  • Try to use TRUNCATE instead of DELETE whenever possible; remember, we should know the difference and the impact.
  • Check the response / execution time and make sure it is under the benchmark.
  • Avoid cursors, use while loop instead.
  • Check for the alternatives for costly operators such as NOT LIKE.
  • Make sure that it returns only the required rows and columns.
  • Analyze the cost-based execution plan: make sure there are no Bookmark / RID Lookups, no costly Table / Index Scans and no expensive Sort operators (check whether the ORDER BY is really required), and compare the estimated and actual row counts.
  • Have a look at optimizer overrides – review the code to determine whether index hints or NOLOCK clauses are really necessary. These hints could be beneficial in the short term, but such overrides might have a negative impact when the data changes or the database is migrated to a new version.
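A minimal sketch of two of the rules above – moving the function call out of the WHERE clause and assigning the parameter to a local variable to guard against bad parameter sniffing (object, column and parameter names are illustrative):

CREATE PROCEDURE dbo.usp_Get_Orders_By_Month
    @DateParam DATETIME
AS
BEGIN
    SET NOCOUNT ON;

    -- Evaluate the function once, instead of inside the WHERE clause
    DECLARE @OrderMonth INT = MONTH(@DateParam);

    -- Referring to the local variable also avoids the plan being tied to the sniffed parameter value
    SELECT o.OrderID, o.OrderDate, o.TotalAmount
    FROM   dbo.Orders AS o
    WHERE  o.OrderMonth = @OrderMonth;   -- OrderMonth is assumed to be a stored (indexable) column
END;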

Functionality:

  • Prepare various test cases and make sure all test cases pass. For example, prepare a test case that sends all possible inputs to the PROCEDURE, define the expected output and compare it with the actual output.
  • Check the code is error free, parses correctly and executes without any issue.
  • Check the output result set is correct: number of rows, number of columns, column names, datatypes etc.

Environment Based Settings:

  • Document all environments based settings and follow those instructions in designing the procedure.

Ex 1: In a highly secure database environment, all procedures should call a nested Audit procedure which collects all session details and stored in an audit table.

Ex 2: In an environment where, before performing any bulk operation, we have to get the latest lookup values in the lookup tables.

Summary:

  • Always define your own standards and follow best practices in designing database code.
  • Prepare an excel sheet with the required checklist and make sure that the sql developer is filling the sheet before sending it to the code review.
  • Initially it might take some extra time in development phase but it simplifies the code review process and leads to the best productivity.

The post Stored Procedure Code Review Checklist appeared first on udayarumilli.com.

Database Design Concepts


Database Design Concepts


People who work with database systems should have some knowledge of database design concepts. This article lets you understand the high-level process of designing a new relational database system. These concepts will help you in designing, developing, administering and maintaining database systems. A SQL Developer / DBA with 5+ years of experience will be asked a common interview question: “What are the phases in database design?” / “Any idea what steps need to be taken to design a secure, scalable and optimized database system?” / “How to design a database?”. Here is the answer.

Database Design Concepts

Here we are going to learn the below concepts that help you in designing a new database.

  • Database Design Phases
  • Normalization
  • Types of Relationships
  • Database Models
  • Database Integrity

Database Design Phases

  • Requirement Specification and Analysis
  • Conceptual\Semantic Database Design
  • Implementation\Logical Schema Design
  • Physical Schema Design
  • Optimization \ Administration

Requirement Specification and Analysis:

  • In this phase a detailed analysis of the requirement is done. The objective of this phase is to get a clear understanding of the requirements.
  • The database designer’s first step is to draw up a data requirements document. The requirements document contains a concise and non-technical summary of what data items will be stored in the database, and how the various data items relate to one another.
  • Taking the ‘data requirements document’, further analysis is done to give meaning to the data items, e.g. define the more detailed attributes of the data and define constraints if needed. The result of this analysis is a ‘preliminary specifications’ document.
  • Some of the information gathering methods are:
    • Interview
    • Analyzing documents and existing system or data
    • Survey
    • Site visit
    • Joint Applications Design (JAD) and Joint Requirements Analysis (JRA)
    • Prototyping
    • Discussions / Meetings
    • Observing the enterprise in operation
    • Research
    • Questionnaire
  • From all these methods we need to:
  • Identify essential “real world” information
  • Remove redundant, unimportant details
  • Clarify unclear natural language statements
  • Fill remaining gaps in discussions
  • Distinguish data and operations
  • Go to the next phase to give the model for the existing data.

Conceptual Database Design:

The requirement analysis is modeled in this conceptual design. Conceptual database design involves modeling the collected information at a high-level of abstraction. The ER diagram is used to represent this conceptual design. ER diagram consists of Entities, Attributes and Relationships.

  • Allow easy communication between end-users and developers.
  • Has a clear method to convert from high-level model to relational model.
  • Conceptual schema is a permanent description of the database requirements
  • What should be the outcome from this phase:
  • Entities and relationships in the enterprise
  • Information about these entities and relationships that we need to store in the database
  • Integrity constraints or business rules that hold
  • Type of relationship or Cardinality (1:1, 1:N, N:M) should be defined
  • At the end of this phase the database `schema’ in the ER Model can be represented pictorially (ER diagrams).

Logical Schema Design:

Once the relationships and dependencies are identified the data can be arranged into logical structures and is mapped into database management system tables. Normalization is performed to make the relations in appropriate normal forms.

  • Map all entities, attributes and relationships from ER diagrams to relational database objects
  • Map regular entities
  • Map weak entities
  • Map binary relationships
  • Map associative entities
  • Map unary relationships
  • Map ternary relationships
  • Map supertype/subtype relationships
  • Find out the anomalies
  • Insertion: Data about an attribute inserted at more than one place (Courseid, sid, sname)
  • Deletion: Removing data of an attribute required delete operation at more than one place
  • Update: Updating data of an attribute required Update operation at more than one place
  • Identify the candidate\primary keys
  • Normalize all the relations in the database by following the normal forms
  • First normal form (No Multivalued Dependency)
  • Second normal form (No Partial Dependency)
  • Third normal form (No Transitive Dependency)
  • Boyce-Codd normal form
  • Fourth Normal form
  • Fifth Normal form
  • Apply all constraints from ER diagrams to make sure that the database is integrated
  • Domain integrity – Allowable values to a attribute in the domain
  • Entity integrity – No primary key attribute may be null
  • Referential integrity – Data must be consistent (Primary foreign key matching) and enforcement on Delete (Cascade)

Physical Schema Design:

It deals with the physical implementation of the database in a database management system. It includes the specification of data elements, data types, indexing etc. All these information is stored in the data dictionary.

  • Information needed for physical file and database design includes:
  • Normalized relations plus size estimates for them
  • Definitions of each attribute
  • Descriptions of where and when data are used (entered, retrieved, deleted, updated) and how often
  • Expectations and requirements for response time, and data security, backup, recovery, retention and integrity
  • Descriptions of the technologies used to implement the database

Things that should be considered in the physical schema design

  • Usage Type: OLTP, OLAP, Production, Dev, Test, etc
  • Data Storage: Choose the appropriate data type. Main Memory and Secondary Memory
  • Database Size (expected data file and log file growth): size of relations → number of tuples
  • Processing Time: Processors using and speed according to the expected growth
  • Operating System: OS capacity
  • Access Methods (Physical Sequential, Indexed Sequential, Indexed Random, Inverted, Direct, Hashed)
  • CRUD Matrix (Database Usage – Create, Retrieve, Update and Delete)
  • Security – Security considerations while storing data files
  • Disaster Recovery – Recovery planning according to the business
  • Indexing: Index primary keys, foreign keys and attributes that are used in filters and sorts. Do not index attributes that have only a few distinct values.
  • Caching
  • De-normalization

Normalization

Normalization is a method of arranging the data elements in an organized format to optimize the data access. Normalization Avoids:

  • Duplication of Data
  • Insert Anomaly
  • Delete Anomaly
  • Update Anomaly

Normal Forms:

There are 5+ normalization rules (5 normal forms plus Boyce-Codd), but most day-to-day database design focuses on the first three. These rules are used to help create flexible, adaptable tables that have no redundancy and are easy to query. Before normalizing, begin with a list of all of the fields that must appear in the database. Think of this as one big table.

First Normal Form:

  • Remove repeating groups by moving those groups into new tables.
  • Example1: Divide Name column into “First_Name” and “Last_Name”
  • Example2: One field called “Month” instead of 12 columns called “Jan”, “Feb” etc.

Second Normal Form:

  • Remove Partial Dependencies.
  • Partial Dependency: a type of functional dependency where an attribute is functionally dependent on only part of the primary key (the primary key must be a composite key), i.e. a column in the table is partially dependent on the primary key.
  • Ex: The below table has a composite primary key on (StudentID, CourseID) and it is not in second normal form because the column StudentName is partially dependent on the primary key (only on StudentID).

StudentID, CourseID, StudentName, Grade

Remove partial dependency by dividing into two different tables.

StudentID, CourseID, Grade

and

StudentID, StudentName

Now above tables are in 2nd normal form.
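
For illustration, here is a minimal T-SQL sketch of the 2NF split described above; the table and column names follow the example, and the data types are assumptions.

CREATE TABLE StudentGrade
(
    StudentID INT     NOT NULL,  -- part of the composite key
    CourseID  INT     NOT NULL,  -- part of the composite key
    Grade     CHAR(2) NULL,      -- fully dependent on (StudentID, CourseID)
    CONSTRAINT PK_StudentGrade PRIMARY KEY (StudentID, CourseID)
);

CREATE TABLE Student
(
    StudentID   INT          NOT NULL CONSTRAINT PK_Student PRIMARY KEY,
    StudentName VARCHAR(100) NOT NULL  -- depends only on StudentID, so it moves here
);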

Third Normal Form:

  • Remove transitive dependencies. (C → B → A: C depends on A only indirectly, through B.)
  • Transitive Dependency is a type of functional dependency where an attribute is functionally dependent on an attribute other than the primary key. Thus its value is only indirectly determined by the primary key.
  • Ex: In the below table CourseID is the primary key, but the column FacultyOffice does not depend directly on the primary key; it depends on the non-key column FacultyID (CourseID → FacultyID → FacultyOffice).

CourseID, Section, FacultyID, FacultyOffice

Hence divide the table to remove the transitive dependency.
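
A minimal T-SQL sketch of that 3NF split, with illustrative (assumed) data types: FacultyOffice moves into a Faculty table keyed on FacultyID.

CREATE TABLE Faculty
(
    FacultyID     INT         NOT NULL CONSTRAINT PK_Faculty PRIMARY KEY,
    FacultyOffice VARCHAR(50) NULL   -- depends on FacultyID, not on CourseID
);

CREATE TABLE Course
(
    CourseID  INT NOT NULL CONSTRAINT PK_Course PRIMARY KEY,
    Section   INT NULL,
    FacultyID INT NULL CONSTRAINT FK_Course_Faculty REFERENCES Faculty (FacultyID)
);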

Types of Relationships

There are three types of table relationships. Each has its own unique purpose in helping to organize the data within the database. These relationships can be used to determine joins in queries as well.

1) One-to-One Relationships: one record in a table is related to one record in a related table; creates equally dependent tables

Ex. one student has only one PSU ID

*NOTE: This type of relationship is rarely used.

2) One-to-Many Relationships: one record in a primary table is related to many records in a related table; however, a record in the related table has only one related record in the primary table

Ex. a student can live in one residence hall at a given time, but many students can live in a residence hall at a given time

*NOTE: This is the most common type of relationship.

3) Many-to-Many Relationships: several records in the primary table are related to several records in a related table

Ex. one student can be enrolled in many subjects and each subject can have many students enrolled

Data / Database – Models

Data Modeling:

Data modeling is a method used to define and analyze the data requirements needed to support the business processes of an organization, by defining the data structures and the relationships between data elements. There are three types:

  • Semantic /Conceptual Model
  • Logical Model
  • Physical Model

Database Models:

Databases appeared in the late 1960s, at a time when the need for a flexible information management system had arisen. There are several database models, which are distinguished by how they represent the data they contain:

  • Hierarchical
  • Network (CODASYL)
  • Object/Relational
  • Object-Oriented
  • Multi-dimensional
  • Semi Structured
  • Associative
  • Entity-Attribute-Value (EAV)
  • Context

Signs of Good Database Design:

  • Thoughtfully planned
  • Works for the intended situation and purpose
  • Streamlines a process
  • Shows consistency among data (Fall 05 vs. fall 2005)
  • Eliminates redundancy as much as possible, i.e. tables are normalized
  • Provides users with an easy way to enter, store, and retrieve data
  • Does NOT promote deleting data, but rather making designations or archiving it
  • Provides unique identifiers (primary keys) for records
  • Grows, changes, and improves as needed

Database Integrity

Enforcing data integrity ensures the quality of the data in the database. For example, if an employee is entered with an employee_id value of 123, the database should not allow another employee to have an ID with the same value. Data integrity falls into these categories:

  • Entity integrity
  • Domain integrity
  • Referential integrity
  • User-defined integrity

Entity Integrity:

The intention of entity integrity is to uniquely identify all the rows in a table. For this we define a primary key on a column (or a set of columns).

Domain Integrity:

A domain defines the possible values of an attribute. Domain Integrity rules govern these values. In a database system, the domain integrity is defined by:

  • The data type and the length
  • The NULL value acceptance
  • The allowable values, through techniques like constraints or rules
  • The default value. For example, we can use a CHECK constraint to restrict the column values.
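
As an illustration, a small T-SQL sketch of domain integrity on a hypothetical Employee table: the data type/length, NULL rule, default value and CHECK constraint all restrict the allowable values.

CREATE TABLE Employee
(
    Emp_ID INT          NOT NULL,
    Name   VARCHAR(100) NOT NULL,   -- the type and length restrict the domain
    Age    TINYINT      NOT NULL
           CONSTRAINT DF_Employee_Age DEFAULT (18)                   -- default value
           CONSTRAINT CK_Employee_Age CHECK (Age BETWEEN 18 AND 70)  -- allowable values
);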

Referential Integrity:

Referential integrity is a database concept that ensures that relationships between tables remain consistent. When one table has a foreign key to another table, referential integrity states that you may not add a record to the table that contains the foreign key unless there is a corresponding record in the referenced (primary) table. It also includes the techniques known as cascading update and cascading delete, which ensure that changes made to the primary (referenced) table are propagated to the related rows in the referencing table. We can implement this by using foreign key constraints.
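
A minimal sketch of referential integrity (and of entity integrity via the primary keys) using hypothetical Customer/SalesOrder tables; the ON DELETE CASCADE clause shows the cascading delete mentioned above.

CREATE TABLE Customer
(
    CustomerID INT NOT NULL CONSTRAINT PK_Customer PRIMARY KEY   -- entity integrity
);

CREATE TABLE SalesOrder
(
    OrderID    INT NOT NULL CONSTRAINT PK_SalesOrder PRIMARY KEY,
    CustomerID INT NOT NULL
               CONSTRAINT FK_SalesOrder_Customer
               REFERENCES Customer (CustomerID)
               ON DELETE CASCADE   -- deleting a customer also removes its orders
);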

User-Defined Integrity:

User-defined integrity allows you to define specific business rules that do not fall into one of the other integrity categories. All of the integrity categories support user-defined integrity (all column- and table-level constraints in CREATE TABLE, stored procedures, and triggers).


An Interview Experience with Microsoft


An Interview Experience with Microsoft R&D


One of my close friends recently joined Microsoft R&D as a Senior SDE (Software Development Engineer). He was very excited, as it is his dream place to work, and he wanted to share his interview experience with Microsoft R&D. Here is the overall process in his own words.

I got a call from a Microsoft division telling me that they had found my profile on a job portal and wanted to check whether I was looking for a job change.

I said yes, and we proceeded. Three days later I got a call from MS staff saying that I needed to go through a general discussion. A telephonic interview was scheduled and it went on for 45 minutes; we discussed my experience, projects, technology, current role, etc.

Three days later I was informed that my profile had been shortlisted, and a face-to-face interview was scheduled.

I reached the MS office on time, handed over my profile, got a Microsoft visitor badge and waited for the interview call.

Someone from the staffing team came over, took me to the interviewer's cabin and introduced me to the interviewer.

Day – 1

Technical Interview – 1


Interviewer: This is XXXXX, how are you doing?

Me: I am very fine and actually excited, as this is the first time I am interviewing here.

Interviewer: That’s ok, you just relax we’ll start with a general discussion. Can you tell me about your experience?

Me: Have explained how I have been handling projects, how good I am in coding, designing and troubleshooting and given brief details about my current role and previous experience.

“He didn’t interrupt me until I finished.”

Interviewer: What are you more interested in? Coding, designing or troubleshooting? You can choose anything where you feel more comfortable……

Me: Well, I am good in database development and MSBI stack.

Interviewer: Have you ever tried to know the SQL Server architecture?

Me: Yes! I have an idea on component level architecture.

Interviewer: Oh, that's good to know. Can you tell me why a SQL developer or any DB developer should know the architecture? I mean, what is the use of getting into architecture details?

Me: I gave the answer on why a SQL developer or a DBA should know the architecture and took an example: “A simple requirement comes in and a SQL developer needs to write a stored procedure.” I explained how a person who understands the architecture deals with that requirement.

Interviewer: What is latch Wait?

Me: Answered!

Interviewer: What is column store index? Have you ever used it?

Me: Explained!

Interviewer: What is the difference between Delete and Truncate commands?

Me: I have given basic differences….

Interviewer: Are DELETE and TRUNCATE DDL or DML?

Me: Delete is DML and TRUNCATE is DDL

Interviewer: Why is TRUNCATE a DDL command?

Me: Given the answer

Interviewer: One of the pages in the application is giving a timeout error. What is your approach to resolve this?

Me: I answered by giving detailed information on how to find the actual problem: quickly check and confirm which layer the problem is in (application, web service or database services) and provide a resolution based on that.

Interviewer: Ok, you confirmed that Application and webservice are fine and the problem is with database server. Now what is your approach?

Me: Quickly check few primary parameters “Services”, “Memory”, “CPU”, “Disk”, “Network”, “Long Running Queries” using DMV’s, Performance monitor, performance counters, profiler trace or if you are using any third party tools.

Interviewer: Ok, all parameters are fine except that there is a procedure which is causing this problem. This procedure was running fine until yesterday, but its elapsed time has suddenly shot up. What are the possible reasons?

Me: A data feed might have happened and caused heavy fragmentation, the nightly index maintenance might have failed, statistics might be outdated, or another process might be blocking it.

Interviewer: Ok, all these parameters are fine, no data feed happened, no maintenance failed, no blocking and statistics are also fine. What else might be the reason?

Me: Maybe it is due to bad parameter sniffing.

Interviewer: What is bad parameter sniffing?

Me: Explained what the bad parameter sniffing is

Interviewer: What is your approach in case of bad parameter sniffing?

Me: Answered the question

Interviewer: Ok, it’s not because of parameter sniffing, what else might be causing the slow running?

Me: Explained scenarios from my past experience. There are also a few SET options that might cause sudden slowdown issues.

Interviewer: All required SET options are already there, are there any other reasons you see?

Me: ……!!! No I am not getting any other clue

Interviewer: That’s ok no problem

Interviewer: Can you draw SQL Server Architecture?

Me: Have drawn a diagram and explained each component

Interviewer: You joined a new team. There are a lot of deadlocks in that application and you are asked to provide a resolution. What is your approach?

Me: Explained what a deadlock is, how to capture deadlock information, how to prevent deadlocks and how to resolve deadlock situations, deadlock-related trace flags, deadlock priority, etc.

Interviewer: Ok, you found that there is an index ID that is causing the frequent deadlocks. Can you find out what type of index it is from the ID?

Me: I explained how to identify the type of index from the index IDs 0, 1 and 255.
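
For reference, a quick way to map an index ID (from a deadlock graph or DMV output) back to an index name and type; dbo.Employee is a hypothetical table.

SELECT name, index_id, type_desc   -- 0 = heap, 1 = clustered, > 1 = non-clustered
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.Employee');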

Interviewer: Have you seen any lock type information in deadlock log?

Me: Yes I have! And explained about SH_, UP_, PR_ etc

Interviewer: Any idea about Lazywriter?

Me: Answered

Interviewer: How is a checkpoint different from the lazy writer?

Me: Explained in detail

Interviewer: Any idea about Ghost cleanup?

Me: Yes! Explained

Interviewer: Why to use a VIEW?

Me: Explained

Interviewer: ORDER BY does not work in a view, right? Can you explain why?

Me: I know it does not work, but I am not sure why.

Interviewer: Have you ever worked on Indexed views?

Me: Yes

Interviewer: Can you tell me any two limitations of an Indexed view?

Me: Given limitations: Self Join, Union, Aggregate functions etc.

Interviewer: Have you ever used Triggers?

Me: Yes!

Interviewer: Can you explain the scenarios where you used triggers, and why you used them?

Me: Explained in detail

Interviewer: What are the alternatives for a Trigger?

Me: Given some scenarios ex: Using Output clause, implementing through procedures etc

Interviewer: What are the top 2 reasons to use a Trigger? And not to use a Trigger?

Me: Explained scenarios where there is no better alternative than triggers, and also cases where performance is impacted by using a trigger.

Interviewer: You are asked to tune a SSIS package, what is your approach?

Me: Have explained all possible options and parameters we need to check and improve starting from connection managers, control flows, data flow, fast loads, transformations and other parameters.

Interviewer: Your manager asked you to design and implement an ETL process to load a data feed into a data warehouse on a daily basis. What is your approach?

Me: Explained how I design a new process and implementation steps

Interviewer: Have you ever used stored procedures in source in any data flow?

Me: Yes I have used

Interviewer: Ok you need to execute a stored procedure and use the result set as source. What is your approach?

Me: Explained in detail on paper

Interviewer: Can you explain the basic reasons that result in incorrect metadata from a stored procedure?

Me: Have explained various cases. Ex: When using dynamic queries, when SET NOCOUNT OFF etc.

Interviewer: Ok, how do you deal / map column names when your procedure is using a full dynamic query?

Me: Explained in detail. Ex: SET FMTONLY, using a script task etc.

Interviewer: Ok XXXXX I am done from my side, any questions?

Me: Asked about the position and responsibilities.

First round completed. After a 15-minute wait, a staff member came to me and asked me to wait for the next round.

Technical Interview – 2

udayarumilli_Interview_Experience_With_Microsoft_R&D_3

I felt relaxed and confident, as I had answered almost all of the questions. After another 20 minutes the interviewer came to the lobby and called my name. I introduced myself and he offered me a coffee. We both had a coffee and walked for about 3 minutes to reach his cabin; meanwhile we had a general conversation.

I had already had a long discussion in the first round, and this interviewer seemed like a cool guy. I felt more confident. Here is the discussion that happened in the second round:

Interviewer: Hey, just a moment.

Me: No problem! (He started working on his laptop.)

Interviewer: Can you tell me something about your experience and current role?

Me: Started giving details about my experience and current responsibilities

Interviewer: Hey, don't mind that I am not looking at you; it doesn't mean I don't hear you. Please continue.

Me: Have explained about my education background

Interviewer: Ok XXXXX, we’ll start from a simple topic then we’ll go through complex things

Interviewer: On what basis would you define a new full-text index?

Me: Well, I am not really sure about this. (This was really unexpected; I felt bad missing the first question. Is this a simple question?)

Interviewer: Hmm, what are the T-SQL predicates involved in full text searching?

Me: I am not sure, as I didn't get a chance to work on full-text indexing, but I remember one predicate: "CONTAINS". (From this question I understood that he was trying to test my stress levels.)
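
For completeness, the two T-SQL full-text predicates are CONTAINS and FREETEXT. A tiny sketch, assuming a full-text index already exists on a hypothetical dbo.Documents(Body) column:

SELECT DocID, Title
FROM dbo.Documents
WHERE CONTAINS(Body, 'database AND design');    -- precise / boolean full-text match

SELECT DocID, Title
FROM dbo.Documents
WHERE FREETEXT(Body, 'designing databases');    -- looser, meaning-based match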

Interviewer: Ok, there is a transaction which is updating a remote table using a linked server. An explicit transaction is opened and the database instance is on Node A. The actual update statement has been executed, and a failover to Node B happens before the commit or rollback. What happens to the transaction?

Me: I tried to answer. I know how a transaction is handled in a cluster node failover, but I was not sure in this case as it is a distributed transaction. (He was clearly trying to trip me up.)

Interviewer: Ok, you know that any enterprise setup has various environments, right? Like Dev, Test, Staging, Prod. You have developed code in the development environment and it needs to be moved to test. How does that process happen?

Me: Explained about versioning tools and how we moved code between environments

Interviewer: Hmm, ok, your project doesn't have the budget to use a versioning tool and the entire process has to be manual. Can you design a process to move code between environments?

Me: Explained various options. Like having a separate instance and replicating code changes and versioning using manual code backup processes.

Interviewer: If we need to move objects between environments, how do you prioritize the objects?

Me: Explained! Like first move user defined data types, linked servers, synonyms, sequences, tables, views, triggers, procedures, functions etc.

Interviewer: Do you know what composite index is?

Me: Explained!

Interviewer: Ok, while creating a composite index, on what basis do we choose the order of the columns?

Me: Explained with an example.

Interviewer: What is the difference between Index selectivity and index depth?

Me: Explained! These two are totally different terms …..

Interviewer: Can you draw the table layout when a partition is created on it?

Me: Could not answer

Interviewer: What is an IAM Page?

Me: Index Allocation Map…..

Interviewer: How would you identify the query that is causing blocking?

Me: I explained how we capture query information from the SQL handle.

Interviewer: Ok, let's say you find that the blocking query is a stored procedure and the procedure has 10,000 lines of code. Can we get the exact statement which is causing the block?

Me: I remember there is a way to do this using statement_start_offset and statement_end_offset from the DMV sys.dm_exec_requests.
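
A sketch of that approach (not his exact query): pulling the current statement of blocked sessions from sys.dm_exec_requests and sys.dm_exec_sql_text using the statement offsets.

SELECT
    r.session_id,
    r.blocking_session_id,
    SUBSTRING(t.text,
              (r.statement_start_offset / 2) + 1,
              ((CASE WHEN r.statement_end_offset = -1
                     THEN DATALENGTH(t.text)
                     ELSE r.statement_end_offset
                END - r.statement_start_offset) / 2) + 1) AS current_statement
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;   -- sessions that are currently blocked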

Interviewer: Ok, if the blocking query is from a nested stored procedure or within a cursor, can we still get the exact query details?

Me: I am not sure on this…

Interviewer: No problem, leave it

Interviewer: How are you comfortable with SQL coding?

Me: Yes! I am good in SQL coding

Interviewer: Can you write a query to split a comma-separated string into table rows?

Me: I have written a query using CHARINDEX and SUBSTRING
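
A minimal sketch of that kind of CHARINDEX/SUBSTRING split (his exact query is not shown in the post):

DECLARE @list  VARCHAR(8000) = 'A,B,C,D';
DECLARE @pos   INT;
DECLARE @items TABLE (Item VARCHAR(100));

WHILE LEN(@list) > 0
BEGIN
    SET @pos = CHARINDEX(',', @list);
    IF @pos = 0
    BEGIN
        INSERT INTO @items (Item) VALUES (@list);   -- last (or only) element
        SET @list = '';
    END
    ELSE
    BEGIN
        INSERT INTO @items (Item) VALUES (SUBSTRING(@list, 1, @pos - 1));
        SET @list = SUBSTRING(@list, @pos + 1, LEN(@list));   -- drop the consumed element
    END
END;

SELECT Item FROM @items;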

Interviewer: Ok, now we need to implement the same logic for all incoming strings, can you explain how?

Me: I wrote a user-defined function and used it in the main procedure.

Interviewer: Can you tell me why you used a function here? What are the pros and cons of using functions?

Me: Explained how it reduces code redundancy but may impact performance.

Interviewer: Ok, so there is a performance impact while dealing with large datasets if we use functions. Now write a query to do the same job without using a user defined function.

Me: Wrote a query by adding xml tags to the string and cross applying with nodes().
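
A sketch of that XML/nodes() approach, assuming the input contains no XML-special characters:

DECLARE @list VARCHAR(8000) = 'A,B,C,D';

SELECT n.value('.', 'VARCHAR(100)') AS Item
FROM (SELECT CAST('<i>' + REPLACE(@list, ',', '</i><i>') + '</i>' AS XML) AS x) AS s
CROSS APPLY s.x.nodes('/i') AS t(n);   -- one row per <i> element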

Interviewer: Is there any other way we can do it?

Me: I am not getting…..

Interviewer: That's ok! Can you write a query for the reverse scenario? Form a comma-separated string from column values.

Me: Written a query…
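
A common way to write that reverse query (concatenating column values into one comma-separated string) is STUFF with FOR XML PATH; dbo.Employee(Name) is a hypothetical table and column.

SELECT STUFF((SELECT ',' + Name
              FROM dbo.Employee
              FOR XML PATH(''), TYPE).value('.', 'VARCHAR(MAX)'),
             1, 1, '') AS NameList;   -- STUFF removes the leading comma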

Interviewer: Have you ever used query hints?

Me: Yes! I have

Interviewer: Can you explain what are all those and in which scenario you used?

Me: Explained Recompile, No Expand, Optimize for etc

Interviewer: Ok, which is better: leaving optimization to the optimizer, or controlling it from the developer's end?

Me: Explained the reasons for using query hints and finally justified that the optimizer is almost always right in the long run.

Interviewer: Ok, What are the other query hints you used?

Me: I know few more query hints but I didn’t use them, explained other query hints

Interviewer: Any idea about statistics in SQL Server?

Me: Yes! Explained about the statistics

Interviewer: Ok, there are two options for updating statistics, right? Which is the best option?

Me: We can't directly say which option is best; it depends on the environment and on the rate of data inserts/updates. I explained how to choose the best option in both OLTP and OLAP cases.

Interviewer: Ok, what is synchronous and asynchronous auto update statistics?

Me: Explained in detail.

Interviewer: How do you know if statistics are outdated?

Me: Gave a query using STATS_DATE; we can also compare actual versus estimated row counts in the execution plan, or notice that sp_spaceused starts returning wrong results.
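
A sketch of such a check using STATS_DATE, plus the modification counter from sys.dm_db_stats_properties (available in SQL Server 2008 R2 SP2 / 2012 SP1 and later); the table name is a placeholder.

SELECT
    OBJECT_NAME(s.object_id)            AS table_name,
    s.name                              AS stats_name,
    STATS_DATE(s.object_id, s.stats_id) AS last_updated,
    sp.modification_counter             AS rows_modified_since_update
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID('dbo.Employee');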

Interviewer: How do we update statistics? In our production there is a huge traffic expected between 11 AM and 5 PM, is there any impact if we update statistics in between 11 and 5?

Me: Yes, there is a chance it will impact procedure/query execution; it might speed it up or slow it down.

Interviewer: What do you know about ISOLATION levels?

Me: Explained about all available isolation levels

Interviewer: What is the default Isolation level?

Me: Read Committed

Interviewer: What is the problem with Read Committed isolation level?

Me: Non-repeatable Reads and Phantom Reads still exists in READ COMMITTED

Interviewer: Have you ever used “WITH NOLOCK” query hint?

Me: Yes! Explained about it

Interviewer: Ok, so do you mean that when NOLOCK is specified, no lock at all is issued on that table/row/page?

Me: No! It still takes a schema stability (Sch-S) lock. That lock is compatible with the regular data locks (shared, update, exclusive), but not with a schema modification lock.

Interviewer: Are you sure that is true? Can you give me an example?

Me: Yes, I am sure; let me give you an example. Let's say we have a table T with a million rows. I issue "SELECT * FROM T WITH (NOLOCK)" and immediately, in another window, execute "DROP TABLE T". The DROP TABLE will wait until the first SELECT statement completes. So it is true that the SELECT holds a lock on that table, and that lock is not compatible with the schema modification lock.

Interviewer: What is the impact of "DELETE FROM T WITH (NOLOCK)"?

Me: The database engine ignores the NOLOCK hint when it is applied to the target table of an UPDATE or DELETE statement.

Interviewer: What is Internal and external fragmentation?

Me: Explained in detail.

Interviewer: How do you identify internal and external fragmentation?

Me: Using INDEX PHYSICAL STATS and explained about AVG FRAGMENTATION %, AVG PAGE SPACE USED.
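
A sketch of that check: avg_fragmentation_in_percent reflects external (logical) fragmentation, while avg_page_space_used_in_percent reflects internal fragmentation (it is only populated in SAMPLED or DETAILED mode).

SELECT
    OBJECT_NAME(ips.object_id)          AS table_name,
    i.name                              AS index_name,
    ips.avg_fragmentation_in_percent,   -- external fragmentation
    ips.avg_page_space_used_in_percent  -- internal fragmentation (page fullness)
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') AS ips
JOIN sys.indexes AS i
  ON i.object_id = ips.object_id
 AND i.index_id  = ips.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;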

Interviewer: There is a simple relation called "Employee" with columns Emp_ID, Name and Age, a clustered index on Emp_ID and a non-clustered index on Age. Can you draw the clustered and non-clustered index diagrams and fill in the values for the leaf-level and non-leaf nodes? Assume there are 20 records in total, the IDs are sequential from 1001 to 1020, and Age ranges from 25 to 48.

Me: Drew the clustered index diagram filling in all the employee IDs, and the non-clustered index diagram with the Age keys and the corresponding clustered index key values.

Interviewer: Gave me three queries and asked me to guess what kind of index operation (scan or seek) would appear in the execution plan.

Me: Given my answers based on the key columns used and kind of operation (“=”, “LIKE”)

Interviewer: What is a PAD INDEX option? Have you ever used this option anywhere?

Me: When specifying a fill factor, if we also specify PAD_INDEX, the same fill factor is applied to the non-leaf level pages as well.

Interviewer: Can you write a query for the below scenario.

Previous_Emp_ID, Current_Row_Emp_ID, Next_Emp_ID

Me: Given the answer “Using LEAD and LAG”
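
A sketch of that LEAD/LAG answer (SQL Server 2012+), reusing the Employee table from the earlier index question:

SELECT
    LAG(Emp_ID)  OVER (ORDER BY Emp_ID) AS Previous_Emp_ID,
    Emp_ID                              AS Current_Row_Emp_ID,
    LEAD(Emp_ID) OVER (ORDER BY Emp_ID) AS Next_Emp_ID
FROM dbo.Employee;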

Interviewer: How would you write the query in SQL Server 2005?

Me: Written query using “ROW_NUMBER()”
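
And a sketch of the SQL Server 2005 version using ROW_NUMBER() and a self-join:

WITH numbered AS
(
    SELECT Emp_ID, ROW_NUMBER() OVER (ORDER BY Emp_ID) AS rn
    FROM dbo.Employee
)
SELECT
    p.Emp_ID AS Previous_Emp_ID,
    c.Emp_ID AS Current_Row_Emp_ID,
    n.Emp_ID AS Next_Emp_ID
FROM numbered AS c
LEFT JOIN numbered AS p ON p.rn = c.rn - 1
LEFT JOIN numbered AS n ON n.rn = c.rn + 1;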

Interviewer: Write the same query in SQL Server 2000

Me: We can do this but have to use a temp table with Identity column to hold the required data and by self joining the temp table we can get the required resultset.

Interviewer: Can you write this without a temp table?

Me: Am not getting …….

Interviewer: I am done from my side; do you have any questions for me?

Me: Actually nothing, I have got clarity on job roles and responsibilities.

The best thing I observed is that each interviewer passes his feedback to the next level, and the questions you are asked are based on that feedback.

This was the longest interview I have ever attended. Mostly he tried to put me under pressure and tried to get me to fail a question. From what I understood, in the second round my stress levels were being tested; for almost every question he pushed back, cross-checking and reconfirming. I kept hearing "Are you sure?", "Can you rethink?", "How do you confirm that?" and so on.

I had some water and waited for the feedback. One of the staff members came to me and told me that I needed to go through one more discussion. I was very happy to hear it, and I got a call from the next interviewer after 45 minutes.

Technical Interview – 3


Interviewer: Hey XXXXX, how are you?

Me: I am totally fine……thanks!

Interviewer: How is the day today?

Me: Yah it’s really interesting and I am so excited.

Interviewer: I am XXXX and working on BI deliverables, can you explain your current experience

Me: Explained!

Interviewer: Why did you choose database career path?

Me: Given the details how I got a chance to be in database career path.

Interviewer: Have you ever designed a database?

Me: Yes! Explained the project details

Interviewer: Ok what are the phases in designing a new database?

Me: Explained in detail! Conceptual, Logical, Physical design.

Interviewer: If I give you a project requirement, can you design a new database?

Me: Sure, will do that.

Interviewer: Take a look at these sheets, understand the requirement, choose a module, create data flow map, design OLTP tables and OLAP tables. I am giving you an hour time. Please feel free to ask me if you have any questions.

Me: Sure, I will ask if I have any questions. It took me an hour and a half to understand the requirement and design the tables for a simple module.

Interviewer: Can you explain the data flow for this module?

Me: Explained!

Interviewer: Can you pull these 5 reports from the designed tables?

Me: I wrote queries to fulfil all the report requirements. Luckily I was able to pull all the reports from the schema I had created.

Interviewer: Can you tell me the parameters to be considered in designing a new database?

Me: Explained the various parameters that need to be considered, category-wise: for example performance, scalability, security, integrity, etc. (one or two examples for each category).

Interviewer: What is Autogrow option? How do you handle this?

Me: Explained about Autogrow option and explained 2 modes and their uses

Interviewer: You worked with versions from SQL Server 2000 to 2012 correct?

Me: Yes!

Interviewer: Can you tell me the features which are in 2000 and not in 2012?

Me: This is strange, you mean to say deprecated features which are in 2000 not in 2012?

Interviewer: Exactly

Me: Well, I believe 90% of things changed from 2000 to 2012. Anyway, here is the answer: "Enterprise Manager", "Query Analyzer", "DTS Packages", the old "System Catalogs", "BACKUP LOG WITH TRUNCATE_ONLY", etc.

Interviewer: Can you tell me top 5 T-SQL features that you most like which are in 2012 and not in 2000?

Me: Sequences, TRY / CATCH / THROW, OFFSET / FETCH, RANK / DENSE_RANK / NTILE, CTE, COLUMN STORE INDEX etc.

Interviewer: Have you ever worked on NO SQL database?

Me: No I didn’t get a chance to work on No SQL

Interviewer: Any idea about Bigdata?

Me: Explained about Big data and how it is getting used in analytics. (I expected this question and got prepared). Talked on Document DB, Graph DB, Key based, Map Reduce etc

Interviewer: Ok, will you work on No SQL database if you will be given a chance?

Me: Sure will do.

Interviewer: What is Microsoft's support for Big Data in the latest SQL Server?

Me: Explained details on ODBC drivers and plugin for HIVE and HADOOP (I prepared on this as well)

Interviewer: What is the maximum database size you used?

Me: I answered

Interviewer: What are the top parameters you suggest while designing a VLDB?

Me: Parallelism, Partitions, Dimensional / De-Normalizing, Effective Deletes, Statistics Manual Update

Interviewer: Fair enough, any questions?

Me: Yes, I have one: SQL Server 2016 is already in the pipeline; may I know the major features coming in SQL Server 2016?

Interviewer: (Explained!) Any other questions?

Me: Nothing, but thanks.

Interviewer: Thanks for your patience; please wait in the lobby.

I sat down for 10 minutes and had a coffee; I was really tired. I had to wait another 30 minutes, and then one of the staff members told me that I could expect a call the following week. I drove home and had a long bath. I knew I hadn't been perfect in answering all the questions, but I had just gone through about 9 hours of interviews, the longest interview of my career.

On Monday I was very curious about the result and kept waiting for the call. I was disappointed, as I didn't get a call until Tuesday. Finally I got a call on Wednesday saying that a discussion was scheduled for Friday.

Day -2

Technical Interview – 4


I reached the MS campus and took a local shuttle to the R&D building. I was very curious to know what kind of discussion it would be; HR told me that there might be another technical round followed by a behavioral discussion.

I got a visitor badge and waited in the visitor lobby. The interviewer came to me and took me to his cabin.

Interviewer: Hi buddy, how are you today?

Me: Hah, am very fine thanks, how are you?

Interviewer: Me pretty good man, can you introduce yourself?

Me: Explained (the same for the 4th time)

Interviewer: Ok what is your experience in SQL?

Me: Given detailed view on sql development experience

Interviewer: Can you take this marker? I'll give you a few scenarios and you just need to write a query. You can use whichever SQL dialect you feel most comfortable with.

Me: Sure (I took the marker and went to the board)

Interviewer: We have an employee table with columns (EmpID, Name, Mgr_ID, Designation). I'll give you the input manager name "Thomos". Now I want you to write a query to list all employees, and their managers, who report to the manager "Thomos" directly or indirectly.

Me: Asked a few questions, such as what the Mgr_ID is for top-level employees (e.g. the CEO) and how NULL values should be displayed, and then wrote a query using a recursive CTE.
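
A sketch of that recursive CTE (not his exact query), using the columns from the question and assuming a dbo.Employee table:

WITH reports AS
(
    -- anchor: the manager we start from
    SELECT EmpID, Name, Mgr_ID, Designation
    FROM dbo.Employee
    WHERE Name = 'Thomos'

    UNION ALL

    -- recursive part: everyone whose manager is already in the result set
    SELECT e.EmpID, e.Name, e.Mgr_ID, e.Designation
    FROM dbo.Employee AS e
    INNER JOIN reports AS r ON e.Mgr_ID = r.EmpID
)
SELECT rp.EmpID, rp.Name, m.Name AS ManagerName
FROM reports AS rp
LEFT JOIN dbo.Employee AS m ON m.EmpID = rp.Mgr_ID
WHERE rp.Name <> 'Thomos';   -- list only the direct and indirect reports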

Interviewer: Write a SELECT query syntax and place all clauses that you know

Me: Wrote syntax, included all possible clauses (Distinct, TOP, WHERE, GROUP BY, ORDER BY, Query Hints etc)

Interviewer: I have a table called T in which each record is supposed to be inserted exactly twice. The requirement is to write a query to remove the rows that appear more than twice.

Me: Wrote a query using ROW_NUMBER in a CTE.
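
A sketch of that ROW_NUMBER/CTE approach; Col1 and Col2 stand in for whatever columns define a duplicate row in T.

WITH ranked AS
(
    SELECT ROW_NUMBER() OVER (PARTITION BY Col1, Col2
                              ORDER BY (SELECT NULL)) AS rn
    FROM dbo.T
)
DELETE FROM ranked
WHERE rn > 2;   -- keep the first two copies, remove the rest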

Interviewer: How do you do this in SQL Server 2000?

Me: Given a query that can run in 2000

Interviewer: Have you ever used covering index?

Me: Yes in lot of situations.

Interviewer: Ok, why a covering index, and when do we need to use one?

Me: Explained

Interviewer: Ok, then what is the difference between Covering Index, indexed view and composite index?

Me: Explained in detail, how and when we need to use those with examples
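
For illustration, a covering index on the hypothetical Employee table used earlier: the key column plus an INCLUDEd column lets the query below be answered from the index alone, with no key lookup.

CREATE NONCLUSTERED INDEX IX_Employee_Age_Covering
ON dbo.Employee (Age)
INCLUDE (Name);   -- Name is carried at the leaf level only

-- Fully covered: both Age (key) and Name (included) come from the index.
SELECT Name, Age
FROM dbo.Employee
WHERE Age = 30;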

Interviewer: What is the top advantage of using a covering index?

Me: The performance gain for queries the index covers, mostly on analytics/reporting databases.

Interviewer: What is the disadvantage in using covering index?

Me: Two things: it slows down INSERT/UPDATE/DELETE, and it takes extra space.

Interviewer: Ok, what is the main impact of adding more columns to the covering index INCLUDE list?

Me: It makes the index larger and can increase the index depth, so more of the tree has to be traversed, etc.

Interviewer: That’s fair enough, what exactly the Index Depth is?

Me: Explained!

Interviewer: We have 30 distributed systems located across the globe. All are maintaining sales information. Each system follows its own schema. Now I am assigning you to build a centralized data warehouse. What is your approach? How do you design the process?

Me: Took 15 minutes and explained with a diagram. Created a three-tier architecture using "Source", "Staging" and "Destination". Explained the data flow from source to destination through staging, and discussed technical mapping, data flow, package design, integrity, performance, business rules, data cleansing, data validation, final loading and reporting.

Interviewer: That’s fair enough.

Interviewer: You are assigned to have a look at a slow performing stored procedure. How do you handle that?

Me: Explained in detail, execution plan, statistics etc

Interviewer: Which is better Table Scan or Index Seek?

Me: Explained, it depends on data / page count. Sometimes Table Scan is better than index seek.

Interviewer: Can you give me a scenario where Table Scan is better than index seek?

Me: I have given an example.

Interviewer: One of the SSRS reports is taking long time, what is your approach and what are all the options you look for?

Me: Explained step by step to tune a SSRS report. Discussed about local queries, data source, snapshot, cached report etc

Interviewer: Data feeds are loaded into a data warehouse on a daily basis. The requirement is that once the data load happens, a report has to be generated and delivered by email to a specific list of recipients. What is your approach?

Me: We can use data-driven subscriptions, storing all the required information in a SQL Server database; this helps with dynamic report configuration.

Interviewer: What is the most complex report you designed?

Me: Explained

Interviewer: I have two databases one is holding transactional tables and the other one is holding master relations. Now we need master table data with transactional aggregations in a single report. How do you handle this?

Me: We can use a Lookup. (He asked for more detail and I explained in depth.)

Interviewer: Do you have idea on Inheritance concept? Can you give me an example?

Me: Have taken an entity and given example for inheritance

Interviewer: Can you write a code to showcase inheritance example? You can use Dotnet or Java

Me: Written an example

Interviewer: Can you explain what a semantic model is?

Me: Explained

Interviewer: What is SQL injection?

Me: Explained with examples

Interviewer: Your system is compromised and a SQL injection attack has been confirmed. What is your immediate action plan?

Me: Explained step by step: remove access to the database, bring the application down, analyse the data, find the affected location, reverse-engineer the attack, fix the problem / restore / roll back, bring the database online, and then allow application users back in.

Interviewer: That’s fine. What are all possible reasons that cause tempdb full?

Me: Explained! Internal, Version Store, User defined objects

Interviewer: You have written a stored procedure as business required. What are all the validations / tests you do before releasing it to the Production?

Me: Explained about code reviews, performance, security, integrity, functionality, Unit, Load, Stress etc

Interviewer: That’s fair enough. You have any questions?

Me: I am actually clear on the job role and responsibilities; may I know which area we are going to work on? I know that in any product organization there are various product engineering groups, so I am curious which one this position belongs to.

Interviewer: Well…. (He explained the detail).

Interviewer: It's nice talking to you; please wait outside, you may need to go through one more discussion.

I went to the visitor lobby. After 10 minutes I got a call from a staff member telling me that I needed to meet a manager, and that I had to catch a shuttle to reach the next building. I reached the venue and waited for the manager. He came to me and took me to the cafeteria, where we had a general discussion and he offered me lunch. Lunch took about 20 minutes; he seemed very relaxed, and I actually forgot that I was about to attend an interview. After another 10 minutes we reached his cabin.

Actually, I had done a lot of homework to recall all my success and failure stories and experiences, and I had prepared well, as I had an idea of what a behavioral discussion looks like. I am sure you cannot face these questions without proper preparation.

Final Interview – 5


Interviewer: Hi XXXX, I am XXXX, development manager, can you tell me about your background?

Me: Explained!

Interviewer: The reporting server is not responding, and it is because of a low-memory issue. Any idea about the memory configuration for an SSRS server?

Me: Explained the memory safety margin, memory threshold, working set maximum and working set minimum settings.

Interviewer: What is application pooling?

Me: Explained! Advantages, threshold and timeout issues

Interviewer: Any idea about design patterns?

Me: Explained! Creational, Structural, Behavioral and Dot net

Interviewer: From your personal or professional life what are your best moments in your life?

Me: Answered!

Interviewer: Can you tell me about a situation where you troubleshot a performance problem?

Me: Explained!

Interviewer: Let's say you are working as a business report developer and you get an email from your manager with an urgent requirement. Then you get a call from the AVP (your big boss), who gives you another requirement and says that it should be the priority. How do you handle the situation?

Me: Answered! Justified

Interviewer: You are working for a premium customer and a final production deployment is planned; you are the main resource and 70% of the work depends on you. Suddenly you get a call about an emergency at home. What is your priority and how do you handle the situation?

Me: Answered! Justified.

Interviewer: How do you give time estimates?

Me: Explained! Taken an example and explained how I give time estimates.

Interviewer: What is your biggest mistake in your personal or professional life?

Me: (It’s really tough) given a scenario where I failed.

Interviewer: Did you do any process improvement that saved your organization money?

Me: Yes! I did. Explained the lean project I initiated and completed successfully.

Interviewer: For a programmer / developer what is most important, expertise in technology or domain / business?

Me: Answered! Justified

Interviewer: Can you tell me about an innovative solution you initiated?

Me: Answered!

Interviewer: Have you ever been stuck in a situation where you missed a deadline?

Me: Yes it happened! Explained how I managed the situation.

Interviewer: Ok, your manager asked you to design, develop and release a module in 10 days, but you are sure it can't be done in 10 days. How do you convince him?

Me: Answered! Explained how I would convince him.

Interviewer: From your experience what are the top 5 factors to be a successful person?

Me: Answered!

Interviewer: If you got a chance to rewind any particular moment in your life, what would it be?

Me: Answered!

Interviewer: If I give you the chance to choose one of these positions: Technology Lead, Lead Business Analyst or Customer Engagement Lead, what is your choice? Remember, we don't worry about your previous experience; we will train you for your chosen role.

Me: Answered! And justified

Interviewer: Why do you want to join Microsoft?

Me: Answered! Justified

Interviewer: Since you said you have been using Windows and SQL Server and other products from Microsoft, what is the best in Microsoft?

Me: The operating system; it is the leader and drives the software world. As per the statistics, 95% of people start learning computers on a Windows OS.

Interviewer: Just imagine you are given a chance to develop a new product for Microsoft. What would you develop? Any plans?

Me: I would build a new R&D team and design a mobile device that works based on artificial intelligence.

Interviewer: That’s awesome.

Interviewer: It’s nice talking to you, please wait outside, will call you in 10 min.

After a 10-minute wait, a staff member came to me and told me that I could expect a call in 2 or 3 days.

Three days later, while I was in the office, I missed a call from the recruiter. I got a message that she had an update for me. When I called her back she gave me the good news that they were going to make an offer. Within the next half hour I received an email with the salary breakdown and the formal offer letter. I didn't expect that, as we hadn't had any discussion about salary; it was a deal no one could say "no" to. I came out of the room practically dancing: four weeks of hard work and endless hours of practice had all paid off. I called my parents and close friends, and that is one of the best moments of my life. I hope this interview experience with Microsoft article will help those of you (SQL, MSBI, SQL Server developers) who want to prepare for a Microsoft interview.




Happy New Year 2016


Happy New Year 2016

Thank you to all our Readers, Authors, Advertisers and Partners for their invaluable support over the past 12 months, we really couldn’t do what we do without you all.

Team udayarumilli.com wishes you a Happy New Year, with the hope that you will have many blessings in the brand new year 2016 to come. Write it on your heart that every day is the best day of the year.

List-out your priorities (Health, Family and Profession) today, make some resolutions and achieve your Goal.



sql dba alwayson interview questions – 1


AlwaysOn Availability Groups

Interview Questions and Answers – Part 1


One of the best-known features introduced in SQL Server 2012 is "AlwaysOn", which builds on existing HA/DR features and provides additional capabilities such as Availability Groups. This article is for SQL Server DBAs who are preparing for interviews; it includes basic and advanced level SQL DBA AlwaysOn interview questions (Part 1).

sql dba alwayson interview questions – 1

Q. What is AlwaysOn in SQL Server?

Ans:

AlwaysOn Availability Groups feature is a high-availability and disaster-recovery solution that provides an enterprise-level alternative to database mirroring. Introduced in SQL Server 2012, AlwaysOn Availability Groups maximizes the availability of a set of user databases for an enterprise. An availability group supports a failover environment for a discrete set of user databases, known as availability databases that fail over together. An availability group supports a set of read-write primary databases and one to four sets of corresponding secondary databases. Optionally, secondary databases can be made available for read-only access and/or some backup operations.

Q. What are Availability Groups?

Ans:

A container for a set of databases (availability databases) that fail over together. Consider a scenario where 3 databases are interlinked based on an application requirement and we need to set up HA for them. If we choose mirroring, we need a separate mirroring setup for each of the 3 databases, whereas an AlwaysOn Availability Group makes the job easier by grouping all 3 databases together.

Q. What are Availability Databases?

Ans:

A database that belongs to an availability group. For each availability database, the availability group maintains a single read-write copy (the primary database) and one to four read-only copies (secondary databases).

Q. Which SQL/Windows Server Editions include AlwaysOn Availability Group functionality?

Ans:

SQL Server Enterprise Edition is required. On the Windows side, the edition must support Windows Server Failover Clustering (Enterprise/Datacenter on Windows Server 2008 R2 and earlier; Standard and Datacenter from Windows Server 2012 onwards).

Q. How many replicas can I have in an AlwaysOn Availability Group?

Ans:

Five in total: one primary and up to four secondaries (in SQL Server 2012).

Q. How many AlwaysOn Availability Groups can be configured in Always ON?

Ans:

Up to 10 availability groups is the recommendation, but it’s not enforced

Q. How many databases can be configured in an AlwaysOn Availability Group?

Ans:

Up to 100 is the recommendation, but it’s not enforced

Q. What are the Restrictions on Availability Groups?

Ans:

  • Availability replicas must be hosted by different nodes of one WSFC cluster
  • Unique availability group name: Each availability group name must be unique on the WSFC cluster. The maximum length for an availability group name is 128 characters.
  • Availability replicas: Each availability group supports one primary replica and up to four secondary replicas. All of the replicas can run under asynchronous-commit mode, or up to three of them can run under synchronous-commit mode.
  • Maximum number of availability groups and availability databases per computer: The actual number of databases and availability groups you can put on a computer (VM or physical) depends on the hardware and workload, but there is no enforced limit. Microsoft has extensively tested with 10 AGs and 100 DBs per physical machine.
  • Do not use the Failover Cluster Manager to manipulate availability groups.

Q What are the minimum requirements of a database to be part of the Always ON Availability Group?

Ans:

  • Availability groups must be created with user databases. System databases can't be used.
  • Databases must be read-write. Read-only databases aren’t supported.
  • Databases must be multiuser databases.
  • Databases can’t use the AUTO_CLOSE feature.
  • Databases must use the full recovery model, and there must be a full backup available.
  • A given database can only be in a single availability group, and that database can’t be configured to use database mirroring.

Q. How many read-write and read-only database replicas can be configured in SQL Server 2012 and 2014?

Ans:

  • SQL Server 2012 supported a maximum of four secondary replicas.
  • With SQL Server 2014, AlwaysOn Availability Groups now supports up to eight secondary replicas.

Q. Is it possible to setup Log Shipping on a database which is part of Availability Group?

Ans:

Yes, it can be configured.

Q. Is it possible to setup Replication on a database which is part of Availability Group?

Ans:

Yes, it is possible.

Q. Are FILESTREAM, Change Data Capture and Database Snapshots supported by Availability Groups?

Ans:

Yes, all these features are supported by AlwaysOn Availability Group.

Q. Can system database participate in AG?

Ans:

No.

Q: What version of Windows do I need for AlwaysOn AGs?

Ans:

We highly recommend Windows Server 2012R2 and above.

Q: Can I have different indexes or tables on my replicas?

Ans:

No, the replica database contents will be exactly the same as the primary.

Q. What is Availability mode in Always ON?

Ans:

The availability mode is a property of each availability replica. The availability mode determines whether the primary replica waits to commit transactions on a database until a given secondary replica has written the transaction log records to disk (hardened the log). AlwaysOn supports the modes below:

Asynchronous-commit mode: The primary replica commits the transaction on a database without waiting for confirmation from the secondary replica.

Synchronous-commit mode: Primary replica does not commit the transaction on a database until it gets the confirmation (written the transaction log records to disk on secondary) from secondary replica.

Q. Do we need SQL Server Cluster instances to configure Always ON?

Ans:

No we don’t need SQL Server Cluster instances to configure Always ON.

Q. Do we need shared storage to configure Always ON?

Ans:

No, we don’t need shared storage.

Q. What is the Difference between Asynchronous-commit mode and Synchronous-commit mode?

Ans:

Asynchronous-commit mode:

An availability replica that uses this availability mode is known as an asynchronous-commit replica. Under asynchronous-commit mode, the primary replica commits transactions without waiting for acknowledgement that an asynchronous-commit secondary replica has hardened the log. Asynchronous-commit mode minimizes transaction latency on the secondary databases but allows them to lag behind the primary databases, making some data loss possible.

Synchronous-commit mode:

An availability replica that uses this availability mode is known as a synchronous-commit replica. Under synchronous-commit mode, before committing transactions, a synchronous-commit primary replica waits for a synchronous-commit secondary replica to acknowledge that it has finished hardening the log. Synchronous-commit mode ensures that once a given secondary database is synchronized with the primary database, committed transactions are fully protected. This protection comes at the cost of increased transaction latency.

Q. What is called Primary replica?

Ans:

The availability replica that makes the primary databases available for read-write connections from clients is called Primary Replica. It sends transaction log records for each primary database to every secondary replica.

Q. What is called Secondary replica?

Ans:

An availability replica that maintains a secondary copy of each availability database and serves as a potential failover target for the availability group. Optionally, a secondary replica can support read-only access to secondary databases and/or creating backups on secondary databases.

Q. What is Availability Group listener?

Ans:

Availability Group Listener is a server name to which clients can connect in order to access a database in a primary or secondary replica of an AlwaysOn availability group. Availability group listeners direct incoming connections to the primary replica or to a read-only secondary replica.

Q. What are Readable Secondary Replicas?

Ans:

The AlwaysOn Availability Groups active secondary capabilities include support for read-only access to one or more secondary replicas (readable secondary replicas). A readable secondary replica allows read-only access to all its secondary databases. However, readable secondary databases are not set to read-only. They are dynamic. A given secondary database changes as changes on the corresponding primary database are applied to the secondary database.

Q. What are the benefits of Readable Secondary Replicas?

Ans:

Directing read-only connections to readable secondary replicas provides the following benefits:

  • Offloads your secondary read-only workloads from your primary replica, which conserves its resources for your mission critical workloads. If you have mission critical read-workload or the workload that cannot tolerate latency, you should run it on the primary.
  • Improves your return on investment for the systems that host readable secondary replicas.

In addition, readable secondaries provide robust support for read-only operations, as follows:

  • Temporary statistics on readable secondary database optimize read-only queries. For more information, see Statistics for Read-Only Access Databases, later in this topic.
  • Read-only workloads use row versioning to remove blocking contention on the secondary databases. All queries that run against the secondary databases are automatically mapped to snapshot isolation transaction level, even when other transaction isolation levels are explicitly set. Also, all locking hints are ignored. This eliminates reader/writer contention.

Q. How many synchronous secondary replicas can I have?

Ans:

We can have up to 2 synchronous replicas, but we are not required to use any. We could run all Secondaries in asynchronous mode if desired

Q. Can we use a secondary for reporting purpose?

Ans:

Yes. An active secondary can be used to offload read-only queries from the primary to a secondary instance in the availability group.

Q. Can we use secondary replicas to take the db backups?

Ans:

Yes. An active secondary can be used for some types of backups

Q. What all types of DB backups are possible on Secondary Replicas?

Ans:

  • BACKUP DATABASE supports only copy-only full backups of databases, files, or filegroups when it is executed on secondary replicas. Note that copy-only backups do not impact the log chain or clear the differential bitmap.
  • Differential backups are not supported on secondary replicas.

Q. Can we take Transaction log backups on the secondary replicas?

Ans:

Yes, we can take transaction log backups on the secondary replicas without COPY_ONLY option.

Q. What is “Failover” in Always ON?

Ans:

Within the context of a session between the primary replica and a secondary replica, the primary and secondary roles are potentially interchangeable in a process known as failover. During a failover the target secondary replica transitions to the primary role, becoming the new primary replica. The new primary replica brings its databases online as the primary databases, and client applications can connect to them. When the former primary replica is available, it transitions to the secondary role, becoming a secondary replica. The former primary databases become secondary databases and data synchronization resumes.

Q. How many types of Failover are supported by Always ON?

Ans:

Three forms of failover exist—automatic, manual, and forced (with possible data loss). The form or forms of failover supported by a given secondary replica depends on its availability mode.

Q. What are the Failover types supported by Synchronous-commit mode?

Ans:

  • Planned manual failover (without data loss)
  • Automatic failover (without data loss)

Q. What is planned manual failover?

Ans:

A manual failover occurs after a database administrator issues a failover command and causes a synchronized secondary replica to transition to the primary role (with guaranteed data protection) and the primary replica to transition to the secondary role. A manual failover requires that both the primary replica and the target secondary replica are running under synchronous-commit mode, and the secondary replica must already be synchronized.

Q. What is Automatic failover?

Ans:

An automatic failover occurs in response to a failure that causes a synchronized secondary replica to transition to the primary role (with guaranteed data protection). When the former primary replica becomes available, it transitions to the secondary role. Automatic failover requires that both the primary replica and the target secondary replica are running under synchronous-commit mode with the failover mode set to “Automatic”. In addition, the secondary replica must already be synchronized, have WSFC quorum, and meet the conditions specified by the flexible failover policy of the availability group.

Q. Can we configure Automatic failover of Availability Groups with SQL Server Failover cluster instances?

Ans:

SQL Server Failover Cluster Instances (FCIs) do not support automatic failover by availability groups, so any availability replica that is hosted by an FCI can only be configured for manual failover.

Q. What are the Failover types supported by under asynchronous-commit mode?

Ans:

The only form of failover is forced manual failover (with possible data loss), typically called forced failover. Forced failover is considered a form of manual failover because it can only be initiated manually. Forced failover is a disaster recovery option. It is the only form of failover that is possible when the target secondary replica is not synchronized with the primary replica.

Q. What is the use of AlwaysOn Dashboard?

Ans:

Database administrators use the AlwaysOn Dashboard to obtain an at-a-glance view of the health of an AlwaysOn availability group and its availability replicas and databases in SQL Server 2012. Some of the typical uses for the AlwaysOn Dashboard are:

  • Choosing a replica for a manual failover.
  • Estimating data loss if you force failover.
  • Evaluating data-synchronization performance.
  • Evaluating the performance impact of a synchronous-commit secondary replica

References:

Thanks to the authors who have given excellent explanations of AlwaysOn topics for SQL Server 2012 and SQL Server 2014. Here is the list of top references:

Mssqltips.com

Msdn.microsoft.com

Dbathings.com

Brentozar.com

Simple-talk.com

Technet.microsoft.com

Related posts:

SQL DBA AlwaysOn scenario based interview questions – 3

sql dba alwayson interview questions – 2

 


sql dba alwayson interview questions – 2


SQL DBA AlwaysOn

Interview Questions and Answers – 2


One of the best-known features introduced in SQL Server 2012 is "AlwaysOn", which makes use of existing HA/DR features and provides additional capabilities such as Availability Groups. This article is for SQL Server DBAs who are preparing for interviews; it includes basic and advanced level SQL DBA AlwaysOn interview questions, part 2.

SQL DBA AlwaysOn Interview Questions  – 2

Q. What is availability group wizard?

Ans:

The Availability Group Wizard is a GUI in SQL Server Management Studio used to create and configure an AlwaysOn availability group in SQL Server 2012.

Q. Suppose a primary database goes into suspect mode. Will the AG fail over to a secondary replica?

Ans:

Issues at the database level, such as a database becoming suspect due to the loss of a data file, deletion of a database, or corruption of a transaction log, do not cause an availability group to fail over.

Q. Can we have two primary availability replicas?

Ans:

No, it is not possible.

Q. Does AG support automatic page repair for protection against page corruption?

Ans:

Yes, it automatically takes care of page repair.

Q. How to add a secondary database to an availability group using T-SQL?

Ans:

ALTER DATABASE Db1 SET HADR AVAILABILITY GROUP = <AGName>;

Q. How to remove a secondary database from an availability group?

Ans:

ALTER DATABASE <DBName> SET HADR OFF;

Q. Does SQL Server 2012 AlwaysOn support encryption and compression?

Ans:

SQL Server 2012 AlwaysOn Availability Groups support row and page compression for tables and indexes; we can use the data compression feature to compress the data inside a database and help reduce its size. We can use encryption in SQL Server for connections, data, and stored procedures, and we can also perform database-level encryption with Transparent Data Encryption (TDE). If you use TDE, the service master key for creating and decrypting other keys must be the same on every server instance that hosts an availability replica for the availability group.

Q. Does AG support Bulk-Logged recovery model?

Ans:

No, it does not.

Q. Can a database belong to more than one availability group?

Ans:

No.

Q. What is session timeout period?

Ans:

The session-timeout period is a replica property that controls how long (in seconds) an availability replica waits for a ping response from a connected replica before considering the connection to have failed. By default, a replica waits 10 seconds for a ping response. This replica property applies only to the connection between a given secondary replica and the primary replica of the availability group.

Q. How to change the Session Timeout period?

Ans:

ALTER AVAILABILITY GROUP <AG Name>

MODIFY REPLICA ON '<Instance Name>' WITH (SESSION_TIMEOUT = 15);

Q. What different data synchronization preferences are available?

Ans:

As part of the availability group creation process, we have to make an exact copy of the primary replica's data on the secondary replica. This is known as the initial data synchronization for the Availability Group.

Q. How many types of Data synchronization preference options are available in Always ON?

Ans:

There are three options- Full, Join only, or Skip initial data synchronization.

Q. Is it possible to run DBCC CHECKDB on secondary replicas?

Ans:

Yes.

Q. Can I redirect the read-only connections to the secondary replica instead of Primary replica?

Ans:

Yes, we can specify the read_only intent in the connection string and add only secondaries (not the primary) to the read_only_routing list. If you want to disallow direct connections to the primary from read_only connections, then set its allow_connections to read_write.
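
As an illustration, a rough sketch of a read-only routing setup; the availability group, replica and listener names below are hypothetical placeholders, so adjust the URLs and ports to your environment:

-- Run on the primary replica. Allow read-intent connections on the secondary
-- and set its routing URL.
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE2'
WITH (SECONDARY_ROLE (ALLOW_CONNECTIONS = READ_ONLY,
                      READ_ONLY_ROUTING_URL = N'TCP://SQLNODE2.corp.local:1433'));

-- Define where read-intent connections are routed when SQLNODE1 holds the primary role
ALTER AVAILABILITY GROUP [MyAG]
MODIFY REPLICA ON N'SQLNODE1'
WITH (PRIMARY_ROLE (READ_ONLY_ROUTING_LIST = (N'SQLNODE2')));

-- Example client connection string (ApplicationIntent drives the routing):
-- Server=MyAGListener;Database=SalesDB;ApplicationIntent=ReadOnly;Integrated Security=SSPI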

Q. If a DBA expands a data file manually on the primary, will SQL Server automatically grow the same file on secondaries?

Ans:

Yes! It will be automatically expanded on the Secondary replica.

Q. Is it possible to create additional indexes on read-only secondary replicas to improve query performance?

Ans:

No, it is not possible.

Q. Is it possible to create additional statistics on read-only secondaries to improve query performance?

Ans:

No. But we can allow SQL Server to automatically create statistics on read-only secondary replicas.

Q. Can we manually fail over to a secondary replica?

Ans:

Yes. If the secondary is in synchronous-commit mode and is set to "SYNCHRONIZED", you can manually fail over without data loss. If the secondary is not in a synchronized state, then a manual failover is allowed, but with possible data loss.

Q. What is read intent option?

Ans:

There are two options to configure a secondary replica for running read workloads. The first option, 'Read-intent-only', provides a directive to the AlwaysOn secondary replica to accept connections that have the property ApplicationIntent=ReadOnly set. The word 'intent' is important here, as there is no application check made to guarantee that there are no DDL/DML operations in the application connecting with 'ReadOnly'; an assumption is made that the customer will only connect read workloads.

Q. Does AlwaysOn Availability Groups repair the data page corruption as Database Mirroring?

Ans:

Yes. If a corrupt page is detected, SQL Server will attempt to repair the page by getting it from another replica.
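
If you want to verify whether automatic page repair has occurred, the repair attempts are exposed through a DMV; a minimal sketch:

-- Automatic page repair attempts on this replica (primary or secondary)
SELECT database_id, file_id, page_id, error_type, page_status, modification_time
FROM sys.dm_hadr_auto_page_repair;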

Q. What are the benefits of Always on feature?

Ans:

  • Utilizing database mirroring for the data transfer over TCP/IP
  • Providing a combination of synchronous and asynchronous mirroring
  • Providing a logical grouping of similar databases via Availability Groups
  • Creating up to four readable secondary replicas
  • Allowing backups to be undertaken on a secondary replica
  • Performing DBCC statements against a secondary replica
  • Employing Built-in Compression & Encryption

Q. How much network bandwidth will I need?

Ans:

For a really rough estimate, sum up the amount of uncompressed transaction log backups that you generate in a 24-hour period. You’ll need to push that amount of data per day across the wire. Things get trickier when you have multiple replicas – the primary pushes changes out to all replicas, so if you’ve got 3 replicas in your DR site, you’ll need 3x the network throughput. Calculating burst requirements is much more difficult – but at least this helps you get started.
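
One way to get that rough daily number is to sum the log backup history recorded in msdb over the last 24 hours; a minimal sketch (backup_size is the uncompressed size):

-- Approximate daily log volume per database over the last 24 hours (MB)
SELECT database_name,
       SUM(backup_size) / 1024 / 1024 AS uncompressed_log_mb
FROM msdb.dbo.backupset
WHERE type = 'L'
  AND backup_finish_date >= DATEADD(HOUR, -24, GETDATE())
GROUP BY database_name;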

Q. What’s the performance overhead of a synchronous replica?

Ans:

From the primary replica, ping the secondary, and see how long (in milliseconds) the response takes. Then run load tests on the secondary’s transaction log drive and see how long writes take. That’s the minimum additional time that will be added to each transaction on the primary. To reduce the impact, make sure your network is low-latency and your transaction log drive writes are fast.

Q. How far behind will my asynchronous replica be?

Ans:

The faster your network and your servers are, and the less transactional activity you have, the more up-to-date each replica will be. I’ve seen setups where the replicas are indistinguishable from the primary. However, I’ve also seen cases with underpowered replicas, slow wide area network connections, and heavy log activity (like index maintenance) where the replicas were several minutes behind.
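
To gauge this in practice, the send and redo queue columns in sys.dm_hadr_database_replica_states give a rough picture; a minimal sketch:

-- Log send and redo queue sizes (KB) and rates (KB/sec) per database replica
SELECT ar.replica_server_name,
       drs.database_id,
       drs.log_send_queue_size,
       drs.log_send_rate,
       drs.redo_queue_size,
       drs.redo_rate
FROM sys.dm_hadr_database_replica_states AS drs
JOIN sys.availability_replicas AS ar
     ON drs.replica_id = ar.replica_id;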

Q. What’s the difference between AGs in SQL 2012 and SQL 2014?

Ans:

SQL Server 2014’s biggest improvement is that the replica’s databases stay visible when the primary drops offline – as long as the underlying cluster is still up and running. If I have one primary and four secondary replicas, and I lose just my primary, the secondaries are still online servicing read-only queries. (Now, you may have difficulties connecting to them unless you’re using the secondary’s name, but that’s another story.) Back in SQL 2012, when the primary dropped offline, all of the secondaries’ copies immediately dropped offline – breaking all read-only reporting queries.

Q: How do I monitor AlwaysOn Availability Groups?

Ans:

That’s rather challenging right now. Uptime monitoring means knowing if the listener is accepting writeable connections, if it’s correctly routing read-only requests to other servers, if all read-only replicas are up and running, if load is distributed between replicas the way you want, and how far each replica is running behind. Performance monitoring is even tougher – each replica has its own statistics and execution plans, so queries can run at totally different speeds on identical replicas.
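
As one possible starting point for home-grown monitoring, the replica-state DMVs expose the role, connection state and synchronization health of each replica; a minimal sketch:

-- Replica role and synchronization health per availability group
SELECT ag.name AS ag_name,
       ar.replica_server_name,
       ars.role_desc,
       ars.connected_state_desc,
       ars.synchronization_health_desc
FROM sys.dm_hadr_availability_replica_states AS ars
JOIN sys.availability_replicas AS ar ON ars.replica_id = ar.replica_id
JOIN sys.availability_groups AS ag ON ars.group_id = ag.group_id;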

Q: How does licensing work with AlwaysOn Availability Groups in SQL 2012 and 2014?

Ans:

All replicas have to have Enterprise Edition. If you run queries, backups, or DBCCs on a replica, you have to license it. For every server licensed with Software Assurance, you get one standby replica for free – but only as long as it’s truly standby, and you’re not doing queries, backups, or DBCCs on it.

Q: Can I use AlwaysOn Availability Groups with Standard Edition?

Ans:

Not at this time, but it’s certainly something folks have been asking for since database mirroring has been deprecated.

Q: Do AlwaysOn AGs require shared storage or a SAN?

Ans:

No, you can use local storage, like cheap SSDs.

Q: Do Availability Groups require a Windows cluster?

Ans:

Yes, they’re built atop Windows failover clustering. This is the same Windows feature that also enables failover clustered instances of SQL Server, but you don’t have to run a failover clustered instance in order to use AlwaysOn Availability Groups.

Q: Do I need a shared quorum disk for my cluster?

Ans:

No

Q: If I fail over to an asynchronous replica, and it’s behind, how do I sync up changes after the original primary comes back online?

Ans:

When I go through an AG design with a team, we talk about the work required to merge the two databases together. If it's complex (like lots of parent/child tables with identity fields, and no update datestamp field on the tables), then management agrees to a certain amount of data loss upon failover. For example, "If fifteen minutes of data or less is involved, we're just going to walk away from it." Then we build a project plan for what it would take to actually recover more than fifteen minutes of data, and management decides whether they want to build that tool ahead of time, or wait until disaster strikes.

References:

Thanks to the authors who have given excellent explanations of AlwaysOn topics for SQL Server 2012 and SQL Server 2014. Here is the list of top references:

Mssqltips.com

Msdn.microsoft.com

Dbathings.com

Brentozar.com

Simple-talk.com

Technet.microsoft.com

Related Posts

sql dba alwayson interview questions – 1

sql dba alwayson scenario based  interview questions – 3


SQL DBA AlwaysOn scenario based interview questions – 3


SQL DBA AlwaysOn scenario based interview questions – 3


One of the best-known features introduced in SQL Server 2012 is "AlwaysOn", which makes use of existing HA/DR features and provides additional capabilities such as Availability Groups. This article is for SQL Server DBAs who are preparing for interviews; it includes basic and advanced level SQL DBA AlwaysOn scenario-based interview questions and answers. One can easily expect scenario-based questions in today's technical interviews. Here are the scenario-based questions.

SQL DBA AlwaysOn scenario based interview questions – 3:

Q. We have got an alert “WSFC cluster service is offline”. What is your action plan?

Ans:

This alert is raised when the WSFC cluster is offline or in the forced quorum state. All availability groups hosted within this cluster are offline (a disaster recovery action is required).

Possible Reasons:

This issue can be caused by a cluster service issue or by the loss of the quorum in the cluster.

Possible Solutions:

Use the Cluster Administrator tool to perform the forced quorum or disaster recovery workflow. Once the WSFC is started, you must re-evaluate and reconfigure NodeWeight values to correctly construct a new quorum before bringing other nodes back online. Otherwise, the cluster may go offline again.

Re-establishment may be required if any high availability features (AlwaysOn Availability Groups, log shipping, database mirroring) are in use on the affected nodes.

Q. How to force a WSFC (Windows Server Failover Cluster) Cluster to start without a quorum?

Ans:

This can be done using

  • Failover Cluster Manager
  • Net.exe
  • PowerShell

Here we’ll see how this can be done using FCM.

Failover Cluster Manager

  • Open a Failover Cluster Manager and connect to the desired cluster node to force online.
  • In the Actions pane, click Force Cluster Start, and then click Yes – Force my cluster to start.
  • In the left pane, in the Failover Cluster Manager tree, click the cluster name.
  • In the summary pane, confirm that the current Quorum Configuration value is: Warning: Cluster is running in ForceQuorum state.

Q. We have got an alert “Availability group is offline”. Can you explain about this warning and your action plan?

Ans:

This alert is raised when the cluster resource of the availability group is offline or the availability group does not have a primary replica.

Possible Reasons:

  • The availability group is not configured with automatic failover mode; the primary replica has become unavailable and the role of all replicas in the availability group has become RESOLVING.
  • The availability group is configured with automatic failover mode, but the automatic failover did not complete successfully.
  • The availability group resource in the cluster becomes offline.
  • There is an automatic, manual, or forced failover in progress for the availability group.

Possible Solutions:

  • If the SQL Server instance of the primary replica is down, restart the server and then verify that the availability group recovers to a healthy state.
  • If the automatic failover appears to have failed, verify that the databases on the replica are synchronized with the previously known primary replica, and then fail over to that replica. If the databases are not synchronized, select the replica with the minimum data loss and fail over to it.
  • If the cluster resource is offline while the instances of SQL Server appear to be healthy, use Failover Cluster Manager to check the cluster health or other cluster issues on the server. You can also use Failover Cluster Manager to attempt to bring the availability group resource online.
  • If there is a failover in progress, wait for the failover to complete.

Q. We have got an alert “Availability group is not ready for automatic failover”. Can you explain about this warning and your action plan?

Ans:

This alert is raised when the failover mode of the primary replica is automatic; however, none of the secondary replicas in the availability group are failover ready.

Possible Reasons:

The primary replica is configured for automatic failover; however, the secondary replica is not ready for automatic failover as it might be unavailable or its data synchronization state is currently not SYNCHRONIZED.

Possible Solutions:

  • Verify that at least one secondary replica is configured for automatic failover. If there is no secondary replica configured for automatic failover, update the configuration of a secondary replica to be the automatic failover target with synchronous commit.
  • Use the policy to verify that the data synchronization state of the automatic failover target is SYNCHRONIZED, and then resolve the issue at the availability replica.

Q. In your environment, data was inserted on the primary replica but is not visible on the secondary replica. The availability group is in a healthy state, and in most cases data is reflected within a few minutes, but in this case it didn't happen. Now you need to find the bottleneck and fix the issue. Can you explain your views and the workaround in this situation?

Ans:

Possible Reasons:

  • Long-Running Active Transactions
  • High Network Latency or Low Network Throughput Causes Log Build-up on the Primary Replica
  • Another Reporting Workload Blocks the Redo Thread from Running
  • Redo Thread Falls behind Due to Resource Contention

Possible Workaround:

  • Use DBCC OPENTRAN to check whether any old active transactions are running on the primary replica and see if they can be rolled back.
  • A high DMV (sys.dm_hadr_database_replica_states) value log_send_queue_size can indicate logs being held back at the primary replica. Dividing this value by log_send_rate can give you a rough estimate on how soon data can be caught up on the secondary replica.
  • Check two performance objects SQL Server:Availability Replica > Flow Control Time (ms/sec) and SQL Server:Availability Replica > Flow control/sec. Multiplying these two values shows you in the last second how much time was spent waiting for flow control to clear. The longer the flow control wait time, the lower the send rate.
  • When the redo thread is blocked, an extended event called sqlserver.lock_redo_blocked is generated. Additionally, you can query the DMV sys.dm_exec_requests on the secondary replica to find out which session is blocking the REDO thread, and then take corrective action. You can let the reporting workload finish, at which point the redo thread is unblocked, or you can unblock the redo thread immediately by executing the KILL command on the blocking session ID. The following query returns the session ID of the reporting workload that is blocking the redo thread.

Transact-SQL

SELECT session_id, command, blocking_session_id, wait_time, wait_type, wait_resource
FROM sys.dm_exec_requests
WHERE command = 'DB STARTUP';

  • When the redo thread falls behind due to resource contention: a large reporting workload on the secondary replica has slowed down the performance of the secondary replica, and the redo thread has fallen behind. You can use the following DMV query to see how far the redo thread has fallen behind, by measuring the gap between last_redone_lsn and last_received_lsn.

Transact-SQL

SELECT recovery_lsn, truncation_lsn, last_hardened_lsn,
       last_received_lsn, last_redone_lsn, last_redone_time
FROM sys.dm_hadr_database_replica_states;

If the redo thread is indeed falling behind, do a proper investigation; Resource Governor can be used to control the CPU cycles allocated to the reporting workload.

Note: Have a look at MSDN sites and try to understand these solutions because when you say possible solutions, immediately you might be asked about resolutions.

Q. You perform a forced manual failover on an availability group to an asynchronous-commit secondary replica, you find that data loss is more than your recovery point objective (RPO). Or, when you calculate the potential data loss of an asynchronous-commit secondary replica using the method in Monitor Performance for AlwaysOn Availability Groups, you find that it exceeds your RPO. What are the possible reasons that causes data loss is more than your recovery point objective?

Ans:

There are mainly two reasons:

  • High Network Latency or Low Network Throughput Causes Log Build-up on the Primary Replica. The primary replica activates flow control on the log send when it has exceeded the maximum allowable number of unacknowledged messages sent over to the secondary replica. Until some of these messages have been acknowledged, no more log blocks can be sent to the secondary replica. Since data loss can be prevented only when they have been hardened on the secondary replica, the build-up of unsent log messages increases potential data loss.
  • Disk I/O Bottleneck Slows Down Log Hardening on the Secondary Replica. If the log file and the data file are both mapped to the same hard disk, reporting workload with intensive reads on the data file will consume the same I/O resources needed by the log hardening operation. Slow log hardening can translate to slow acknowledgement to the primary replica, which can cause excessive activation of the flow control and long flow control wait times.

Q. After an automatic failover or a planned manual failover without data loss on an availability group, you find that the failover time exceeds your recovery time objective (RTO). Or, when you estimate the failover time of a synchronous-commit secondary replica (such as an automatic failover partner) using the method in Monitor Performance for AlwaysOn Availability Groups, you find that it exceeds your RTO. Can you explain what are the possible reasons which causes the failover time exceeds your RTO?

Ans:

  • Reporting Workload Blocks the Redo Thread from Running: On the secondary replica, the read-only queries acquire schema stability (Sch-S) locks. These Sch-S locks can block the redo thread from acquiring schema modification (Sch-M) locks to make any DDL changes. A blocked redo thread cannot apply log records until it is unblocked. Once unblocked, it can continue to catch up to the end of log and allow the subsequent undo and failover process to proceed.
  • Redo Thread Falls Behind Due to Resource Contention: When applying log records on the secondary replica, the redo thread reads the log records from the log disk, and then for each log record it accesses the data pages to apply the log record. The page access can be I/O bound (accessing the physical disk) if the page is not already in the buffer pool. If there is I/O bound reporting workload, the reporting workload competes for I/O resources with the redo thread and can slow down the redo thread.

Q. Let's say you have configured automatic failover in a SQL Server 2012 AlwaysOn environment. An automatic failover was triggered but was unsuccessful in making the secondary replica the new primary. How do you identify that the failover was not successful, and what are the possible reasons that cause an unsuccessful failover?

Ans:

If an automatic failover event is not successful, the secondary replica does not successfully transition to the primary role. Therefore, the availability replica will report that this replica is in Resolving status. Additionally, the availability databases report that they are in Not Synchronizing status, and applications cannot access these databases.

Possible Reasons for Unsuccessful Failover:

  • “Maximum Failures in the Specified Period” value is exhausted: The availability group has Windows cluster resource properties, such as the Maximum Failures in the Specified Period property. This property is used to avoid the indefinite movement of a clustered resource when multiple node failures occur.
  • Insufficient NT Authority\SYSTEM account permissions: The SQL Server Database Engine resource DLL connects to the instance of SQL Server that is hosting the primary replica by using ODBC in order to monitor health. The logon credentials that are used for this connection are the local SQL Server NT AUTHORITY\SYSTEM login account. By default, this local login account is granted the following permissions: Alter Any Availability Group, Connect SQL, and View Server State (a GRANT sketch to restore these permissions follows this list). If the NT AUTHORITY\SYSTEM login account lacks any of these permissions on the automatic failover partner (the secondary replica), then SQL Server cannot start health detection when an automatic failover occurs, and the secondary replica cannot transition to the primary role. To investigate and diagnose whether this is the cause, review the Windows cluster log.
  • The availability databases are not in a SYNCHRONIZED state: In order to automatically fail over, all availability databases that are defined in the availability group must be in a SYNCHRONIZED state between the primary replica and the secondary replica. When an automatic failover occurs, this synchronization condition must be met in order to make sure that there is no data loss. Therefore, if one availability database in the availability group in the synchronizing or not synchronized state, automatic failover will not successfully transition the secondary replica into the primary role.
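
If the NT AUTHORITY\SYSTEM login is missing any of the three permissions mentioned above, they can be granted back with a short script; a minimal sketch:

-- Restore the default permissions required for AlwaysOn health detection
GRANT ALTER ANY AVAILABILITY GROUP TO [NT AUTHORITY\SYSTEM];
GRANT CONNECT SQL TO [NT AUTHORITY\SYSTEM];
GRANT VIEW SERVER STATE TO [NT AUTHORITY\SYSTEM];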

Q. Have you ever seen the Error 41009?

Ans:

Yes! This error might occur when you try to create multiple availability groups in a SQL Server 2012 AlwaysOn failover clustering environment. This issue can be resolved by applying Cumulative Update Package 2.

Q. Let's say you added a new file to a database which is part of an AlwaysOn Availability Group. The add-file operation succeeded on the primary replica but failed on the secondary replica. What is the impact and how do you troubleshoot it?

Ans:

This might happen due to a different file path between the systems that host the primary and secondary replicas. A failed add-file operation causes the secondary database to be suspended. This, in turn, causes the secondary replica to enter the NOT SYNCHRONIZING state.

Resolution:

  • Remove the secondary database from the availability group.
  • On the existing secondary database, restore a full backup of the filegroup that contains the added file to the secondary database, using WITH NORECOVERY and WITH MOVE (Specify the correct file path as per secondary).
  • Back up the transaction log that contains the add-file operation on the primary database, and manually restore the log backup on the secondary database using WITH NORECOVERY and WITH MOVE. Restore the last transaction log file with NO RECOVERY.
  • Rejoin the secondary database to the availability group (a T-SQL sketch of these steps follows this list).
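
A rough T-SQL outline of those recovery steps on the secondary replica; the database, filegroup, file and backup names below are hypothetical placeholders:

-- 1. Remove the database from the availability group on the secondary
ALTER DATABASE [SalesDB] SET HADR OFF;

-- 2. Restore the filegroup backup taken on the primary, relocating the new file
RESTORE DATABASE [SalesDB] FILEGROUP = 'FG_New'
FROM DISK = N'\\backupshare\SalesDB_fg.bak'
WITH NORECOVERY,
     MOVE 'SalesDB_NewFile' TO N'E:\SQLData\SalesDB_NewFile.ndf';

-- 3. Restore the log backup that contains the add-file operation
RESTORE LOG [SalesDB]
FROM DISK = N'\\backupshare\SalesDB_log.trn'
WITH NORECOVERY;

-- 4. Rejoin the database to the availability group
ALTER DATABASE [SalesDB] SET HADR AVAILABILITY GROUP = [ProAG];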

Q. Can you write a T-SQL statement for joining a replica to an availability group? (AG name "ProAG")

Ans:

Connect to the server instance that hosts the secondary replica and issue the below statement:

ALTER AVAILABILITY GROUP ProAG JOIN;

The same operation can be done using SSMS or PowerShell.

Q. The data synchronization state for one of the availability databases is not healthy. Can you tell me the possible reasons?

Ans:

If this is an asynchronous-commit availability replica, all availability databases should be in the SYNCHRONIZING state. If this is a synchronous-commit availability replica, all availability databases should be in the SYNCHRONIZED state. This issue can be caused by the following:

  • The availability replica might be disconnected.
  • The data movement might be suspended.
  • The database might not be accessible.
  • There might be a temporary delay issue due to network latency or the load on the primary or secondary replica.

Q. Let's say we have a premium production server and it is in an AlwaysOn Availability Group. You observe that CPU utilization peaks at a specific time of day. You did an RCA and found that most of the CPU during that window is consumed by the backup process, because backup compression is on. Now what do you suggest? Do we have any feature to offload backups?

Ans:

Yes! There is an option to perform backups from secondary replicas. We can set this from the Availability Group properties under "Backup Preferences", where we can choose one of the following options (a backup job sketch follows the list):

Preferred Secondary: Backups are performed on a secondary replica; if no secondary is configured, they are performed on the primary.

Secondary Only: Backups must be done on a secondary replica only.

Primary: Backups must occur on the primary replica.

Any Replica: Backups can occur on any replica in the Availability Group.
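
The preference is only a hint; the backup job itself has to check it on each replica, typically with sys.fn_hadr_backup_is_preferred_replica. A minimal sketch, using a hypothetical database name and path:

-- Run the same job on every replica; only the preferred one actually takes the backup
IF sys.fn_hadr_backup_is_preferred_replica(N'SalesDB') = 1
BEGIN
    BACKUP LOG [SalesDB]
    TO DISK = N'G:\Backup\SalesDB_log.trn'
    WITH COMPRESSION;
END
ELSE
    PRINT 'Not the preferred backup replica - skipping.';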

Q. Are there any specific limitations if we need to perform backups from secondary replicas?

Ans:

Yes! There are few:

  • Only COPY_ONLY full backups are allowed from a secondary replica.
  • Differential backups are not allowed from a secondary replica.
  • Log backups can be performed from different secondary replicas, but all these backups maintain a single log chain (LSN sequence). This can help in some situations.

Q. Have you ever applied patches / CU / service packs on Alwayson Availability Groups? Did you face any issues while applying?

Ans:

Yes! I have applied CUs and service packs, including SQL Server 2012 SP2 Cumulative Update 4.

I had a bad experience with AlwaysOn AG:

After CU4 was applied we saw that the AlwaysOn Availability Groups were in a Non-Synchronizing state.

After RCA we found that there was huge blocking between user sessions and an unknown session, CHECKPOINT, with the command running as "DB_STARTUP".

Through the MSDN site we found that Microsoft had declared it a bug, and the chosen solution was as below:

  • We had to open an outage:
  • Disable Automatic Failover
  • Restart the SQL Server on Primary Replica
  • Re-enable automatic failover.
  • This worked and fixed the issue.

Q. Can you explain any difficult issue you have faced recently on High Availability Groups?

Ans:

Sure! We were configuring AlwaysOn AG on SQL Server 2014.

We had taken a backup from the primary replica and restored it on the secondary replica.

When we tried to add the secondary replica to the availability group, to our surprise SQL Server shut down and we found the error message:

(Error: 3449, Severity: 21, State: 1.

SQL Server must shut down in order to recover a database (database ID 1). The database is either a user database that could not be shut down or a system database. Restart SQL Server. If the database fails to recover after another startup, repair or restore. SQL Trace was stopped due to server shutdown. Trace ID = ‘1’. This is an informational message only; no user action is required. )

Cause:

We did RCA and found the below.

  • Service Broker is enabled at the Primary Replica.
  • We have taken a full backup from the Primary Replica.
  • Restored it on the Secondary Replica, where Service Broker is not enabled.
  • When we try to add the secondary replica to the AG and Service Broker is enabled, the same GUID on the availability database is detected, which causes a silent error 9772:
  • "The Service Broker in database "<dbname>" cannot be enabled because there is already an enabled Service Broker with the same ID".
  • This results in error 3449 and shuts down SQL Server unexpectedly.

Solution:

This has been fixed by applying the CU1 on SQL Server 2014.

Q. Replica is in “resolving” status? What does it mean?

Ans:

A replica goes into the "RESOLVING" state when an automatic failover is not successful.

Additionally, the availability databases report that they are in a Not Synchronizing state and are not accessible.

Q. What are the top reasons that cause an unsuccessful failover?

Ans:

  • Automatic failovers in a specific period may have crossed the "Maximum Failures in the Specified Period" value
  • Insufficient NT Authority\SYSTEM account permissions
  • The availability databases are not in a SYNCHRONIZED state

References:

Thanks to the authors who have given excellent explanations of AlwaysOn topics for SQL Server 2012 and SQL Server 2014. Here is the list of top references:

Mssqltips.com

Msdn.microsoft.com

Dbathings.com

Brentozar.com

Simple-talk.com

Technet.microsoft.com

Related Posts

SQL DBA AlwaysOn interview questions – 1

SQL DBA AlwaysOn interview questions – 2


SQL Server on AWS EC2


SQL Server on AWS EC2


 Introduction:

Here we are going to discuss SQL Server on AWS EC2. First let me quickly explain the difference between the AWS offerings for SQL Server. There are two types of AWS services for database hosting:

  1. AWS RDS: Amazon Relational Database Service enables users to quickly set up their environment with almost zero maintenance activity. Amazon RDS itself takes care of backups, patching and other maintenance activities, including instance-level optimization.
  2. AWS EC2: Amazon Elastic Compute Cloud gives us more control and flexibility over the environment. It's like hosting a virtual machine in the cloud, but we have to take care of the maintenance activities ourselves.

Now this post is for the people who work on SQL Server on AWS EC2. We recently started moving SQL Server databases to AWS EC2 and RDS. We had some issues in accessing EC2 instances. Here we’ll demonstrate how to access SQL Server on AWS EC2 from a remote location using SSMS.

Connecting to SQL Server on AWS EC2

As I mentioned, an EC2 instance is nothing but a virtual machine hosted on the Amazon cloud; in our case it is Windows Server 2012. Now we'll see how to access / RDP into the EC2 instance from your local machine. One more thing to remember is that we cannot directly access / RDP into the cloud instance; we need to go through a jump box.

Collect the Details:

AWS User Name: Ex: lj78787

AWS Password: ******

Token: Token provided by the company

Local PC Port: Ex: 5555

Jump Box IP Address & Port: Ex: 20.125.221.196 & 22

EC2 instance IP Address & Port: Ex: 20.122.233.200 & 1433

Download Putty:

Download Putty.exe from here

Configure Putty Settings:

Let's understand what Putty will do. It constructs a tunnel between the source and target machines via the jump box. If we configure a Putty tunnel from source port 5555 to target port 9999, then once the tunnel is connected, the local port 5555 represents the target port 9999.

Ex:

On the target machine SQL Server is running on port 1435. We have configured a Putty tunnel from the local machine port (5555) to the target machine port (1435).

Now, from our local machine, if we want to access the target machine's SQL Server we can simply connect to "localhost,5555"; this connects to the SQL Server running on the target machine.

Steps to configure Putty:

  1. Run Putty.exe


Fig: 1

  2. Give the Jump Box IP address (20.125.221.196) as the Host Name; the default port is 22. Also give the Saved Sessions name as "EC2SQL".


Fig: 2

  3. Next go to the Data tab under Connection.
  4. Give the Auto-login username as "lj78787".


Fig: 3

  5. Next go to Tunnels under SSH:

Source Port: An open port on your local PC (This port should not be assigned to any other services)

Ex: 5555

Destination: EC2 Windows Server IP and SQL Server port

Ex: 20.122.233.200:1433


Fig: 4

  6. Click on the Add button; then it looks like below:


Fig: 5

  7. Go back to the Session tab and click on Save, so the session gets saved into Saved Sessions. From next time we need not give all these details; we can simply select the saved session and load its details. Once it is saved, click on Open:


Fig: 6

  8. It opens a command window and prompts for the password:


Fig: 7

  9. Give the password; in our case it is the password + token number. On a successful connection we can see the output as below:


Fig: 8

Connecting to SQL Server on AWS EC2

We have successfully made a tunnel between the source (local PC) and target (AWS EC2) via a jump box. Now we'll see how to access SQL Server installed on the AWS EC2 Windows server. We have already mapped the source port to the target SQL Server port.

Source: 5555

Target: 1433

On your local PC open SQL Server Management Studio and give the server name as "localhost,5555", authentication as "SQL Server", and the user ID and password.

Fig: 9

Fig: 10

That's it; we have successfully connected from our local machine to SQL Server on AWS EC2.
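
Once connected, a quick sanity check confirms that the session is really on the EC2 instance and not on a local one:

-- Confirm which instance and host you are connected to through the tunnel
SELECT @@SERVERNAME AS server_name,
       SERVERPROPERTY('MachineName') AS machine_name,
       SERVERPROPERTY('ProductVersion') AS product_version;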

Troubleshooting Connectivity issues

It’ll be tricky to troubleshoot things when we face connectivity issues to SQL Server on AWS EC2. We made a list of things that can quickly help you to resolve the connectivity issues. Here is the list:

Source (Local PC):

  • Check the source port is opened and working fine. TELNET the local IP with the source port. Ex: TELNET 12.116.118.223 5555 or TELNET localhost 5555.
  • Run putty.exe as “Run as Administrator”.
  • While connecting through SSMS try different approaches. Ex: 127.0.0.1, 5555 or Localhost, 5555
  • Make sure you are using the correct SQL username and password

Target (EC2 instance):

  • Check that the SQL Server database engine is listening on the correct port (a quick DMV check appears after this list). We have seen one EC2 SQL Server using port 1434, which was not accessible from remote locations because 1434 is the UDP port for SQL Browser and is also the default port for the Dedicated Admin Connection.
  • Cross check the SQL port and SQL service are added to Windows firewall exception list under inbound rules.
  • Check the target port is opened and working fine. TELNET the EC2 instance IP with the SQL port. Ex: TELNET 20.122.233.200 1433.
  • Make sure SQL Server is configured to allow remote connections.
  • Disable Windows Firewall and try to connect; if the connection succeeds, it means Windows Firewall is blocking the connections.
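
To confirm which TCP port the database engine is actually listening on, one option (while connected locally on the EC2 instance) is to check the existing connections; a minimal sketch:

-- Shows the TCP port(s) in use by current connections to this instance
SELECT DISTINCT local_tcp_port
FROM sys.dm_exec_connections
WHERE local_tcp_port IS NOT NULL;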

AWS Console:

  • Cross check that AWS security groups are properly configured
  • Make sure that all required ports are added to inbound rules at AWS console
  • Make sure EC2 instance is in the correct subnet (Internal / External)

